Musing 120: Robin: A multi-agent system for automating scientific discovery
Paper out of Oxford and FutureHouse, San Francisco
Today’s paper: Robin: A multi-agent system for automating scientific discovery. Ghareeb et al. 19 May 2025. https://arxiv.org/pdf/2505.13400
I’ve covered AI-assisted science several times in this Substack, but the topic can hardly get too much coverage, especially with exciting advances on the horizon. Today’s paper looks at an important application in medicine: the repurposing of existing drugs for new indications. The history of drug repurposing shows a recurring pattern: the relevant insights often already existed in the scientific literature, but only after a significant lag did that knowledge crystallize into a new treatment. For example, dabrafenib, an inhibitor of BRAF kinase that is used in various cancers with mitogenic mutations in BRAF, is being repurposed to prevent hearing loss. While its molecular action was well characterized by 2010, dabrafenib’s otoprotective effects were only discovered 10 years later via unbiased high-throughput screening.
Such delays are clearly unacceptable if we can use modern AI to do something about them. Trained on data across many fields, large language models (LLMs) are able to store and recall information on a wide variety of scientific topics and thus transcend the limitations of individual human knowledge. Scientific discovery is driven by the iterative process of background research, hypothesis generation, experimentation, and data analysis. The authors present Robin, which they describe as the first multi-agent system capable of fully automating the key intellectual steps of the scientific process.
Robin integrates multiple language agents in a structured workflow to generate therapeutic candidates for a given disease (Figure 1A,B). Crow and Falcon are literature search agents based on PaperQA2 that produce concise and deep literature summaries, respectively. PaperQA2 achieves expert-level performance in information retrieval and summarization, with access to scientific literature, clinical trial reports, and the Open Targets Platform. Finch is a scientific data analysis agent that analyzes experimental data from assays such as RNA-seq and flow cytometry (Figure 1C). By coordinating these agents to identify novel therapeutics, Robin enables an experimentally guided system that drives the process of scientific discovery.
When provided with a disease name, Robin formulates a series of general questions about the disease pathology and queries Crow to answer each question. Using the reports from Crow as context, Robin next identifies 10 potential causal disease mechanisms. For each mechanism, Robin again deploys Crow to prepare a detailed report describing an in vitro model of the disease mechanism and corresponding assay that can be used to test drug efficacy. Robin uses an LLM judge to make pairwise comparisons between reports, which are used to calculate their relative rankings. The top-ranked in vitro model is used by Robin to define the experimental strategy for therapeutic candidate hypothesis generation.
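To make the ranking step concrete, here is a minimal sketch of turning pairwise judgments into a ranking. The description above does not say how the rankings are computed from the comparisons, so the Bradley-Terry model used here is an assumption (it is one common choice for this kind of tournament). The `toy_judge` function, the report names, and the quality scores are all hypothetical stand-ins for an LLM judge and real reports.

```python
from itertools import combinations
from typing import Callable, Dict, List, Tuple

Wins = Dict[Tuple[str, str], int]


def collect_wins(items: List[str], judge: Callable[[str, str], str]) -> Wins:
    """Run one comparison per pair; wins[(a, b)] counts how often a beat b."""
    wins: Wins = {(a, b): 0 for a in items for b in items if a != b}
    for a, b in combinations(items, 2):
        winner = judge(a, b)
        loser = b if winner == a else a
        wins[(winner, loser)] += 1
    return wins


def bradley_terry(items: List[str], wins: Wins,
                  iters: int = 200, prior: float = 0.1) -> Dict[str, float]:
    """Estimate a strength per item with the standard iterative update.

    `prior` adds a small pseudo-win in both directions of every pair so that
    an item with zero wins still gets a finite, positive strength.
    """
    strength = {i: 1.0 for i in items}
    for _ in range(iters):
        updated = {}
        for i in items:
            others = [j for j in items if j != i]
            total_wins = sum(wins[(i, j)] + prior for j in others)
            denom = sum(
                (wins[(i, j)] + wins[(j, i)] + 2 * prior)
                / (strength[i] + strength[j])
                for j in others
            )
            updated[i] = total_wins / denom
        # Rescale so the strengths average to 1 (the scale is arbitrary).
        norm = sum(updated.values()) / len(items)
        strength = {i: s / norm for i, s in updated.items()}
    return strength


# Hypothetical stand-in for the LLM judge: prefers the higher toy score.
toy_quality = {"report_A": 0.9, "report_B": 0.7, "report_C": 0.4, "report_D": 0.1}


def toy_judge(a: str, b: str) -> str:
    return a if toy_quality[a] >= toy_quality[b] else b


reports = list(toy_quality)
strengths = bradley_terry(reports, collect_wins(reports, toy_judge))
ranking = sorted(reports, key=strengths.get, reverse=True)
print(ranking)
```

With a real LLM judge the comparisons would be noisy and possibly inconsistent, which is where a fitted model like this earns its keep over simply counting wins.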
The authors applied Robin to generate therapeutic candidates for dry age-related macular degeneration (dAMD) as an initial proof-of-concept (Figure 2A above). Robin began the therapeutic hypothesis generation workflow by identifying and reviewing 151 papers to propose ten biologically relevant dAMD mechanisms to assay. After ranking the disease mechanisms and corresponding experimental strategies, Robin proposed treating dAMD by increasing phagocytosis in retinal pigment epithelium (RPE) cells, and suggested testing how well drugs increase the phagocytic capacity of either patient-derived RPE cells or ARPE-19 cells in a flow cytometry assay. Next, it deployed Crow to conduct a literature review of roughly 400 papers on RPE phagocytosis and the therapeutic landscape of dAMD, and synthesized the results to propose 30 existing drug candidates for experimental testing in the phagocytosis assay. Then, it called Falcon to produce comprehensive evaluation reports on each of these molecules, which were ranked in an LLM-judged tournament.
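The flow of this workflow can be sketched in a few lines of Python. This is an illustration of the stages as described here, not Robin’s actual code: the `Agents` container, the function signatures, the prompts, and the number of background questions (5) are all assumptions made for the sketch, and the agents are replaced by trivial stubs so the example runs on its own.

```python
from dataclasses import dataclass
from typing import Callable, List

# Hypothetical interfaces: every agent is modelled as a plain function.
Agent = Callable[[str], str]                 # prompt -> report text
Proposer = Callable[[str, int], List[str]]   # (context, n) -> n proposals
Ranker = Callable[[List[str]], List[str]]    # reports -> reports, best first


@dataclass
class Agents:
    crow: Agent        # concise literature reports
    falcon: Agent      # deep literature reports
    propose: Proposer  # LLM call that drafts questions, mechanisms, candidates
    rank: Ranker       # LLM-judged tournament


def generate_candidates(disease: str, agents: Agents,
                        n_questions: int = 5,      # assumed; not stated above
                        n_mechanisms: int = 10,
                        n_candidates: int = 30) -> List[str]:
    # 1. General questions about the disease, each answered by Crow.
    questions = agents.propose(f"Questions about {disease} pathology", n_questions)
    background = "\n".join(agents.crow(q) for q in questions)
    # 2. Candidate causal mechanisms, each with an in vitro model and assay report.
    mechanisms = agents.propose(f"Mechanisms of {disease}\n{background}", n_mechanisms)
    assay_reports = [agents.crow(f"In vitro model and assay for {m}") for m in mechanisms]
    # 3. The top-ranked report defines the experimental strategy.
    strategy = agents.rank(assay_reports)[0]
    # 4. Drug candidates for that strategy, each evaluated in depth by Falcon.
    candidates = agents.propose(f"Drugs testable with: {strategy}", n_candidates)
    evaluations = [agents.falcon(f"Evaluate {c} for {disease}") for c in candidates]
    # 5. A final tournament orders the candidates for testing.
    return agents.rank(evaluations)


# Trivial stubs so the sketch runs without any LLM or literature access.
stub_agents = Agents(
    crow=lambda prompt: f"[crow] {prompt}",
    falcon=lambda prompt: f"[falcon] {prompt}",
    propose=lambda context, n: [f"proposal {i}" for i in range(1, n + 1)],
    rank=sorted,
)

ranked = generate_candidates("dAMD", stub_agents)
print(len(ranked), ranked[0])
```

Passing the agents in as plain functions keeps the orchestration logic separate from whichever models or retrieval tools sit behind each call.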
Ultimately, the results of the proof-of-concept study shown in Figure 2 were confirmed by a human analysis of the same data. Preclinical models have demonstrated that the ROCK inhibitor Y-27632 can restore phagocytic efficiency in RPE cells, confirming Robin’s literature-based rationale for suggesting this candidate. As shown in Figure 3 below, Robin then recommended RNA sequencing of Y-27632-treated RPE cells to investigate the transcriptional effects of ROCK inhibition. The results suggest that Y-27632 enhances the initial uptake phase of phagocytosis through cytoskeletal rearrangements and promotes clearance of internalized material through transcriptional regulation of autophagy. Further work is necessary to confirm whether these changes are specific to Y-27632 or generalize to any intervention that significantly increases phagocytic capacity.
That’s a lot of biology, but to make the point: the mechanistic insights, derived from experiments proposed by Robin and analyzed by Finch, demonstrate how AI-driven scientific discovery can not only identify effective therapeutic compounds but also reveal novel molecular targets within disease pathways that might have otherwise remained unexplored. Figure 4 below further buttresses this point.
To close this musing: Robin seems to be capable of automated hypothesis generation and experimental data analysis for scientific discovery, at least in the practical domain of drug repurposing. When tasked with identifying novel therapeutics for dAMD, Robin proposed enhancing RPE cell phagocytosis using ROCK inhibitors, and discovered ripasudil as the most potent enhancer of RPE phagocytosis among tested compounds through an iterative lab-in-the-loop discovery cycle. Ripasudil’s established safety profile and clinical approval for ocular use present a promising drug repurposing opportunity that could significantly accelerate the development pathway for dAMD treatment.
Robin addresses a broader challenge in therapeutic development. With FDA approvals stagnating at approximately 50 novel drugs annually over the past decade, new approaches to scale therapeutic discovery are urgently needed. As the first system to automate both literature-grounded hypothesis generation and experimental data analysis, Robin is poised to accelerate the pace of drug discovery compared to traditional approaches.
The authors state that future iterations of Robin will aim to provide detailed methodologies that require minimal human translation for laboratory execution. The Finch data analysis agent also relies heavily on prompt engineering by domain experts to produce reliable analytical results. Enabling Finch to independently generate or adapt prompts for specific data modalities would allow a more autonomous discovery pipeline. This is where the authors are focused next.