Result: Accelerating primer design for amplicon sequencing using large language model-powered agents.

Title:

Accelerating primer design for amplicon sequencing using large language model-powered agents.

Authors:

Wang Y; MGI Tech, Shenzhen, China., Hou Y; MGI Tech, Shenzhen, China., Yang L; MGI Tech, Shenzhen, China.; College of Life Sciences, University of Chinese Academy of Sciences, Beijing, China., Li S; αLab AI department, MGI Tech R&D, Hongkong, China., Tang W; MGI Tech, Shenzhen, China., Tang H; MGI Tech, Shenzhen, China., He Q; MGI Tech, Shenzhen, China., Lin S; MGI Tech, Shenzhen, China.; αLab AI department, MGI Tech R&D, Hongkong, China., Zhang Y; MGI Tech, Shenzhen, China., Li X; MGI Tech, Shenzhen, China., Chen S; MGI Tech, Shenzhen, China., Huang Y; MGI Tech, Shenzhen, China., Kong L; MGI Tech, Shenzhen, China., Zhang H; MGI Tech, Shenzhen, China., Yu D; MGI Tech, Shenzhen, China.; αLab AI department, MGI Tech R&D, Hongkong, China., Mu F; MGI Tech, Shenzhen, China., Yang H; BGI, Shenzhen, China.; Hangzhou Institute of Medicine, Chinese Academy of Sciences, Hangzhou, China.; James D. Watson Institute of Genome Sciences, Hangzhou, China., Wang J; MGI Tech, Shenzhen, China.; BGI, Shenzhen, China., Hirankarn N; Graduate Affairs, Faculty of Medicine, Chulalongkorn University, Bangkok, Thailand. nattiyap@gmail.com., Yang M; MGI Tech, Shenzhen, China. yangmeng1@mgi-tech.com.; αLab AI department, MGI Tech R&D, Hongkong, China. yangmeng1@mgi-tech.com.; Graduate Affairs, Faculty of Medicine, Chulalongkorn University, Bangkok, Thailand. yangmeng1@mgi-tech.com.

Source:

Nature biomedical engineering [Nat Biomed Eng] 2026 Feb; Vol. 10 (2), pp. 338-353. Date of Electronic Publication: 2025 Jul 30.

Publication Type:

Journal Article

Language:

English

Journal Info:

Publisher: Springer Nature
Country of Publication: England
NLM ID: 101696896
Publication Model: Print-Electronic
Cited Medium: Internet
ISSN: 2157-846X (Electronic)
Linking ISSN: 2157846X
NLM ISO Abbreviation: Nat Biomed Eng
Subsets: MEDLINE

Imprint Name(s):

Publication: London : Springer Nature
Original Publication: [London] : Macmillan Publishers Limited, [2016]-

MeSH Terms:

High-Throughput Nucleotide Sequencing*/methods; DNA Primers*/genetics; Sequence Analysis, DNA*/methods; Programming Languages*; Humans; Software; Algorithms; Robotics; Large Language Models

Substance Nomenclature:

0 (DNA Primers)

Entry Date(s):

Date Created: 20250730
Date Completed: 20260220
Latest Revision: 20260220

Update Code:

20260220

DOI:

10.1038/s41551-025-01455-z

PMID:

40738975

Database:

MEDLINE

Further information

The pre-trained knowledge compressed in large language models is addressing diverse scientific challenges and catalysing the progression of autonomous laboratory systems, synergized with liquid handling robots. Here we introduce PrimeGen, an orchestrated multi-agent system powered by large language models, designed to streamline labour-intensive primer design tasks for targeted next-generation sequencing. PrimeGen uses GPT-4o as a central controller to engage with experimentalists for task planning and decomposition, coordinating various specialized agents to execute distinct subtasks. These include an interactive search agent for retrieving gene targets from databases, a primer agent for designing primer sequences across multiple scenarios, a protocol agent for generating executable robot scripts through retrieval-augmented generation and prompt engineering, and an experiment agent equipped with a vision language model for detecting and reporting anomalies. We experimentally demonstrate the effectiveness of PrimeGen across a variety of applications. PrimeGen can accommodate up to 955 amplicons, ensuring high amplification uniformity and minimizing dimer formation. Our development underscores the potential of collaborative agents, coordinated by generalist foundation models, as intelligent tools for advancing biomedical research.
(© 2025. The Author(s), under exclusive licence to Springer Nature Limited.)

Competing interests: J.W., D.Y. and F.M. declare stock holdings in MGI. The remaining authors declare no competing interests.
