*Result*: LitSumm: large language models for literature summarization of noncoding RNAs.

Title:
LitSumm: large language models for literature summarization of noncoding RNAs.
Authors:
Green A; European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton CB10 1SD, UK.
Ribas CE; European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton CB10 1SD, UK.
Ontiveros-Palacios N; European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton CB10 1SD, UK.
Griffiths-Jones S; School of Biological Sciences, Faculty of Medicine, Biology and Health, Michael Smith Building, The University of Manchester, Manchester M13 9NT, UK.
Petrov AI; Riboscope Ltd, 23 King St, Cambridge CB1 1AH, UK.
Bateman A; European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton CB10 1SD, UK.
Sweeney B; European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton CB10 1SD, UK.
Source:
Database : the journal of biological databases and curation [Database (Oxford)] 2025 Feb 05; Vol. 2025.
Publication Type:
Journal Article; Research Support, Non-U.S. Gov't
Language:
English
Journal Info:
Publisher: Oxford Journals; Country of Publication: England; NLM ID: 101517697; Publication Model: Print; Cited Medium: Internet; ISSN: 1758-0463 (Electronic); Linking ISSN: 17580463; NLM ISO Abbreviation: Database (Oxford); Subsets: MEDLINE
Imprint Name(s):
Original Publication: Oxford : Oxford Journals, 2009-
Grant Information:
United Kingdom WT_ Wellcome Trust; 218302/Z/19/Z United Kingdom WT_ Wellcome Trust; 945405 H2020 Marie Sklodowska-Curie Actions
Substance Nomenclature:
0 (RNA, Untranslated)
Entry Date(s):
Date Created: 20250205; Date Completed: 20250205; Latest Revision: 20260309
Update Code:
20260309
PubMed Central ID:
PMC11833236
DOI:
10.1093/database/baaf006
PMID:
39908113
Database:
MEDLINE

*Further Information*

*Curation of literature in life sciences is a growing challenge. The continued increase in the rate of publication, coupled with the relatively fixed number of curators worldwide, presents a major problem for developers of biomedical knowledgebases. Very few knowledgebases have the resources to scale to the whole relevant literature, and all have to prioritize their efforts. In this work, we take a first step toward alleviating the lack of curator time in RNA science by generating summaries of literature for noncoding RNAs (ncRNAs) using large language models (LLMs). We demonstrate that high-quality, factually accurate summaries with accurate references can be automatically generated from the literature using a commercial LLM and a chain of prompts and checks. Manual assessment was carried out for a subset of summaries, and the majority were rated as extremely high quality. We apply our tool to a selection of >4600 ncRNAs and make the generated summaries available via the RNAcentral resource. We conclude that automated literature summarization is feasible with the current generation of LLMs, provided that careful prompting and automated checking are applied. Database URL: https://rnacentral.org/.
(© The Author(s) 2025. Published by Oxford University Press.)*