Result: Development of Large Language Model Specialized into Microbiome Datasets: an Application of Self-Evaluation and Scoring Comparison with Conventional Natural Language Processing Markers.

Title:
Development of Large Language Model Specialized into Microbiome Datasets: an Application of Self-Evaluation and Scoring Comparison with Conventional Natural Language Processing Markers.
Authors:
Park CK; Department of Food and Biotechnology, Korea University, Sejong 30019, Republic of Korea.
Bae SH; Department of Food and Biotechnology, Korea University, Sejong 30019, Republic of Korea.
Park HW; Department of Food and Biotechnology, Korea University, Sejong 30019, Republic of Korea.
Oh NS; Department of Food and Biotechnology, Korea University, Sejong 30019, Republic of Korea.
Kim YJ; Department of Food and Biotechnology, Korea University, Sejong 30019, Republic of Korea.
Kim YW; Department of Food and Biotechnology, Korea University, Sejong 30019, Republic of Korea.
Cho TJ; Department of Food and Biotechnology, Korea University, Sejong 30019, Republic of Korea.
Li Y; Guangdong Provincial Key Laboratory of Animal Molecular Design and Precise Breeding, Foshan University, Foshan 528225, P.R. China.
Chai J; Guangdong Provincial Key Laboratory of Animal Molecular Design and Precise Breeding, Foshan University, Foshan 528225, P.R. China.
Zhao J; Guangdong Laboratory for Lingnan Modern Agriculture, South China Agricultural University, Guangzhou 510642, China.
Cho HT; The Bioinformatix, Gwangmyeong 14348, Republic of Korea.
Jung JH; College of Korean Medicine, Kyung Hee University, Seoul 02447, Republic of Korea.
Park J; College of Korean Medicine, Kyung Hee University, Seoul 02447, Republic of Korea.
Kim TG; The Bioinformatix, Gwangmyeong 14348, Republic of Korea.
Kim JK; Department of Food and Biotechnology, Korea University, Sejong 30019, Republic of Korea.; Department of Health Behavior and Nutrition Sciences, University of Delaware, Newark, DE 19711, USA.
Source:
Journal of microbiology and biotechnology [J Microbiol Biotechnol] 2026 Jan 26; Vol. 36, pp. e2511050. Date of Electronic Publication: 2026 Jan 26.
Publication Type:
Journal Article; Comparative Study
Language:
English
Journal Info:
Publisher: Korean Society for Microbiology and Biotechnology
Country of Publication: Korea (South)
NLM ID: 9431852
Publication Model: Electronic
Cited Medium: Internet
ISSN: 1738-8872 (Electronic)
Linking ISSN: 1017-7825
NLM ISO Abbreviation: J Microbiol Biotechnol
Subsets: MEDLINE
Imprint Name(s):
Publication: 2002- : Seoul : Korean Society for Microbiology and Biotechnology
Original Publication: Seoul : Korean Society for Applied Microbiology, 1991-2001.
Contributed Indexing:
Keywords: Bioinformatics; Domain adaptation; Human evaluation; Large language model; Microbiome; Phi-4 metric
Entry Date(s):
Date Created: 20260128 Date Completed: 20260128 Latest Revision: 20260206
Update Code:
20260206
PubMed Central ID:
PMC12868943
DOI:
10.4014/jmb.2511.11050
PMID:
41605796
Database:
MEDLINE

Further Information

The gut microbiome plays a fundamental role in host metabolism, immune regulation, and disease development. With the rapid accumulation of multi-omics and literature data, the microbiome field now faces the challenge of efficiently extracting scientific insights from massive, heterogeneous datasets. Artificial intelligence (AI) and large language models (LLMs) provide promising tools to address this complexity by enabling integrative analysis and knowledge synthesis across diverse biological sources. In this study, we developed METABOLISM, a microbiome-specialized LLM fine-tuned on 160,000 scientific abstracts to enhance literature-based contextual understanding of microbiome-liver interactions and related biological mechanisms. Using LoRA-based parameter-efficient training, METABOLISM was optimized for domain-specific reasoning and response generation. Model performance was evaluated through both automated Phi-4 scoring (a large language model-based evaluator for relevance, informativeness, and fluency) and structured human expert rubric assessments involving 20 domain specialists. The fine-tuned METABOLISM achieved superior relevance and clarity scores (mean > 7.5 ± 0.06) compared with general-purpose LLMs such as Gemma-3-12B-IT and ChatGPT-4o. Correlation analysis revealed weak to moderate negative relationships (R = -0.65, p < 0.0001) between traditional NLP metrics (BLEU, ROUGE) and human expert rubric scores, with a similar trend observed for correlations with Phi-4-based automated evaluation scores, indicating the limitations of surface-level similarity measures in biomedical contexts. Overall, our findings demonstrate that microbiome-adapted LLMs can effectively distill high-volume scientific data into biologically meaningful insights, supporting more efficient and interpretable research in microbiology and systems biology.
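The correlation analysis described in the abstract compares surface-overlap metrics with human rubric scores. The following is a minimal sketch of that kind of comparison, not the authors' pipeline: `rouge1_f1` is a simplified unigram-overlap stand-in for ROUGE-1, `pearson_r` is a plain Pearson coefficient, and the sample answers and human scores are invented purely for illustration and are not taken from the study.

```python
from collections import Counter
from math import sqrt


def rouge1_f1(candidate: str, reference: str) -> float:
    """Unigram-overlap F1 (a simplified stand-in for ROUGE-1)."""
    cand = Counter(candidate.lower().split())
    ref = Counter(reference.lower().split())
    overlap = sum((cand & ref).values())  # clipped count of shared tokens
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)


def pearson_r(xs, ys) -> float:
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / sqrt(var_x * var_y)


if __name__ == "__main__":
    # Hypothetical (model answer, reference answer, human rubric score) triples.
    # These values are made up for the example, not data from the paper.
    samples = [
        ("gut bacteria produce short chain fatty acids",
         "gut bacteria produce short chain fatty acids", 5.0),
        ("microbial metabolites reach the liver through the portal vein",
         "the gut liver axis links microbial metabolites to the liver", 8.0),
        ("dysbiosis can increase intestinal permeability and inflammation",
         "altered microbiota composition is associated with liver disease", 7.5),
        ("bile acids are modified by intestinal microbes",
         "intestinal microbes modify bile acids", 6.0),
    ]
    overlap_scores = [rouge1_f1(answer, ref) for answer, ref, _ in samples]
    human_scores = [score for _, _, score in samples]
    print("overlap scores:", [round(s, 3) for s in overlap_scores])
    print("Pearson r:", round(pearson_r(overlap_scores, human_scores), 3))
```

A negative coefficient in such a comparison would mean that answers sharing more surface tokens with the reference tend to receive lower expert scores, which is the pattern the abstract reports for BLEU and ROUGE. A real analysis would use established metric implementations and a significance test rather than this toy setup.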