*Result*: MSNGO: multi-species protein function annotation based on 3D protein structure and network propagation.

Title:
MSNGO: multi-species protein function annotation based on 3D protein structure and network propagation.
Authors:
Wang B; School of Computer Science and Technology, Harbin Institute of Technology (Shenzhen), Shenzhen, Guangdong 518055, China.
Cui B; School of Computer Science and Technology, Harbin Institute of Technology (Shenzhen), Shenzhen, Guangdong 518055, China.
Chen S; School of Computer Science and Technology, Harbin Institute of Technology (Shenzhen), Shenzhen, Guangdong 518055, China.
Wang X; School of Computer Science and Technology, Harbin Institute of Technology (Shenzhen), Shenzhen, Guangdong 518055, China.; Guangdong Provincial Key Laboratory of Novel Security Intelligence Technologies, Harbin Institute of Technology (Shenzhen), Shenzhen, Guangdong 518055, China.
Wang Y; Center for Bioinformatics, Faculty of Computing, Harbin Institute of Technology, Harbin, Heilongjiang 150001, China.; Key Laboratory of Biological Bigdata, Ministry of Education, Harbin Institute of Technology, Harbin, Heilongjiang 150001, China.
Li J; School of Computer Science and Technology, Harbin Institute of Technology (Shenzhen), Shenzhen, Guangdong 518055, China.; Guangdong Provincial Key Laboratory of Novel Security Intelligence Technologies, Harbin Institute of Technology (Shenzhen), Shenzhen, Guangdong 518055, China.; Key Laboratory of Biological Bigdata, Ministry of Education, Harbin Institute of Technology, Harbin, Heilongjiang 150001, China.
Source:
Bioinformatics (Oxford, England) [Bioinformatics] 2025 May 06; Vol. 41 (5).
Publication Type:
Journal Article
Language:
English
Journal Info:
Publisher: Oxford University Press; Country of Publication: England; NLM ID: 9808944; Publication Model: Print; Cited Medium: Internet; ISSN: 1367-4811 (Electronic); Linking ISSN: 13674803; NLM ISO Abbreviation: Bioinformatics; Subsets: MEDLINE
Imprint Name(s):
Original Publication: Oxford : Oxford University Press, c1998-
Grant Information:
2021YFA0910700 National Key R&D Program of China; 32470704 National Natural Science Foundation of China; 2022B1212010005 Guangdong Provincial Key Laboratory of Novel Security Intelligence Technologies
Substance Nomenclature:
0 (Proteins)
Entry Date(s):
Date Created: 20250506; Date Completed: 20250529; Latest Revision: 20250601
Update Code:
20260130
PubMed Central ID:
PMC12122197
DOI:
10.1093/bioinformatics/btaf285
PMID:
40327458
Database:
MEDLINE

*Further Information*

*Motivation: In recent years, protein function prediction has moved beyond the limits of sequence-only features: high-precision protein structures predicted by AlphaFold2 have significantly improved prediction accuracy. While single-species protein function prediction methods have achieved remarkable success, multi-species approaches still struggle to integrate multi-source data and to transfer knowledge between distantly related species. How to integrate large-scale data and propagate labels effectively across species to those with sparse protein annotations remains an unresolved challenge. To address this problem, we propose MSNGO (Multi-species protein Structures and Network to predict GO terms), a model that integrates structural features with network propagation. Our validation shows that structural features can significantly improve the accuracy of multi-species protein function prediction.
Results: We employ graph representation learning techniques to extract amino acid representations from protein structure contact maps and train a structural model using a graph convolution pooling module to derive protein-level structural features. After incorporating the sequence features from ESM-2, we apply a network propagation algorithm to aggregate information and update node representations within a heterogeneous network. The results demonstrate that MSNGO outperforms previous multi-species protein function prediction methods that rely on sequence features and protein-protein networks.
Availability and Implementation: https://github.com/blingbell/MSNGO.
(© The Author(s) 2025. Published by Oxford University Press.)*
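
The pipeline outlined in the Results (contact map, graph convolution with pooling, then network propagation) can be illustrated with a minimal sketch. This is not the MSNGO implementation; the repository above is the authoritative source. Everything here is an assumption made for illustration: the function names, the 8 Å C-alpha distance threshold, the untrained mean-aggregation graph convolution, and the restart-style propagation rule on a plain (not heterogeneous) protein network.

```python
import math

def contact_map(coords, threshold=8.0):
    """Binary residue contact map from C-alpha coordinates.

    The 8 angstrom threshold is an assumed, commonly used cutoff.
    """
    n = len(coords)
    return [[1 if i != j and math.dist(coords[i], coords[j]) <= threshold else 0
             for j in range(n)] for i in range(n)]

def graph_conv_mean_pool(adj, feats):
    """One mean-aggregation graph convolution (self-loops included, no learned
    weights), followed by mean pooling over residues to get one protein vector."""
    n, d = len(feats), len(feats[0])
    hidden = []
    for i in range(n):
        neigh = [i] + [j for j in range(n) if adj[i][j]]
        hidden.append([sum(feats[j][k] for j in neigh) / len(neigh) for k in range(d)])
    return [sum(row[k] for row in hidden) / n for k in range(d)]

def propagate(adj, labels, alpha=0.5, steps=20):
    """Restart-style propagation on a protein network (hypothetical rule):
    scores <- alpha * (neighbour average of scores) + (1 - alpha) * labels."""
    n, m = len(adj), len(labels[0])
    scores = [row[:] for row in labels]
    for _ in range(steps):
        updated = []
        for i in range(n):
            deg = sum(adj[i])
            row = []
            for k in range(m):
                agg = sum(adj[i][j] * scores[j][k] for j in range(n)) / deg if deg else 0.0
                row.append(alpha * agg + (1 - alpha) * labels[i][k])
            updated.append(row)
        scores = updated
    return scores
```

In this sketch, `contact_map` turns residue coordinates into a graph, `graph_conv_mean_pool` reduces per-residue features to a single protein-level vector, and `propagate` spreads known GO-term labels from annotated proteins to unannotated neighbours. A real model would learn the convolution weights and combine the pooled structural vector with ESM-2 sequence embeddings before propagation.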