*Result*: MSNGO: multi-species protein function annotation based on 3D protein structure and network propagation.
*Further Information*
*Motivation: In recent years, protein function prediction has moved beyond the bottleneck of sequence-only features: high-accuracy protein structures predicted by AlphaFold2 have significantly improved prediction accuracy. Single-species protein function prediction methods have achieved remarkable success, but multi-species approaches still face challenges, including the difficulty of integrating multi-source data and insufficient knowledge transfer between distantly related species. How to integrate large-scale data and provide effective cross-species label propagation for species with sparse protein annotations remains a critical, unresolved challenge. To address this problem, we propose MSNGO (Multi-species protein Structures and Network to predict GO terms), a model that integrates structural features with network propagation. Our validation shows that structural features can significantly improve the accuracy of multi-species protein function prediction.
Results: We use graph representation learning to extract amino acid representations from protein structure contact maps, and we train a structural model with a graph convolution pooling module to derive protein-level structural features. After incorporating sequence features from ESM-2, we apply a network propagation algorithm to aggregate information and update node representations within a heterogeneous network. The results demonstrate that MSNGO outperforms previous multi-species protein function prediction methods that rely on sequence features and protein-protein networks.
Availability and Implementation: https://github.com/blingbell/MSNGO.
(© The Author(s) 2025. Published by Oxford University Press.)*
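
The two steps named in the Results (graph convolution with pooling over a contact map, then network propagation over a protein network) can be illustrated with a minimal, dependency-free sketch. This is not the MSNGO implementation: the function names, the row normalization with self-loops, the single ReLU layer, the mean pooling, and the restart-style update with weight `alpha` are all assumptions chosen for illustration. The abstract does not specify these details; the repository linked above is the authoritative source.

```python
from typing import List

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain matrix product, kept dependency-free for illustration."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def normalize_adjacency(adj: Matrix) -> Matrix:
    """Add self-loops, then divide each row by its sum (row-stochastic)."""
    n = len(adj)
    looped = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)]
              for i in range(n)]
    return [[v / sum(row) for v in row] for row in looped]


def graph_conv_pool(contact_map: Matrix, residue_feats: Matrix,
                    weight: Matrix) -> List[float]:
    """One graph-convolution layer over residues, then mean pooling.

    Assumed form: H = ReLU(A_hat @ X @ W); the protein vector is the
    column-wise mean of H. Hypothetical stand-in for a structural encoder.
    """
    a_hat = normalize_adjacency(contact_map)
    hidden = matmul(matmul(a_hat, residue_feats), weight)
    hidden = [[max(0.0, v) for v in row] for row in hidden]
    n = len(hidden)
    return [sum(row[j] for row in hidden) / n for j in range(len(hidden[0]))]


def propagate(adj: Matrix, feats: Matrix, alpha: float = 0.5,
              steps: int = 10) -> Matrix:
    """Restart-style propagation over a protein network (assumed update).

    Each step mixes neighbor-averaged features with the initial features:
    H_next = alpha * A_hat @ H + (1 - alpha) * H_0.
    """
    a_hat = normalize_adjacency(adj)
    current = feats
    for _ in range(steps):
        spread = matmul(a_hat, current)
        current = [[alpha * s + (1.0 - alpha) * f
                    for s, f in zip(s_row, f_row)]
                   for s_row, f_row in zip(spread, feats)]
    return current


if __name__ == "__main__":
    # Toy protein with three residues in a chain (0-1-2), 2-d residue features.
    contact = [[0.0, 1.0, 0.0], [1.0, 0.0, 1.0], [0.0, 1.0, 0.0]]
    residues = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
    identity = [[1.0, 0.0], [0.0, 1.0]]
    print(graph_conv_pool(contact, residues, identity))

    # Toy network: protein 0 is annotated, protein 1 is its neighbor,
    # protein 2 is isolated and receives no signal.
    network = [[0.0, 1.0, 0.0], [1.0, 0.0, 0.0], [0.0, 0.0, 0.0]]
    labels = [[1.0], [0.0], [0.0]]
    print(propagate(network, labels, alpha=0.5, steps=5))
```

In the toy network, the unannotated neighbor of the annotated protein ends up with a positive score while the isolated protein stays at zero, which is the behavior that makes propagation useful for species with sparse annotations. A real pipeline would likely use a learned multi-layer encoder and a heterogeneous graph with typed edges rather than a single dense adjacency matrix.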