Title:
An end-to-end framework for data lineage analysis covering link pattern recognition, fault diagnosis, and early warning.
Authors:
Hou R; School of Computer Science and Technology, Shenyang Institute of Engineering, Shenyang, 110136, China., Zhang S; China Gridcom Co., Ltd, Shenzhen, 518109, China., Wang H; School of Computer Science and Technology, Shenyang Institute of Engineering, Shenyang, 110136, China., Li S; Beijing Fibrlink Communications Co., Ltd., Beijing, 100071, China., Zhang Y; College of Artificial Intelligence, Tianjin University of Science and Technology, Tianjin, 300457, China. yiyingzhang@tust.edu.cn.
Source:
Scientific reports [Sci Rep] 2026 Jan 07; Vol. 16 (1), pp. 4430. Date of Electronic Publication: 2026 Jan 07.
Publication Type:
Journal Article
Language:
English
Journal Info:
Publisher: Nature Publishing Group Country of Publication: England NLM ID: 101563288 Publication Model: Electronic Cited Medium: Internet ISSN: 2045-2322 (Electronic) Linking ISSN: 20452322 NLM ISO Abbreviation: Sci Rep Subsets: MEDLINE
Imprint Name(s):
Original Publication: London : Nature Publishing Group, copyright 2011-
References:
Sci Rep. 2024 Aug 26;14(1):19756. (PMID: 39187569)
Sci Rep. 2025 Aug 2;15(1):28284. (PMID: 40753259)
Sci Rep. 2025 May 29;15(1):18804. (PMID: 40442263)
IEEE Trans Neural Netw. 2009 Jan;20(1):61-80. (PMID: 19068426)
Sci Rep. 2025 Apr 23;15(1):14059. (PMID: 40269056)
Grant Information:
LJ242511632001 Basic scientific research projects of higher education institutions in Liaoning Province in 2025
Contributed Indexing:
Keywords: Data lineage; Fault diagnosis; Pattern recognition
Entry Date(s):
Date Created: 20260107 Date Completed: 20260202 Latest Revision: 20260205
Update Code:
20260205
PubMed Central ID:
PMC12864967
DOI:
10.1038/s41598-025-34522-1
PMID:
41501163
Database:
MEDLINE

*Further Information*

*With the increasing complexity of data platforms, real-time prediction and tracing of data link failures have become critical problems. We propose an End-to-End Full-Link intelligent analysis framework (EEFL) based on data lineage. The framework combines graph structures with deep learning algorithms to achieve link pattern recognition and fault warning. First, a dynamic data lineage graph model is constructed, and its topological features are extracted using a graph neural network (GNN). Through temporal edge weight optimization and semi-supervised clustering, typical link patterns are classified automatically. Second, a hybrid fault diagnosis model is designed that uses a temporal convolutional network (TCN) to capture long-term dependencies among link metrics and combines it with a GNN to analyze topological mutations. This model accurately classifies various fault types, including data outages, latency anomalies, and data contamination. Finally, a dynamic threshold warning mechanism is introduced that combines Bayesian optimization and online learning to adaptively adjust alarm triggering conditions and reduce false alarm rates. We verify the generalization ability of the model using real enterprise data and simulated data. Experimental results show that EEFL achieves an average accuracy (Acc) of 92.73% across the two datasets, significantly outperforming traditional methods, and provides intelligent decision support for data governance.
(© 2025. The Author(s).)*
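The abstract describes the dynamic threshold warning mechanism only at a high level. As a rough illustration of what an alarm condition that "adapts online" can look like, here is a minimal sketch for a single link metric such as latency. It is not the EEFL implementation. The exponentially weighted moving average (EWMA) baseline, the multiplier `k` with its step size and bounds, the warm-up length, and the names `DynamicThreshold`, `report_false_alarm`, and `report_missed_fault` are all hypothetical. The Bayesian optimization step mentioned in the abstract is not reproduced; here operator feedback adjusts `k` directly.

```python
import math
from dataclasses import dataclass
from typing import Optional


@dataclass
class DynamicThreshold:
    """Hypothetical adaptive alarm threshold for one link metric (e.g. latency).

    NOT the EEFL mechanism: the baseline is an EWMA of mean and variance, and
    the sensitivity multiplier k is nudged by operator feedback instead of
    being tuned by Bayesian optimization.
    """
    alpha: float = 0.1      # EWMA smoothing factor (assumed value)
    k: float = 3.0          # alarm when |x - mean| > k * std
    k_step: float = 0.25    # how far one piece of feedback moves k
    k_min: float = 1.5      # most sensitive setting allowed
    k_max: float = 6.0      # least sensitive setting allowed
    warmup: int = 20        # observations to see before any alarm may fire
    mean: Optional[float] = None
    var: float = 0.0
    n: int = 0

    def observe(self, x: float) -> bool:
        """Return True if x breaches the current threshold, then fold x into the baseline."""
        self.n += 1
        if self.mean is None:
            self.mean = x
            return False
        std = math.sqrt(self.var)
        alarm = self.n > self.warmup and abs(x - self.mean) > self.k * std
        # Incremental EWMA update of the running mean and variance.
        diff = x - self.mean
        incr = self.alpha * diff
        self.mean += incr
        self.var = (1.0 - self.alpha) * (self.var + diff * incr)
        return alarm

    def report_false_alarm(self) -> None:
        """An operator dismissed an alarm: widen the threshold (fewer false alarms)."""
        self.k = min(self.k_max, self.k + self.k_step)

    def report_missed_fault(self) -> None:
        """A fault went undetected: tighten the threshold (more sensitive)."""
        self.k = max(self.k_min, self.k - self.k_step)


# Usage: a steady latency series followed by a sudden spike.
det = DynamicThreshold()
alarms = [det.observe(x) for x in [10.0, 10.2] * 40 + [20.0]]
print(alarms[-1], any(alarms[:-1]))
```

In this sketch every observation, including one that triggers an alarm, updates the baseline, so a persistent level shift is gradually absorbed; a design that excludes alarmed points from the update would resist contamination by outliers but could keep alarming after a legitimate shift.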

*Declarations. Competing interests: The authors declare no competing interests.*