Abstract:
The development of next-generation sequencing and single-cell technology has generated vast genome-scale multi-OMICS datasets. One central goal of systems biology is to leverage these datasets to infer biochemical regulations.
For this inference, previous studies have proposed a mathematical method that uses metabolomics data for the inverse calculation of biochemical Jacobian matrices, which reveal regulatory checkpoints of biochemical regulation. However, these algorithms are limited by two issues: they rely on manually assembled structural network information, and they are numerically unstable due to ill-conditioned regression problems for large-scale metabolic networks.
This dissertation comprises several efforts to develop a more stable inverse Jacobian algorithm and a related automated workflow. Addressing the two bottlenecks mentioned above, we first develop a novel regression loss-based inverse Jacobian algorithm, combining metabolomics COVariance and genome-scale metabolic RECONstruction (COVRECON). This automated approach comprises two main parts: Sim-Network and inverse differential Jacobian evaluation. Sim-Network automatically generates an organism-specific enzyme and reaction dataset from the BiGG and KEGG databases, which is then used to reconstruct the Jacobian’s structure for a specific metabolomics dataset.
Instead of directly solving a regression problem as in the previous workflow, the new inverse differential Jacobian is based on a substantially more robust approach and rates the biochemical interactions according to their relevance from large-scale metabolomics data.
Overall, COVRECON automatically reconstructs a data-driven superpathway model, accommodates more general network structures, improves stability, reduces computation time, and extends to large-scale models. Traditionally, inverse Jacobian studies have assumed that metabolomics variations result solely from metabolic system fluctuations acting independently on each metabolite. However, emerging evidence highlights internal network fluctuations, particularly from the gene regulatory network, which lead to a non-diagonal fluctuation matrix D. In a second work, we propose an approach in which enzymes with significant variances in their activity values serve as indicators of large non-diagonal fluctuations within matrix D.
After a comprehensive assessment of three critical factors affecting its accuracy, we conclude that integrating non-diagonal D structure information significantly enhances the inverse Jacobian algorithm's performance. When the number of non-diagonal fluctuations is large or their magnitude is relatively small, assuming a diagonal fluctuation matrix D remains feasible. Finally, we apply COVRECON to analyze the metabolomics measurements from the active aging project. This analysis involves 263 plasma samples from elderly individuals and aims to identify biomarkers linking metabolomics to body activity.
We first use several automatic machine learning classification approaches to identify key metabolite biomarkers. We then apply the COVRECON inverse differential Jacobian analysis, aided by the classifier results. This process identifies specific blood markers as the most critical regulatory factors of the metabolic network in active aging dynamics. In summary, we combine machine learning classification and the inverse differential Jacobian to identify key biomarkers and important regulatory processes within the metabolic interaction network.
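As an illustration of the inverse Jacobian idea summarized above, the following minimal sketch uses a toy three-metabolite network. It assumes the stochastic Lyapunov relation J C + C J^T = -2 D between the Jacobian J, the metabolite covariance C, and the fluctuation matrix D, which is not stated in this abstract and is taken here as an assumption. The function names, the toy values, and the known sparsity mask are hypothetical. The sketch shows the direct regression formulation that the earlier workflow is described as solving, not the regression loss-based COVRECON algorithm itself.

```python
import numpy as np


def solve_lyapunov(J, D):
    """Solve J C + C J^T = -2 D for C by Kronecker vectorization."""
    n = J.shape[0]
    I = np.eye(n)
    # Row-major vec: vec(J C) = (J kron I) vec(C), vec(C J^T) = (I kron J) vec(C)
    A = np.kron(J, I) + np.kron(I, J)
    c = np.linalg.solve(A, (-2.0 * D).ravel())
    return c.reshape(n, n)


def infer_jacobian(C, D, mask):
    """Least-squares estimate of the Jacobian entries allowed by `mask`.

    The relation is linear in J, so each free entry (i, j) contributes one
    column E_ij C + C E_ij^T to the regression design matrix.
    """
    n = C.shape[0]
    idx = np.argwhere(mask)
    cols = []
    for i, j in idx:
        E = np.zeros((n, n))
        E[i, j] = 1.0
        cols.append((E @ C + C @ E.T).ravel())
    A = np.stack(cols, axis=1)
    theta, *_ = np.linalg.lstsq(A, (-2.0 * D).ravel(), rcond=None)
    J = np.zeros((n, n))
    J[idx[:, 0], idx[:, 1]] = theta
    return J


# Hypothetical toy network: a linear chain of three metabolites.
J_true = np.array([[-1.0, 0.0, 0.0],
                   [1.0, -2.0, 0.0],
                   [0.0, 2.0, -3.0]])
D = np.eye(3)              # assumed diagonal (independent) fluctuations
mask = J_true != 0         # assumed known network structure

C = solve_lyapunov(J_true, D)       # forward: covariance implied by J and D
J_est = infer_jacobian(C, D, mask)  # inverse: Jacobian entries from C
print(np.round(J_est, 6))
```

In this toy case the structure mask makes the regression well determined. For large networks the design matrix can become ill-conditioned, which is the instability that motivates the more robust approach described above.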
Corresponding publications:
COVRECON: automated integration of genome- and metabolome-scale network reconstruction and data-driven inverse modeling of metabolic interaction networks. Jiahang Li, Steffen Waldherr, Wolfram Weckwerth. Bioinformatics, Volume 39, Issue 7, July 2023, btad397. doi.org/10.1093/bioinformatics/btad397
Enzyme fluctuations data improve inference of metabolic interaction networks with an inverse differential Jacobian approach. Jiahang Li, Wolfram Weckwerth, Steffen Waldherr. doi.org/10.1101/2023.12.11.570118