Bioinformatics and mathematical modelling
Xiaoliang Sun group

Room: 1.316
Althanstraße 14 (UZA I)
1090 Vienna, Austria
M: +43-677-6264-9211
xiaoliang.sun@univie.ac.at


Research Focus

The bioinformatics group supports experimental groups across the full analysis chain: raw OMICS data processing, multivariate statistical analysis, machine learning, mathematical modelling and network analysis. The group also develops bioinformatics software and OMICS databases, with the aim of building an automated bioinformatics platform for OMICS data.

1. Raw OMICS data processing:

The group has developed raw OMICS data processing pipelines for:

1) Next-generation genome sequencing. We annotated the plant growth-promoting endophyte Paenibacillus sp. P22 and identified several key nitrogen fixation genes.

2) Metagenomics barcoding data. We collected over 1 million xxx from benthic animals in river estuaries, clustered them into over 17,000 OTUs (operational taxonomic units) spanning 60 phyla, and built statistical models relating their distribution and abundance to environmental factors.

3) LC-MS metabolomics. We developed the software mzFun to process high-throughput, high-volume chromatof data. It covers raw data deconvolution, peak identification, compound matching against in-house and external MS/MS libraries, peak alignment across multiple samples and pathway mining in secondary metabolism.

2. Statistics and machine learning:

The group collaborates closely with experimental groups and provides data analysis solutions for OMICS data. High-dimensional, noisy metabolomics and proteomics data call for careful preprocessing and suitable multivariate statistics.

We developed, and continue to improve, COVAIN, an all-in-one statistical toolbox with a GUI (graphical user interface). It provides essential preprocessing steps and many popular methods for OMICS data, including correlation, PCA/ICA, clustering, Granger causality and regression analysis.
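As an illustration of the kind of preprocessing and multivariate analysis mentioned above, the following minimal sketch log-transforms and autoscales an intensity table and then computes PCA scores by singular value decomposition. It is not taken from COVAIN; the function name `pca_scores`, the log2(x + 1) transform and the random example table are assumptions made for this example only.

```python
import numpy as np

def pca_scores(X, n_components=2):
    """Log-transform, autoscale and project a samples x features table.

    Hypothetical helper for illustration; not part of COVAIN.
    """
    X = np.asarray(X, dtype=float)
    logged = np.log2(X + 1.0)                    # damp the heavy right tail of intensities
    centered = logged - logged.mean(axis=0)      # mean-centre each feature
    scaled = centered / logged.std(axis=0, ddof=1)  # unit variance per feature
    # SVD of the scaled table: rows of Vt are the principal axes
    U, s, Vt = np.linalg.svd(scaled, full_matrices=False)
    scores = U[:, :n_components] * s[:n_components]
    explained = (s ** 2) / np.sum(s ** 2)        # fraction of variance per component
    return scores, explained[:n_components]

# Assumed example data: 6 samples x 4 metabolites with log-normal intensities
rng = np.random.default_rng(0)
X = rng.lognormal(mean=5.0, sigma=1.0, size=(6, 4))
scores, explained = pca_scores(X)
print(scores.shape, explained)
```

Autoscaling gives every metabolite equal weight regardless of its absolute abundance. Whether that is appropriate depends on the data set, so it should be read as one possible choice, not a recommendation.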

In addition, we apply cutting-edge feature construction, feature selection and classification methods to more complex data, for example in medical and nutritional research, where diseases are predicted from molecular-level data such as metabolomics. We also pioneered the application of chaos theory to the nonlinear dynamics of gene expression time series.

3. Mathematical modelling:

Traditional mathematical modelling often struggles to collect the kinetic parameters of biochemical reactions and to validate the model against experimental data, and is therefore usually limited to small-scale systems. We instead apply an inverse approach that constructs the mathematical model directly from large-scale OMICS data.

The group developed the inverse Jacobian approach, which infers the Jacobian matrix, where reaction-level regulation is recorded, from metabolomics data, using the stoichiometric matrix available from genome annotation. The inverse Jacobian thus directly connects genotype (genomic data) with phenotype (metabolomics data).
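The idea can be sketched on a toy system. The example below assumes that the Jacobian J and the metabolite covariance matrix C are linked by a Lyapunov-type equation, J C + C Jᵀ = -2D, where D is a fluctuation matrix. This relation, the two-metabolite pathway, the known D and the function names are all assumptions made for illustration, not the group's implementation. The boolean `mask` stands in for the structural information that would come from the stoichiometric matrix: it marks which Jacobian entries may be non-zero.

```python
import numpy as np

def covariance_from_jacobian(J, D):
    """Forward problem: solve J C + C J^T = -2 D for C (assumed relation)."""
    n = J.shape[0]
    I = np.eye(n)
    # Row-major vectorisation: vec(J C) = (J kron I) vec(C), vec(C J^T) = (I kron J) vec(C)
    A = np.kron(J, I) + np.kron(I, J)
    return np.linalg.solve(A, (-2.0 * D).ravel()).reshape(n, n)

def infer_jacobian(C, D, mask):
    """Inverse problem: least-squares estimate of the allowed Jacobian entries."""
    n = C.shape[0]
    unknowns = np.argwhere(mask)           # (row, col) of each free entry, row-major order
    A = np.zeros((n * n, len(unknowns)))
    for k, (i, j) in enumerate(unknowns):
        E = np.zeros((n, n))
        E[i, j] = 1.0                      # unit contribution of entry (i, j)
        A[:, k] = (E @ C + C @ E.T).ravel()
    x, *_ = np.linalg.lstsq(A, (-2.0 * D).ravel(), rcond=None)
    J = np.zeros((n, n))
    J[mask] = x                            # boolean assignment is also row-major
    return J

# Hypothetical two-metabolite pathway: metabolite 1 feeds metabolite 2
J_true = np.array([[-1.0, 0.0],
                   [1.0, -2.0]])
D = np.eye(2)                              # assumed known here; in practice it is not
mask = J_true != 0                         # structure assumed known from stoichiometry

C = covariance_from_jacobian(J_true, D)    # stands in for a measured covariance matrix
J_est = infer_jacobian(C, D, mask)
print(np.round(J_est, 6))
```

Because C is symmetric, the equation provides only n(n+1)/2 independent constraints, so the structural mask is what makes the inverse problem solvable. In this toy case there are three constraints for three unknown entries.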

4. Network analysis:

Our network studies are twofold.

1) Relevance networks that associate experimental factors (such as treatments, environmental factors or body characteristics) with OMICS features, using correlation coefficients, Granger causality analysis, regression analysis and generalized linear models.

2) Genome-scale metabolic network reconstruction from genome sequence, functional annotation and proteomics/metabolomics data.

We develop algorithms that fill incomplete pathways and simplify the network so that it covers the “measurable” metabolites. After validation against experimental data, flux balance analysis is applied to predict biomass production under different growth conditions, which helps to identify important flux-controlling reactions.
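Flux balance analysis can be illustrated with a very small linear programme. The sketch below is not one of the group's reconstructions: the three-reaction network, the flux bounds and the function name `predict_biomass` are hypothetical. It maximises the biomass flux subject to the steady-state constraint S v = 0, using `scipy.optimize.linprog`, and models a growth condition as an upper limit on nutrient uptake.

```python
import numpy as np
from scipy.optimize import linprog

# Hypothetical toy network (for illustration only):
#   R1: uptake -> A      R2: 2 A -> B      R3: B -> biomass
S = np.array([
    [1.0, -2.0,  0.0],   # metabolite A
    [0.0,  1.0, -1.0],   # metabolite B
])

def predict_biomass(uptake_limit):
    """Maximise the biomass flux (R3) at steady state for a given uptake limit."""
    c = np.array([0.0, 0.0, -1.0])   # linprog minimises, so negate the objective
    bounds = [(0.0, uptake_limit), (0.0, 1000.0), (0.0, 1000.0)]
    result = linprog(c, A_eq=S, b_eq=np.zeros(S.shape[0]),
                     bounds=bounds, method="highs")
    if not result.success:
        raise RuntimeError(result.message)
    return result.x[2]

rich = predict_biomass(10.0)   # nutrient-rich condition
poor = predict_biomass(4.0)    # nutrient-limited condition
print(rich, poor)
```

In this toy case the uptake reaction is the only limiting step, so the predicted biomass scales directly with the uptake limit. In a genome-scale network, the same comparison across conditions is what points to candidate flux-controlling reactions.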

5. Software and database:

The group develops and maintains: COVAIN, mzGroupAnalyzer, mzFun and ProMex.

Event:

We will present a machine learning workshop at AMPRS 2018. It will focus on multivariate feature selection methods (such as genetic algorithms) and supervised classification for mining combined metabolomics and clinical datasets.