TiMEG: an integrative statistical method for partially missing multi-omics data

Abstract: Multi-omics data integration is widely used to understand the genetic architecture of disease. In multi-omics association analysis, data collected on multiple omics for the same set of individuals are immensely important for biomarker identification. But when the sample size of such data is limited, the presence of partially missing individual-level observations poses a major challenge in data integration. Often, genotype data are available for all individuals under study, but gene expression and/or methylation information is missing for different subsets of those individuals. Here, we develop a statistical model, TiMEG, for the identification of disease-associated biomarkers in a case–control paradigm by integrating the above-mentioned data types, especially in the presence of missing omics data. Based on a likelihood approach, TiMEG exploits the inter-relationship among multiple omics data to capture weaker signals that remain unidentified in single-omic analysis or common imputation-based methods. Its application to a real tuberous sclerosis dataset identified functionally relevant genes in the disease pathway.
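The missingness pattern described in the abstract (genotype observed for everyone, expression and/or methylation missing for different subsets) can be illustrated with a small Python sketch. This is not TiMEG itself: the function name `group_by_observed_omics`, the layer names, and the toy values are all hypothetical, chosen only to show how individuals fall into groups according to which omics layers were measured.

```python
from collections import defaultdict

# Hypothetical layer names; the abstract mentions genotype, gene expression and methylation.
LAYERS = ("genotype", "expression", "methylation")

def group_by_observed_omics(samples):
    """Group sample IDs by the tuple of omics layers that are observed (not None)."""
    groups = defaultdict(list)
    for sample_id, record in samples.items():
        pattern = tuple(layer for layer in LAYERS if record.get(layer) is not None)
        groups[pattern].append(sample_id)
    return dict(groups)

# Toy data (invented for illustration): genotype is present for every individual,
# while expression and methylation are missing for different subsets.
samples = {
    "s1": {"genotype": 0, "expression": 5.2, "methylation": 0.31},
    "s2": {"genotype": 1, "expression": None, "methylation": 0.44},
    "s3": {"genotype": 2, "expression": 4.8, "methylation": None},
    "s4": {"genotype": 1, "expression": None, "methylation": None},
}

groups = group_by_observed_omics(samples)
for pattern, ids in groups.items():
    print(pattern, ids)
```

A complete-case analysis would keep only the group with all three layers observed and discard the rest. A likelihood-based method could, in principle, let every group contribute through the layers it does have; the abstract does not spell out how TiMEG handles each group, so that part is left out of the sketch.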

TiMEG: an integrative approach for partially missing multi-omics data with an application to tuberous sclerosis

10.1101/2020.12.10.420638 ◽

2020 ◽

Author(s):

Sarmistha Das ◽

Indranil Mukhopadhyay

Keyword(s):

Data Integration ◽

Tuberous Sclerosis ◽

Analysis Data ◽

Integrative Approach ◽

Omics Data ◽

Data Types ◽

Individual Level ◽

Control Paradigm ◽

Likelihood Approach ◽

Omics Data Integration

Abstract: Multi-omics data integration is widely used to understand the genetic architecture of disease. In multi-omics association analysis, data collected on multiple omics for the same set of individuals are immensely important for biomarker identification. But when the sample size of such data is limited, the presence of partially missing individual-level observations poses a major challenge in data integration. Often, genotype data are available for all individuals under study, but gene expression and/or methylation information is missing for different subsets of those individuals. Here, we develop a statistical model, TiMEG, for the identification of disease-associated biomarkers in a case-control paradigm by integrating the above-mentioned data types, especially in the presence of missing omics data. Based on a likelihood approach, TiMEG exploits the inter-relationship among multiple omics data to capture weaker signals that remain unidentified in single-omics analyses. Its application to a real tuberous sclerosis dataset identified functionally relevant genes in the disease pathway.

0311 - Effect of ammonia on the dynamics of anaerobic digestion microbiome: omics data integration in a time-course context

10.26226/morressier.5b5199beb1b87b000ecee94b ◽

2018 ◽

Author(s):

Olivier Chapleur

Keyword(s):

Anaerobic Digestion ◽

Data Integration ◽

Time Course ◽

Omics Data ◽

Omics Data Integration

Multi-omics data integration reveals correlated regulatory features of triple negative breast cancer

Molecular Omics ◽

10.1039/d1mo00117e ◽

2021 ◽

Author(s):

Kevin Chappell ◽

Kanishka Manna ◽

Charity L. Washam ◽

Stefan Graw ◽

Duah Alkam ◽

...

Keyword(s):

Breast Cancer ◽

Data Integration ◽

Triple Negative Breast Cancer ◽

Triple Negative ◽

Biological Pathways ◽

Omics Data ◽

Insight Into ◽

Omics Data Integration

Multi-omics data integration of triple negative breast cancer (TNBC) provides insight into biological pathways.

MuSA: a graphical user interface for multi-OMICs data integration in radiogenomic studies

Scientific Reports ◽

10.1038/s41598-021-81200-z ◽

2021 ◽

Vol 11 (1) ◽

Author(s):

Mario Zanfardino ◽

Rossana Castaldo ◽

Katia Pane ◽

Ornella Affinito ◽

Marco Aiello ◽

...

Keyword(s):

User Interface ◽

Data Integration ◽

Graphical User Interface ◽

Data Science ◽

Heterogeneous Data ◽

Biological Information ◽

Omics Data ◽

Correlation Clustering ◽

Downstream Analysis ◽

Omics Data Integration

Abstract: Analysis of large-scale omics data along with biomedical images has gained huge interest for predicting phenotypic conditions towards personalized medicine. Multiple layers of investigation, such as genomics, transcriptomics and proteomics, have led to high dimensionality and heterogeneity of data. Multi-omics data integration can provide a meaningful contribution to early diagnosis and an accurate estimate of prognosis and treatment in cancer. Some multi-layer data structures have been developed to integrate multi-omics biological information, but none of these has been developed and evaluated to include radiomic data. We proposed to use MultiAssayExperiment (MAE) as an integrated data structure to combine multi-omics data, facilitating the exploration of heterogeneous data. We improved the usability of the MAE by developing a Multi-omics Statistical Approaches (MuSA) tool that uses a Shiny graphical user interface and simplifies the management and analysis of radiogenomic datasets. The capabilities of MuSA were shown using public breast cancer datasets from TCGA-TCIA databases. The MuSA architecture is modular and can be divided into pre-processing and downstream analysis. The pre-processing section allows data filtering and normalization. The downstream analysis section contains modules for data science such as correlation, clustering (i.e., heatmap) and feature selection methods. The results are dynamically shown in MuSA. The MuSA tool provides an easy-to-use way to create, manage and analyze radiogenomic data. The application is specifically designed to guide non-programmer researchers through different computational steps. Integration analysis is implemented in a modular structure, making MuSA an easily expansible open-source software.
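The two stages named in the abstract (pre-processing by filtering and normalization, then downstream correlation) can be sketched in a few lines of Python. This is only an illustration of the general idea, not MuSA's code: MuSA is a Shiny application built on MultiAssayExperiment, and the function names, the standard-deviation threshold, and the toy feature values below are all assumptions made for the example.

```python
from statistics import mean, pstdev

def filter_low_variance(features, min_sd):
    """Pre-processing (filtering): keep features whose population SD is at least min_sd."""
    return {name: vals for name, vals in features.items() if pstdev(vals) >= min_sd}

def zscore(vals):
    """Pre-processing (normalization): centre to mean 0 and scale to SD 1.

    Assumes the feature is not constant; run filter_low_variance first.
    """
    m, sd = mean(vals), pstdev(vals)
    return [(v - m) / sd for v in vals]

def pearson(x, y):
    """Downstream analysis: Pearson correlation as the mean product of z-scores."""
    zx, zy = zscore(x), zscore(y)
    return sum(a * b for a, b in zip(zx, zy)) / len(zx)

# Toy data (invented): one omics feature, one radiomic feature, one constant feature,
# all measured on the same four samples.
features = {
    "gene_a": [1.0, 2.0, 3.0, 4.0],
    "radiomic_b": [2.0, 4.0, 6.0, 8.0],
    "flat": [5.0, 5.0, 5.0, 5.0],
}

kept = filter_low_variance(features, min_sd=0.1)
r = pearson(kept["gene_a"], kept["radiomic_b"])
print(sorted(kept), round(r, 6))
```

Filtering before normalization matters here because a constant feature has zero standard deviation and cannot be z-scored.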
