DIABLO: an integrative approach for identifying key molecular drivers from multi-omics assays

2019 ◽  
Vol 35 (17) ◽  
pp. 3055-3062 ◽  
Author(s):  
Amrit Singh ◽  
Casey P Shannon ◽  
Benoît Gautier ◽  
Florian Rohart ◽  
Michaël Vacher ◽  
...  

Abstract Motivation In the continuously expanding omics era, novel computational and statistical strategies are needed for data integration and identification of biomarkers and molecular signatures. We present Data Integration Analysis for Biomarker discovery using Latent cOmponents (DIABLO), a multi-omics integrative method that seeks common information across different data types through the selection of a subset of molecular features, while discriminating between multiple phenotypic groups. Results Using simulations and benchmark multi-omics studies, we show that DIABLO identifies features with superior biological relevance compared with existing unsupervised integrative methods, while achieving predictive performance comparable to state-of-the-art supervised approaches. DIABLO is versatile, allowing for modular-based analyses and cross-over study designs. In two case studies, DIABLO identified both known and novel multi-omics biomarkers consisting of mRNAs, miRNAs, CpGs, proteins and metabolites. Availability and implementation DIABLO is implemented in the mixOmics R Bioconductor package with functions for parameter choice and visualization to assist in the interpretation of the integrative analyses, along with tutorials on http://mixomics.org and in our Bioconductor vignette. Supplementary information Supplementary data are available at Bioinformatics online.

2019 ◽  
Vol 35 (14) ◽  
pp. i510-i519 ◽  
Author(s):  
Soufiane Mourragui ◽  
Marco Loog ◽  
Mark A van de Wiel ◽  
Marcel J T Reinders ◽  
Lodewyk F A Wessels

Abstract Motivation Cell lines and patient-derived xenografts (PDXs) have been used extensively to understand the molecular underpinnings of cancer. While core biological processes are typically conserved, these models also show important differences compared to human tumors, hampering the translation of findings from pre-clinical models to the human setting. In particular, employing drug response predictors generated on data derived from pre-clinical models to predict patient response remains a challenging task. As very large drug response datasets have been collected for pre-clinical models, and patient drug response data are often lacking, there is an urgent need for methods that efficiently transfer drug response predictors from pre-clinical models to the human setting. Results We show that cell lines and PDXs share common characteristics and processes with human tumors. We quantify this similarity and show that a regression model cannot simply be trained on cell lines or PDXs and then applied on tumors. We developed PRECISE, a novel methodology based on domain adaptation that captures the common information shared amongst pre-clinical models and human tumors in a consensus representation. Employing this representation, we train predictors of drug response on pre-clinical data and apply these predictors to stratify human tumors. We show that the resulting domain-invariant predictors show a small reduction in predictive performance in the pre-clinical domain but, importantly, reliably recover known associations between independent biomarkers and their companion drugs on human tumors. Availability and implementation PRECISE and the scripts for running our experiments are available on our GitHub page (https://github.com/NKI-CCB/PRECISE). Supplementary information Supplementary data are available at Bioinformatics online.


2019 ◽  
Vol 9 (1) ◽  
Author(s):  
Michael Altenbuchinger ◽  
Helena U. Zacharias ◽  
Stefan Solbrig ◽  
Andreas Schäfer ◽  
Mustafa Büyüközkan ◽  
...  

Abstract Omics data facilitate the gain of novel insights into the pathophysiology of diseases and, consequently, their diagnosis, treatment, and prevention. To this end, omics data are integrated with other data types, e.g., clinical, phenotypic, and demographic parameters of categorical or continuous nature. We exemplify this data integration issue for a chronic kidney disease (CKD) study, comprising complex clinical, demographic, and one-dimensional 1H nuclear magnetic resonance metabolic variables. Routine analysis screens for associations of single metabolic features with clinical parameters while accounting for confounders typically chosen by expert knowledge. This knowledge can be incomplete or unavailable. We introduce a framework for data integration that intrinsically adjusts for confounding variables. We give its mathematical and algorithmic foundation, provide a state-of-the-art implementation, and evaluate its performance by sanity checks and predictive performance assessment on independent test data. Particularly, we show that discovered associations remain significant after variable adjustment based on expert knowledge. In contrast, we illustrate that associations discovered in routine univariate screening approaches can be biased by incorrect or incomplete expert knowledge. Our data integration approach reveals important associations between CKD comorbidities and metabolites, including novel associations of the plasma metabolite trimethylamine-N-oxide with cardiac arrhythmia and infarction in CKD stage 3 patients.


2020 ◽  
Vol 36 (11) ◽  
pp. 3393-3400 ◽  
Author(s):  
V Fortino ◽  
G Scala ◽  
D Greco

Abstract Motivation Omics technologies have the potential to facilitate the discovery of new biomarkers. However, only a few omics-derived biomarkers have been successfully translated into clinical applications to date. Feature selection is a crucial step in this process that identifies small sets of features with high predictive power. Models consisting of a limited number of features are not only more robust in analytical terms, but also ensure cost effectiveness and clinical translatability of new biomarker panels. Here we introduce GARBO, a novel multi-island adaptive genetic algorithm to simultaneously optimize accuracy and set size in omics-driven biomarker discovery problems. Results Compared to existing methods, GARBO enables the identification of biomarker sets that best optimize the trade-off between classification accuracy and number of biomarkers. We tested GARBO and six alternative selection methods on two highly relevant topics in precision medicine: cancer patient stratification and drug sensitivity prediction. We found multivariate biomarker models from different omics data types such as mRNA, miRNA, copy number variation, mutation and DNA methylation. The top performing models were evaluated by using two different strategies: the Pareto-based selection, and the weighted sum between accuracy and set size (w = 0.5). Pareto-based preferences show the ability of the proposed algorithm to search minimal subsets of relevant features that can be used to model accurate random forest-based classification systems. Moreover, GARBO systematically identified, on larger omics data types, such as gene expression and DNA methylation, biomarker panels exhibiting higher classification accuracy or employing a number of features much lower than those discovered with other methods. These results were confirmed on independent datasets. Availability and implementation github.com/Greco-Lab/GARBO.
Supplementary information Supplementary data are available at Bioinformatics online.
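The two evaluation strategies named in the GARBO abstract, Pareto-based selection and a weighted sum of accuracy and set size with w = 0.5, can be illustrated with a minimal Python sketch. This is not the GARBO implementation: the candidate panels are made-up numbers, the function names are hypothetical, and the weighted-sum form (set size normalized by the largest candidate panel) is an assumption, since the abstract does not give the exact formula.

```python
from typing import List, Tuple

# A candidate biomarker panel: (name, classification accuracy, number of features).
Candidate = Tuple[str, float, int]


def pareto_front(cands: List[Candidate]) -> List[Candidate]:
    """Keep panels that no other panel dominates.

    A panel is dominated if another panel is at least as accurate and at
    least as small, and strictly better on one of the two criteria.
    """
    front = []
    for name, acc, size in cands:
        dominated = any(
            (a >= acc and s <= size) and (a > acc or s < size)
            for _, a, s in cands
        )
        if not dominated:
            front.append((name, acc, size))
    return front


def weighted_score(acc: float, size: int, max_size: int, w: float = 0.5) -> float:
    """Assumed scalarization: w * accuracy + (1 - w) * (1 - size / max_size)."""
    return w * acc + (1 - w) * (1 - size / max_size)


def best_by_weighted_sum(cands: List[Candidate], w: float = 0.5) -> Candidate:
    """Pick the panel with the highest assumed weighted score."""
    max_size = max(s for _, _, s in cands)
    return max(cands, key=lambda c: weighted_score(c[1], c[2], max_size, w))


if __name__ == "__main__":
    # Illustrative values only, not results from the paper.
    panels = [("A", 0.90, 50), ("B", 0.88, 10), ("C", 0.85, 20), ("D", 0.92, 100)]
    print([name for name, _, _ in pareto_front(panels)])
    print(best_by_weighted_sum(panels)[0])
```

The Pareto view keeps every non-dominated trade-off between accuracy and panel size, whereas the weighted sum collapses the two objectives into one number and therefore depends on the choice of w and on how set size is normalized.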


2016 ◽  
Author(s):  
Amrit Singh ◽  
Casey P. Shannon ◽  
Benoît Gautier ◽  
Florian Rohart ◽  
Michaël Vacher ◽  
...  

Abstract Systems biology approaches, leveraging multi-omics measurements, are needed to capture the complexity of biological networks while identifying the key molecular drivers of disease mechanisms. We present DIABLO, a novel integrative method to identify multi-omics biomarker panels that can discriminate between multiple phenotypic groups. In the multi-omics analyses of simulated and real-world datasets, DIABLO resulted in superior biological enrichment compared to other integrative methods, and achieved comparable predictive performance with existing multi-step classification schemes. DIABLO is a versatile approach that will benefit a diverse range of research areas, where multiple high dimensional datasets are available for the same set of specimens. DIABLO is implemented along with tools for model selection, and validation, as well as graphical outputs to assist in the interpretation of these integrative analyses (http://mixomics.org/).


2019 ◽  
Author(s):  
Soufiane Mourragui ◽  
Marco Loog ◽  
Marcel JT Reinders ◽  
Lodewyk FA Wessels

Abstract Motivation Cell lines and patient-derived xenografts (PDXs) have been used extensively to understand the molecular underpinnings of cancer. While core biological processes are typically conserved, these models also show important differences compared to human tumors, hampering the translation of findings from pre-clinical models to the human setting. In particular, employing drug response predictors generated on data derived from pre-clinical models to predict patient response remains a challenging task. As very large drug response datasets have been collected for pre-clinical models, and patient drug response data are often lacking, there is an urgent need for methods that efficiently transfer drug response predictors from pre-clinical models to the human setting. Results We show that cell lines and PDXs share common characteristics and processes with human tumors. We quantify this similarity and show that a regression model cannot simply be trained on cell lines or PDXs and then applied on tumors. We developed PRECISE, a novel methodology based on domain adaptation that captures the common information shared amongst pre-clinical models and human tumors in a consensus representation. Employing this representation, we train predictors of drug response on pre-clinical data and apply these predictors to stratify human tumors. We show that the resulting domain-invariant predictors show a small reduction in predictive performance in the pre-clinical domain but, importantly, reliably recover known associations between independent biomarkers and their companion drugs on human tumors. Availability PRECISE and the scripts for running our experiments are available on our GitHub page (https://github.com/NKI-CCB/PRECISE). Supplementary information Supplementary data are available online.


Author(s):  
So Yeon Kim ◽  
Eun Kyung Choe ◽  
Manu Shivakumar ◽  
Dokyoon Kim ◽  
Kyung-Ah Sohn

Abstract Motivation To better understand the molecular features of cancers, a comprehensive analysis using multi-omics data has been conducted. In addition, a pathway activity inference method has been developed to facilitate the integrative effects of multiple genes. In this respect, we have recently proposed a novel integrative pathway activity inference approach, iDRW, and demonstrated the effectiveness of the method with respect to dichotomizing two survival groups. However, there were several limitations, such as a lack of generality. In this study, we designed a directed gene–gene graph using pathway information by assigning interactions between genes in multiple layers of networks. Results As a proof-of-concept study, the approach was evaluated using three genomic profiles of urologic cancer patients. The proposed integrative approach achieved improved outcome prediction performance compared with a single genomic profile alone and other existing pathway activity inference methods. The integrative approach also identified common/cancer-specific candidate driver pathways as predictive prognostic features in urologic cancers. Furthermore, it provides better biological insights into the prioritized pathways and genes in an integrated view using a multi-layered gene–gene network. Our framework is not specifically designed for urologic cancers and is generally applicable to various datasets. Availability and implementation iDRW is implemented as an R software package. The source code is available at https://github.com/sykim122/iDRW. Supplementary information Supplementary data are available at Bioinformatics online.


2020 ◽  
Author(s):  
Sarmistha Das ◽  
Indranil Mukhopadhyay

Abstract Multi-omics data integration is widely used to understand the genetic architecture of disease. In multi-omics association analysis, data collected on multiple omics for the same set of individuals are immensely important for biomarker identification. But when the sample size of such data is limited, the presence of partially missing individual-level observations poses a major challenge in data integration. More often, genotype data are available for all individuals under study but gene expression and/or methylation information is missing for different subsets of those individuals. Here, we develop a statistical model, TiMEG, for the identification of disease-associated biomarkers in a case-control paradigm by integrating the above-mentioned data types, especially in the presence of missing omics data. Based on a likelihood approach, TiMEG exploits the inter-relationship among multiple omics data to capture weaker signals that remain unidentified in single-omics analyses. Its application to a real tuberous sclerosis dataset identified functionally relevant genes in the disease pathway.


Author(s):  
Antoine Bodein ◽  
Marie-Pier Scott-Boyer ◽  
Olivier Perin ◽  
Kim-Anh Lê Cao ◽  
Arnaud Droit

Abstract Motivation Multi-omics data integration enables the global analysis of biological systems and discovery of new biological insights. Multi-omics experimental designs have been further extended with a longitudinal dimension to study dynamic relationships between molecules. However, methods that integrate longitudinal multi-omics data are still in their infancy. Results We introduce the R package timeOmics, a generic analytical framework for the integration of longitudinal multi-omics data. The framework includes pre-processing, modeling and clustering to identify molecular features strongly associated with time. We illustrate this framework in a case study to detect seasonal patterns of mRNA, metabolites, gut taxa and clinical variables in patients with diabetes mellitus from the integrative Human Microbiome Project. Availability and implementation timeOmics is available on Bioconductor and github.com/abodein/timeOmics. Supplementary information Supplementary data are available at Bioinformatics online.


Epigenomics ◽  
2021 ◽  
Author(s):  
Amy L Non

Aim: Social scientists have placed particularly high expectations on the study of epigenomics to explain how exposure to adverse social factors like poverty, child maltreatment and racism – particularly early in childhood – might contribute to complex diseases. However, progress has stalled, reflecting many of the same challenges faced in genomics, including overhype, lack of diversity in samples, limited replication and difficulty interpreting significance of findings. Materials & methods: This review focuses on the future of social epigenomics by discussing progress made, ongoing methodological and analytical challenges and suggestions for improvement. Results & conclusion: Recommendations include more diverse sample types, cross-cultural, longitudinal and multi-generational studies. True integration of social and epigenomic data will require increased access to both data types in publicly available databases, enhanced data integration frameworks, and more collaborative efforts between social scientists and geneticists.

