Evaluation and comparison of multi-omics data integration methods for cancer subtyping

2021 ◽  
Vol 17 (8) ◽  
pp. e1009224
Author(s):  
Ran Duan ◽  
Lin Gao ◽  
Yong Gao ◽  
Yuxuan Hu ◽  
Han Xu ◽  
...  

Computational integrative analysis has become a significant approach in the data-driven exploration of biological problems. Many integration methods for cancer subtyping have been proposed, but evaluating these methods has become a complicated problem due to the lack of gold standards. Moreover, questions of practical importance remain to be addressed regarding the impact of selecting appropriate data types and combinations on the performance of integrative studies. Here, we constructed three classes of benchmarking datasets of nine cancers in TCGA by considering all eleven combinations of four multi-omics data types. Using these datasets, we conducted a comprehensive evaluation of ten representative integration methods for cancer subtyping in terms of accuracy (measured by combining clustering accuracy and clinical significance), robustness, and computational efficiency. We subsequently investigated the influence of different omics data on cancer subtyping and the effectiveness of their combinations. Refuting the widely held intuition that incorporating more types of omics data always produces better results, our analyses showed that there are situations where integrating more omics data negatively impacts the performance of integration methods. Our analyses also suggested several effective combinations for most cancers in our study, which may be of particular interest to researchers in omics data analysis.
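The figure of eleven combinations is consistent with taking every subset of at least two of the four data types (6 pairs, 4 triples, 1 quadruple). That reading is an assumption, since the abstract does not spell out how the combinations were formed, and the data-type labels below are placeholders because the abstract does not name the four types. A minimal sketch of the count:

```python
from itertools import combinations

# Placeholder labels: the abstract does not name the four data types.
DATA_TYPES = ["omics_A", "omics_B", "omics_C", "omics_D"]


def multi_omics_combinations(data_types):
    """Return every subset that contains at least two data types."""
    combos = []
    for size in range(2, len(data_types) + 1):
        combos.extend(combinations(data_types, size))
    return combos


combos = multi_omics_combinations(DATA_TYPES)

# Tally how many combinations there are of each size.
by_size = {}
for combo in combos:
    by_size[len(combo)] = by_size.get(len(combo), 0) + 1

print(len(combos))  # 11
print(by_size)      # {2: 6, 3: 4, 4: 1}
```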

Metabolites ◽  
2021 ◽  
Vol 11 (3) ◽  
pp. 184
Author(s):  
Takoua Jendoubi

Metabolomics deals with multiple and complex chemical reactions within living organisms and how these are influenced by external or internal perturbations. It lies at the heart of omics profiling technologies not only as the underlying biochemical layer that reflects information expressed by the genome, the transcriptome and the proteome, but also as the closest layer to the phenome. The combination of metabolomics data with the information available from genomics, transcriptomics, and proteomics offers unprecedented possibilities to enhance current understanding of biological functions, elucidate their underlying mechanisms and uncover hidden associations between omics variables. As a result, a vast array of computational tools has been developed to assist with integrative analysis of metabolomics data with other omics data. Here, we review and propose five criteria (hypothesis, data types, strategies, study design and study focus) to classify statistical multi-omics data integration approaches into state-of-the-art classes under which all existing statistical methods fall. The purpose of this review is to examine the various aspects that guide the choice of a statistical integrative analysis pipeline in terms of the different classes. We draw particular attention to metabolomics and genomics data to assist those new to this field in choosing an integrative analysis pipeline.


2020 ◽  
Author(s):  
Sarmistha Das ◽  
Indranil Mukhopadhyay

Multi-omics data integration is widely used to understand the genetic architecture of disease. In multi-omics association analysis, data collected on multiple omics for the same set of individuals are immensely important for biomarker identification. But when the sample size of such data is limited, the presence of partially missing individual-level observations poses a major challenge in data integration. Often, genotype data are available for all individuals under study, but gene expression and/or methylation information is missing for different subsets of those individuals. Here, we develop a statistical model, TiMEG, for the identification of disease-associated biomarkers in a case-control paradigm by integrating the above-mentioned data types, especially in the presence of missing omics data. Based on a likelihood approach, TiMEG exploits the inter-relationship among multiple omics data to capture weaker signals that remain unidentified in single-omics analyses. Its application to a real tuberous sclerosis dataset identified functionally relevant genes in the disease pathway.


2021 ◽  
Vol 11 (1) ◽  
Author(s):  
Sarmistha Das ◽  
Indranil Mukhopadhyay

Multi-omics data integration is widely used to understand the genetic architecture of disease. In multi-omics association analysis, data collected on multiple omics for the same set of individuals are immensely important for biomarker identification. But when the sample size of such data is limited, the presence of partially missing individual-level observations poses a major challenge in data integration. Often, genotype data are available for all individuals under study, but gene expression and/or methylation information is missing for different subsets of those individuals. Here, we develop a statistical model, TiMEG, for the identification of disease-associated biomarkers in a case–control paradigm by integrating the above-mentioned data types, especially in the presence of missing omics data. Based on a likelihood approach, TiMEG exploits the inter-relationship among multiple omics data to capture weaker signals that remain unidentified in single-omic analysis or common imputation-based methods. Its application to a real tuberous sclerosis dataset identified functionally relevant genes in the disease pathway.
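The missingness pattern described here (genotype for everyone, expression and methylation missing for different subsets) can be made concrete with a toy example. The sketch below is not the TiMEG model and uses made-up individual IDs; it only illustrates, under those assumptions, how many samples a naive complete-case analysis would discard in a small study.

```python
# Hypothetical individual IDs; none of these values come from the paper.
genotyped = set(range(1, 11))      # genotype available for all 10 individuals
expression_missing = {2, 5, 7}     # individuals lacking gene expression data
methylation_missing = {5, 8, 9}    # individuals lacking methylation data

has_expression = genotyped - expression_missing
has_methylation = genotyped - methylation_missing

# A complete-case analysis keeps only individuals observed on every layer.
complete_cases = has_expression & has_methylation


def usable_fraction(subset, everyone):
    """Share of the cohort that remains if only `subset` is analysed."""
    return len(subset) / len(everyone)


print(sorted(complete_cases))                       # [1, 3, 4, 6, 10]
print(usable_fraction(complete_cases, genotyped))   # 0.5
```

In this toy cohort, half of the genotyped individuals would be dropped even though each omics layer on its own is 70% observed, which is the kind of loss that motivates modelling partially observed individuals rather than discarding them.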


2010 ◽  
Vol 2010 ◽  
pp. 1-19 ◽  
Author(s):  
Chuming Chen ◽  
Peter B. McGarvey ◽  
Hongzhan Huang ◽  
Cathy H. Wu

High-throughput “omics” technologies bring new opportunities for biological and biomedical researchers to ask complex questions and gain new scientific insights. However, the voluminous, complex, and context-dependent data maintained in heterogeneous and distributed environments, together with the lack of well-defined data standards and standardized nomenclature, pose a major challenge. Meeting it requires advanced computational methods and bioinformatics infrastructures for integration, mining, visualization, and comparative analysis to facilitate data-driven hypothesis generation and biological knowledge discovery. In this paper, we present the challenges in high-throughput “omics” data integration and analysis, introduce a protein-centric approach for systems integration of large and heterogeneous high-throughput “omics” data including microarray, mass spectrometry, protein sequence, protein structure, and protein interaction data, and use a scientific case study to illustrate how one can use varied “omics” data from different laboratories to make useful connections that could lead to new biological knowledge.


The theory of the vibrations of the pianoforte string put forward by Kaufmann in a well-known paper has figured prominently in recent discussions on the acoustics of this instrument. It proceeds on lines radically different from those adopted by Helmholtz in his classical treatment of the subject. While recognising that the elasticity of the pianoforte hammer is not a negligible factor, Kaufmann set out to simplify the mathematical analysis by ignoring its effect altogether, and treating the hammer as a particle possessing only inertia without spring. The motion of the string following the impact of the hammer is found from the initial conditions and from the functional solutions of the equation of wave-propagation on the string. On this basis he gave a rigorous treatment of two cases: (1) a particle impinging on a stretched string of infinite length, and (2) a particle impinging on the centre of a finite string, neither of which cases is of much interest from an acoustical point of view. The case of practical importance treated by him is that in which a particle impinges on the string near one end. For this case, he gave only an approximate theory from which the duration of contact, the motion of the point struck, and the form of the vibration-curves for various points of the string could be found. There can be no doubt of the importance of Kaufmann’s work, and it naturally becomes necessary to extend and revise his theory in various directions. In several respects, the theory awaits fuller development, especially as regards the harmonic analysis of the modes of vibration set up by impact, and the detailed discussion of the influence of the elasticity of the hammer and of varying velocities of impact. 
Apart from these points, the question arises whether the approximate method used by Kaufmann is sufficiently accurate for practical purposes, and whether it may be regarded as applicable when, as in the pianoforte, the point struck is distant one-eighth or one-ninth of the length of the string from one end. Kaufmann’s treatment is practically based on the assumption that the part of the string between the end and the point struck remains straight as long as the hammer and string remain in contact. Primâ facie, it is clear that this assumption would introduce error when the part of the string under reference is an appreciable fraction of the whole. For the effect of the impact would obviously be to excite the vibrations of this portion of the string, which continue so long as the hammer is in contact, and would also influence the mode of vibration of the string as a whole when the hammer loses contact. A mathematical theory which is not subject to this error, and which is applicable for any position of the striking point, thus seems called for.
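For orientation, case (1) above, a particle striking a stretched string of infinite length, can be worked through with the standard travelling-wave solution. The following is a textbook-style sketch, not a reproduction of Kaufmann's own derivation, and the notation is assumed here rather than taken from the text: string tension T, mass per unit length ρ, particle mass m, impact velocity v₀, and y₀(t) for the displacement of the struck point at x = 0.

```latex
% Assumed notation (not from the source): T tension, \rho linear density,
% m particle mass, v_0 impact velocity, y_0(t) = y(0,t) the struck point.
\begin{align}
  \frac{\partial^2 y}{\partial t^2} &= c^2\,\frac{\partial^2 y}{\partial x^2},
  \qquad c = \sqrt{T/\rho}, \\
  y(x,t) &= f\!\left(t - \frac{x}{c}\right) \ (x > 0), \qquad
  y(x,t) = g\!\left(t + \frac{x}{c}\right) \ (x < 0), \\
  m\,\ddot{y}_0 &= T\left[\frac{\partial y}{\partial x}\right]_{x=0^-}^{x=0^+}
  = -\frac{2T}{c}\,\dot{y}_0, \\
  \dot{y}_0(t) &= v_0\,e^{-2Tt/(mc)}, \qquad
  y_0(t) = \frac{m c\,v_0}{2T}\left(1 - e^{-2Tt/(mc)}\right).
\end{align}
```

Because only outgoing waves exist on an infinite string, the slope on each side of the particle is proportional to its velocity, so the string acts on the particle like a viscous resistance. On a finite string the reflections from the ends return to the struck point and alter this simple decay.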

