Kernel Conditional Embeddings for Associating Omic Data Types

Author(s):  
Ferran Reverter ◽  
Esteban Vegas ◽  
Josep M. Oller
2020 ◽  
Author(s):  
Camden Jansen ◽  
Kitt D. Paraiso ◽  
Jeff J. Zhou ◽  
Ira L. Blitz ◽  
Margaret B. Fish ◽  
...  

Summary: Mesendodermal specification is one of the earliest events in embryogenesis, where cells first acquire distinct identities. Cell differentiation is a highly regulated process that involves the function of numerous transcription factors (TFs) and signaling molecules, which can be described with gene regulatory networks (GRNs). Cell differentiation GRNs are difficult to build because existing mechanistic methods are low-throughput, and high-throughput methods tend to be non-mechanistic. Additionally, integrating highly dimensional data composed of more than two data types is challenging. Here, we use linked self-organizing maps to combine ChIP-seq/ATAC-seq with temporal, spatial and perturbation RNA-seq data from Xenopus tropicalis mesendoderm development to build a high-resolution, genome-scale mechanistic GRN. We recovered both known and previously unsuspected TF-DNA/TF-TF interactions and validated them through reporter assays. Our analysis provides new insights into transcriptional regulation of early cell fate decisions and provides a general approach to building GRNs using highly-dimensional multi-omic data sets.
Highlights: Built a generally applicable pipeline for creating GRNs using highly-dimensional multi-omic data sets; Predicted new TF-DNA/TF-TF interactions during mesendoderm development; Generated the first genome-scale GRN for vertebrate mesendoderm and expanded the core mesendodermal developmental network with high fidelity; Developed a resource to visualize hundreds of RNA-seq and ChIP-seq data sets using 2D SOM metaclusters.


2021 ◽  
Vol 12 (1) ◽  
Author(s):  
Shobana V. Stassen ◽  
Gwinky G. K. Yip ◽  
Kenneth K. Y. Wong ◽  
Joshua W. K. Ho ◽  
Kevin K. Tsia

Abstract: Inferring cellular trajectories using a variety of omic data is a critical task in single-cell data science. However, accurate prediction of cell fates, and thereby biologically meaningful discovery, is challenged by the sheer size of single-cell data, the diversity of omic data types, and the complexity of their topologies. We present VIA, a scalable trajectory inference algorithm that overcomes these limitations by using lazy-teleporting random walks to accurately reconstruct complex cellular trajectories beyond tree-like pathways (e.g., cyclic or disconnected structures). We show that VIA robustly and efficiently unravels the fine-grained sub-trajectories in a 1.3-million-cell transcriptomic mouse atlas without losing the global connectivity at such a high cell count. We further apply VIA to discovering elusive lineages and less populous cell fates missed by other methods across a variety of data types, including single-cell proteomic, epigenomic, multi-omics datasets, and a new in-house single-cell morphological dataset.


2018 ◽  
Author(s):  
Nimrod Rappoport ◽  
Ron Shamir

Abstract: High-throughput experimental methods developed in recent years have been used to collect large biomedical omics datasets. Clustering of such datasets has proven invaluable for biological and medical research, and has helped reveal structure in data from several domains. Such analysis is often based on investigation of a single omic. The decreasing cost and development of additional high-throughput methods now enable measurement of multi-omic data. Clustering multi-omic data has the potential to reveal further systems-level insights, but raises computational and biological challenges. Here we review algorithms for multi-omics clustering, and discuss key issues in applying these algorithms. Our review covers methods developed specifically for multi-omic data as well as generic multi-view methods developed in the machine learning community for joint clustering of multiple data types. In addition, using cancer data from TCGA, we perform an extensive benchmark spanning ten different cancer types, providing the first systematic benchmark comparison of leading multi-omics and multi-view clustering algorithms. The results highlight several key questions regarding the use of single- vs. multi-omics, the choice of clustering strategy, the power of generic multi-view methods and the use of approximated p-values for gauging solution quality. Due to the rapidly increasing use of multi-omics data, these issues may be important for future progress in the field.


2020 ◽  
Vol 38 (15_suppl) ◽  
pp. e16063-e16063
Author(s):  
Silvia von der Heyde ◽  
Margarita Krawczyk ◽  
Julia Bischof ◽  
Thomas Corwin ◽  
Peter Frommolt ◽  
...  

Background: Cancer is a highly heterogeneous disease, both intra- and inter-individually, consisting of complex phenotypes and systems biology. Although genomic data has contributed greatly towards the identification of cancer-specific mutations and the progress of precision medicine, genomic alterations are only one of several important biological drivers of cancer. Furthermore, single-layer omics represent only a small piece of the cancer biology puzzle and provide only partial clues to connecting genotype with clinically relevant phenotypic data. A more integrated approach is urgently needed to unravel the underpinnings of molecular signatures and the phenotypic manifestation of cancer hallmarks. Methods: Here we characterize a colorectal cancer (CRC) cohort of 500 patients across multiple distinct omic data types. Across this CRC cohort, we defined clinically relevant metrics based on whole-genome sequencing, such as microsatellite instability (MSI) status, and investigated gene expression at the transcript level using RNA-Seq, as well as at the proteomic level using tandem mass spectrometry. We further characterized a subgroup of 100 of these patients through 16S rRNA sequencing to identify associated microbiome profiles. Results: We combined these analyses with comprehensive clinical data to observe the impact of ascertained molecular signatures on the CRC patient cohort. Here, we report how patient survival correlates both with specific molecular events across individual omic data types and with combined multi-omic analyses. Conclusions: This project highlights the utility of integrating multiple distinct data types to obtain a more comprehensive overview of the molecular mechanisms underpinning colorectal cancer. Furthermore, through combining identified aberrant molecular mechanisms with clinical reports, multi-omic data can be prioritized through their impact on patient cohort survival.


Metabolites ◽  
2019 ◽  
Vol 9 (6) ◽  
pp. 117 ◽  
Author(s):  
Su Chu ◽  
Mengna Huang ◽  
Rachel Kelly ◽  
Elisa Benedetti ◽  
Jalal Siddiqui ◽  
...  

It is not controversial that study design considerations and challenges must be addressed when investigating the linkage between single omic measurements and human phenotypes. It follows that such considerations are just as critical, if not more so, in the context of multi-omic studies. In this review, we discuss (1) epidemiologic principles of study design, including selection of biospecimen source(s) and the implications of the timing of sample collection, in the context of a multi-omic investigation, and (2) the strengths and limitations of various techniques of data integration across multi-omic data types that may arise in population-based studies utilizing metabolomic data.


2018 ◽  
Author(s):  
Andrew E Teschendorff ◽  
Jing Han ◽  
Dirk S Paul ◽  
Joni Virta ◽  
Klaus Nordhausen

Abstract: There is an increased need for integrative analyses of multi-omic data. Although several algorithms for analysing multi-omic data exist, no study has yet performed a detailed comparison of these methods in biologically relevant contexts. Here we benchmark a novel tensorial independent component analysis (tICA) algorithm against current state-of-the-art methods. Using simulated and real multi-omic data, we find that tICA outperforms established methods in identifying biological sources of data variation at a significantly reduced computational cost. Using two independent multi cell-type EWAS, we further demonstrate how tICA can identify, in the absence of genotype information, mQTLs at a higher sensitivity than competing multi-way algorithms. We validate mQTLs found with tICA in an independent set, and demonstrate that approximately 75% of mQTLs are independent of blood cell subtype. In an application to multi-omic cancer data, tICA identifies many gene modules whose expression variation across tumors is driven by copy number or DNA methylation changes, but whose deregulation relative to the normal state is independent of such alterations, an important finding that we confirm by direct analysis of individual data types. In summary, tICA is a powerful novel algorithm for decomposing multi-omic data, which will be of great value to the research community.


2020 ◽  
Vol 29 (10) ◽  
pp. 2851-2864
Author(s):  
Manuel Ugidos ◽  
Sonia Tarazona ◽  
José M Prats-Montalbán ◽  
Alberto Ferrer ◽  
Ana Conesa

The diversity of omic technologies has expanded in recent years, together with the number of omic data integration strategies. However, multiomic data generation is costly, and many research groups cannot afford research projects where many different omic data types are generated, at least at the same time. As most researchers share their data in public repositories, different omic datasets of the same biological system obtained at different labs can be combined to construct a multiomic study. However, data obtained at different labs or moments in time are typically subject to batch effects that need to be removed for successful data integration. While there are methods to correct batch effects on the same data types obtained in different studies, they cannot be applied to correct lab or batch effects across omics. This impairs multiomic meta-analysis. Fortunately, in many cases, at least one omics platform (i.e., gene expression) is repeatedly measured across labs, together with the additional omic modalities that are specific to each study. This creates an opportunity for batch analysis. We have developed MultiBaC (Multiomics Batch-effect Correction), a strategy to correct batch effects from multiomic datasets distributed across different labs or data acquisition events. Our strategy is based on the existence of at least one shared data type, which allows data prediction across omics. We validate this approach both on simulated data and on a case where the multiomic design is fully shared by two labs, so that batch effect correction within the same omic modality using traditional methods can be compared with the MultiBaC correction across data types. Finally, we apply MultiBaC to a true multiomic data integration problem to show that we are able to improve the detection of meaningful biological effects.


2018 ◽  
Author(s):  
Prathiba Natesan ◽  
Smita Mehta

Single case experimental designs (SCEDs) have become an indispensable methodology where randomized controlled trials may be impossible or even inappropriate. However, the nature of SCED data presents challenges for both visual and statistical analyses. Small sample sizes, autocorrelations, data types, and design types render many parametric statistical analyses and maximum likelihood approaches ineffective. The presence of autocorrelation decreases interrater reliability in visual analysis. The purpose of the present study is to demonstrate a newly developed model called the Bayesian unknown change-point (BUCP) model, which overcomes all the above-mentioned data analytic challenges. This is the first study to formulate and demonstrate the rate ratio effect size for autocorrelated data, which has remained an open question in SCED research until now. This expository study also compares and contrasts the results from the BUCP model with visual analysis, and the rate ratio effect size with the nonoverlap of all pairs (NAP) effect size. Data from a comprehensive behavioral intervention are used for the demonstration.

