Aristotle: stratified causal discovery for omics data

2022 ◽  
Vol 23 (1) ◽  
Author(s):  
Mehrdad Mansouri ◽  
Sahand Khakabimamaghani ◽  
Leonid Chindelevitch ◽  
Martin Ester

Abstract Background: There has been a simultaneous increase in demand and accessibility across genomics, transcriptomics, proteomics and metabolomics data, known as omics data. This has encouraged widespread application of omics data in life sciences, from personalized medicine to the discovery of underlying pathophysiology of diseases. Causal analysis of omics data may provide important insight into the underlying biological mechanisms. Existing causal analysis methods yield promising results when identifying potential general causes of an observed outcome based on omics data. However, they may fail to discover the causes specific to a particular stratum of individuals and missing from others. Methods: To fill this gap, we introduce the problem of stratified causal discovery and propose a method, Aristotle, for solving it. Aristotle addresses the two challenges intrinsic to omics data: high dimensionality and hidden stratification. It employs existing biological knowledge and a state-of-the-art patient stratification method to tackle the above challenges and applies a quasi-experimental design method to each stratum to find stratum-specific potential causes. Results: Evaluation based on synthetic data shows better performance for Aristotle in discovering true causes under different conditions compared to existing causal discovery methods. Experiments on a real dataset on Anthracycline Cardiotoxicity indicate that Aristotle’s predictions are consistent with the existing literature. Moreover, Aristotle makes additional predictions that suggest further investigations.

Author(s):  
Peter D Karp ◽  
Peter E Midford ◽  
Richard Billington ◽  
Anamika Kothari ◽  
Markus Krummenacker ◽  
...  

Abstract Motivation: Biological systems function through dynamic interactions among genes and their products, regulatory circuits and metabolic networks. Our development of the Pathway Tools software was motivated by the need to construct biological knowledge resources that combine these many types of data, and that enable users to find and comprehend data of interest as quickly as possible through query and visualization tools. Further, we sought to support the development of metabolic flux models from pathway databases, and to use pathway information to leverage the interpretation of high-throughput data sets. Results: In the past 4 years we have enhanced the already extensive Pathway Tools software in several respects. It can now support metabolic-model execution through the Web; it provides a more accurate gap filler for metabolic models; it supports development of models for organism communities distributed across a spatial grid; and model results may be visualized graphically. Pathway Tools supports several new omics-data analysis tools including the Omics Dashboard, multi-pathway diagrams called pathway collages, a pathway-covering algorithm for metabolomics data analysis and an algorithm for generating mechanistic explanations of multi-omics data. We have also improved the core pathway/genome databases management capabilities of the software, providing new multi-organism search tools for organism communities, improved graphics rendering, faster performance and re-designed gene and metabolite pages. Availability: The software is free for academic use; a fee is required for commercial use. See http://pathwaytools.com. Contact: [email protected] Supplementary information: Supplementary data are available at Briefings in Bioinformatics online.


Metabolites ◽  
2021 ◽  
Vol 11 (6) ◽  
pp. 407
Author(s):  
Marie Tremblay-Franco ◽  
Cécile Canlet ◽  
Philippe Pinton ◽  
Yannick Lippi ◽  
Roselyne Gautier ◽  
...  

The effects of low doses of toxicants are often subtle and information extracted from metabolomic data alone may not always be sufficient. As end products of enzymatic reactions, metabolites represent the final phenotypic expression of an organism and can also reflect gene expression changes caused by this exposure. Therefore, the integration of metabolomic and transcriptomic data could improve the biological knowledge extracted on toxicant-induced disruptions. In the present study, we applied statistical integration tools to metabolomic and transcriptomic data obtained from jejunal explants of pigs exposed to the food contaminant deoxynivalenol (DON). Canonical correlation analysis (CCA) and self-organizing map (SOM) were compared for the identification of correlated transcriptomic and metabolomic features, and O2-PLS was used to model the relationship between exposure and selected features. The integration of both omics datasets increased the number of discriminant metabolites discovered (39) by about 10 times compared to the analysis of the metabolomic dataset alone (3). Besides the disturbance of energy metabolism previously reported, assessing correlations between both functional levels revealed several other types of damage linked to the intestinal exposure to DON, including the alteration of protein synthesis, oxidative stress, and inflammasome activation. This confirms the added value of integration to enrich the biological knowledge extracted from metabolomics.


2021 ◽  
Vol 22 (6) ◽  
pp. 2822
Author(s):  
Efstathios Iason Vlachavas ◽  
Jonas Bohn ◽  
Frank Ückert ◽  
Sylvia Nürnberg

Recent advances in sequencing and biotechnological methodologies have led to the generation of large volumes of molecular data of different omics layers, such as genomics, transcriptomics, proteomics and metabolomics. Integration of these data with clinical information provides new opportunities to discover how perturbations in biological processes lead to disease. Using data-driven approaches for the integration and interpretation of multi-omics data could stably identify links between structural and functional information and propose causal molecular networks with potential impact on cancer pathophysiology. This knowledge can then be used to improve disease diagnosis, prognosis, prevention, and therapy. This review will summarize and categorize the most current computational methodologies and tools for integration of distinct molecular layers in the context of translational cancer research and personalized therapy. Additionally, the bioinformatics tools Multi-Omics Factor Analysis (MOFA) and netDX will be tested using omics data from public cancer resources, to assess their overall robustness, provide reproducible workflows for gaining biological knowledge from multi-omics data, and to comprehensively understand the significantly perturbed biological entities in distinct cancer types. We show that the performed supervised and unsupervised analyses result in meaningful and novel findings.


2017 ◽  
Vol 2017 ◽  
pp. 1-10 ◽  
Author(s):  
Kalpana Raja ◽  
Matthew Patrick ◽  
Yilin Gao ◽  
Desmond Madu ◽  
Yuyang Yang ◽  
...  

In the past decade, the volume of “omics” data generated by the different high-throughput technologies has expanded exponentially. Managing, storing, and analyzing this big data has been a great challenge for researchers, especially when moving towards the goal of generating testable data-driven hypotheses, which has been the promise of high-throughput experimental techniques. Different bioinformatics approaches have been developed to streamline the downstream analyses by providing independent information to interpret the results and support biological inference. Text mining (also known as literature mining) is one of the commonly used approaches for automated generation of biological knowledge from the huge number of published articles. In this review paper, we discuss the recent advancement in approaches that integrate results from omics data and information generated from text mining approaches to uncover novel biomedical information.


2021 ◽  
Author(s):  
Félix Raimundo ◽  
Laetitia Papaxanthos ◽  
Céline Vallot ◽  
Jean-Philippe Vert

Abstract Single-cell omics technologies produce large quantities of data describing the genomic, transcriptomic or epigenomic profiles of many individual cells in parallel. In order to infer biological knowledge and develop predictive models from these data, machine learning (ML)-based models are increasingly used due to their flexibility, scalability, and impressive success in other fields. In recent years, we have seen a surge of new ML-based method development for low-dimensional representations of single-cell omics data, batch normalization, cell type classification, trajectory inference, gene regulatory network inference or multimodal data integration. To help readers navigate this fast-moving literature, we survey in this review recent advances in ML approaches developed to analyze single-cell omics data, focusing mainly on peer-reviewed publications published in the last two years (2019-2020).


BMJ Open ◽  
2021 ◽  
Vol 11 (12) ◽  
pp. e053674
Author(s):  
Enrico Glaab ◽  
Armin Rauschenberger ◽  
Rita Banzi ◽  
Chiara Gerardi ◽  
Paula Garcia ◽  
...  

Objective: To review biomarker discovery studies using omics data for patient stratification which led to clinically validated FDA-cleared tests or laboratory developed tests, in order to identify common characteristics and derive recommendations for future biomarker projects. Design: Scoping review. Methods: We searched PubMed, EMBASE and Web of Science to obtain a comprehensive list of articles from the biomedical literature published between January 2000 and July 2021, describing clinically validated biomarker signatures for patient stratification, derived using statistical learning approaches. All documents were screened to retain only peer-reviewed research articles, review articles or opinion articles, covering supervised and unsupervised machine learning applications for omics-based patient stratification. Two reviewers independently confirmed the eligibility. Disagreements were solved by consensus. We focused the final analysis on omics-based biomarkers which achieved the highest level of validation, that is, clinical approval of the developed molecular signature as a laboratory developed test or FDA approved tests. Results: Overall, 352 articles fulfilled the eligibility criteria. The analysis of validated biomarker signatures identified multiple common methodological and practical features that may explain the successful test development and guide future biomarker projects. These include study design choices to ensure sufficient statistical power for model building and external testing, suitable combinations of non-targeted and targeted measurement technologies, the integration of prior biological knowledge, strict filtering and inclusion/exclusion criteria, and the adequacy of statistical and machine learning methods for discovery and validation. Conclusions: While most clinically validated biomarker models derived from omics data have been developed for personalised oncology, first applications for non-cancer diseases show the potential of multivariate omics biomarker design for other complex disorders. Distinctive characteristics of prior success stories, such as early filtering and robust discovery approaches, continuous improvements in assay design and experimental measurement technology, and rigorous multicohort validation approaches, enable the derivation of specific recommendations for future studies.


2020 ◽  
Author(s):  
Valentina S. Klaus ◽  
Sonja C. Schriever ◽  
Andreas Peter ◽  
José Manuel Monroy Kuhn ◽  
Martin Irmler ◽  
...  

Abstract The steadily increasing amount of newly generated omics data of various types, from genomics to metabolomics, is both an opportunity and a challenge for systems biology. To fully use its potential, one key is the meaningful integration of different types of omics data. We here present a fully unsupervised and versatile correlation-based method, termed Correlation guided Network Integration (CoNI), to integrate multi-omics data into a hypergraph structure that allows for identification of effective regulators. Our approach further unravels single transcripts mapped to specific densely connected metabolic sub-graphs or pathways. By applying our method to transcriptomics and metabolomics data from murine livers under standard chow or high-fat diet, we isolated eleven genes with a regulatory effect on hepatic metabolism. Subsequent in vitro and ex vivo experiments in human liver cells and in liver biopsies obtained from humans validated seven candidates, including INHBE and COBLL1, as altering lipid metabolism and correlating with diabetes-related traits such as overweight, hepatic fat content and insulin resistance (HOMA-IR). Last, we successfully applied our method to an independent dataset to confirm its versatile and transferable character.


2019 ◽  
pp. 1-9 ◽  
Author(s):  
Yize Zhao ◽  
Changgee Chang ◽  
Qi Long

High-dimensional -omics data such as genomic, transcriptomic, and metabolomic data offer great promise in advancing precision medicine. In particular, such data have enabled the investigation of complex diseases such as cancer at an unprecedented scale and in multiple dimensions. However, a number of analytical challenges complicate analysis of high-dimensional -omics data. One is the growing recognition that complex diseases such as cancer are multifactorial and may be attributed to harmful changes on multiple -omics levels and on the pathway level. When individual genes in an important pathway have relatively weak signals, it can be challenging to detect them on their own, but the aggregated signal in the pathway can be considerably stronger and hence easier to detect with the same sample size. To address these challenges, there is a growing body of literature on knowledge-guided statistical learning methods for analysis of high-dimensional -omics data that can incorporate biological knowledge such as functional genomics and functional proteomics. These methods have been shown to improve prediction and classification accuracy and yield biologically more interpretable results compared with statistical learning methods that do not use biological knowledge. In this review, we survey current knowledge-guided statistical learning methods, including both supervised learning and unsupervised learning, and their applications to precision oncology, and we discuss future research directions.
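
The remark about weak gene-level signals aggregating into a stronger pathway-level signal can be illustrated with a small calculation. The sketch below is not taken from the review; it assumes an idealized setting in which every gene in a pathway has unit variance, the same standardized effect size, and is independent of the others, and it compares two groups of equal size. The function names are hypothetical.

```python
import math


def expected_gene_z(effect_size: float, n_per_group: int) -> float:
    """Expected two-sample z-statistic for a single gene.

    Assumes unit variance in both groups, so the difference of group
    means has variance 2 / n_per_group.
    """
    return effect_size * math.sqrt(n_per_group / 2)


def expected_pathway_z(effect_size: float, n_per_group: int, n_genes: int) -> float:
    """Expected z-statistic for the per-sample mean over a pathway.

    Idealized assumption: n_genes independent genes share the same
    effect size, so averaging them shrinks the variance by 1 / n_genes
    and scales the z-statistic by sqrt(n_genes).
    """
    return math.sqrt(n_genes) * expected_gene_z(effect_size, n_per_group)


if __name__ == "__main__":
    d, n, m = 0.2, 50, 25  # illustrative values only
    print("single gene z:", expected_gene_z(d, n))
    print("pathway mean z:", expected_pathway_z(d, n, m))
```

Under these assumptions the pathway-level statistic grows with the square root of the number of genes at a fixed sample size. Real pathways contain correlated genes with heterogeneous effects, so the gain in practice would typically be smaller.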


2020 ◽  
Vol 19 (5-6) ◽  
pp. 364-376
Author(s):  
Vinay Randhawa ◽  
Shivalika Pathania

Abstract Prediction of biological interaction networks from single-omics data has been extensively implemented to understand various aspects of biological systems. However, more recently, there is a growing interest in integrating multi-omics datasets for the prediction of interactomes that provide a global view of biological systems with higher descriptive capability, as compared to single omics. In this review, we have discussed various computational approaches implemented to infer and analyze two of the most important and well studied interactomes: protein–protein interaction networks and gene co-expression networks. We have explicitly focused on recent methods and pipelines implemented to infer and extract biologically important information from these interactomes, starting from utilizing single-omics data and then progressing towards multi-omics data. Accordingly, recent examples and case studies are also briefly discussed. Overall, this review will provide a proper understanding of the latest developments in protein and gene network modelling and will also help in extracting practical knowledge from them.


PLoS Biology ◽  
2020 ◽  
Vol 18 (11) ◽  
pp. e3000999
Author(s):  
Simon Kasif ◽  
Richard J. Roberts

How do we scale biological science to the demand of next generation biology and medicine to keep track of the facts, predictions, and hypotheses? These days, enormous amounts of DNA sequence and other omics data are generated. Since these data contain the blueprint for life, it is imperative that we interpret them accurately. The abundance of DNA is only one part of the challenge. Artificial Intelligence (AI) and network methods routinely build on large screens, single cell technologies, proteomics, and other modalities to infer or predict biological functions and phenotypes associated with proteins, pathways, and organisms. As a first step, how do we systematically trace the provenance of knowledge from experimental ground truth to gene function predictions and annotations? Here, we review the main challenges in tracking the evolution of biological knowledge and propose several specific solutions to provenance and computational tracing of evidence in functional linkage networks.

