struct: an R/Bioconductor-based framework for standardized metabolomics data analysis and beyond

Author(s):  
Gavin Rhys Lloyd ◽  
Andris Jankevics ◽  
Ralf J M Weber

Abstract Summary: Implementing and combining methods from a diverse range of R/Bioconductor packages into ‘omics’ data analysis workflows represents a significant challenge in terms of standardization, readability and reproducibility. Here, we present an R/Bioconductor package, named struct (Statistics in R using Class-based Templates), which defines a suite of class-based templates that allow users to develop and implement highly standardized and readable statistical analysis workflows. Struct integrates with the STATistics Ontology to ensure consistent reporting and maximizes semantic interoperability. We also present a toolbox, named structToolbox, which includes an extensive set of commonly used data analysis methods that have been implemented using struct. This toolbox can be used to build data-analysis workflows for metabolomics and other omics technologies. Availability and implementation: struct and structToolbox are implemented in R, and are freely available from Bioconductor (http://bioconductor.org/packages/struct and http://bioconductor.org/packages/structToolbox), including documentation and vignettes. Source code is available and maintained at https://github.com/computational-metabolomics.

Author(s):  
Miroslava Cuperlovic-Culf

Metabolomics, or metabonomics, is one of the major high-throughput analysis methods and aims at holistic measurement of the metabolic profiles of biological systems. Data analysis approaches in metabolomics can broadly be divided into qualitative (analysis of spectral data) and quantitative (analysis of individual metabolite concentrations). In this work, the author demonstrates the benefits and limitations of different unsupervised analysis tools currently used in qualitative and quantitative metabolomics data analysis. Following a detailed literature review outlining different applications of unsupervised methods in metabolomics, the author shows example applications of the major previously used unsupervised analysis methods. These methods were tested using qualitative data and the corresponding quantitative metabolite data, derived to represent a large set of 2,000 objects. Spectra of mixtures were obtained from different combinations of experimental NMR measurements of 13 prevalent metabolites at five different groups of concentrations representing different phenotypes. The analysis shows advantages and disadvantages of standard tools when applied specifically to metabolomics.
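As a rough illustration of the qualitative versus quantitative distinction drawn in this abstract, the sketch below builds synthetic mixture spectra as linear combinations of 13 stand-in "pure" spectra and applies PCA (one common unsupervised method) to both the spectra and the underlying concentrations. It is not the author's data or procedure: the random pure spectra, the group means, the 5% scatter and the sample counts are all assumptions chosen for the example.

```python
import numpy as np

rng = np.random.default_rng(0)
n_metabolites, n_points, n_groups, n_per_group = 13, 300, 5, 40

# Hypothetical "pure" spectra: random non-negative profiles standing in
# for experimental NMR reference spectra (an assumption for illustration).
pure_spectra = rng.random((n_metabolites, n_points))

# Each group (phenotype) has its own mean concentration profile;
# individual samples scatter around it by roughly 5%.
group_means = rng.uniform(1.0, 5.0, size=(n_groups, n_metabolites))
labels = np.repeat(np.arange(n_groups), n_per_group)
concentrations = group_means[labels] * rng.normal(
    1.0, 0.05, size=(labels.size, n_metabolites)
)

# Qualitative data: each mixture spectrum is a linear combination
# of the pure spectra, weighted by the sample's concentrations.
spectra = concentrations @ pure_spectra


def pca(x, n_components):
    """PCA via SVD of the column-centered data matrix.

    Returns the sample scores on the first n_components and the
    fraction of variance explained by every component.
    """
    centered = x - x.mean(axis=0)
    u, s, _ = np.linalg.svd(centered, full_matrices=False)
    explained = s**2 / np.sum(s**2)
    scores = u[:, :n_components] * s[:n_components]
    return scores, explained


scores_spectra, explained_spectra = pca(spectra, 2)
scores_conc, explained_conc = pca(concentrations, 2)
```

Because the noise-free spectra are exact linear mixtures of 13 profiles, at most 13 principal components carry variance in the spectral data, which is one reason the two data views can behave similarly under linear methods in this idealized setting.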


2020 ◽  
Author(s):  
Lauren M. McIntyre ◽  
Francisco Huertas ◽  
Olexander Moskalenko ◽  
Marta Llansola ◽  
Vicente Felipo ◽  
...  

Abstract Galaxy is a user-friendly platform with a strong development community and a rich set of tools for omics data analysis. While multi-omics experiments are becoming popular, tools for multi-omics data analysis are poorly represented in this platform. Here we present GAIT-GM, a set of new Galaxy tools for integrative analysis of gene expression and metabolomics data. In the Annotation Tool, features are mapped to KEGG pathways using a text mining approach to increase the number of mapped metabolites. Several interconnected databases are used to maximally map gene IDs across species. In the Integration Tool, changes in metabolite levels are modelled as a function of gene expression in a flexible manner. Both unbiased exploration of relationships between genes and metabolites and biologically informed models based on pathway data are enabled. The GAIT-GM tools are freely available at https://github.com/SECIMTools/gait-gm.


2022 ◽  
Vol 23 (1) ◽  
Author(s):  
Lu Li ◽  
Huub Hoefsloot ◽  
Albert A. de Graaf ◽  
Evrim Acar ◽  
Age K. Smilde

Abstract Background: Analysis of dynamic metabolomics data holds the promise to improve our understanding of underlying mechanisms in metabolism. For example, it may detect changes in metabolism due to the onset of a disease. Dynamic or time-resolved metabolomics data can be arranged as a three-way array with entries organized according to a subjects mode, a metabolites mode and a time mode. While such time-evolving multiway data sets are increasingly collected, revealing the underlying mechanisms and their dynamics from such data remains challenging. For such data, one of the complexities is the presence of a superposition of several sources of variation: induced variation (due to experimental conditions or inborn errors), individual variation, and measurement error. Multiway data analysis (also known as tensor factorizations) has been successfully used in data mining to find the underlying patterns in multiway data. To explore the performance of multiway data analysis methods in terms of revealing the underlying mechanisms in dynamic metabolomics data, simulated data with known ground truth can be studied. Results: We focus on simulated data arising from different dynamic models of increasing complexity, i.e., a simple linear system, a yeast glycolysis model, and a human cholesterol model. We generate data with induced variation as well as individual variation. Systematic experiments are performed to demonstrate the advantages and limitations of multiway data analysis in analyzing such dynamic metabolomics data and their capacity to disentangle the different sources of variations. We choose to use simulations since we want to understand the capability of multiway data analysis methods, which is facilitated by knowing the ground truth.
Conclusion: Our numerical experiments demonstrate that despite the increasing complexity of the studied dynamic metabolic models, tensor factorization methods CANDECOMP/PARAFAC (CP) and Parallel Profiles with Linear Dependences (Paralind) can disentangle the sources of variations and thereby reveal the underlying mechanisms and their dynamics.
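To make the three-way arrangement (subjects × metabolites × time) and the CP model concrete, here is a minimal sketch of a CP decomposition fitted by plain alternating least squares in NumPy. It is not the authors' implementation or their simulated data: the function names `khatri_rao`, `reconstruct` and `cp_als`, the array sizes, the rank and the random noiseless test array are all assumptions made for illustration, and Paralind is not covered.

```python
import numpy as np


def khatri_rao(a, b):
    """Column-wise Kronecker product of a (I x R) and b (J x R) -> (I*J x R)."""
    rank = a.shape[1]
    return (a[:, None, :] * b[None, :, :]).reshape(-1, rank)


def reconstruct(factors):
    """Rebuild the three-way array from its CP factor matrices."""
    a, b, c = factors
    return np.einsum("ir,jr,kr->ijk", a, b, c)


def cp_als(x, rank, n_iter=500, seed=0):
    """Fit a rank-`rank` CP model to a three-way array by alternating least squares.

    Each pass updates one factor matrix at a time by solving the linear
    least-squares problem obtained with the other two factors held fixed.
    """
    rng = np.random.default_rng(seed)
    factors = [rng.standard_normal((dim, rank)) for dim in x.shape]
    for _ in range(n_iter):
        for mode in range(3):
            first, second = [factors[m] for m in range(3) if m != mode]
            # Unfold along `mode`; columns follow the C-order of the other two modes,
            # which matches the row order produced by khatri_rao(first, second).
            unfolded = np.moveaxis(x, mode, 0).reshape(x.shape[mode], -1)
            gram = (first.T @ first) * (second.T @ second)
            factors[mode] = unfolded @ khatri_rao(first, second) @ np.linalg.pinv(gram)
    return factors


# Demo on a noiseless synthetic array: 10 subjects x 8 metabolites x 6 time points.
rng = np.random.default_rng(1)
true_factors = [rng.standard_normal((dim, 2)) for dim in (10, 8, 6)]
x = reconstruct(true_factors)
estimated = cp_als(x, rank=2)
rel_error = np.linalg.norm(x - reconstruct(estimated)) / np.linalg.norm(x)
print(f"relative reconstruction error: {rel_error:.2e}")
```

In this arrangement, the columns of the first factor matrix describe how strongly each subject expresses a component, the second gives the metabolite pattern, and the third gives the time profile. Recovered components are only determined up to scaling and permutation, so they should be compared with a ground truth accordingly.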


2005 ◽  
Vol 33 (6) ◽  
pp. 1427-1429 ◽  
Author(s):  
P. Mendes ◽  
D. Camacho ◽  
A. de la Fuente

The advent of large data sets, such as those produced in metabolomics, presents a considerable challenge in terms of their interpretation. Several mathematical and statistical methods have been proposed to analyse these data, and new ones continue to appear. However, these methods often disagree in their analyses, and their results are hard to interpret. A major contributing factor for the difficulties in interpreting these data lies in the data analysis methods themselves, which have not been thoroughly studied under controlled conditions. We have been producing synthetic data sets by simulation of realistic biochemical network models with the purpose of comparing data analysis methods. Because we have full knowledge of the underlying ‘biochemistry’ of these models, we are better able to judge how well the analyses reflect true knowledge about the system. Another advantage is that the level of noise in these data is under our control and this allows for studying how the inferences are degraded by noise. Using such a framework, we have studied the extent to which correlation analysis of metabolomics data sets is capable of recovering features of the biochemical system. We were able to identify four major metabolic regulatory configurations that result in strong metabolite correlations. This example demonstrates the utility of biochemical simulation in the analysis of metabolomics data.
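The idea of testing correlation analysis against a simulated system with known structure can be sketched with a toy example. The model below is an assumption made for illustration, not one of the authors' network models and not one of the four regulatory configurations they report: a linear chain S → A → B → (sink) with mass-action kinetics, where the first rate constant varies between samples, plus an unrelated metabolite C and a user-controlled noise level.

```python
import numpy as np

rng = np.random.default_rng(0)
n_samples = 500
noise_level = 0.05  # relative measurement noise, under our control

# Hypothetical pathway S -> A -> B -> (sink) with mass-action rates
# v1 = k1*S, v2 = k2*A, v3 = k3*B. At steady state v1 = v2 = v3, so
# A = k1*S/k2 and B = k1*S/k3.
substrate, k2, k3 = 1.0, 2.0, 0.5

# Sample-to-sample variation in the first enzyme (k1) drives both A and B.
k1 = rng.lognormal(mean=0.0, sigma=0.3, size=n_samples)
a_true = k1 * substrate / k2
b_true = k1 * substrate / k3

# An unrelated metabolite C that shares no upstream source of variation.
c_true = rng.lognormal(mean=0.0, sigma=0.3, size=n_samples)


def measure(values):
    """Apply multiplicative measurement noise at the chosen level."""
    return values * rng.normal(1.0, noise_level, size=values.shape)


data = np.column_stack([measure(a_true), measure(b_true), measure(c_true)])

# Pearson correlation matrix between metabolites (columns are variables).
corr = np.corrcoef(data, rowvar=False)
corr_ab, corr_ac = corr[0, 1], corr[0, 2]
print(f"corr(A, B) = {corr_ab:.2f}, corr(A, C) = {corr_ac:.2f}")
```

Raising `noise_level` and rerunning shows how the A–B correlation degrades as measurement noise grows relative to the shared enzyme variation, which is the kind of controlled experiment the abstract describes.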


Author(s):  
Peter D Karp ◽  
Peter E Midford ◽  
Richard Billington ◽  
Anamika Kothari ◽  
Markus Krummenacker ◽  
...  

Abstract Motivation: Biological systems function through dynamic interactions among genes and their products, regulatory circuits and metabolic networks. Our development of the Pathway Tools software was motivated by the need to construct biological knowledge resources that combine these many types of data, and that enable users to find and comprehend data of interest as quickly as possible through query and visualization tools. Further, we sought to support the development of metabolic flux models from pathway databases, and to use pathway information to leverage the interpretation of high-throughput data sets. Results: In the past 4 years we have enhanced the already extensive Pathway Tools software in several respects. It can now support metabolic-model execution through the Web; it provides a more accurate gap filler for metabolic models; it supports development of models for organism communities distributed across a spatial grid; and model results may be visualized graphically. Pathway Tools supports several new omics-data analysis tools including the Omics Dashboard, multi-pathway diagrams called pathway collages, a pathway-covering algorithm for metabolomics data analysis and an algorithm for generating mechanistic explanations of multi-omics data. We have also improved the core pathway/genome databases management capabilities of the software, providing new multi-organism search tools for organism communities, improved graphics rendering, faster performance and re-designed gene and metabolite pages. Availability: The software is free for academic use; a fee is required for commercial use. See http://pathwaytools.com. Supplementary information: Supplementary data are available at Briefings in Bioinformatics online.


Author(s):  
Chao Li ◽  
Zhenbo Gao ◽  
Benzhe Su ◽  
Guowang Xu ◽  
Xiaohui Lin
