Miso: an R package for multiple isotope labeling assisted metabolomics data analysis

Bioinformatics; doi:10.1093/bioinformatics/btz092; 2019; Vol 35 (18); pp. 3524-3526; Cited By ~ 3

Author(s): Yonghui Dong; Liron Feldberg; Asaph Aharoni

Keyword(s): Data Analysis; Isotope Labeling; R Package; Mass Spectrometry Data; Data Matrix; Supplementary Information; Metabolomics Data; Biological Studies; Analysis Workflow; Efficient Data

Abstract

Motivation: The use of stable isotope labeling is highly advantageous for structure elucidation in metabolomics studies. However, computational tools dealing with multiple-precursor-based labeling studies are still missing. Hence, we developed Miso, an R package providing an automated and efficient data analysis workflow to detect the complete repertoire of labeled molecules from multiple-precursor-based labeling experiments.

Results: The capability of Miso is demonstrated by the analysis of liquid chromatography-mass spectrometry data obtained from duckweed plants fed with one unlabeled and two differently labeled tyrosine precursors (unlabeled tyrosine, tyrosine-²H₄ and tyrosine-¹³C₉¹⁵N₁). The resulting data matrix generated by Miso contains sets of unlabeled and labeled ions with their retention times, m/z values and numbers of labeled atoms, which can be directly utilized for database query and biological studies.

Availability and implementation: Miso is publicly available on the CRAN repository (https://cran.r-project.org/web/packages/Miso). A reproducible case study and a detailed tutorial are available from GitHub (https://github.com/YonghuiDong/Miso_example).

Supplementary information: Supplementary data are available at Bioinformatics online.
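
The pairing of unlabeled and labeled ions described in the Miso abstract can be illustrated with a short sketch. This is not Miso's implementation (the package is written in R, and its matching rules are not given in this listing). The sketch assumes singly charged ions and full retention of the label, and the function names and the tolerances `rt_tol` and `ppm` are hypothetical. The isotope mass differences are standard values.

```python
# Mass differences (Da) between the heavy and light isotope of each element.
DELTA = {
    "2H": 2.014101778 - 1.007825032,
    "13C": 13.003354835 - 12.0,
    "15N": 15.000108899 - 14.003074004,
}


def label_shift(label):
    """Expected m/z shift of a singly charged ion carrying the given labels.

    `label` maps an isotope name to the number of labeled atoms,
    e.g. {"13C": 9, "15N": 1} for tyrosine-13C9,15N1.
    """
    return sum(DELTA[isotope] * count for isotope, count in label.items())


def find_labeled_pairs(unlabeled, labeled, label, rt_tol=0.1, ppm=10.0):
    """Pair (m/z, retention time) features that differ by the label shift.

    Assumption: a labeled ion co-elutes with its unlabeled counterpart
    (within rt_tol) and keeps every labeled atom of the precursor.
    """
    shift = label_shift(label)
    pairs = []
    for mz_u, rt_u in unlabeled:
        target = mz_u + shift
        for mz_l, rt_l in labeled:
            same_rt = abs(rt_l - rt_u) <= rt_tol
            same_mz = abs(mz_l - target) <= target * ppm * 1e-6
            if same_rt and same_mz:
                pairs.append(((mz_u, rt_u), (mz_l, rt_l)))
    return pairs
```

A real workflow would also have to consider ions that retain only part of the label, which is why the abstract reports the number of labeled atoms per ion.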

Interoperable and scalable data analysis with microservices: applications in metabolomics

Bioinformatics; doi:10.1093/bioinformatics/btz160; 2019; Vol 35 (19); pp. 3752-3760; Cited By ~ 10

Author(s): Payam Emami Khoonsari; Pablo Moreno; Sven Bergmann; Joachim Burman; Marco Capuccini; ...

Keyword(s): Mass Spectrometry; Data Analysis; Large Scale; Scientific Discipline; Supplementary Information; Resonance Spectroscopy; Research Environment; Metabolomics Data; Analysis Workflow; Virtual Research Environment

Abstract

Motivation: Developing a robust and performant data analysis workflow that integrates all necessary components whilst still being able to scale over multiple compute nodes is a challenging task. We introduce a generic method based on the microservice architecture, where software tools are encapsulated as Docker containers that can be connected into scientific workflows and executed using the Kubernetes container orchestrator.

Results: We developed a Virtual Research Environment (VRE) which facilitates rapid integration of new tools and the development of scalable and interoperable workflows for performing metabolomics data analysis. The environment can be launched on demand on cloud resources and desktop computers. IT-expertise requirements on the user side are kept to a minimum, and workflows can be re-used effortlessly by any novice user. We validate our method in the field of metabolomics on two mass spectrometry studies, one nuclear magnetic resonance spectroscopy study and one fluxomics study. We showed that the method scales dynamically with increasing availability of computational resources. We demonstrated that the method facilitates interoperability by integrating the major software suites, resulting in a turn-key workflow encompassing all steps for mass-spectrometry-based metabolomics, including preprocessing, statistics and identification. Microservices is a generic methodology that can serve any scientific discipline and opens up new types of large-scale integrative science.

Availability and implementation: The PhenoMeNal consortium maintains a web portal (https://portal.phenomenal-h2020.eu) providing a GUI for launching the Virtual Research Environment. The GitHub repository https://github.com/phnmnl/ hosts the source code of all projects.

Supplementary information: Supplementary data are available at Bioinformatics online.

ExperimentSubset: an R package to manage subsets of Bioconductor Experiment objects

Bioinformatics; doi:10.1093/bioinformatics/btab179; 2021

Author(s): Irzam Sarfraz; Muhammad Asif; Joshua D Campbell

Keyword(s): Single Cell; R Package; Poor Quality; Data Matrix; Supplementary Information; Data Provenance; RNA Seq; Efficient Management; The Matrix; The Relationship

Abstract

Motivation: R Experiment objects such as the SummarizedExperiment or SingleCellExperiment are data containers for storing one or more matrix-like assays along with associated row and column data. These objects have been used to facilitate the storage and analysis of high-throughput genomic data generated from technologies such as single-cell RNA sequencing. One common computational task in many genomics analysis workflows is to subset the data matrix before applying downstream analytical methods. For example, one may need to subset the columns of the assay matrix to exclude poor-quality samples or subset the rows of the matrix to select the most variable features. Traditionally, a second object is created that contains the desired subset of the assay from the original object. However, this approach is inefficient, as it requires the creation of an additional object containing a copy of the original assay, and it leads to challenges with data provenance.

Results: To overcome these challenges, we developed an R package called ExperimentSubset, which is a data container that implements classes for efficient storage and streamlined retrieval of assays that have been subsetted by rows and/or columns. These classes inherently provide data provenance by maintaining the relationship between the subsetted and parent assays. We demonstrate the utility of this package on a single-cell RNA-seq dataset by storing and retrieving subsets at different stages of the analysis while maintaining a lower memory footprint. Overall, ExperimentSubset is a flexible container for the efficient management of subsets.

Availability and implementation: The ExperimentSubset package is available at Bioconductor: https://bioconductor.org/packages/ExperimentSubset/ and GitHub: https://github.com/campbio/ExperimentSubset.

Supplementary information: Supplementary data are available at Bioinformatics online.
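
The design idea in the ExperimentSubset abstract, storing a subset as row and column indices that point at a parent assay instead of copying the data, can be sketched as follows. This is an illustrative toy, not the ExperimentSubset API (which is an R/Bioconductor package). The class `SubsetContainer` and all of its method names are hypothetical, and assays are plain nested lists.

```python
class SubsetContainer:
    """Toy container that keeps subsets as index views on a parent assay."""

    def __init__(self, assays):
        # Full assays are stored once, keyed by name.
        self.assays = dict(assays)
        # Each subset records (parent name, row indices, column indices).
        self.subsets = {}

    def create_subset(self, name, parent, rows, cols):
        if parent not in self.assays and parent not in self.subsets:
            raise KeyError(f"unknown parent: {parent}")
        self.subsets[name] = (parent, list(rows), list(cols))

    def get(self, name):
        """Materialize an assay or subset only when it is requested."""
        if name in self.assays:
            return self.assays[name]
        parent, rows, cols = self.subsets[name]
        matrix = self.get(parent)  # indices are relative to the parent view
        return [[matrix[r][c] for c in cols] for r in rows]

    def provenance(self, name):
        """Chain of names from the subset back to the original assay."""
        chain = [name]
        while name in self.subsets:
            name = self.subsets[name][0]
            chain.append(name)
        return chain


container = SubsetContainer({"counts": [[1, 2, 3], [4, 5, 6], [7, 8, 9]]})
container.create_subset("filtered", "counts", rows=[0, 2], cols=[1, 2])
container.create_subset("top", "filtered", rows=[1], cols=[0, 1])
print(container.get("top"))         # [[8, 9]]
print(container.provenance("top"))  # ['top', 'filtered', 'counts']
```

Because only indices are stored, memory use grows with the number of selected rows and columns rather than with the size of the assay, and the parent link gives provenance without extra bookkeeping.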

ASICS: an R package for a whole analysis workflow of 1D 1H NMR spectra

Bioinformatics; doi:10.1093/bioinformatics/btz248; 2019; Vol 35 (21); pp. 4356-4363; Cited By ~ 7

Author(s): Gaëlle Lefort; Laurence Liaubet; Cécile Canlet; Patrick Tardivel; Marie-Christine Père; ...

Keyword(s): Metabolic Pathways; NMR Spectra; Complex Mixture; R Package; Statistical Analyses; Supplementary Information; Automatic Identification; Analysis Workflow; Expert Analysis; New Biomarkers

Abstract

Motivation: In metabolomics, the detection of new biomarkers from Nuclear Magnetic Resonance (NMR) spectra is a promising approach. However, this analysis remains difficult due to the lack of a complete workflow that handles spectra pre-processing, automatic identification and quantification of metabolites, and statistical analyses in a reproducible way.

Results: We present ASICS, an R package that contains a complete workflow to analyse spectra from NMR experiments. It contains an automatic approach to identify and quantify metabolites in a complex mixture spectrum and uses the results of the quantification in untargeted and targeted statistical analyses. ASICS was shown to improve the precision of quantification in comparison to existing methods on two independent datasets. In addition, ASICS successfully recovered most of the metabolites that a manual expert analysis based on bucketing had found important for explaining a two-level condition describing the samples. It also found new relevant metabolites involved in metabolic pathways related to risk factors associated with the condition.

Availability and implementation: ASICS is distributed as an R package, available on Bioconductor.

Supplementary information: Supplementary data are available at Bioinformatics online.

Advancements in capturing and mining mass spectrometry data are transforming natural products research

Natural Product Reports; doi:10.1039/d1np00040c; 2021

Author(s): Scott A. Jarmusch; Justin J. J. van der Hooft; Pieter C. Dorrestein; Alan K. Jarmusch

Keyword(s): Mass Spectrometry; Data Mining; Natural Products; Data Analysis; Community Participation; Mass Spectrometry Data; Metabolomics Data; Analysis Tools; Public Data; Potential Use

This review covers the current and potential use of mass spectrometry-based metabolomics data mining in natural products. Public data, metadata, databases and data analysis tools are critical. The value and success of data mining rely on community participation.

An open-source high-content analysis workflow for CFTR function measurements using the forskolin-induced swelling assay

Bioinformatics; doi:10.1093/bioinformatics/btaa1073; 2020

Author(s): Marne C Hagemeijer; Annelotte M Vonk; Nikhil T Awatade; Iris A L Silva; Christian Tischer; ...

Keyword(s): Content Analysis; Statistical Analysis; Open Source; R Package; Supplementary Information; Image Quantification; Quantification Method; Analysis Workflow; High Content Analysis; Microscopy Images

Abstract

Motivation: The forskolin-induced swelling (FIS) assay has become the preferential assay to predict the efficacy of approved and investigational CFTR-modulating drugs for individuals with cystic fibrosis (CF). Currently, no standardized quantification method of FIS data exists, thereby hampering inter-laboratory reproducibility.

Results: We developed a complete open-source workflow for standardized high-content analysis of CFTR function measurements in intestinal organoids using raw microscopy images as input. The workflow includes tools for (i) file and metadata handling; (ii) image quantification and (iii) statistical analysis. Our workflow reproduced results generated by published proprietary analysis protocols and enables standardized CFTR function measurements in CF organoids.

Availability: All workflow components are open-source and freely available:
- the htmrenamer R package for file handling: https://github.com/hmbotelho/htmrenamer
- CellProfiler and ImageJ analysis scripts/pipelines: https://github.com/hmbotelho/FIS_image_analysis
- the Organoid Analyst application for statistical analysis: https://github.com/hmbotelho/organoid_analyst
- detailed usage instructions and a demonstration dataset: https://github.com/hmbotelho/FIS_analysis
Distributed under GPL v3.0.

Supplementary information: Supplementary information and a stepwise guide for software installation and data analysis for training purposes are available at Bioinformatics online.

Differential transcript usage analysis of bulk and single-cell RNA-seq data with DTUrtle

Bioinformatics; doi:10.1093/bioinformatics/btab629; 2021

Author(s): Tobias Tekath; Martin Dugas

Keyword(s): Single Cell; Transcript Level; R Package; Supplementary Information; Data Sets; RNA Seq; Cell Type; Gene Level; Analysis Workflow; Usage Analysis

Abstract

Motivation: The number of published bulk and single-cell RNA-seq data sets is growing exponentially each year. Studies analyzing such data commonly look at gene-level differences, while the collected RNA-seq data inherently represent reads of transcript isoform sequences. Utilizing transcriptomic quantifiers, RNA-seq reads can be attributed to specific isoforms, allowing for analysis of transcript-level differences. A differential transcript usage (DTU) analysis tests for proportional differences in a gene’s transcript composition, and has been of rising interest for many research questions, such as analysis of differential splicing or cell type identification.

Results: We present the R package DTUrtle, the first DTU analysis workflow for both bulk and single-cell RNA-seq data sets, and the first package to conduct a ‘classical’ DTU analysis in a single-cell context. DTUrtle extends established statistical frameworks and offers various result aggregation and visualization options as well as a novel detection probability score for tagged-end data. It has been successfully applied to bulk and single-cell RNA-seq data of human and mouse, confirming and extending key results. Additionally, we present novel potential DTU applications, such as the identification of cell-type-specific transcript isoforms as biomarkers.

Availability: The R package DTUrtle is available at https://github.com/TobiTekath/DTUrtle with extensive vignettes and documentation at https://tobitekath.github.io/DTUrtle/.

Supplementary information: Supplementary data are available at Bioinformatics online.
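
The phrase "proportional differences in a gene's transcript composition" can be made concrete with a small sketch. It only computes per-transcript usage proportions for one gene and their shift between two groups. It is not a statistical test and not DTUrtle's method, and the function names and toy counts are hypothetical.

```python
def usage_proportions(counts):
    """Fraction of a gene's total expression contributed by each transcript."""
    total = sum(counts.values())
    if total == 0:
        return {tx: 0.0 for tx in counts}
    return {tx: value / total for tx, value in counts.items()}


def usage_shift(group_a, group_b):
    """Change in transcript usage from group A to group B for one gene."""
    prop_a = usage_proportions(group_a)
    prop_b = usage_proportions(group_b)
    transcripts = sorted(set(prop_a) | set(prop_b))
    return {tx: prop_b.get(tx, 0.0) - prop_a.get(tx, 0.0) for tx in transcripts}


# Toy example: the gene's total count doubles (100 -> 200), which is a
# gene-level change, while transcript usage moves from 80/20 to 50/50.
shift = usage_shift({"tx1": 80, "tx2": 20}, {"tx1": 100, "tx2": 100})
for transcript, delta in shift.items():
    print(transcript, round(delta, 2))
```

The toy counts show why the two views differ: a gene-level comparison would report only the doubling of total expression, whereas the usage shift shows that `tx2` gained share at the expense of `tx1`.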

scRMD: imputation for single cell RNA-seq data via robust matrix decomposition

Bioinformatics; doi:10.1093/bioinformatics/btaa139; 2020; Vol 36 (10); pp. 3156-3161; Cited By ~ 9

Author(s): Chong Chen; Changjing Wu; Linjie Wu; Xiaochen Wang; Minghua Deng; ...

Keyword(s): Data Analysis; Single Cell; Differential Expression; Expression Analysis; Differential Expression Analysis; Matrix Decomposition; Transcriptome Profiling; R Package; Supplementary Information; Downstream Analysis

Abstract

Motivation: Single-cell RNA-sequencing (scRNA-seq) technology enables whole-transcriptome profiling at single-cell resolution and holds great promise for many biological and medical applications. Nevertheless, scRNA-seq often fails to capture expressed genes, leading to the prominent dropout problem. These dropouts cause many problems in downstream analysis, such as a significant increase in noise, power loss in differential expression analysis and the obscuring of gene-to-gene or cell-to-cell relationships. Imputation of these dropout values can be beneficial in scRNA-seq data analysis.

Results: In this article, we model the dropout imputation problem as robust matrix decomposition. This model has minimal assumptions and allows us to develop a computationally efficient imputation method called scRMD. Extensive data analysis shows that scRMD can accurately recover the dropout values and help to improve downstream analysis such as differential expression analysis and clustering analysis.

Availability and implementation: The R package scRMD is available at https://github.com/XiDsLab/scRMD.

Supplementary information: Supplementary data are available at Bioinformatics online.

rCASC: reproducible Classification Analysis of Single Cell sequencing data

doi:10.1101/430967; 2018; Cited By ~ 1

Author(s): Luca Alessandrì; Marco Beccuti; Maddalena Arigoni; Martina Olivero; Greta Romano; ...

Keyword(s): Single Cell; Single Cells; R Package; Cellular Heterogeneity; Supplementary Information; Sequencing Data; Single Cell Sequencing; Analysis Workflow; User Friendly; Bioinformatics Workflows

Abstract

Summary: Single-cell RNA sequencing has emerged as an essential tool to investigate cellular heterogeneity and to highlight cell sub-population-specific signatures. Dedicated and user-friendly bioinformatics workflows are now required to exploit the deconvolution of single-cell transcriptomes. Furthermore, there is a growing need for bioinformatics workflows granting both functional reproducibility, i.e. saving information about data and analysis parameters, and computational reproducibility, i.e. storing the real image of the computation environment. Here, we present rCASC, a modular RNAseq analysis workflow allowing data analysis from counts generation to cell sub-population signature identification, granting both functional and computational reproducibility.

Availability and implementation: rCASC is part of the reproducible bioinformatics project. rCASC is a Docker-based application controlled by an R package available at https://github.com/kendomaniac/rCASC.

Supplementary information: Supplementary data are available at the rCASC GitHub repository.

Using expert driven machine learning to enhance dynamic metabolomics data analysis

doi:10.1101/482224; 2018

Author(s): Charlie Beirnaert; Laura Peeters; Pieter Meysman; Wout Bittremieux; Kenn Foubert; ...

Keyword(s): Machine Learning; Data Analysis; Expert Knowledge; Sequence Data; Ground Truth; R Package; Metabolomics Data; Additional Information; Shiny App; Improved Performance

Abstract

Data analysis for metabolomics is undergoing rapid progress thanks to the proliferation of novel tools and the standardization of existing workflows. However, as datasets and experiments continue to increase in size and complexity, standardized workflows are often not sufficient. In addition, as the ground truth for metabolomics experiments is intrinsically unknown, there is no way to critically evaluate the performance of tools. Here, we investigate the problem of dynamic multi-class metabolomics experiments using a simulated dataset with a known ground truth. We evaluate the performance of tinderesting, a new and intuitive tool based on gathering expert knowledge to be used in machine learning, and compare it to EDGE, a statistical method for sequence data.

This paper presents three novel outcomes. First, we present a way to simulate dynamic metabolomics data with a known ground truth based on ordinary differential equations. This method is made available through the MetaboLouise R package. Second, we show that the EDGE tool, originally developed for genomics data analysis, is highly performant in analyzing dynamic case vs control metabolomics data. Last, we introduce the tinderesting method for analysing more complex dynamic metabolomics experiments, which performs on par with statistical methods. This tool consists of a Shiny app for collecting expert knowledge, which in turn is used to train a machine learning model to emulate the decision process of the expert. This approach does not replace traditional data analysis workflows for metabolomics, but can provide additional information, improved performance or easier interpretation of results. The advantage is that the tool is agnostic to the complexity of the experiment, and thus is easier to use in advanced setups. All code for the presented analysis, MetaboLouise and tinderesting are freely available.

New Benthic Cyanobacteria from Guadeloupe Mangroves as Producers of Antimicrobials

Marine Drugs; doi:10.3390/md18010016; 2019; Vol 18 (1); pp. 16; Cited By ~ 1

Author(s): Sébastien Duperron; Mehdi A. Beniddir; Sylvain Durand; Arlette Longeon; Charlotte Duval; ...

Keyword(s): New Species; Data Analysis; Biological Activities; Antimicrobial Activities; Chemical Diversity; Specialized Metabolites; Benthic Cyanobacteria; Biological Studies; Analysis Workflow; First Time

Benthic cyanobacteria strains from Guadeloupe have been investigated for the first time by combining phylogenetic, chemical and biological studies in order to better understand the taxonomic and chemical diversity as well as the biological activities of these cyanobacteria through the effect of their specialized metabolites. In addition to the construction of the phylogenetic tree, which indicated the presence of 12 potentially new species, an LC-MS/MS data analysis workflow was applied to provide an overview of the chemical diversity of 20 cyanobacterial extracts. This diversity was linked to the evaluation of antimicrobial activities against human pathogenic and ichthyopathogenic environmental strains.
