ASICS: an R package for a whole analysis workflow of 1D 1H NMR spectra

Gaëlle Lefort; Laurence Liaubet; Cécile Canlet; Patrick Tardivel; Marie-Christine Père; Hélène Quesnel; Alain Paris; Nathalie Iannuccelli; Nathalie Vialaneix; Rémi Servien

doi:10.1093/bioinformatics/btz248

Bioinformatics ◽

10.1093/bioinformatics/btz248 ◽

2019 ◽

Vol 35 (21) ◽

pp. 4356-4363 ◽

Cited By ~ 7

Author(s):

Gaëlle Lefort ◽

Laurence Liaubet ◽

Cécile Canlet ◽

Patrick Tardivel ◽

Marie-Christine Père ◽

...

Keyword(s):

Metabolic Pathways ◽

Nmr Spectra ◽

Complex Mixture ◽

R Package ◽

Statistical Analyses ◽

Supplementary Information ◽

Automatic Identification ◽

Analysis Workflow ◽

Expert Analysis ◽

New Biomarkers

Abstract Motivation In metabolomics, the detection of new biomarkers from Nuclear Magnetic Resonance (NMR) spectra is a promising approach. However, this analysis remains difficult due to the lack of a whole workflow that handles spectra pre-processing, automatic identification and quantification of metabolites and statistical analyses, in a reproducible way. Results We present ASICS, an R package that contains a complete workflow to analyse spectra from NMR experiments. It contains an automatic approach to identify and quantify metabolites in a complex mixture spectrum and uses the results of the quantification in untargeted and targeted statistical analyses. ASICS was shown to improve the precision of quantification in comparison to existing methods on two independent datasets. In addition, ASICS successfully recovered most metabolites that were found important to explain a two level condition describing the samples by a manual and expert analysis based on bucketing. It also found new relevant metabolites involved in metabolic pathways related to risk factors associated with the condition. Availability and implementation ASICS is distributed as an R package, available on Bioconductor. Supplementary information Supplementary data are available at Bioinformatics online.

ASICS: an R package for a whole analysis workflow of 1D 1H NMR spectra

10.1101/407924 ◽

2018 ◽

Cited By ~ 1

Author(s):

Gaëlle Lefort ◽

Laurence Liaubet ◽

Cécile Canlet ◽

Patrick Tardivel ◽

Marie-Christine Père ◽

...

Keyword(s):

Metabolic Pathways ◽

Nmr Spectra ◽

Complex Mixture ◽

R Package ◽

Statistical Analyses ◽

Automatic Identification ◽

Factors Associated ◽

Analysis Workflow ◽

Expert Analysis ◽

New Biomarkers

Abstract In metabolomics, the detection of new biomarkers from NMR spectra is a promising approach. However, this analysis remains difficult due to the lack of a whole workflow that handles spectra pre-processing, automatic identification and quantification of metabolites and statistical analyses. We present ASICS, an R package that contains a complete workflow to analyse spectra from NMR experiments. It contains an automatic approach to identify and quantify metabolites in a complex mixture spectrum and uses the results of the quantification in untargeted and targeted statistical analyses. ASICS was shown to improve the precision of quantification in comparison to existing methods on two independent datasets. In addition, ASICS successfully recovered most metabolites that were found important to explain a two-level condition describing the samples by a manual and expert analysis based on bucketing. It also found new relevant metabolites involved in metabolic pathways related to risk factors associated with the conditions. This workflow is implemented in the R package ASICS, available on the Bioconductor platform.

An open-source high-content analysis workflow for CFTR function measurements using the forskolin-induced swelling assay

Bioinformatics ◽

10.1093/bioinformatics/btaa1073 ◽

2020 ◽

Author(s):

Marne C Hagemeijer ◽

Annelotte M Vonk ◽

Nikhil T Awatade ◽

Iris A L Silva ◽

Christian Tischer ◽

...

Keyword(s):

Content Analysis ◽

Statistical Analysis ◽

Open Source ◽

R Package ◽

Supplementary Information ◽

Image Quantification ◽

Quantification Method ◽

Analysis Workflow ◽

High Content Analysis ◽

Microscopy Images

Abstract Motivation The forskolin-induced swelling (FIS) assay has become the preferential assay to predict the efficacy of approved and investigational CFTR-modulating drugs for individuals with cystic fibrosis (CF). Currently, no standardized quantification method of FIS data exists thereby hampering inter-laboratory reproducibility. Results We developed a complete open-source workflow for standardized high-content analysis of CFTR function measurements in intestinal organoids using raw microscopy images as input. The workflow includes tools for (i) file and metadata handling; (ii) image quantification and (iii) statistical analysis. Our workflow reproduced results generated by published proprietary analysis protocols and enables standardized CFTR function measurements in CF organoids. Availability All workflow components are open-source and freely available: the htmrenamer R package for file handling https://github.com/hmbotelho/htmrenamer; CellProfiler and ImageJ analysis scripts/pipelines https://github.com/hmbotelho/FIS_image_analysis; the Organoid Analyst application for statistical analysis https://github.com/hmbotelho/organoid_analyst; detailed usage instructions and a demonstration dataset https://github.com/hmbotelho/FIS_analysis. Distributed under GPL v3.0. Supplementary information Supplementary information and a stepwise guide for software installation and data analysis for training purposes are available at Bioinformatics online.

Differential transcript usage analysis of bulk and single-cell RNA-seq data with DTUrtle

Bioinformatics ◽

10.1093/bioinformatics/btab629 ◽

2021 ◽

Author(s):

Tobias Tekath ◽

Martin Dugas

Keyword(s):

Single Cell ◽

Transcript Level ◽

R Package ◽

Supplementary Information ◽

Data Sets ◽

Rna Seq ◽

Cell Type ◽

Gene Level ◽

Analysis Workflow ◽

Usage Analysis

Abstract Motivation Each year, the number of published bulk and single-cell RNA-seq data sets is growing exponentially. Studies analyzing such data are commonly looking at gene-level differences, while the collected RNA-seq data inherently represents reads of transcript isoform sequences. Utilizing transcriptomic quantifiers, RNA-seq reads can be attributed to specific isoforms, allowing for analysis of transcript-level differences. A differential transcript usage (DTU) analysis is testing for proportional differences in a gene’s transcript composition, and has been of rising interest for many research questions, such as analysis of differential splicing or cell type identification. Results We present the R package DTUrtle, the first DTU analysis workflow for both bulk and single-cell RNA-seq data sets, and the first package to conduct a ‘classical’ DTU analysis in a single-cell context. DTUrtle extends established statistical frameworks, offers various result aggregation and visualization options and a novel detection probability score for tagged-end data. It has been successfully applied to bulk and single-cell RNA-seq data of human and mouse, confirming and extending key results. Additionally, we present novel potential DTU applications like the identification of cell type specific transcript isoforms as biomarkers. Availability The R package DTUrtle is available at https://github.com/TobiTekath/DTUrtle with extensive vignettes and documentation at https://tobitekath.github.io/DTUrtle/. Supplementary information Supplementary data are available at Bioinformatics online.
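To make the notion of "proportional differences in a gene's transcript composition" concrete, the following is a minimal Python sketch. It is not DTUrtle's API and performs no statistical test; the function name, the transcript-to-gene mapping and the toy counts are all hypothetical and chosen only for illustration.

```python
from collections import defaultdict


def transcript_proportions(counts, tx2gene):
    """Return each transcript's share of its gene's total count.

    counts  : dict mapping transcript id -> count (hypothetical toy data)
    tx2gene : dict mapping transcript id -> gene id
    """
    gene_totals = defaultdict(float)
    for tx, n in counts.items():
        gene_totals[tx2gene[tx]] += n
    return {
        tx: (n / gene_totals[tx2gene[tx]] if gene_totals[tx2gene[tx]] > 0 else 0.0)
        for tx, n in counts.items()
    }


# Toy example: geneA has two isoforms, geneB has one.
tx2gene = {"tx1": "geneA", "tx2": "geneA", "tx3": "geneB"}
cond1 = {"tx1": 80, "tx2": 20, "tx3": 50}
cond2 = {"tx1": 40, "tx2": 60, "tx3": 100}

p1 = transcript_proportions(cond1, tx2gene)
p2 = transcript_proportions(cond2, tx2gene)

# Change in usage per transcript between the two conditions.
shift = {tx: p2[tx] - p1[tx] for tx in tx2gene}
print(shift)
```

In this toy data geneB doubles in expression but its single isoform keeps a usage of 1.0, so a gene-level analysis would flag it while a usage-based view would not. geneA keeps the same total count, yet its isoform composition changes.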

Miso: an R package for multiple isotope labeling assisted metabolomics data analysis

Bioinformatics ◽

10.1093/bioinformatics/btz092 ◽

2019 ◽

Vol 35 (18) ◽

pp. 3524-3526 ◽

Cited By ~ 3

Author(s):

Yonghui Dong ◽

Liron Feldberg ◽

Asaph Aharoni

Keyword(s):

Data Analysis ◽

Isotope Labeling ◽

R Package ◽

Mass Spectrometry Data ◽

Data Matrix ◽

Supplementary Information ◽

Metabolomics Data ◽

Biological Studies ◽

Analysis Workflow ◽

Efficient Data

Abstract Motivation The use of stable isotope labeling is highly advantageous for structure elucidation in metabolomics studies. However, computational tools dealing with multiple-precursor-based labeling studies are still missing. Hence, we developed Miso, an R package providing an automated and efficient data analysis workflow to detect the complete repertoire of labeled molecules from multiple-precursor-based labeling experiments. Results The capability of Miso is demonstrated by the analysis of liquid chromatography-mass spectrometry data obtained from duckweed plants fed with one unlabeled and two differently labeled tyrosine (unlabeled tyrosine, tyrosine-²H₄ and tyrosine-¹³C₉¹⁵N₁). The resulting data matrix generated by Miso contains sets of unlabeled and labeled ions with their retention time, m/z values and number of labeled atoms that can be directly utilized for database query and biological studies. Availability and implementation Miso is publicly available on the CRAN repository (https://cran.r-project.org/web/packages/Miso). A reproducible case study and a detailed tutorial are available from GitHub (https://github.com/YonghuiDong/Miso_example). Supplementary information Supplementary data are available at Bioinformatics online.

rCASC: reproducible Classification Analysis of Single Cell sequencing data

10.1101/430967 ◽

2018 ◽

Cited By ~ 1

Author(s):

Luca Alessandrì ◽

Marco Beccuti ◽

Maddalena Arigoni ◽

Martina Olivero ◽

Greta Romano ◽

...

Keyword(s):

Single Cell ◽

Single Cells ◽

R Package ◽

Cellular Heterogeneity ◽

Supplementary Information ◽

Sequencing Data ◽

Single Cell Sequencing ◽

Analysis Workflow ◽

User Friendly ◽

Bioinformatics Workflows

Abstract Summary Single-cell RNA sequencing has emerged as an essential tool to investigate cellular heterogeneity and to highlight cell sub-population specific signatures. Nowadays, dedicated and user-friendly bioinformatics workflows are required to exploit the deconvolution of single-cell transcriptomes. Furthermore, there is a growing need for bioinformatics workflows granting both functional reproducibility, i.e. saving information about data and analysis parameters, and computation reproducibility, i.e. storing the real image of the computation environment. Here, we present rCASC, a modular RNAseq analysis workflow allowing data analysis from counts generation to cell sub-population signatures identification, granting both functional and computation reproducibility. Availability and Implementation rCASC is part of the reproducible bioinformatics project. rCASC is a docker based application controlled by an R package available at https://github.com/kendomaniac/rCASC. Supplementary information Supplementary data are available at the rCASC GitHub repository.

Automatic identification of relevant genes from low-dimensional embeddings of single-cell RNA-seq data

Bioinformatics ◽

10.1093/bioinformatics/btaa198 ◽

2020 ◽

Vol 36 (15) ◽

pp. 4291-4295

Author(s):

Philipp Angerer ◽

David S Fischer ◽

Fabian J Theis ◽

Antonio Scialdone ◽

Carsten Marr

Keyword(s):

Single Cell ◽

Principal Component ◽

R Package ◽

Ease Of Use ◽

Supplementary Information ◽

Automatic Identification ◽

Biological Processes ◽

Rna Seq ◽

Sequencing Data ◽

Low Dimensional

Abstract Motivation Dimensionality reduction is a key step in the analysis of single-cell RNA-sequencing data. It produces a low-dimensional embedding for visualization and as a calculation base for downstream analysis. Nonlinear techniques are most suitable to handle the intrinsic complexity of large, heterogeneous single-cell data. However, with no linear relation between gene and embedding coordinate, there is no way to extract the identity of genes driving any cell’s position in the low-dimensional embedding, making it difficult to characterize the underlying biological processes. Results In this article, we introduce the concepts of local and global gene relevance to compute an equivalent of principal component analysis loadings for non-linear low-dimensional embeddings. Global gene relevance identifies drivers of the overall embedding, while local gene relevance identifies those of a defined sub-region. We apply our method to single-cell RNA-seq datasets from different experimental protocols and to different low-dimensional embedding techniques. This shows our method’s versatility to identify key genes for a variety of biological processes. Availability and implementation To ensure reproducibility and ease of use, our method is released as part of destiny 3.0, a popular R package for building diffusion maps from single-cell transcriptomic data. It is readily available through Bioconductor. Supplementary information Supplementary data are available at Bioinformatics online.

ExperimentSubset: an R package to manage subsets of Bioconductor Experiment objects

Bioinformatics ◽

10.1093/bioinformatics/btab179 ◽

2021 ◽

Author(s):

Irzam Sarfraz ◽

Muhammad Asif ◽

Joshua D Campbell

Keyword(s):

Single Cell ◽

R Package ◽

Poor Quality ◽

Data Matrix ◽

Supplementary Information ◽

Data Provenance ◽

Rna Seq ◽

Efficient Management ◽

The Matrix ◽

The Relationship

Abstract Motivation R Experiment objects such as the SummarizedExperiment or SingleCellExperiment are data containers for storing one or more matrix-like assays along with associated row and column data. These objects have been used to facilitate the storage and analysis of high-throughput genomic data generated from technologies such as single-cell RNA sequencing. One common computational task in many genomics analysis workflows is to perform subsetting of the data matrix before applying down-stream analytical methods. For example, one may need to subset the columns of the assay matrix to exclude poor-quality samples or subset the rows of the matrix to select the most variable features. Traditionally, a second object is created that contains the desired subset of assay from the original object. However, this approach is inefficient as it requires the creation of an additional object containing a copy of the original assay and leads to challenges with data provenance. Results To overcome these challenges, we developed an R package called ExperimentSubset, which is a data container that implements classes for efficient storage and streamlined retrieval of assays that have been subsetted by rows and/or columns. These classes are able to inherently provide data provenance by maintaining the relationship between the subsetted and parent assays. We demonstrate the utility of this package on a single-cell RNA-seq dataset by storing and retrieving subsets at different stages of the analysis while maintaining a lower memory footprint. Overall, the ExperimentSubset is a flexible container for the efficient management of subsets. Availability and implementation ExperimentSubset package is available at Bioconductor: https://bioconductor.org/packages/ExperimentSubset/ and Github: https://github.com/campbio/ExperimentSubset. Supplementary information Supplementary data are available at Bioinformatics online.
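The idea of storing subsets without copying the assay can be sketched in a few lines of Python. This is only an illustration of the general technique (keeping row and column indices plus a pointer to the parent subset); it is not the ExperimentSubset API, and the class and method names below are hypothetical.

```python
class SubsetContainer:
    """Toy container: one assay matrix plus named index-based subsets."""

    def __init__(self, assay):
        self.assay = assay      # list of rows; stored once, never copied
        self.subsets = {}       # name -> (row indices, col indices, parent name)

    def create_subset(self, name, rows, cols, parent=None):
        # Indices given relative to a parent subset are translated to
        # indices into the original assay, so only integers are stored.
        if parent is not None:
            p_rows, p_cols, _ = self.subsets[parent]
            rows = [p_rows[i] for i in rows]
            cols = [p_cols[j] for j in cols]
        self.subsets[name] = (list(rows), list(cols), parent)

    def get(self, name):
        # Materialise the subset on demand from the single stored assay.
        rows, cols, _ = self.subsets[name]
        return [[self.assay[i][j] for j in cols] for i in rows]

    def lineage(self, name):
        # Provenance: walk from a subset back through its parents.
        chain = [name]
        while self.subsets[chain[-1]][2] is not None:
            chain.append(self.subsets[chain[-1]][2])
        return chain


container = SubsetContainer([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
# Drop the middle column (e.g. a poor-quality sample).
container.create_subset("good_cols", rows=[0, 1, 2], cols=[0, 2])
# Keep the last two rows of that subset (e.g. selected features).
container.create_subset("top_rows", rows=[1, 2], cols=[0, 1], parent="good_cols")

print(container.get("top_rows"))
print(container.lineage("top_rows"))
```

Because each subset holds only indices and the name of its parent, the memory cost grows with the number of selected rows and columns rather than with the size of the matrix, and the chain of parents records how a subset was derived.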

BloodGen3Module: Blood transcriptional module repertoire analysis and visualization using R

Bioinformatics ◽

10.1093/bioinformatics/btab121 ◽

2021 ◽

Author(s):

Darawan Rinchai ◽

Jessica Roelands ◽

Mohammed Toufiq ◽

Wouter Hendrickx ◽

Matthew C Altman ◽

...

Keyword(s):

Transcript Abundance ◽

R Package ◽

Supplementary Information ◽

Illustrative Case ◽

Bioinformatic Tools ◽

Transcriptional Module ◽

Wide Range ◽

Downstream Analysis ◽

Computing Module ◽

Parallel Workflow

Abstract Motivation We previously described the construction and characterization of generic and reusable blood transcriptional module repertoires. More recently we released a third iteration (“BloodGen3” module repertoire) that comprises 382 functionally annotated gene sets (modules) and encompasses 14,168 transcripts. Custom bioinformatic tools are needed to support downstream analysis, visualization and interpretation relying on such fixed module repertoires. Results We have developed, and describe here, an R package, BloodGen3Module. The functions of our package permit group comparison analyses to be performed at the module level and the results to be displayed as annotated fingerprint grid plots. A parallel workflow for computing module repertoire changes for individual samples rather than groups of samples is also available; these results are displayed as fingerprint heatmaps. An illustrative case is used to demonstrate the steps involved in generating blood transcriptome repertoire fingerprints of septic patients. Taken together, this resource could facilitate the analysis and interpretation of changes in blood transcript abundance observed across a wide range of pathological and physiological states. Availability The BloodGen3Module package and documentation are freely available from GitHub: https://github.com/Drinchai/BloodGen3Module Supplementary information Supplementary data are available at Bioinformatics online.

movAPA: modeling and visualization of dynamics of alternative polyadenylation across biological samples

Bioinformatics ◽

10.1093/bioinformatics/btaa997 ◽

2020 ◽

Author(s):

Wenbin Ye ◽

Tao Liu ◽

Hongjuan Fu ◽

Congting Ye ◽

Guoli Ji ◽

...

Keyword(s):

Biological Samples ◽

Tissue Specificity ◽

Single Cells ◽

Alternative Polyadenylation ◽

R Package ◽

Supplementary Information ◽

Rna Seq ◽

Mouse Sperm ◽

High Scalability ◽

A Site

Abstract Motivation Alternative polyadenylation (APA) is widely recognized as a prevalent, dynamically modulated mechanism. Studies based on 3′ end sequencing and/or RNA-seq have profiled poly(A) sites in various species with diverse pipelines, yet no unified and easy-to-use toolkit is available for comprehensive APA analyses. Results We developed an R package called movAPA for modeling and visualization of dynamics of alternative polyadenylation across biological samples. movAPA incorporates rich functions for preprocessing, annotation and statistical analyses of poly(A) sites, identification of poly(A) signals, profiling of APA dynamics and visualization. In particular, seven metrics are provided for measuring the tissue specificity or usage of APA sites across samples. Three methods are used for identifying 3′ UTR shortening/lengthening events between conditions. APA site switching involving non-3′ UTR polyadenylation can also be explored. Using poly(A) site data from rice and mouse sperm cells, we demonstrated the high scalability and flexibility of movAPA in profiling APA dynamics across tissues and single cells. Availability and implementation https://github.com/BMILAB/movAPA. Supplementary information Supplementary data are available at Bioinformatics online.

Inferring perturbation profiles of cancer samples

Bioinformatics ◽

10.1093/bioinformatics/btab113 ◽

2021 ◽

Author(s):

Martin Pirkl ◽

Niko Beerenwinkel

Keyword(s):

Indirect Evidence ◽

R Package ◽

The Cancer Genome Atlas ◽

Supplementary Information ◽

Patient Specific ◽

Driver Genes ◽

Cancer Driver ◽

Molecular Alterations ◽

Incomplete Coverage ◽

Gene Perturbations

Abstract Motivation Cancer is one of the most prevalent diseases in the world. Tumors arise due to important genes changing their activity, e.g. when inhibited or over-expressed. But these gene perturbations are difficult to observe directly. Molecular profiles of tumors can provide indirect evidence of gene perturbations. However, inferring perturbation profiles from molecular alterations is challenging due to error-prone molecular measurements and incomplete coverage of all possible molecular causes of gene perturbations. Results We have developed a novel mathematical method to analyze cancer driver genes and their patient-specific perturbation profiles. We combine genetic aberrations with gene expression data in a causal network derived across patients to infer unobserved perturbations. We show that our method can predict perturbations in simulations, CRISPR perturbation screens and breast cancer samples from The Cancer Genome Atlas. Availability and implementation The method is available as the R-package nempi at https://github.com/cbg-ethz/nempi and http://bioconductor.org/packages/nempi. Supplementary information Supplementary data are available at Bioinformatics online.