A compositional mediation model for a binary outcome: Application to microbiome studies

Binary Outcomes

Mediation Model

Probit Regression

Mediation Effects

Microbiome Research

Zero Sum

Abstract Motivation: The delicate balance of the microbiome is implicated in our health and is shaped by external factors, such as diet and xenobiotics. Therefore, understanding the role of the microbiome in linking external factors to our health conditions is crucial for translating microbiome research into therapeutic and preventative applications. Results: We introduced a sparse compositional mediation model for binary outcomes to estimate and test the mediation effects of the microbiome, utilizing the compositional algebra defined in the simplex space and a linear zero-sum constraint on probit regression coefficients. For this model, under the standard causal assumptions, we showed that both the causal direct and indirect effects are identifiable. We further developed a method for sensitivity analysis of the assumption of no unmeasured confounding between the mediator and the outcome. We conducted extensive simulation studies to assess the performance of the proposed method and applied it to real microbiome data to study the mediation effects of the microbiome linking fat intake to overweight/obesity. Availability and implementation: An R package can be downloaded from https://github.com/mbsohn/cmmb. Supplementary information: Supplementary files are available at Bioinformatics online.
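The zero-sum constraint is what allows a probit model to take log-transformed compositions without depending on the arbitrary total (sequencing depth). If the coefficients sum to zero, multiplying every component by the same constant leaves the linear predictor unchanged. The sketch below shows only this invariance under stated assumptions. It is not code from the cmmb package, and the function name `probit_prob` and the coefficient values are hypothetical.

```python
import math

def probit_prob(composition, beta, intercept=0.0):
    """P(Y = 1) = Phi(intercept + sum_j beta_j * log x_j), a log-contrast probit.

    Hypothetical helper for illustration; requires sum(beta) == 0.
    """
    # The zero-sum constraint makes the predictor invariant to rescaling x.
    if abs(sum(beta)) > 1e-9:
        raise ValueError("coefficients must sum to zero")
    eta = intercept + sum(b * math.log(x) for b, x in zip(beta, composition))
    # Standard normal CDF expressed through the error function.
    return 0.5 * (1.0 + math.erf(eta / math.sqrt(2.0)))

# Made-up coefficients and counts for three taxa.
beta = [1.0, -0.5, -0.5]
counts = [30.0, 50.0, 20.0]
props = [c / sum(counts) for c in counts]

# Same probability whether raw counts or proportions are supplied.
p_counts = probit_prob(counts, beta)
p_props = probit_prob(props, beta)
```

Without the constraint, the term `log(total) * sum(beta)` would leak the library size into the outcome model. That is why the check raises an error instead of silently accepting arbitrary coefficients.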

BiomeHorizon: visualizing microbiome time series data in R

10.1101/2021.08.29.458140

2021

Author(s):

Isaac Fink

Richard J. Abdill

Ran Blekhman

Laura Grieneisen

Keyword(s):

Time Series

Open Source

Time Series Data

R Package

Series Data

Link Type

Microbiome Research

Microbiome Data

Over Time

Abstract Summary: A key aspect of microbiome research is the analysis of longitudinal dynamics using time series data. A method is needed to visualize both the proportional and the absolute change in the abundance of multiple taxa across multiple subjects over time. We developed BiomeHorizon, an open-source R package that visualizes longitudinal compositional microbiome data using horizon plots. Availability and implementation: BiomeHorizon is available at https://github.com/blekhmanlab/biomehorizon/ and released under the MIT license. A guide with step-by-step instructions for using the package is provided at https://blekhmanlab.github.io/biomehorizon/. The guide also provides code to reproduce all plots in this paper. Supplementary information: None.
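A horizon plot saves vertical space by slicing each series' deviation from an origin into equal-width bands. The bands are then overlaid in increasingly dark colours, and negative deviations are mirrored above the axis in a contrasting colour. The sketch below shows this generic banding transform only. BiomeHorizon's own choice of origin, band thresholds and colouring may differ, and the function name `horizon_bands` is hypothetical.

```python
def horizon_bands(values, origin, band_width, n_bands):
    """Split deviations from `origin` into stacked bands for a horizon plot.

    Hypothetical helper. Band k holds the part of each |deviation| lying in
    [k * band_width, (k + 1) * band_width], clipped to that range; anything
    beyond n_bands * band_width is truncated. Signs are returned separately
    so negative deviations can be drawn mirrored in a second colour.
    """
    bands = [[0.0] * len(values) for _ in range(n_bands)]
    signs = []
    for i, v in enumerate(values):
        dev = v - origin
        signs.append(1 if dev >= 0 else -1)
        mag = abs(dev)
        for k in range(n_bands):
            lower = k * band_width
            # Portion of the magnitude that falls inside band k.
            bands[k][i] = min(max(mag - lower, 0.0), band_width)
    return bands, signs

# Example: three time points of one taxon's abundance relative to a baseline of 0.
bands, signs = horizon_bands([0.5, 2.5, -1.2], origin=0.0, band_width=1.0, n_bands=3)
```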

Compositional Mediation Analysis for Microbiome Studies

10.1101/149419

2017

Cited By ~ 3

Author(s):

Michael B. Sohn

Hongzhe Li

Keyword(s):

Mediation Analysis

Compositional Data

Fat Intake

Simulation Studies

Mediation Model

Mediation Effects

Microbiome Composition

Causal Mediation Analysis

Simplex Space

Abstract Motivated by recent advances in causal mediation analysis and problems in the analysis of microbiome data, we consider the setting where the effect of a treatment on an outcome is transmitted through perturbing the microbial communities, that is, through compositional mediators. The compositional and high-dimensional nature of such mediators makes standard mediation analysis not directly applicable to our setting. We propose a sparse compositional mediation model that can be used to estimate the causal direct and indirect (or mediation) effects, utilizing the algebra for compositional data in the simplex space. We also propose bootstrap tests of the total and component-wise mediation effects. We conduct extensive simulation studies to assess the performance of the proposed method and apply it to a real metagenomic dataset to investigate the effect of fat intake on body mass index mediated through the gut microbiome composition.
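The simplex algebra referred to above replaces ordinary addition and scalar multiplication with Aitchison perturbation and powering. Each operation is a component-wise product or power followed by closure back to proportions. The sketch below implements these two operations and uses them to model a treatment shifting a baseline composition. The mediator model `perturb(m0, power(a, t))` and all values are illustrative assumptions, not the paper's exact specification.

```python
def closure(x):
    """Rescale a positive vector so its components sum to 1."""
    total = sum(x)
    return [v / total for v in x]

def perturb(x, y):
    """Aitchison perturbation (simplex 'addition'): product, then closure."""
    return closure([a * b for a, b in zip(x, y)])

def power(x, alpha):
    """Aitchison powering (simplex 'scalar multiplication'): power, then closure."""
    return closure([v ** alpha for v in x])

def mediator_composition(m0, a, t):
    """Hypothetical mediator model: baseline m0 perturbed by effect a scaled by treatment t."""
    return perturb(m0, power(a, t))

m0 = [0.5, 0.3, 0.2]   # made-up baseline composition of three taxa
a = [2.0, 1.0, 1.0]    # made-up treatment effect doubling taxon 1 relative to others

m_control = mediator_composition(m0, a, 0)  # power(a, 0) is the neutral element
m_treated = mediator_composition(m0, a, 1)
```

Because `power(a, 0)` is the uniform composition, which is the neutral element of perturbation, the untreated mediator equals the baseline. This mirrors how a zero treatment leaves a linear model's mean unchanged.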

Multi-SNP mediation intersection-union test

Bioinformatics

10.1093/bioinformatics/btz285

2019

Vol 35 (22)

pp. 4724-4729

Cited By ~ 4

Author(s):

Wujuan Zhong

Cassandra N Spracklen

Karen L Mohlke

Xiaojing Zheng

Jason Fine

...

Keyword(s):

Association Studies

R Package

Alternative Methods

Genome Wide Association Studies

Mediation Effects

Coding Regions

Genome Wide

Plasma Adiponectin Level

Intersection Union Test

Abstract Summary: Tens of thousands of reproducibly identified GWAS (Genome-Wide Association Studies) variants, the vast majority of which fall in non-coding regions and yield no protein products, urgently call for mechanistic interpretation. Although numerous methods exist, there are few, if any, methods for simultaneously testing the mediation effects of multiple correlated SNPs via a mediator (e.g. the expression of a nearby gene) on a phenotypic outcome. We propose the multi-SNP mediation intersection-union test (SMUT) to fill this methodological gap. Our extensive simulations demonstrate the validity of SMUT as well as substantial power gains, of up to 92%, over alternative methods. In addition, in a real dataset of Finns, SMUT confirmed known mediators for plasma adiponectin level that were missed by many alternative methods. We believe SMUT will become a useful tool for generating mechanistic hypotheses about GWAS variants, facilitating functional follow-up. Availability and implementation: The R package SMUT is publicly available from CRAN at https://CRAN.R-project.org/package=SMUT. Supplementary information: Supplementary data are available at Bioinformatics online.
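The intersection-union principle behind such tests is short enough to state in code. Mediation requires both the exposure-to-mediator path (alpha) and the mediator-to-outcome path (beta) to be non-zero. The null "alpha = 0 or beta = 0" is therefore a union of two nulls, and it is rejected only when both component tests reject. This gives the larger of the two p-values as a valid level-controlled p-value. The sketch covers only this combination step. SMUT's multi-SNP component statistics are not reproduced, and the function names are hypothetical.

```python
def iut_pvalue(p_exposure_mediator, p_mediator_outcome):
    """Intersection-union p-value for H0: alpha = 0 OR beta = 0.

    Hypothetical helper: the union null is rejected only if both component
    nulls are rejected, so the combined p-value is the maximum.
    """
    return max(p_exposure_mediator, p_mediator_outcome)

def iut_reject(p_exposure_mediator, p_mediator_outcome, level=0.05):
    """Declare mediation at `level` only when both paths are significant."""
    return iut_pvalue(p_exposure_mediator, p_mediator_outcome) <= level
```

For example, component p-values of 0.001 and 0.03 give a combined p-value of 0.03. With 0.001 and 0.2, mediation is not declared at the 5% level, however strong the first path is.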

Estimating and testing the microbial causal mediation effect with high-dimensional and compositional microbiome data

Bioinformatics

10.1093/bioinformatics/btz565

2019

Author(s):

Chan Wang

Jiyuan Hu

Martin J Blaser

Huilin Li

Keyword(s):

Regression Model

Association Studies

Statistical Significance

Mediation Effect

High Dimensional

Model Framework

Mediation Effects

Causal Mediation

Abstract Motivation: Recent microbiome association studies have revealed important associations between the microbiome and disease/health status. Such findings encourage scientists to dive deeper to uncover the causal role of the microbiome in the underlying biological mechanisms, and have led to the application of statistical models to quantify causal microbiome effects and to identify the specific microbial agents. However, there are no existing causal mediation methods specifically designed to handle high-dimensional and compositional microbiome data. Results: We propose a rigorous Sparse Microbial Causal Mediation Model (SparseMCMM) specifically designed for high-dimensional and compositional microbiome data in a typical three-factor (treatment, microbiome and outcome) causal study design. In particular, a linear log-contrast regression model and a Dirichlet regression model are proposed to estimate the causal direct effect of the treatment and the causal mediation effects of the microbiome at both the community and individual taxon levels. Regularization techniques are used to perform variable selection within the proposed model framework to identify signature causal microbes. Two hypothesis tests of the overall mediation effect are proposed, and their statistical significance is assessed by permutation procedures. Extensive simulated scenarios show that SparseMCMM has excellent performance in estimation and hypothesis testing. Finally, we showcase the utility of SparseMCMM in a study in which the murine microbiome was manipulated, where it provides a clear and sensible causal path among antibiotic treatment, microbiome composition and mouse weight. Availability and implementation: https://sites.google.com/site/huilinli09/software and https://github.com/chanw0/SparseMCMM. Supplementary information: Supplementary data are available at Bioinformatics online.
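Permutation procedures like the ones mentioned above assess significance by shuffling treatment labels to simulate the null of no effect, then recomputing the test statistic each time. The sketch below is a generic permutation p-value with the usual +1 correction. SparseMCMM's own mediation statistics are more involved. The absolute difference in means used here is a hypothetical stand-in, as are the function names and the toy data.

```python
import random

def permutation_pvalue(statistic, treatment, data, n_perm=999, seed=0):
    """Generic permutation p-value for statistic(treatment, data).

    Shuffling labels breaks any treatment effect; the +1 terms count the
    observed labelling as one permutation, so the p-value is never zero.
    """
    rng = random.Random(seed)
    observed = statistic(treatment, data)
    labels = list(treatment)
    exceed = 0
    for _ in range(n_perm):
        rng.shuffle(labels)
        if statistic(labels, data) >= observed:
            exceed += 1
    return (exceed + 1) / (n_perm + 1)

def abs_mean_diff(labels, values):
    """Hypothetical stand-in statistic: |mean(treated) - mean(control)|."""
    treated = [v for t, v in zip(labels, values) if t == 1]
    control = [v for t, v in zip(labels, values) if t == 0]
    return abs(sum(treated) / len(treated) - sum(control) / len(control))

# Toy data with a strong separation between the two arms.
treatment = [0] * 10 + [1] * 10
outcome = list(range(10)) + list(range(20, 30))
p = permutation_pvalue(abs_mean_diff, treatment, outcome, n_perm=999, seed=1)
```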

Robust logistic zero-sum regression for microbiome compositional data

Advances in Data Analysis and Classification

10.1007/s11634-021-00465-4

2021

Author(s):

G. S. Monti

P. Filzmoser

Keyword(s):

Compositional Data

R Package

High Dimensional

Simulation Studies

Contrast Model

Sparse Logistic Regression

Regression Estimators

The Stability

Zero Sum

Abstract We introduce the Robust Logistic Zero-Sum Regression (RobLZS) estimator, which can be used for two-class problems with high-dimensional compositional covariates. Because it employs the log-contrast model, the estimator can perform feature selection among the compositional parts. The proposed method attains robustness by minimizing a trimmed sum of deviances. The performance of the RobLZS estimator is compared with that of a non-robust counterpart and of other sparse logistic regression estimators in Monte Carlo simulation studies. Two microbiome data applications are considered to investigate the stability of the estimators in the presence of outliers. Robust Logistic Zero-Sum Regression is available as an R package that can be downloaded from https://github.com/giannamonti/RobZS.
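A trimmed sum of deviances keeps only the h best-fitting observations. A few grossly mislabelled or outlying samples therefore cannot dominate the fit. The sketch below evaluates that objective for fixed fitted probabilities. The RobLZS estimator minimizes it over zero-sum, penalized coefficients, and that optimization is omitted here. The function names, h and the data are hypothetical.

```python
import math

def per_observation_deviance(y, p):
    """Binomial deviance of each observation: -2 * log-likelihood (0 < p < 1)."""
    return [-2.0 * math.log(pi if yi == 1 else 1.0 - pi) for yi, pi in zip(y, p)]

def trimmed_deviance(y, p, h):
    """Sum of the h smallest deviances, discarding the worst-fitting cases."""
    if not 1 <= h <= len(y):
        raise ValueError("h must be between 1 and the number of observations")
    d = sorted(per_observation_deviance(y, p))
    return sum(d[:h])

# The last case is an outlier: y = 1 but the fitted probability is 0.01.
y = [1, 0, 1, 1]
p = [0.9, 0.2, 0.8, 0.01]
full = sum(per_observation_deviance(y, p))
trimmed = trimmed_deviance(y, p, h=3)
```

The outlier alone contributes about 9.2 to the full deviance, which is larger than all the other cases combined. With h = 3 it is simply dropped from the objective.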

ExperimentSubset: an R package to manage subsets of Bioconductor Experiment objects

Bioinformatics

10.1093/bioinformatics/btab179

2021

Author(s):

Irzam Sarfraz

Muhammad Asif

Joshua D Campbell

Keyword(s):

Single Cell

R Package

Poor Quality

Data Matrix

Data Provenance

RNA-Seq

Efficient Management

The Matrix

The Relationship

Abstract Motivation: R Experiment objects such as the SummarizedExperiment or SingleCellExperiment are data containers for storing one or more matrix-like assays along with associated row and column data. These objects have been used to facilitate the storage and analysis of high-throughput genomic data generated from technologies such as single-cell RNA sequencing. One common computational task in many genomics analysis workflows is subsetting the data matrix before applying downstream analytical methods. For example, one may need to subset the columns of the assay matrix to exclude poor-quality samples or subset the rows of the matrix to select the most variable features. Traditionally, a second object is created that contains the desired subset of the assay from the original object. However, this approach is inefficient, as it requires creating an additional object containing a copy of the original assay, and it leads to challenges with data provenance. Results: To overcome these challenges, we developed an R package called ExperimentSubset, which provides a data container with classes for efficient storage and streamlined retrieval of assays that have been subsetted by rows and/or columns. These classes inherently provide data provenance by maintaining the relationship between the subsetted and parent assays. We demonstrate the utility of this package on a single-cell RNA-seq dataset by storing and retrieving subsets at different stages of the analysis while maintaining a lower memory footprint. Overall, ExperimentSubset is a flexible container for the efficient management of subsets. Availability and implementation: The ExperimentSubset package is available at Bioconductor: https://bioconductor.org/packages/ExperimentSubset/ and GitHub: https://github.com/campbio/ExperimentSubset. Supplementary information: Supplementary data are available at Bioinformatics online.
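The core idea of storing a subset as row and column indices into its parent assay, rather than as a copied matrix, can be sketched in a few lines. This is a language-neutral illustration in Python, not ExperimentSubset's R/Bioconductor API. The class `SubsetView` and its fields are hypothetical.

```python
from dataclasses import dataclass
from typing import List

Matrix = List[List[float]]

@dataclass
class SubsetView:
    """Hypothetical subset record: indices into a parent assay, no data copy."""
    name: str
    parent_name: str   # provenance: which assay this subset was derived from
    parent: Matrix     # a reference to the parent assay, not a copy of it
    rows: List[int]
    cols: List[int]

    def materialize(self) -> Matrix:
        """Build the subsetted matrix on demand from the parent."""
        return [[self.parent[r][c] for c in self.cols] for r in self.rows]

counts = [[1.0, 2.0, 3.0],
          [4.0, 5.0, 6.0],
          [7.0, 8.0, 9.0]]
view = SubsetView("filtered", "counts", counts, rows=[0, 2], cols=[1, 2])
```

Only two short index lists are stored per subset, and the parent name records where the subset came from. The trade-off is that the parent must be kept alive, and any change to it is visible through the view.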

BloodGen3Module: Blood transcriptional module repertoire analysis and visualization using R

Bioinformatics

10.1093/bioinformatics/btab121

2021

Author(s):

Darawan Rinchai

Jessica Roelands

Mohammed Toufiq

Wouter Hendrickx

Matthew C Altman

...

Keyword(s):

Transcript Abundance

R Package

Illustrative Case

Bioinformatic Tools

Transcriptional Module

Wide Range

Downstream Analysis

Computing Module

Parallel Workflow

Abstract Motivation: We previously described the construction and characterization of generic and reusable blood transcriptional module repertoires. More recently we released a third iteration (the “BloodGen3” module repertoire) that comprises 382 functionally annotated gene sets (modules) and encompasses 14,168 transcripts. Custom bioinformatic tools are needed to support downstream analysis, visualization and interpretation relying on such fixed module repertoires. Results: We have developed, and describe here, an R package, BloodGen3Module. The functions of our package allow group comparison analyses to be performed at the module level and the results to be displayed as annotated fingerprint grid plots. A parallel workflow for computing module repertoire changes for individual samples rather than groups of samples is also available; these results are displayed as fingerprint heatmaps. An illustrative case demonstrates the steps involved in generating blood transcriptome repertoire fingerprints of septic patients. Taken together, this resource could facilitate the analysis and interpretation of changes in blood transcript abundance observed across a wide range of pathological and physiological states. Availability: The BloodGen3Module package and documentation are freely available from GitHub: https://github.com/Drinchai/BloodGen3Module. Supplementary information: Supplementary data are available at Bioinformatics online.

movAPA: modeling and visualization of dynamics of alternative polyadenylation across biological samples

Bioinformatics

10.1093/bioinformatics/btaa997

2020

Author(s):

Wenbin Ye

Tao Liu

Hongjuan Fu

Congting Ye

Guoli Ji

...

Keyword(s):

Biological Samples

Tissue Specificity

Single Cells

Alternative Polyadenylation

R Package

RNA-Seq

Mouse Sperm

High Scalability

A Site

Abstract Motivation: Alternative polyadenylation (APA) is widely recognized as a pervasive, dynamically modulated mechanism. Studies based on 3′ end sequencing and/or RNA-seq have profiled poly(A) sites in various species with diverse pipelines, yet no unified and easy-to-use toolkit is available for comprehensive APA analyses. Results: We developed an R package called movAPA for modeling and visualization of dynamics of alternative polyadenylation across biological samples. movAPA incorporates rich functions for preprocessing, annotation and statistical analyses of poly(A) sites, identification of poly(A) signals, profiling of APA dynamics and visualization. In particular, seven metrics are provided for measuring the tissue specificity or usage of APA sites across samples. Three methods are provided for identifying 3′ UTR shortening/lengthening events between conditions. APA site switching involving non-3′ UTR polyadenylation can also be explored. Using poly(A) site data from rice and mouse sperm cells, we demonstrated the high scalability and flexibility of movAPA in profiling APA dynamics across tissues and single cells. Availability and implementation: https://github.com/BMILAB/movAPA. Supplementary information: Supplementary data are available at Bioinformatics online.

ASICS: an R package for a whole analysis workflow of 1D 1H NMR spectra

Bioinformatics

10.1093/bioinformatics/btz248

2019

Vol 35 (21)

pp. 4356-4363

Cited By ~ 7

Author(s):

Gaëlle Lefort

Laurence Liaubet

Cécile Canlet

Patrick Tardivel

Marie-Christine Père

...

Keyword(s):

Metabolic Pathways

NMR Spectra

Complex Mixture

R Package

Statistical Analyses

Automatic Identification

Analysis Workflow

Expert Analysis

New Biomarkers

Abstract Motivation: In metabolomics, the detection of new biomarkers from Nuclear Magnetic Resonance (NMR) spectra is a promising approach. However, this analysis remains difficult due to the lack of a complete workflow that handles spectra pre-processing, automatic identification and quantification of metabolites, and statistical analyses in a reproducible way. Results: We present ASICS, an R package that contains a complete workflow to analyse spectra from NMR experiments. It contains an automatic approach to identify and quantify metabolites in a complex mixture spectrum and uses the results of the quantification in untargeted and targeted statistical analyses. ASICS was shown to improve the precision of quantification in comparison to existing methods on two independent datasets. In addition, ASICS successfully recovered most of the metabolites that a manual expert analysis based on bucketing had found important for explaining a two-level condition describing the samples. It also found new relevant metabolites involved in metabolic pathways related to risk factors associated with the condition. Availability and implementation: ASICS is distributed as an R package, available on Bioconductor. Supplementary information: Supplementary data are available at Bioinformatics online.

Benchmarking microbiome transformations favors experimental quantitative approaches to address compositionality and sampling depth biases

Nature Communications

10.1038/s41467-021-23821-6

2021

Vol 12 (1)

Author(s):

Verónica Lloréns-Rico

Sara Vieira-Silva

Pedro J. Gonçalves

Gwen Falony

Jeroen Raes

Keyword(s):

Microbial Communities

Quantitative Methods

Microbial Load

Metagenomic Sequencing

Sampling Depth

Microbiome Research

Microbiome Data

The Impact

Analytical Approaches

Quantitative Approaches

Abstract While metagenomic sequencing has become the tool of preference for studying host-associated microbial communities, downstream analyses and clinical interpretation of microbiome data remain challenging due to the sparsity and compositionality of sequence matrices. Here, we evaluate both computational and experimental approaches proposed to mitigate the impact of these outstanding issues. Generating fecal metagenomes drawn from simulated microbial communities, we benchmark the performance of thirteen commonly used analytical approaches in terms of diversity estimation, identification of taxon-taxon associations, and assessment of taxon-metadata correlations under the challenge of varying microbial ecosystem loads. We find that quantitative approaches, including experimental procedures to incorporate microbial load variation in downstream analyses, perform significantly better than computational strategies designed to mitigate data compositionality and sparsity, not only improving the identification of true positive associations but also reducing false positive detections. When analyzing simulated scenarios of low microbial load dysbiosis, as observed in inflammatory pathologies, quantitative methods correcting for sampling depth show higher precision than uncorrected scaling. Overall, our findings advocate for a wider adoption of experimental quantitative approaches in microbiome research, yet also suggest preferred transformations for specific cases where determining the microbial load of samples is not feasible.
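The compositionality problem, and the way an experimentally measured microbial load resolves it, shows up in a toy example. Two samples with the same relative abundances are indistinguishable from sequencing alone, even if one holds four times as many microbes. Scaling relative abundances by the measured load restores that difference. The sketch below shows only this core scaling step. The quantitative approaches benchmarked in the study (for example, those also correcting for sampling depth) are more elaborate. The function names and load values are hypothetical.

```python
def relative_abundance(counts):
    """Convert read counts to proportions (the compositional view)."""
    total = sum(counts)
    return [c / total for c in counts]

def quantitative_abundance(counts, microbial_load):
    """Hypothetical helper: scale proportions by a measured load (e.g. cells per gram)."""
    return [r * microbial_load for r in relative_abundance(counts)]

# Made-up samples: different sequencing depths, identical composition,
# but a four-fold difference in measured microbial load.
sample_a = [600, 400]
sample_b = [300, 200]
qa = quantitative_abundance(sample_a, 2e11)
qb = quantitative_abundance(sample_b, 5e10)
```

On the relative scale both samples are [0.6, 0.4]. On the quantitative scale, every taxon in the second sample is four times less abundant, a difference that relative data alone cannot reveal.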