MiRKAT: kernel machine regression-based global association tests for the microbiome

Bioinformatics ◽

10.1093/bioinformatics/btaa951 ◽

2020 ◽

Author(s):

Nehemiah Wilson ◽

Ni Zhao ◽

Xiang Zhan ◽

Hyunwook Koh ◽

Weijia Fu ◽

...

Keyword(s):

R Package ◽

Effect Sizes ◽

Supplementary Information ◽

Time To Event ◽

Kernel Machine ◽

Association Testing ◽

Higher Power ◽

Kernel Machine Regression ◽

Two Measures ◽

Rv Coefficient

Abstract Summary Distance-based tests of microbiome beta diversity are an integral part of many microbiome analyses. MiRKAT enables distance-based association testing with a wide variety of outcome types, including continuous, binary, censored time-to-event, multivariate, correlated and high-dimensional outcomes. Omnibus tests allow simultaneous consideration of multiple distance and dissimilarity measures, providing higher power across a range of simulation scenarios. Two measures of effect size, a modified R-squared coefficient and a kernel RV coefficient, are incorporated to allow comparison of effect sizes across multiple kernels. Availability and implementation MiRKAT is available on CRAN as an R package. Supplementary information Supplementary data are available at Bioinformatics online.
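As a rough illustration of how a distance matrix can feed a kernel-based test, the sketch below turns pairwise distances into a kernel matrix by Gower double centering, K = -1/2 · J D² J with J = I - 11ᵀ/n. This is a common construction for distance-based kernels and is assumed here for illustration. The abstract does not spell it out, the function name `distance_to_kernel` is hypothetical, and the eigenvalue correction usually applied to non-Euclidean distances is omitted.

```python
def distance_to_kernel(d):
    """Gower double centering: K = -1/2 * J * D^2 * J (illustrative sketch).

    `d` is a symmetric n x n list of lists of pairwise distances.
    No positive semi-definite correction is applied here.
    """
    n = len(d)
    # Element-wise -1/2 * d_ij^2
    a = [[-0.5 * v * v for v in row] for row in d]
    row_means = [sum(row) / n for row in a]
    col_means = [sum(col) / n for col in zip(*a)]
    grand_mean = sum(row_means) / n
    # Subtract row and column means, add back the grand mean
    return [
        [a[i][j] - row_means[i] - col_means[j] + grand_mean for j in range(n)]
        for i in range(n)
    ]


# Three samples at positions 0, 1 and 3 on a line, with absolute-difference distances
points = [0.0, 1.0, 3.0]
dist = [[abs(p - q) for q in points] for p in points]
kernel = distance_to_kernel(dist)
```

For Euclidean distances such as these, the result equals the inner-product matrix of the mean-centered coordinates, so every row of `kernel` sums to zero.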


iTOP: Inferring the Topology of Omics Data

10.1101/293993 ◽

2018 ◽

Author(s):

Nanne Aben ◽

Johan A. Westerhuis ◽

Yipeng Song ◽

Henk A.L. Kiers ◽

Magali Michaut ◽

...

Keyword(s):

Gene Expression ◽

Binary Data ◽

Drug Response ◽

Response Prediction ◽

R Package ◽

Supplementary Information ◽

Omics Data ◽

Reconstruction Algorithms ◽

Phenotypic Data ◽

Rv Coefficient

Abstract Motivation In biology, we are often faced with multiple datasets recorded on the same set of objects, such as multi-omics and phenotypic data of the same tumors. These datasets are typically not independent from each other. For example, methylation may influence gene expression, which may, in turn, influence drug response. Such relationships can strongly affect analyses performed on the data, as we have previously shown for the identification of biomarkers of drug response. Therefore, it is important to be able to chart the relationships between datasets. Results We present iTOP, a methodology to infer a topology of relationships between datasets. We base this methodology on the RV coefficient, a measure of matrix correlation, which can be used to determine how much information is shared between two datasets. We extended the RV coefficient for partial matrix correlations, which allows the use of graph reconstruction algorithms, such as the PC algorithm, to infer the topologies. In addition, since multi-omics data often contain binary data (e.g. mutations), we also extended the RV coefficient for binary data. Applying iTOP to pharmacogenomics data, we found that gene expression acts as a mediator between most other datasets and drug response: only proteomics clearly shares information with drug response that is not present in gene expression. Based on this result, we used TANDEM, a method for drug response prediction, to identify which variables predictive of drug response were distinct to either gene expression or proteomics. Availability An implementation of our methodology is available in the R package iTOP on CRAN. Additionally, an R Markdown document with code to reproduce all figures is provided as Supplementary Material. Supplementary information Supplementary data are available at Bioinformatics online.
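To make the idea of matrix correlation concrete, here is a minimal sketch of the standard RV coefficient between two datasets measured on the same samples: RV = tr(S_x S_y) / sqrt(tr(S_x²) tr(S_y²)), where S_x and S_y are the sample-by-sample inner-product matrices of the column-centered data. It does not implement the partial or binary extensions described in the abstract, and the helper names are hypothetical.

```python
from math import sqrt


def center_columns(m):
    """Subtract each column's mean (rows are samples, columns are variables)."""
    n = len(m)
    means = [sum(col) / n for col in zip(*m)]
    return [[v - mu for v, mu in zip(row, means)] for row in m]


def gram(m):
    """Sample-by-sample inner-product matrix M M^T."""
    return [[sum(a * b for a, b in zip(r1, r2)) for r2 in m] for r1 in m]


def trace_of_product(a, b):
    """tr(A B) for symmetric A and B, computed as the element-wise sum of products."""
    return sum(x * y for ra, rb in zip(a, b) for x, y in zip(ra, rb))


def rv_coefficient(x, y):
    """Standard RV coefficient between two datasets with the same rows (samples)."""
    sx = gram(center_columns(x))
    sy = gram(center_columns(y))
    return trace_of_product(sx, sy) / sqrt(
        trace_of_product(sx, sx) * trace_of_product(sy, sy)
    )


# Toy data: four samples, two variables in x; y is a rescaled copy of x
x = [[1.0, 2.0], [3.0, 1.0], [2.0, 5.0], [4.0, 4.0]]
y = [[2.0 * v for v in row] for row in x]
rv_same = rv_coefficient(x, y)
```

The coefficient lies between 0 and 1. It equals 1 when one dataset is a rescaled copy of the other, as in the toy data, and 0 when the two inner-product matrices share no structure.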


An empirical Bayesian ranking method, with applications to high throughput biology

Bioinformatics ◽

10.1093/bioinformatics/btz471 ◽

2019 ◽

Vol 36 (1) ◽

pp. 177-185

Author(s):

John Ferguson ◽

Joseph Chang

Keyword(s):

Empirical Bayes ◽

Final Analysis ◽

R Package ◽

Effect Sizes ◽

Supplementary Information ◽

P Value ◽

Ranking Algorithm ◽

Computationally Efficient ◽

P Values ◽

Bayesian Ranking

Abstract Motivation In bioinformatics, genome-wide experiments look for important biological differences between two groups at a large number of locations in the genome. Often, the final analysis focuses on a P-value-based ranking of locations which might then be investigated further in follow-up experiments. However, this strategy may result in small effect sizes, with low P-values, being ranked more favorably than larger more scientifically important effects. Bayesian ranking techniques may offer a solution to this problem provided a good prior distribution for the collective distribution of effect sizes is available. Results We develop an Empirical Bayes ranking algorithm, using the marginal distribution of the data over all locations to estimate an appropriate prior. In simulations and analysis using real datasets, we demonstrate favorable performance compared to ordering P-values and a number of other competing ranking methods. The algorithm is computationally efficient and can be used to rank the entirety of genomic locations or to rank a subset of locations, pre-selected via traditional FWER/FDR methods in a 2-stage analysis. Availability and implementation An R-package, EBrank, implementing the ranking algorithm is available on CRAN. Supplementary information Supplementary data are available at Bioinformatics online.


Relationship between genomic distance-based regression and kernel machine regression for multi-marker association testing

Genetic Epidemiology ◽

10.1002/gepi.20567 ◽

2011 ◽


Cited By ~ 1

Author(s):

Wei Pan

Keyword(s):

Genomic Distance ◽

Kernel Machine ◽

Association Testing ◽

Kernel Machine Regression


The Orchard Plot: Cultivating a Forest Plot for Use in Ecology, Evolution and Beyond

10.32942/osf.io/epqa7 ◽

2019 ◽

Author(s):

Shinichi Nakagawa ◽

Malgorzata Lagisz ◽

Rose E O'Dea ◽

Joanna Rutkowska ◽

Yefeng Yang ◽

...

Keyword(s):

Meta Analysis ◽

R Package ◽

Effect Sizes ◽

Forest Plot ◽

Point Estimates ◽

Aggregate Effect ◽

The Individual ◽

Meta Analyses ◽

Heterogeneous Effect ◽

Intuitive Interpretation

‘Classic’ forest plots show the effect sizes from individual studies and the aggregate effect from a meta-analysis. However, in ecology and evolution meta-analyses routinely contain over 100 effect sizes, making the classic forest plot of limited use. We surveyed 102 meta-analyses in ecology and evolution, finding that only 11% use the classic forest plot. Instead, most used a ‘forest-like plot’, showing point estimates (with 95% confidence intervals; CIs) from a series of subgroups or categories in a meta-regression. We propose a modification of the forest-like plot, which we name the ‘orchard plot’. Orchard plots, in addition to showing overall mean effects and CIs from meta-analyses/regressions, also include 95% prediction intervals (PIs), and the individual effect sizes scaled by their precision. The PI allows the user and reader to see the range in which an effect size from a future study may be expected to fall. The PI, therefore, provides an intuitive interpretation of any heterogeneity in the data. Supplementing the PI, the inclusion of underlying effect sizes also allows the user to see any influential or outlying effect sizes. We showcase the orchard plot with example datasets from ecology and evolution, using the R package, orchard, including several functions for visualizing meta-analytic data using forest-plot derivatives. We consider the orchard plot as a variant on the classic forest plot, cultivated to the needs of meta-analysts in ecology and evolution. Hopefully, the orchard plot will prove fruitful for visualizing large collections of heterogeneous effect sizes regardless of the field of study.
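The difference between a confidence interval and a prediction interval can be sketched with a small calculation. The example assumes a random-effects summary with a mean effect, its standard error, and a between-study variance (tau²), and uses a normal quantile for simplicity. The function name and the input values are hypothetical. Many meta-analysis tools use a t quantile instead, and this is not necessarily the computation performed by the orchard package.

```python
from math import sqrt
from statistics import NormalDist


def intervals(mean, se, tau2, level=0.95):
    """Return (confidence interval, prediction interval) for a pooled effect.

    Normal-approximation sketch: the CI reflects uncertainty in the mean only,
    while the PI also adds the between-study variance tau2.
    """
    z = NormalDist().inv_cdf(0.5 + level / 2)
    ci_half = z * se
    pi_half = z * sqrt(tau2 + se * se)
    return (mean - ci_half, mean + ci_half), (mean - pi_half, mean + pi_half)


# Hypothetical summary: mean effect 0.3, standard error 0.1, tau^2 = 0.04
ci, pi = intervals(0.3, 0.1, 0.04)
```

With any heterogeneity (tau² > 0) the PI is wider than the CI, which is why it conveys the spread expected for a future study rather than only the precision of the mean.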


ExperimentSubset: an R package to manage subsets of Bioconductor Experiment objects

Bioinformatics ◽

10.1093/bioinformatics/btab179 ◽

2021 ◽

Author(s):

Irzam Sarfraz ◽

Muhammad Asif ◽

Joshua D Campbell

Keyword(s):

Single Cell ◽

R Package ◽

Poor Quality ◽

Data Matrix ◽

Supplementary Information ◽

Data Provenance ◽

Rna Seq ◽

Efficient Management ◽

The Matrix ◽

The Relationship

Abstract Motivation R Experiment objects such as the SummarizedExperiment or SingleCellExperiment are data containers for storing one or more matrix-like assays along with associated row and column data. These objects have been used to facilitate the storage and analysis of high-throughput genomic data generated from technologies such as single-cell RNA sequencing. One common computational task in many genomics analysis workflows is to perform subsetting of the data matrix before applying down-stream analytical methods. For example, one may need to subset the columns of the assay matrix to exclude poor-quality samples or subset the rows of the matrix to select the most variable features. Traditionally, a second object is created that contains the desired subset of assay from the original object. However, this approach is inefficient as it requires the creation of an additional object containing a copy of the original assay and leads to challenges with data provenance. Results To overcome these challenges, we developed an R package called ExperimentSubset, which is a data container that implements classes for efficient storage and streamlined retrieval of assays that have been subsetted by rows and/or columns. These classes are able to inherently provide data provenance by maintaining the relationship between the subsetted and parent assays. We demonstrate the utility of this package on a single-cell RNA-seq dataset by storing and retrieving subsets at different stages of the analysis while maintaining a lower memory footprint. Overall, the ExperimentSubset is a flexible container for the efficient management of subsets. Availability and implementation ExperimentSubset package is available at Bioconductor: https://bioconductor.org/packages/ExperimentSubset/ and Github: https://github.com/campbio/ExperimentSubset. Supplementary information Supplementary data are available at Bioinformatics online.


Environmental contaminant body burdens and the relationship with blood pressure measures among Indigenous adolescents using Bayesian Kernel Machine Regression: Results from the Nituuchischaayihtitaau Aschii: Multi-Community Environment-and-Health Study in Eeyou Istchee, Quebec, Canada, 2005-2009

Environmental Advances ◽

10.1016/j.envadv.2021.100048 ◽

2021 ◽

pp. 100048

Author(s):

Aleksandra M. Zuk ◽

Eric N. Liberda ◽

Leonard J.S. Tsuji

Keyword(s):

Blood Pressure ◽

Environmental Contaminant ◽

Health Study ◽

Community Environment ◽

Kernel Machine ◽

Environment And Health ◽

Kernel Machine Regression ◽

Indigenous Adolescents ◽

The Relationship


BloodGen3Module: Blood transcriptional module repertoire analysis and visualization using R

Bioinformatics ◽

10.1093/bioinformatics/btab121 ◽

2021 ◽

Author(s):

Darawan Rinchai ◽

Jessica Roelands ◽

Mohammed Toufiq ◽

Wouter Hendrickx ◽

Matthew C Altman ◽

...

Keyword(s):

Transcript Abundance ◽

R Package ◽

Supplementary Information ◽

Illustrative Case ◽

Bioinformatic Tools ◽

Transcriptional Module ◽

Wide Range ◽

Downstream Analysis ◽

Computing Module ◽

Parallel Workflow

Abstract Motivation We previously described the construction and characterization of generic and reusable blood transcriptional module repertoires. More recently we released a third iteration (“BloodGen3” module repertoire) that comprises 382 functionally annotated gene sets (modules) and encompasses 14,168 transcripts. Custom bioinformatic tools are needed to support downstream analysis, visualization and interpretation relying on such fixed module repertoires. Results We have developed and describe here an R package, BloodGen3Module. The functions of our package permit group comparison analyses to be performed at the module-level, and to display the results as annotated fingerprint grid plots. A parallel workflow for computing module repertoire changes for individual samples rather than groups of samples is also available; these results are displayed as fingerprint heatmaps. An illustrative case is used to demonstrate the steps involved in generating blood transcriptome repertoire fingerprints of septic patients. Taken together, this resource could facilitate the analysis and interpretation of changes in blood transcript abundance observed across a wide range of pathological and physiological states. Availability The BloodGen3Module package and documentation are freely available from Github: https://github.com/Drinchai/BloodGen3Module Supplementary information Supplementary data are available at Bioinformatics online.


movAPA: modeling and visualization of dynamics of alternative polyadenylation across biological samples

Bioinformatics ◽

10.1093/bioinformatics/btaa997 ◽

2020 ◽

Author(s):

Wenbin Ye ◽

Tao Liu ◽

Hongjuan Fu ◽

Congting Ye ◽

Guoli Ji ◽

...

Keyword(s):

Biological Samples ◽

Tissue Specificity ◽

Single Cells ◽

Alternative Polyadenylation ◽

R Package ◽

Supplementary Information ◽

Rna Seq ◽

Mouse Sperm ◽

High Scalability ◽

A Site

Abstract Motivation Alternative polyadenylation (APA) has been widely recognized as a widespread mechanism modulated dynamically. Studies based on 3′ end sequencing and/or RNA-seq have profiled poly(A) sites in various species with diverse pipelines, yet no unified and easy-to-use toolkit is available for comprehensive APA analyses. Results We developed an R package called movAPA for modeling and visualization of dynamics of alternative polyadenylation across biological samples. movAPA incorporates rich functions for preprocessing, annotation and statistical analyses of poly(A) sites, identification of poly(A) signals, profiling of APA dynamics and visualization. Particularly, seven metrics are provided for measuring the tissue-specificity or usages of APA sites across samples. Three methods are used for identifying 3′ UTR shortening/lengthening events between conditions. APA site switching involving non-3′ UTR polyadenylation can also be explored. Using poly(A) site data from rice and mouse sperm cells, we demonstrated the high scalability and flexibility of movAPA in profiling APA dynamics across tissues and single cells. Availability and implementation https://github.com/BMILAB/movAPA. Supplementary information Supplementary data are available at Bioinformatics online.


ASICS: an R package for a whole analysis workflow of 1D 1H NMR spectra

Bioinformatics ◽

10.1093/bioinformatics/btz248 ◽

2019 ◽

Vol 35 (21) ◽

pp. 4356-4363 ◽

Cited By ~ 7

Author(s):

Gaëlle Lefort ◽

Laurence Liaubet ◽

Cécile Canlet ◽

Patrick Tardivel ◽

Marie-Christine Père ◽

...

Keyword(s):

Metabolic Pathways ◽

Nmr Spectra ◽

Complex Mixture ◽

R Package ◽

Statistical Analyses ◽

Supplementary Information ◽

Automatic Identification ◽

Analysis Workflow ◽

Expert Analysis ◽

New Biomarkers

Abstract Motivation In metabolomics, the detection of new biomarkers from Nuclear Magnetic Resonance (NMR) spectra is a promising approach. However, this analysis remains difficult due to the lack of a whole workflow that handles spectra pre-processing, automatic identification and quantification of metabolites and statistical analyses, in a reproducible way. Results We present ASICS, an R package that contains a complete workflow to analyse spectra from NMR experiments. It contains an automatic approach to identify and quantify metabolites in a complex mixture spectrum and uses the results of the quantification in untargeted and targeted statistical analyses. ASICS was shown to improve the precision of quantification in comparison to existing methods on two independent datasets. In addition, ASICS successfully recovered most metabolites that were found important to explain a two level condition describing the samples by a manual and expert analysis based on bucketing. It also found new relevant metabolites involved in metabolic pathways related to risk factors associated with the condition. Availability and implementation ASICS is distributed as an R package, available on Bioconductor. Supplementary information Supplementary data are available at Bioinformatics online.


Inferring perturbation profiles of cancer samples

Bioinformatics ◽

10.1093/bioinformatics/btab113 ◽

2021 ◽

Author(s):

Martin Pirkl ◽

Niko Beerenwinkel

Keyword(s):

Indirect Evidence ◽

R Package ◽

The Cancer Genome Atlas ◽

Supplementary Information ◽

Patient Specific ◽

Driver Genes ◽

Cancer Driver ◽

Molecular Alterations ◽

Incomplete Coverage ◽

Gene Perturbations

Abstract Motivation Cancer is one of the most prevalent diseases in the world. Tumors arise due to important genes changing their activity, e.g. when inhibited or over-expressed. But these gene perturbations are difficult to observe directly. Molecular profiles of tumors can provide indirect evidence of gene perturbations. However, inferring perturbation profiles from molecular alterations is challenging due to error-prone molecular measurements and incomplete coverage of all possible molecular causes of gene perturbations. Results We have developed a novel mathematical method to analyze cancer driver genes and their patient-specific perturbation profiles. We combine genetic aberrations with gene expression data in a causal network derived across patients to infer unobserved perturbations. We show that our method can predict perturbations in simulations, CRISPR perturbation screens and breast cancer samples from The Cancer Genome Atlas. Availability and implementation The method is available as the R-package nempi at https://github.com/cbg-ethz/nempi and http://bioconductor.org/packages/nempi. Supplementary information Supplementary data are available at Bioinformatics online.
