Differential Analysis of Microbial Community grown with different DO levels Using R Package

Bioinformatic tools for marker gene sequencing data analysis are continuously and rapidly evolving, so integrating the most recent techniques and tools is challenging. We present an R package for the analysis of 16S and ITS amplicon-based sequencing data. The workflow is built on several R functions and automates processing from fastq sequence files through diversity and differential analysis with statistical validation. The main purpose of this package is to automate bioinformatic analysis, ensure reproducibility between projects, and remain flexible enough to quickly integrate new bioinformatic tools or statistical methods. rANOMALY is an easy-to-install and customizable R package that characterizes microbial communities at the amplicon sequence variant (ASV) level. It integrates the assets of the latest bioinformatics methods, such as better sequence tracking, decontamination using control samples, multiple reference databases for taxonomic annotation, the main ecological analyses, for which we propose advanced statistical tests, and a differential analysis cross-validated by four different methods. The package produces ready-to-publish figures, and all of its outputs are designed to be integrated into R Markdown code to produce automated reports.


DIscBIO: A User-Friendly Pipeline for Biomarker Discovery in Single-Cell Transcriptomics

International Journal of Molecular Sciences ◽

10.3390/ijms22031399 ◽

2021 ◽

Vol 22 (3) ◽

pp. 1399

Author(s):

Salim Ghannoum ◽

Waldir Leoncio Netto ◽

Damiano Fantini ◽

Benjamin Ragan-Kelley ◽

Amirabbas Parizadeh ◽

...

Keyword(s):

Single Cell ◽

Biomarker Discovery ◽

Enrichment Analysis ◽

Myxoid Liposarcoma ◽

R Package ◽

Differential Analysis ◽

A Cell ◽

Reproducible Analysis ◽

Transcriptomic Level ◽

User Friendly

The growing attention toward the benefits of single-cell RNA sequencing (scRNA-seq) is leading to a myriad of computational packages for the analysis of different aspects of scRNA-seq data. For researchers without advanced programming skills, it is very challenging to combine several packages in order to perform the desired analysis in a simple and reproducible way. Here we present DIscBIO, an open-source, multi-algorithmic pipeline for easy, efficient and reproducible analysis of cellular sub-populations at the transcriptomic level. The pipeline integrates multiple scRNA-seq packages and allows biomarker discovery with decision trees and gene enrichment analysis in a network context using single-cell sequencing read counts through clustering and differential analysis. DIscBIO is freely available as an R package. It can be run either in command-line mode or through a user-friendly computational pipeline using Jupyter notebooks. We showcase all pipeline features using two scRNA-seq datasets. The first dataset consists of circulating tumor cells from patients with breast cancer. The second one is a cell cycle regulation dataset in myxoid liposarcoma. All analyses are available as notebooks that integrate R code with explanatory text, output data and images in a sequential narrative. R users can use the notebooks to understand the different steps of the pipeline, and the notebooks will guide them in exploring their own scRNA-seq data. We also provide a cloud version using Binder that allows the pipeline to be executed without downloading R, Jupyter or any of the packages used by the pipeline. The cloud version can serve as a tutorial for training purposes, especially for those who are not R users or have limited programming skills. However, in order to do meaningful scRNA-seq analyses, all users will need to understand the implemented methods and their possible options and limitations.


theseus - An R package for the analysis and visualization of microbial community data

10.1101/295675 ◽

2018 ◽

Cited By ~ 1

Author(s):

Jacob R. Price ◽

Stephen Woloszynek ◽

Gail Rosen ◽

Christopher M. Sales

Keyword(s):

Microbial Community ◽

Community Composition ◽

Microbial Community Composition ◽

R Package ◽

Composition Data ◽

Programming Framework ◽

Community Data ◽

R Programming

Abstract theseus is a collection of functions within the R programming framework [1] to assist microbiologists and molecular biologists in the interpretation of microbial community composition data.


microeco: An R package for data mining in microbial community ecology

FEMS Microbiology Ecology ◽

10.1093/femsec/fiaa255 ◽

2020 ◽

Author(s):

Chi Liu ◽

Yaoming Cui ◽

Xiangzhen Li ◽

Minjie Yao

Keyword(s):

Data Mining ◽

Microbial Community ◽

Community Ecology ◽

Amplicon Sequencing ◽

R Package ◽

Environmental Data ◽

Venn Diagram ◽

Diversity Analysis ◽

Sequencing Data ◽

Microbial Community Ecology

Abstract A large amount of sequencing data is produced in microbial community ecology studies using high-throughput sequencing techniques, especially amplicon-sequencing-based community data. After the initial bioinformatic analysis of amplicon sequencing data, performing the subsequent statistics and data mining based on the operational taxonomic unit and taxonomic assignment tables is still complicated and time-consuming. To address this problem, we present an integrated R package, ‘microeco’, as an analysis pipeline for treating microbial community and environmental data. This package was developed based on the R6 class system and combines a series of commonly used and advanced approaches in microbial community ecology research. The package includes classes for data preprocessing, taxa abundance plotting, Venn diagrams, alpha diversity analysis, beta diversity analysis, differential abundance testing and indicator taxon analysis, environmental data analysis, null model analysis, network analysis and functional analysis. Each class is designed to provide a set of approaches that are easily accessible to users. Compared with other R packages in the microbial ecology field, the microeco package is fast, flexible and modular, and provides powerful and convenient tools for researchers. The microeco package can be installed from CRAN (The Comprehensive R Archive Network) or GitHub (https://github.com/ChiLiubio/microeco).


Robust Identification of Temporal Biomarkers in Longitudinal Omics Studies

10.1101/2021.11.19.469350 ◽

2021 ◽

Author(s):

Ahmed A. Metwally ◽

Tom Zhang ◽

Si Wu ◽

Ryan Kellogg ◽

Wenyu Zhou ◽

...

Keyword(s):

Empirical Distribution ◽

Temporal Patterns ◽

R Package ◽

Smoothing Splines ◽

Omics Data ◽

Differential Analysis ◽

Time Intervals ◽

Robust Identification ◽

Data Points ◽

Subject Dropout

Longitudinal studies increasingly collect rich 'omics' data sampled frequently over time and across large cohorts to capture dynamic health fluctuations and disease transitions. However, the generation of longitudinal omics data has preceded the development of analysis tools that can efficiently extract insights from such data. In particular, there is a need for statistical frameworks that can identify not only which omics features are differentially regulated between groups but also over what time intervals. Additionally, longitudinal omics data may have inconsistencies, including nonuniform sampling intervals, missing data points, subject dropout, and differing numbers of samples per subject. In this work, we developed a statistical method that provides robust identification of time intervals of temporal omics biomarkers. The proposed method is based on a semi-parametric approach, in which we use smoothing splines to model longitudinal data and infer significant time intervals of omics features based on an empirical distribution constructed through a permutation procedure. We benchmarked the proposed method on five simulated datasets with diverse temporal patterns, and the method showed specificity greater than 0.99 and sensitivity greater than 0.72. Applying the proposed method to the Integrative Personal Omics Profiling (iPOP) cohort revealed temporal patterns of amino acids, lipids, and hormone metabolites that are differentially regulated in male versus female subjects following a respiratory infection. In addition, we applied the method to a longitudinal multi-omics dataset of pregnant women with and without preeclampsia, and it identified potential lipid markers whose temporal profiles differ significantly between the two groups. We provide an open-source R package, OmicsLonDA (Omics Longitudinal Differential Analysis): https://bioconductor.org/packages/OmicsLonDA to enable widespread use.
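
As a rough illustration of the permutation idea described in this abstract, the Python sketch below tests, interval by interval, whether two groups of longitudinal profiles differ. It is not the OmicsLonDA implementation. A Gaussian kernel smoother is swapped in for the smoothing splines, the per-interval statistic (area between the two smoothed group curves) is an assumption, and all function names, parameters, and the toy data are hypothetical.

```python
import math
import random


def kernel_smooth(points, grid, bandwidth):
    """Gaussian-kernel weighted mean of (time, value) points at each grid time."""
    fitted = []
    for g in grid:
        weights = [math.exp(-0.5 * ((t - g) / bandwidth) ** 2) for t, _ in points]
        fitted.append(sum(w * y for w, (_, y) in zip(weights, points)) / sum(weights))
    return fitted


def interval_areas(group_a, group_b, grid, bandwidth):
    """Signed area between the two smoothed group curves on each grid interval."""
    curve_a = kernel_smooth([p for subj in group_a for p in subj], grid, bandwidth)
    curve_b = kernel_smooth([p for subj in group_b for p in subj], grid, bandwidth)
    diff = [a - b for a, b in zip(curve_a, curve_b)]
    # Trapezoid rule on each [grid[i], grid[i + 1]] interval.
    return [0.5 * (diff[i] + diff[i + 1]) * (grid[i + 1] - grid[i])
            for i in range(len(grid) - 1)]


def permutation_pvalues(group_a, group_b, grid, bandwidth=0.5, n_perm=200, seed=0):
    """Empirical p-value per interval from shuffling subject-level group labels."""
    rng = random.Random(seed)
    observed = interval_areas(group_a, group_b, grid, bandwidth)
    subjects = list(group_a) + list(group_b)
    n_a = len(group_a)
    exceed = [0] * len(observed)
    for _ in range(n_perm):
        rng.shuffle(subjects)  # whole subjects move, so each time series stays intact
        permuted = interval_areas(subjects[:n_a], subjects[n_a:], grid, bandwidth)
        for i, (perm, obs) in enumerate(zip(permuted, observed)):
            if abs(perm) >= abs(obs):
                exceed[i] += 1
    # Add-one correction keeps every estimated p-value strictly positive.
    return [(count + 1) / (n_perm + 1) for count in exceed]


def make_toy_groups():
    """Hypothetical data: group B is elevated for 3 <= t <= 5; sampling is uneven."""
    times = list(range(9))
    group_a = [[(t, 1.0 + 0.05 * i) for t in times] for i in range(6)]
    group_b = [[(t, 1.0 + 0.05 * i + (3.0 if 3 <= t <= 5 else 0.0)) for t in times]
               for i in range(6)]
    group_a[5] = [p for p in group_a[5] if p[0] <= 5]  # subject dropout after t = 5
    group_b[0] = [p for p in group_b[0] if p[0] != 2]  # one missing data point
    return group_a, group_b


group_a, group_b = make_toy_groups()
grid = list(range(9))
for start, p in zip(grid, permutation_pvalues(group_a, group_b, grid)):
    print(f"interval [{start}, {start + 1}]: p = {p:.3f}")
```

Shuffling whole subjects rather than individual measurements keeps each subject's time series intact, which is one common way to respect within-subject correlation. The sketch applies no multiple-testing correction across intervals, which a real analysis would likely need.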


Multi-omics analysis reveals prognostic value of tumor mutation burden in hepatocellular carcinoma

Cancer Cell International ◽

10.1186/s12935-021-02049-w ◽

2021 ◽

Vol 21 (1) ◽

Author(s):

Qianhui Xu ◽

Hao Xu ◽

Rongshan Deng ◽

Zijie Wang ◽

Nanjun Li ◽

...

Keyword(s):

Hepatocellular Carcinoma ◽

Risk Score ◽

Enrichment Analysis ◽

R Package ◽

Immune Checkpoint Blockade ◽

Differential Analysis ◽

Bioinformatics Analyses ◽

Tumor Mutation Burden ◽

Mutation Burden

Abstract Background Hepatocellular carcinoma (HCC) is the sixth most common malignancy worldwide and is highly aggressive. It is well established that tumor mutation burden (TMB) acts as an indicator of immunotherapeutic responsiveness in various tumors. However, the role of TMB in the tumor immune microenvironment (TIME) is still obscure. Method The mutation data were analyzed with the “maftools” package. Weighted gene co-expression network analysis (WGCNA) was used to determine the candidate module and significant genes correlated with TMB value. Differential analysis between TMB subgroups of different levels was performed with the R package “limma”. Gene ontology (GO) enrichment analysis was implemented with the “clusterProfiler”, “enrichplot” and “ggplot2” packages. A risk score signature was then developed by systematic bioinformatics analyses. Kaplan-Meier (K-M) survival curves and receiver operating characteristic (ROC) plots were further analyzed for prognostic validity. To depict the comprehensive context of the TIME, the XCELL, TIMER, QUANTISEQ, MCPcounter, EPIC, CIBERSORT, and CIBERSORT-ABS algorithms were employed. Additionally, the potential role of the risk score in immune checkpoint blockade (ICB) immunotherapy was further explored. Quantitative real-time polymerase chain reaction was performed to detect expression of HTRA3. Results TMB value was positively correlated with older age, male gender and early T status. A total of 75 intersection genes between TMB-related genes and differentially expressed genes (DEGs) were screened and found to be enriched in extracellular matrix-relevant pathways. The risk score based on three hub genes significantly affected overall survival (OS) time, infiltration of immune cells, and ICB-related hub targets. The prognostic performance of the risk score was validated in the external testing group. A risk-clinical nomogram was constructed for clinical application. HTRA3 was demonstrated to be a prognostic factor in HCC in further exploration. Finally, mutation of TP53 was correlated with risk score and did not interfere with risk score-based prognostic prediction. Conclusion Collectively, a comprehensive analysis of TMB might provide novel insights into the mutation-driven mechanism of tumorigenesis and further contribute to tailored immunotherapy and prognosis prediction of HCC.


ALTRE: workflow for defining ALTered Regulatory Elements using chromatin accessibility data

10.1101/080564 ◽

2016 ◽

Author(s):

Elizabeth Baskin ◽

Rick Farouni ◽

Ewy A. Mathe

Keyword(s):

Cell Types ◽

R Package ◽

Regulatory Elements ◽

Chromatin Accessibility ◽

Differential Analysis ◽

Genome Wide ◽

Wide Range ◽

R Shiny ◽

Cell Type Specific ◽

Different Cell Types

Abstract Summary Regulatory elements regulate gene transcription, and their location and accessibility are cell-type specific, particularly for enhancers. Mapping and comparing chromatin accessibility between different cell types may identify mechanisms involved in cellular development and disease progression. To streamline and simplify differential analysis of regulatory elements genome-wide using chromatin accessibility data, such as DNase-seq and ATAC-seq, we developed ALTRE (ALTered Regulatory Elements), an R package and associated R Shiny web app. ALTRE makes such analysis accessible to a wide range of users, from novice to practiced computational biologists. Availability https://github.com/Mathelab/[email protected]


MODA: MOdule Differential Analysis for weighted gene co-expression network

10.1101/053496 ◽

2016 ◽

Cited By ~ 4

Author(s):

Dong Li ◽

James B. Brown ◽

Luisa Orsini ◽

Zhisong Pan ◽

Guyu Hu ◽

...

Keyword(s):

Gene Expression ◽

Expression Patterns ◽

Synthetic Data ◽

R Package ◽

Gene Expression Patterns ◽

Specific Gene ◽

Biological Processes ◽

Expression Data ◽

Differential Analysis ◽

Transcriptomic Data

Summary Gene co-expression network differential analysis is designed to help biologists understand gene expression patterns under different conditions. We have implemented an R package called MODA (Module Differential Analysis) for gene co-expression network differential analysis. Based on transcriptomic data, MODA can be used to estimate and construct condition-specific gene co-expression networks, and identify differentially expressed subnetworks as conserved or condition specific modules which are potentially associated with relevant biological processes. The usefulness of the method is also demonstrated by synthetic data as well as Daphnia magna gene expression data under different environmental stresses.


DIscBIO: a user-friendly pipeline for biomarker discovery in single-cell transcriptomics

10.1101/700989 ◽

2019 ◽

Author(s):

Salim Ghannoum ◽

Waldir Leoncio Netto ◽

Damiano Fantini ◽

Benjamin Ragan-Kelley ◽

Amirabbas Parizadeh ◽

...

Keyword(s):

Single Cell ◽

Biomarker Discovery ◽

Enrichment Analysis ◽

Myxoid Liposarcoma ◽

R Package ◽

Differential Analysis ◽

A Cell ◽

Reproducible Analysis ◽

User Friendly ◽

Cycle Regulation

Abstract The growing attention toward the benefits of single-cell RNA sequencing (scRNA-seq) is leading to a myriad of computational packages for the analysis of different aspects of scRNA-seq data. For researchers without advanced programming skills, it is very challenging to combine several packages in order to perform the desired analysis in a simple and reproducible way. Here we present DIscBIO, an open-source, multi-algorithmic pipeline for easy, efficient and reproducible analysis of cellular sub-populations at the transcriptomic level. The pipeline integrates multiple scRNA-seq packages and allows biomarker discovery with decision trees and gene enrichment analysis in a network context using single-cell sequencing read counts through clustering and differential analysis. DIscBIO is freely available as an R package. It can be run either in command-line mode or through a computational pipeline using Jupyter notebooks. We also provide a user-friendly cloud version of the notebook for researchers with very limited programming skills. We showcase all pipeline features using two scRNA-seq datasets. The first dataset consists of circulating tumor cells from patients with breast cancer. The second one is a cell cycle regulation dataset in myxoid liposarcoma. All analyses are available as notebooks that integrate R code with explanatory text, output data and images in a sequential narrative. These notebooks can be used as tutorials for training purposes and will guide researchers in exploring their scRNA-seq data.


DECO: decompose heterogeneous population cohorts for patient stratification and discovery of sample biomarkers using omic data profiling

Bioinformatics ◽

10.1093/bioinformatics/btz148 ◽

2019 ◽

Vol 35 (19) ◽

pp. 3651-3662 ◽

Cited By ~ 1

Author(s):

F J Campos-Laborie ◽

A Risueño ◽

M Ortiz-Estévez ◽

B Rosón-Burgo ◽

C Droste ◽

...

Keyword(s):

Correspondence Analysis ◽

Large Scale ◽

Simulated Data ◽

R Package ◽

Heterogeneous Data ◽

Supplementary Information ◽

Patient Stratification ◽

Differential Analysis ◽

Data Profiling ◽

Omic Data

Abstract Motivation Patient and sample diversity is one of the main challenges when dealing with clinical cohorts in biomedical genomics studies. During the last decade, several methods have been developed to identify biomarkers assigned to specific individuals or subtypes of samples. However, current methods still fail to discover markers in complex scenarios where heterogeneity or hidden phenotypical factors are present. Here, we propose a method to analyze and understand heterogeneous data while avoiding classical normalization approaches that reduce or remove variation. Results DEcomposing heterogeneous Cohorts using Omic data profiling (DECO) is a method to find significant associations among biological features (biomarkers) and samples (individuals) by analyzing large-scale omic data. The method identifies and categorizes biomarkers of specific phenotypic conditions based on a recurrent differential analysis integrated with a non-symmetrical correspondence analysis. DECO integrates both omic data dispersion and the predictor–response relationship from non-symmetrical correspondence analysis in a single statistic (called the h-statistic), allowing the identification of closely related sample categories within complex cohorts. The performance is demonstrated using simulated data and five experimental transcriptomic datasets, and by comparison with seven other methods. We show that DECO greatly enhances the discovery and subtle identification of biomarkers, making it especially suited for deep and accurate patient stratification. Availability and implementation DECO is freely available as an R package (including a practical vignette) at the Bioconductor repository (http://bioconductor.org/packages/deco/). Supplementary information Supplementary data are available at Bioinformatics online.
