Differential Analysis of Microbial Community grown with different DO levels Using R Package

Author(s):  
Snehal V. Bhange ◽  
S. S. Dongre ◽  
Hitesh Tikariha ◽  
H. J. Purohit
F1000Research ◽  
2021 ◽  
Vol 10 ◽  
pp. 7
Author(s):  
Sebastien Theil ◽  
Etienne Rifa

Bioinformatic tools for marker gene sequencing data analysis are evolving continuously and rapidly, so integrating the most recent techniques and tools is challenging. We present an R package for the analysis of 16S and ITS amplicon-based sequencing data. The workflow is built on several R functions and performs automated processing from fastq sequence files to diversity and differential analysis with statistical validation. The main purpose of this package is to automate bioinformatic analysis, ensure reproducibility between projects, and be flexible enough to quickly integrate new bioinformatic tools or statistical methods. rANOMALY is an easy-to-install and customizable R package that characterizes microbial communities at the amplicon sequence variant (ASV) level. It integrates the assets of the latest bioinformatics methods, such as better sequence tracking, decontamination from control samples, use of multiple reference databases for taxonomic annotation, all main ecological analyses, for which we propose advanced statistical tests, and a differential analysis cross-validated by four different methods. Our package produces ready-to-publish figures, and all of its outputs are designed to be integrated into R Markdown code to produce automated reports.


2021 ◽  
Vol 22 (3) ◽  
pp. 1399
Author(s):  
Salim Ghannoum ◽  
Waldir Leoncio Netto ◽  
Damiano Fantini ◽  
Benjamin Ragan-Kelley ◽  
Amirabbas Parizadeh ◽  
...  

The growing attention toward the benefits of single-cell RNA sequencing (scRNA-seq) is leading to a myriad of computational packages for the analysis of different aspects of scRNA-seq data. For researchers without advanced programming skills, it is very challenging to combine several packages in order to perform the desired analysis in a simple and reproducible way. Here we present DIscBIO, an open-source, multi-algorithmic pipeline for easy, efficient and reproducible analysis of cellular sub-populations at the transcriptomic level. The pipeline integrates multiple scRNA-seq packages and allows biomarker discovery with decision trees and gene enrichment analysis in a network context using single-cell sequencing read counts through clustering and differential analysis. DIscBIO is freely available as an R package. It can be run either in command-line mode or through a user-friendly computational pipeline using Jupyter notebooks. We showcase all pipeline features using two scRNA-seq datasets. The first dataset consists of circulating tumor cells from patients with breast cancer. The second one is a cell cycle regulation dataset in myxoid liposarcoma. All analyses are available as notebooks that integrate R code with explanatory text, output data and images in a sequential narrative. R users can use the notebooks to understand the different steps of the pipeline, and the notebooks will guide them in exploring their own scRNA-seq data. We also provide a cloud version using Binder that allows the pipeline to be executed without downloading R, Jupyter or any of the packages used by the pipeline. The cloud version can serve as a tutorial for training purposes, especially for those who are not R users or have limited programming skills. However, in order to do meaningful scRNA-seq analyses, all users will need to understand the implemented methods and their possible options and limitations.


2018 ◽  
Author(s):  
Jacob R. Price ◽  
Stephen Woloszynek ◽  
Gail Rosen ◽  
Christopher M. Sales

Abstract theseus is a collection of functions within the R programming framework [1] to assist microbiologists and molecular biologists in the interpretation of microbial community composition data.


Author(s):  
Chi Liu ◽  
Yaoming Cui ◽  
Xiangzhen Li ◽  
Minjie Yao

Abstract A large amount of sequencing data is produced in microbial community ecology studies using high-throughput sequencing techniques, especially amplicon-sequencing-based community data. After the initial bioinformatic analysis of amplicon sequencing data, performing the subsequent statistics and data mining based on the operational taxonomic unit and taxonomic assignment tables is still complicated and time-consuming. To address this problem, we present an integrated R package, ‘microeco’, as an analysis pipeline for treating microbial community and environmental data. This package was developed based on the R6 class system and combines a series of commonly used and advanced approaches in microbial community ecology research. The package includes classes for data preprocessing, taxa abundance plotting, Venn diagrams, alpha diversity analysis, beta diversity analysis, differential abundance testing and indicator taxon analysis, environmental data analysis, null model analysis, network analysis and functional analysis. Each class is designed to provide a set of approaches that are easily accessible to users. Compared with other R packages in the microbial ecology field, the microeco package is fast, flexible and modular to use, and provides powerful and convenient tools for researchers. The microeco package can be installed from CRAN (The Comprehensive R Archive Network) or GitHub (https://github.com/ChiLiubio/microeco).


2021 ◽  
Author(s):  
Ahmed A. Metwally ◽  
Tom Zhang ◽  
Si Wu ◽  
Ryan Kellogg ◽  
Wenyu Zhou ◽  
...  

Longitudinal studies increasingly collect rich 'omics' data sampled frequently over time and across large cohorts to capture dynamic health fluctuations and disease transitions. However, the generation of longitudinal omics data has outpaced the development of analysis tools that can efficiently extract insights from such data. In particular, there is a need for statistical frameworks that can identify not only which omics features are differentially regulated between groups but also over what time intervals. Additionally, longitudinal omics data may have inconsistencies, including nonuniform sampling intervals, missing data points, subject dropout, and differing numbers of samples per subject. In this work, we developed a statistical method that provides robust identification of time intervals of temporal omics biomarkers. The proposed method is based on a semi-parametric approach, in which we use smoothing splines to model longitudinal data and infer significant time intervals of omics features based on an empirical distribution constructed through a permutation procedure. We benchmarked the proposed method on five simulated datasets with diverse temporal patterns, and the method showed specificity greater than 0.99 and sensitivity greater than 0.72. Applying the proposed method to the Integrative Personal Omics Profiling (iPOP) cohort revealed temporal patterns of amino acids, lipids, and hormone metabolites that are differentially regulated in male versus female subjects following a respiratory infection. In addition, we applied the method to a longitudinal multi-omics dataset of pregnant women with and without preeclampsia, and it identified potential lipid markers whose temporal profiles differ significantly between the two groups. We provide an open-source R package, OmicsLonDA (Omics Longitudinal Differential Analysis): https://bioconductor.org/packages/OmicsLonDA to enable widespread use.
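
The idea of testing time intervals against a permutation-based empirical distribution can be illustrated with a minimal Python sketch. It is not the OmicsLonDA implementation or its API: a centered moving average is swapped in for the smoothing splines, every subject is assumed to be sampled at the same evenly spaced time points with no missing values, and all function names and the toy data are hypothetical.

```python
import random
from statistics import mean


def moving_average(values, window=3):
    """Centered moving average; a simple stand-in for a smoothing spline."""
    half = window // 2
    smoothed = []
    for i in range(len(values)):
        lo = max(0, i - half)
        hi = min(len(values), i + half + 1)
        smoothed.append(mean(values[lo:hi]))
    return smoothed


def group_curve(subjects):
    """Average the subjects' series point by point, then smooth the result."""
    pointwise = [mean(column) for column in zip(*subjects)]
    return moving_average(pointwise)


def interval_gaps(group_a, group_b):
    """Absolute trapezoid area between the two group curves on each unit interval."""
    diff = [a - b for a, b in zip(group_curve(group_a), group_curve(group_b))]
    return [abs((diff[i] + diff[i + 1]) / 2) for i in range(len(diff) - 1)]


def permutation_pvalues(group_a, group_b, n_perm=500, seed=0):
    """Empirical p-value per interval, obtained by shuffling subject labels."""
    rng = random.Random(seed)
    observed = interval_gaps(group_a, group_b)
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    exceed = [0] * len(observed)
    for _ in range(n_perm):
        shuffled = pooled[:]
        rng.shuffle(shuffled)
        permuted = interval_gaps(shuffled[:n_a], shuffled[n_a:])
        for i, (perm_gap, obs_gap) in enumerate(zip(permuted, observed)):
            if perm_gap >= obs_gap:
                exceed[i] += 1
    # Add-one correction keeps the estimate strictly above zero.
    return [(count + 1) / (n_perm + 1) for count in exceed]


def toy_data():
    """Hypothetical data: 5 subjects per group, 8 time points.

    Group B is shifted upward by 2.0 at the last three time points only.
    """
    offsets = [0.0, 0.1, 0.2, 0.1, 0.0]
    group_a = [[b] * 8 for b in offsets]
    group_b = [[b] * 5 + [b + 2.0] * 3 for b in offsets]
    return group_a, group_b


if __name__ == "__main__":
    a, b = toy_data()
    for interval, p in enumerate(permutation_pvalues(a, b)):
        print(f"interval {interval}-{interval + 1}: p = {p:.3f}")
```

In this toy setting the early intervals, where the groups coincide, receive large p-values, while the late intervals, where group B is shifted, receive small ones. Handling irregular sampling, missing points and subject dropout, which the abstract lists as motivations, would require a real spline fit per group and is outside the scope of this sketch.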


2021 ◽  
Vol 21 (1) ◽  
Author(s):  
Qianhui Xu ◽  
Hao Xu ◽  
Rongshan Deng ◽  
Zijie Wang ◽  
Nanjun Li ◽  
...  

Abstract Background Hepatocellular carcinoma (HCC) is the sixth most common malignancy worldwide and is highly aggressive. It is well established that tumor mutation burden (TMB) acts as an indicator of immunotherapeutic responsiveness in various tumors. However, the role of TMB in the tumor immune microenvironment (TIME) is still obscure. Method The mutation data were analyzed with the “maftools” package. Weighted gene co-expression network analysis (WGCNA) was used to determine the candidate module and significant genes correlated with TMB value. Differential analysis was performed between subgroups with different TMB levels using the R package “limma”. Gene ontology (GO) enrichment analysis was carried out with the “clusterProfiler”, “enrichplot” and “ggplot2” packages. A risk score signature was then developed through systematic bioinformatics analyses. Kaplan-Meier (K-M) survival curves and receiver operating characteristic (ROC) plots were further analyzed to assess prognostic validity. To depict the comprehensive context of the TIME, the XCELL, TIMER, QUANTISEQ, MCPcounter, EPIC, CIBERSORT, and CIBERSORT-ABS algorithms were employed. Additionally, the potential role of the risk score in immune checkpoint blockade (ICB) immunotherapy was further explored. Quantitative real-time polymerase chain reaction was performed to detect expression of HTRA3. Results TMB value was positively correlated with older age, male gender and early T status. A total of 75 genes at the intersection of TMB-related genes and differentially expressed genes (DEGs) were screened and found to be enriched in extracellular matrix-relevant pathways. A risk score based on three hub genes was significantly associated with overall survival (OS) time, infiltration of immune cells, and ICB-related hub targets. The prognostic performance of the risk score was validated in the external testing group. A risk-clinical nomogram was constructed for clinical application. HTRA3 was demonstrated to be a prognostic factor in HCC in further exploration. Finally, mutation of TP53 was correlated with the risk score and did not interfere with risk score-based prognostic prediction. Conclusion Collectively, a comprehensive analysis of TMB might provide novel insights into the mutation-driven mechanisms of tumorigenesis and further contribute to tailored immunotherapy and prognosis prediction in HCC.


2016 ◽  
Author(s):  
Elizabeth Baskin ◽  
Rick Farouni ◽  
Ewy A. Mathe

Abstract Summary Regulatory elements regulate gene transcription, and their location and accessibility are cell-type specific, particularly for enhancers. Mapping and comparing chromatin accessibility between different cell types may identify mechanisms involved in cellular development and disease progression. To streamline and simplify genome-wide differential analysis of regulatory elements using chromatin accessibility data, such as DNase-seq and ATAC-seq, we developed ALTRE (ALTered Regulatory Elements), an R package and associated R Shiny web app. ALTRE makes such analysis accessible to a wide range of users, from novice to practiced computational biologists. Availability https://github.com/Mathelab/[email protected]


2016 ◽  
Author(s):  
Dong Li ◽  
James B. Brown ◽  
Luisa Orsini ◽  
Zhisong Pan ◽  
Guyu Hu ◽  
...  

Summary Gene co-expression network differential analysis is designed to help biologists understand gene expression patterns under different conditions. We have implemented an R package called MODA (Module Differential Analysis) for gene co-expression network differential analysis. Based on transcriptomic data, MODA can be used to estimate and construct condition-specific gene co-expression networks, and identify differentially expressed subnetworks as conserved or condition-specific modules which are potentially associated with relevant biological processes. The usefulness of the method is also demonstrated by synthetic data as well as Daphnia magna gene expression data under different environmental stresses.
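
As a rough illustration of comparing condition-specific co-expression networks, the Python sketch below builds one network per condition by thresholding absolute Pearson correlations and then labels each edge as conserved or condition-specific. This is a deliberately simplified edge-level comparison, not MODA's module detection procedure or its API; the threshold of 0.8, the function names, and the three-gene toy data are assumptions made for the example.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant profile carries no co-expression signal
    return cov / sqrt(var_x * var_y)


def coexpression_edges(expr, threshold=0.8):
    """Gene pairs whose absolute correlation reaches the threshold.

    `expr` maps a gene name to its expression values across samples.
    """
    edges = set()
    for gene_1, gene_2 in combinations(sorted(expr), 2):
        if abs(pearson(expr[gene_1], expr[gene_2])) >= threshold:
            edges.add((gene_1, gene_2))
    return edges


def compare_networks(expr_a, expr_b, threshold=0.8):
    """Split edges into conserved and condition-specific sets."""
    edges_a = coexpression_edges(expr_a, threshold)
    edges_b = coexpression_edges(expr_b, threshold)
    return {
        "conserved": edges_a & edges_b,
        "only_a": edges_a - edges_b,
        "only_b": edges_b - edges_a,
    }


# Hypothetical toy data: three genes measured in five samples per condition.
control = {
    "g1": [1, 2, 3, 4, 5],
    "g2": [2, 4, 6, 8, 11],
    "g3": [3, 1, 4, 1, 5],
}
stress = {
    "g1": [1, 2, 3, 4, 5],
    "g2": [2, 4, 5, 8, 10],
    "g3": [1, 2, 3, 5, 4],
}

if __name__ == "__main__":
    for label, edges in compare_networks(control, stress).items():
        print(label, sorted(edges))
```

In the toy data the g1-g2 edge is present under both conditions, while g3 becomes co-expressed with g1 and g2 only under the second condition. A module-level analysis would additionally cluster each network into modules before comparing them, which this sketch leaves out.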


2019 ◽  
Author(s):  
Salim Ghannoum ◽  
Waldir Leoncio Netto ◽  
Damiano Fantini ◽  
Benjamin Ragan-Kelley ◽  
Amirabbas Parizadeh ◽  
...  

Abstract The growing attention toward the benefits of single-cell RNA sequencing (scRNA-seq) is leading to a myriad of computational packages for the analysis of different aspects of scRNA-seq data. For researchers without advanced programming skills, it is very challenging to combine several packages in order to perform the desired analysis in a simple and reproducible way. Here we present DIscBIO, an open-source, multi-algorithmic pipeline for easy, efficient and reproducible analysis of cellular sub-populations at the transcriptomic level. The pipeline integrates multiple scRNA-seq packages and allows biomarker discovery with decision trees and gene enrichment analysis in a network context using single-cell sequencing read counts through clustering and differential analysis. DIscBIO is freely available as an R package. It can be run either in command-line mode or through a computational pipeline using Jupyter notebooks. We also provide a user-friendly cloud version of the notebook for researchers with very limited programming skills. We showcase all pipeline features using two scRNA-seq datasets. The first dataset consists of circulating tumor cells from patients with breast cancer. The second one is a cell cycle regulation dataset in myxoid liposarcoma. All analyses are available as notebooks that integrate R code with explanatory text, output data and images in a sequential narrative. These notebooks can be used as tutorials for training purposes and will guide researchers in exploring their scRNA-seq data.


2019 ◽  
Vol 35 (19) ◽  
pp. 3651-3662 ◽  
Author(s):  
F J Campos-Laborie ◽  
A Risueño ◽  
M Ortiz-Estévez ◽  
B Rosón-Burgo ◽  
C Droste ◽  
...  

Abstract Motivation Patient and sample diversity is one of the main challenges when dealing with clinical cohorts in biomedical genomics studies. During the last decade, several methods have been developed to identify biomarkers assigned to specific individuals or subtypes of samples. However, current methods still fail to discover markers in complex scenarios where heterogeneity or hidden phenotypical factors are present. Here, we propose a method to analyze and understand heterogeneous data that avoids classical normalization approaches that reduce or remove variation. Results DEcomposing heterogeneous Cohorts using Omic data profiling (DECO) is a method to find significant associations between biological features (biomarkers) and samples (individuals) by analyzing large-scale omic data. The method identifies and categorizes biomarkers of specific phenotypic conditions based on a recurrent differential analysis integrated with a non-symmetrical correspondence analysis. DECO integrates both omic data dispersion and the predictor–response relationship from non-symmetrical correspondence analysis into a single statistic (called the h-statistic), allowing the identification of closely related sample categories within complex cohorts. The performance is demonstrated using simulated data and five experimental transcriptomic datasets, and by comparison with seven other methods. We show that DECO greatly enhances the discovery and subtle identification of biomarkers, making it especially suited for deep and accurate patient stratification. Availability and implementation DECO is freely available as an R package (including a practical vignette) at the Bioconductor repository (http://bioconductor.org/packages/deco/). Supplementary information Supplementary data are available at Bioinformatics online.

