EasyMicroPlot: An Efficient and Convenient R Package in Microbiome Downstream Analysis and Visualization for Clinical Study

2022 ◽  
Vol 12 ◽  
Author(s):  
Bingdong Liu ◽  
Liujing Huang ◽  
Zhihong Liu ◽  
Xiaohan Pan ◽  
Zongbing Cui ◽  
...  

Advances in next-generation sequencing (NGS) have revolutionized microbial studies in many fields, especially clinical investigation. As the "second human genome", the microbiota offers a new approach and perspective for understanding the biological and pathological basis of various diseases. However, massive amounts of sequencing data remain a huge challenge for researchers, especially those unfamiliar with microbial data analysis. Mathematical algorithms and approaches borrowed from other scientific fields have produced a bewildering array of computational tools, many of which demand considerable scripting experience. Moreover, large cohort studies with extensive subject metadata, including age, body mass index (BMI), gender, and medical results, further complicate the situation. It is therefore necessary to develop efficient and convenient software for clinical microbiome data analysis. The EasyMicroPlot (EMP) package aims to provide an easy-to-use microbial analysis tool on the R platform that accomplishes the core tasks of metagenomic downstream analysis and is specifically designed to incorporate the analysis and visualization methods popular in clinical microbial studies. To illustrate how EMP works, 694 biosamples from the Guangdong Gut Microbiome Project (GGMP) were selected and analyzed with the EMP package. Our analysis demonstrated the influence of dietary style on the gut microbiota and showed the EMP package's capability and convenience for addressing problems in this field.
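The abstract does not list EMP's individual functions. As an illustration of one typical downstream step in clinical microbiome studies, assuming beta-diversity comparison is among the "core tasks", here is a minimal Python sketch of the Bray-Curtis dissimilarity between samples. It is not EMP code (EMP is an R package), and the sample values are hypothetical.

```python
from typing import List, Sequence


def bray_curtis(a: Sequence[float], b: Sequence[float]) -> float:
    """Bray-Curtis dissimilarity between two taxon-abundance profiles.

    BC = sum(|a_i - b_i|) / sum(a_i + b_i): 0 for identical profiles,
    1 for profiles that share no taxa.
    """
    if len(a) != len(b):
        raise ValueError("profiles must cover the same taxa")
    total = sum(a) + sum(b)
    if total == 0:
        return 0.0  # two empty samples are treated as identical
    return sum(abs(x - y) for x, y in zip(a, b)) / total


def dissimilarity_matrix(samples: Sequence[Sequence[float]]) -> List[List[float]]:
    """Pairwise Bray-Curtis matrix, e.g. as input to PCoA or PERMANOVA."""
    return [[bray_curtis(s, t) for t in samples] for s in samples]


# Three hypothetical samples over the same three taxa.
samples = [[10, 0, 5], [5, 5, 5], [0, 8, 0]]
matrix = dissimilarity_matrix(samples)
```

The resulting matrix is symmetric with zeros on the diagonal, which is the form ordination and group-comparison methods typically expect.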

2020 ◽  
Author(s):  
Maxim Ivanov ◽  
Albin Sandelin ◽  
Sebastian Marquardt

Abstract Background: The quality of gene annotation determines the interpretation of results obtained in transcriptomic studies. The growing amount of genome sequence information calls for experimental and computational pipelines for de novo transcriptome annotation. Ideally, gene and transcript models should be called from a limited set of key experimental data. Results: We developed TranscriptomeReconstructoR, an R package that implements a pipeline for automated transcriptome annotation. It relies on integrating features from independent and complementary datasets: (i) full-length RNA-seq for detection of splicing patterns and (ii) high-throughput 5′ and 3′ tag sequencing data for accurate definition of gene borders. The pipeline can also take a nascent RNA-seq dataset to supplement the called gene model with transient transcripts. As a proof of principle, we reconstructed de novo the transcriptional landscape of wild-type Arabidopsis thaliana seedlings. A comparison with existing transcriptome annotations revealed that our gene model is more accurate and comprehensive than the two most commonly used community gene models, TAIR10 and Araport11. In particular, we identify thousands of transient transcripts missing from the existing annotations. Our new annotation promises to improve the quality of A. thaliana genome research. Conclusions: Our proof-of-concept data suggest a cost-efficient strategy for rapid and accurate annotation of complex eukaryotic transcriptomes. We combine the choice of library preparation methods and sequencing platforms with the dedicated computational pipeline implemented in the TranscriptomeReconstructoR package. The pipeline requires prior knowledge only of the reference genomic DNA sequence, not of the transcriptome. The package seamlessly integrates with Bioconductor packages for downstream analysis.


2021 ◽  
Vol 22 (1) ◽  
Author(s):  
Maxim Ivanov ◽  
Albin Sandelin ◽  
Sebastian Marquardt

Abstract Background: The quality of gene annotation determines the interpretation of results obtained in transcriptomic studies. The growing amount of genome sequence information calls for experimental and computational pipelines for de novo transcriptome annotation. Ideally, gene and transcript models should be called from a limited set of key experimental data. Results: We developed TranscriptomeReconstructoR, an R package that implements a pipeline for automated transcriptome annotation. It relies on integrating features from independent and complementary datasets: (i) full-length RNA-seq for detection of splicing patterns and (ii) high-throughput 5′ and 3′ tag sequencing data for accurate definition of gene borders. The pipeline can also take a nascent RNA-seq dataset to supplement the called gene model with transient transcripts. As a proof of principle, we reconstructed de novo the transcriptional landscapes of wild-type Arabidopsis thaliana seedlings and Saccharomyces cerevisiae cells. A comparison with existing transcriptome annotations revealed that our gene models are more accurate and comprehensive than the most commonly used community gene models: TAIR10 and Araport11 for A. thaliana, and SacCer3 for S. cerevisiae. In particular, we identify multiple transient transcripts missing from the existing annotations. Our new annotations promise to improve the quality of A. thaliana and S. cerevisiae genome research. Conclusions: Our proof-of-concept data suggest a cost-efficient strategy for rapid and accurate annotation of complex eukaryotic transcriptomes. We combine the choice of library preparation methods and sequencing platforms with the dedicated computational pipeline implemented in the TranscriptomeReconstructoR package. The pipeline requires prior knowledge only of the reference genomic DNA sequence, not of the transcriptome. The package seamlessly integrates with Bioconductor packages for downstream analysis.
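The abstract divides the work between full-length reads (splicing patterns) and 5′/3′ tag data (gene borders). The following is a minimal Python sketch of the border step only, under simplifying assumptions that are ours rather than the package's: plus-strand transcripts, tags reduced to single genomic coordinates, a hypothetical ±50 bp search window, and the rule "move each end to the most-supported tag position nearby". It is not the TranscriptomeReconstructoR algorithm, which is an R/Bioconductor package; `refine_border` and `refine_transcript` are hypothetical names.

```python
from collections import Counter
from typing import Iterable, Tuple


def refine_border(tag_positions: Iterable[int], approx_pos: int, window: int = 50) -> int:
    """Move an approximate transcript end to the best-supported tag position.

    Only tags within +/- window bp of approx_pos are considered. Ties in tag
    count go to the position closest to approx_pos (then to the first seen).
    If no tag falls in the window, the original coordinate is kept.
    """
    counts = Counter(p for p in tag_positions if abs(p - approx_pos) <= window)
    if not counts:
        return approx_pos
    best_pos, _ = max(counts.items(), key=lambda kv: (kv[1], -abs(kv[0] - approx_pos)))
    return best_pos


def refine_transcript(start: int, end: int, tss_tags: Iterable[int], tes_tags: Iterable[int],
                      window: int = 50) -> Tuple[int, int]:
    """Refine a plus-strand transcript: 5' tags set the start, 3' tags set the end."""
    return refine_border(tss_tags, start, window), refine_border(tes_tags, end, window)


# A full-length read spanning 1000-2000 plus hypothetical 5' and 3' tag coordinates.
print(refine_transcript(1000, 2000, [990, 990, 995, 1200], [2010, 2010, 2010, 1980, 2500]))
# prints (990, 2010)
```

Tags far from the read ends (1200 and 2500 here) are ignored, so a stray tag cannot drag a border across the gene body.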


2021 ◽  
Author(s):  
Arun H. Patil ◽  
Marc K. Halushka

Abstract: MicroRNAs and tRNA-derived fragments (tRFs) are classes of small non-coding RNAs known for their roles in the translational regulation of genes. Advances in next-generation sequencing (NGS) have enabled high-throughput small RNA-seq studies, which require robust alignment pipelines. Our laboratory previously developed miRge and miRge2.0 as flexible tools to process sequencing data for annotation of miRNAs and other small RNA species and to predict novel miRNAs using a support vector machine approach. Although miRge2.0 is a leading analysis tool in terms of speed, with unique quantification and annotation features, it has a few limitations. We present miRge3.0, which provides additional features along with compatibility with newer versions of Cutadapt and Python. The revisions include the ability to process Unique Molecular Identifiers (UMIs) to account for PCR duplicates when quantifying miRNAs, and an accurate isomiR tool with GFF3-formatted output. miRge3.0 also offers speed improvements, benchmarked against miRge2.0, Chimira, and sRNAbench. Finally, miRge3.0 output integrates into other packages for a streamlined analysis process, and the tool provides a cross-platform Graphical User Interface (GUI). In conclusion, miRge3.0 is our third-generation small RNA-seq aligner, with improvements in speed, versatility, and functionality over earlier iterations.
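The abstract says UMIs are used to account for PCR duplicates when quantifying miRNAs. The core idea can be sketched in a few lines of Python: count distinct UMIs per miRNA instead of raw reads. This is an illustrative sketch, not miRge3.0's implementation; the `(umi, mirna)` read representation and the example sequences are assumptions, and real pipelines usually also tolerate sequencing errors within UMIs, which this exact-match version does not.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple


def count_molecules(reads: Iterable[Tuple[str, str]]) -> Dict[str, int]:
    """Count distinct UMIs per miRNA.

    Each read is a (umi, mirna) pair; reads that share both UMI and miRNA are
    treated as PCR duplicates of one original molecule and counted once.
    Exact UMI matching only: UMI sequencing errors are not corrected here.
    """
    umis: Dict[str, Set[str]] = defaultdict(set)
    for umi, mirna in reads:
        umis[mirna].add(umi)
    return {mirna: len(seen) for mirna, seen in umis.items()}


reads = [
    ("AACGTT", "hsa-miR-21-5p"),
    ("AACGTT", "hsa-miR-21-5p"),  # PCR duplicate of the read above
    ("TTGACA", "hsa-miR-21-5p"),
    ("AACGTT", "hsa-let-7a-5p"),  # same UMI, different miRNA: a separate molecule
]
print(count_molecules(reads))  # {'hsa-miR-21-5p': 2, 'hsa-let-7a-5p': 1}
```

Keying the UMI sets by miRNA means an identical UMI on two different miRNAs still counts as two molecules.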


2018 ◽  
Author(s):  
Maziyar Baran Pouyan ◽  
Dennis Kostka

Abstract Motivation: Genome-wide transcriptome sequencing applied to single cells (scRNA-seq) is rapidly becoming an assay of choice across many fields of biological and biomedical research. Scientific objectives often revolve around the discovery or characterization of types or subtypes of cells, so obtaining accurate cell–cell similarities from scRNA-seq data is a critical step in many studies. While rapid advances are being made in the development of tools for scRNA-seq data analysis, few approaches explicitly address this task. Furthermore, the abundance and type of noise present in scRNA-seq datasets suggest that applying generic methods, or methods developed for bulk RNA-seq data, is likely suboptimal. Results: Here we present RAFSIL, a random-forest-based approach to learn cell–cell similarities from scRNA-seq data. RAFSIL implements a two-step procedure in which feature construction geared towards scRNA-seq data is followed by similarity learning. It is designed to be adaptable and expandable, and RAFSIL similarities can be used for typical exploratory data analysis tasks such as dimension reduction, visualization, and clustering. We show that our approach compares favorably with current methods across a diverse collection of datasets, and that it can be used to detect and highlight unwanted technical variation in scRNA-seq datasets in situations where other methods fail. Overall, RAFSIL implements a flexible approach, yielding a useful tool that improves the analysis of scRNA-seq data. Availability and Implementation: The RAFSIL R package is available online at www.kostkalab.net/software.html
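A common way to turn a trained random forest into a similarity measure is the proximity: the fraction of trees in which two samples land in the same terminal leaf. The Python sketch below computes that proximity from precomputed leaf assignments, which are assumed to come from any random forest implementation. It illustrates the general random-forest similarity idea only; it omits RAFSIL's feature-construction step and should not be read as the package's exact procedure. The leaf indices are hypothetical.

```python
from typing import List, Sequence


def rf_proximity(leaves: Sequence[Sequence[int]]) -> List[List[float]]:
    """Random-forest proximity between cells.

    leaves[i][t] is the index of the terminal leaf that cell i reaches in
    tree t. The proximity of cells i and j is the fraction of trees in which
    they land in the same leaf, so it lies in [0, 1] and equals 1 on the diagonal.
    """
    n = len(leaves)
    if n == 0:
        return []
    n_trees = len(leaves[0])
    if n_trees == 0 or any(len(row) != n_trees for row in leaves):
        raise ValueError("every cell needs one leaf index per tree")
    prox = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i, n):
            shared = sum(1 for a, b in zip(leaves[i], leaves[j]) if a == b)
            prox[i][j] = prox[j][i] = shared / n_trees
    return prox


# Hypothetical leaf assignments for 3 cells in a 3-tree forest.
leaves = [[0, 2, 5], [0, 3, 5], [1, 3, 4]]
similarity = rf_proximity(leaves)
distance = [[1.0 - s for s in row] for row in similarity]  # e.g. for clustering
```

Converting proximity to `1 - proximity` gives a dissimilarity that can feed the dimension-reduction, visualization, and clustering tasks the abstract mentions.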


2019 ◽  
Vol 14 (2) ◽  
pp. 324-336
Author(s):  
Yuliati Yuliati ◽  
Susanti Wahyuningsih

The purpose of this study was to analyze the influence of service quality, trust, and commitment on community satisfaction. The population in this study was all 741 people who used services in Pandean Lamper Village, Gayamsari District, Semarang City (data for April–May 2016), from which a sample of 88 people was drawn by proportional stratified sampling. The data were analyzed using multiple regression analysis. The results show that service quality has a significant positive effect on community satisfaction; trust has a significant effect on community satisfaction; commitment has a significant effect on community satisfaction; and service quality, trust, and commitment jointly have a significant effect on community satisfaction.


F1000Research ◽  
2021 ◽  
Vol 10 ◽  
pp. 7
Author(s):  
Sebastien Theil ◽  
Etienne Rifa

Bioinformatic tools for marker-gene sequencing data analysis are continuously and rapidly evolving, so integrating the most recent techniques and tools is challenging. We present an R package for the analysis of 16S and ITS amplicon-based sequencing data. This workflow is based on several R functions and performs automatic processing from fastq sequence files to diversity and differential analysis with statistical validation. The main purposes of this package are to automate bioinformatic analysis, to ensure reproducibility between projects, and to be flexible enough to quickly integrate new bioinformatic tools or statistical methods. rANOMALY is an easy-to-install and customizable R package that works at the amplicon sequence variant (ASV) level for microbial community characterization. It integrates all the assets of the latest bioinformatics methods, such as better sequence tracking, decontamination using control samples, the use of multiple reference databases for taxonomic annotation, all the main ecological analyses (for which we propose advanced statistical tests), and a cross-validated differential analysis by four different methods. Our package produces publication-ready figures, and all of its outputs are designed to be integrated into Rmarkdown code to produce automated reports.
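Diversity analysis of an ASV table is one of the workflow steps the abstract lists. As a minimal illustration, not rANOMALY code (which is R), this Python sketch computes the Shannon alpha-diversity index for one sample's ASV counts; the choice of the Shannon index, the natural logarithm, and the example counts are assumptions.

```python
import math
from typing import Sequence


def shannon(counts: Sequence[int]) -> float:
    """Shannon diversity H' = -sum(p_i * ln p_i) over ASVs with nonzero counts."""
    total = sum(counts)
    if total == 0:
        raise ValueError("sample has no reads")
    return -sum((c / total) * math.log(c / total) for c in counts if c > 0)


# Hypothetical ASV counts for two samples with the same richness (4 ASVs).
even = [25, 25, 25, 25]   # equally abundant ASVs: H' = ln 4
skewed = [97, 1, 1, 1]    # dominated by one ASV: much lower H'
print(round(shannon(even), 4), round(shannon(skewed), 4))
```

Because the two samples have the same number of ASVs, the difference in H' reflects evenness alone, which is why Shannon is often reported alongside plain richness.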


2019 ◽  
Vol 35 (22) ◽  
pp. 4827-4829 ◽  
Author(s):  
Xiao-Fei Zhang ◽  
Le Ou-Yang ◽  
Shuo Yang ◽  
Xing-Ming Zhao ◽  
Xiaohua Hu ◽  
...  

Abstract Summary: Imputation of dropout events, which may mislead downstream analyses, is a key step in analyzing single-cell RNA-sequencing (scRNA-seq) data. We developed EnImpute, an R package that introduces an ensemble learning method for imputing dropout events in scRNA-seq data. EnImpute combines the results obtained from multiple imputation methods to generate a more accurate result. A Shiny application has been developed to provide easier implementation and visualization. Experimental results show that EnImpute outperforms the individual state-of-the-art methods in almost all situations. EnImpute is useful for correcting noisy scRNA-seq data before downstream analysis. Availability and implementation: The R package and Shiny application are available through GitHub at https://github.com/Zhangxf-ccnu/EnImpute. Supplementary information: Supplementary data are available at Bioinformatics online.
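The abstract says EnImpute combines the outputs of several imputation methods but does not state the combination rule. The Python sketch below assumes an element-wise median as a simple, robust combiner, to show the ensemble idea; it is not EnImpute's actual procedure, and `ensemble_impute` and the example matrices are hypothetical.

```python
from statistics import median
from typing import List, Sequence

Matrix = List[List[float]]


def ensemble_impute(results: Sequence[Matrix]) -> Matrix:
    """Combine imputed expression matrices (genes x cells) by element-wise median.

    Each entry of `results` is the output of one imputation method on the same
    data, so all matrices must share one shape. The median keeps one method's
    outlying estimate from dominating the consensus.
    """
    if not results:
        raise ValueError("need at least one imputation result")
    rows = len(results[0])
    cols = len(results[0][0]) if rows else 0
    for m in results:
        if len(m) != rows or any(len(r) != cols for r in m):
            raise ValueError("all matrices must have the same shape")
    return [[median(m[i][j] for m in results) for j in range(cols)] for i in range(rows)]


# Hypothetical outputs of three imputation methods for one gene in two cells.
method_a = [[0.0, 2.0]]
method_b = [[1.0, 4.0]]
method_c = [[3.0, 3.0]]
print(ensemble_impute([method_a, method_b, method_c]))  # [[1.0, 3.0]]
```

A weighted or trimmed mean would be an equally plausible combiner; the median was chosen here only because it needs no tuning.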


2019 ◽  
Vol 35 (21) ◽  
pp. 4419-4421 ◽  
Author(s):  
Sun Ah Kim ◽  
Myriam Brossard ◽  
Delnaz Roshandel ◽  
Andrew D Paterson ◽  
Shelley B Bull ◽  
...  

Abstract Summary: For the analysis of high-throughput genomic data produced by next-generation sequencing (NGS) technologies, researchers need to identify linkage disequilibrium (LD) structure in the genome. In this work, we developed an R package, gpart, which provides clustering algorithms to define LD blocks or analysis units consisting of SNPs. The visualization tool in gpart can display the LD structure and gene positions for up to 20 000 SNPs in one image. The gpart functions facilitate construction of LD blocks and SNP partitions for vast amounts of genome sequencing data within reasonable time and memory limits in personal computing environments. Availability and implementation: The R package is available at https://bioconductor.org/packages/gpart. Supplementary information: Supplementary data are available at Bioinformatics online.
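LD blocks are built from pairwise LD between SNPs, commonly measured as r², the squared correlation of genotype vectors. The Python sketch below computes r² from 0/1/2 allele counts and then partitions SNPs with a deliberately simple greedy rule (a SNP joins the current block if its r² with the block's first SNP reaches a threshold). The greedy rule and the 0.8 threshold are our assumptions for illustration; they are not the clustering algorithms that gpart implements, and the genotypes are hypothetical.

```python
from typing import List, Sequence


def r_squared(x: Sequence[float], y: Sequence[float]) -> float:
    """Squared Pearson correlation between two genotype vectors (0/1/2 allele counts)."""
    if len(x) != len(y) or not x:
        raise ValueError("genotype vectors must be non-empty and of equal length")
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # monomorphic SNP: LD is undefined, treat as no LD
    return sxy * sxy / (sxx * syy)


def greedy_blocks(genotypes: Sequence[Sequence[float]], threshold: float = 0.8) -> List[List[int]]:
    """Partition SNPs (given in genomic order) into contiguous blocks of SNP indices."""
    blocks: List[List[int]] = []
    current: List[int] = [0] if genotypes else []
    for k in range(1, len(genotypes)):
        if r_squared(genotypes[current[0]], genotypes[k]) >= threshold:
            current.append(k)
        else:
            blocks.append(current)
            current = [k]
    if current:
        blocks.append(current)
    return blocks


# Hypothetical genotypes: 4 SNPs (rows) in 5 individuals (columns).
genotypes = [
    [0, 1, 2, 0, 1],
    [0, 1, 2, 0, 1],  # identical to SNP 0: r^2 = 1
    [2, 1, 0, 2, 1],  # perfectly anti-correlated with SNP 0: r^2 = 1
    [0, 0, 1, 2, 2],  # uncorrelated with SNP 0: starts a new block
]
print(greedy_blocks(genotypes))  # [[0, 1, 2], [3]]
```

Squaring the correlation makes perfectly anti-correlated SNPs (SNP 2) count as full LD, which is the usual convention for r².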


2016 ◽  
Vol 62 (8) ◽  
pp. 692-703 ◽  
Author(s):  
Gregory B. Gloor ◽  
Gregor Reid

A workshop held at the 2015 annual meeting of the Canadian Society of Microbiologists highlighted compositional data analysis methods and the importance of exploratory data analysis for the analysis of microbiome data sets generated by high-throughput DNA sequencing. A summary of the content of that workshop, a review of new methods of analysis, and information on the importance of careful analyses are presented herein. The workshop focussed on explaining the rationale behind the use of compositional data analysis, and a demonstration of these methods for the examination of 2 microbiome data sets. A clear understanding of bioinformatics methodologies and the type of data being analyzed is essential, given the growing number of studies uncovering the critical role of the microbiome in health and disease and the need to understand alterations to its composition and function following intervention with fecal transplant, probiotics, diet, and pharmaceutical agents.
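Compositional data analysis usually replaces raw read counts with log-ratios, because sequencing depth makes only the relative proportions meaningful. The Python sketch below shows one widely used log-ratio, the centred log-ratio (clr) transform, for a single sample. The default pseudocount of 0.5 for zero counts is an assumed, adjustable choice, and the example illustrates the general technique rather than reproducing the workshop's material.

```python
import math
from typing import List, Sequence


def clr(counts: Sequence[float], pseudocount: float = 0.5) -> List[float]:
    """Centred log-ratio transform of one sample.

    clr(x)_i = ln(x_i / g(x)), where g(x) is the geometric mean of the parts.
    A pseudocount is added first because the log of a zero count is undefined.
    """
    if not counts:
        raise ValueError("sample has no parts")
    logs = [math.log(c + pseudocount) for c in counts]
    mean_log = sum(logs) / len(logs)  # ln of the geometric mean
    return [v - mean_log for v in logs]


# Scale invariance: the same composition at 10x sequencing depth gives the same clr values.
shallow = clr([1, 2, 4], pseudocount=0)
deep = clr([10, 20, 40], pseudocount=0)
```

The clr values of a sample always sum to zero, and multiplying every count by the same factor leaves them unchanged, which is exactly the property that makes them suitable for comparing samples of different sequencing depth.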

