PathoQC: Computationally Efficient Read Preprocessing and Quality Control for High-Throughput Sequencing Data Sets

Great differences in performance and outcome of high-throughput sequencing data analysis platforms for fungal metabarcoding

MycoKeys ◽

10.3897/mycokeys.39.28109 ◽

2018 ◽

Vol 39 ◽

pp. 29-40 ◽

Cited By ~ 21

Author(s):

Sten Anslan ◽

R. Henrik Nilsson ◽

Christian Wurzbacher ◽

Petr Baldrian ◽

Leho Tedersoo ◽

...

Keyword(s):

High Throughput ◽

High Throughput Sequencing ◽

Computation Time ◽

Potential Effect ◽

Data Sets ◽

Sequencing Data ◽

Operational Taxonomic Units ◽

High Throughput Sequencing Data ◽

Recent Developments

Along with recent developments in high-throughput sequencing (HTS) technologies and the resulting fast accumulation of HTS data, there has been a growing need for, and interest in, developing tools for HTS data processing and communication. In particular, a number of bioinformatics tools have been designed for analysing metabarcoding data, each with specific features, assumptions and outputs. To evaluate the potential effect of applying different bioinformatics workflows on the results, we compared the performance of different analysis platforms on two contrasting high-throughput sequencing data sets. Our analysis revealed that the computation time, the quality of error filtering and hence the output of a specific bioinformatics process largely depend on the platform used. Our results show that none of the bioinformatics workflows appears to perfectly filter out the accumulated errors and generate Operational Taxonomic Units, although PipeCraft, LotuS and PIPITS perform better than QIIME2 and Galaxy for the tested fungal amplicon data set. We conclude that the output of each platform requires manual validation of the OTUs by examining the taxonomy assignment values.

Evaluation of Subsampling-Based Normalization Strategies for Tagged High-Throughput Sequencing Data Sets from Gut Microbiomes

Applied and Environmental Microbiology ◽

10.1128/aem.05491-11 ◽

2011 ◽

Vol 77 (24) ◽

pp. 8795-8798 ◽

Cited By ~ 70

Author(s):

Daniel Aguirre de Cárcer ◽

Stuart E. Denman ◽

Chris McSweeney ◽

Mark Morrison

Keyword(s):

High Throughput ◽

High Throughput Sequencing ◽

Data Sets ◽

Sequencing Data ◽

Β Diversity ◽

High Throughput Sequencing Data ◽

Minimum Number ◽

Diversity Metrics

Several subsampling-based normalization strategies were applied to different high-throughput sequencing data sets originating from human and murine gut environments. Their effects on the data sets' characteristics and normalization efficiencies, as measured by several β-diversity metrics, were compared. For both data sets, subsampling to the median rather than the minimum number appeared to improve the analysis.
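
The difference between subsampling to the minimum and to the median read count can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the toy count vectors, and the choice to leave samples at or below the target depth untouched are all assumptions made for illustration.

```python
import random
from statistics import median


def subsample_counts(counts, depth, rng):
    """Draw `depth` reads without replacement from a per-taxon count vector."""
    if sum(counts) <= depth:
        # Assumption: samples at or below the target depth are kept as-is.
        return list(counts)
    # Expand the counts into one taxon label per read, then sample reads.
    reads = [taxon for taxon, n in enumerate(counts) for _ in range(n)]
    out = [0] * len(counts)
    for taxon in rng.sample(reads, depth):
        out[taxon] += 1
    return out


def normalize(samples, target="median", seed=0):
    """Subsample every sample to the median or the minimum library size."""
    rng = random.Random(seed)
    depths = [sum(c) for c in samples.values()]
    depth = int(median(depths)) if target == "median" else min(depths)
    return {name: subsample_counts(c, depth, rng) for name, c in samples.items()}


# Hypothetical toy data: three samples, three taxa each.
toy = {"a": [50, 30, 20], "b": [10, 5, 5], "c": [200, 100, 100]}
by_median = normalize(toy, target="median")
by_minimum = normalize(toy, target="minimum")
```

With the minimum as target, every sample is cut down to the size of the smallest library, whereas the median keeps more reads in the larger samples at the cost of leaving the smaller ones at unequal depth.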

Rqc: A Bioconductor Package for Quality Control of High-Throughput Sequencing Data

Journal of Statistical Software ◽

10.18637/jss.v087.c02 ◽

2018 ◽

Vol 87 (Code Snippet 2) ◽

Cited By ~ 2

Author(s):

Wélliton de Souza ◽

Benilton de Sá Carvalho ◽

Iscia Lopes-Cendes

Keyword(s):

Quality Control ◽

High Throughput ◽

High Throughput Sequencing ◽

Bioconductor Package ◽

Sequencing Data ◽

High Throughput Sequencing Data

QUARTIC: QUick pArallel algoRithms for high-Throughput sequencIng data proCessing

F1000Research ◽

10.12688/f1000research.22954.3 ◽

2020 ◽

Vol 9 ◽

pp. 240

Author(s):

Frédéric Jarlier ◽

Nicolas Joly ◽

Nicolas Fedy ◽

Thomas Magalhaes ◽

Leonor Sirotti ◽

...

Keyword(s):

High Throughput ◽

Message Passing ◽

High Performance ◽

Message Passing Interface ◽

High Throughput Sequencing ◽

Genome Structure ◽

Sequencing Data ◽

High Throughput Sequencing Data ◽

Speed Up ◽

Time To Delivery

Life science has entered the so-called 'big data era', in which biologists, clinicians and bioinformaticians are overwhelmed with high-throughput sequencing data. While these data offer new insights into genome structure, their ever-growing size also raises major challenges for their use in daily clinical practice, care and diagnosis. Therefore, we implemented software to reduce the time to delivery for the alignment and sorting of high-throughput sequencing data. Our solution is implemented using the Message Passing Interface and is intended for high-performance computing architectures. The software scales linearly with respect to the size of the data and ensures total reproducibility with the traditional tools. For example, a 300X whole genome can be aligned and sorted in less than 9 hours with 128 cores. The software offers significant speed-up using multi-core and multi-node parallelization.

Identification of Infectious Agents in High-Throughput Sequencing Data Sets Is Easily Achievable Using Free, Cloud-Based Bioinformatics Platforms

Journal of Clinical Microbiology ◽

10.1128/jcm.01386-19 ◽

2019 ◽

Vol 57 (12) ◽

Cited By ~ 2

Author(s):

Joseph G. Chappell ◽

Timothy Byaruhanga ◽

Theocharis Tsoleridis ◽

Jonathan K. Ball ◽

C. Patrick McClure

Keyword(s):

High Throughput ◽

High Throughput Sequencing ◽

Data Sets ◽

Infectious Agents ◽

Sequencing Data ◽

High Throughput Sequencing Data

Qualimap 2: advanced multi-sample quality control for high-throughput sequencing data

Bioinformatics ◽

10.1093/bioinformatics/btv566 ◽

2015 ◽

pp. btv566 ◽

Cited By ~ 210

Author(s):

Konstantin Okonechnikov ◽

Ana Conesa ◽

Fernando García-Alcalde

Keyword(s):

Quality Control ◽

High Throughput ◽

High Throughput Sequencing ◽

Sequencing Data ◽

Sample Quality ◽

High Throughput Sequencing Data

ANGSD-wrapper: utilities for analyzing next generation sequencing data

10.7287/peerj.preprints.1472 ◽

2016 ◽

Author(s):

Arun Durvasula ◽

Paul J Hoffman ◽

Tyler V Kent ◽

Chaochih Liu ◽

Thomas J Y Kono ◽

...

Keyword(s):

High Throughput ◽

High Throughput Sequencing ◽

Molecular Ecology ◽

Principal Component ◽

Next Generation Sequencing Data ◽

Sequencing Data ◽

Genome Data ◽

High Throughput Sequencing Data ◽

Genome Wide ◽

User Friendly

High-throughput sequencing has changed many aspects of population genetics, molecular ecology, and related fields, affecting both experimental design and data analysis. The software package ANGSD allows users to perform a number of population genetic analyses on high-throughput sequencing data. ANGSD uses probabilistic approaches to calculate genome-wide descriptive statistics. The package makes use of genotype likelihood estimates rather than SNP calls and is specifically designed to produce more accurate results for samples with low sequencing depth. ANGSD makes use of full genome data while handling a wide array of sampling and experimental designs. Here we present ANGSD-wrapper, a set of wrapper scripts that provide a user-friendly interface for running ANGSD and visualizing results. ANGSD-wrapper supports multiple types of analyses, including estimates of nucleotide sequence diversity, neutrality tests, principal component analysis, estimation of admixture proportions for individual samples, and calculation of statistics that quantify recent introgression. ANGSD-wrapper also provides interactive graphing of ANGSD results to enhance data exploration. We demonstrate the usefulness of ANGSD-wrapper by analyzing resequencing data from populations of wild and domesticated Zea. ANGSD-wrapper is freely available from https://github.com/mojaveazure/angsd-wrapper.

HTSQualC is a flexible and one-step quality control software for high-throughput sequencing data analysis

Scientific Reports ◽

10.1038/s41598-021-98124-3 ◽

2021 ◽

Vol 11 (1) ◽

Author(s):

Renesh Bedre ◽

Carlos Avila ◽

Kranthi Mandadi

Keyword(s):

Quality Control ◽

High Throughput ◽

High Throughput Sequencing ◽

Science Research ◽

Control Analysis ◽

Sequencing Data ◽

Quality Control Analysis ◽

High Throughput Sequencing Data ◽

One Step ◽

Automated Quality Control

Use of high-throughput sequencing (HTS) has become indispensable in life science research. Raw HTS data contain several sequencing artifacts, and as a first step it is imperative to remove these artifacts for reliable downstream bioinformatics analysis. Although multiple stand-alone tools are available that can perform the various quality control steps separately, an integrated tool that allows one-step, automated quality control analysis of HTS datasets would significantly ease the handling of large numbers of samples in parallel. Here, we developed HTSQualC, a stand-alone, flexible, and easy-to-use software for one-step quality control analysis of raw HTS data. HTSQualC can evaluate HTS data quality and perform filtering and trimming analysis in a single run. We evaluated the performance of HTSQualC for conducting batch analysis of HTS datasets with 322 samples with an average of ~1 M (paired-end) sequence reads per sample. HTSQualC accomplished the QC analysis in ~3 h in distributed mode and ~31 h in shared mode, thus underscoring its utility and robust performance. In addition to command-line execution, we integrated HTSQualC into the free, open-source CyVerse cyberinfrastructure resource as a GUI, for wider access by experimental biologists who have limited computational resources and/or programming abilities.

Great differences in performance and outcome of high-throughput sequencing data analysis platforms for fungal metabarcoding

10.7287/peerj.preprints.27019v2 ◽

2018 ◽

Author(s):

Sten Anslan ◽

Henrik Nilsson ◽

Christian Wurzbacher ◽

Petr Baldrian ◽

Leho Tedersoo ◽

...

Keyword(s):

High Throughput ◽

High Throughput Sequencing ◽

Computation Time ◽

Potential Effect ◽

Data Sets ◽

Sequencing Data ◽

Data Set ◽

Operational Taxonomic Units ◽

High Throughput Sequencing Data ◽

Recent Developments

Along with recent developments in high-throughput sequencing (HTS) technologies and the resulting fast accumulation of HTS data, there has been a growing need for, and interest in, developing tools for HTS data processing and communication. In particular, a number of bioinformatics tools have been designed for analysing metabarcoding data, each with specific features, assumptions and outputs. To evaluate the potential effect of applying different bioinformatics workflows on the results, we compared the performance of different analysis platforms on two contrasting high-throughput sequencing data sets. Our analysis revealed that the computation time, the quality of error filtering and hence the output of a specific bioinformatics process largely depend on the platform used. Our results show that none of the bioinformatics workflows appears to perfectly filter out the accumulated errors and generate Operational Taxonomic Units, although PipeCraft, LotuS and PIPITS perform better than QIIME2 and Galaxy for the tested fungal amplicon data set. We conclude that the output of each platform requires manual validation of the OTUs by examining the taxonomy assignment values.

ANGSD-wrapper: utilities for analyzing next generation sequencing data

10.7287/peerj.preprints.1472v2 ◽

2016 ◽

Cited By ~ 1

Author(s):

Arun Durvasula ◽

Paul J Hoffman ◽

Tyler V Kent ◽

Chaochih Liu ◽

Thomas J Y Kono ◽

...

Keyword(s):

High Throughput ◽

High Throughput Sequencing ◽

Molecular Ecology ◽

Principal Component ◽

Next Generation Sequencing Data ◽

Sequencing Data ◽

Genome Data ◽

High Throughput Sequencing Data ◽

Genome Wide ◽

User Friendly

High-throughput sequencing has changed many aspects of population genetics, molecular ecology, and related fields, affecting both experimental design and data analysis. The software package ANGSD allows users to perform a number of population genetic analyses on high-throughput sequencing data. ANGSD uses probabilistic approaches to calculate genome-wide descriptive statistics. The package makes use of genotype likelihood estimates rather than SNP calls and is specifically designed to produce more accurate results for samples with low sequencing depth. ANGSD makes use of full genome data while handling a wide array of sampling and experimental designs. Here we present ANGSD-wrapper, a set of wrapper scripts that provide a user-friendly interface for running ANGSD and visualizing results. ANGSD-wrapper supports multiple types of analyses, including estimates of nucleotide sequence diversity, neutrality tests, principal component analysis, estimation of admixture proportions for individual samples, and calculation of statistics that quantify recent introgression. ANGSD-wrapper also provides interactive graphing of ANGSD results to enhance data exploration. We demonstrate the usefulness of ANGSD-wrapper by analyzing resequencing data from populations of wild and domesticated Zea. ANGSD-wrapper is freely available from https://github.com/mojaveazure/angsd-wrapper.
