QC-Chain: Fast and Holistic Quality Control Method for Next-Generation Sequencing Data

Quality assurance and quality control are essential for robust next generation sequencing (NGS). Here we present CoverView, a fast, flexible, user-friendly quality evaluation tool for NGS data. CoverView processes mapped sequencing reads and user-specified regions to report depth of coverage, base and mapping quality metrics with increasing levels of detail from a chromosome-level summary to per-base profiles. CoverView can flag regions that do not fulfil user-specified quality requirements, allowing suboptimal data to be systematically and automatically presented for review. It also provides an interactive graphical user interface (GUI) that can be opened in a web browser and allows intuitive exploration of results. We have integrated CoverView into our accredited clinical cancer predisposition gene testing laboratory that uses the TruSight Cancer Panel (TSCP). CoverView has been invaluable for optimisation and quality control of our testing pipeline, providing transparent, consistent quality metric information and automatic flagging of regions that fall below quality thresholds. We demonstrate this utility with TSCP data from the Genome in a Bottle reference sample, which CoverView analysed in 13 seconds. CoverView uses data routinely generated by NGS pipelines, reads standard input formats, and rapidly creates easy-to-parse output text (.txt) files that are customised by a simple configuration file. CoverView can therefore be easily integrated into any NGS pipeline. CoverView and detailed documentation for its use are freely available at github.com/RahmanTeamDevelopment/CoverView/releases and www.icr.ac.uk/CoverView
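The automatic flagging described in the abstract can be illustrated with a minimal sketch. It assumes per-base depths have already been computed for each region. The `Region` class, the `flag_regions` function, and the `min_depth_required` parameter are hypothetical names chosen for illustration; they are not CoverView's API or configuration keys, and the depth values are made up.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Region:
    """A user-specified region with its per-base read depths (illustrative)."""
    name: str
    depths: List[int]


def flag_regions(regions: List[Region], min_depth_required: int) -> List[str]:
    """Return names of regions where any base falls below the depth threshold.

    A region with no depth values is flagged too, since it cannot be
    shown to meet the requirement.
    """
    flagged = []
    for region in regions:
        if not region.depths or min(region.depths) < min_depth_required:
            flagged.append(region.name)
    return flagged


if __name__ == "__main__":
    # Made-up example data, not taken from any real panel.
    regions = [
        Region("region_A", [60, 55, 52]),
        Region("region_B", [48, 30, 12]),
    ]
    print(flag_regions(regions, min_depth_required=50))
```

A real tool would apply the same idea to base quality and mapping quality as well as depth, reading the thresholds from its configuration file.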

Rapid evaluation and quality control of next generation sequencing data with FaQCs

BMC Bioinformatics, 2014, Vol 15 (1). DOI: 10.1186/s12859-014-0366-2. Cited by ~88.

Author(s): Chien-Chi Lo, Patrick S G Chain

Keyword(s): Quality Control, Next Generation Sequencing, Next Generation Sequencing Data, Rapid Evaluation, Next Generation, Sequencing Data, Generation Sequencing

NGS QC Toolkit: A Platform for Quality Control of Next-Generation Sequencing Data

Encyclopedia of Metagenomics, 2013, pp. 1-5. DOI: 10.1007/978-1-4614-6418-1_348-2. Cited by ~1.

Author(s): Ravi K. Patel, Mukesh Jain

Keyword(s): Quality Control, Next Generation Sequencing, Next Generation Sequencing Data, Next Generation, Sequencing Data, Generation Sequencing

NGS QC Toolkit: A Toolkit for Quality Control of Next Generation Sequencing Data

PLoS ONE, 2012, Vol 7 (2), pp. e30619. DOI: 10.1371/journal.pone.0030619. Cited by ~1500.

Author(s): Ravi K. Patel, Mukesh Jain

Keyword(s): Quality Control, Next Generation Sequencing, Next Generation Sequencing Data, Next Generation, Sequencing Data, Generation Sequencing

SequencErr: measuring and suppressing sequencer errors in next-generation sequencing data

Genome Biology, 2021, Vol 22 (1). DOI: 10.1186/s13059-020-02254-2.

Author(s): Eric M. Davis, Yu Sun, Yanling Liu, Pandurang Kolekar, Ying Shao, ...

Keyword(s): Next Generation Sequencing, Error Rate, Control Method, Error Rates, Computational Method, Next Generation Sequencing Data, Next Generation, Sequencing Data, Flow Cells, Generation Sequencing

Abstract

Background: There is currently no method to precisely measure the errors that occur in the sequencing instrument/sequencer, which is critical for next-generation sequencing applications aimed at discovering the genetic makeup of heterogeneous cellular populations.

Results: We propose a novel computational method, SequencErr, to address this challenge by measuring the base correspondence between overlapping regions in forward and reverse reads. An analysis of 3777 public datasets from 75 research institutions in 18 countries revealed the sequencer error rate to be ~10 per million (pm), and 1.4% of sequencers and 2.7% of flow cells have error rates >100 pm. At the flow cell level, error rates are elevated in the bottom surfaces, and >90% of HiSeq and NovaSeq flow cells have at least one outlier error-prone tile. By sequencing a common DNA library on different sequencers, we demonstrate that sequencers with high error rates have reduced overall sequencing accuracy, and removal of outlier error-prone tiles improves sequencing accuracy. We demonstrate that SequencErr can reveal novel insights relative to the popular quality control method FastQC and achieve a 10-fold lower error rate than popular error correction methods including Lighter and Musket.

Conclusions: Our study reveals novel insights into the nature of DNA sequencing errors incurred on DNA sequencers. Our method can be used to assess, calibrate, and monitor sequencer accuracy, and to computationally suppress sequencer errors in existing datasets.
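The idea of comparing overlapping regions of forward and reverse reads can be sketched as follows. This is only an illustration of the general principle, not the SequencErr implementation, whose details the abstract does not give. It assumes the overlap length is already known, that the fragment is at least as long as one read, and that both reads are plain strings. The function names `overlap_mismatches` and `errors_per_million` are hypothetical.

```python
from typing import Tuple

_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (A, C, G, T, N only)."""
    return seq.translate(_COMPLEMENT)[::-1]


def overlap_mismatches(forward: str, reverse: str, overlap: int) -> Tuple[int, int]:
    """Count disagreements in the region covered by both reads of a pair.

    The last `overlap` bases of the forward read are compared with the first
    `overlap` bases of the reverse-complemented reverse read. Positions where
    either read has an N are skipped. Returns (mismatched, compared).
    """
    if overlap <= 0 or overlap > min(len(forward), len(reverse)):
        raise ValueError("overlap must be between 1 and the shorter read length")
    fwd_part = forward[-overlap:]
    rev_part = reverse_complement(reverse)[:overlap]
    mismatched = 0
    compared = 0
    for a, b in zip(fwd_part, rev_part):
        if a == "N" or b == "N":
            continue
        compared += 1
        if a != b:
            mismatched += 1
    return mismatched, compared


def errors_per_million(mismatched: int, compared: int) -> float:
    """Express a mismatch count as a rate per million compared bases."""
    return 1e6 * mismatched / compared if compared else 0.0


if __name__ == "__main__":
    # Toy fragment of 10 bases read with 7-base reads, so 4 bases overlap.
    fragment = "ACGTACGTAC"
    forward = fragment[:7]
    reverse = reverse_complement(fragment)[:7]
    print(overlap_mismatches(forward, reverse, overlap=4))
```

Because both reads come from the same DNA fragment, a disagreement in the overlap points to an error in at least one of the two reads rather than to true biological variation, which is what makes the overlap useful for estimating instrument error.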
