Quality Assessment of High-Throughput DNA Sequencing Data via Range Analysis

Abstract: High-throughput DNA sequencing enables detection of copy number variations (CNVs) on the genome-wide scale with finer resolution compared to array-based methods, but suffers from biases and artifacts that lead to false discoveries and low sensitivity. We describe CODEX2, a statistical framework for full-spectrum CNV profiling that is sensitive for variants with both common and rare population frequencies and that is applicable to study designs with and without negative control samples. We demonstrate and evaluate CODEX2 on whole-exome and targeted sequencing data, where biases are the most prominent. CODEX2 outperforms existing methods and, in particular, significantly improves sensitivity for common CNVs.

Enhancing the detection of barcoded reads in high throughput DNA sequencing data by controlling the false discovery rate

BMC Bioinformatics ◽

10.1186/1471-2105-15-264 ◽

2014 ◽

Vol 15 (1) ◽

Cited By ~ 8

Author(s):

Tilo Buschmann ◽

Rong Zhang ◽

Douglas E Brash ◽

Leonid V Bystrykh

Keyword(s):

DNA Sequencing ◽

False Discovery Rate ◽

High Throughput ◽

Sequencing Data ◽

False Discovery ◽

High Throughput DNA Sequencing

Quality Assessment of High-Throughput DNA Sequencing Data via Range Analysis

10.1101/101469 ◽

2017 ◽

Author(s):

M. Oğuzhan Külekci ◽

Ali Fotouhi ◽

Mina Majidi

Keyword(s):

Quality Assessment ◽

Software Tool ◽

Threshold Value ◽

Sequencing Data ◽

Range Analysis ◽

Statistical Parameters ◽

Mean Values ◽

FASTQ File ◽

High Throughput DNA Sequencing ◽

Assessment Procedures

Abstract: A number of recent studies address the quality assessment of sequencing data. These efforts have largely focused on reporting statistical parameters of the distribution of the quality scores and/or the base calls in a FASTQ file. We investigate another dimension of quality assessment, motivated by the fact that reads containing long intervals with few errors improve the performance of post-processing tools in downstream analysis. The quality assessment procedures proposed in this study therefore analyze the segments of the reads that are above a certain quality. We define an interval of a read to be of desired quality when it contains at most k quality scores less than or equal to a threshold value v, for some v and k provided by the user. We present an algorithm to detect those ranges and introduce new metrics computed from their lengths. These metrics include the mean values for the longest, shortest, average, cubic average, and average variation coefficient of the fragment lengths that qualify under the v and k input parameters. We provide a new software tool, QASDRA, for quality assessment of sequencing data via range analysis. QASDRA, implemented in Python and publicly available at https://github.com/ali-cp/QASDRA.git, creates the quality assessment report of an input FASTQ file according to the user-specified k and v parameters. It can also filter out reads according to the metrics introduced.
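
The interval definition in this abstract can be illustrated with a short Python sketch. This is not the QASDRA implementation: the function names (decode_phred33, maximal_ranges, summarize_lengths) are hypothetical, the quality string is assumed to use Phred+33 encoding, and reporting only maximal intervals (those that cannot be extended without exceeding k low scores) is an assumption made for the example. Only the longest, shortest, and average lengths are computed here, since the abstract does not define the other metrics.

```python
def decode_phred33(quality_line):
    """Convert a FASTQ quality string to integer scores (assumes Phred+33)."""
    return [ord(ch) - 33 for ch in quality_line]


def maximal_ranges(scores, v, k):
    """Return half-open (start, end) intervals holding at most k scores <= v.

    Only maximal intervals are returned; this choice is an assumption of
    the sketch, not something stated in the abstract.
    """
    n = len(scores)
    low = [i for i, q in enumerate(scores) if q <= v]  # positions of low scores
    if len(low) <= k:
        return [(0, n)] if n else []
    # Sentinels mark the positions just before and just after the read.
    padded = [-1] + low + [n]
    ranges = []
    for i in range(len(low) - k + 1):
        start = padded[i] + 1      # first position after the preceding low score
        end = padded[i + k + 1]    # position of the (k+1)-th low score, exclusive
        if end > start:            # with k == 0, adjacent low scores leave a gap of 0
            ranges.append((start, end))
    return ranges


def summarize_lengths(ranges):
    """Longest, shortest, and average interval length; None if no interval."""
    lengths = [end - start for start, end in ranges]
    if not lengths:
        return None
    return {
        "longest": max(lengths),
        "shortest": min(lengths),
        "average": sum(lengths) / len(lengths),
    }


if __name__ == "__main__":
    scores = [30, 10, 35, 36, 5, 38]
    print(maximal_ranges(scores, v=20, k=1))  # [(0, 4), (2, 6)]
    print(summarize_lengths(maximal_ranges(scores, v=20, k=0)))
```

With k = 1 the two maximal intervals overlap, because each one tolerates a different single low score; whether a tool should report overlapping or disjoint ranges is a design decision that the abstract leaves open.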
