Quality Assessment of High-Throughput DNA Sequencing Data via Range Analysis

Author(s):  
Ali Fotouhi ◽  
Mina Majidi ◽  
M. Oğuzhan Külekci
2011 ◽  
Vol 21 (5) ◽  
pp. 734-740 ◽  
Author(s):  
M. Hsi-Yang Fritz ◽  
R. Leinonen ◽  
G. Cochrane ◽  
E. Birney

2017 ◽  
Author(s):  
Yuchao Jiang ◽  
Rujin Wang ◽  
Eugene Urrutia ◽  
Ioannis N. Anastopoulos ◽  
Katherine L. Nathanson ◽  
...  

Abstract: High-throughput DNA sequencing enables detection of copy number variations (CNVs) on the genome-wide scale with finer resolution compared to array-based methods, but suffers from biases and artifacts that lead to false discoveries and low sensitivity. We describe CODEX2, a statistical framework for full-spectrum CNV profiling that is sensitive for variants with both common and rare population frequencies and that is applicable to study designs with and without negative control samples. We demonstrate and evaluate CODEX2 on whole-exome and targeted sequencing data, where biases are the most prominent. CODEX2 outperforms existing methods and, in particular, significantly improves sensitivity for common CNVs.


2017 ◽  
Author(s):  
M. Oğuzhan Külekci ◽  
Ali Fotouhi ◽  
Mina Majidi

Abstract: A number of recent studies have addressed the quality assessment of sequencing data. These efforts have largely focused on reporting statistical parameters of the distribution of the quality scores and/or the base calls in a FASTQ file. We investigate another dimension of quality assessment, motivated by the fact that reads containing long intervals with fewer errors improve the performance of post-processing tools in downstream analysis. Thus, the quality assessment procedures proposed in this study analyze the segments of the reads that are above a certain quality. We define an interval of a read to be of desired quality when it contains at most k quality scores less than or equal to a threshold value v, for some v and k provided by the user. We present an algorithm to detect those ranges and introduce new metrics computed from their lengths. These metrics include the mean values of the longest, shortest, average, cubic average, and average variation coefficient of the lengths of the fragments that qualify under the v and k input parameters. We provide a new software tool, QASDRA, for quality assessment of sequencing data via range analysis. QASDRA, implemented in Python and publicly available at https://github.com/ali-cp/QASDRA.git, creates a quality assessment report for an input FASTQ file according to the user-specified k and v parameters. It can also filter out reads according to the metrics introduced.
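The range definition in this abstract can be illustrated with a minimal Python sketch. This is not the QASDRA implementation; it is one possible reading of the definition, under these assumptions: quality strings use the Phred+33 encoding, a range is a maximal half-open interval `(start, end)` containing at most k scores less than or equal to v, and ranges are allowed to overlap when k > 0. The function names `decode_phred33`, `quality_ranges`, and `range_metrics` are hypothetical. The cubic average is left out because the abstract does not define it.

```python
from statistics import mean, pstdev


def decode_phred33(quality_line):
    """Convert a FASTQ quality string to integer scores (assumes Phred+33)."""
    return [ord(ch) - 33 for ch in quality_line]


def quality_ranges(scores, v, k):
    """Return maximal half-open (start, end) ranges with at most k scores <= v.

    Assumption: ranges may overlap when k > 0; the tool itself may
    resolve overlaps differently.
    """
    n = len(scores)
    # Positions of low-quality scores, padded with sentinels at -1 and n.
    bad = [-1] + [i for i, q in enumerate(scores) if q <= v] + [n]
    m = len(bad) - 2  # number of real low-quality positions
    if m <= k:
        return [(0, n)] if n else []
    ranges = []
    # Each window spans exactly k consecutive low-quality positions and
    # stretches outward until just before the neighbouring ones.
    for i in range(1, m - k + 2):
        start = bad[i - 1] + 1
        end = bad[i + k]
        if end > start:  # skip empty windows (possible only when k == 0)
            ranges.append((start, end))
    return ranges


def range_metrics(ranges):
    """Summarize range lengths for one read; returns None if there are no ranges."""
    lengths = [end - start for start, end in ranges]
    if not lengths:
        return None
    avg = mean(lengths)
    return {
        "longest": max(lengths),
        "shortest": min(lengths),
        "average": avg,
        # Population standard deviation divided by the mean (an assumption).
        "variation_coefficient": pstdev(lengths) / avg,
    }
```

For example, `quality_ranges(decode_phred33("II&III&&I"), v=10, k=1)` yields the ranges `(0, 6)`, `(3, 7)`, and `(7, 9)`, each containing exactly one score at or below the threshold. Per-file metrics such as those named in the abstract would then be averages of these per-read values over all reads.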


2013 ◽  
Vol 30 (4) ◽  
pp. 409-415
Author(s):  
Zexuan Zhu ◽  
Yongpeng Zhang ◽  
Zhuhong You ◽  
Liang Jiang ◽  
Zhen Ji

PLoS ONE ◽  
2016 ◽  
Vol 11 (5) ◽  
pp. e0155461 ◽  
Author(s):  
José M. Abuín ◽  
Juan C. Pichel ◽  
Tomás F. Pena ◽  
Jorge Amigo
