An efficient and tunable parameter to improve variant calling for whole genome and exome sequencing data

2017 ◽  
Vol 40 (1) ◽  
pp. 39-47 ◽  
Author(s):  
Yong Ju Ahn ◽  
Kesavan Markkandan ◽  
In-Pyo Baek ◽  
Seyoung Mun ◽  
Wooseok Lee ◽  
...  
2021 ◽  
Vol 14 (1) ◽  
Author(s):  
Kelley Paskov ◽  
Jae-Yoon Jung ◽  
Brianna Chrisman ◽  
Nate T. Stockham ◽  
Peter Washington ◽  
...  

Abstract
Background As next-generation sequencing technologies make their way into the clinic, knowledge of their error rates is essential if they are to be used to guide patient care. However, sequencing platforms and variant-calling pipelines are continuously evolving, making it difficult to accurately quantify error rates for the particular combination of assay and software parameters used on each sample. Family data provide a unique opportunity for estimating sequencing error rates, since they allow us to observe a fraction of sequencing errors as Mendelian errors in the family, which we can then use to produce genome-wide error estimates for each sample.
Results We introduce a method that uses Mendelian errors in sequencing data to make highly granular per-sample estimates of precision and recall for any set of variant calls, regardless of sequencing platform or calling methodology. We validate the accuracy of our estimates using monozygotic twins, and we use a set of monozygotic quadruplets to show that our predictions closely match the consensus method. We demonstrate our method’s versatility by estimating sequencing error rates for whole genome sequencing, whole exome sequencing, and microarray datasets, and we highlight its sensitivity by quantifying performance increases between different versions of the GATK variant-calling pipeline. We then use our method to demonstrate that: 1) Sequencing error rates between samples in the same dataset can vary by over an order of magnitude. 2) Variant calling performance decreases substantially in low-complexity regions of the genome. 3) Variant calling performance in whole exome sequencing data decreases with distance from the nearest target region. 4) Variant calls from lymphoblastoid cell lines can be as accurate as those from whole blood. 5) Whole-genome sequencing can attain microarray-level precision and recall at disease-associated SNV sites.
Conclusion Genotype datasets from families are powerful resources that can be used to make fine-grained estimates of sequencing error for any sequencing platform and variant-calling methodology.
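
To make the notion of a Mendelian error concrete, the following is a minimal Python sketch that flags genotype calls in a parent-child trio that cannot be explained by inheritance. It is an illustration only and not the authors' estimation method, which goes further and converts observed Mendelian errors into per-sample precision and recall. The function names, the tuple encoding of genotypes (0 = reference allele, 1 = alternate allele), and the restriction to autosomal diploid sites are all assumptions made for this example.

```python
from itertools import product


def is_mendelian_consistent(child, mother, father):
    """Return True if the child's genotype can be formed by taking
    one allele from the mother and one allele from the father.

    Genotypes are unphased tuples of two alleles, e.g. (0, 1).
    Assumes an autosomal diploid site (hypothetical simplification).
    """
    child_sorted = sorted(child)
    for maternal, paternal in product(mother, father):
        if sorted((maternal, paternal)) == child_sorted:
            return True
    return False


def mendelian_error_fraction(trio_calls):
    """Fraction of sites whose calls violate Mendelian inheritance.

    trio_calls is a list of (child, mother, father) genotype tuples.
    This counts only the errors that are observable in a trio; many
    genotyping errors remain Mendelian-consistent and go unseen.
    """
    if not trio_calls:
        return 0.0
    errors = sum(
        not is_mendelian_consistent(child, mother, father)
        for child, mother, father in trio_calls
    )
    return errors / len(trio_calls)


# Hypothetical calls at four sites: (child, mother, father)
example_calls = [
    ((0, 1), (0, 0), (1, 1)),  # consistent: one allele from each parent
    ((1, 1), (0, 0), (0, 1)),  # error: mother cannot transmit allele 1
    ((0, 0), (0, 1), (0, 1)),  # consistent
    ((0, 1), (0, 0), (0, 0)),  # error: neither parent carries allele 1
]
```

Calling `mendelian_error_fraction(example_calls)` on the hypothetical calls above counts two inconsistent sites out of four. Because only a fraction of sequencing errors surface as Mendelian errors, a raw count like this understates the true error rate, which is why the abstract describes a further step to obtain genome-wide estimates.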


2014 ◽  
Author(s):  
Ruibang Luo ◽  
Yiu-Lun Wong ◽  
Wai-Chun Law ◽  
Lap-Kei Lee ◽  
Chi-Man Liu ◽  
...  

This paper reports an integrated solution, called BALSA, for the secondary analysis of next-generation sequencing data; it exploits the computational power of the GPU and intricate memory management to give a fast and accurate analysis. From raw reads to variants (including SNPs and Indels), BALSA, using just a single computing node with a commodity GPU board, takes 5.5 hours to process 50-fold whole genome sequencing (~750 million 100 bp paired-end reads), or just 25 minutes for 210-fold whole exome sequencing. BALSA’s speed is rooted in its parallel algorithms, which effectively exploit a GPU to speed up processes like alignment, realignment and statistical testing. BALSA incorporates a 16-genotype model to support the calling of SNPs and Indels and achieves competitive variant calling accuracy and sensitivity when compared to the ensemble of six popular variant callers. BALSA also supports efficient identification of somatic SNVs and CNVs; experiments showed that BALSA recovers all the previously validated somatic SNVs and CNVs, and it is more sensitive for somatic Indel detection. BALSA outputs variants in VCF format. A pileup-like SNAPSHOT format, while maintaining the same fidelity as BAM in variant calling, enables efficient storage and indexing, and facilitates the development of apps for downstream analyses. BALSA is available at: http://sourceforge.net/p/balsa


BMC Genomics ◽  
2017 ◽  
Vol 18 (1) ◽  
Author(s):  
Daniela Esser ◽  
Niklas Holze ◽  
Jochen Haag ◽  
Stefan Schreiber ◽  
Sandra Krüger ◽  
...  

2017 ◽  
Author(s):  
Daeyoon Kim ◽  
Sung-Soo Yoon ◽  
Youngil Koh ◽  
Su Yeon Lee ◽  
Hongseok Yun ◽  
...  

2014 ◽  
Vol 6 (10) ◽  
Author(s):  
Han Fang ◽  
Yiyang Wu ◽  
Giuseppe Narzisi ◽  
Jason A O'Rawe ◽  
Laura T Jimenez Barrón ◽  
...  

2013 ◽  
Vol 3 (1) ◽  
Author(s):  
Daichi Shigemizu ◽  
Akihiro Fujimoto ◽  
Shintaro Akiyama ◽  
Tetsuo Abe ◽  
Kaoru Nakano ◽  
...  

Author(s):  
Elke de Boer ◽  
Burcu Yaldiz ◽  
Anne-Sophie Denommé-Pichon ◽  
Leslie Matalonga ◽  
Steve Laurie ◽  
...  

