Design and Statistical Analysis of Pooled Next Generation Sequencing for Rare Variants

2012 ◽  
Vol 2012 ◽  
pp. 1-19 ◽  
Author(s):  
Tao Wang ◽  
Chang-Yun Lin ◽  
Yuanhao Zhang ◽  
Ruofeng Wen ◽  
Kenny Ye

Next generation sequencing (NGS) is a revolutionary technology for biomedical research. One highly cost-efficient application of NGS is to detect disease association based on pooled DNA samples. However, several key issues need to be addressed for pooled NGS. One of them is the high sequencing error rate and its high variability across genomic positions and experiment runs, which, if not well considered in the experimental design and analysis, could lead to either inflated false positive rates or a loss of statistical power. Another important issue is how to test the association of a group of rare variants. To address the first issue, we proposed a new blocked pooling design in which multiple pools of DNA samples from cases and controls are sequenced together on the same NGS functional units. To address the second issue, we proposed a testing procedure that does not require individual genotypes but instead takes advantage of multiple DNA pools. Through a simulation study, we demonstrated that our approach provides good control of the type I error rate and yields satisfactory power compared to the test based on individual genotypes. Our results also provide guidelines for designing an efficient pooled.
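The blocking idea in this abstract can be illustrated with a minimal sketch. It assumes that a "functional unit" is something like a sequencing lane and that each block should hold equal numbers of case and control pools; the function name `blocked_layout` and its parameters are hypothetical and are not taken from the paper.

```python
def blocked_layout(case_pools, control_pools, pools_per_block):
    """Group pools into blocks so every block holds both cases and controls.

    Assumption (not from the paper): each block takes an equal number of
    case and control pools, so run-to-run error differences affect both
    groups alike.
    """
    if pools_per_block % 2 != 0:
        raise ValueError("pools_per_block must be even")
    half = pools_per_block // 2
    if len(case_pools) != len(control_pools):
        raise ValueError("need the same number of case and control pools")
    if len(case_pools) % half != 0:
        raise ValueError("pools do not divide evenly into blocks")

    blocks = []
    for start in range(0, len(case_pools), half):
        blocks.append({
            "cases": case_pools[start:start + half],
            "controls": control_pools[start:start + half],
        })
    return blocks


layout = blocked_layout(["case1", "case2", "case3", "case4"],
                        ["ctrl1", "ctrl2", "ctrl3", "ctrl4"],
                        pools_per_block=4)
```

Because cases and controls share each block, a comparison made within a block is not confounded with that block's particular error rate.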

Mathematics ◽  
2020 ◽  
Vol 8 (2) ◽  
pp. 217
Author(s):  
Jung Yeon Lee ◽  
Myeong-Kyu Kim ◽  
Wonkuk Kim

Low-coverage next-generation sequencing experiments assisted by statistical methods are popular in genetic association studies. Next-generation sequencing experiments produce genotype data that include allele read counts and read depths. For low sequencing depths, the genotypes tend to be highly uncertain; therefore, the uncertain genotypes are usually removed or imputed before performing a statistical analysis. This may result in an inflated type I error rate and a loss of statistical power. In this paper, we propose a mixture-based penalized score association test adjusting for non-genetic covariates. The proposed score test statistic is based on a sandwich variance estimator so that it is robust under model misspecification between the covariates and the latent genotypes. The proposed method has the advantage of requiring neither external imputation nor elimination of uncertain genotypes. The results of our simulation study show that the type I error rates are well controlled and that the proposed association test has reasonable statistical power. As an illustration, we apply our statistic to pharmacogenomics data for drug responsiveness among 400 epilepsy patients.
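Why genotypes are uncertain at low depth can be shown with a small sketch. It is a textbook-style calculation, not the mixture model of the paper: it assumes a biallelic site, a binomial read model, a Hardy-Weinberg prior, and illustrative values for the per-read error rate and the alternate-allele frequency. The function name and all defaults are hypothetical.

```python
def genotype_posteriors(alt_reads, depth, error=0.01, alt_freq=0.2):
    """Posterior P(genotype | reads) for 0, 1 or 2 alternate alleles.

    Assumptions (for illustration only): reads are independent, each read
    shows the alternate allele with probability error, 0.5 or 1 - error for
    genotypes 0, 1 and 2, and genotypes follow Hardy-Weinberg proportions.
    """
    p_alt_read = [error, 0.5, 1.0 - error]
    priors = [(1 - alt_freq) ** 2, 2 * alt_freq * (1 - alt_freq), alt_freq ** 2]
    ref_reads = depth - alt_reads
    # The binomial coefficient is the same for every genotype, so it cancels.
    weights = [prior * p ** alt_reads * (1 - p) ** ref_reads
               for prior, p in zip(priors, p_alt_read)]
    total = sum(weights)
    return [w / total for w in weights]


low_depth = genotype_posteriors(alt_reads=0, depth=2)
high_depth = genotype_posteriors(alt_reads=0, depth=30)
```

With two reference reads and no alternate reads, the heterozygous genotype keeps a posterior probability above 10% under these assumptions; with thirty reference reads it becomes negligible.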


2019 ◽  
Author(s):  
Tiffany M. Delhomme ◽  
Patrice H. Avogbe ◽  
Aurélie Gabriel ◽  
Nicolas Alcala ◽  
Noemie Leblay ◽  
...  

Abstract The emergence of Next-Generation Sequencing (NGS) has revolutionized the way of reaching a genome sequence, with the promise of potentially providing a comprehensive characterization of DNA variations. Nevertheless, detecting somatic mutations is still a difficult problem, in particular when trying to identify low abundance mutations such as subclonal mutations, tumour-derived alterations in body fluids or somatic mutations from histologically normal tissue. The main challenge is to precisely distinguish between sequencing artefacts and true mutations, particularly when the latter are so rare that they reach abundance levels similar to those of artefacts. Here, we present needlestack, a highly sensitive variant caller, which directly learns from the data the level of systematic sequencing errors to accurately call mutations. Needlestack is based on the idea that the sequencing error rate can be dynamically estimated by analyzing multiple samples together. We show that the sequencing error rate varies across alterations, illustrating the need to precisely estimate it. We evaluate the performance of needlestack for various types of variations, and we show that needlestack is robust among positions and outperforms existing state-of-the-art methods for low abundance mutations. Needlestack, along with its source code, is freely available on the GitHub platform: https://github.com/IARCbioinfo/needlestack.


2020 ◽  
Vol 2 (2) ◽  
Author(s):  
Tiffany M Delhomme ◽  
Patrice H Avogbe ◽  
Aurélie A G Gabriel ◽  
Nicolas Alcala ◽  
Noemie Leblay ◽  
...  

Abstract The emergence of next-generation sequencing (NGS) has revolutionized the way of reaching a genome sequence, with the promise of potentially providing a comprehensive characterization of DNA variations. Nevertheless, detecting somatic mutations is still a difficult problem, in particular when trying to identify low abundance mutations, such as subclonal mutations, tumour-derived alterations in body fluids or somatic mutations from histologically normal tissue. The main challenge is to precisely distinguish between sequencing artefacts and true mutations, particularly when the latter are so rare that they reach abundance levels similar to those of artefacts. Here, we present needlestack, a highly sensitive variant caller, which directly learns from the data the level of systematic sequencing errors to accurately call mutations. Needlestack is based on the idea that the sequencing error rate can be dynamically estimated by analysing multiple samples together. We show that the sequencing error rate varies across alterations, illustrating the need to precisely estimate it. We evaluate the performance of needlestack for various types of variations, and we show that needlestack is robust among positions and outperforms existing state-of-the-art methods for low abundance mutations. Needlestack, along with its source code, is freely available on the GitHub platform: https://github.com/IARCbioinfo/needlestack.
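The idea of estimating the error rate from many samples at once can be sketched as follows. This is a deliberately simplified stand-in and not needlestack's own model: it assumes the error rate at one site and alteration is the median alternate-read fraction across samples, and it flags samples with a binomial upper-tail test. The function names, the `alpha` threshold and the `min_rate` floor are hypothetical.

```python
import math
from statistics import median


def binom_upper_tail(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p), summed in log space for stability."""
    if k <= 0:
        return 1.0
    total = 0.0
    for i in range(k, n + 1):
        log_pmf = (math.lgamma(n + 1) - math.lgamma(i + 1)
                   - math.lgamma(n - i + 1)
                   + i * math.log(p) + (n - i) * math.log1p(-p))
        total += math.exp(log_pmf)
    return min(total, 1.0)


def flag_candidates(counts, alpha=1e-3, min_rate=1e-6):
    """Return indices of samples whose alternate reads exceed the shared error.

    counts: list of (alt_reads, depth) for one site and one alteration.
    Assumption: most samples carry no mutation, so the median alternate
    fraction approximates the systematic error rate at this site.
    """
    fractions = [alt / depth for alt, depth in counts if depth > 0]
    error_rate = min(max(median(fractions), min_rate), 0.5)
    flagged = []
    for index, (alt, depth) in enumerate(counts):
        if depth > 0 and binom_upper_tail(alt, depth, error_rate) < alpha:
            flagged.append(index)
    return flagged


site_counts = [(2, 1000), (1, 1000), (3, 1000), (2, 1000), (40, 1000)]
candidates = flag_candidates(site_counts)
```

In this toy input the first four samples sit at the shared background level, while the fifth carries far more alternate reads than that background explains.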


2021 ◽  
Vol 16 (1) ◽  
Author(s):  
Shunqiao Feng ◽  
Lin Han ◽  
Mei Yue ◽  
Dixiao Zhong ◽  
Jing Cao ◽  
...  

Abstract Background: Langerhans cell histiocytosis (LCH) is a rare neoplastic disease that occurs in both children and adults, and BRAF V600E is detected in up to 64% of patients. Several studies have discussed the associations between the BRAF V600E mutation and clinicopathological manifestations, but no clear conclusions have been drawn regarding the clinical significance of the mutation in pediatric patients. Results: We retrieved the clinical information for 148 pediatric LCH patients and investigated the BRAF V600E mutation using next-generation sequencing alone or with droplet digital PCR. The overall positive rate of BRAF V600E was 60/148 (41%). The type of sample (peripheral blood and formalin-fixed paraffin-embedded tissue) used for testing was significantly associated with the BRAF V600E mutation status (p-value = 0.000 and 0.000). The risk of recurrence declined in patients who received targeted therapy (p-value = 0.006; hazard ratio 0.164, 95% CI: 0.046 to 0.583). However, no correlation was found between the BRAF V600E status and gender, age, stage, specific organ affected, TP53 mutation status, masses close to the lesion or recurrence. Conclusions: This is the largest pediatric LCH study conducted with a Chinese population to date. BRAF V600E in LCH may occur less frequently in East Asian populations than in other ethnic groups, regardless of age. Biopsy tissue is a more sensitive sample for BRAF mutation screening because not all circulating DNA is tumor-derived. Approaches with a low limit of detection or high sensitivity are recommended for mutation screening to avoid type I and II errors.


2019 ◽  
Author(s):  
Xinyue You ◽  
Suresh Thiruppathi ◽  
Weiying Liu ◽  
Yiyi Cao ◽  
Mikihiko Naito ◽  
...  

Abstract To improve the accuracy and the cost-efficiency of next-generation sequencing in ultralow-frequency mutation detection, we developed Paired-End and Complementary Consensus Sequencing (PECC-Seq), a PCR-free duplex consensus sequencing approach. PECC-Seq employed shear points as endogenous barcodes to identify consensus sequences from the overlap in the paired-end reads derived from the shortened, complementary DNA strands, for sequencing error correction. With the high accuracy of PECC-Seq, we identified the characteristic base substitution errors introduced by the end-repair process of mechanical fragmentation-based library preparations, which were prominent at the terminal 6 bp of the library fragments in the 5’-NpCpA-3’ or 5’-NpCpT-3’ trinucleotide context. As demonstrated at the human genome scale (TK6 cells), after removing these potential end-repair artifacts from the terminal 6 bp, PECC-Seq could reduce the sequencing error frequency to the mid-10⁻⁷ range with a relatively low sequencing depth. For TA base pairs, the background error rate could be suppressed to the mid-10⁻⁸ range. In mutagen-treated TK6 cells, slight increases in mutagen treatment-related mutant frequencies could be detected, indicating the potential of PECC-Seq in detecting genome-wide ultra-rare mutations. In addition, our finding on the patterns of end-repair artifacts may provide new insights into further reducing technical errors not only for PECC-Seq, but also for other next-generation sequencing techniques.
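The core of a duplex consensus can be sketched in a few lines. This is only an illustration of the general principle, not the PECC-Seq pipeline: it assumes the two strand reads have already been paired and cover exactly the same interval, it skips the shear-point grouping entirely, and the function names are hypothetical. The 6 bp trim mirrors the terminal region named in the abstract.

```python
_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq):
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def duplex_consensus(top, bottom):
    """Keep a base only where both strands agree; mask disagreements as N.

    top and bottom are reads from the two complementary strands, each
    written 5' to 3', assumed to cover the same interval.
    """
    bottom_as_top = reverse_complement(bottom)
    if len(top) != len(bottom_as_top):
        raise ValueError("strand reads must cover the same interval")
    return "".join(a if a == b else "N" for a, b in zip(top, bottom_as_top))


def trim_ends(seq, n=6):
    """Drop n bases from each end, where end-repair artifacts concentrate."""
    return seq[n:len(seq) - n] if len(seq) > 2 * n else ""


clean = duplex_consensus("ACGTTA", "TAACGT")
masked = duplex_consensus("ACGTTA", "TAACGA")
```

A sequencing error normally appears on one strand only, so requiring agreement between strands removes it; a true mutation is present on both strands and survives.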


2019 ◽  
Vol 44 ◽  
pp. 14-20 ◽  
Author(s):  
Isabel Diebold ◽  
Ulrike Schön ◽  
Rita Horvath ◽  
Oliver Schwartz ◽  
Elke Holinski-Feder ◽  
...  
