The Impact of Collisions on the Ability to Detect Rare Mutant Alleles Using Barcode-Type Next-Generation Sequencing Techniques

Barcoding techniques are used to reduce error from next-generation sequencing, with applications ranging from understanding tumor subclone populations to detecting circulating tumor DNA. Collisions occur when more than one sample molecule is tagged by the same unique identifier (UID) and can result in failure to detect very-low-frequency mutations and error in estimating mutation frequency. Here, we created computer models of the barcoding technique, with and without amplification bias introduced by the UID, and analyzed the effect of collisions for a range of mutant allele frequencies (1e−6 to 0.2), numbers of sample molecules (10 000 to 1e7), and numbers of UIDs (4¹⁰ to 4¹⁴). Inability to detect rare mutant alleles occurred in 0% to 100% of simulations, depending on collisions and number of mutant molecules. Collisions also introduced error in estimating mutant allele frequency, resulting in underestimation of minor allele frequency. Incorporating an understanding of the effect of collisions into experimental design can allow for optimization of the number of sample molecules and number of UIDs to minimize the negative impact on rare mutant detection and mutant frequency estimation.
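To give a feel for how the number of sample molecules and the number of UIDs interact, the following is a minimal sketch rather than the authors' model. It assumes each molecule receives a UID uniformly at random and independently of the others, and it ignores the amplification bias that the study also simulates. The function name collision_fraction is hypothetical.

```python
import math


def collision_fraction(n_molecules: int, n_uids: int) -> float:
    """Probability that a given molecule shares its UID with at least one other.

    Assumption (not from the abstract): UIDs are assigned uniformly at random
    and independently, so the chance that none of the other n_molecules - 1
    molecules picks the same UID is (1 - 1/n_uids) ** (n_molecules - 1).
    """
    if n_molecules < 1 or n_uids < 2:
        raise ValueError("need at least 1 molecule and 2 UIDs")
    # log1p/expm1 keep precision when 1/n_uids is very small.
    log_no_match = (n_molecules - 1) * math.log1p(-1.0 / n_uids)
    return -math.expm1(log_no_match)


if __name__ == "__main__":
    # Ranges taken from the abstract: 10 000 to 1e7 molecules, 4^10 to 4^14 UIDs.
    for uid_length in (10, 12, 14):
        for n in (10_000, 1_000_000, 10_000_000):
            frac = collision_fraction(n, 4 ** uid_length)
            print(f"UID length {uid_length}, {n:>10} molecules: {frac:.4f}")
```

Under this simplified model, the collision fraction grows with the ratio of molecules to UIDs, which is why the two quantities have to be chosen together when designing an experiment.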


High throughput crop genome genotyping by a combination of pool next generation sequencing and haplotype-based data processing

10.21203/rs.3.rs-415602/v1 ◽

2021 ◽

Author(s):

Michael Schneider ◽

Asis Shrestha ◽

Agim Ballvora ◽

Jens Leon

Keyword(s):

Next Generation Sequencing ◽

Allele Frequency ◽

Frequency Estimation ◽

Whole Genome ◽

Next Generation ◽

Conservation Genomics ◽

High Coverage ◽

Allele Frequency Estimation ◽

Low Coverage ◽

Generation Sequencing

Abstract Background: The identification of environmentally specific alleles and the observation of evolutionary processes are goals of conservation genomics. By tracking generational changes of allele frequencies in populations, questions regarding effective population size, gene flow, drift, and selection can be addressed. Observing such effects often involves a trade-off between cost and resolution, because a sufficiently large sample of genotypes has to be genotyped at many loci. Pool genotyping approaches can achieve high resolution and precision in allele frequency estimation when high-coverage sequencing is used. Still, high-coverage pool sequencing of big genomes is costly. Results: Here we present a reliable method to estimate a barley population's allele frequency from low-coverage sequencing. Three hundred genotypes were sampled from a barley backcross population to estimate the entire population's allele frequency. The allele frequency estimation accuracy and yield were compared for three next-generation sequencing methods. To obtain accurate allele frequency estimates at low sequencing coverage, a haplotyping approach was applied. Low-coverage allele frequencies of positionally connected single polymorphisms were aggregated into a single haplotype allele frequency, resulting in two to 271 times higher depth and increased precision. We compared different haplotyping tactics, showing that gene- and chip-marker-based haplotypes perform on par with or better than simple contig haplotype windows. Comparing multiple pool samples and referencing them against an individual sequencing approach showed that whole-genome pool resequencing had the highest correlation to individual genotyping (up to 0.97), while transcriptomics and genotyping by sequencing showed higher error rates and lower correlations. Conclusion: The proposed method allows the allele frequency of populations to be identified with high accuracy at low cost. This is particularly interesting for conservation genomics in species with big genomes, like barley or wheat. Whole-genome low-coverage resequencing at 10x coverage can deliver a highly accurate estimate of the allele frequency when a loci-based haplotyping approach is applied. Using annotated haplotypes makes it possible to capitalize on biological background knowledge and statistical robustness.
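The aggregation step described above can be illustrated with a small sketch. This is not the authors' pipeline. It assumes that the SNPs grouped into one haplotype are in complete linkage and that their alternative alleles are already oriented to the same parental haplotype, so read counts can simply be pooled. The function name and the input layout, a list of (alt_reads, total_reads) pairs, are hypothetical.

```python
from typing import Iterable, Tuple


def haplotype_allele_frequency(
    snp_counts: Iterable[Tuple[int, int]],
) -> Tuple[float, int]:
    """Pool per-SNP read counts into one haplotype-level frequency estimate.

    snp_counts holds (alt_reads, total_reads) for each SNP in the haplotype.
    Assumption: all SNPs tag the same haplotype, so their reads are treated
    as repeated observations of a single underlying allele frequency.
    Returns (pooled_frequency, pooled_depth).
    """
    alt_total = 0
    depth_total = 0
    for alt_reads, total_reads in snp_counts:
        if not 0 <= alt_reads <= total_reads:
            raise ValueError("alt_reads must lie between 0 and total_reads")
        alt_total += alt_reads
        depth_total += total_reads
    if depth_total == 0:
        raise ValueError("no reads cover this haplotype")
    return alt_total / depth_total, depth_total


if __name__ == "__main__":
    # Three low-coverage SNPs pooled into one estimate at higher depth.
    freq, depth = haplotype_allele_frequency([(1, 3), (2, 4), (0, 3)])
    print(f"pooled frequency {freq:.2f} at depth {depth}")
```

Each SNP on its own is covered by only three or four reads here, while the pooled estimate rests on ten, which is the source of the gain in depth and precision that the abstract reports.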


Biases and Errors on Allele Frequency Estimation and Disease Association Tests of Next-Generation Sequencing of Pooled Samples

Genetic Epidemiology ◽

10.1002/gepi.21648 ◽

2012 ◽

Vol 36 (6) ◽

pp. 549-560 ◽

Cited By ~ 17

Author(s):

Xiaowei Chen ◽

Jennifer B. Listman ◽

Frank J. Slack ◽

Joel Gelernter ◽

Hongyu Zhao

Keyword(s):

Next Generation Sequencing ◽

Allele Frequency ◽

Frequency Estimation ◽

Disease Association ◽

Next Generation ◽

Association Tests ◽

Allele Frequency Estimation ◽

Pooled Samples ◽

Generation Sequencing


Mitochondrial disorders in the Arab Middle East population: the impact of next generation sequencing on the genetic diagnosis.

Journal of Biochemical and Clinical Genetics ◽

10.24911/jbcgenetics/183-1548325196 ◽

2019 ◽

pp. 54-64

Author(s):

Ahmad Alahmad ◽

Hebatallah Muhammad ◽

Angela Pyle ◽

Buthaina Albash ◽

Robert McFarland ◽

...

Keyword(s):

Next Generation Sequencing ◽

Middle East ◽

Genetic Diagnosis ◽

Mitochondrial Disorders ◽

Next Generation ◽

The Impact ◽

Generation Sequencing ◽

Arab Middle East


The broad variety of next-generation sequencing methodologies and the impact on biomarker testing

10.26226/morressier.578f37fed462b8028d88fe24 ◽

2016 ◽

Author(s):

Véronique Tack

Keyword(s):

Next Generation Sequencing ◽

Next Generation ◽

Biomarker Testing ◽

Broad Variety ◽

The Impact ◽

Generation Sequencing


Suitability of Bronchoscopic Biopsy Tissue Samples for Next-Generation Sequencing

Diagnostics ◽

10.3390/diagnostics11030391 ◽

2021 ◽

Vol 11 (3) ◽

pp. 391

Author(s):

Shuji Murakami ◽

Tomoyuki Yokose ◽

Daiji Nemoto ◽

Masaki Suzuki ◽

Ryou Usui ◽

...

Keyword(s):

Next Generation Sequencing ◽

Success Rate ◽

Rna Sequencing ◽

Surface Area ◽

Endobronchial Ultrasound ◽

Next Generation ◽

Endobronchial Ultrasonography ◽

Tissue Surface ◽

The Impact ◽

Generation Sequencing

A sufficiently large tissue sample is required to perform next-generation sequencing (NGS) with a high success rate, but the majority of patients with advanced non-small-cell lung cancer (NSCLC) are diagnosed with small biopsy specimens. Biopsy samples were collected from 184 patients with bronchoscopically diagnosed NSCLC. The tissue surface area, tumor cell count, and tumor content rate of each biopsy sample were evaluated. The impact of the cut-off criteria for the tissue surface area (≥ 1 mm²) and tumor content rate (≥ 30%) on the success rate of the Oncomine Dx Target Test (ODxTT) was evaluated. The mean tissue surface area of the transbronchial biopsies was 1.23 ± 0.85 mm² when small endobronchial ultrasonography with a guide sheath (EBUS-GS) was used, 2.16 ± 1.49 mm² with large EBUS-GS, and 1.81 ± 0.75 mm² with endobronchial biopsy (EBB). The proportion of samples with a tissue surface area of ≥ 1 mm² was 48.8% for small EBUS-GS, 79.2% for large EBUS-GS, and 78.6% for EBB. Sixty-nine patients underwent ODxTT. The success rate of DNA sequencing was 84.1% and that of RNA sequencing was 92.7% over all patients. The success rate of DNA (RNA) sequencing was 57.1% (71.4%) for small EBUS-GS (n = 14), 93.4% (96.9%) for large EBUS-GS (n = 32), 62.5% (100%) for EBB (n = 8), and 100% (100%) for endobronchial ultrasound-guided transbronchial needle aspiration (EBUS-TBNA) (n = 15). Regardless of the device used, a tissue surface area of ≥ 1 mm² is adequate for samples to be tested with NGS.


Does histological assessment accurately distinguish separate primary lung adenocarcinomas from intrapulmonary metastases? A study of paired resected lung nodules in 32 patients using a routine next-generation sequencing panel for driver mutations

Journal of Clinical Pathology ◽

10.1136/jclinpath-2021-207421 ◽

2021 ◽

pp. jclinpath-2021-207421

Author(s):

Frido K Bruehl ◽

Erika E Doxtader ◽

Yu-Wei Cheng ◽

Daniel H Farkas ◽

Carol Farver ◽

...

Keyword(s):

Next Generation Sequencing ◽

Molecular Analysis ◽

Low Frequency ◽

Driver Mutations ◽

Lung Nodules ◽

Driver Gene ◽

Next Generation ◽

Histological Assessment ◽

Lung Adenocarcinomas ◽

Generation Sequencing

Aim: Various approaches have been reported for distinguishing separate primary lung adenocarcinomas from intrapulmonary metastases in patients with two lung nodules. The aim of this study was to determine whether histological assessment is reliable and accurate in distinguishing separate primary lung adenocarcinomas from intrapulmonary metastases using routine molecular findings as an adjunct. Methods: We studied resected tumour pairs from 32 patients with lung adenocarcinomas in different lobes. In 15 of 32 tumour pairs, next-generation sequencing (NGS) for common driver mutations was performed on both nodules. The remainder of tumour pairs underwent limited NGS, or EGFR genotyping. Tumour pairs with different drivers (or one driver/one wild-type) were classified as molecularly unrelated, while those with identical low-frequency drivers were classified as related. Three pathologists independently and blinded to the molecular results categorised tumour pairs as related or unrelated based on histological assessment. Results: Of 32 pairs, 15 were classified as related by histological assessment, and 17 as unrelated. Of 15 classified as related by histology, 6 were classified as related by molecular analysis, 4 were unrelated and 5 were indeterminate. Of 17 classified as unrelated by histology, 14 were classified as unrelated by molecular analysis, none was related and 3 were indeterminate. Histological assessment of relatedness was inaccurate in 4/32 (12.5%) tumour pairs. Conclusions: A small but significant subset of two-nodule adenocarcinoma pairs is inaccurately judged as related by histological assessment, and can be proven to be unrelated by molecular analysis (driver gene mutations), leading to significant downstaging.


Next-Generation Sequencing: From Basic Research to Diagnostics

Clinical Chemistry ◽

10.1373/clinchem.2008.112789 ◽

2009 ◽

Vol 55 (4) ◽

pp. 641-658 ◽

Cited By ~ 459

Author(s):

Karl V Voelkerding ◽

Shale A Dames ◽

Jacob D Durtschi

Keyword(s):

Next Generation Sequencing ◽

Dna Sequencing ◽

Single Molecule ◽

Basic Research ◽

Clinical Diagnostics ◽

Massively Parallel ◽

Next Generation ◽

New Paradigm ◽

The Impact ◽

Generation Sequencing

Abstract Background: For the past 30 years, the Sanger method has been the dominant approach and gold standard for DNA sequencing. The commercial launch of the first massively parallel pyrosequencing platform in 2005 ushered in the new era of high-throughput genomic analysis now referred to as next-generation sequencing (NGS). Content: This review describes fundamental principles of commercially available NGS platforms. Although the platforms differ in their engineering configurations and sequencing chemistries, they share a technical paradigm in that sequencing of spatially separated, clonally amplified DNA templates or single DNA molecules is performed in a flow cell in a massively parallel manner. Through iterative cycles of polymerase-mediated nucleotide extensions or, in one approach, through successive oligonucleotide ligations, sequence outputs in the range of hundreds of megabases to gigabases are now obtained routinely. Highlighted in this review are the impact of NGS on basic research, bioinformatics considerations, and translation of this technology into clinical diagnostics. Also presented is a view into future technologies, including real-time single-molecule DNA sequencing and nanopore-based sequencing. Summary: In the relatively short time frame since 2005, NGS has fundamentally altered genomics research and allowed investigators to conduct experiments that were previously not technically feasible or affordable. The various technologies that constitute this new paradigm continue to evolve, and further improvements in technology robustness and process streamlining will pave the path for translation into clinical diagnostics.


Estimating Allele Frequency from Next-Generation Sequencing of Pooled Mitochondrial DNA Samples

Frontiers in Genetics ◽

10.3389/fgene.2011.00051 ◽

2011 ◽

Vol 2 ◽

Cited By ~ 5

Author(s):

Tao Wang ◽

Kith Pradhan ◽

Kenny Ye ◽

Lee-Jun Wong ◽

Thomas E. Rohan

Keyword(s):

Mitochondrial Dna ◽

Next Generation Sequencing ◽

Allele Frequency ◽

Next Generation ◽

Generation Sequencing


Profiling soil microbial communities with next-generation sequencing: the influence of DNA kit selection and technician technical expertise

10.7287/peerj.preprints.3073 ◽

2017 ◽

Author(s):

Taha Soliman ◽

Sung-Yin Yang ◽

Tomoko Yamazaki ◽

Holger Jenke-Kodama

Keyword(s):

Next Generation Sequencing ◽

Microbial Communities ◽

Dna Extraction ◽

Dna Isolation ◽

Soil Microbial Communities ◽

Next Generation ◽

Microbial Composition ◽

Okinawa Island ◽

The Impact ◽

Generation Sequencing

Structure and diversity of microbial communities are an important research topic in biology, since microbes play essential roles in the ecology of various environments. Different DNA isolation protocols can lead to data bias and can affect results of next-generation sequencing. To evaluate the impact of protocols for DNA isolation from soil samples and also the influence of individual handling of samples, we compared results obtained by two researchers (R and T) using two different DNA extraction kits: (1) MO BIO PowerSoil® DNA Isolation kit (MO_R and MO_T) and (2) NucleoSpin® Soil kit (MN_R and MN_T). Samples were collected from six different sites on Okinawa Island, Japan. For all sites, differences in the results of microbial composition analyses (bacteria, archaea, fungi, and other eukaryotes), obtained by the two researchers using the two kits, were analyzed. For both researchers, the MN kit gave significantly higher yields of genomic DNA at all sites compared to the MO kit (ANOVA; P <0.006). In addition, operational taxonomic units for some phyla and classes were missed in some cases: Micrarchaea were detected only in the MN_T and MO_R analyses; the bacterial phylum Armatimonadetes was detected only in MO_R and MO_T; and WIM5 of the phylum Amoebozoa of eukaryotes was found only in the MO_T analysis. Our results suggest the possibility of handling bias; therefore, it is crucial that replicated DNA extraction be performed by at least two technicians for thorough microbial analyses and to obtain accurate estimates of microbial diversity.


Detection of genome-wide low-frequency mutations with Paired-End and Complementary Consensus Sequencing (PECC-Seq) revealed end-repair derived artifacts as residual errors

10.1101/2019.12.22.886440 ◽

2019 ◽

Author(s):

Xinyue You ◽

Suresh Thiruppathi ◽

Weiying Liu ◽

Yiyi Cao ◽

Mikihiko Naito ◽

...

Keyword(s):

Next Generation Sequencing ◽

Low Frequency ◽

Repair Process ◽

Sequencing Error ◽

Base Substitution ◽

Next Generation ◽

Background Error ◽

Technical Errors ◽

Genome Wide ◽

Generation Sequencing

Abstract: To improve the accuracy and the cost-efficiency of next-generation sequencing in ultralow-frequency mutation detection, we developed Paired-End and Complementary Consensus Sequencing (PECC-Seq), a PCR-free duplex consensus sequencing approach. PECC-Seq employs shear points as endogenous barcodes to identify consensus sequences from the overlap in the shortened paired-end reads derived from complementary DNA strands, for sequencing error correction. With the high accuracy of PECC-Seq, we identified the characteristic base substitution errors introduced by the end-repair process of mechanical fragmentation-based library preparations, which were prominent at the terminal 6 bp of the library fragments in the 5’-NpCpA-3’ or 5’-NpCpT-3’ trinucleotide context. As demonstrated at the human genome scale (TK6 cells), after removing these potential end-repair artifacts from the terminal 6 bp, PECC-Seq could reduce the sequencing error frequency to mid-10⁻⁷ with a relatively low sequencing depth. For TA base pairs, the background error rate could be suppressed to mid-10⁻⁸. In mutagen-treated TK6 cells, slight increases in mutagen treatment-related mutant frequencies could be detected, indicating the potential of PECC-Seq in detecting genome-wide ultra-rare mutations. In addition, our findings on the patterns of end-repair artifacts may provide new insights into further reducing technical errors not only for PECC-Seq, but also for other next-generation sequencing techniques.
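The general idea of a duplex consensus, keeping only bases on which the two complementary strands agree, can be sketched as follows. This is a toy illustration and not the PECC-Seq implementation. It assumes both reads cover exactly the same fragment span, contain no indels, and carry no base qualities. The function names are hypothetical.

```python
_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an upper-case DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def duplex_consensus(forward_read: str, reverse_read: str) -> str:
    """Build a consensus from two reads of complementary strands.

    reverse_read is given as sequenced, i.e. on the opposite strand.
    Assumption: both reads span the same positions, so after reverse
    complementing they can be compared base by base. Positions where the
    strands disagree are masked with 'N' instead of being called.
    """
    mate = reverse_complement(reverse_read)
    if len(mate) != len(forward_read):
        raise ValueError("reads must cover the same span in this sketch")
    return "".join(a if a == b else "N" for a, b in zip(forward_read, mate))


if __name__ == "__main__":
    # Strands agree everywhere: the consensus equals the forward read.
    print(duplex_consensus("ACGTA", "TACGT"))
    # One strand carries an error at the third base: that position is masked.
    print(duplex_consensus("ACGTA", "TAGGT"))
```

Masking disagreements is one simple policy. A real pipeline would also need alignment, quality filtering, and trimming rules such as the terminal 6 bp handling mentioned in the abstract, all of which are omitted here.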
