ECNano: A Cost-Effective Workflow for Target Enrichment Sequencing and Accurate Variant Calling on 4,800 Clinically Significant Genes Using a Single MinION Flowcell

2021 ◽  
Author(s):  
Amy Wing-Sze Leung ◽  
Henry Chi-Ming Leung ◽  
Chak-Lim Wong ◽  
Zhen-Xian Zheng ◽  
Wui-Wang Lui ◽  
...  

Background: Long-read sequencing with the Oxford Nanopore Technologies (ONT) MinION sequencer is finding increasingly diverse applications in the medical field. However, the high sequencing error rate of ONT and the limited throughput of a single MinION flowcell limit its applicability for accurate variant detection. Medical exome sequencing (MES) targets clinically significant exon regions, allowing rapid and comprehensive screening of pathogenic variants. By applying MES with MinION sequencing, the technology can achieve a more uniform capture of the target regions, shorter turnaround time, and lower sequencing cost per sample.
Method: We introduced a cost-effective optimized workflow, ECNano, comprising a wet-lab protocol and bioinformatics analysis, for accurate variant detection at 4,800 clinically important genes and regions using a single MinION flowcell. The ECNano wet-lab protocol was optimized to perform long-read target enrichment and ONT library preparation to stably generate high-quality MES data with adequate coverage. The subsequent variant-calling workflow, Clair-ensemble, adopted a fast RNN-based variant caller, Clair, and was optimized for target enrichment data. To evaluate its performance and practicality, ECNano was tested on both reference DNA samples and patient samples.
Results: ECNano achieved an average on-target depth of coverage (DoC) of >100x with >98% uniformity using one MinION flowcell. For accurate ONT variant calling, the generated reads sufficiently covered 98.9% of pathogenic positions listed in ClinVar, with 98.96% having at least 30x DoC. ECNano obtained an average read length of 1,000 bp. The long reads of ECNano also covered the adjacent splice sites well, with 98.5% of positions having ≥ 30x DoC. Clair-ensemble achieved >99% recall and accuracy for SNV calling. The whole workflow, from wet-lab protocol to variant detection, was completed within three days.
Conclusion: We presented ECNano, an out-of-the-box workflow comprising (1) a wet-lab protocol for ONT target enrichment sequencing and (2) a downstream variant detection workflow, Clair-ensemble. The workflow is cost-effective, with a short turnaround time for high accuracy variant calling in 4,800 clinically significant genes and regions using a single MinION flowcell. The long-read exon captured data has potential for further development, promoting the application of long-read sequencing in personalized disease treatment and risk prediction.
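
The coverage figures quoted in the Results (average DoC, share of positions at ≥ 30x, uniformity) can be illustrated with a small Python sketch. It is not taken from the ECNano workflow: the function name `coverage_summary` is hypothetical, and the uniformity definition used here (share of positions with at least 20% of the mean depth) is an assumed convention, since the abstract does not state which definition was used.

```python
from statistics import mean


def coverage_summary(depths, min_depth=30, uniformity_fraction=0.2):
    """Summarise per-base depth over a list of target positions.

    uniformity_fraction is an ASSUMED convention (depth >= 20% of the mean);
    the abstract does not define uniformity.
    """
    if not depths:
        raise ValueError("depths must not be empty")
    avg = mean(depths)
    n = len(depths)
    return {
        "mean_depth": avg,
        # Share of positions reaching the minimum depth (30x in the abstract).
        "frac_ge_min_depth": sum(d >= min_depth for d in depths) / n,
        # Share of positions that are not badly under-covered relative to the mean.
        "uniformity": sum(d >= uniformity_fraction * avg for d in depths) / n,
    }


# Toy per-base depths, invented for illustration only.
depths = [120, 95, 110, 28, 101, 0, 130, 99]
summary = coverage_summary(depths)
print(summary)
```

In practice the per-base depths would come from a depth-reporting tool run over the target regions; the toy list above only shows the arithmetic.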

Author(s):  
Umair Ahsan ◽  
Qian Liu ◽  
Li Fang ◽  
Kai Wang

Abstract: Variant (SNPs/indels) detection from high-throughput sequencing data remains an important yet unresolved problem. Long-read sequencing enables variant detection in difficult-to-map genomic regions that short-read sequencing cannot reliably examine (for example, only ~80% of genomic regions are marked as “high-confidence region” to have SNP/indel calls in the Genome In A Bottle project); however, the high per-base error rate poses unique challenges in variant detection. Existing methods for long-read data typically rely on analyzing pileup information from neighboring bases surrounding a candidate variant, similar to short-read variant callers, so the benefits of much longer read length are not fully exploited. Here we present a deep neural network called NanoCaller, which detects SNPs by examining pileup information solely from other nonadjacent candidate SNPs that share the same long reads, using long-range haplotype information. Using the SNPs it has called, NanoCaller phases long reads and performs local realignment on two sets of phased reads to call indels with another deep neural network. Extensive evaluation on 5 human genomes (sequenced by Nanopore and PacBio long-read techniques) demonstrated that NanoCaller greatly improved performance in difficult-to-map regions compared to other long-read variant callers. We experimentally validated 41 novel variants in difficult-to-map regions in a widely-used benchmarking genome, which could not be reliably detected previously. We extensively evaluated the run-time characteristics and the sensitivity of parameter settings of NanoCaller to different characteristics of sequencing data. Finally, we achieved the best performance in Nanopore-based variant calling from MHC regions in the PrecisionFDA Variant Calling Challenge on Difficult-to-Map Regions by ensemble calling. 
In summary, by incorporating haplotype information in deep neural networks, NanoCaller facilitates the discovery of novel variants in complex genomic regions from long-read sequencing data.


2021 ◽  
Author(s):  
Jyun-Hong Lin ◽  
Liang-Chi Chen ◽  
Shu-Qi Yu ◽  
Yao-Ting Huang

Abstract: Long-read phasing has been used for reconstructing diploid genomes, improving variant calling, and resolving microbial strains in metagenomics. However, the phasing blocks of existing methods are broken by large Structural Variations (SVs), and the efficiency is unsatisfactory for population-scale phasing. This paper presents an ultra-fast algorithm, LongPhase, which can simultaneously phase single nucleotide polymorphisms (SNPs) and SVs of a human genome in ∼10-20 minutes, 10x faster than the state-of-the-art WhatsHap and Margin. In particular, LongPhase produces much larger phased blocks at almost chromosome level with only long reads (N50=26Mbp). We demonstrate that LongPhase combined with Nanopore is a cost-effective approach for providing chromosome-scale phasing without the need for additional trios, chromosome-conformation, and single-cell strand-seq data.
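
The phased-block N50 quoted above is the block length at which the longest blocks together first cover at least half of the total phased length. The Python sketch below shows the standard calculation; the function name `n50` and the block lengths are illustrative and are not LongPhase output.

```python
def n50(lengths):
    """Return the N50 of a collection of block (or contig) lengths."""
    if not lengths:
        raise ValueError("lengths must not be empty")
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        # The first block that pushes the running sum to half the total is the N50.
        if running >= half_total:
            return length


# Invented phased-block lengths in base pairs, for illustration only.
blocks = [26_000_000, 40_000_000, 10_000_000, 5_000_000]
print(n50(blocks))
```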


2021 ◽  
Author(s):  
Kishwar Shafin ◽  
Trevor Pesout ◽  
Pi-Chuan Chang ◽  
Maria Nattestad ◽  
Alexey Kolesnikov ◽  
...  

Long-read sequencing has the potential to transform variant detection by reaching currently difficult-to-map regions and routinely linking together adjacent variations to enable read-based phasing. Third-generation nanopore sequence data has demonstrated a long read length, but current interpretation methods for its novel pore-based signal have unique error profiles, making accurate analysis challenging. Here, we introduce a haplotype-aware variant calling pipeline, PEPPER-Margin-DeepVariant, that produces state-of-the-art variant calling results with nanopore data. We show that our nanopore-based method outperforms the short-read-based single nucleotide variant identification method at the whole-genome scale and produces high-quality single nucleotide variants in segmental duplications and low-mappability regions where short-read-based genotyping fails. We show that our pipeline can provide highly contiguous phase blocks across the genome with nanopore reads, contiguously spanning between 85% and 92% of annotated genes across six samples. We also extend PEPPER-Margin-DeepVariant to PacBio HiFi data, providing an efficient solution with performance superior to the current WhatsHap-DeepVariant standard. Finally, we demonstrate de novo assembly polishing methods that use nanopore and PacBio HiFi reads to produce diploid assemblies with high accuracy (Q35+ nanopore-polished and Q40+ PacBio-HiFi-polished).
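
Assuming the Q35+ and Q40+ figures are Phred-scaled quality values, as is conventional for assembly accuracy, they convert to per-base error rates via Q = -10 log10(p). The helper names in this Python sketch are hypothetical.

```python
import math


def phred_to_error(q):
    """Convert a Phred-scaled quality value to an error probability."""
    return 10 ** (-q / 10)


def error_to_phred(p):
    """Convert an error probability (0 < p <= 1) to a Phred-scaled quality value."""
    if not 0 < p <= 1:
        raise ValueError("p must be in (0, 1]")
    return -10 * math.log10(p)


for q in (35, 40):
    p = phred_to_error(q)
    # 1 / p is the expected number of bases per error at this quality.
    print(f"Q{q}: error rate {p:.2e}, about one error per {1 / p:,.0f} bases")
```

Under this convention, Q40 corresponds to roughly one error per 10,000 bases and Q35 to roughly one per 3,000.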


2019 ◽  
Author(s):  
Lu Zhang ◽  
Xin Zhou ◽  
Ziming Weng ◽  
Arend Sidow

Abstract: Structural variants (SVs) in a personal genome are important but, for all practical purposes, impossible to detect comprehensively by standard short-fragment sequencing. De novo assembly, traditionally used to generate reference genomes, offers an alternative means for variant detection and phasing but has not been applied broadly to human genomes because of fundamental limitations of short-fragment approaches and high cost of long-read technologies. We here show that 10x linked-read sequencing, which has been applied to assemble human diploid genomes into high quality contigs, supports accurate SV detection. We examined variants in six de novo 10x assemblies with diverse experimental parameters from two commonly used human cell lines, NA12878 and NA24385. The assemblies are effective in detecting mid-size SVs, which were discovered by simple pairwise alignment of the assemblies’ contigs to the reference (hg38). Our study also shows that the accuracy of SV breakpoint at base-pair level is high, with a majority (80% for deletion and 70% for insertion) of SVs having precisely correct sizes and breakpoints (<2bp difference). Finally, setting the ancestral state of SV loci by comparing to ape orthologs allows inference of the actual molecular mechanism (insertion or deletion) causing the mutation, which in about half of cases is opposite to that of the reference-based call. Interestingly, we uncover 214 SVs that may have been maintained as polymorphisms in the human lineage since before our divergence from chimp. Overall, we show that de novo assembly of 10x linked-read data can achieve cost-effective SV detection for personal genomes.


2021 ◽  
Vol 8 (Supplement_1) ◽  
pp. S497-S498
Author(s):  
Mohamad Sater ◽  
Remy Schwab ◽  
Ian Herriott ◽  
Tim Farrell ◽  
Miriam Huntley

Abstract. Background: Healthcare associated infections (HAIs) are a major contributor to patient morbidity and mortality worldwide. HAIs are increasingly important due to the rise of multidrug resistant pathogens which can lead to deadly nosocomial outbreaks. Current methods for investigating transmissions are slow, costly, or have poor detection resolution. A rapid, cost-effective and high-resolution method to identify transmission events is imperative to guide infection control. Whole genome sequencing of infecting pathogens paired with a single nucleotide polymorphism (SNP) analysis can provide high-resolution clonality determination, yet these methods typically have long turnaround times. Here we examined the utility of the Oxford Nanopore Technologies (ONT) platform, a rapid sequencing technology, for whole genome sequencing based transmission analysis. Methods: We developed a SNP calling pipeline customized for ONT data, which exhibit higher sequencing error rates and can therefore be challenging for transmission analysis. The pipeline leverages the latest basecalling tools as well as a suite of custom variant calling and filtering algorithms to achieve the highest accuracy in clonality calls compared to Illumina-based sequencing. We also capitalize on ONT long reads by assembling outbreak-specific genomes in order to overcome the need for an external reference genome. Results: We examined 20 bacterial isolates from 5 HAI investigations previously performed at Day Zero Diagnostics as part of epiXact®, our commercialized Illumina-based HAI sequencing and analysis service. Using the ONT data and pipeline, we achieved greater than 90% SNP-calling sensitivity and precision, allowing 100% accuracy of clonality classification compared to Illumina-based results across common HAI species. We demonstrate the validity and increased resolution of our SNP analysis pipeline using assembled genomes from each outbreak. We also demonstrate that this ONT-based workflow can produce isolate-to-transmission determination (i.e. including WGS and analysis) in less than 24 hours. [Figure: SNP calling performance. ONT-based SNP calling sensitivity and precision compared to the Illumina-based pipeline.] Conclusion: We demonstrate the utility of ONT for HAI investigation, establishing the potential to transform healthcare epidemiology with same-day high-resolution transmission determination. Disclosures: Mohamad Sater, PhD, Day Zero Diagnostics (Employee, Shareholder); Remy Schwab, MSc, Day Zero Diagnostics (Employee, Shareholder); Ian Herriott, BS, Day Zero Diagnostics (Employee, Shareholder); Tim Farrell, MS, Day Zero Diagnostics, Inc. (Employee, Shareholder); Miriam Huntley, PhD, Day Zero Diagnostics (Employee, Shareholder)
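
SNP-based clonality determination is commonly framed as counting the positions at which two isolates differ and comparing that count with a cutoff. The Python sketch below illustrates the idea only. It is not the authors' pipeline: the function names and the cutoff of 10 SNPs are hypothetical placeholders, and real cutoffs depend on species and context.

```python
def snp_distance(calls_a, calls_b):
    """Count positions where two isolates differ.

    Each argument maps a genome position to the non-reference allele called
    there; a position absent from a dict is treated as reference.
    """
    positions = set(calls_a) | set(calls_b)
    return sum(calls_a.get(pos) != calls_b.get(pos) for pos in positions)


def is_clonal(calls_a, calls_b, max_snps=10):
    """Classify a pair as clonal. max_snps is a HYPOTHETICAL cutoff."""
    return snp_distance(calls_a, calls_b) <= max_snps


# Invented SNP calls for three isolates, for illustration only.
isolate_a = {101: "T", 2050: "G", 7777: "A"}
isolate_b = {101: "T", 2050: "G"}
isolate_c = {pos: "A" for pos in range(1000, 1030)}

print(snp_distance(isolate_a, isolate_b), is_clonal(isolate_a, isolate_b))
print(snp_distance(isolate_a, isolate_c), is_clonal(isolate_a, isolate_c))
```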


2002 ◽  
Vol 126 (1) ◽  
pp. 100-102
Author(s):  
Sara Kukuczka ◽  
Leonard E. Grosso

Abstract Context.—The availability of effective antiviral therapy for hepatitis C has increased the need for molecular detection and quantification of circulating hepatitis C viral particles. The limits of detection differ for the quantitative and qualitative reverse transcriptase polymerase chain reaction (RT-PCR) assays; furthermore, adequate patient assessment requires both detection of hepatitis C virus when it is present and quantitation of the viral load when possible. The combination of these factors promotes the simultaneous ordering of both tests with the possibility of generating redundant test information. Objective.—To reduce the number of unnecessary hepatitis C tests performed. Methods.—We established a reflexive testing protocol for quantitative and qualitative RT-PCR testing for hepatitis C. Results.—During a 3½-month interval, 170 qualitative RT-PCR hepatitis C tests were eliminated (a 59.4% reduction in the number of these tests). This reduction was achieved without a clinically significant change in turnaround time or a compromise of patient care. Conclusions.—Establishing the quantitative and qualitative RT-PCR tests in-house and adopting the reflexive testing protocol was cost-effective and did not compromise patient management or care.


2017 ◽  
Vol 2 ◽  
pp. 6 ◽  
Author(s):  
Laura Oikkonen ◽  
Stefano Lise

Identifying variants from RNA-seq (transcriptome sequencing) data is a cost-effective and versatile alternative to whole-genome sequencing. However, current variant callers do not generally behave well with RNA-seq data due to reads encompassing intronic regions. We have developed a software programme called Opossum to address this problem. Opossum pre-processes RNA-seq reads prior to variant calling, and although it has been designed to work specifically with Platypus, it can be used equally well with other variant callers such as GATK HaplotypeCaller. In this work, we show that using Opossum in conjunction with either Platypus or GATK HaplotypeCaller maintains precision and improves the sensitivity for SNP detection compared to the GATK Best Practices pipeline. In addition, using it in combination with Platypus offers a substantial reduction in run times compared to the GATK pipeline so it is ideal when there are only limited time or computational resources available.


2021 ◽  
Vol 22 (1) ◽  
Author(s):  
Tao Jiang ◽  
Shiqi Liu ◽  
Shuqi Cao ◽  
Yadong Liu ◽  
Zhe Cui ◽  
...  

Abstract. Background: With the rapid development of long-read sequencing technologies, it is possible to reveal the full spectrum of genetic structural variation (SV). However, the high cost, finite read length and high sequencing error of long-read data greatly limit the widespread adoption of SV calling. Therefore, it is urgent to establish guidance concerning sequencing coverage, read length, and error rate to maintain high SV yields and to achieve the lowest cost simultaneously. Results: In this study, we generated a full range of simulated error-prone long-read datasets containing various sequencing settings and comprehensively evaluated the performance of SV calling with state-of-the-art long-read SV detection methods. The benchmark results demonstrate that almost all SV callers perform better when the long-read data reach 20× coverage, 20 kbp average read length, and approximately 10–7.5% or below 1% error rates. Furthermore, high sequencing coverage is the most influential factor in promoting SV calling, while it also directly determines the sequencing cost. Conclusions: Based on the comprehensive evaluation results, we provide important guidelines for selecting long-read sequencing settings for efficient SV calling. We believe these recommended settings of long-read sequencing will have extraordinary guiding significance in cutting-edge genomic studies and clinical practices.
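
To give a sense of scale for the recommended 20× coverage at 20 kbp average read length, the relation coverage = (number of reads × mean read length) / genome size can be rearranged to estimate the number of reads required. In this Python sketch the genome size of 3.1 Gbp is an assumed round figure for a human genome, not a value from the abstract, and the function name is hypothetical.

```python
def reads_needed(coverage, genome_size_bp, mean_read_length_bp):
    """Estimate the number of reads needed to reach a target mean coverage."""
    if mean_read_length_bp <= 0:
        raise ValueError("mean_read_length_bp must be positive")
    total_bases = coverage * genome_size_bp
    # Ceiling division on integers, so a partial read counts as a whole one.
    return -(-total_bases // mean_read_length_bp)


GENOME_SIZE_BP = 3_100_000_000  # ASSUMED approximate human genome size

n_reads = reads_needed(coverage=20, genome_size_bp=GENOME_SIZE_BP,
                       mean_read_length_bp=20_000)
print(f"{n_reads:,} reads, {20 * GENOME_SIZE_BP / 1e9:.0f} Gbp of sequence")
```

Because total sequenced bases scale linearly with coverage, this also shows why coverage drives cost.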


2019 ◽  
Author(s):  
Yusmiati Liau ◽  
Simran Maggo ◽  
Allison L. Miller ◽  
John F. Pearson ◽  
Martin A. Kennedy ◽  
...  

Abstract. Background: The accurate genotyping of CYP2D6 is hindered by the very polymorphic nature of the gene, high homology with its pseudogene CYP2D7, and the occurrence of structural variations. Long read sequencing offers the promise of overcoming some of these challenges, along with the advantage of straightforward variant phasing. We have established methods for sequencing and analysis of DNA amplicons containing the whole CYP2D6 gene, using the GridION nanopore sequencer. Materials and methods: Seven reference and 25 clinical samples covering various haplotypes including gene duplication were barcoded and sequenced over two sequencing runs. Sequenced raw reads were analyzed using a pipeline of bioinformatics tools including two mapping tools and two variant calling tools. Results: Using minimap2 and nanopolish (mapping and variant calling tools respectively) resulted in the most accurate variant detection. Haplotypes of 52 alleles could be matched accurately to known alleles or subvariants, while the remaining 12 alleles were assigned as novel star (*) alleles or novel subvariants of known alleles in the PharmVar CYP2D6 haplotype database. Allele duplication could be detected by analyzing the allelic balance between the sample haplotypes. Conclusion: Nanopore sequencing of CYP2D6 offers a high throughput method for genotyping, accurate haplotyping, and detection of new variants and duplicated alleles.
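
The allelic-balance idea can be sketched as follows: with one copy of each haplotype, reads at a heterozygous site split roughly 1:1, whereas a duplication of one haplotype shifts the split towards 2:1. The Python example below is an illustration under that simple model, not the authors' method. The function names and the cutoff of 0.42 (about midway between 1/3 and 1/2) are hypothetical.

```python
from statistics import median


def minor_allele_fraction(ref_reads, alt_reads):
    """Fraction of reads supporting the less-frequent allele at one site."""
    total = ref_reads + alt_reads
    if total == 0:
        raise ValueError("site has no reads")
    return min(ref_reads, alt_reads) / total


def suggests_duplication(site_counts, cutoff=0.42):
    """Flag a sample whose heterozygous sites look closer to 2:1 than 1:1.

    cutoff is a HYPOTHETICAL value, roughly midway between 1/3 and 1/2.
    """
    fractions = [minor_allele_fraction(ref, alt) for ref, alt in site_counts]
    return median(fractions) < cutoff


# Invented (ref_reads, alt_reads) counts at heterozygous sites.
balanced_sample = [(50, 48), (47, 52), (60, 55)]
skewed_sample = [(66, 34), (30, 62), (70, 33)]

print(suggests_duplication(balanced_sample))
print(suggests_duplication(skewed_sample))
```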


2021 ◽  
Author(s):  
Ning Wang ◽  
Vladislav Lysenkov ◽  
Katri Orte ◽  
Veli Kairisto ◽  
Juhani Aakko ◽  
...  

Insertions and deletions (indels) in human genomes are associated with a wide range of phenotypes, including various clinical disorders. High-throughput, next generation sequencing (NGS) technologies enable detection of short genetic variants, such as single nucleotide variants (SNVs) and indels. However, the variant calling accuracy for indels remains considerably lower than for SNVs. Here we present a comparative study of the performance of variant calling tools on indel calling, evaluated with a wide repertoire of NGS datasets. While there is no single optimal tool to suit all circumstances, our results demonstrate that the choice of variant calling tool greatly impacts the precision and recall of indel calling. Furthermore, to reliably detect indels, it is essential to choose NGS technologies that offer a long read length and high coverage, coupled with specific variant calling tools.
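
The gap between SNV and indel accuracy becomes visible only when precision and recall are computed separately per variant type. The Python sketch below shows one way to stratify them, assuming variants are represented as `(chrom, pos, ref, alt)` tuples with exact matching against a truth set. The function names and the toy data are hypothetical and are not from the study.

```python
from collections import Counter


def variant_type(ref, alt):
    """Classify a variant by comparing REF and ALT allele lengths."""
    if len(ref) == 1 and len(alt) == 1:
        return "SNV"
    if len(ref) != len(alt):
        return "indel"
    return "other"  # e.g. multi-nucleotide substitutions


def stratified_metrics(truth, calls):
    """Precision and recall per variant type, using exact tuple matching."""
    tp = Counter(variant_type(v[2], v[3]) for v in truth & calls)
    fp = Counter(variant_type(v[2], v[3]) for v in calls - truth)
    fn = Counter(variant_type(v[2], v[3]) for v in truth - calls)
    metrics = {}
    for kind in set(tp) | set(fp) | set(fn):
        called, real = tp[kind] + fp[kind], tp[kind] + fn[kind]
        metrics[kind] = {
            "precision": tp[kind] / called if called else None,
            "recall": tp[kind] / real if real else None,
        }
    return metrics


# Invented truth and call sets, for illustration only.
truth = {("chr1", 100, "A", "G"), ("chr1", 200, "C", "T"),
         ("chr1", 300, "AT", "A"), ("chr1", 400, "G", "GCC")}
calls = {("chr1", 100, "A", "G"), ("chr1", 200, "C", "T"),
         ("chr1", 300, "AT", "A"), ("chr1", 500, "T", "TA")}

metrics = stratified_metrics(truth, calls)
print(metrics)
```

Exact tuple matching is a simplification: equivalent indels can be written in more than one way, so real comparisons usually normalise variant representation first.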

