Nanopore sequencing of the pharmacogene CYP2D6 allows simultaneous haplotyping and detection of duplications

Background: The accurate genotyping of CYP2D6 is hindered by the highly polymorphic nature of the gene, high homology with its pseudogene CYP2D7, and the occurrence of structural variations. Long read sequencing offers the promise of overcoming some of these challenges, along with the advantage of straightforward variant phasing. We have established methods for sequencing and analysis of DNA amplicons containing the whole CYP2D6 gene, using the GridION nanopore sequencer. Materials and methods: Seven reference and 25 clinical samples covering various haplotypes including gene duplication were barcoded and sequenced over two sequencing runs. Sequenced raw reads were analyzed using a pipeline of bioinformatics tools including two mapping tools and two variant calling tools. Results: Using minimap2 and nanopolish (mapping and variant calling tools respectively) resulted in the most accurate variant detection. Haplotypes of 52 alleles could be matched accurately to known alleles or subvariants, while the remaining 12 alleles were assigned as novel star (*) alleles or novel subvariants of known alleles in the PharmVar CYP2D6 haplotype database. Allele duplication could be detected by analyzing the allelic balance between the sample haplotypes. Conclusion: Nanopore sequencing of CYP2D6 offers a high throughput method for genotyping, accurate haplotyping, and detection of new variants and duplicated alleles.
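
The allelic-balance idea mentioned in the Results can be illustrated with a minimal sketch. It assumes that reads have already been assigned to one of two haplotypes, and that a duplicated allele shifts the read ratio from roughly 1:1 towards roughly 2:1. The function names and the tolerance value are hypothetical and are not taken from the paper; amplification bias in real amplicon data may call for a different threshold.

```python
from typing import Optional


def allelic_balance(reads_hap_a: int, reads_hap_b: int) -> float:
    """Return the fraction of phased reads that support haplotype A."""
    total = reads_hap_a + reads_hap_b
    if total == 0:
        raise ValueError("no phased reads supplied")
    return reads_hap_a / total


def suggest_duplication(reads_hap_a: int, reads_hap_b: int,
                        tolerance: float = 0.08) -> Optional[str]:
    """Guess which haplotype, if any, looks duplicated.

    Assumption (illustrative only): a 1:1 copy ratio gives a balance
    near 0.5, while a 2:1 ratio gives a balance near 2/3 (haplotype A
    duplicated) or 1/3 (haplotype B duplicated).
    """
    balance = allelic_balance(reads_hap_a, reads_hap_b)
    if abs(balance - 2 / 3) <= tolerance:
        return "A"
    if abs(balance - 1 / 3) <= tolerance:
        return "B"
    return None


if __name__ == "__main__":
    # Toy read counts, not data from the study.
    print(suggest_duplication(200, 100))
    print(suggest_duplication(150, 150))
```

A fixed tolerance keeps the sketch short; a statistical test on the read counts would be a more defensible choice for real samples.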

Fast and sensitive mapping of error-prone nanopore sequencing reads with GraphMap

10.1101/020719 ◽

2015 ◽

Cited By ~ 1

Author(s):

Ivan Sovic ◽

Mile Sikic ◽

Andreas Wilm ◽

Shannon Nicole Fenlon ◽

Swaine Chen ◽

...

Keyword(s):

Human Genome ◽

Variant Calling ◽

Error Rates ◽

Nanopore Sequencing ◽

Structural Variants ◽

Specific Identification ◽

Long Reads ◽

Long Read ◽

Specific Error ◽

Very High

Exploiting the power of nanopore sequencing requires the development of new bioinformatics approaches to deal with its specific error characteristics. We present the first nanopore read mapper (GraphMap) that uses a read-funneling paradigm to robustly handle variable error rates and fast graph traversal to align long reads with speed and very high precision (>95%). Evaluation on MinION sequencing datasets against short and long-read mappers indicates that GraphMap increases mapping sensitivity by at least 15-80%. GraphMap alignments are the first to demonstrate consensus calling with <1 error in 100,000 bases, variant calling on the human genome with 76% improvement in sensitivity over the next best mapper (BWA-MEM), precise detection of structural variants from 100bp to 4kbp in length and species and strain-specific identification of pathogens using MinION reads. GraphMap is available open source under the MIT license at https://github.com/isovic/graphmap.

Evaluation of Germline Structural Variant Calling Methods for Nanopore Sequencing Data

Frontiers in Genetics ◽

10.3389/fgene.2021.761791 ◽

2021 ◽

Vol 12 ◽

Author(s):

Davide Bolognini ◽

Alberto Magi

Keyword(s):

Variant Calling ◽

Research Report ◽

Nanopore Sequencing ◽

Sequencing Data ◽

Factors Affecting ◽

Sequencing Technologies ◽

Long Reads ◽

Oxford Nanopore ◽

Sequencing Studies ◽

Long Read

Structural variants (SVs) are genomic rearrangements that involve at least 50 nucleotides and are known to have a serious impact on human health. While prior short-read sequencing technologies have often proved inadequate for a comprehensive assessment of structural variation, more recent long reads from Oxford Nanopore Technologies have already been proven invaluable for the discovery of large SVs and hold the potential to facilitate the resolution of the full SV spectrum. With many long-read sequencing studies to follow, it is crucial to assess factors affecting current SV calling pipelines for nanopore sequencing data. In this brief research report, we evaluate and compare the performances of five long-read SV callers across four long-read aligners using both real and synthetic nanopore datasets. In particular, we focus on the effects of read alignment, sequencing coverage, and variant allele depth on the detection and genotyping of SVs of different types and size ranges and provide insights into precision and recall of SV callsets generated by integrating the various long-read aligners and SV callers. The computational pipeline we propose is publicly available at https://github.com/davidebolo1993/EViNCe and can be adjusted to further evaluate future nanopore sequencing datasets.
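
Precision and recall of an SV callset, as discussed above, come down to counting calls that match a truth set. The following is a minimal sketch under simplifying assumptions: an SV is reduced to chromosome, start position, and type, and a call matches a truth entry when the type agrees and the start lies within a fixed tolerance. The `SV` record, the `evaluate` function, and the 500 bp tolerance are hypothetical and are not part of the EViNCe pipeline.

```python
from typing import List, NamedTuple, Tuple


class SV(NamedTuple):
    chrom: str
    start: int
    svtype: str  # e.g. "DEL", "INS"


def evaluate(calls: List[SV], truth: List[SV],
             tolerance: int = 500) -> Tuple[float, float, float]:
    """Return (precision, recall, F1) using greedy one-to-one matching."""
    unmatched = list(truth)  # truth entries not yet claimed by a call
    true_pos = 0
    for call in calls:
        for candidate in unmatched:
            if (call.chrom == candidate.chrom
                    and call.svtype == candidate.svtype
                    and abs(call.start - candidate.start) <= tolerance):
                unmatched.remove(candidate)
                true_pos += 1
                break  # each call may claim at most one truth entry
    precision = true_pos / len(calls) if calls else 0.0
    recall = true_pos / len(truth) if truth else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


if __name__ == "__main__":
    # Toy records, not data from the report.
    truth = [SV("chr1", 1000, "DEL"), SV("chr1", 5000, "INS"),
             SV("chr2", 300, "DEL")]
    calls = [SV("chr1", 1100, "DEL"), SV("chr1", 9000, "INS")]
    print(evaluate(calls, truth))
```

Greedy matching on the start coordinate alone is a simplification; dedicated benchmarking tools also compare SV length and, for insertions, sequence content.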

NanoCaller for accurate detection of SNPs and indels in difficult-to-map regions from long-read sequencing by haplotype-aware deep neural networks

10.1101/2019.12.29.890418 ◽

2019 ◽

Cited By ~ 1

Author(s):

Umair Ahsan ◽

Qian Liu ◽

Li Fang ◽

Kai Wang

Keyword(s):

Deep Neural Network ◽

Deep Neural Networks ◽

Variant Calling ◽

Sequencing Data ◽

Long Reads ◽

Novel Variants ◽

Long Read ◽

Variant Detection ◽

Genomic Regions ◽

Haplotype Information

Variant (SNPs/indels) detection from high-throughput sequencing data remains an important yet unresolved problem. Long-read sequencing enables variant detection in difficult-to-map genomic regions that short-read sequencing cannot reliably examine (for example, only ~80% of genomic regions are marked as “high-confidence region” to have SNP/indel calls in the Genome In A Bottle project); however, the high per-base error rate poses unique challenges in variant detection. Existing methods on long-read data typically rely on analyzing pileup information from neighboring bases surrounding a candidate variant, similar to short-read variant callers, yet the benefits of much longer read length are not fully exploited. Here we present a deep neural network called NanoCaller, which detects SNPs by examining pileup information solely from other nonadjacent candidate SNPs that share the same long reads using long-range haplotype information. Using the SNPs it has called, NanoCaller phases long reads and performs local realignment on two sets of phased reads to call indels by another deep neural network. Extensive evaluation on 5 human genomes (sequenced by Nanopore and PacBio long-read techniques) demonstrated that NanoCaller greatly improved performance in difficult-to-map regions, compared to other long-read variant callers. We experimentally validated 41 novel variants in difficult-to-map regions in a widely-used benchmarking genome, which could not be reliably detected previously. We extensively evaluated the run-time characteristics and the sensitivity of parameter settings of NanoCaller to different characteristics of sequencing data. Finally, we achieved the best performance in Nanopore-based variant calling from MHC regions in the PrecisionFDA Variant Calling Challenge on Difficult-to-Map Regions by ensemble calling. In summary, by incorporating haplotype information in deep neural networks, NanoCaller facilitates the discovery of novel variants in complex genomic regions from long-read sequencing data.

LongPhase: an ultra-fast chromosome-scale phasing algorithm for small and large variants

10.1101/2021.09.09.459623 ◽

2021 ◽

Author(s):

Jyun-Hong Lin ◽

Liang-Chi Chen ◽

Shu-Qi Yu ◽

Yao-Ting Huang

Keyword(s):

Variant Calling ◽

Cost Effective ◽

Nucleotide Polymorphisms ◽

Structural Variations ◽

Single Nucleotide ◽

Chromosome Conformation ◽

Long Reads ◽

Cost Effective Approach ◽

Long Read ◽

Microbial Strains

Long-read phasing has been used for reconstructing diploid genomes, improving variant calling, and resolving microbial strains in metagenomics. However, the phasing blocks of existing methods are broken by large Structural Variations (SVs), and the efficiency is unsatisfactory for population-scale phasing. This paper presents an ultra-fast algorithm, LongPhase, which can simultaneously phase single nucleotide polymorphisms (SNPs) and SVs of a human genome in ∼10-20 minutes, 10x faster than the state-of-the-art WhatsHap and Margin. In particular, LongPhase produces much larger phased blocks at almost chromosome level with only long reads (N50=26Mbp). We demonstrate that LongPhase combined with Nanopore is a cost-effective approach for providing chromosome-scale phasing without the need for additional trios, chromosome-conformation, and single-cell strand-seq data.
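
The N50 quoted for phased blocks is the block length at which blocks of that size or larger account for at least half of the total phased length. A minimal sketch of that calculation follows, assuming the block lengths are already available as a list of integers; the function name and the toy values are hypothetical and unrelated to LongPhase's own code.

```python
from typing import Sequence


def n50(block_lengths: Sequence[int]) -> int:
    """Return the N50 of a collection of phased-block lengths (in bp)."""
    if not block_lengths:
        raise ValueError("no block lengths supplied")
    total = sum(block_lengths)
    running = 0
    # Walk from the longest block down until half the total is covered.
    for length in sorted(block_lengths, reverse=True):
        running += length
        if running * 2 >= total:  # integer comparison avoids float rounding
            return length
    raise AssertionError("unreachable for non-empty input")


if __name__ == "__main__":
    # Toy block lengths, not results from the paper.
    print(n50([2, 3, 4, 5, 6, 7, 8, 9, 10]))
```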

Long read nanopore sequencing for detection of HLA and CYP2D6 variants and haplotypes

F1000Research ◽

10.12688/f1000research.6037.2 ◽

2015 ◽

Vol 4 ◽

pp. 17 ◽

Cited By ~ 55

Author(s):

Ron Ammar ◽

Tara A. Paton ◽

Dax Torti ◽

Adam Shlien ◽

Gary D. Bader

Keyword(s):

Medical Decision ◽

Nanopore Sequencing ◽

Clinical Environment ◽

Sequencing Technologies ◽

Long Reads ◽

Long Read ◽

Complete Genomics ◽

Nanopore Sequencer ◽

Actionable Findings ◽

Haplotype Information

Haplotypes are often critical for the interpretation of genetic laboratory observations into medically actionable findings. Current massively parallel DNA sequencing technologies produce short sequence reads that are often unable to resolve haplotype information. Phasing short read data typically requires supplemental statistical phasing based on known haplotype structure in the population or parental genotypic data. Here we demonstrate that the MinION nanopore sequencer is capable of producing very long reads to resolve both variants and haplotypes of HLA-A, HLA-B and CYP2D6 genes important in determining patient drug response in sample NA12878 of CEPH/UTAH pedigree 1463, without the need for statistical phasing. Long read data from a single 24-hour nanopore sequencing run was used to reconstruct haplotypes, which were confirmed by HapMap data and statistically phased Complete Genomics and Sequenom genotypes. Our results demonstrate that nanopore sequencing is an emerging standalone technology with potential utility in a clinical environment to aid in medical decision-making.

ECNano: A Cost-Effective Workflow for Target Enrichment Sequencing and Accurate Variant Calling on 4,800 Clinically Significant Genes Using a Single MinION Flowcell

10.1101/2021.04.05.438455 ◽

2021 ◽

Author(s):

Amy Wing-Sze Leung ◽

Henry Chi-Ming Leung ◽

Chak-Lim Wong ◽

Zhen-Xian Zheng ◽

Wui-Wang Lui ◽

...

Keyword(s):

Variant Calling ◽

Cost Effective ◽

Turnaround Time ◽

Read Length ◽

Sequencing Error ◽

Target Enrichment ◽

Long Read ◽

Wet Lab ◽

Variant Detection ◽

Clinically Significant

Background: The application of long-read sequencing using the Oxford Nanopore Technologies (ONT) MinION sequencer is becoming more diverse in the medical field. However, the high sequencing error rate of ONT and the limited throughput of a single MinION flowcell limit its applicability for accurate variant detection. Medical exome sequencing (MES) targets clinically significant exon regions, allowing rapid and comprehensive screening of pathogenic variants. By applying MES with MinION sequencing, the technology can achieve a more uniform capture of the target regions, shorter turnaround time, and lower sequencing cost per sample. Method: We introduced a cost-effective optimized workflow, ECNano, comprising a wet-lab protocol and bioinformatics analysis, for accurate variant detection at 4,800 clinically important genes and regions using a single MinION flowcell. The ECNano wet-lab protocol was optimized to perform long-read target enrichment and ONT library preparation to stably generate high-quality MES data with adequate coverage. The subsequent variant-calling workflow, Clair-ensemble, adopted a fast RNN-based variant caller, Clair, and was optimized for target enrichment data. To evaluate its performance and practicality, ECNano was tested on both reference DNA samples and patient samples. Results: ECNano achieved deep on-target depth of coverage (DoC) at average >100x and >98% uniformity using one MinION flowcell. For accurate ONT variant calling, the generated reads sufficiently covered 98.9% of pathogenic positions listed in ClinVar, with 98.96% having at least 30x DoC. ECNano obtained an average read length of 1,000 bp. The long reads of ECNano also covered the adjacent splice sites well, with 98.5% of positions having ≥ 30x DoC. Clair-ensemble achieved >99% recall and accuracy for SNV calling. The whole workflow from wet-lab protocol to variant detection was completed within three days.
Conclusion: We presented ECNano, an out-of-the-box workflow comprising (1) a wet-lab protocol for ONT target enrichment sequencing and (2) a downstream variant detection workflow, Clair-ensemble. The workflow is cost-effective, with a short turnaround time for high accuracy variant calling in 4,800 clinically significant genes and regions using a single MinION flowcell. The long-read exon captured data has potential for further development, promoting the application of long-read sequencing in personalized disease treatment and risk prediction.
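
The depth-of-coverage figures reported above, such as the share of positions with at least 30x DoC, are simple summaries over per-position depths. A minimal sketch, assuming the depths have already been extracted into a list of integers (one per target position); the function names and the toy values are hypothetical and are not part of the ECNano workflow.

```python
from typing import Sequence


def fraction_at_least(depths: Sequence[int], min_depth: int = 30) -> float:
    """Return the fraction of positions whose depth is >= min_depth."""
    if not depths:
        raise ValueError("no positions supplied")
    covered = sum(1 for depth in depths if depth >= min_depth)
    return covered / len(depths)


def mean_depth(depths: Sequence[int]) -> float:
    """Return the average depth of coverage across positions."""
    if not depths:
        raise ValueError("no positions supplied")
    return sum(depths) / len(depths)


if __name__ == "__main__":
    # Toy per-position depths, not data from the study.
    toy_depths = [10, 30, 45, 29]
    print(fraction_at_least(toy_depths), mean_depth(toy_depths))
```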

Evaluating nanopore sequencing data processing pipelines for structural variation identification

Genome Biology ◽

10.1186/s13059-019-1858-1 ◽

2019 ◽

Vol 20 (1) ◽

Cited By ~ 9

Author(s):

Anbo Zhou ◽

Timothy Lin ◽

Jinchuan Xing

Keyword(s):

Detection Accuracy ◽

Nanopore Sequencing ◽

Sequencing Data ◽

Sequencing Technology ◽

Structural Variations ◽

Human Genomes ◽

Data Assessment ◽

Machine Learning Approach ◽

Long Read ◽

The Impact

Background: Structural variations (SVs) account for about 1% of the differences among human genomes and play a significant role in phenotypic variation and disease susceptibility. The emerging nanopore sequencing technology can generate long sequence reads and can potentially provide accurate SV identification. However, the tools for aligning long-read data and detecting SVs have not been thoroughly evaluated. Results: Using four nanopore datasets, including both empirical and simulated reads, we evaluate four alignment tools and three SV detection tools. We also evaluate the impact of sequencing depth on SV detection. Finally, we develop a machine learning approach to integrate call sets from multiple pipelines. Overall SV callers’ performance varies depending on the SV types. For an initial data assessment, we recommend using aligner minimap2 in combination with SV caller Sniffles because of their speed and relatively balanced performance. For detailed analysis, we recommend incorporating information from multiple call sets to improve the SV call performance. Conclusions: We present a workflow for evaluating aligners and SV callers for nanopore sequencing data and approaches for integrating multiple call sets. Our results indicate that additional optimizations are needed to improve SV detection accuracy and sensitivity, and an integrated call set can provide enhanced performance. The nanopore technology is improving, and the sequencing community is likely to grow accordingly. In turn, better benchmark call sets will be available to more accurately assess the performance of available tools and facilitate further tool development.

Long read nanopore sequencing for detection of HLA and CYP2D6 variants and haplotypes

F1000Research ◽

10.12688/f1000research.6037.1 ◽

2015 ◽

Vol 4 ◽

pp. 17 ◽

Cited By ~ 8

Author(s):

Ron Ammar ◽

Tara A. Paton ◽

Dax Torti ◽

Adam Shlien ◽

Gary D. Bader

Keyword(s):

Medical Decision ◽

Nanopore Sequencing ◽

Clinical Environment ◽

Sequencing Technologies ◽

Long Reads ◽

Long Read ◽

Complete Genomics ◽

Nanopore Sequencer ◽

Actionable Findings ◽

Haplotype Information

BugSeq: a highly accurate cloud platform for long-read metagenomic analyses

BMC Bioinformatics ◽

10.1186/s12859-021-04089-5 ◽

2021 ◽

Vol 22 (1) ◽

Author(s):

Jeremy Fan ◽

Steven Huang ◽

Samuel D. Chorlton

Keyword(s):

Respiratory Tract Infections ◽

Simulated Data ◽

Clinical Samples ◽

Command Line ◽

Nanopore Sequencing ◽

Cloud Platform ◽

Lower Respiratory Tract Infections ◽

Commercial Use ◽

Long Read ◽

Tract Infections

Background: As the use of nanopore sequencing for metagenomic analysis increases, tools capable of performing long-read taxonomic classification (i.e., determining the composition of a sample) in a fast and accurate manner are needed. Existing tools were either designed for short-read data (e.g., Centrifuge), take days to analyse modern sequencer outputs (e.g., MetaMaps) or suffer from suboptimal accuracy (e.g., CDKAM). Additionally, all tools require command line expertise and do not scale in the cloud. Results: We present BugSeq, a novel, highly accurate metagenomic classifier for nanopore reads. We evaluate BugSeq on simulated data, mock microbial communities and real clinical samples. On the ZymoBIOMICS Even and Log communities, BugSeq (F1 = 0.95 at species level) offers better read classification than MetaMaps (F1 = 0.89–0.94) in a fraction of the time. BugSeq significantly improves on the accuracy of Centrifuge (F1 = 0.79–0.93) and CDKAM (F1 = 0.91–0.94) while offering competitive run times. When applied to 41 samples from patients with lower respiratory tract infections, BugSeq produces greater concordance with microbiological culture and qPCR compared with “What’s In My Pot” analysis. Conclusion: BugSeq is deployed to the cloud for easy and scalable long-read metagenomic analyses. BugSeq is freely available for non-commercial use at https://bugseq.com/free.

Nanopore sequencing of the pharmacogene CYP2D6 allows simultaneous haplotyping and detection of duplications

Pharmacogenomics ◽

10.2217/pgs-2019-0080 ◽

2019 ◽

Vol 20 (14) ◽

pp. 1033-1047 ◽

Cited By ~ 9

Author(s):

Yusmiati Liau ◽

Simran Maggo ◽

Allison L Miller ◽

John F Pearson ◽

Martin A Kennedy ◽

...

Keyword(s):

Gene Duplication ◽

High Throughput ◽

Nanopore Sequencing ◽

Cyp2d6 Gene ◽

High Throughput Method ◽

Long Read ◽

New Alleles

Aim: Long read sequencing offers the promise of overcoming some of the challenges in accurate genotyping of complex genes, along with the advantage of straightforward variant phasing. We have established methods for sequencing and haplotyping of the whole CYP2D6 gene using nanopore sequencing. Materials and methods: 32 samples covering various haplotypes including gene duplication were sequenced on the GridION platform. Results: Haplotypes of 52 alleles matched accurately to known star (*) allele subvariants, with the remaining 12 being assigned as new alleles, or new subvariants of known alleles. Duplicated alleles could be detected by analyzing the allelic balance. Conclusion: Nanopore sequencing of CYP2D6 offers a high throughput method for accurate haplotyping, detection of new variants and determination of duplicated alleles.
