Nanopore sequencing detects structural variants in cancer

2015 ◽  
Author(s):  
Alexis L. Norris ◽  
Rachael E. Workman ◽  
Yunfan Fan ◽  
James R. Eshleman ◽  
Winston Timp

Despite advances in sequencing, structural variants (SVs) remain difficult to detect reliably because of the short read length (<300 bp) of second-generation sequencing. Not only must the reads (or paired-end reads) straddle a breakpoint, but repetitive elements often make the alignment of short reads ambiguous. We propose to use the long reads (up to 20 kb) possible with third-generation sequencing, specifically nanopore sequencing on the MinION. Nanopore sequencing relies on a concept similar to a Coulter counter: the DNA sequence is read from the change in electrical current produced as a DNA strand is forced through a nanometer-sized pore embedded in a membrane. Although nanopore sequencing currently has a relatively high mismatch rate that precludes the detection of base substitutions and small frameshift mutations, its long reads make its accuracy sufficient for SV detection. In some cases, long reads may even improve SV detection efficiency. We have tested nanopore sequencing on a series of well-characterized SVs, including large deletions, inversions, and translocations that inactivate the CDKN2A/p16 and SMAD4/DPC4 tumor suppressor genes in pancreatic cancer. Using PCR amplicon mixes, we have demonstrated that nanopore sequencing can detect large deletions, translocations and inversions at dilutions as low as 1:100, with as few as 500 reads per sample. Given its speed, small footprint, and low capital cost, nanopore sequencing could become the ideal tool for the low-level detection of cancer-associated SVs needed for detecting molecular relapse, early detection, or therapeutic monitoring.
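To make concrete why a long read that straddles a breakpoint reveals an SV, here is a minimal sketch of split-read classification. It is not the authors' pipeline: it assumes a read has already been aligned as several local segments (the hypothetical `Segment` record), and it classifies each junction between read-consecutive segments as a translocation (different chromosomes), an inversion (strand flip), or a deletion (the reference gap exceeds the read gap by at least a hypothetical `min_del` threshold). Insertions and duplications are deliberately left out.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Segment:
    """Hypothetical minimal record for one local alignment of part of a read."""
    read_start: int  # 0-based start on the read (original read orientation)
    read_end: int    # exclusive end on the read
    chrom: str       # reference chromosome
    ref_start: int   # 0-based start on the reference
    ref_end: int     # exclusive end on the reference
    strand: str      # "+" or "-"


def classify_junction(a: Segment, b: Segment,
                      min_del: int = 50) -> Optional[Tuple[str, Optional[int]]]:
    """Classify the junction between two read-consecutive segments a -> b."""
    if a.chrom != b.chrom:
        return ("translocation", None)
    if a.strand != b.strand:
        return ("inversion", None)
    read_gap = b.read_start - a.read_end
    if a.strand == "+":
        ref_gap = b.ref_start - a.ref_end
    else:
        # On the minus strand, later read bases map to lower reference coordinates.
        ref_gap = a.ref_start - b.ref_end
    deleted = ref_gap - read_gap  # reference bases skipped by the read
    if deleted >= min_del:
        return ("deletion", deleted)
    return None  # colinear (or an SV type this sketch does not model)


def call_svs(segments: List[Segment],
             min_del: int = 50) -> List[Tuple[str, Optional[int]]]:
    """Walk a read's segments in read order and report candidate SV junctions."""
    ordered = sorted(segments, key=lambda s: s.read_start)
    calls = []
    for a, b in zip(ordered, ordered[1:]):
        call = classify_junction(a, b, min_del)
        if call is not None:
            calls.append(call)
    return calls
```

For example, a 10 kb read whose first 4 kb maps to chr9:1,000,000-1,004,000 and whose remaining 6 kb maps to chr9:1,054,000-1,060,000 (invented coordinates) yields `[("deletion", 50000)]`. A short read would rarely contain both segments, which is the limitation the abstract describes.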

2015 ◽  
Author(s):  
Ivan Sovic ◽  
Mile Sikic ◽  
Andreas Wilm ◽  
Shannon Nicole Fenlon ◽  
Swaine Chen ◽  
...  

Exploiting the power of nanopore sequencing requires new bioinformatics approaches that deal with its specific error characteristics. We present the first nanopore read mapper (GraphMap), which uses a read-funneling paradigm to robustly handle variable error rates and fast graph traversal to align long reads with speed and very high precision (>95%). Evaluation on MinION sequencing datasets against short- and long-read mappers indicates that GraphMap increases mapping sensitivity by at least 15-80%. GraphMap alignments are the first to demonstrate consensus calling with <1 error in 100,000 bases, variant calling on the human genome with a 76% improvement in sensitivity over the next best mapper (BWA-MEM), precise detection of structural variants from 100 bp to 4 kbp in length, and species- and strain-specific identification of pathogens using MinION reads. GraphMap is available open source under the MIT license at https://github.com/isovic/graphmap.


2020 ◽  
Vol 48 (9) ◽  
pp. 4940-4945
Author(s):  
Pieter Spealman ◽  
Jaden Burrell ◽  
David Gresham

Inverted duplicated DNA sequences are a common feature of structural variants (SVs) and copy number variants (CNVs). Analysis of CNVs containing inverted duplicated DNA sequences using nanopore sequencing identified recurrent aberrant behavior characterized by low-confidence, incorrect and missed base calls. Inverted duplicated DNA sequences in both yeast and human samples were observed to have a systematic elevation in the electrical current detected at the nanopore, increased translocation rates and decreased sampling rates. The coincidence of inverted duplicated DNA sequences with dramatically reduced sequencing accuracy and an increased translocation rate suggests that secondary DNA structures may interfere with the dynamics of DNA transit through the nanopore.


2020 ◽  
Author(s):  
Anand Ramachandran ◽  
Steven S. Lumetta ◽  
Eric Klee ◽  
Deming Chen

Next Generation Sequencing (NGS) technologies that cost-effectively characterize genomic regions and identify sequence variations using short reads are the current standard for genome sequencing. However, calling small indels in low-complexity regions of the genome using NGS is challenging. Recent advances in Third Generation Sequencing (TGS) provide long reads, which allow large structural variants to be called accurately. However, these reads have context-dependent indel errors in low-complexity regions, resulting in lower accuracy of small indel calls compared to NGS reads. When both small and large structural variants need to be called, both NGS and TGS reads may be available. Integrating the two data types, which have distinct error profiles, could improve the robustness of small variant calling in challenging cases. However, no existing method integrates both types of data. We present a novel method that integrates NGS and TGS reads to call small variants. We leverage the Mixture of Experts paradigm, which uses an ensemble of Deep Neural Networks (DNNs), each processing a different data type, to make predictions. We present improvements in our DNN design compared to previous work, such as sequence processing using one-dimensional convolutions instead of image processing using two-dimensional convolutions, and an algorithm to efficiently process sites with many variant candidates; both help us reduce computation. Using our method to integrate Illumina and PacBio reads, we find a reduction of up to ~30% in the number of erroneous small variant calls compared to the state of the art using only Illumina data. We also find improvements in calling small indels in low-complexity regions.
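The Mixture of Experts idea can be sketched as follows, under stated assumptions. Each expert (in the paper, a DNN over one data type such as Illumina or PacBio reads) is replaced here by a fixed vector of class probabilities for a candidate site, and the gating network is replaced by gate logits passed in directly. The combined prediction is the softmax-weighted sum of the experts' outputs. The function names and the example classes are hypothetical; this is a generic illustration of the paradigm, not the authors' architecture or training procedure.

```python
import math
from typing import List, Sequence


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax (subtract the max before exponentiating)."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def mixture_of_experts(expert_probs: Sequence[Sequence[float]],
                       gate_logits: Sequence[float]) -> List[float]:
    """Combine per-expert class probabilities using softmax gating weights.

    expert_probs[i] is expert i's probability vector over the same classes;
    gate_logits[i] is the (hypothetical) gating network's score for expert i.
    """
    if len(expert_probs) != len(gate_logits):
        raise ValueError("need one gate logit per expert")
    weights = softmax(gate_logits)
    combined = [0.0] * len(expert_probs[0])
    for w, probs in zip(weights, expert_probs):
        for k, p in enumerate(probs):
            combined[k] += w * p
    return combined
```

With a short-read expert predicting `[0.7, 0.2, 0.1]` and a long-read expert predicting `[0.2, 0.7, 0.1]` over three hypothetical genotype classes, equal gate logits give `[0.45, 0.45, 0.1]`, while gate logits of `[math.log(3), 0.0]` (weights 0.75 and 0.25) shift the result toward the first expert. In a trained system the gate would look at the site itself, which is how it can learn to trust one data type in low-complexity regions and the other elsewhere.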


2021 ◽  
Vol 22 (18) ◽  
pp. 10069
Author(s):  
Ying Cao ◽  
Haizhou Liu ◽  
Yi Yan ◽  
Wenjun Liu ◽  
Di Liu ◽  
...  

Influenza viruses still pose a serious threat to humans, and we are not yet able to effectively predict future pandemic strains and prepare vaccines in advance. One of the main reasons is the high genetic diversity of influenza viruses. We cannot resolve the individual clonotypes of a virus population because, while some make up the majority, others make up only a small fraction of the population. First-generation (FGS) and next-generation sequencing (NGS) technologies have inherent limitations that prevent them from resolving information about minority clonotypes in the virus population. Third-generation sequencing (TGS) technologies with ultra-long reads have the potential to solve this problem but have a high error rate. Here, we evaluated emerging direct RNA sequencing and cDNA sequencing on the MinION platform and established a novel approach that combines the high accuracy of Illumina sequencing with the long reads of nanopore sequencing to resolve both the variants and the clonotypes of influenza virus. Furthermore, we wrote a new program to eliminate the effect of nanopore sequencing errors when analyzing the results. Using this pipeline, we identified 47 clonotypes in our experiment. We conclude that this approach can quickly discriminate the clonotypes of virus genes, allowing researchers to understand virus adaptation and evolution at the population level.
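One way to picture combining accurate short-read variant calls with long nanopore reads is sketched below. This is a hypothetical illustration, not the program described in the abstract: it assumes the Illumina data have already yielded a set of trusted variant sites (position to reference/alternative allele), and that each aligned nanopore read is available as a mapping from reference position to observed base. Each read is reduced to a string of `0` (reference), `1` (alternative) or `?` (missing or unexplained base) at the trusted sites only, so nanopore errors elsewhere are ignored, and reads sharing a string are counted as one clonotype.

```python
from collections import Counter
from typing import Dict, List, Tuple

Sites = Dict[int, Tuple[str, str]]  # position -> (ref allele, alt allele)


def read_haplotype(read_bases: Dict[int, str], sites: Sites) -> str:
    """Encode one aligned read at the trusted sites as '0', '1' or '?'."""
    codes = []
    for pos in sorted(sites):
        ref, alt = sites[pos]
        base = read_bases.get(pos)
        if base == ref:
            codes.append("0")
        elif base == alt:
            codes.append("1")
        else:
            codes.append("?")  # not covered, or a base explained by neither allele
    return "".join(codes)


def count_clonotypes(reads: List[Dict[int, str]], sites: Sites,
                     min_reads: int = 2) -> Dict[str, int]:
    """Count fully resolved haplotypes supported by at least min_reads reads."""
    counts = Counter(read_haplotype(r, sites) for r in reads)
    return {h: n for h, n in counts.items() if "?" not in h and n >= min_reads}
```

The hypothetical `min_reads` threshold trades sensitivity for specificity: raising it suppresses spurious combinations caused by residual errors, but it can also discard genuine minority clonotypes, which are exactly what the approach is meant to recover.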


2020 ◽  
Vol 2 (1) ◽  
Author(s):  
Wouter De Coster ◽  
Mojca Strazisar ◽  
Peter De Rijk

Long-read sequencing has substantial advantages for structural variant discovery and phasing of variants compared to short-read technologies, but the required and optimal read length has not been assessed. In this work, we used long reads simulated from human genomes and evaluated structural variant discovery and variant phasing using current best-practice bioinformatics methods. We determined that optimal discovery of structural variants from human genomes can be obtained with reads of at least 20 kb. Haplotyping variants across genes only reaches its optimum with reads of 100 kb. These findings are important for the design of future long-read sequencing projects.


2019 ◽  
Author(s):  
Wouter De Coster ◽  
Mojca Strazisar ◽  
Peter De Rijk

Long-read sequencing has a substantial advantage for structural variant discovery and phasing of variants compared to short-read technologies, but the required and optimal read length has not been assessed. In this work, we used simulated long reads and evaluated structural variant discovery and variant phasing using current best-practice bioinformatics methods. We determined that optimal discovery of structural variants from human genomes can be obtained with reads of at least 15 kbp. Haplotyping entire genes only reaches its optimum with reads of 100 kbp. These findings are important for the design of future long-read sequencing projects.


2020 ◽  
Vol 15 ◽  
Author(s):  
Hongdong Li ◽  
Wenjing Zhang ◽  
Yuwen Luo ◽  
Jianxin Wang

Aims: Accurately detect isoforms from third generation sequencing data. Background: Transcriptome annotation is the basis for the analysis of gene expression and regulation. The transcriptome annotation of many organisms, such as humans, is far from complete, partly because of the challenge of identifying isoforms that are produced from the same gene through alternative splicing. Third generation sequencing (TGS) reads provide an unprecedented opportunity for detecting isoforms because their length exceeds that of most isoforms. One limitation of current TGS read-based isoform detection methods is that they rely exclusively on sequence reads, without incorporating the sequence information of known isoforms. Objective: Develop an efficient method for isoform detection. Method: Based on annotated isoforms, we propose a splice isoform detection method called IsoDetect. First, the sequence at each exon-exon junction is extracted from annotated isoforms as a “short feature sequence”, which is used to distinguish different splice isoforms. Second, these feature sequences are aligned to long reads, and the long reads are divided into groups that contain the same set of feature sequences, thereby avoiding pairwise comparisons among the large number of long reads. Third, clustering and consensus generation are carried out based on sequence similarity. For long reads that do not contain any short feature sequence, clustering analysis based on sequence similarity is performed to identify isoforms. Result: Tested on two datasets from Calypte anna and zebra finch, IsoDetect showed higher speed and compelling accuracy compared with four existing methods. Conclusion: IsoDetect is a promising method for isoform detection. Other: This paper was accepted by the CBC2019 conference.
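The first two IsoDetect steps (extracting junction feature sequences from annotated isoforms, then grouping reads by the set of features they contain) can be sketched as below. This is a simplified illustration under explicit assumptions: exons are given as plain strings per isoform, each feature is the last `flank` bases of one exon joined to the first `flank` bases of the next (`flank` is a hypothetical parameter), and feature detection uses exact substring matching, whereas IsoDetect aligns the features to error-prone long reads. The clustering and consensus steps are not shown; reads with no feature are simply returned for that later similarity-based clustering.

```python
from collections import defaultdict
from typing import Dict, FrozenSet, List, Sequence, Set, Tuple


def junction_features(isoforms: Dict[str, Sequence[str]], flank: int = 10) -> Set[str]:
    """Collect the sequence spanning every exon-exon junction of every isoform."""
    features = set()
    for exons in isoforms.values():
        for left_exon, right_exon in zip(exons, exons[1:]):
            features.add(left_exon[-flank:] + right_exon[:flank])
    return features


def group_reads(reads: Dict[str, str],
                features: Set[str]) -> Tuple[Dict[FrozenSet[str], List[str]], List[str]]:
    """Group read IDs by the exact set of junction features each read contains.

    Reads in the same group share a splice structure, so no pairwise
    read-versus-read comparison is needed to form the groups.
    """
    groups: Dict[FrozenSet[str], List[str]] = defaultdict(list)
    unassigned: List[str] = []
    for read_id, seq in reads.items():
        found = frozenset(f for f in features if f in seq)  # exact match only
        if found:
            groups[found].append(read_id)
        else:
            unassigned.append(read_id)  # left for similarity-based clustering
    return dict(groups), unassigned
```

With a toy gene whose isoform `iso1` uses exons 1-2-3 and `iso2` skips exon 2, reads from `iso1` fall into one group keyed by two junction features and reads from `iso2` into another keyed by the single skip junction. Grouping costs one feature scan per read, which is the saving over all-against-all read comparison that the abstract points to.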


2020 ◽  
Vol 48 (12) ◽  
pp. 030006052096777
Author(s):  
Peisong Chen ◽  
Xuegao Yu ◽  
Hao Huang ◽  
Wentao Zeng ◽  
Xiaohong He ◽  
...  

Introduction: To evaluate a next-generation sequencing (NGS) workflow for the screening and diagnosis of thalassemia. Methods: In this prospective study, blood samples were obtained from people undergoing genetic screening for thalassemia at our centre in Guangzhou, China. Genomic DNA was amplified by polymerase chain reaction (PCR) and sequenced using the Ion Torrent system, and the results were compared with those of traditional genetic analyses. Results: Of the 359 subjects, 148 (41%) were confirmed to have thalassemia. Variant detection identified 35 different variant types, including the most common ones. The mutational sites identified by NGS were consistent with those identified by Sanger sequencing and Gap-PCR. The sensitivity and specificity of Ion Torrent NGS were both 100%. In a separate test of 16 samples, results were consistent when repeated ten times. Conclusion: Our NGS workflow based on the Ion Torrent sequencer successfully detected large deletions and non-deletional defects in thalassemia with high accuracy and repeatability.


2021 ◽  
Vol 22 (12) ◽  
pp. 6410
Author(s):  
Vasily Smirnov ◽  
Olivier Grunewald ◽  
Jean Muller ◽  
Christina Zeitz ◽  
Carolin D. Obermaier ◽  
...  

Variants of the TTLL5 gene, which encodes tubulin tyrosine ligase-like family member five, are a rare cause of cone dystrophy (COD) or cone-rod dystrophy (CORD). To date, only a few TTLL5 patients have been clinically and genetically described. In this study, we report five patients harbouring biallelic variants of TTLL5. Four adult patients presented with either COD or CORD with onset in the late teenage years. The youngest patient had a phenotype of early onset severe retinal dystrophy (EOSRD). Genetic analysis was performed by targeted next-generation sequencing of gene panels and assessment of copy number variants (CNVs). We identified eight variants, six of which were novel, including two large multiexon deletions in patients with COD or CORD, while the EOSRD patient harboured the novel homozygous p.(Trp640*) variant and three distinct USH2A variants, which might explain the observed rod involvement. Our study highlights the role of TTLL5 in COD/CORD and the importance of large deletions. These findings suggest that COD or CORD patients lacking variants in known genes may harbour CNVs in TTLL5 that were previously undetected by classical sequencing methods. In addition, the variable phenotypes of TTLL5-associated patients might be due to the presence of additional gene defects.

