Detection and visualization of complex structural variants from long reads

Co-barcoded reads originating from long DNA fragments (mean length >30 kbp) maintain both single-base-level accuracy and long-range genomic information. We propose a pipeline, stLFRsv, to detect structural variation using co-barcoded reads. stLFRsv identifies abnormally large gaps between co-barcoded reads to detect potential breakpoints and reconstruct complex structural variants (SVs). Haplotype phasing by co-barcoded reads increases the signal-to-noise ratio, and barcode sharing profiles are used to filter out false positives. We integrate the short-read SV caller smoove with stLFRsv to cover smaller variants. The integrated pipeline was evaluated on the well-characterized genome HG002/NA24385, achieving 74.5% precision and 22.4% recall for deletions. stLFRsv revealed some large variants not included in the benchmark set that were verified by long reads or assembly. For the HG001/NA12878 genome, stLFRsv also achieved the best performance for both resource usage and the detection of large variants. Our work indicates that co-barcoded read technology has the potential to improve genome completeness.
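
The gap-based breakpoint idea in this abstract can be illustrated with a minimal Python sketch. This is not the stLFRsv implementation: the function name `find_large_gaps`, the `(barcode, chrom, start, end)` tuple layout, and the 50 kbp threshold are all assumptions made for illustration only.

```python
from collections import defaultdict


def find_large_gaps(reads, max_gap=50_000):
    """Return (barcode, chrom, gap_start, gap_end) for each abnormally large gap.

    reads:   iterable of (barcode, chrom, start, end) tuples (hypothetical layout).
    max_gap: assumed threshold in bp; not a value taken from the paper.
    """
    # Group read intervals by barcode and chromosome.
    by_group = defaultdict(list)
    for barcode, chrom, start, end in reads:
        by_group[(barcode, chrom)].append((start, end))

    gaps = []
    for (barcode, chrom), intervals in sorted(by_group.items()):
        intervals.sort()
        furthest_end = intervals[0][1]
        for start, end in intervals[1:]:
            # A gap larger than the threshold marks a candidate breakpoint pair.
            if start - furthest_end > max_gap:
                gaps.append((barcode, chrom, furthest_end, start))
            furthest_end = max(furthest_end, end)
    return gaps


reads = [
    ("BC1", "chr1", 1_000, 1_100),
    ("BC1", "chr1", 5_000, 5_100),
    ("BC1", "chr1", 200_000, 200_100),
    ("BC2", "chr1", 3_000, 3_100),
]
print(find_large_gaps(reads))
```

In this toy input only barcode `BC1` has a gap above the threshold, between positions 5,100 and 200,000. A real caller would additionally require support from several barcodes and apply the phasing and barcode-sharing filters that the abstract mentions.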

MoMI-G: modular multi-scale integrated genome graph browser

BMC Bioinformatics ◽

10.1186/s12859-019-3145-2 ◽

2019 ◽

Vol 20 (1) ◽

Cited By ~ 3

Author(s):

Toshiyuki T. Yokoyama ◽

Yoshitaka Sakamoto ◽

Masahide Seki ◽

Yutaka Suzuki ◽

Masahiro Kasahara

Keyword(s):

Human Cancer ◽

Read Depth ◽

Structural Variants ◽

Structural Variations ◽

Multi Scale ◽

Long Reads ◽

A Genome ◽

Long Read ◽

Complex Structural ◽

Genome Graph

Abstract. Background: A genome graph is an emerging approach for representing structural variants on genomes with branches. For example, representing structural variants of cancer genomes as a genome graph is more natural than representing such genomes as differences from the linear reference genome. While more and more structural variants are being identified by long-read sequencing, many of them are difficult to visualize using existing structural variant visualization tools. A visualization method for large genome graphs, such as human cancer genome graphs, is therefore needed. Results: We developed the MOdular Multi-scale Integrated Genome graph browser, MoMI-G, a web-based genome graph browser that can visualize genome graphs with structural variants and supporting evidence such as read alignments, read depth, and annotations. This browser allows more intuitive recognition of large, nested, and potentially more complex structural variations. MoMI-G has view modules for different scales, which allow users to view anything from the whole genome down to nucleotide-level alignments of long reads. Alignments spanning reference alleles and those spanning alternative alleles are shown in the same view. Users can customize the view if they are not satisfied with the preset views. In addition, MoMI-G has Interval Card Deck, a feature for rapid manual inspection of hundreds of structural variants. Herein, we describe the utility of MoMI-G by using representative examples of large and nested structural variations found in two cell lines, LC-2/ad and CHM1. Conclusions: Users can inspect complex and large structural variations found by long-read analysis in large genomes such as human genomes more smoothly and more intuitively. In addition, users can quickly filter out false positives by manually inspecting hundreds of identified structural variants with supporting long-read alignments and annotations. Software availability: MoMI-G is freely available at https://github.com/MoMI-G/MoMI-G under the MIT license.

Systematic analysis of mutational spectra associated with DNA repair deficiency in C. elegans

10.1101/2020.06.04.133306 ◽

2020 ◽

Cited By ~ 1

Author(s):

B Meier ◽

NV Volkova ◽

Y Hong ◽

S Bertolini ◽

V González-Huici ◽

...

Keyword(s):

DNA Damage ◽

DNA Repair ◽

Excision Repair ◽

Structural Variants ◽

Systematic Analysis ◽

C Elegans ◽

Mutational Spectra ◽

Base Substitutions ◽

Repair Pathways ◽

Complex Structural

Abstract. Genome integrity is particularly important in germ cells to faithfully preserve genetic information across generations. As yet little is known about the contribution of various DNA repair pathways to prevent mutagenesis. Using the C. elegans model we analyse mutational spectra that arise in wild-type and 61 DNA repair and DNA damage response mutants cultivated over multiple generations. Overall, 44% of lines show >2-fold increased mutagenesis with a broad spectrum of mutational outcomes including changes in single or multiple types of base substitutions induced by defects in base excision or nucleotide excision repair, or elevated levels of 50-400 bp deletions in translesion polymerase mutants rev-3(pol ζ) and polh-1(pol η). Mutational signatures associated with defective homologous recombination fall into two classes: 1) mutants lacking brc-1/BRCA1 or rad-51/RAD51 paralogs show elevated base substitutions, indels and structural variants, while 2) deficiency for MUS-81/MUS81 and SLX-1/SLX1 nucleases, and HIM-6/BLM, HELQ-1/HELQ and RTEL-1/RTEL1 helicases primarily cause structural variants. Genome-wide investigation of mutagenesis patterns identified elevated rates of tandem duplications often associated with inverted repeats in helq-1 mutants, and a unique pattern of ‘translocation’ events involving homeologous sequences in rip-1 paralog mutants. atm-1/ATM DNA damage checkpoint mutants harboured complex structural variants enriched in subtelomeric regions, and chromosome end-to-end fusions. Finally, while inactivation of the p53-like gene cep-1 did not affect mutagenesis, combined brc-1 cep-1 deficiency displayed increased, locally clustered mutagenesis. In summary, we provide a global view of how DNA repair pathways prevent germ cell mutagenesis.

Fast and sensitive mapping of error-prone nanopore sequencing reads with GraphMap

10.1101/020719 ◽

2015 ◽

Cited By ~ 1

Author(s):

Ivan Sovic ◽

Mile Sikic ◽

Andreas Wilm ◽

Shannon Nicole Fenlon ◽

Swaine Chen ◽

...

Keyword(s):

Human Genome ◽

Variant Calling ◽

Error Rates ◽

Nanopore Sequencing ◽

Structural Variants ◽

Specific Identification ◽

Long Reads ◽

Long Read ◽

Specific Error ◽

Very High

Exploiting the power of nanopore sequencing requires the development of new bioinformatics approaches to deal with its specific error characteristics. We present the first nanopore read mapper (GraphMap) that uses a read-funneling paradigm to robustly handle variable error rates and fast graph traversal to align long reads with speed and very high precision (>95%). Evaluation on MinION sequencing datasets against short- and long-read mappers indicates that GraphMap increases mapping sensitivity by at least 15-80%. GraphMap alignments are the first to demonstrate consensus calling with <1 error in 100,000 bases, variant calling on the human genome with 76% improvement in sensitivity over the next best mapper (BWA-MEM), precise detection of structural variants from 100 bp to 4 kbp in length, and species- and strain-specific identification of pathogens using MinION reads. GraphMap is available open source under the MIT license at https://github.com/isovic/graphmap.
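
For orientation, the consensus error bound quoted above can be restated on the Phred quality scale, Q = -10 * log10(p). The short Python sketch below is only a unit conversion; the helper name `phred_quality` is illustrative and is not part of GraphMap.

```python
import math


def phred_quality(error_rate):
    """Phred-scaled quality Q = -10 * log10(error probability)."""
    if not 0 < error_rate <= 1:
        raise ValueError("error_rate must be in (0, 1]")
    return -10 * math.log10(error_rate)


# Upper bound stated in the abstract: fewer than 1 error per 100,000 bases.
q = phred_quality(1 / 100_000)
print(f"Q{q:.0f}")
```

An error rate below 1 in 100,000 therefore corresponds to a consensus quality above Q50.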

Faculty Opinions recommendation of Complex structural variants in Mendelian disorders: identification and breakpoint resolution using short- and long-read genome sequencing.

Faculty Opinions – Post-Publication Peer Review of the Biomedical Literature ◽

10.3410/f.734615989.793574031 ◽

2020 ◽

Author(s):

Guy Rouleau

Keyword(s):

Genome Sequencing ◽

Structural Variants ◽

Mendelian Disorders ◽

Long Read ◽

Complex Structural

Mapping and phasing of structural variation in patient genomes using nanopore sequencing

10.1101/129379 ◽

2017 ◽

Cited By ~ 4

Author(s):

Mircea Cretu Stancu ◽

Markus J. van Roosmalen ◽

Ivo Renkens ◽

Marleen Nieboer ◽

Sjors Middelkamp ◽

...

Keyword(s):

Single Molecule ◽

De Novo ◽

Structural Variants ◽

Human Genetic Disease ◽

Structural Genomic ◽

Short Read ◽

Sequencing Technologies ◽

Genome Wide ◽

Long Read ◽

Complex Structural

Abstract. Structural genomic variants form a common type of genetic alteration underlying human genetic disease and phenotypic variation. Despite major improvements in genome sequencing technology and data analysis, the detection of structural variants still poses challenges, particularly when variants are of high complexity. Emerging long-read single-molecule sequencing technologies provide new opportunities for detection of structural variants. Here, we demonstrate sequencing of the genomes of two patients with congenital abnormalities using the ONT MinION at 11x and 16x mean coverage, respectively. We developed a bioinformatic pipeline, NanoSV, to efficiently map genomic structural variants (SVs) from the long-read data. We demonstrate that the nanopore data are superior to corresponding short-read data with regard to detection of de novo rearrangements originating from complex chromothripsis events in the patients. Additionally, genome-wide surveillance of SVs revealed 3,253 (33%) novel variants that were missed in short-read data of the same sample, the majority of which are duplications <200 bp in size. Long sequencing reads enabled efficient phasing of genetic variations, allowing the construction of genome-wide maps of phased SVs and SNVs. We employed read-based phasing to show that all de novo chromothripsis breakpoints occurred on paternal chromosomes and we resolved the long-range structure of the chromothripsis. This work demonstrates the value of long-read sequencing for screening whole genomes of patients for complex structural variants.

Ribbon: intuitive visualization for complex genomic variation

Bioinformatics ◽

10.1093/bioinformatics/btaa680 ◽

2020 ◽

Cited By ~ 5

Author(s):

Maria Nattestad ◽

Robert Aboukhalil ◽

Chen-Shan Chin ◽

Michael C Schatz

Keyword(s):

Genomic Variation ◽

Supplementary Information ◽

Visualization Tool ◽

Visualization Method ◽

Structural Variants ◽

Long Read ◽

Complex Structural ◽

Intuitive View ◽

Genome Comparisons ◽

Shed Light

Abstract. Summary: Ribbon is an alignment visualization tool that shows how alignments are positioned within both the reference and read contexts, giving an intuitive view that enables a better understanding of structural variants and the read evidence supporting them. Ribbon was born out of a need to curate complex structural variant calls and determine whether each was well supported by long-read evidence, and it uses the same intuitive visualization method to shed light on contig alignments from genome-to-genome comparisons. Availability and implementation: Ribbon is freely available online at http://genomeribbon.com/ and is open-source at https://github.com/marianattestad/ribbon. Supplementary information: Supplementary data are available at Bioinformatics online.

Targeted long-read sequencing resolves complex structural variants and identifies missing disease-causing variants

10.1101/2020.11.03.365395 ◽

2020 ◽

Author(s):

Danny E. Miller ◽

Arvis Sulovari ◽

Tianyun Wang ◽

Hailey Loucks ◽

Kendra Hoekzema ◽

...

Keyword(s):

Copy Number ◽

Genetic Diagnosis ◽

Clinical Testing ◽

Structural Variants ◽

Single Nucleotide Variants ◽

Single Nucleotide ◽

Pathogenic Variants ◽

Long Read ◽

Repeat Expansions ◽

Complex Structural

Abstract. Background: Despite widespread availability of clinical genetic testing, many individuals with suspected genetic conditions do not have a precise diagnosis. This limits their opportunity to take advantage of state-of-the-art treatments. In such instances, testing sometimes reveals difficult-to-evaluate complex structural differences, candidate variants that do not fully explain the phenotype, single pathogenic variants in recessive disorders, or no variants in specific genes of interest. Thus, there is a need for better tools to identify a precise genetic diagnosis in individuals when conventional testing approaches have been exhausted. Methods: Targeted long-read sequencing (T-LRS) was performed on 33 individuals using Read Until on the Oxford Nanopore platform. This method allowed us to computationally target up to 100 Mbp of sequence per experiment, resulting in an average of 20x coverage of target regions, a 500% increase over background. We analyzed patient DNA for pathogenic substitutions, structural variants, and methylation differences using a single data source. Results: The effectiveness of T-LRS was validated by detecting all genomic aberrations, including single-nucleotide variants, copy number changes, repeat expansions, and methylation differences, identified by prior clinical testing. In 6/7 individuals who had complex structural rearrangements, T-LRS enabled more precise resolution of the mutation, which led, in one case, to a change in clinical management. In nine individuals with suspected Mendelian conditions who lacked a precise genetic diagnosis, T-LRS identified pathogenic or likely pathogenic variants in five and variants of uncertain significance in two others. Conclusions: T-LRS can accurately predict pathogenic copy number variants and triplet repeat expansions, resolve complex rearrangements, and identify single-nucleotide variants not detected by other technologies, including short-read sequencing. T-LRS represents an efficient and cost-effective strategy to evaluate high-priority candidate genes and regions or to further evaluate complex clinical testing results. The application of T-LRS will likely increase the diagnostic rate of rare disorders.

Nanopore sequencing detects structural variants in cancer

10.1101/028290 ◽

2015 ◽

Cited By ~ 3

Author(s):

Alexis L. Norris ◽

Rachael E. Workman ◽

Yunfan Fan ◽

James R. Eshleman ◽

Winston Timp

Keyword(s):

Detection Efficiency ◽

Electrical Current ◽

Read Length ◽

Therapeutic Monitoring ◽

Base Substitution ◽

Nanopore Sequencing ◽

Structural Variants ◽

Large Deletions ◽

Long Reads ◽

Generation Sequencing

Despite advances in sequencing, structural variants (SVs) remain difficult to reliably detect due to the short read length (<300 bp) of 2nd generation sequencing. Not only do the reads (or paired-end reads) need to straddle a breakpoint, but repetitive elements often lead to ambiguities in the alignment of short reads. We propose to use the long reads (up to 20 kb) possible with 3rd generation sequencing, specifically nanopore sequencing on the MinION. Nanopore sequencing relies on a similar concept to a Coulter counter, reading the DNA sequence from the change in electrical current resulting from a DNA strand being forced through a nanometer-sized pore embedded in a membrane. Though nanopore sequencing currently has a relatively high mismatch rate that precludes base substitution and small frameshift mutation detection, its accuracy is sufficient for SV detection because of its long reads. In fact, long reads in some cases may improve SV detection efficiency. We have tested nanopore sequencing to detect a series of well-characterized SVs, including large deletions, inversions, and translocations that inactivate the CDKN2A/p16 and SMAD4/DPC4 tumor suppressor genes in pancreatic cancer. Using PCR amplicon mixes, we have demonstrated that nanopore sequencing can detect large deletions, translocations and inversions at dilutions as low as 1:100, with as few as 500 reads per sample. Given the speed, small footprint, and low capital cost, nanopore sequencing could become the ideal tool for the low-level detection of cancer-associated SVs needed for molecular relapse, early detection, or therapeutic monitoring.
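
To put the dilution figures into perspective, the Python sketch below estimates how many variant-supporting reads one might expect at a 1:100 dilution with 500 reads. It rests on assumptions that are not taken from the paper: the dilution is read as a variant fraction of 1/100, reads are treated as independent draws (a binomial model), and sequencing or alignment errors are ignored. The function name `prob_at_least` is illustrative.

```python
from math import comb


def prob_at_least(k, n_reads, variant_fraction):
    """P(X >= k) for X ~ Binomial(n_reads, variant_fraction)."""
    # Sum the probabilities of observing fewer than k variant reads.
    p_fewer = sum(
        comb(n_reads, i)
        * variant_fraction**i
        * (1 - variant_fraction) ** (n_reads - i)
        for i in range(k)
    )
    return 1 - p_fewer


n_reads = 500              # read count quoted in the abstract
variant_fraction = 1 / 100  # assumed interpretation of a 1:100 dilution

expected = n_reads * variant_fraction
p_one = prob_at_least(1, n_reads, variant_fraction)
p_three = prob_at_least(3, n_reads, variant_fraction)

print(f"expected variant reads: {expected:.1f}")
print(f"P(>=1 variant read):  {p_one:.4f}")
print(f"P(>=3 variant reads): {p_three:.4f}")
```

Under these assumptions about five variant-supporting reads are expected, so observing at least one is very likely, while requiring several supporting reads lowers the detection probability noticeably.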

TSD: A computational tool to study the complex structural variants using PacBio targeted sequencing data

10.1101/474445 ◽

2018 ◽

Author(s):

Guofeng Meng ◽

Ying Tan ◽

Yue Fan ◽

Yan Wang ◽

Guang Yang ◽

...

Keyword(s):

Human Cell Line ◽

Targeted Sequencing ◽

Structural Variants ◽

Sequencing Data ◽

RNA Sequences ◽

Variant Discovery ◽

Powerful Approach ◽

Full Profile ◽

Long Read ◽

Complex Structural

Abstract. PacBio sequencing is a powerful approach for studying DNA or RNA sequences over long ranges. It is especially useful for exploring complex structural variants generated by random integration or multiple rearrangements of internal or external sequences. However, there is still no tool designed to uncover their structural organization in the host genome. Here, we present a tool, TSD, for complex structural variant discovery using PacBio targeted sequencing data. It allows researchers to identify and visualize the genomic structures of targeted sequences by unlimited splitting, alignment and assembly of long PacBio reads. Application to sequencing data derived from an HBV-integrated human cell line (PLC/PRF/5) indicated that TSD could recover the full profile of HBV integration events, especially for regions with complex human-HBV genome integrations and multiple HBV rearrangements. Compared to other long-read analysis tools, TSD showed better performance in detecting complex genomic structural variants. TSD is publicly available at: https://github.com/menggf/tsd
