TSD: A computational tool to study the complex structural variants using PacBio targeted sequencing data

ABSTRACT: The PacBio sequencing is a powerful approach to study the DNA or RNA sequences in a longer scope. It is especially useful in exploring the complex structural variants generated by random integration or multiple rearrangement of internal or external sequences. However, there is still no tool designed to uncover their structural organization in the host genome. Here, we present a tool, TSD, for complex structural variant discovery using PacBio targeted sequencing data. It allows researchers to identify and visualize the genomic structures of targeted sequences by unlimited splitting, alignment and assembly of long PacBio reads. Application to the sequencing data derived from an HBV integrated human cell line (PLC/PRF/5) indicated that TSD could recover the full profile of HBV integration events, especially for the regions with the complex human-HBV genome integrations and multiple HBV rearrangements. Compared to other long read analysis tools, TSD showed a better performance for detecting complex genomic structural variants. TSD is publicly available at: https://github.com/menggf/tsd
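One way to picture the repeated splitting and alignment of a long read is the toy sketch below. It is not TSD's code and makes strong assumptions: the function names are hypothetical, and an exact substring search stands in for a real aligner. The best-matching piece of the read is located first, and the unaligned flanks on either side are then split and searched again until nothing more aligns.

```python
def best_hit(seq, references, min_len=4):
    """Toy stand-in for an aligner: the longest reference found exactly in seq."""
    best = None
    for name, ref in references.items():
        if len(ref) < min_len:
            continue
        pos = seq.find(ref)
        if pos != -1 and (best is None or len(ref) > best[1] - best[0]):
            best = (pos, pos + len(ref), name)
    return best


def split_align(seq, references, offset=0):
    """Recursively split a read around its best hit.

    Returns (read_start, read_end, reference_name) tuples in read order.
    """
    hit = best_hit(seq, references)
    if hit is None:
        return []
    start, end, name = hit
    left = split_align(seq[:start], references, offset)
    right = split_align(seq[end:], references, offset + end)
    return left + [(offset + start, offset + end, name)] + right


# Hypothetical sequences, made up for illustration only.
references = {
    "human_A": "ACGTACGT",
    "HBV_X": "TTTTGGGG",
    "human_B": "CCCCAAAA",
}
read = "ACGTACGT" + "TTTTGGGG" + "CCCCAAAA"
segments = split_align(read, references)
```

The ordered segment list describes the read as a chain of host and viral pieces, which is the kind of structure a human-HBV integration event would produce.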

TSD: A Computational Tool To Study the Complex Structural Variants Using PacBio Targeted Sequencing Data

G3 Genes|Genomes|Genetics ◽

10.1534/g3.118.200900 ◽

2019 ◽

Vol 9 (5) ◽

pp. 1371-1376 ◽

Cited By ~ 1

Author(s):

Guofeng Meng ◽

Ying Tan ◽

Yue Fan ◽

Yan Wang ◽

Guang Yang ◽

...

Keyword(s):

Targeted Sequencing ◽

Structural Variants ◽

Computational Tool ◽

Sequencing Data ◽

Complex Structural

Faculty Opinions recommendation of Complex structural variants in Mendelian disorders: identification and breakpoint resolution using short- and long-read genome sequencing.

Faculty Opinions – Post-Publication Peer Review of the Biomedical Literature ◽

10.3410/f.734615989.793574031 ◽

2020 ◽

Author(s):

Guy Rouleau

Keyword(s):

Genome Sequencing ◽

Structural Variants ◽

Mendelian Disorders ◽

Long Read ◽

Complex Structural

Mapping and phasing of structural variation in patient genomes using nanopore sequencing

10.1101/129379 ◽

2017 ◽

Cited By ~ 4

Author(s):

Mircea Cretu Stancu ◽

Markus J. van Roosmalen ◽

Ivo Renkens ◽

Marleen Nieboer ◽

Sjors Middelkamp ◽

...

Keyword(s):

Single Molecule ◽

De Novo ◽

Structural Variants ◽

Human Genetic Disease ◽

Structural Genomic ◽

Short Read ◽

Sequencing Technologies ◽

Genome Wide ◽

Long Read ◽

Complex Structural

Abstract: Structural genomic variants form a common type of genetic alteration underlying human genetic disease and phenotypic variation. Despite major improvements in genome sequencing technology and data analysis, the detection of structural variants still poses challenges, particularly when variants are of high complexity. Emerging long-read single-molecule sequencing technologies provide new opportunities for detection of structural variants. Here, we demonstrate sequencing of the genomes of two patients with congenital abnormalities using the ONT MinION at 11x and 16x mean coverage, respectively. We developed a bioinformatic pipeline - NanoSV - to efficiently map genomic structural variants (SVs) from the long-read data. We demonstrate that the nanopore data are superior to corresponding short-read data with regard to detection of de novo rearrangements originating from complex chromothripsis events in the patients. Additionally, genome-wide surveillance of SVs revealed 3,253 (33%) novel variants that were missed in short-read data of the same sample, the majority of which are duplications < 200 bp in size. Long sequencing reads enabled efficient phasing of genetic variations, allowing the construction of genome-wide maps of phased SVs and SNVs. We employed read-based phasing to show that all de novo chromothripsis breakpoints occurred on paternal chromosomes and we resolved the long-range structure of the chromothripsis. This work demonstrates the value of long-read sequencing for screening whole genomes of patients for complex structural variants.

Trio deep-sequencing does not reveal unexpected off-target and on-target mutations in Cas9-edited rhesus monkeys

Nature Communications ◽

10.1038/s41467-019-13481-y ◽

2019 ◽

Vol 10 (1) ◽

Cited By ~ 8

Author(s):

Xin Luo ◽

Yaoxi He ◽

Chao Zhang ◽

Xiechao He ◽

Lanzhen Yan ◽

...

Keyword(s):

Rhesus Monkeys ◽

De Novo ◽

Preclinical Model ◽

Structural Variants ◽

Sequencing Data ◽

De Novo Mutations ◽

Target Region ◽

Long Read ◽

Target Effect

Abstract: CRISPR-Cas9 is a widely used genome editing tool, but its off-target effect and on-target complex mutations remain a concern, especially in view of future clinical applications. Non-human primates (NHPs) share close genetic and physiological similarities with humans, making them an ideal preclinical model for developing Cas9-based therapies. However, to our knowledge no comprehensive in vivo off-target and on-target assessment has been conducted in NHPs. Here, we perform whole genome trio sequencing of Cas9-treated rhesus monkeys. We only find a small number of de novo mutations that can be explained by expected spontaneous mutations, and no unexpected off-target mutations (OTMs) were detected. Furthermore, the long-read sequencing data does not detect large structural variants in the target region.

Ribbon: intuitive visualization for complex genomic variation

Bioinformatics ◽

10.1093/bioinformatics/btaa680 ◽

2020 ◽

Cited By ~ 5

Author(s):

Maria Nattestad ◽

Robert Aboukhalil ◽

Chen-Shan Chin ◽

Michael C Schatz

Keyword(s):

Genomic Variation ◽

Supplementary Information ◽

Visualization Tool ◽

Visualization Method ◽

Structural Variants ◽

Long Read ◽

Complex Structural ◽

Intuitive View ◽

Genome Comparisons ◽

Shed Light

Abstract

Summary: Ribbon is an alignment visualization tool that shows how alignments are positioned within both the reference and read contexts, giving an intuitive view that enables a better understanding of structural variants and the read evidence supporting them. Ribbon was born out of a need to curate complex structural variant calls and determine whether each was well supported by long-read evidence, and it uses the same intuitive visualization method to shed light on contig alignments from genome-to-genome comparisons.

Availability and implementation: Ribbon is freely available online at http://genomeribbon.com/ and is open-source at https://github.com/marianattestad/ribbon.

Supplementary information: Supplementary data are available at Bioinformatics online.

LinkedSV for detection of mosaic structural variants from linked-read exome and genome sequencing data

10.1101/409789 ◽

2018 ◽

Cited By ~ 2

Author(s):

Li Fang ◽

Charlly Kao ◽

Michael V Gonzalez ◽

Fernanda A Mafra ◽

Renata Pellegrino da Silva ◽

...

Keyword(s):

Exome Sequencing ◽

Read Depth ◽

Structural Variants ◽

Sequencing Data ◽

High Coverage ◽

Short Read ◽

Short Read Sequencing ◽

Sequencing Studies ◽

Long Read ◽

Local Assembly

Abstract: Linked-read sequencing provides long-range information on short-read sequencing data by barcoding reads originating from the same DNA molecule, and can improve the detection and breakpoint identification for structural variants (SVs). We present LinkedSV for SV detection on linked-read sequencing data. LinkedSV considers barcode overlapping and enriched fragment endpoints as signals to detect large SVs, while it leverages read depth, paired-end signals and local assembly to detect small SVs. Benchmarking studies demonstrate that LinkedSV outperforms existing tools, especially on exome data and on somatic SVs with low variant allele frequencies. We demonstrate clinical cases where LinkedSV identifies disease causal SVs from linked-read exome sequencing data missed by conventional exome sequencing, and show examples where LinkedSV identifies SVs missed by high-coverage long-read sequencing. In summary, LinkedSV can detect SVs missed by conventional short-read and long-read sequencing approaches, and may resolve negative cases from clinical genome/exome sequencing studies.
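The barcode-overlap signal can be illustrated with a toy example. The sketch below is not LinkedSV's implementation; it assumes each read has already been reduced to a hypothetical `(chromosome, position, barcode)` tuple and simply measures how many barcodes two genomic windows share. Two distant windows that share many barcodes are candidates for being adjacent in the sample genome.

```python
from typing import List, Set, Tuple

Read = Tuple[str, int, str]      # (chromosome, position, barcode): assumed layout
Window = Tuple[str, int, int]    # (chromosome, start, end), end exclusive


def barcodes_in_window(reads: List[Read], window: Window) -> Set[str]:
    """Collect the barcodes of all reads that fall inside a window."""
    chrom, start, end = window
    return {bc for c, pos, bc in reads if c == chrom and start <= pos < end}


def shared_barcode_fraction(reads: List[Read], win_a: Window, win_b: Window) -> float:
    """Fraction of the smaller barcode set that is also seen in the other window."""
    a = barcodes_in_window(reads, win_a)
    b = barcodes_in_window(reads, win_b)
    if not a or not b:
        return 0.0
    return len(a & b) / min(len(a), len(b))


# Hypothetical reads, made up for illustration only.
reads = [
    ("chr1", 1_000, "BC1"),
    ("chr1", 1_500, "BC2"),
    ("chr1", 900_000, "BC1"),
    ("chr1", 900_500, "BC2"),
    ("chr1", 900_800, "BC3"),
    ("chr2", 5_000, "BC4"),
]
linked = shared_barcode_fraction(reads, ("chr1", 0, 10_000), ("chr1", 890_000, 910_000))
unlinked = shared_barcode_fraction(reads, ("chr1", 0, 10_000), ("chr2", 0, 10_000))
```

A real caller would compare such a fraction against a background expectation for windows at that distance; the fixed windows and the normalisation by the smaller set are choices made here for brevity.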

Targeted long-read sequencing resolves complex structural variants and identifies missing disease-causing variants

10.1101/2020.11.03.365395 ◽

2020 ◽

Author(s):

Danny E. Miller ◽

Arvis Sulovari ◽

Tianyun Wang ◽

Hailey Loucks ◽

Kendra Hoekzema ◽

...

Keyword(s):

Copy Number ◽

Genetic Diagnosis ◽

Clinical Testing ◽

Structural Variants ◽

Single Nucleotide Variants ◽

Single Nucleotide ◽

Pathogenic Variants ◽

Long Read ◽

Repeat Expansions ◽

Complex Structural

ABSTRACT

BACKGROUND: Despite widespread availability of clinical genetic testing, many individuals with suspected genetic conditions do not have a precise diagnosis. This limits their opportunity to take advantage of state-of-the-art treatments. In such instances, testing sometimes reveals difficult-to-evaluate complex structural differences, candidate variants that do not fully explain the phenotype, single pathogenic variants in recessive disorders, or no variants in specific genes of interest. Thus, there is a need for better tools to identify a precise genetic diagnosis in individuals when conventional testing approaches have been exhausted.

METHODS: Targeted long-read sequencing (T-LRS) was performed on 33 individuals using Read Until on the Oxford Nanopore platform. This method allowed us to computationally target up to 100 Mbp of sequence per experiment, resulting in an average of 20x coverage of target regions, a 500% increase over background. We analyzed patient DNA for pathogenic substitutions, structural variants, and methylation differences using a single data source.

RESULTS: The effectiveness of T-LRS was validated by detecting all genomic aberrations, including single-nucleotide variants, copy number changes, repeat expansions, and methylation differences, previously identified by prior clinical testing. In 6/7 individuals who had complex structural rearrangements, T-LRS enabled more precise resolution of the mutation, which led, in one case, to a change in clinical management. In nine individuals with suspected Mendelian conditions who lacked a precise genetic diagnosis, T-LRS identified pathogenic or likely pathogenic variants in five and variants of uncertain significance in two others.

CONCLUSIONS: T-LRS can accurately predict pathogenic copy number variants and triplet repeat expansions, resolve complex rearrangements, and identify single-nucleotide variants not detected by other technologies, including short-read sequencing. T-LRS represents an efficient and cost-effective strategy to evaluate high-priority candidate genes and regions or to further evaluate complex clinical testing results. The application of T-LRS will likely increase the diagnostic rate of rare disorders.

Resolving Complex Structural Genomic Rearrangements using a Randomized Approach

10.1101/028217 ◽

2015 ◽

Author(s):

Xuefang Zhao ◽

Sarah B. Emery ◽

Bridget Myers ◽

Jeffrey M. Kidd ◽

Ryan E. Mills

Keyword(s):

Chromosomal Rearrangements ◽

Genome Structure ◽

Genomic Rearrangements ◽

Sequencing Data ◽

Genomic Alterations ◽

Structural Genomic ◽

Homologous Chromosomes ◽

Complex Chromosomal Rearrangements ◽

Long Read ◽

Complex Structural

Complex chromosomal rearrangements consist of structural genomic alterations involving multiple instances of deletions, duplications, inversions, or translocations that co-occur either on the same chromosome or represent different overlapping events on homologous chromosomes. We present SVelter, an algorithm that first identifies regions of the genome suspected to harbor a complex event and then iteratively rearranges the local genome structure, in a randomized fashion, with each structure scored against characteristics of the observed sequencing data. We show that SVelter is able to accurately reconstruct these regions when compared to well-characterized genomes that have been deep sequenced with both short and long read technologies.
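The idea of randomly rearranging a local structure and scoring each candidate can be sketched as a small greedy search. This is an illustration only, not SVelter's algorithm: the block notation (`"b^"` for an inverted block), the junction-based score, and all function names are assumptions made here, and inversion is simplified to flipping a single block.

```python
import random


def propose(structure, rng):
    """Return a copy of `structure` with one random delete, duplicate, or invert move."""
    s = list(structure)
    i = rng.randrange(len(s))
    move = rng.choice(["delete", "duplicate", "invert"])
    if move == "delete" and len(s) > 1:
        del s[i]
    elif move == "duplicate":
        s.insert(i, s[i])
    else:
        # Invert (also the fallback when a delete would empty the structure):
        # toggle the orientation marker, "b" <-> "b^".
        s[i] = s[i][:-1] if s[i].endswith("^") else s[i] + "^"
    return s


def junctions(structure):
    """Set of adjacent block pairs implied by a structure."""
    return set(zip(structure, structure[1:]))


def score(structure, observed):
    """Toy score: observed junctions explained minus junctions lacking support."""
    found = junctions(structure)
    return len(found & observed) - len(found - observed)


def randomized_search(reference, observed, steps=500, restarts=20, seed=0):
    """Greedy randomized search, restarted from the reference several times."""
    best = list(reference)
    best_score = score(best, observed)
    for r in range(restarts):
        rng = random.Random(seed + r)
        current, current_score = list(reference), score(reference, observed)
        for _ in range(steps):
            cand = propose(current, rng)
            cand_score = score(cand, observed)
            if cand_score > current_score:  # keep strict improvements only
                current, current_score = cand, cand_score
        if current_score > best_score:
            best, best_score = current, current_score
    return best, best_score


# Hypothetical input: junctions that would be observed if block b were inverted.
reference = ["a", "b", "c"]
observed = {("a", "b^"), ("b^", "c")}
best, best_score = randomized_search(reference, observed)
```

A greedy walk can stall in a local optimum, which is why the sketch restarts from the reference with different seeds and keeps the best-scoring structure. The returned score is never worse than that of the unmodified reference.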

NextSV: a meta-caller for structural variants from low-coverage long-read sequencing data

10.1101/092544 ◽

2016 ◽

Author(s):

Li Fang ◽

Jiang Hu ◽

Depeng Wang ◽

Kai Wang

Keyword(s):

Whole Genome ◽

Ashkenazi Jewish ◽

Structural Variants ◽

Sequencing Data ◽

Short Read ◽

Short Read Sequencing ◽

Human Genomes ◽

Long Read ◽

Personal Genomes ◽

Low Coverage

Abstract

Background: Structural variants (SVs) in human genomes are implicated in a variety of human diseases. Long-read sequencing delivers much longer read lengths than short-read sequencing and may greatly improve SV detection. However, due to the relatively high cost of long-read sequencing, it is unclear what coverage is needed and how to optimally use the aligners and SV callers.

Results: In this study, we developed NextSV, a meta-caller to perform SV calling from low coverage long-read sequencing data. NextSV integrates three aligners and three SV callers and generates two integrated call sets (sensitive/stringent) for different analysis purposes. We evaluated SV calling performance of NextSV under different PacBio coverages on two personal genomes, NA12878 and HX1. Our results showed that, compared with running any single SV caller, NextSV stringent call set had higher precision and balanced accuracy (F1 score) while NextSV sensitive call set had a higher recall. At 10X coverage, the recall of NextSV sensitive call set was 93.5% to 94.1% for deletions and 87.9% to 93.2% for insertions, indicating that ~10X coverage might be an optimal coverage to use in practice, considering the balance between the sequencing costs and the recall rates. We further evaluated the Mendelian errors on an Ashkenazi Jewish trio dataset.

Conclusions: Our results provide useful guidelines for SV detection from low coverage whole-genome PacBio data and we expect that NextSV will facilitate the analysis of SVs on long-read sequencing data.
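The comparison of call sets rests on precision, recall, and their harmonic mean, the F1 score. As a minimal sketch, assuming a call set has already been compared with a truth set to give counts of true positives, false positives, and false negatives (the counts below are made up for illustration and are not taken from the NextSV study):

```python
def precision_recall_f1(tp, fp, fn):
    """Return (precision, recall, F1) from confusion counts.

    tp: calls that match the truth set
    fp: calls with no match in the truth set
    fn: truth-set variants that were not called
    """
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# Hypothetical counts for one call set.
precision, recall, f1 = precision_recall_f1(tp=90, fp=10, fn=30)
```

A stringent call set typically trades some recall for precision, and a sensitive one does the reverse; F1 summarises that balance in a single number.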