Complex structural variants in Mendelian disorders: identification and breakpoint resolution using short- and long-read genome sequencing

2018 ◽  
Vol 10 (1) ◽  
Author(s):  
Alba Sanchis-Juan ◽  
Jonathan Stephens ◽  
Courtney E. French ◽  
Nicholas Gleadall ◽  
Karyn Mégy ◽  
...  

Abstract: Complex structural variants (cxSVs) are genomic rearrangements comprising multiple structural variants, typically involving three or more breakpoint junctions. They contribute to human genomic variation and can cause Mendelian disease; however, they are not typically considered during genetic testing. Here, we investigate the role of cxSVs in Mendelian disease using short-read whole genome sequencing (WGS) data from 1,324 individuals with neurodevelopmental or retinal disorders from the NIHR BioResource project. We present four cases of individuals with a cxSV affecting Mendelian disease-associated genes. Three of the cxSVs are pathogenic: a de novo duplication-inversion-inversion-deletion affecting ARID1B in an individual with Coffin-Siris syndrome, a deletion-inversion-duplication affecting HNRNPU in an individual with intellectual disability and seizures, and a homozygous deletion-inversion-deletion affecting CEP78 in an individual with cone-rod dystrophy. Additionally, we identified a de novo duplication-inversion-duplication overlapping CDKL5 in an individual with neonatal hypoxic-ischaemic encephalopathy. Long-read sequencing, used to resolve the breakpoints, demonstrated the presence of both a disrupted and an intact copy of CDKL5 on the same allele; the variant was therefore classified as a variant of uncertain significance. Analysis of the sequence flanking all breakpoint junctions in all the cxSVs revealed both microhomology and longer repetitive sequences, suggesting both replication- and homology-based processes. Accurate resolution of cxSVs is essential for clinical interpretation, and here we demonstrate that long-read WGS is a powerful technology by which to achieve this. Our results show that cxSVs are an important, although rare, cause of Mendelian disease, and we therefore recommend their consideration during research and clinical investigations.
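As an illustration of the breakpoint-flank analysis mentioned in the abstract, the following is a minimal sketch of how microhomology at a single junction could be measured. It is not the authors' method. The function names and the four-flank input layout are assumptions made for this example: the reference around the left breakpoint is split into the part retained in the rearranged allele and the part lost, and likewise for the right breakpoint.

```python
def shared_prefix_len(a: str, b: str) -> int:
    """Return the number of leading characters on which a and b agree."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


def microhomology_length(left_kept: str, left_lost: str,
                         right_lost: str, right_kept: str) -> int:
    """Length of sequence shared by both breakpoints of one junction.

    Hypothetical layout (an assumption of this sketch):
      reference at left breakpoint:   left_kept | left_lost
      reference at right breakpoint:  right_lost | right_kept
    The rearranged allele joins left_kept directly to right_kept.
    """
    # Bases just after the left breakpoint that match the start of right_kept:
    # the junction could be shifted rightward by this many bases.
    forward = shared_prefix_len(left_lost, right_kept)
    # Bases just before the left breakpoint that match the end of right_lost:
    # the junction could be shifted leftward by this many bases.
    backward = shared_prefix_len(left_kept[::-1], right_lost[::-1])
    return forward + backward


# Usage: 3 bases match forward (GCA) and 3 backward (TAG).
print(microhomology_length("ACGTTAG", "GCATT", "CCTAG", "GCAAA"))  # 6
```

A result of a few bases would be read as microhomology, whereas matches extending over hundreds of bases would point to longer repeats; where to draw that line is a judgment this sketch does not make.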


2017 ◽  
Author(s):  
Mircea Cretu Stancu ◽  
Markus J. van Roosmalen ◽  
Ivo Renkens ◽  
Marleen Nieboer ◽  
Sjors Middelkamp ◽  
...  

Abstract: Structural genomic variants form a common type of genetic alteration underlying human genetic disease and phenotypic variation. Despite major improvements in genome sequencing technology and data analysis, the detection of structural variants still poses challenges, particularly when variants are of high complexity. Emerging long-read single-molecule sequencing technologies provide new opportunities for detection of structural variants. Here, we demonstrate sequencing of the genomes of two patients with congenital abnormalities using the ONT MinION at 11x and 16x mean coverage, respectively. We developed a bioinformatic pipeline, NanoSV, to efficiently map genomic structural variants (SVs) from the long-read data. We demonstrate that the nanopore data are superior to corresponding short-read data with regard to detection of de novo rearrangements originating from complex chromothripsis events in the patients. Additionally, genome-wide surveillance of SVs revealed 3,253 (33%) novel variants that were missed in short-read data of the same sample, the majority of which are duplications < 200 bp in size. Long sequencing reads enabled efficient phasing of genetic variations, allowing the construction of genome-wide maps of phased SVs and SNVs. We employed read-based phasing to show that all de novo chromothripsis breakpoints occurred on paternal chromosomes, and we resolved the long-range structure of the chromothripsis. This work demonstrates the value of long-read sequencing for screening whole genomes of patients for complex structural variants.


Author(s):  
Maria Nattestad ◽  
Robert Aboukhalil ◽  
Chen-Shan Chin ◽  
Michael C Schatz

Abstract
Summary: Ribbon is an alignment visualization tool that shows how alignments are positioned within both the reference and read contexts, giving an intuitive view that enables a better understanding of structural variants and the read evidence supporting them. Ribbon was born out of a need to curate complex structural variant calls and determine whether each was well supported by long-read evidence, and it uses the same intuitive visualization method to shed light on contig alignments from genome-to-genome comparisons.
Availability and implementation: Ribbon is freely available online at http://genomeribbon.com/ and is open-source at https://github.com/marianattestad/ribbon.
Supplementary information: Supplementary data are available at Bioinformatics online.
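The idea of viewing alignments in both read and reference coordinates can be sketched in a few lines. The example below is not Ribbon's code. The `Segment` type, the `classify_junctions` function, and the junction labels are hypothetical names introduced here. The sketch assumes that a long read's split alignments are already available as tuples and ignores inserted sequence and small alignment overlaps.

```python
from typing import List, NamedTuple


class Segment(NamedTuple):
    """One aligned piece of a long read (hypothetical layout for this sketch)."""
    read_start: int
    read_end: int
    chrom: str
    ref_start: int
    ref_end: int
    strand: str  # "+" or "-"


def classify_junctions(segments: List[Segment]) -> List[str]:
    """Label each junction between consecutive segments in read order."""
    # Read context: walk the segments in the order they occur along the read.
    ordered = sorted(segments, key=lambda s: s.read_start)
    labels = []
    for a, b in zip(ordered, ordered[1:]):
        # Reference context: compare where the two pieces land on the genome.
        if a.chrom != b.chrom:
            labels.append("interchromosomal")
        elif a.strand != b.strand:
            labels.append("inversion-like")
        else:
            # On the minus strand the read runs against reference order.
            if a.strand == "+":
                gap = b.ref_start - a.ref_end
            else:
                gap = a.ref_start - b.ref_end
            if gap > 0:
                labels.append("deletion-like")     # reference bases skipped
            elif gap < 0:
                labels.append("duplication-like")  # reference bases revisited
            else:
                labels.append("contiguous")
    return labels


read = [
    Segment(2000, 3000, "chr1", 6000, 7000, "-"),
    Segment(0, 1000, "chr1", 1000, 2000, "+"),
    Segment(1000, 2000, "chr1", 2500, 3500, "+"),
]
print(classify_junctions(read))  # ['deletion-like', 'inversion-like']
```

A single read yielding several differently labelled junctions, as in this usage example, is the kind of pattern that would prompt manual curation of a complex call.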


2020 ◽  
Author(s):  
Danny E. Miller ◽  
Arvis Sulovari ◽  
Tianyun Wang ◽  
Hailey Loucks ◽  
Kendra Hoekzema ◽  
...  

Abstract
Background: Despite widespread availability of clinical genetic testing, many individuals with suspected genetic conditions do not have a precise diagnosis. This limits their opportunity to take advantage of state-of-the-art treatments. In such instances, testing sometimes reveals difficult-to-evaluate complex structural differences, candidate variants that do not fully explain the phenotype, single pathogenic variants in recessive disorders, or no variants in specific genes of interest. Thus, there is a need for better tools to identify a precise genetic diagnosis in individuals when conventional testing approaches have been exhausted.
Methods: Targeted long-read sequencing (T-LRS) was performed on 33 individuals using Read Until on the Oxford Nanopore platform. This method allowed us to computationally target up to 100 Mbp of sequence per experiment, resulting in an average of 20x coverage of target regions, a 500% increase over background. We analyzed patient DNA for pathogenic substitutions, structural variants, and methylation differences using a single data source.
Results: The effectiveness of T-LRS was validated by detecting all genomic aberrations, including single-nucleotide variants, copy number changes, repeat expansions, and methylation differences, identified by prior clinical testing. In 6/7 individuals who had complex structural rearrangements, T-LRS enabled more precise resolution of the mutation, which led, in one case, to a change in clinical management. In nine individuals with suspected Mendelian conditions who lacked a precise genetic diagnosis, T-LRS identified pathogenic or likely pathogenic variants in five and variants of uncertain significance in two others.
Conclusions: T-LRS can accurately predict pathogenic copy number variants and triplet repeat expansions, resolve complex rearrangements, and identify single-nucleotide variants not detected by other technologies, including short-read sequencing. T-LRS represents an efficient and cost-effective strategy to evaluate high-priority candidate genes and regions or to further evaluate complex clinical testing results. The application of T-LRS will likely increase the diagnostic rate of rare disorders.


2018 ◽  
Author(s):  
Guofeng Meng ◽  
Ying Tan ◽  
Yue Fan ◽  
Yan Wang ◽  
Guang Yang ◽  
...  

Abstract: PacBio sequencing is a powerful approach for studying DNA or RNA sequences over long ranges. It is especially useful for exploring complex structural variants generated by random integration or multiple rearrangements of internal or external sequences. However, there is still no tool designed to uncover their structural organization in the host genome. Here, we present a tool, TSD, for complex structural variant discovery using PacBio targeted sequencing data. It allows researchers to identify and visualize the genomic structures of targeted sequences by unlimited splitting, alignment and assembly of long PacBio reads. Application to sequencing data derived from an HBV-integrated human cell line (PLC/PRF/5) indicated that TSD could recover the full profile of HBV integration events, especially for regions with complex human-HBV genome integrations and multiple HBV rearrangements. Compared to other long-read analysis tools, TSD showed better performance in detecting complex genomic structural variants. TSD is publicly available at: https://github.com/menggf/tsd


2020 ◽  
Author(s):  
Andrew G. Sharo ◽  
Zhiqiang Hu ◽  
Steven E. Brenner

Abstract: Whole genome sequencing resolves clinical cases where standard diagnostic methods have failed. However, preliminary studies show that at least half of these cases still remain unresolved, even after whole genome sequencing. Structural variants (genomic variants larger than 50 base pairs) of uncertain significance may be the genetic cause of a portion of these unresolved cases. Historically, structural variants (SVs) have been difficult to detect with confidence from short-read sequencing. As both detection algorithms and long-read/linked-read sequencing methods become more accessible, clinical researchers will have access to thousands of reliable SVs of unknown disease relevance. Filtering these SVs by overlap with cataloged SVs is an imperfect solution. Innovative methods to predict the pathogenicity of these SVs will be needed to realize the full diagnostic potential of long-read sequencing. To address this emerging need, we developed StrVCTVRE (Structural Variant Classifier Trained on Variants Rare and Exonic), a classifier that can be used to distinguish pathogenic SVs from benign SVs that overlap exons. We made use of features that capture gene importance, coding region, conservation, expression, and exon structure in a random forest classifier. We found that some features, such as expression and conservation, are important but are absent from SV classification guidelines. Although databases of SVs reflect size biases from sequencing techniques, we leveraged multiple databases to construct a size-matched training set of rare, putatively benign and pathogenic SVs. In independent test sets, we found our method performs accurately across a wide SV size range, which will allow clinical researchers to eliminate nearly 60% of SVs from consideration at an elevated sensitivity of 90%. However, our method and its assessment are still constrained by a small training dataset and acquisition bias in databases of pathogenic variants.
StrVCTVRE fills an empty niche in the clinical evaluation of SVs of unknown significance. We anticipate researchers will use it to prioritize SVs in patients where no variant is immediately compelling, empowering deeper investigation into novel SVs and disease genes to resolve cases.
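The trade-off quoted in the abstract (a share of SVs removed from consideration at a fixed sensitivity) can be illustrated with a small, generic calculation. This sketch does not use StrVCTVRE or its scores. The function names and the toy score lists are assumptions for illustration only, and higher scores are assumed to mean "more likely pathogenic".

```python
from typing import Sequence


def threshold_at_sensitivity(pathogenic_scores: Sequence[float],
                             target: float) -> float:
    """Highest cutoff that still keeps at least `target` of pathogenic SVs."""
    n = len(pathogenic_scores)
    if n == 0:
        raise ValueError("need at least one pathogenic score")
    # Try cutoffs from strictest to most lenient and stop at the first one
    # whose sensitivity (fraction of pathogenic SVs scoring >= cutoff) suffices.
    for cutoff in sorted(set(pathogenic_scores), reverse=True):
        retained = sum(1 for s in pathogenic_scores if s >= cutoff)
        if retained / n >= target:
            return cutoff
    raise ValueError("target sensitivity must be at most 1.0")


def fraction_eliminated(scores: Sequence[float], cutoff: float) -> float:
    """Fraction of SVs scoring below the cutoff, i.e. dropped from review."""
    return sum(1 for s in scores if s < cutoff) / len(scores)


# Toy scores, invented for this example only.
pathogenic = [0.95, 0.9, 0.85, 0.8, 0.75, 0.7, 0.65, 0.6, 0.55, 0.2]
benign = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.3, 0.2, 0.1]

cutoff = threshold_at_sensitivity(pathogenic, 0.9)
print(cutoff)                               # 0.55
print(fraction_eliminated(benign, cutoff))  # 0.8
```

In practice the cutoff would be chosen on held-out data rather than on the set used to report the eliminated fraction, in the spirit of the independent test sets the abstract describes.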


2021 ◽  
Author(s):  
Jonas Elsner ◽  
Martin A. Mensah ◽  
Manuel Holtgrewe ◽  
Jakob Hertzberg ◽  
Stefania Bigoni ◽  
...  

Abstract: The extensive clinical and genetic heterogeneity of congenital limb malformation calls for comprehensive genome-wide analysis of genetic variation. Genome sequencing (GS) has the potential to identify all genetic variants. Here we aim to determine the diagnostic potential of GS as a comprehensive one-test-for-all strategy in a cohort of undiagnosed patients with congenital limb malformations. We collected 69 cases (64 trios, 1 duo, 5 singletons) with congenital limb malformations with no molecular diagnosis after standard clinical genetic testing and performed genome sequencing. We also developed a framework to identify potential noncoding pathogenic variants. We identified likely pathogenic/disease-associated variants in 12 cases (17.4%) including four in known disease genes, and one repeat expansion in HOXD13. In three unrelated cases with ectrodactyly, we identified likely pathogenic variants in UBA2, establishing it as a novel disease gene. In addition, we found two complex structural variants (3%). We also identified likely causative variants in three novel high confidence candidate genes. We were not able to identify any noncoding variants. GS is a powerful strategy to identify all types of genomic variants associated with congenital limb malformation, including repeat expansions and complex structural variants missed by standard diagnostic approaches. In this cohort, no causative noncoding SNVs could be identified.


Neurology ◽  
2021 ◽  
Vol 96 (13) ◽  
pp. e1770-e1782
Author(s):  
Elizabeth Emma Palmer ◽  
Rani Sachdev ◽  
Rebecca Macintosh ◽  
Uirá Souto Melo ◽  
Stefan Mundlos ◽  
...  

Objective: To assess the benefits and limitations of whole genome sequencing (WGS) compared to exome sequencing (ES) or multigene panel (MGP) in the molecular diagnosis of developmental and epileptic encephalopathies (DEE).
Methods: We performed WGS of 30 comprehensively phenotyped DEE patient trios that were undiagnosed after first-tier testing, including chromosomal microarray and either research ES (n = 15) or diagnostic MGP (n = 15).
Results: Eight diagnoses were made in the 15 individuals who received prior ES (53%): 3 individuals had complex structural variants; 5 had ES-detectable variants, which now had additional evidence for pathogenicity. Eleven diagnoses were made in the 15 MGP-negative individuals (68%); the majority (n = 10) involved genes not included in the panel, particularly in individuals with postneonatal onset of seizures and those with more complex presentations including movement disorders, dysmorphic features, or multiorgan involvement. A total of 42% of diagnoses were autosomal recessive or X-chromosome linked.
Conclusion: WGS was able to improve diagnostic yield over ES primarily through the detection of complex structural variants (n = 3). The higher diagnostic yield was otherwise better attributed to the power of re-analysis rather than inherent advantages of the WGS platform. Additional research is required to assist in the assessment of pathogenicity of novel noncoding and complex structural variants and further improve diagnostic yield for patients with DEE and other neurogenetic disorders.


2019 ◽  
Vol 20 (1) ◽  
Author(s):  
Toshiyuki T. Yokoyama ◽  
Yoshitaka Sakamoto ◽  
Masahide Seki ◽  
Yutaka Suzuki ◽  
Masahiro Kasahara

Abstract
Background: The genome graph is an emerging approach for representing structural variants on genomes with branches. For example, representing structural variants of cancer genomes as a genome graph is more natural than representing such genomes as differences from the linear reference genome. While more and more structural variants are being identified by long-read sequencing, many of them are difficult to visualize using existing structural variant visualization tools. A visualization method for large genome graphs, such as human cancer genome graphs, is therefore needed.
Results: We developed the MOdular Multi-scale Integrated Genome graph browser, MoMI-G, a web-based genome graph browser that can visualize genome graphs with structural variants and supporting evidence such as read alignments, read depth, and annotations. This browser allows more intuitive recognition of large, nested, and potentially more complex structural variations. MoMI-G has view modules for different scales, which allow users to view anything from the whole genome down to nucleotide-level alignments of long reads. Alignments spanning reference alleles and those spanning alternative alleles are shown in the same view. Users can customize the view if they are not satisfied with the preset views. In addition, MoMI-G has Interval Card Deck, a feature for rapid manual inspection of hundreds of structural variants. Herein, we describe the utility of MoMI-G by using representative examples of large and nested structural variations found in two cell lines, LC-2/ad and CHM1.
Conclusions: Users can inspect complex and large structural variations found by long-read analysis in large genomes such as human genomes more smoothly and more intuitively. In addition, users can easily filter out false positives by manually inspecting hundreds of identified structural variants with supporting long-read alignments and annotations in a short time.
Software availability: MoMI-G is freely available at https://github.com/MoMI-G/MoMI-G under the MIT license.

