Complex Structural Variants Resolved by Short-Read and Long-Read Whole Genome Sequencing in Mendelian Disorders

2018 ◽  
Author(s):  
Alba Sanchis-Juan ◽  
Jonathan Stephens ◽  
Courtney E French ◽  
Nicholas Gleadall ◽  
Karyn Mégy ◽  
...  

Abstract Complex structural variants (cxSVs) are genomic rearrangements comprising multiple structural variants, typically involving three or more breakpoint junctions. They contribute to human genomic variation and can cause Mendelian disease; however, they are not typically considered during genetic testing. Here, we investigate the role of cxSVs in Mendelian disease using short-read whole genome sequencing (WGS) data from 1,324 individuals with neurodevelopmental or retinal disorders from the NIHR BioResource project. We present four cases of individuals with a cxSV affecting Mendelian disease-associated genes. Three of the cxSVs are pathogenic: a de novo duplication-inversion-inversion-deletion affecting ARID1B in an individual with Coffin-Siris syndrome, a deletion-inversion-duplication affecting HNRNPU in an individual with intellectual disability and seizures, and a homozygous deletion-inversion-deletion affecting CEP78 in an individual with cone-rod dystrophy. Additionally, we identified a de novo duplication-inversion-duplication overlapping CDKL5 in an individual with neonatal hypoxic-ischaemic encephalopathy. Long-read sequencing technology used to resolve the breakpoints demonstrated the presence of both a disrupted and an intact copy of CDKL5 on the same allele; therefore, it was classified as a variant of uncertain significance. Analysis of sequence flanking all breakpoint junctions in all the cxSVs revealed both microhomology and longer repetitive sequences, suggesting both replication-based and homology-based processes. Accurate resolution of cxSVs is essential for clinical interpretation, and here we demonstrate that long-read WGS is a powerful technology by which to achieve this. Our results show cxSVs are an important although rare cause of Mendelian disease, and we therefore recommend their consideration during research and clinical investigations.

2017 ◽  
Author(s):  
Mircea Cretu Stancu ◽  
Markus J. van Roosmalen ◽  
Ivo Renkens ◽  
Marleen Nieboer ◽  
Sjors Middelkamp ◽  
...  

Abstract Structural genomic variants form a common type of genetic alteration underlying human genetic disease and phenotypic variation. Despite major improvements in genome sequencing technology and data analysis, the detection of structural variants still poses challenges, particularly when variants are of high complexity. Emerging long-read single-molecule sequencing technologies provide new opportunities for detection of structural variants. Here, we demonstrate sequencing of the genomes of two patients with congenital abnormalities using the ONT MinION at 11x and 16x mean coverage, respectively. We developed a bioinformatic pipeline, NanoSV, to efficiently map genomic structural variants (SVs) from the long-read data. We demonstrate that the nanopore data are superior to corresponding short-read data with regard to detection of de novo rearrangements originating from complex chromothripsis events in the patients. Additionally, genome-wide surveillance of SVs revealed 3,253 (33%) novel variants that were missed in short-read data of the same sample, the majority of which are duplications <200 bp in size. Long sequencing reads enabled efficient phasing of genetic variations, allowing the construction of genome-wide maps of phased SVs and SNVs. We employed read-based phasing to show that all de novo chromothripsis breakpoints occurred on paternal chromosomes and we resolved the long-range structure of the chromothripsis. This work demonstrates the value of long-read sequencing for screening whole genomes of patients for complex structural variants.


2017 ◽  
Vol 5 (42) ◽  
Author(s):  
S. Wesley Long ◽  
Sarah E. Linson ◽  
Matthew Ojeda Saavedra ◽  
Concepcion Cantu ◽  
James J. Davis ◽  
...  

ABSTRACT In a study of 1,777 Klebsiella strains, we discovered KPN1705, which was distinct from all recognized Klebsiella spp. We closed the genome of strain KPN1705 using a hybrid of Illumina short-read and Oxford Nanopore long-read technologies. For this novel species, we propose the name Klebsiella quasivariicola sp. nov.


2018 ◽  
Author(s):  
Mark T. W. Ebbert ◽  
Stefan Farrugia ◽  
Jonathon Sens ◽  
Karen Jansen-West ◽  
Tania F. Gendron ◽  
...  

Abstract Background: Many neurodegenerative diseases are caused by nucleotide repeat expansions, but most expansions, like the C9orf72 ‘GGGGCC’ (G4C2) repeat that causes approximately 5-7% of all amyotrophic lateral sclerosis (ALS) and frontotemporal dementia (FTD) cases, are too long to sequence using short-read sequencing technologies. It is unclear whether long-read sequencing technologies can traverse these long, challenging repeat expansions. Here, we demonstrate that two long-read sequencing technologies, Pacific Biosciences’ (PacBio) and Oxford Nanopore Technologies’ (ONT), can sequence through disease-causing repeats cloned into plasmids, including the FTD/ALS-causing G4C2 repeat expansion. We also report the first long-read sequencing data characterizing the C9orf72 G4C2 repeat expansion at the nucleotide level in two symptomatic expansion carriers using PacBio whole-genome sequencing and a no-amplification (No-Amp) targeted approach based on CRISPR/Cas9. Results: Both the PacBio and ONT platforms successfully sequenced through the repeat expansions in plasmids. Throughput on the MinION was a challenge for whole-genome sequencing; we were unable to attain reads covering the human C9orf72 repeat expansion using 15 flow cells. We obtained 8x coverage across the C9orf72 locus using the PacBio Sequel, accurately reporting the unexpanded allele at eight repeats, and reading through the entire expansion with 1324 repeats (7941 nucleotides). Using the No-Amp targeted approach, we attained >800x coverage and were able to identify the unexpanded allele, closely estimate expansion size, and assess nucleotide content in a single experiment. We estimate the individual’s repeat region was >99% G4C2 content, though we cannot rule out small interruptions. Conclusions: Our findings indicate that long-read sequencing is well suited to characterizing known repeat expansions, and for discovering new disease-causing, disease-modifying, or risk-modifying repeat expansions that have gone undetected with conventional short-read sequencing. The PacBio No-Amp targeted approach may have future potential in clinical and genetic counseling environments. Larger and deeper long-read sequencing studies in C9orf72 expansion carriers will be important to determine heterogeneity and whether the repeats are interrupted by non-G4C2 content, potentially mitigating or modifying disease course or age of onset, as interruptions are known to do in other repeat-expansion disorders. These results have broad implications across all diseases where the genetic etiology remains unclear.
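The abstract above reports a repeat count and a G4C2 content estimate without saying how they were computed. As a rough illustration only, the following Python sketch shows one way such quantities could be derived from a single read sequence by exact matching of the GGGGCC unit. The function names are hypothetical, and exact matching is an assumption made for simplicity: real long reads contain sequencing errors, so this approach would tend to undercount repeats in practice.

```python
import re


def _tandem_pattern(unit):
    # Matches one or more back-to-back copies of the repeat unit.
    return re.compile(f"(?:{re.escape(unit)})+")


def longest_repeat_run(seq, unit="GGGGCC"):
    """Return the number of units in the longest uninterrupted tandem run."""
    best = 0
    for match in _tandem_pattern(unit).finditer(seq):
        best = max(best, len(match.group(0)) // len(unit))
    return best


def repeat_content(seq, unit="GGGGCC"):
    """Return the fraction of bases in seq that lie inside tandem runs of unit."""
    if not seq:
        return 0.0
    covered = sum(len(m.group(0)) for m in _tandem_pattern(unit).finditer(seq))
    return covered / len(seq)
```

A single non-repeat base inside the expansion splits it into two runs, which is why the longest run and the overall content are reported separately in this sketch.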


2020 ◽  
Author(s):  
Andrew G. Sharo ◽  
Zhiqiang Hu ◽  
Steven E. Brenner

Abstract Whole genome sequencing resolves clinical cases where standard diagnostic methods have failed. However, preliminary studies show that at least half of these cases still remain unresolved, even after whole genome sequencing. Structural variants (genomic variants larger than 50 base pairs) of uncertain significance may be the genetic cause of a portion of these unresolved cases. Historically, structural variants (SVs) have been difficult to detect with confidence from short-read sequencing. As both detection algorithms and long-read/linked-read sequencing methods become more accessible, clinical researchers will have access to thousands of reliable SVs of unknown disease relevance. Filtering these SVs by overlap with cataloged SVs is an imperfect solution. Innovative methods to predict the pathogenicity of these SVs will be needed to realize the full diagnostic potential of long-read sequencing. To address this emerging need, we developed StrVCTVRE (Structural Variant Classifier Trained on Variants Rare and Exonic), a classifier that can be used to distinguish pathogenic SVs from benign SVs that overlap exons. We made use of features that capture gene importance, coding region, conservation, expression, and exon structure in a random forest classifier. We found that some features, such as expression and conservation, are important but are absent from SV classification guidelines. Although databases of SVs reflect size biases from sequencing techniques, we leveraged multiple databases to construct a size-matched training set of rare, putatively benign and pathogenic SVs. In independent test sets, we found our method performs accurately across a wide SV size range, which will allow clinical researchers to eliminate nearly 60% of SVs from consideration at an elevated sensitivity of 90%. However, our method and its assessment are still constrained by a small training dataset and acquisition bias in databases of pathogenic variants. StrVCTVRE fills an empty niche in the clinical evaluation of SVs of unknown significance. We anticipate researchers will use it to prioritize SVs in patients where no variant is immediately compelling, empowering deeper investigation into novel SVs and disease genes to resolve cases.


Neurology ◽  
2021 ◽  
Vol 96 (13) ◽  
pp. e1770-e1782
Author(s):  
Elizabeth Emma Palmer ◽  
Rani Sachdev ◽  
Rebecca Macintosh ◽  
Uirá Souto Melo ◽  
Stefan Mundlos ◽  
...  

Objective: To assess the benefits and limitations of whole genome sequencing (WGS) compared to exome sequencing (ES) or multigene panel (MGP) in the molecular diagnosis of developmental and epileptic encephalopathies (DEE). Methods: We performed WGS of 30 comprehensively phenotyped DEE patient trios that were undiagnosed after first-tier testing, including chromosomal microarray and either research ES (n = 15) or diagnostic MGP (n = 15). Results: Eight diagnoses were made in the 15 individuals who received prior ES (53%): 3 individuals had complex structural variants; 5 had ES-detectable variants, which now had additional evidence for pathogenicity. Eleven diagnoses were made in the 15 MGP-negative individuals (73%); the majority (n = 10) involved genes not included in the panel, particularly in individuals with postneonatal onset of seizures and those with more complex presentations including movement disorders, dysmorphic features, or multiorgan involvement. A total of 42% of diagnoses were autosomal recessive or X-chromosome linked. Conclusion: WGS was able to improve diagnostic yield over ES primarily through the detection of complex structural variants (n = 3). The higher diagnostic yield was otherwise better attributed to the power of re-analysis rather than inherent advantages of the WGS platform. Additional research is required to assist in the assessment of pathogenicity of novel noncoding and complex structural variants and further improve diagnostic yield for patients with DEE and other neurogenetic disorders.


2018 ◽  
Author(s):  
Jessica Nordlund ◽  
Yanara Marincevic-Zuniga ◽  
Lucia Cavelier ◽  
Amanda Raine ◽  
Tom Martin ◽  
...  

ABSTRACT Structural chromosomal rearrangements that may lead to in-frame gene-fusions represent a leading source of information for diagnosis, risk stratification, and prognosis in pediatric acute lymphoblastic leukemia (ALL). However, short-read whole genome sequencing (WGS) technologies struggle to accurately identify and phase such large-scale chromosomal aberrations in cancer genomes. We therefore evaluated linked-read WGS for detection of chromosomal rearrangements in an ALL cell line (REH) and primary samples of varying DNA quality from 12 patients diagnosed with ALL. We assessed the effect of input DNA quality on phased haplotype block size and the detectability of copy number aberrations (CNAs) and structural variants (SVs). Biobanked DNA isolated by standard column-based extraction methods was sufficient to detect chromosomal rearrangements even at low 10x sequencing coverage. Linked-read WGS enabled precise, allele-specific, digital karyotyping at a base-pair resolution for a wide range of structural variants including complex rearrangements and aneuploidy assessment. With use of haplotype information from the linked-reads, we also identified additional structural variants, such as a compound heterozygous deletion of ERG in a patient with the DUX4-IGH fusion gene. Thus, linked-read WGS allows detection of important pathogenic variants in ALL genomes at a resolution beyond that of traditional karyotyping or short-read WGS.


PeerJ ◽  
2018 ◽  
Vol 6 ◽  
pp. e4588 ◽  
Author(s):  
Märt Roosaare ◽  
Mikk Puustusmaa ◽  
Märt Möls ◽  
Mihkel Vaher ◽  
Maido Remm

Background: Plasmids play an important role in the dissemination of antibiotic resistance, making their detection an important task. Using whole genome sequencing (WGS), it is possible to capture both bacterial and plasmid sequence data, but short read lengths make plasmid detection a complex problem. Results: We developed a tool named PlasmidSeeker that enables the detection of plasmids from bacterial WGS data without read assembly. The PlasmidSeeker algorithm is based on k-mers and uses k-mer abundance to distinguish between plasmid and bacterial sequences. We tested the performance of PlasmidSeeker on a set of simulated and real bacterial WGS samples, resulting in 100% sensitivity and 99.98% specificity. Conclusion: PlasmidSeeker enables quick detection of known plasmids and complements existing tools that assemble plasmids de novo. The PlasmidSeeker source code is stored on GitHub: https://github.com/bioinfo-ut/PlasmidSeeker.
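To make the idea of using k-mer abundance concrete, here is a minimal Python sketch of the general technique: count k-mers in unassembled reads, then ask what fraction of a known reference sequence's k-mers appear in the reads and at what median abundance. This is a simplified illustration, not PlasmidSeeker's actual implementation; all function names are hypothetical, and details such as reverse-complement handling, sequencing errors, and any statistical testing are deliberately left out.

```python
from collections import Counter
from statistics import median


def kmers(seq, k):
    """Yield every k-length substring of seq."""
    for i in range(len(seq) - k + 1):
        yield seq[i:i + k]


def count_read_kmers(reads, k):
    """Count how often each k-mer occurs across all reads (no assembly needed)."""
    counts = Counter()
    for read in reads:
        counts.update(kmers(read, k))
    return counts


def reference_support(reference, read_counts, k):
    """Return (fraction of distinct reference k-mers seen in reads,
    median abundance of the k-mers that were seen)."""
    ref_kmers = set(kmers(reference, k))
    seen = [read_counts[km] for km in ref_kmers if read_counts[km] > 0]
    fraction = len(seen) / len(ref_kmers) if ref_kmers else 0.0
    return fraction, (median(seen) if seen else 0)
```

Under these assumptions, a reference whose k-mers are largely present in the reads but at a median abundance clearly different from that of the chromosome would be a candidate plasmid, since plasmid copy number often differs from that of the host chromosome.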


2020 ◽  
Vol 18 (2) ◽  
pp. 197-208
Author(s):  
Le Tung Lam ◽  
Nguyen Trung Hieu ◽  
Nguyen Hong Trang ◽  
Ho Thi Thuong ◽  
Tran Huyen Linh ◽  
...  

The COVID-19 pandemic, caused by the SARS-CoV-2 virus, has devastated countries worldwide, infecting more than 4.5 million people and leading to more than 300,000 deaths as of May 16th, 2020. Whole-genome sequencing (WGS) is an effective tool for monitoring emerging strains and providing information for intervention, thereby helping to inform outbreak control decisions. Here, we report the first effort to sequence and de novo assemble the whole genome of SARS-CoV-2 using PacBio’s SMRT sequencing technology in Vietnam. We also present the annotation results and a brief analysis of the variants found in our SARS-CoV-2 strain, which was isolated from a Vietnamese patient. Sequencing and de novo assembly were completed in less than 30 hours, resulting in one contig with no gaps and a length of 29,766 bp. All variants detected relative to the NCBI reference were highly accurate, as confirmed by Sanger sequencing. The results show the potential of long-read sequencing to provide high-quality WGS data to support public health responses and advance understanding of this and future pandemics.


Author(s):  
Yongzhuang Liu ◽  
Yalin Huang ◽  
Guohua Wang ◽  
Yadong Wang

Abstract Short-read whole genome sequencing has become widely used to detect structural variants in human genetic studies and clinical practice. However, accurate detection of structural variants is a challenging task. In particular, existing structural variant detection approaches produce a large proportion of incorrect calls, so effective structural variant filtering approaches are urgently needed. In this study, we propose a novel deep learning-based approach, DeepSVFilter, for filtering structural variants in short-read whole genome sequencing data. DeepSVFilter encodes structural variant signals in the read alignments as images and uses transfer learning with pre-trained convolutional neural networks as the classification models, which are trained on well-characterized samples with known high-confidence structural variants. We use two well-characterized samples to demonstrate DeepSVFilter’s performance and its filtering effect when coupled with commonly used structural variant detection approaches. DeepSVFilter is implemented in Python and is freely available at https://github.com/yongzhuang/DeepSVFilter.

