Detection of complex structural variation from paired-end sequencing data

2017 ◽  
Author(s):  
Joseph G. Arthur ◽  
Xi Chen ◽  
Bo Zhou ◽  
Alexander E. Urban ◽  
Wing Hung Wong

Abstract Detecting structural variants (SVs) from sequencing data is key to genome analysis, but methods using standard whole-genome sequencing (WGS) data are typically incapable of resolving complex SVs with multiple co-located breakpoints. We introduce the ARC-SV method, which uses a probabilistic model to detect arbitrary local rearrangements from WGS data. Our method performs well on simple SVs while surpassing state-of-the-art methods in complex SV detection.

2020 ◽  
Author(s):  
Xiao Chen ◽  
Fei Shen ◽  
Nina Gonzaludo ◽  
Alka Malhotra ◽  
Cande Rogert ◽  
...  

Abstract Responsible for the metabolism of 25% of clinically used drugs, CYP2D6 is a critical component of personalized medicine initiatives. Genotyping CYP2D6 is challenging due to sequence similarity with its pseudogene paralog CYP2D7 and a high number and variety of common structural variants (SVs). Here we describe a novel bioinformatics method, Cyrius, that accurately genotypes CYP2D6 using whole-genome sequencing (WGS) data. We show that Cyrius has superior performance (96.5% concordance with truth genotypes) compared to existing methods (84-86.8%). After implementing the improvements identified from the comparison against the truth data, Cyrius’s accuracy has since been improved to 99.3%. Using Cyrius, we built a haplotype frequency database from 2504 ethnically diverse samples and estimate that SV-containing star alleles are more frequent than previously reported. Cyrius will be an important tool to incorporate pharmacogenomics in WGS-based precision medicine initiatives.


Neurology ◽  
2021 ◽  
Vol 96 (13) ◽  
pp. e1770-e1782
Author(s):  
Elizabeth Emma Palmer ◽  
Rani Sachdev ◽  
Rebecca Macintosh ◽  
Uirá Souto Melo ◽  
Stefan Mundlos ◽  
...  

Objective: To assess the benefits and limitations of whole genome sequencing (WGS) compared to exome sequencing (ES) or multigene panel (MGP) in the molecular diagnosis of developmental and epileptic encephalopathies (DEE). Methods: We performed WGS of 30 comprehensively phenotyped DEE patient trios that were undiagnosed after first-tier testing, including chromosomal microarray and either research ES (n = 15) or diagnostic MGP (n = 15). Results: Eight diagnoses were made in the 15 individuals who received prior ES (53%): 3 individuals had complex structural variants; 5 had ES-detectable variants, which now had additional evidence for pathogenicity. Eleven diagnoses were made in the 15 MGP-negative individuals (68%); the majority (n = 10) involved genes not included in the panel, particularly in individuals with postneonatal onset of seizures and those with more complex presentations including movement disorders, dysmorphic features, or multiorgan involvement. A total of 42% of diagnoses were autosomal recessive or X-chromosome linked. Conclusion: WGS was able to improve diagnostic yield over ES primarily through the detection of complex structural variants (n = 3). The higher diagnostic yield was otherwise better attributed to the power of re-analysis rather than inherent advantages of the WGS platform. Additional research is required to assist in the assessment of pathogenicity of novel noncoding and complex structural variants and further improve diagnostic yield for patients with DEE and other neurogenetic disorders.


Author(s):  
Yongzhuang Liu ◽  
Yalin Huang ◽  
Guohua Wang ◽  
Yadong Wang

Abstract Short-read whole genome sequencing has become widely used to detect structural variants in human genetic studies and clinical practice. However, accurate detection of structural variants is a challenging task. In particular, existing structural variant detection approaches produce a large proportion of incorrect calls, so effective structural variant filtering approaches are urgently needed. In this study, we propose a novel deep learning-based approach, DeepSVFilter, for filtering structural variants in short-read whole genome sequencing data. DeepSVFilter encodes structural variant signals in the read alignments as images and uses transfer learning with pre-trained convolutional neural networks as the classification models, which are trained on well-characterized samples with known high-confidence structural variants. We use two well-characterized samples to demonstrate DeepSVFilter’s performance and its filtering effect when coupled with commonly used structural variant detection approaches. DeepSVFilter is implemented in Python and is freely available at https://github.com/yongzhuang/DeepSVFilter.


2018 ◽  
Author(s):  
Alba Sanchis-Juan ◽  
Jonathan Stephens ◽  
Courtney E French ◽  
Nicholas Gleadall ◽  
Karyn Mégy ◽  
...  

Abstract Complex structural variants (cxSVs) are genomic rearrangements comprising multiple structural variants, typically involving three or more breakpoint junctions. They contribute to human genomic variation and can cause Mendelian disease; however, they are not typically considered during genetic testing. Here, we investigate the role of cxSVs in Mendelian disease using short-read whole genome sequencing (WGS) data from 1,324 individuals with neurodevelopmental or retinal disorders from the NIHR BioResource project. We present four cases of individuals with a cxSV affecting Mendelian disease-associated genes. Three of the cxSVs are pathogenic: a de novo duplication-inversion-inversion-deletion affecting ARID1B in an individual with Coffin-Siris syndrome, a deletion-inversion-duplication affecting HNRNPU in an individual with intellectual disability and seizures, and a homozygous deletion-inversion-deletion affecting CEP78 in an individual with cone-rod dystrophy. Additionally, we identified a de novo duplication-inversion-duplication overlapping CDKL5 in an individual with neonatal hypoxic-ischaemic encephalopathy. Long-read sequencing technology used to resolve the breakpoints demonstrated the presence of both a disrupted and an intact copy of CDKL5 on the same allele; therefore, it was classified as a variant of uncertain significance. Analysis of sequence flanking all breakpoint junctions in all the cxSVs revealed both microhomology and longer repetitive sequences, suggesting both replication-based and homology-based processes. Accurate resolution of cxSVs is essential for clinical interpretation, and here we demonstrate that long-read WGS is a powerful technology by which to achieve this. Our results show cxSVs are an important although rare cause of Mendelian disease, and we therefore recommend their consideration during research and clinical investigations.


2020 ◽  
Vol 21 (1) ◽  
Author(s):  
Jian-Jun Jin ◽  
Wen-Bin Yu ◽  
Jun-Bo Yang ◽  
Yu Song ◽  
Claude W. dePamphilis ◽  
...  

Abstract GetOrganelle is a state-of-the-art toolkit to accurately assemble organelle genomes from whole genome sequencing data. It recruits organelle-associated reads using a modified “baiting and iterative mapping” approach, conducts de novo assembly, filters and disentangles the assembly graph, and produces all possible configurations of circular organelle genomes. For 50 published plant datasets, we are able to reassemble the circular plastomes from 47 datasets using GetOrganelle. GetOrganelle assemblies are more accurate than published and/or NOVOPlasty-reassembled plastomes as assessed by mapping. We also assemble complete mitochondrial genomes using GetOrganelle. GetOrganelle is freely released under a GPL-3 license (https://github.com/Kinggerm/GetOrganelle).


2013 ◽  
Vol 31 (15_suppl) ◽  
pp. 8577-8577
Author(s):  
Deborah Ritter ◽  
Kimberly Walker ◽  
Myoung Kwon ◽  
Premal Lulla ◽  
Catherine M. Bollard ◽  
...  

8577 Background: Burkitt Lymphoma is defined by canonical translocations between MYC and immunoglobulin IgH, IgK or IgL (8:14, 8:2, 8:22, respectively), and is commonly associated with HIV. The identification of HIV from sequenced samples is critical to understanding HIV-associated Burkitt Lymphoma. While recent novel gene mutations (ID3 and TCF3) have been implicated in functional roles, concomitant genomic structural variants and the interaction of HIV with structural variation are less well defined. Methods: We sequenced the whole genomes of 15 patients with 100bp paired-end reads on the Illumina Hi-Seq platform, resulting in an average insert size of 278 (+/- 63) and coverage of 60X tumor and 30X normal. We included 7 HIV-negative and 8 HIV-positive subjects. Sequencing reads were mapped to the reference genome using BWA. Large-scale structural variation was detected by the BreakDancer and Crest programs. Functional annotation was used to prioritize structural variants for validation. Single nucleotide variants and small insertions and deletions were detected by CARNAC, a somatic variation discovery pipeline. The subset of WGS reads that failed to align to the human reference genome was tested for the presence of HIV sequences by comparing the unmapped reads to a database of viral DNA sequences which included the common subtypes of HIV defined by Los Alamos. Reads matching HIV or EBV with an expectation value of <10^-4 were analyzed to determine virus coverage and viral integration sites. Results: Canonical MYC-IgH translocations were identified in 9/15 (60%) tumor samples, with 2 additional subjects harboring either a deletion or an inversion near exon1 of MYC; 4 had no MYC rearrangement. MYC translocations occurred equally in both groups. TP53 and SMARC4 point mutations were observed recurrently in the HIV uninfected group but not in the HIV infected patients. Variable levels of HIV DNA sequence were observed in normal tissue of all HIV infected patients.
Conclusions: Whole genome sequencing has identified known somatic variants in HIV infected and uninfected patients. Two genes, TP53 and SMARC4, appear to be differentially mutated, but additional samples are needed to achieve statistical significance.

