svaRetro and svaNUMT: modular packages for annotation of retrotransposed transcripts and nuclear integration of mitochondrial DNA in genome sequencing data

2021
Author(s): Ruining Dong, Daniel L Cameron, Justin Bedo, Anthony T Papenfuss

Background: The biological significance of structural variation is now more widely recognized. However, due to the lack of available tools for downstream analysis, including processing and annotating, interpretation of structural variant calls remains a challenge. Findings: Here we present svaRetro and svaNUMT, R packages that provide functions for annotating novel genomic events such as non-reference retro-copied transcripts and nuclear integration of mitochondrial DNA. We evaluate the performance of these packages to detect events using simulations and public benchmarking datasets, and annotate processed transcripts in a public structural variant database. Conclusions: svaRetro and svaNUMT provide efficient, modular tools for downstream identification and annotation of structural variant calls.

Author(s): Varuni Sarwal, Sebastian Niehus, Ram Ayyala, Sei Chang, Angela Lu, ...

Abstract: Advances in whole genome sequencing promise to enable accurate and comprehensive structural variant (SV) discovery. Dissecting SVs from whole genome sequencing (WGS) data presents a substantial number of challenges, and a plethora of SV-detection methods have been developed. Currently, there is a paucity of evidence that investigators can use to select appropriate SV-detection tools. In this paper, we evaluated the performance of SV-detection tools using a comprehensive PCR-confirmed gold standard set of SVs. In contrast to previous benchmarking studies, our gold standard dataset included a complete set of SVs, allowing us to report both precision and sensitivity rates of SV-detection methods. Our study investigates the ability of the methods to detect deletions, thus providing an optimistic estimate of SV detection performance, as SV-detection methods that fail to detect deletions are likely to miss more complex SVs. We found that SV-detection tools varied widely in their performance, with several methods providing a good balance between sensitivity and precision. Additionally, we have determined the SV callers best suited for low and ultra-low pass sequencing data.
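
As an illustration of the two rates reported above, the following is a minimal sketch of how precision and sensitivity could be computed for deletion calls against a gold standard. The matching rule (both breakpoints within a fixed tolerance), the `Deletion` type, and the `tol` parameter are assumptions made for this example; the abstract does not state the criterion the authors used.

```python
from typing import NamedTuple


class Deletion(NamedTuple):
    """A deletion described by chromosome and breakpoint coordinates."""
    chrom: str
    start: int
    end: int


def matches(call: Deletion, truth: Deletion, tol: int) -> bool:
    # Assumed rule: same chromosome and both breakpoints within `tol` bp.
    return (
        call.chrom == truth.chrom
        and abs(call.start - truth.start) <= tol
        and abs(call.end - truth.end) <= tol
    )


def precision_sensitivity(
    calls: list[Deletion], gold: list[Deletion], tol: int = 100
) -> tuple[float, float]:
    """Return (precision, sensitivity) using one-to-one greedy matching."""
    unmatched = list(gold)  # gold-standard deletions not yet claimed by a call
    true_pos = 0
    for call in calls:
        for truth in unmatched:
            if matches(call, truth, tol):
                unmatched.remove(truth)  # each gold deletion counts only once
                true_pos += 1
                break
    false_pos = len(calls) - true_pos
    false_neg = len(unmatched)
    precision = true_pos / (true_pos + false_pos) if calls else 0.0
    sensitivity = true_pos / (true_pos + false_neg) if gold else 0.0
    return precision, sensitivity
```

Sensitivity can only be reported when the gold standard is complete, which is why the abstract stresses that point: with an incomplete truth set, unmatched calls cannot be told apart from true but unlisted variants.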


2017
Author(s): Mark J.P. Chaisson, Ashley D. Sanders, Xuefang Zhao, Ankit Malhotra, David Porubsky, ...

Abstract: The incomplete identification of structural variants (SVs) from whole-genome sequencing data limits studies of human genetic diversity and disease association. Here, we apply a suite of long-read, short-read, and strand-specific sequencing technologies, optical mapping, and variant discovery algorithms to comprehensively analyze three human parent–child trios to define the full spectrum of human genetic variation in a haplotype-resolved manner. We identify 818,054 indel variants (<50 bp) and 27,622 SVs (≥50 bp) per human genome. We also discover 156 inversions per genome, most of which previously escaped detection. Fifty-eight of the inversions we discovered intersect with the critical regions of recurrent microdeletion and microduplication syndromes. Taken together, our SV callsets represent a sevenfold increase in SV detection compared to most standard high-throughput sequencing studies, including those from the 1000 Genomes Project. The method and the dataset serve as a gold standard for the scientific community and we make specific recommendations for maximizing structural variation sensitivity for future large-scale genome sequencing studies.
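
The size convention used above (indels below 50 bp, SVs at 50 bp or more) can be sketched as a small classifier. The function and constant names are hypothetical, and the sketch assumes each variant is summarized by a single positive length in base pairs.

```python
from collections import Counter

SV_MIN_LENGTH_BP = 50  # threshold quoted in the abstract: SVs are >= 50 bp


def classify_variant(length_bp: int) -> str:
    """Label a variant as 'indel' (< 50 bp) or 'SV' (>= 50 bp) by its length."""
    if length_bp < 1:
        raise ValueError("variant length must be a positive number of base pairs")
    return "SV" if length_bp >= SV_MIN_LENGTH_BP else "indel"


def tally_by_class(lengths_bp: list[int]) -> Counter:
    """Count how many variants fall into each size class."""
    return Counter(classify_variant(length) for length in lengths_bp)
```

A call such as `tally_by_class([3, 49, 50, 1200])` would count two indels and two SVs, since the 50 bp boundary belongs to the SV class.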


Author(s): Yongzhuang Liu, Yalin Huang, Guohua Wang, Yadong Wang

Abstract: Short-read whole genome sequencing has become widely used to detect structural variants in human genetic studies and clinical practice. However, accurate detection of structural variants is a challenging task. In particular, existing structural variant detection approaches produce a large proportion of incorrect calls, so effective structural variant filtering approaches are urgently needed. In this study, we propose a novel deep learning-based approach, DeepSVFilter, for filtering structural variants in short-read whole genome sequencing data. DeepSVFilter encodes structural variant signals in the read alignments as images and adopts transfer learning with pre-trained convolutional neural networks as the classification models, which are trained on well-characterized samples with known high-confidence structural variants. We use two well-characterized samples to demonstrate DeepSVFilter’s performance and its filtering effect when coupled with commonly used structural variant detection approaches. DeepSVFilter is implemented in Python and is freely available at https://github.com/yongzhuang/DeepSVFilter.


2017
Author(s): Nagarajan Paramasivam, Martin Granzow, Christina Evers, Katrin Hinderhofer, Stefan Wiemann, ...

Abstract: With genome sequencing entering the clinics as a diagnostic tool to study genetic disorders, there is an increasing need for bioinformatics solutions that enable precise causal variant identification in a timely manner. Background: Workflows for the identification of candidate disease-causing variants usually perform the following tasks: i) identification of variants; ii) filtering of variants to remove polymorphisms and technical artifacts; and iii) prioritization of the remaining variants to provide a small set of candidates for further analysis. Methods: Here, we present a pipeline designed to identify variants and prioritize the variants and genes from trio sequencing or pedigree-based sequencing data into different tiers. Results: We show how this pipeline was applied in a study of patients with neurodevelopmental disorders of unknown cause, where it helped to identify the causal variants in more than 35% of the cases. Conclusions: Classification and prioritization of variants into different tiers help to select a small set of variants for downstream analysis.
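
Steps ii) and iii) of the workflow above can be sketched as a filter followed by a tier assignment. Everything specific in this example is an assumption for illustration, not the authors' pipeline: the `Variant` fields, the allele-frequency and quality cut-offs, and the three-tier rule are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Variant:
    gene: str
    population_af: float  # allele frequency in a reference population (hypothetical field)
    quality: float        # caller quality score (hypothetical field)
    de_novo: bool         # present in the child but in neither parent of the trio
    damaging: bool        # predicted to disrupt the gene product (hypothetical flag)


def passes_filter(v: Variant, max_af: float = 0.01, min_quality: float = 30.0) -> bool:
    # Step ii: drop common polymorphisms and likely technical artifacts.
    # Both thresholds are illustrative defaults, not values from the paper.
    return v.population_af <= max_af and v.quality >= min_quality


def assign_tier(v: Variant) -> int:
    # Step iii: a made-up three-tier rule; a lower number means higher priority.
    if v.de_novo and v.damaging:
        return 1
    if v.de_novo or v.damaging:
        return 2
    return 3


def prioritize(variants: list[Variant]) -> list[tuple[int, Variant]]:
    """Filter the variants, then return (tier, variant) pairs sorted by tier."""
    kept = [v for v in variants if passes_filter(v)]
    return sorted(((assign_tier(v), v) for v in kept), key=lambda pair: pair[0])
```

Keeping the filter and the tier rule as separate functions mirrors the separation of tasks in the abstract and lets either criterion be changed without touching the other.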


2017
Author(s): Joseph G. Arthur, Xi Chen, Bo Zhou, Alexander E. Urban, Wing Hung Wong

Abstract: Detecting structural variants (SVs) from sequencing data is key to genome analysis, but methods using standard whole-genome sequencing (WGS) data are typically incapable of resolving complex SVs with multiple co-located breakpoints. We introduce the ARC-SV method, which uses a probabilistic model to detect arbitrary local rearrangements from WGS data. Our method performs well on simple SVs while surpassing state-of-the-art methods in complex SV detection.

