stLFRsv: A Germline Structural Variant Analysis Pipeline Using Co-barcoded Reads

2021 ◽  
Vol 12 ◽  
Author(s):  
Junfu Guo ◽  
Chang Shi ◽  
Xi Chen ◽  
Ou Wang ◽  
Ping Liu ◽  
...  

Co-barcoded reads originating from long DNA fragments (mean length >30 kbp) maintain both single base level accuracy and long-range genomic information. We propose a pipeline, stLFRsv, to detect structural variation using co-barcoded reads. stLFRsv identifies abnormal large gaps between co-barcoded reads to detect potential breakpoints and reconstruct complex structural variants (SVs). Haplotype phasing by co-barcoded reads increases the signal to noise ratio, and barcode sharing profiles are used to filter out false positives. We integrate the short read SV caller smoove for smaller variants with stLFRsv. The integrated pipeline was evaluated on the well-characterized genome HG002/NA24385, and 74.5% precision and a 22.4% recall rate were obtained for deletions. stLFRsv revealed some large variants not included in the benchmark set that were verified by long reads or assembly. For the HG001/NA12878 genome, stLFRsv also achieved the best performance for both resource usage and the detection of large variants. Our work indicates that co-barcoded read technology has the potential to improve genome completeness.
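The gap-based breakpoint idea described above can be illustrated with a minimal sketch. This is not the stLFRsv implementation: the function names, the `(barcode, position)` input format, the single-chromosome simplification, and the 50 kbp `max_gap` threshold are all assumptions chosen for illustration. The sketch groups read positions by barcode, splits each group wherever consecutive reads are farther apart than `max_gap`, and reports each such gap as a candidate breakpoint pair.

```python
from collections import defaultdict


def split_by_gap(positions, max_gap=50_000):
    """Split sorted read positions from one barcode into fragments.

    A new fragment starts whenever the distance to the previous read
    exceeds max_gap (a hypothetical threshold, not taken from the paper).
    """
    fragments = [[positions[0]]]
    for pos in positions[1:]:
        if pos - fragments[-1][-1] > max_gap:
            fragments.append([pos])
        else:
            fragments[-1].append(pos)
    return fragments


def candidate_breakpoints(reads, max_gap=50_000):
    """Return (barcode, left_end, right_start) for every abnormally large gap.

    reads: iterable of (barcode, position) pairs on a single chromosome
    (an assumed, simplified input format).
    """
    by_barcode = defaultdict(list)
    for barcode, pos in reads:
        by_barcode[barcode].append(pos)

    candidates = []
    for barcode, positions in sorted(by_barcode.items()):
        fragments = split_by_gap(sorted(positions), max_gap)
        # Each pair of adjacent fragments is separated by one large gap.
        for left, right in zip(fragments, fragments[1:]):
            candidates.append((barcode, left[-1], right[0]))
    return candidates


reads = [("A", 100), ("A", 5_000), ("A", 200_000), ("B", 10), ("B", 20)]
result = candidate_breakpoints(reads)
print(result)
```

A large gap on a single barcode may also arise from two unrelated DNA molecules that happen to share a barcode, so in practice a candidate would need support from several barcodes before being treated as a breakpoint.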

2020 ◽  
Author(s):  
Junfu Guo ◽  
Chang Shi ◽  
Xi Chen ◽  
Ou Wang ◽  
Ping Liu ◽  
...  

Co-barcoded reads originating from long DNA fragments (mean length larger than 50 kbp) maintain both single-base-level accuracy and long-range genomic information. We propose a pipeline, stLFRsv, to detect structural variation using co-barcoded reads. stLFRsv identifies abnormally large gaps between co-barcoded reads to detect potential breakpoints and reconstruct complex structural variations. Barcode-enabled phasing of co-barcoded reads increases the signal-to-noise ratio, and barcode sharing profiles are used to filter out false positives. We integrate the short-read SV caller smoove for smaller variations with stLFRsv. The integrated pipeline was evaluated on the well-characterized genome HG002/NA24385 and obtained a precision and recall rate of 74.2% and 22.3%, respectively, for deletions on the whole genome. stLFR found some large variations not included in the benchmark set; these were verified by means of long reads or assembly. Our work indicates that co-barcoded read technology has the potential to improve genome completeness.
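As a rough illustration of how barcode sharing between two genomic regions might be scored, the sketch below computes a Jaccard index over the barcode sets observed in two windows. The abstract does not specify the metric stLFRsv uses, so the Jaccard index, the function name `barcode_sharing`, and the example barcodes are assumptions made purely for illustration.

```python
def barcode_sharing(barcodes_a, barcodes_b):
    """Jaccard index of the barcode sets seen in two genomic windows.

    Returns a value in [0, 1]; higher values mean that more long DNA
    fragments span both windows. This metric is an illustrative
    assumption, not the scoring used by stLFRsv.
    """
    a, b = set(barcodes_a), set(barcodes_b)
    union = a | b
    if not union:
        return 0.0
    return len(a & b) / len(union)


# Two windows flanking a putative breakpoint share 2 of 4 distinct barcodes.
score = barcode_sharing(["x", "y", "z"], ["y", "z", "w"])
print(score)
```

Under this sketch, two distant windows with an unusually high sharing score would support a candidate breakpoint, while a score near the background level would suggest a false positive.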


2018 ◽  
Vol 19 (S20) ◽  
Author(s):  
Zachary Stephens ◽  
Chen Wang ◽  
Ravishankar K. Iyer ◽  
Jean-Pierre Kocher

2019 ◽  
Vol 35 (21) ◽  
pp. 4397-4399 ◽  
Author(s):  
S U Greer ◽  
H P Ji

Summary: Linked-read sequencing generates synthetic long reads, which are useful for the detection and analysis of structural variants (SVs). The software associated with 10× Genomics linked-read sequencing, Long Ranger, generates the essential output files (BAM, VCF, SV BEDPE) necessary for downstream analyses. However, performing downstream analyses requires users to customize their own tools to handle the unique features of linked-read sequencing data. Here, we describe gemtools, a collection of tools for the downstream and in-depth analysis of SVs from linked-read data. Gemtools uses the barcoded aligned reads and the megabase-scale phase blocks to determine haplotypes of SV breakpoints and delineate complex breakpoint configurations at the resolution of single DNA molecules. The gemtools package is a suite of tools that gives the user the flexibility to perform basic functions on their linked-read sequencing output in order to address further questions. Availability and implementation: The gemtools package is freely available for download at https://github.com/sgreer77/gemtools. Supplementary information: Supplementary data are available at Bioinformatics online.


2019 ◽  
Vol 20 (1) ◽  
Author(s):  
Toshiyuki T. Yokoyama ◽  
Yoshitaka Sakamoto ◽  
Masahide Seki ◽  
Yutaka Suzuki ◽  
Masahiro Kasahara

Background: The genome graph is an emerging approach for representing structural variants on genomes with branches. For example, representing structural variants of cancer genomes as a genome graph is more natural than representing such genomes as differences from the linear reference genome. While more and more structural variants are being identified by long-read sequencing, many of them are difficult to visualize using existing structural variant visualization tools. A visualization method for large genome graphs, such as human cancer genome graphs, is therefore needed. Results: We developed the MOdular Multi-scale Integrated Genome graph browser, MoMI-G, a web-based genome graph browser that can visualize genome graphs with structural variants and supporting evidence such as read alignments, read depth, and annotations. This browser allows more intuitive recognition of large, nested, and potentially more complex structural variations. MoMI-G has view modules for different scales, which allow users to view the whole genome down to nucleotide-level alignments of long reads. Alignments spanning reference alleles and those spanning alternative alleles are shown in the same view. Users can customize the view if they are not satisfied with the preset views. In addition, MoMI-G has Interval Card Deck, a feature for rapid manual inspection of hundreds of structural variants. Herein, we describe the utility of MoMI-G by using representative examples of large and nested structural variations found in two cell lines, LC-2/ad and CHM1. Conclusions: Users can inspect complex and large structural variations found by long-read analysis in large genomes such as human genomes more smoothly and more intuitively. In addition, users can easily filter out false positives by manually inspecting hundreds of identified structural variants with supporting long-read alignments and annotations in a short time.
Software availability: MoMI-G is freely available at https://github.com/MoMI-G/MoMI-G under the MIT license.


2019 ◽  
Author(s):  
Sandra Louzada ◽  
Walid Algady ◽  
Eleanor Weyell ◽  
Luciana W. Zuccherato ◽  
Paulina Brajer ◽  
...  

Approximately 5% of the human genome consists of structural variants, which are enriched for genes involved in the immune response and cell-cell interactions. A well-established region of extensive structural variation is the glycophorin gene cluster, comprising three tandemly repeated regions about 120 kb in length, carrying the highly homologous genes GYPA, GYPB and GYPE. Glycophorin A and glycophorin B are glycoproteins present at high levels on the surface of erythrocytes, and they have been suggested to act as decoy receptors for viral pathogens. They act as receptors for invasion of a causative agent of malaria, Plasmodium falciparum. A particular complex structural variant (DUP4) that creates a GYPB/GYPA fusion gene is known to confer resistance to malaria. Many other structural variants exist and remain poorly characterised. Here, we analyse sequences from 6466 genomes from across the world for structural variation at the glycophorin locus, confirming 15 variants in the 1000 Genomes project cohort, discovering 9 new variants, and characterising a selection using fibre-FISH and breakpoint mapping. We identify variants predicted to create novel fusion genes and a common inversion duplication variant at appreciable frequencies in West Africans. We show that almost all variants can be explained by unequal crossover events (non-allelic homologous recombination, NAHR) and, by comparing the structural variant breakpoints with recombination hotspot maps, show the importance of a particular meiotic recombination hotspot on structural variant formation in this region.


2020 ◽  
Author(s):  
Simone Maestri ◽  
Giorgio Gambino ◽  
Andrea Minio ◽  
Irene Perrone ◽  
Emanuela Cosentino ◽  
...  

Structural variants (SVs) are a widely unexplored source of genetic variation, both due to methodological limitations and because they are generally associated with deleterious effects. However, with the advent of long-range genomic platforms, it has become easier to directly detect SVs. In the same direction, clonally propagated crops provide a unique opportunity to study SVs, offering a suitable genomic environment for their accumulation in heterozygosis. In particular, it has been reported that SVs generate drastic levels of heterozygosity in grapevines. ‘Nebbiolo’ (Vitis vinifera L.) is a grapevine cultivar typical of north-western Italy, appreciated for its use in producing high-quality red wines. Here, we aimed to analyze the frequency of SVs in ‘Nebbiolo’ at three different organizational levels. For this purpose, we generated genomic data based on long reads, linked reads and optical mapping. We assembled a reference genome for this cultivar and compared two different clones, including the V. vinifera reference genome (PN40024) in our comparisons. Our results indicate that SVs differentially occurring between ‘Nebbiolo’ clones might be rare, while SVs differentiating haplotypes of the same individual are as abundant as those that occur differentially between cultivars.


2017 ◽  
Author(s):  
Joseph G. Arthur ◽  
Xi Chen ◽  
Bo Zhou ◽  
Alexander E. Urban ◽  
Wing Hung Wong

Detecting structural variants (SVs) from sequencing data is key to genome analysis, but methods using standard whole-genome sequencing (WGS) data are typically incapable of resolving complex SVs with multiple co-located breakpoints. We introduce the ARC-SV method, which uses a probabilistic model to detect arbitrary local rearrangements from WGS data. Our method performs well on simple SVs while surpassing state-of-the-art methods in complex SV detection.


Author(s):  
B Meier ◽  
NV Volkova ◽  
Y Hong ◽  
S Bertolini ◽  
V González-Huici ◽  
...  

Genome integrity is particularly important in germ cells to faithfully preserve genetic information across generations. As yet, little is known about the contribution of various DNA repair pathways to prevent mutagenesis. Using the C. elegans model we analyse mutational spectra that arise in wild-type and 61 DNA repair and DNA damage response mutants cultivated over multiple generations. Overall, 44% of lines show >2-fold increased mutagenesis with a broad spectrum of mutational outcomes including changes in single or multiple types of base substitutions induced by defects in base excision or nucleotide excision repair, or elevated levels of 50-400 bp deletions in translesion polymerase mutants rev-3(pol ζ) and polh-1(pol η). Mutational signatures associated with defective homologous recombination fall into two classes: 1) mutants lacking brc-1/BRCA1 or rad-51/RAD51 paralogs show elevated base substitutions, indels and structural variants, while 2) deficiency for MUS-81/MUS81 and SLX-1/SLX1 nucleases, and HIM-6/BLM, HELQ-1/HELQ and RTEL-1/RTEL1 helicases primarily cause structural variants. Genome-wide investigation of mutagenesis patterns identified elevated rates of tandem duplications often associated with inverted repeats in helq-1 mutants, and a unique pattern of ‘translocation’ events involving homeologous sequences in rip-1 paralog mutants. atm-1/ATM DNA damage checkpoint mutants harboured complex structural variants enriched in subtelomeric regions, and chromosome end-to-end fusions. Finally, while inactivation of the p53-like gene cep-1 did not affect mutagenesis, combined brc-1 cep-1 deficiency displayed increased, locally clustered mutagenesis. In summary, we provide a global view of how DNA repair pathways prevent germ cell mutagenesis.


2015 ◽  
Author(s):  
Ivan Sovic ◽  
Mile Sikic ◽  
Andreas Wilm ◽  
Shannon Nicole Fenlon ◽  
Swaine Chen ◽  
...  

Exploiting the power of nanopore sequencing requires the development of new bioinformatics approaches to deal with its specific error characteristics. We present the first nanopore read mapper (GraphMap) that uses a read-funneling paradigm to robustly handle variable error rates and fast graph traversal to align long reads with speed and very high precision (>95%). Evaluation on MinION sequencing datasets against short and long-read mappers indicates that GraphMap increases mapping sensitivity by at least 15-80%. GraphMap alignments are the first to demonstrate consensus calling with <1 error in 100,000 bases, variant calling on the human genome with 76% improvement in sensitivity over the next best mapper (BWA-MEM), precise detection of structural variants from 100bp to 4kbp in length and species and strain-specific identification of pathogens using MinION reads. GraphMap is available open source under the MIT license at https://github.com/isovic/graphmap.

