split read
Recently Published Documents



2021 ◽  
Author(s):  
Kaihao Tang ◽  
Weiquan Wang ◽  
Yamin Sun ◽  
Yiqing Zhou ◽  
Pengxia Wang ◽  
...  

Abstract The life cycle of temperate phages includes a lysogenic cycle stage when the phage integrates into the host genome and becomes a prophage. However, the identification of prophages that are highly divergent from known phages remains challenging. In this study, by taking advantage of the lysis-lysogeny switch of temperate phages, we designed Prophage Tracer, a tool for recognizing active prophages in prokaryotic genomes using short-read sequencing data, independent of phage gene similarity searching. Prophage Tracer uses the criterion of overlapping split-read alignment to recognize discriminative reads that contain bacterial (attB) and phage (attP) att sites representing prophage excision signals. Performance testing showed that Prophage Tracer could predict known prophages with precise boundaries, as well as novel prophages. Two novel prophages, dsDNA and ssDNA, encoding highly divergent major capsid proteins, were identified in coral-associated bacteria. Prophage Tracer is a reliable data mining tool for the identification of novel temperate phages and mobile genetic elements. The code for Prophage Tracer is publicly available at https://github.com/WangLab-SCSIO/Prophage_Tracer.
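As a rough illustration of what a split-read signal looks like in alignment data, the sketch below flags reads whose SAM CIGAR string carries a long soft clip, meaning only part of the read aligned at that position. This is a generic first-pass filter, not Prophage Tracer's actual method; the function names and the `min_clip` threshold of 10 bases are assumptions made for the example.

```python
import re

# One CIGAR operation: a length followed by an operation code (SAM spec).
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def parse_cigar(cigar):
    """Return a list of (length, op) tuples from a SAM CIGAR string."""
    return [(int(n), op) for n, op in CIGAR_OP.findall(cigar)]


def clip_lengths(cigar):
    """Return (leading, trailing) soft-clip lengths; hard clips are ignored."""
    ops = [(n, op) for n, op in parse_cigar(cigar) if op != "H"]
    if not ops:
        return (0, 0)
    lead = ops[0][0] if ops[0][1] == "S" else 0
    trail = ops[-1][0] if len(ops) > 1 and ops[-1][1] == "S" else 0
    return (lead, trail)


def is_split_candidate(cigar, min_clip=10):
    """Hypothetical filter: True if either end is soft-clipped by >= min_clip bases."""
    return max(clip_lengths(cigar)) >= min_clip
```

A read with CIGAR `30S70M` would pass this filter, while `100M` or `5S95M` would not. Deciding whether the clipped portion maps to the other end of a prophage (the attB/attP evidence the abstract describes) would need a further alignment step that is not shown here.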


2020 ◽  
Author(s):  
Eugene J. Gardner ◽  
Alejandro Sifrim ◽  
Sarah J. Lindsay ◽  
Elena Prigmore ◽  
Diana Rajan ◽  
...  

Abstract Purpose: Identifying structural variations (SVs) associated with developmental disorder (DD) patient phenotype missed by conventional approaches. Methods: We have developed a novel SV discovery approach that mines split-read information, ‘InDelible’, and applied it to exome sequencing (ES) of 13,438 probands with severe DD recruited as part of the Deciphering Developmental Disorders (DDD) study. Results: Using InDelible we were able to find 59 previously undetected variants in genes previously associated with DD, of which 49.2% (29) had phenotypic features that accord with those of the patient in which they were found, and were deemed plausibly pathogenic. InDelible was particularly effective at ascertaining variants between 21-500 bps in size, and increased the total number of potentially pathogenic variants identified by DDD in this size range by 42.0% (n = 29 variants). Of particular interest were seven confirmed de novo SVs in the gene MECP2; these variants represent 31.8% of all de novo protein truncating variants in MECP2 among DDD patients. Conclusion: InDelible provides a rapid framework for the discovery of likely pathogenic SVs that are likely to be missed by standard analytical workflows and has the potential to improve the diagnostic yield of ES.


2020 ◽  
Vol 12 (10) ◽  
pp. 1711-1718
Author(s):  
Yi Feng ◽  
Leslie Y Beh ◽  
Wei-Jen Chang ◽  
Laura F Landweber

Abstract Ciliates are microbial eukaryotes with distinct somatic and germline genomes. Postzygotic development involves extensive remodeling of the germline genome to form somatic chromosomes. Ciliates therefore offer a valuable model for studying the architecture and evolution of programmed genome rearrangements. Current studies usually focus on a few model species, where rearrangement features are annotated by aligning reference germline and somatic genomes. Although many high-quality somatic genomes have been assembled, a high-quality germline genome assembly is difficult to obtain due to its smaller DNA content and abundance of repetitive sequences. To overcome these hurdles, we propose a new pipeline, SIGAR (Split-read Inference of Genome Architecture and Rearrangements) to infer germline genome architecture and rearrangement features without a germline genome assembly, requiring only short DNA sequencing reads. As a proof of principle, 93% of rearrangement junctions identified by SIGAR in the ciliate Oxytricha trifallax were validated by the existing germline assembly. We then applied SIGAR to six diverse ciliate species without germline genome assemblies, including Ichthyophthirius multifiliis, a fish pathogen. Despite the high level of somatic DNA contamination in each sample, SIGAR successfully inferred rearrangement junctions, short eliminated sequences, and potential scrambled genes in each species. This pipeline enables pilot surveys or exploration of DNA rearrangements in species with limited DNA material access, thereby providing new insights into the evolution of chromosome rearrangements.


2020 ◽  
Author(s):  
Yi Feng ◽  
Leslie Y. Beh ◽  
Wei-Jen Chang ◽  
Laura F. Landweber

Abstract Ciliates are microbial eukaryotes with distinct somatic and germline genomes. Post-zygotic development involves extensive remodeling of the germline genome to form somatic chromosomes. Ciliates therefore offer a valuable model for studying the architecture and evolution of programmed genome rearrangements. Current studies usually focus on a few model species, where rearrangement features are annotated by aligning reference germline and somatic genomes. While many high-quality somatic genomes have been assembled, a high-quality germline genome assembly is difficult to obtain due to its smaller DNA content and abundance of repetitive sequences. To overcome these hurdles, we propose a new pipeline, SIGAR (Split-read Inference of Genome Architecture and Rearrangements), to infer germline genome architecture and rearrangement features without a germline genome assembly, requiring only short germline DNA sequencing reads. As a proof of principle, 93% of rearrangement junctions identified by SIGAR in the ciliate Oxytricha trifallax were validated by the existing germline assembly. We then applied SIGAR to six diverse ciliate species without germline genome assemblies, including Ichthyophthirius multifiliis, a fish pathogen. Despite the high level of somatic DNA contamination in each sample, SIGAR successfully inferred rearrangement junctions, short eliminated sequences and potential scrambled genes in each species. This pipeline enables pilot surveys or exploration of DNA rearrangements in species with limited DNA material access, thereby providing new insights into the evolution of chromosome rearrangements.


2018 ◽  
Author(s):  
Brent S. Pedersen ◽  
Aaron R. Quinlan

Abstract Most structural variant detection tools use clusters of discordant read-pair and split-read alignments to identify variants, yet do not integrate depth of sequence coverage as an additional means to support or refute putative events. Here, we present duphold, a new method to efficiently annotate structural variant calls with sequence depth information that can add (or remove) confidence to SVs predicted to affect copy number. It indicates not only the change in depth across the event, but also the presence of a rapid change in depth relative to the regions surrounding the breakpoints. It uses a unique algorithm that allows the run time to be nearly independent of the number of variants. This performance is important for large, jointly-called projects with many samples, each of which must be evaluated at thousands of sites. We show that filtering on duphold annotations can greatly improve the specificity of deletion calls and that its annotations match visual inspection. Duphold can annotate structural variant predictions made from both short-read and long-read data. It is available under the MIT license at: https://github.com/brentp/duphold.
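The idea of comparing depth inside an event with depth in the surrounding regions can be sketched as a simple fold-change calculation. The code below is a minimal illustration under stated assumptions, not duphold's implementation: the function name `flank_fold_change`, the use of medians, the half-open interval convention, and the default flank size of 1000 bases are all choices made for this example.

```python
from statistics import median


def flank_fold_change(depth, start, end, flank=1000):
    """Median depth inside [start, end) divided by median depth of the flanks.

    depth is a list of per-base read depths for one chromosome (assumed input).
    Returns None when the flanking depth is zero, since the ratio is undefined.
    """
    if not 0 <= start < end <= len(depth):
        raise ValueError("event must lie within the depth array")
    inside = depth[start:end]
    # Up to `flank` bases on each side of the event, truncated at the array ends.
    flanks = depth[max(0, start - flank):start] + depth[end:end + flank]
    if not flanks:
        raise ValueError("no flanking bases available")
    flank_median = median(flanks)
    if flank_median == 0:
        return None
    return median(inside) / flank_median
```

Under these assumptions, a heterozygous deletion in a diploid sample would be expected to give a value near 0.5 and a single-copy duplication a value near 1.5; a putative deletion with a ratio close to 1.0 has no depth support.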


2018 ◽  
Author(s):  
Fatih Karaoglanoglu ◽  
Camir Ricketts ◽  
Marzieh Eslami Rasekh ◽  
Ezgi Ebren ◽  
Iman Hajirasouliha ◽  
...  

Abstract Many algorithms aimed at characterizing genomic structural variation (SV) have been developed since the inception of high-throughput sequencing. However, the full spectrum of SVs in the human genome is not yet assessed. Most of the existing methods focus on discovery and genotyping of deletions, insertions, and mobile elements. Detection of balanced SVs with no gain or loss of genomic segments (e.g., inversions) is a particularly challenging task. Long read sequencing has been leveraged to find short inversions but there is still a need to develop methods to detect large genomic inversions. Furthermore, currently there are no algorithms to predict the insertion locus of large interspersed segmental duplications. Here we propose novel algorithms to characterize large (>40Kbp) interspersed segmental duplications and (>80Kbp) inversions using Linked-Read sequencing data. Linked-Read sequencing provides long range information, where Illumina reads are tagged with barcodes that can be used to assign short reads to pools of larger (30-50 Kbp) molecules. Our methods rely on the split molecule sequence signature that we have previously described [11]. Similar to the split read, split molecules refer to large segments of DNA that span an SV breakpoint. Therefore, when mapped to the reference genome, the mapping of these segments would be discontinuous. We redesign our earlier algorithm, VALOR, to specifically leverage Linked-Read sequencing data to discover large inversions and characterize interspersed segmental duplications. We implement our new algorithms in a new software package, called VALOR2. Availability: VALOR2 is available at https://github.com/BilkentCompGen/valor.
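The notion of a molecule whose mapping is discontinuous can be illustrated with a toy clustering of read positions that share one barcode. This is only a sketch of the general idea, not the VALOR2 algorithm: the function names, the `max_gap` of 50,000 bases, and the `min_reads` threshold are hypothetical values chosen for the example.

```python
def cluster_positions(positions, max_gap=50_000):
    """Group sorted mapping positions into clusters separated by more than max_gap."""
    clusters = []
    for pos in sorted(positions):
        if clusters and pos - clusters[-1][-1] <= max_gap:
            clusters[-1].append(pos)  # close enough: extend the current cluster
        else:
            clusters.append([pos])    # large gap: start a new cluster
    return clusters


def looks_split(positions, max_gap=50_000, min_reads=2):
    """Hypothetical check: reads of one barcode fall into exactly two supported clusters."""
    supported = [c for c in cluster_positions(positions, max_gap) if len(c) >= min_reads]
    return len(supported) == 2
```

For example, reads from one barcode mapping at 100, 200, 300, 500000 and 500100 on the same chromosome would form two clusters and be flagged, whereas reads at 100, 200 and 300 alone would not. In practice, barcodes can be shared by several unrelated molecules, so a real method would need additional evidence before calling a breakpoint.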


2018 ◽  
Vol 8 (1) ◽  
Author(s):  
Daichi Shigemizu ◽  
Fuyuki Miya ◽  
Shintaro Akiyama ◽  
Shujiro Okuda ◽  
Keith A Boroevich ◽  
...  
