BreakID: genomics breakpoints identification to detect gene fusion events using discordant pairs and split reads

2019 ◽  
Vol 35 (16) ◽  
pp. 2859-2861
Author(s):  
Linfang Jin ◽  
Jinhuo Lai ◽  
Yang Zhang ◽  
Ying Fu ◽  
Shuhang Wang ◽  
...  

Abstract Summary Here we developed a tool called Breakpoint Identification (BreakID) to identify fusion events from targeted sequencing data. Taking discordant read pairs and split reads as supporting evidence, BreakID can identify gene fusion breakpoints at single-nucleotide resolution. In validation against confirmed fusion events in cancer cell lines, BreakID achieved a sensitivity of 90.63% with a PPV of 100% at a sequencing depth of 500× and performed better than other available fusion detection tools. We anticipate that BreakID will be widely used for the detection and analysis of fusions in clinical and research sequencing scenarios. Availability and implementation Source code is freely available at https://github.com/SinOncology/BreakID. Supplementary information Supplementary data are available at Bioinformatics online.
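The two evidence types named in this abstract can be illustrated with a minimal sketch. This is not BreakID's implementation: the `Read` record, the `max_insert` and `min_clip` thresholds, and the function names are all hypothetical, and a real tool would parse alignments from a BAM file and apply further filters. The sketch only shows how a discordant pair (mates on different chromosomes or too far apart) and a split read (a soft-clipped alignment) might be recognized, and how a soft clip pins a candidate breakpoint to a single reference base.

```python
import re
from collections import Counter
from dataclasses import dataclass

@dataclass
class Read:
    """Hypothetical, simplified alignment record (not a real BAM API)."""
    name: str
    chrom: str
    pos: int        # 1-based leftmost aligned reference position
    cigar: str
    mate_chrom: str
    mate_pos: int

_CIGAR = re.compile(r"(\d+)([MIDNSHP=X])")

def is_discordant(read, max_insert=1000):
    """A pair is discordant if mates map to different chromosomes
    or farther apart than the (assumed) maximum insert size."""
    if read.chrom != read.mate_chrom:
        return True
    return abs(read.mate_pos - read.pos) > max_insert

def split_breakpoint(read, min_clip=10):
    """Return the reference base adjacent to a long soft clip, else None."""
    ops = [(int(n), op) for n, op in _CIGAR.findall(read.cigar)]
    if not ops:
        return None
    if ops[0][1] == "S" and ops[0][0] >= min_clip:
        return read.pos  # clip precedes the first aligned base
    if ops[-1][1] == "S" and ops[-1][0] >= min_clip:
        # Only M, D, N, = and X consume reference positions.
        ref_len = sum(n for n, op in ops if op in "MDN=X")
        return read.pos + ref_len - 1  # last aligned base before the clip
    return None

def collect_evidence(reads):
    """Names of discordant reads, plus split-read support per position."""
    discordant = [r.name for r in reads if is_discordant(r)]
    support = Counter()
    for r in reads:
        bp = split_breakpoint(r)
        if bp is not None:
            support[(r.chrom, bp)] += 1
    return discordant, support

if __name__ == "__main__":
    reads = [
        Read("r1", "chr2", 100, "60M40S", "chr2", 300),
        Read("r2", "chr2", 120, "100M", "chr10", 5000),
        Read("r4", "chr2", 110, "50M50S", "chr2", 320),
    ]
    print(collect_evidence(reads))
```

Positions supported by several split reads and flanked by discordant pairs would be the natural fusion breakpoint candidates in such a scheme.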

2020 ◽  
Author(s):  
Alli L. Gombolay ◽  
Francesca Storici

Abstract Ribose-Map is a user-friendly, standardized bioinformatics toolkit for the comprehensive analysis of ribonucleotide sequencing experiments. It allows researchers to map the locations of ribonucleotides in DNA to single-nucleotide resolution and identify biological signatures of ribonucleotide incorporation. In addition, it can be applied to data generated using any currently available high-throughput ribonucleotide sequencing technique, thus standardizing the analysis of ribonucleotide sequencing experiments and allowing direct comparisons of results. This protocol describes in detail how to use Ribose-Map to analyze raw ribonucleotide sequencing data, including preparing the reads for analysis, locating the genomic coordinates of ribonucleotides, exploring the genome-wide distribution of ribonucleotides, determining the nucleotide sequence context of ribonucleotides, and identifying hotspots of ribonucleotide incorporation. Ribose-Map does not require background knowledge of ribonucleotide sequencing analysis and assumes only basic command-line skills. The protocol requires less than 3 hr of computing time for most datasets and about 30 min of hands-on time.


Author(s):  
Quang Tran ◽  
Alexej Abyzov

Abstract Summary Defining the precise location of structural variations (SVs) at single-nucleotide breakpoint resolution is a challenging problem due to large gaps in alignment. Previously, Alignment with Gap Excision (AGE) enabled us to define breakpoints of SVs at single-nucleotide resolution; however, AGE requires a vast amount of memory when aligning a pair of long sequences. To address this, we developed a memory-efficient implementation—LongAGE—based on the classical Hirschberg algorithm. We demonstrate an application of LongAGE for resolving breakpoints of SVs embedded into segmental duplications on Pacific Biosciences (PacBio) reads that can be longer than 10 kb. Furthermore, we observed different breakpoints for a deletion and a duplication in the same locus, providing direct evidence that such multi-allelic copy number variants (mCNVs) arise from two or more independent ancestral mutations. Availability and implementation LongAGE is implemented in C++ and available on Github at https://github.com/Coaxecva/LongAGE. Supplementary information Supplementary data are available at Bioinformatics online.
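The memory saving attributed to the classical Hirschberg algorithm can be sketched on ordinary global alignment. The code below is an illustration only: it uses plain Needleman-Wunsch scoring with a linear gap penalty, not the gap-excision scoring of AGE or LongAGE, and the function names and scoring values are assumptions. The idea is that the score of the last matrix row can be computed while keeping only two rows in memory, which is enough to find where the optimal alignment crosses the middle of the first sequence and then recurse on the two halves.

```python
def nw_last_row(a, b, match=1, mismatch=-1, gap=-1):
    """Last row of the global-alignment score matrix in O(len(b)) memory."""
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        cur = [i * gap] + [0] * len(b)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            cur[j] = max(diag, prev[j] + gap, cur[j - 1] + gap)
        prev = cur
    return prev

def nw_align(a, b, match=1, mismatch=-1, gap=-1):
    """Full-matrix alignment with traceback; used only for tiny base cases."""
    n, m = len(a), len(b)
    s = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        s[i][0] = i * gap
    for j in range(1, m + 1):
        s[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            diag = s[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            s[i][j] = max(diag, s[i - 1][j] + gap, s[i][j - 1] + gap)
    out_a, out_b, i, j = [], [], n, m
    while i > 0 or j > 0:
        sub = match if i > 0 and j > 0 and a[i - 1] == b[j - 1] else mismatch
        if i > 0 and j > 0 and s[i][j] == s[i - 1][j - 1] + sub:
            out_a.append(a[i - 1]); out_b.append(b[j - 1]); i -= 1; j -= 1
        elif i > 0 and s[i][j] == s[i - 1][j] + gap:
            out_a.append(a[i - 1]); out_b.append("-"); i -= 1
        else:
            out_a.append("-"); out_b.append(b[j - 1]); j -= 1
    return "".join(reversed(out_a)), "".join(reversed(out_b))

def hirschberg(a, b, match=1, mismatch=-1, gap=-1):
    """Optimal global alignment using linear working memory per level."""
    if len(a) <= 1 or len(b) <= 1:
        return nw_align(a, b, match, mismatch, gap)
    mid = len(a) // 2
    # Forward scores for the top half, reverse scores for the bottom half.
    left = nw_last_row(a[:mid], b, match, mismatch, gap)
    right = nw_last_row(a[mid:][::-1], b[::-1], match, mismatch, gap)
    m = len(b)
    split = max(range(m + 1), key=lambda j: left[j] + right[m - j])
    a1, b1 = hirschberg(a[:mid], b[:split], match, mismatch, gap)
    a2, b2 = hirschberg(a[mid:], b[split:], match, mismatch, gap)
    return a1 + a2, b1 + b2

def alignment_score(x, y, match=1, mismatch=-1, gap=-1):
    """Score of two equal-length gapped strings under the same scheme."""
    return sum(gap if "-" in (p, q) else (match if p == q else mismatch)
               for p, q in zip(x, y))

if __name__ == "__main__":
    x, y = hirschberg("AGTACGCA", "TATGC")
    print(x)
    print(y)
    print(alignment_score(x, y))
```

The recursion trades roughly a factor of two in computation for memory proportional to the shorter working row, which is the property that matters when aligning reads longer than 10 kb.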


2019 ◽  
Author(s):  
Yao-zhong Zhang ◽  
Seiya Imoto ◽  
Satoru Miyano ◽  
Rui Yamaguchi

Abstract Motivation For short-read sequencing, read-depth-based structural variant (SV) callers have difficulty finding breakpoints at single-nucleotide resolution due to the bin-size limitation. Results In this paper, we present RDBKE to enhance the breakpoint resolution of read-depth SV callers using the deep segmentation model UNet. We show that UNet can be trained with a small amount of data and applied for breakpoint enhancement both in-sample and cross-sample. On both simulated and real data, RDBKE significantly increases the number of SVs with more precise breakpoints. Availability Source code of RDBKE is available at https://github.com/yaozhong/[email protected]


2019 ◽  
Vol 36 (7) ◽  
pp. 2033-2039 ◽  
Author(s):  
Junfeng Liu ◽  
Ziyang An ◽  
Jianjun Luo ◽  
Jing Li ◽  
Feifei Li ◽  
...  

Abstract Motivation RNA 5-methylcytosine (m5C) is a type of post-transcriptional modification that may be involved in numerous biological processes and tumorigenesis. RNA m5C can be profiled at single-nucleotide resolution by high-throughput sequencing of RNA treated with bisulfite (RNA-BisSeq). However, the transcriptome-wide profile of m5C and its potential function in splicing remain to be elucidated because no isoform-level m5C quantification tool is available. Results We developed a computational package, named Episo, to quantify epitranscriptomal RNA m5C at the transcript isoform level. Episo consists of three tools: mapper, quant and Bisulfitefq, for mapping, quantifying and simulating RNA-BisSeq data, respectively. The high accuracy of Episo was validated using an improved m5C-specific methylated RNA immunoprecipitation (meRIP) protocol, as well as a set of in silico experiments. By applying Episo to public human and mouse RNA-BisSeq data, we found that RNA m5C is not evenly distributed among transcript isoforms, implying that m5C may be subject to regulation at the isoform level. Availability and implementation Episo is released under the GNU GPLv3+ license. The source code of Episo is freely accessible from https://github.com/liujunfengtop/Episo (with Tophat/cufflink) and https://github.com/liujunfengtop/Episo/tree/master/Episo_Kallisto (with Kallisto). Supplementary information Supplementary data are available at Bioinformatics online.


2021 ◽  
Vol 17 (10) ◽  
pp. e1009186
Author(s):  
Yao-zhong Zhang ◽  
Seiya Imoto ◽  
Satoru Miyano ◽  
Rui Yamaguchi

Read-depths (RDs) are frequently used in identifying structural variants (SVs) from sequencing data. Existing RD-based SV callers have difficulty determining breakpoints at single-nucleotide resolution due to the noisiness of RD data and the bin-based calculation. In this paper, we propose to use the deep segmentation model UNet to learn base-wise RD patterns surrounding breakpoints of known SVs. We integrate model predictions with an RD-based SV caller to refine breakpoints to single-nucleotide resolution. We show that UNet can be trained with a small amount of data and can be applied both in-sample and cross-sample. An enhancement pipeline named RDBKE significantly increases the number of SVs with more precise breakpoints on simulated and real data. The source code of RDBKE is freely available at https://github.com/yaozhong/deepIntraSV.
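The bin-size limitation mentioned in this abstract can be made concrete with a small, noise-free sketch. It does not reproduce RDBKE or UNet: the read intervals, the bin size of 100, and the "largest drop" heuristic are all assumptions chosen for illustration. The point is only that averaging depth over bins localizes a depth change to a bin, whereas base-wise depth retains the exact position.

```python
def base_depth(reads, length):
    """Per-base depth from half-open, 0-based (start, end) read intervals."""
    diff = [0] * (length + 1)
    for start, end in reads:
        diff[start] += 1   # coverage begins at start
        diff[end] -= 1     # and stops just before end
    depth, running = [], 0
    for delta in diff[:length]:
        running += delta
        depth.append(running)
    return depth

def binned_depth(depth, bin_size):
    """Mean depth in consecutive bins of bin_size bases."""
    bins = []
    for i in range(0, len(depth), bin_size):
        window = depth[i:i + bin_size]
        bins.append(sum(window) / len(window))
    return bins

def largest_drop(values):
    """Index i at which values[i - 1] - values[i] is largest."""
    return max(range(1, len(values)), key=lambda i: values[i - 1] - values[i])

if __name__ == "__main__":
    # Hypothetical 400 bp region: half of the reads stop at position 234,
    # mimicking the left edge of a heterozygous deletion.
    reads = [(0, 400)] * 5 + [(0, 234)] * 5
    depth = base_depth(reads, 400)
    bins = binned_depth(depth, 100)
    print("bin means:", bins)
    print("bin containing the drop:", largest_drop(bins))
    print("base-wise breakpoint:", largest_drop(depth))
```

On real data the base-wise signal is noisy, which is the motivation the abstract gives for learning RD patterns around known breakpoints rather than thresholding raw depth.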


2019 ◽  
Vol 20 (S25) ◽  
Author(s):  
Yiran Zhou ◽  
Qinghua Cui ◽  
Yuan Zhou

Abstract Background 2′-O-methylation (2′-O-me or Nm) is a post-transcriptional RNA modification in which the 2′-hydroxyl is methylated; it is common in mRNAs and various non-coding RNAs. Previous studies revealed the significance of Nm in multiple biological processes. As Nm attracted increasing attention, a technique termed Nm-seq was developed to profile Nm sites, mainly in mRNA, with single-nucleotide resolution and high sensitivity. In a recent work, supported by the Nm-seq data, we reported an in silico method for predicting Nm sites that relies on nucleotide sequence information, and established an online server named NmSEER. More recently, a more confident dataset produced by refined Nm-seq became available. Therefore, in this work, we redesigned the prediction model to achieve more robust performance on the new data. Results We redesigned the prediction model from two perspectives: the machine learning algorithm and the combination of multiple encoding schemes. After optimization by 5-fold cross-validation and evaluation on an independent test set, random forest was selected as the most robust algorithm. One-hot encoding, together with position-specific dinucleotide sequence profile and K-nucleotide frequency encoding, was applied to build the final predictor. Conclusions The updated predictor, named NmSEER V2.0, achieves accurate prediction performance (AUROC = 0.862) and has been deployed on a new server, which is available at http://www.rnanut.net/nmseer-v2/ for free.
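Two of the encodings named in this abstract, one-hot and K-nucleotide frequency, can be sketched as follows. This is a generic illustration rather than the NmSEER V2.0 feature pipeline: the window sequence, the choice of k = 2, and the function names are assumptions, and the position-specific dinucleotide sequence profile is omitted because the abstract does not define it.

```python
from itertools import product

ALPHABET = "ACGU"

def one_hot(seq):
    """One 4-element indicator vector per nucleotide, in ACGU order."""
    return [[1 if base == letter else 0 for letter in ALPHABET] for base in seq]

def kmer_frequency(seq, k=2):
    """Relative frequency of every possible k-mer, in lexicographic order."""
    kmers = ["".join(p) for p in product(ALPHABET, repeat=k)]
    counts = dict.fromkeys(kmers, 0)
    total = len(seq) - k + 1
    if total <= 0:
        return [0.0] * len(kmers)
    for i in range(total):
        kmer = seq[i:i + k]
        if kmer in counts:  # skip k-mers containing non-ACGU symbols
            counts[kmer] += 1
    return [counts[kmer] / total for kmer in kmers]

def encode(seq, k=2):
    """Concatenate the flattened one-hot matrix with k-mer frequencies."""
    flat = [value for row in one_hot(seq) for value in row]
    return flat + kmer_frequency(seq, k)

if __name__ == "__main__":
    window = "ACGUA"  # hypothetical sequence window around a candidate site
    features = encode(window)
    print(len(features), features)
```

A feature vector of this kind could then be passed to any standard classifier, such as a random forest, with one vector per candidate site.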


2019 ◽  
Author(s):  
Iñigo Prada-Luengo ◽  
Anders Krogh ◽  
Lasse Maretty ◽  
Birgitte Regenberg

Abstract Circular DNA has recently been identified across different species, including in human normal and cancerous tissue, but short-read mappers are unable to align many of the reads crossing circle junctions, which limits the detection of circular DNA from short-read sequencing data. Here, we propose a new method, Circle-Map, that guides the realignment of partially aligned reads using information from discordantly mapped reads. We demonstrate how this approach dramatically increases sensitivity for detection of circular DNA on both simulated and real data while retaining high precision.


Foods ◽  
2021 ◽  
Vol 10 (9) ◽  
pp. 2218
Author(s):  
Xiaoying Zhu ◽  
Minghua Wu ◽  
Ruijie Deng ◽  
Mohammad Rizwan Khan ◽  
Sha Deng ◽  
...  

Waxy sorghum has greater economic value than wild sorghum in food processing and the brewing industry. Thus, the authentication of waxy sorghum species is an important issue. Herein, a rapid and sensitive Authentication Amplification Refractory Mutation System-PCR (aARMS-PCR) method was employed to identify sorghum species via its ability to resolve single-nucleotide differences in genes. As a proof of concept, we chose a species of waxy sorghum containing the wxc mutation, which is abundantly used in liquor brewing. The aARMS-PCR can distinguish non-wxc sorghum from wxc sorghum to guarantee identification of specific waxy sorghum species. It detected as little as 1% non-wxc sorghum in sorghum mixtures, making it one of the most sensitive tools for food authentication. Due to its ability to resolve genes with single-nucleotide resolution and high sensitivity, aARMS-PCR may have wider applicability in monitoring food adulteration, offering rapid verification of food authenticity.

