splice junctions

2021 ◽  
Author(s):  
Takumi Ito ◽  
Kazutoshi Yoshitake ◽  
Takeshi Iwata

ePat (extended PROVEAN annotation tool) is a software tool that extends PROVEAN, a tool for predicting whether amino acid substitutions and indels affect the biological function of proteins. ePat extends conventional PROVEAN in two ways. First, it calculates the pathogenicity of frameshift indels and of variants near splice junctions, neither of which conventional PROVEAN can score. Second, it uses batch processing to calculate the pathogenicity of multiple variants in a variant list (VCF file) in a single step. To identify variants predicted to be functionally important, ePat can filter a variant list for variants that affect biological function, considering not only point mutations and indels that do not cause a frameshift, but also frameshift, stop-gain, and splice variants.


2021 ◽  
Author(s):  
Pierre Murat ◽  
Guillaume Guilbaud ◽  
Julian E Sale

DNA replication starts with the activation of the replicative helicases, polymerases and associated factors at thousands of origins per S-phase. Due to local torsional constraints generated during licensing and the switch between polymerases of distinct fidelity and proofreading ability following firing, origin activation has the potential to induce DNA damage and mutagenesis. However, whether sites of replication initiation exhibit a specific mutational footprint has not yet been established. Here we demonstrate that mutagenesis is increased at early and highly efficient origins. The elevated mutation rate observed at these sites is caused by two distinct mutational processes consistent with formation of DNA breaks at the origin itself and local error-prone DNA synthesis in the immediate vicinity of the origin. We demonstrate that these replication-dependent mutational processes create the skew in base composition observed at human replication origins. Further, we show that mutagenesis associated with replication initiation exerts an influence on phenotypic diversity in human populations disproportionate to the origins' genomic footprint: by increasing mutational loads at gene promoters and splice junctions, the presence of an origin influences both gene expression and mRNA isoform usage. These findings have important implications for our understanding of the mutational processes that sculpt the human genome.


2021 ◽  
Vol 6 (1) ◽  
Author(s):  
R. Koster ◽  
R. D. Brandão ◽  
D. Tserpelis ◽  
C. E. P. van Roozendaal ◽  
C. N. van Oosterhoud ◽  
...  

Abstract: Neurofibromatosis type 1 (NF1) is caused by loss-of-function variants in the NF1 gene. Approximately 10% of these variants affect RNA splicing and are either missed by conventional DNA diagnostics or are misinterpreted by in silico splicing predictions. Therefore, a targeted RNAseq-based approach was designed to detect pathogenic RNA splicing and associated pathogenic DNA variants. For this method RNA was extracted from lymphocytes, followed by targeted RNAseq. Next, an in-house developed tool (QURNAs) was used to calculate the enrichment score (ERS) for each splicing event. This method was thoroughly tested using two different patient cohorts with known pathogenic splice variants in NF1. In both cohorts all 56 normal reference transcript exon splice junctions, 24 previously described and 45 novel non-reference splicing events were detected. Additionally, all expected pathogenic splice variants were detected. Eleven patients with NF1 symptoms were subsequently tested, three of whom have a known NF1 DNA variant with a putative effect on RNA splicing. This effect could be confirmed for all three. The other eight patients were previously without any molecular confirmation of their NF1 diagnosis. A deep-intronic pathogenic splice variant could now be identified for two of them (25%). These results suggest that targeted RNAseq can be successfully used to detect pathogenic RNA splicing variants in NF1.


2021 ◽  
Author(s):  
Yupei You ◽  
Michael B. Clark ◽  
Heejung Shim

Motivation: Long-read sequencing methods have considerable advantages for characterising RNA isoforms. Oxford Nanopore sequencing records changes in electrical current as a nucleic acid traverses a pore. However, basecalling of this raw signal (known as a squiggle) is error prone, making it challenging to accurately identify splice junctions. Existing strategies correct nanopore reads using matched short-read data and/or annotated splice junctions, but these add expense or limit junctions to known (incomplete) annotations. Therefore, a method that could accurately identify splice junctions solely from nanopore data would have numerous advantages. Results: We developed "NanoSplicer" to identify splice junctions using raw nanopore signal (squiggles). For each splice junction the observed squiggle is compared to candidate squiggles representing potential junctions to identify the correct candidate. Measuring squiggle similarity enables us to compute the probability of each candidate junction and find the most likely one. We tested our method using (1) synthetic mRNAs with known splice junctions and (2) biological mRNAs from a lung-cancer cell line. The results from both datasets demonstrate that NanoSplicer improves splice junction identification, especially when the basecalling error rate near the splice junction is elevated. Our method is implemented in the software package NanoSplicer, available at https://github.com/shimlab/NanoSplicer.
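
The step of turning per-candidate similarity into a probability can be illustrated with a minimal sketch. The abstract does not say how NanoSplicer scores squiggle similarity or which prior it uses, so the code below assumes each candidate junction already has a log-likelihood-style score and that all candidates are equally likely a priori. The function names and the dictionary layout are hypothetical and are not taken from the NanoSplicer package.

```python
import math


def candidate_probabilities(log_scores):
    """Normalize per-candidate log scores into probabilities.

    log_scores maps a candidate junction identifier to an assumed
    log-likelihood of the observed squiggle given that candidate.
    A uniform prior over candidates is assumed.
    """
    # Subtract the maximum before exponentiating to avoid underflow.
    best = max(log_scores.values())
    weights = {cand: math.exp(score - best) for cand, score in log_scores.items()}
    total = sum(weights.values())
    return {cand: w / total for cand, w in weights.items()}


def most_likely_junction(log_scores):
    """Return the candidate with the highest probability."""
    probs = candidate_probabilities(log_scores)
    return max(probs, key=probs.get)
```

Called with scores such as `{"j1": -10.0, "j2": -2.0, "j3": -5.0}`, `most_likely_junction` picks `"j2"`, and the returned probabilities sum to one, which makes it easy to apply a confidence threshold before accepting a junction.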


2021 ◽  
Vol 22 ◽  
Author(s):  
Kevin Regan ◽  
Abolfazl Saghafi ◽  
Zhijun Li

Background: Splice junctions are the key to going from pre-messenger RNA to mature messenger RNA in many multi-exon genes due to alternative splicing. Since the percentage of multi-exon genes that undergo alternative splicing is very high, identifying splice junctions is an attractive research topic with important implications. Objective: The aim is to develop a deep learning model capable of identifying splice junctions in RNA sequences using 13,666 unique sequences of primate RNA. Method: A Long Short-Term Memory (LSTM) Neural Network model is developed that classifies a given sequence as EI (Exon-Intron splice), IE (Intron-Exon splice), or N (No splice). The model is trained with groups of trinucleotides and its performance is tested using validation and test data to prevent bias. Results: Model performance was measured using accuracy and f-score in test data. The finalized model achieved an average accuracy of 91.34% with an average f-score of 91.36% over 50 runs. Conclusion: Comparisons show a highly competitive model to recent Convolutional Neural Network structures. The proposed LSTM model achieves the highest accuracy and f-score among published alternative LSTM structures.
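
As an illustration of the input encoding, the sketch below converts a sequence into integer trinucleotide tokens of the kind that could be fed to an LSTM classifier with the three labels EI, IE, and N. The abstract does not specify whether the trinucleotide groups overlap, so the `stride` parameter, the vocabulary ordering, and the use of -1 for ambiguous bases are all assumptions made for this example, not the authors' preprocessing.

```python
from itertools import product

# All 64 trinucleotides over A, C, G, T, in lexicographic order (assumed vocabulary).
ALPHABET = "ACGT"
TRINUCLEOTIDES = ["".join(p) for p in product(ALPHABET, repeat=3)]
TOKEN_ID = {tri: i for i, tri in enumerate(TRINUCLEOTIDES)}

# Class labels named in the abstract.
LABELS = ("EI", "IE", "N")


def trinucleotide_tokens(seq, stride=1):
    """Encode a sequence as a list of trinucleotide token ids.

    stride=1 gives overlapping trinucleotides, stride=3 gives
    non-overlapping ones. Trinucleotides containing characters
    outside A, C, G, T (for example N) are encoded as -1.
    """
    seq = seq.upper().replace("U", "T")  # accept RNA or DNA input
    tokens = []
    for i in range(0, len(seq) - 2, stride):
        tokens.append(TOKEN_ID.get(seq[i:i + 3], -1))
    return tokens
```

With `stride=1`, a sequence of length n yields n - 2 tokens; the resulting integer list would typically pass through an embedding layer before the recurrent layer.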


2021 ◽  
Author(s):  
Hsin-Yen Larry Wu ◽  
Polly Yingshan Hsu

Abstract Background: Ribo-seq has revolutionized the study of genome-wide mRNA translation. High-quality Ribo-seq data display strong 3-nucleotide (nt) periodicity, which corresponds to translating ribosomes deciphering three nts at a time. While 3-nt periodicity has been widely used to study novel translation events such as upstream ORFs in 5' untranslated regions and small ORFs in presumed non-coding RNAs, tools that allow the visualization of these events remain underdeveloped. Results: RiboPlotR is a visualization package written in R that presents both RNA-seq coverage and Ribo-seq reads in genomic coordinates for all annotated transcript isoforms of a gene. Specifically, for individual isoform models, RiboPlotR plots Ribo-seq data related to splice junctions and presents the reads for all three reading frames in three different colors. Moreover, RiboPlotR shows Ribo-seq reads in upstream ORFs, 5' and 3' untranslated regions and introns, which is critical for observing new translation events and identifying potential regulatory mechanisms. Conclusions: RiboPlotR is freely available (https://github.com/hsinyenwu/RiboPlotR and https://sourceforge.net/projects/riboplotr/) and allows the visualization of translated features identified in Ribo-seq data.
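
The idea of 3-nt periodicity can be made concrete with a small sketch that counts reads per reading frame. It is written in Python for illustration only and does not use or reproduce RiboPlotR, which is an R package. The inputs are assumptions: each read is represented by a single P-site position in transcript coordinates, and `cds_start` is the position of the first nucleotide of the start codon.

```python
from collections import Counter


def frame_counts(psite_positions, cds_start):
    """Count reads in each of the three reading frames.

    Frame 0 is the frame of the annotated start codon; a read's frame
    is its P-site offset from cds_start modulo 3.
    """
    counts = Counter((pos - cds_start) % 3 for pos in psite_positions)
    return [counts.get(frame, 0) for frame in range(3)]


def in_frame_fraction(psite_positions, cds_start):
    """Fraction of reads in frame 0; values well above 1/3 suggest periodicity."""
    counts = frame_counts(psite_positions, cds_start)
    total = sum(counts)
    return counts[0] / total if total else 0.0
```

A data set with strong periodicity concentrates most reads in one frame, so the in-frame fraction is a quick sanity check before plotting the three frames in separate colors.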


2021 ◽  
Author(s):  
Max Coulter ◽  
Juan Carlos Entizne ◽  
Wenbin Guo ◽  
Micha Bayer ◽  
Ronja Wonneberger ◽  
...  

Accurate characterization of splice junctions as well as transcription start and end sites in reference transcriptomes allows precise quantification of transcripts from RNA-seq data and enables detailed investigations of transcriptional and post-transcriptional regulation. Using novel computational methods and a combination of PacBio Iso-seq and Illumina short-read sequences from 20 diverse tissues and conditions, we generated a comprehensive and highly resolved barley reference transcript dataset (RTD) from the European 2-row spring barley cultivar Barke (BaRTv2.18). Stringent and thorough filtering was carried out to maintain the quality and accuracy of the splice junctions and transcript start and end sites. BaRTv2.18 shows increased transcript diversity and completeness compared to an earlier version, BaRTv1.0. The accuracy of transcript-level quantification, splice junctions and transcript start and end sites has been validated extensively using parallel technologies and analysis, including high-resolution RT-PCR and 5' RACE. BaRTv2.18 contains 39,434 genes and 148,260 transcripts, representing the most comprehensive and resolved reference transcriptome in barley to date. It provides an important and high-quality resource for advanced transcriptomic analyses, including both transcriptional and post-transcriptional regulation, with exceptional resolution and precision.


2021 ◽  
Author(s):  
Runxuan Zhang ◽  
Richard Kuo ◽  
Max Coulter ◽  
Cristiane P.G. Calixto ◽  
Juan Carlos Entizne ◽  
...  

Background: Accurate and comprehensive annotation of transcript sequences is essential for transcript quantification and differential gene and transcript expression analysis. Single molecule long read sequencing technologies provide improved integrity of transcript structures including alternative splicing, and transcription start and polyadenylation sites. However, accuracy is significantly affected by sequencing errors, mRNA degradation or incomplete cDNA synthesis. Results: We present a new and comprehensive Arabidopsis thaliana Reference Transcript Dataset 3 (AtRTD3). AtRTD3 contains over 160k transcripts, twice that of the best current Arabidopsis transcriptome, and includes over 1,500 novel genes. 79% of transcripts are from Iso-seq with accurately defined splice junctions and transcription start and end sites. We developed novel methods to determine splice junctions and transcription start and end sites accurately. Mismatch profiles around splice junctions provided a powerful feature to distinguish correct splice junctions and remove false splice junctions. Stratified approaches identified high confidence transcription start/end sites and removed fragmentary transcripts due to degradation. AtRTD3 is a major improvement over existing transcriptomes as demonstrated by analysis of an Arabidopsis cold response RNA-seq time-series. AtRTD3 provided higher resolution of transcript expression profiling and identified cold- and light-induced differential transcription start and polyadenylation site usage. Conclusions: AtRTD3 is the most comprehensive Arabidopsis transcriptome currently available. It improves the precision of differential gene and transcript expression, differential alternative splicing, and transcription start/end site usage from RNA-seq data. The novel methods for identifying accurate splice junctions and transcription start/end sites are widely applicable and will improve single molecule sequencing analysis from any species.


2021 ◽  
Vol 22 (1) ◽  
Author(s):  
Roozbeh Dehghannasiri ◽  
Julia Eve Olivieri ◽  
Ana Damljanovic ◽  
Julia Salzman

Abstract: Precise splice junction calls are currently unavailable in scRNA-seq pipelines such as the 10x Chromium platform but are critical for understanding single-cell biology. Here, we introduce SICILIAN, a new method that assigns statistical confidence to splice junctions from a spliced aligner to improve precision. SICILIAN is a general method that can be applied to bulk or single-cell data, but has particular utility for single-cell analysis due to that data's unique challenges and opportunities for discovery. SICILIAN's precise splice detection achieves high accuracy on simulated data, improves concordance between matched single-cell and bulk datasets, and increases agreement between biological replicates. SICILIAN detects unannotated splicing in single cells, enabling the discovery of novel splicing regulation through single-cell analysis workflows.


Author(s):  
Aparajita Dutta ◽  
Kusum Kumari Singh ◽  
Ashish Anand

Most current computational models for splice junction prediction are based on the identification of canonical splice junctions. However, junctions lacking the consensus dimers GT and AG are also observed to undergo splicing. Identifying such junctions, called non-canonical splice junctions, is also essential for a comprehensive understanding of the splicing phenomenon. This work focuses on the identification of non-canonical splice junctions through the application of a bidirectional long short-term memory (BLSTM) network. Furthermore, we apply a back-propagation-based visualization technique (integrated gradient) and a perturbation-based one (occlusion) to extract the non-canonical splicing features learned by the model. The features obtained are validated against existing knowledge from the literature. Integrated gradient extracts features that comprise contiguous nucleotides, whereas occlusion extracts features that are individual nucleotides distributed across the sequence.
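
The distinction between canonical and non-canonical junctions can be shown with a minimal check of the consensus dimers. The sketch assumes the input is the full intron sequence on the coding strand, from the first nucleotide after the donor exon to the last nucleotide before the acceptor exon. The function name is hypothetical, and the check is only the GT...AG rule stated in the abstract, not the BLSTM model.

```python
def is_canonical_intron(intron_seq):
    """Return True if the intron starts with GT and ends with AG.

    Accepts DNA or RNA letters in either case; GU is treated as GT.
    Sequences shorter than four nucleotides cannot carry both dimers.
    """
    seq = intron_seq.upper().replace("U", "T")
    return len(seq) >= 4 and seq.startswith("GT") and seq.endswith("AG")
```

Introns for which this returns False, such as those with AT...AC termini, are the non-canonical cases that a dimer-based filter would discard and that a learned model is meant to recover.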

