Predicting the effect of variants on splicing using Convolutional Neural Networks

PeerJ ◽  
2020 ◽  
Vol 8 ◽  
pp. e9470
Author(s):  
Thanyathorn Thanapattheerakul ◽  
Worrawat Engchuan ◽  
Jonathan H. Chan

Mutations that cause an error in the splicing of a messenger RNA (mRNA) can lead to diseases in humans. Various computational models have been developed to recognize the sequence pattern of the splice sites. In recent studies, Convolutional Neural Network (CNN) architectures were shown to outperform other existing models in predicting the splice sites. However, little effort has been put into extending the CNN model to predict the effect of genomic variants on the splicing of mRNAs. This study proposes a framework that uses CNNs to assess the effect of splice variants and so identify potential disease-causing variants that disrupt the RNA splicing process. Five models, three CNN-based and two based on other machine learning methods, were trained and compared using two existing splice site datasets, Genome Wide Human splice sites (GWH) and a dataset provided at the Deep Learning and Artificial Intelligence winter school 2018 (DLAI). The donor sites were also tested with the HSplice tool to evaluate the predictive models. To improve the effectiveness of the predictive models, the two datasets were combined. The CNN model with four convolutional layers showed the best splice site prediction performance, with an AUPRC of 93.4% and 88.8% for donor and acceptor sites, respectively. The effects of variants on splicing were estimated by applying the best model to variant data from the ClinVar database. Based on the estimation, the framework could effectively differentiate pathogenic variants from benign variants (p = 5.9 × 10⁻⁷). These promising results support that the proposed framework could be applied in future genetic studies to identify disease-causing loci involving the splicing mechanism. The datasets and Python scripts used in this study are available on the GitHub repository at https://github.com/smiile8888/rna-splice-sites-recognition.
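As a rough illustration of how a variant's effect on a splice site might be estimated, the sketch below scores a reference sequence and its mutated counterpart and reports the difference. The abstract does not state the exact scoring procedure, so this is only one plausible reading. The scorer `toy_donor_score` is a hypothetical stand-in for a trained CNN, and the sequence window and all function names are invented for the example.

```python
# Illustrative sketch only. toy_donor_score is a hypothetical placeholder for a
# trained CNN; it is NOT the model described in the abstract.

BASES = "ACGT"

def one_hot(seq):
    """Encode a DNA string as a list of 4-element rows in A, C, G, T order."""
    return [[1.0 if base == b else 0.0 for b in BASES] for base in seq.upper()]

def toy_donor_score(seq, gt_index):
    """Hypothetical scorer: fraction of the canonical GT dinucleotide at gt_index."""
    encoded = one_hot(seq)
    g = encoded[gt_index][BASES.index("G")]
    t = encoded[gt_index + 1][BASES.index("T")]
    return (g + t) / 2.0

def apply_snv(seq, pos, alt):
    """Return a copy of seq with the base at 0-based position pos replaced by alt."""
    return seq[:pos] + alt + seq[pos + 1:]

def variant_effect(seq, pos, alt, gt_index, scorer=toy_donor_score):
    """Score change (alternate minus reference); negative suggests a weakened site."""
    return scorer(apply_snv(seq, pos, alt), gt_index) - scorer(seq, gt_index)

# Made-up donor window with the GT dinucleotide at indices 3 and 4.
reference = "CAGGTAAGT"
delta = variant_effect(reference, pos=3, alt="C", gt_index=3)
print(delta)
```

In practice the `scorer` argument would be replaced by a call to the trained model's prediction function, and the per-variant score changes could then be compared between pathogenic and benign groups.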

1984 ◽  
Vol 4 (5) ◽  
pp. 966-972
Author(s):  
C Montell ◽  
E F Fisher ◽  
M H Caruthers ◽  
A J Berk

The primary transcript from adenovirus 2 early region 1B (E1B) is processed by differential RNA splicing into two overlapping mRNAs, 13S and 22S. The 22S mRNA is the major E1B mRNA during the early phase of infection, whereas the 13S mRNA predominates during the late phase. In previous work, it has been shown that this shift in proportions of the E1B mRNAs is influenced by increased cytoplasmic stability of the 13S mRNA at late times in infection. Two observations presented here demonstrate that the increase in proportion of the 13S mRNA at late times is also regulated by a change in the specificity of RNA splicing. First, the relative concentrations of the 13S to 22S nuclear RNAs were not constant throughout infection but increased at late times. Secondly, studies with the mutant adenovirus 2 pm2250 provided evidence that there was an increased propensity to utilize a 5' splice in the region of the 13S 5' splice site at late times in infection. Adenovirus 2 pm2250 has a G→C transversion in the first base of the E1B 13S mRNA intron, preventing splicing of the 13S mRNA but not of the 22S mRNA. During the early phase of a pm2250 infection, the E1B primary transcripts were processed into the 22S mRNA only. However, during the late phase, when the 13S mRNA normally predominates, E1B primary transcripts were also processed by RNA splicing at two formerly unused or cryptic 5' splice sites. Both cryptic splice sites were located much closer to the disrupted 13S 5' splice site than to the 22S 5' splice site. Thus, the temporal increase in proportion of the 13S mRNA to the 22S mRNA is regulated by two processes, an increase in cytoplasmic stability of the 13S mRNA and an increased propensity to utilize the 13S 5' splice site during the late phase of infection. Adenovirus 2 pm2250 was not defective for productive infection of HeLa cells or for transformation of rat cells.


2020 ◽  
Vol 89 (1) ◽  
pp. 359-388 ◽  
Author(s):  
Max E. Wilkinson ◽  
Clément Charenton ◽  
Kiyoshi Nagai

The spliceosome removes introns from messenger RNA precursors (pre-mRNA). Decades of biochemistry and genetics combined with recent structural studies of the spliceosome have produced a detailed view of the mechanism of splicing. In this review, we aim to make this mechanism understandable and provide several videos of the spliceosome in action to illustrate the intricate choreography of splicing. The U1 and U2 small nuclear ribonucleoproteins (snRNPs) mark an intron and recruit the U4/U6.U5 tri-snRNP. Transfer of the 5′ splice site (5′SS) from U1 to U6 snRNA triggers unwinding of U6 snRNA from U4 snRNA. U6 folds with U2 snRNA into an RNA-based active site that positions the 5′SS at two catalytic metal ions. The branch point (BP) adenosine attacks the 5′SS, producing a free 5′ exon. Removal of the BP adenosine from the active site allows the 3′SS to bind, so that the 5′ exon attacks the 3′SS to produce mature mRNA and an excised lariat intron.


1993 ◽  
Vol 13 (5) ◽  
pp. 2677-2687 ◽  
Author(s):  
D A Sterner ◽  
S M Berget

Very small vertebrate exons are problematic for RNA splicing because of the proximity of their 3' and 5' splice sites. In this study, we investigated the recognition of a constitutive 7-nucleotide mini-exon from the troponin I gene that resides quite close to the adjacent upstream exon. The mini-exon failed to be included in spliced RNA when placed in a heterologous gene unless accompanied by the upstream exon. The requirement for the upstream exon disappeared when the mini-exon was internally expanded, suggesting that the splice sites bordering the mini-exon are compatible with those of other constitutive vertebrate exons and that the small size of the exon impaired inclusion. Mutation of the 5' splice site of the natural upstream exon did not result in either exon skipping or activation of a cryptic 5' splice site, the normal vertebrate phenotypes for such mutants. Instead, a spliced RNA accumulated that still contained the upstream intron. In vitro, the mini-exon failed to assemble into spliceosome complexes unless either internally expanded or accompanied by the upstream exon. Thus, impaired usage of the mini-exon in vivo was accompanied by impaired recognition in vitro, and recognition of the mini-exon was facilitated by the presence of the upstream exon in vivo and in vitro. Cumulatively, the atypical in vivo and in vitro properties of the troponin exons suggest a mechanism for the recognition of this mini-exon in which initial recognition of an exon-intron-exon unit is followed by subsequent recognition of the intron.


Blood ◽  
1992 ◽  
Vol 80 (6) ◽  
pp. 1553-1558 ◽  
Author(s):  
M de Boer ◽  
BG Bolscher ◽  
MC Dinauer ◽  
SH Orkin ◽  
CI Smith ◽  
...  

Chronic granulomatous disease (CGD) is characterized by the absence of a respiratory burst in activated phagocytes. Defects in at least four different genes lead to CGD. Patients with the X-linked form of CGD have mutations in the gene for the beta-subunit of cytochrome b558 (gp91-phox). We studied the molecular defect in four patients with X-linked CGD. In a fifth family, we studied the mother of a patient with X-linked CGD who had died before our investigations. Gp91-phox messenger RNA (mRNA) was reverse transcribed into cDNA and the coding region was amplified by polymerase chain reaction into three fragments. Sequence analysis showed the absence of the exon 7, 5, 3, and 2 sequences in patients 1, 2, 3, and 4, respectively. In carrier 5, we found both normal cDNA and cDNA that lacked 57 3′-nucleotides of exon 6. We analyzed the splice sites of the flanking introns of the missing exons. In patients 1, 2, and 3, we found single nucleotide substitutions within the first five positions of the downstream 5′ donor splice sites. In patient 4, a similar substitution was found at position -1 of the 3′ acceptor splice site of intron 1. In carrier 5, no mutation was found in the exon 6-intron 6 boundary sequence. Instead, a single substitution was observed in exon 6 (C→A at nucleotide 633) that created a new donor splice site. Apparently, mRNA splicing occurs preferentially at this newly created splice site. We conclude that the absence of the exon sequences in the gp91-phox mRNA of these patients is due to splicing errors. Of 30 European X-linked CGD patients studied by us so far, five appear to have the disease because of mutations that affect correct mRNA splicing. Thus, such mutations appear to be a common cause of X-linked CGD.


Endocrinology ◽  
2003 ◽  
Vol 144 (3) ◽  
pp. 1074-1085 ◽  
Author(s):  
Samia Selmi-Ruby ◽  
Chantal Watrin ◽  
Severine Trouttet-Masson ◽  
Françoise Bernier-Valentin ◽  
Virginie Flachon ◽  
...  

The sodium/iodide symporter (NIS) is a membrane protein mediating the active transport of iodide into the thyroid gland. NIS, expressed by human, rat, and mouse thyrocytes, is encoded by a single transcript. We identified NIS mRNA species of 3.5 and 3 kb in porcine thyrocytes. Because porcine thyrocytes in primary culture are a widely used experimental system for thyroid iodide metabolism, we further examined the origin and the function of the porcine NIS (pNIS) transcripts. We generated a porcine thyroid cDNA library from which four different clones, pNIS-D, F, J, and ΔJ, were isolated. pNIS-D encodes a protein of 643 amino acids highly homologous to the human, rat, and mouse NIS. pNIS-F and J differ from each other and from pNIS-D in their C-terminal part. pNIS-ΔJ lacks a six-amino-acid segment within the putative transmembrane domain 10. Transiently expressed in Cos-7 cells, the four pNIS cDNAs led to the synthesis of proteins targeted at the plasma membrane and conferred perchlorate-sensitive iodide uptake activities to Cos-7 cells, except pNIS-ΔJ, which was devoid of activity. pNIS-D probably derives from the 3.5-kb transcript and pNIS-F, J, and ΔJ from the 3-kb transcript. The relative abundance of pNIS-D, F, and J transcripts in porcine thyrocytes was about 60%, 35%, and 5%, respectively; the ΔJ transcript was not present in detectable amounts. By comparing porcine NIS genomic and cDNA sequences, splice donor and acceptor sites accounting for the generation of pNIS-F, J, and ΔJ transcripts were identified. None of the combinations of alternative splice sites found in the pig was present in the human, rat, or mouse NIS gene. Our data show that the porcine NIS gene, contrary to the NIS gene from other species, gives rise to splice variants leading to three active NIS proteins and one inactive NIS protein.


2009 ◽  
Vol 30 (1) ◽  
pp. 107-114 ◽  
Author(s):  
Maaike P.G. Vreeswijk ◽  
Jaennelle N. Kraan ◽  
Heleen M. van der Klift ◽  
Geraldine R. Vink ◽  
Cees J. Cornelisse ◽  
...  

2007 ◽  
Vol 4 (2) ◽  
pp. 24-46 ◽  
Author(s):  
T. Shashi Rekha ◽  
Chanchal K Mitra

Summary: We have carried out a comparative analysis of the sub-sequences of size six and ten at the donor and acceptor splice site regions, respectively, of five different organisms. The frequency analysis of the unique sub-sequences at the donor and acceptor regions suggests that the distribution of their occurrence is approximately exponential. We have observed that the number of unique sub-sequences (occurring with different frequencies) at the donor region is smaller than at the acceptor, suggesting that the sub-sequences at the acceptor region are more variable. The sub-sequences with a high percentage of occurrence (uniqueness) are considered to be highly involved in splicing. Our analysis suggests that sub-sequences of length ~6-8 nucleotides (nt) at the splice sites, with six bases in the intron (including the two central, conserved dinucleotides) and two bases in the exon, are optimal for the efficient assembly and binding of the spliceosomal complex during the process of splicing. The score pattern obtained by aligning the nucleotides at the donor region with those at the acceptor, and vice versa, also suggests that a single sub-sequence at the donor region has different degrees of similarity with sub-sequences at the acceptor, indicating that the donor sub-sequences are more crucial in pairing with the corresponding acceptor sub-sequences during the process of splicing.
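The kind of frequency analysis described above can be sketched as a simple count of distinct sub-sequences spanning the exon-intron junction. This is a minimal example under stated assumptions, not the authors' procedure: the function name `window_counts`, its parameters, and the four toy sequences are all hypothetical, and the default window of two exonic plus six intronic bases is taken from the lengths mentioned in the summary.

```python
from collections import Counter

def window_counts(sequences, junction, exon_bases=2, intron_bases=6):
    """Count distinct sub-sequences spanning a donor junction.

    junction is the 0-based index of the first intronic base in each sequence.
    Sequences too short to contain the full window are skipped.
    """
    counts = Counter()
    for seq in sequences:
        start = junction - exon_bases
        end = junction + intron_bases
        if start < 0 or end > len(seq):
            continue
        counts[seq[start:end].upper()] += 1
    return counts

# Toy donor regions (invented for illustration); the intron starts at index 2.
donor_windows = ["AGGTAAGTCC", "AGGTAAGTTA", "TGGTGAGTAC", "AGGTAAGTGG"]
counts = window_counts(donor_windows, junction=2)

# List sub-sequences from most to least frequent.
for subseq, n in counts.most_common():
    print(subseq, n)
```

Running the same count on acceptor regions and comparing the number of distinct keys gives a crude measure of which region is more variable.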

