The evolution of splicing: transcriptome complexity and transcript distances implemented in TranD

2021 ◽  
Author(s):  
Adalena V Nanni ◽  
James Titus-McQuillan ◽  
Oleksandr Moskalenko ◽  
Francisco Pardo-Palacios ◽  
Zihao Liu ◽  
...  

Alternative splicing contributes to organismal complexity. Comparing transcripts between and within species is an important first step toward understanding how transcript structure evolves between species and contributes to sub-functionalization. These questions are confounded with issues of data quality and availability. The recent explosion of affordable long-read sequencing of mRNA has considerably widened the ability to study transcriptional variation in non-model species. In this work, we develop a computational framework that uses nucleotide-resolution distance metrics to compare transcript models for structural phenotypes: total transcript length, intron retention, donor/acceptor site variation, alternative exon cassettes, and alternative 5'/3' UTRs are each scored qualitatively and quantitatively in terms of number of nucleotides. For a single annotation file, all differences among transcripts within a gene are summarized, and transcriptome-level complexity metrics (number of variable nucleotides, unique exons per gene, exons per transcript, and transcripts per gene) are calculated. To compare two transcriptomes on the same coordinates, a weighted total distance between pairs of transcripts for the same gene is calculated. The proposed weight function has larger penalties for intron retention and exon skipping than for alternative donor/acceptor sites. Minimum distances can be used to identify both transcript pairs and transcripts missing structural elements in either of the two annotations. This enables a broad range of functionality, from comparing sister species to comparing different methods of building and summarizing transcriptomes. Importantly, the philosophy here is to output metrics, enabling others to explore the nucleotide-level distance metrics.
Single transcriptome annotation summaries and pairwise comparisons are implemented in a new tool, TranD, distributed as a PyPi package and in the open-source web-based Galaxy (www.galaxyproject.org) platform.
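
The nucleotide-level comparison described in this abstract can be illustrated with a minimal Python sketch. It is not TranD's code or API. The function names, the half-open exon intervals, the definition of "variable nucleotides" (positions covered by some but not all transcripts of a gene), and the numeric weights are all assumptions made for illustration. The only property taken from the abstract is that intron retention and exon skipping are penalized more heavily than alternative donor/acceptor sites.

```python
from typing import Dict, List, Set, Tuple

Exon = Tuple[int, int]  # half-open interval [start, end) on shared coordinates

# Hypothetical weights: only their ordering (IR and exon skipping heavier
# than alternative donor/acceptor sites) comes from the abstract.
ILLUSTRATIVE_WEIGHTS: Dict[str, float] = {
    "intron_retention": 2.0,
    "exon_skipping": 2.0,
    "alt_donor_acceptor": 1.0,
}


def exon_nucleotides(exons: List[Exon]) -> Set[int]:
    """Return the set of genomic positions covered by a transcript's exons."""
    positions: Set[int] = set()
    for start, end in exons:
        positions.update(range(start, end))
    return positions


def nucleotide_differences(t1: List[Exon], t2: List[Exon]) -> Dict[str, int]:
    """Count exonic nucleotides shared by, and unique to, two transcripts."""
    n1, n2 = exon_nucleotides(t1), exon_nucleotides(t2)
    return {
        "shared": len(n1 & n2),
        "only_t1": len(n1 - n2),
        "only_t2": len(n2 - n1),
    }


def weighted_distance(
    nt_by_event: Dict[str, int],
    weights: Dict[str, float] = ILLUSTRATIVE_WEIGHTS,
) -> float:
    """Weighted sum of differing nucleotides, given counts per event type.

    Classifying differences into event types is assumed to happen upstream.
    """
    return sum(weights[event] * n for event, n in nt_by_event.items())


def gene_complexity(transcripts: Dict[str, List[Exon]]) -> Dict[str, float]:
    """Per-gene complexity metrics for a non-empty set of transcripts."""
    covered = [exon_nucleotides(exons) for exons in transcripts.values()]
    unique_exons = {exon for exons in transcripts.values() for exon in exons}
    n_exons = sum(len(exons) for exons in transcripts.values())
    return {
        "transcripts_per_gene": len(transcripts),
        "unique_exons_per_gene": len(unique_exons),
        "mean_exons_per_transcript": n_exons / len(transcripts),
        # Assumed definition: covered by some, but not all, transcripts.
        "variable_nucleotides": len(set.union(*covered) - set.intersection(*covered)),
    }


if __name__ == "__main__":
    gene = {
        "tx1": [(100, 200), (300, 400)],
        "tx2": [(100, 200), (300, 450)],  # longer second exon
        "tx3": [(100, 400)],              # retains the intron of tx1
    }
    print(nucleotide_differences(gene["tx1"], gene["tx3"]))
    print(weighted_distance({"intron_retention": 100, "alt_donor_acceptor": 50}))
    print(gene_complexity(gene))
```

Storing positions in sets keeps the sketch short, but it scales with transcript length. An interval-based implementation would be preferable for whole-transcriptome comparisons.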

Author(s):  
Fairlie Reese ◽  
Ali Mortazavi

Abstract Motivation Long-read RNA-sequencing technologies such as PacBio and Oxford Nanopore have discovered an explosion of new transcript isoforms that are difficult to visually analyze using currently available tools. We introduce the Swan Python library, which is designed to analyze and visualize transcript models. Results Swan finds 4909 differentially expressed transcripts between cell lines HepG2 and HFFc6, including 279 that are differentially expressed even though the parent gene is not. Additionally, Swan discovers 285 reproducible exon skipping and 47 intron retention events not recorded in the GENCODE v29 annotation. Availability and implementation The Swan library for Python 3 is available on PyPi at https://pypi.org/project/swan-vis/ and on GitHub at https://github.com/mortazavilab/swan_vis.


2019 ◽  
Author(s):  
Alexandra Dainis ◽  
Elizabeth Tseng ◽  
Tyson A. Clark ◽  
Ting Hon ◽  
Matthew Wheeler ◽  
...  

Abstract Background Clinical sequencing has traditionally focused on genomic DNA through the use of targeted panels and exome sequencing, rather than investigating the potential transcriptomic consequences of disease-associated variants. RNA sequencing has recently been shown to be an effective additional tool for identifying disease-causing variants. We here use targeted long-read genome and transcriptome sequencing to efficiently and economically identify molecular consequences of a rare, disease-associated variant in hypertrophic cardiomyopathy (HCM). Methods and Results Our study, which employed both Pacific Biosciences SMRT sequencing and Oxford Nanopore Technologies MinION sequencing, as well as two RNA targeting strategies, identified alternatively-spliced isoforms that resulted from a splice-site variant-containing allele in HCM. These included a predicted in-frame exon-skipping event, as well as an abundance of additional isoforms with unexpected intron-inclusion, exon-extension, and pseudo-exon events. The use of long-read RNA sequencing allowed us to not only investigate full-length alternatively-spliced transcripts but also to phase them back to the variant-containing allele. Conclusions We suggest that targeted, long-read RNA sequencing in conjunction with genome sequencing may provide additional molecular evidence of disease for rare or de novo variants in cardiovascular disease, as well as providing new information about the consequence of these variants on downstream RNA and protein expression.


2020 ◽  
Author(s):  
Fairlie Reese ◽  
Ali Mortazavi

Abstract Motivation Long-read RNA-sequencing technologies such as PacBio and Oxford Nanopore have discovered an explosion of new transcript isoforms that are difficult to visually analyze using currently available tools. We introduce the Swan Python library, which is designed to analyze and visualize transcript models. Results Swan finds 4,909 differentially expressed transcripts between cell lines HepG2 and HFFc6, including 279 that are differentially expressed even though the parent gene is not. Additionally, Swan discovers 1,021 reproducible exon skipping and 73 intron retention events not recorded in the GENCODE v29 annotation. Availability The Swan library for Python 3 is available on PyPi and on GitHub at https://pypi.org/project/swan-vis/1.0/ and https://github.com/mortazavilab/swan_paper.


Genes ◽  
2020 ◽  
Vol 11 (12) ◽  
pp. 1405
Author(s):  
Wen Feng ◽  
Pengju Zhao ◽  
Xianrui Zheng ◽  
Zhengzheng Hu ◽  
Jianfeng Liu

Alternative splicing (AS) is a process during gene expression that results in a single gene coding for different protein variants. AS contributes to transcriptome and proteome diversity. In order to characterize AS in pigs, genome-wide transcripts and AS events were detected using RNA sequencing of 34 different tissues in Duroc pigs. In total, 138,403 AS events and 29,270 expressed genes were identified. An alternative donor site was the most common AS form and accounted for 44% of the total AS events. The percentage of the other three AS forms (exon skipping, alternative acceptor site, and intron retention) was approximately 19%. The results showed that the most common AS events involving alternative donor sites could produce different transcripts or proteins that affect the biological processes. The expression of genes with tissue-specific AS events showed that gene functions were consistent with tissue functions. AS increased proteome diversity and resulted in novel proteins that gained or lost important functional domains. In summary, these findings extend porcine genome annotation and highlight roles that AS could play in determining tissue identity.


2019 ◽  
Author(s):  
Wen Feng ◽  
Pengju Zhao ◽  
Xianrui Zheng ◽  
Jian-Feng Liu

Abstract Background Alternative splicing (AS) is a process in which introns are spliced out of the mRNA precursor to form the mature mRNA. AS plays important roles in contributing to transcriptome and proteome diversity. However, to date there has been no genome-wide study of AS in pigs by RNA sequencing. Results To characterize AS in pigs, we detected genome-wide transcripts and AS events by RNA sequencing (RNA-seq) of 34 different tissues in Duroc pigs. In total, we identified 138,403 AS events and 29,270 expressed genes. We found that the alternative donor site was the most common AS form, accounting for 44% of the total AS events. The other three AS forms (exon skipping, alternative acceptor site, and intron retention) each accounted for around 19%. The results showed that the most common AS events (alternative donor site) can produce different transcripts or different proteins that affect biological processes. Among these AS events, 109,483 were novel, and alternative donor splice sites showed the largest increase (accounting for 44% of the novel AS events). Conclusions The expression of genes with tissue-specific AS events showed that the functions of these genes were consistent with tissue function. AS increased proteome diversity and resulted in novel proteins that gained and lost important functional domains. In summary, these findings extend genome annotation and highlight the roles that AS plays in tissue identity in pigs.


2021 ◽  
Vol 22 (1) ◽  
Author(s):  
Guiomar Martín ◽  
Yamile Márquez ◽  
Federica Mantica ◽  
Paula Duque ◽  
Manuel Irimia

Abstract Background Alternative splicing (AS) is a widespread regulatory mechanism in multicellular organisms. Numerous transcriptomic and single-gene studies in plants have investigated AS in response to specific conditions, especially environmental stress, unveiling substantial amounts of intron retention that modulate gene expression. However, a comprehensive study contrasting stress-response and tissue-specific AS patterns and directly comparing them with those of animal models is still missing. Results We generate a massive resource for Arabidopsis thaliana, PastDB, comprising AS and gene expression quantifications across tissues, development and environmental conditions, including abiotic and biotic stresses. Harmonized analysis of these datasets reveals that A. thaliana shows high levels of AS, similar to fruitflies, and that, compared to animals, disproportionately uses AS for stress responses. We identify core sets of genes regulated specifically by either AS or transcription upon stresses or among tissues, a regulatory specialization that is tightly mirrored by the genomic features of these genes. Unexpectedly, non-intron retention events, including exon skipping, are overrepresented across regulated AS sets in A. thaliana, being also largely involved in modulating gene expression through NMD and uORF inclusion. Conclusions Non-intron retention events have likely been functionally underrated in plants. AS constitutes a distinct regulatory layer controlling gene expression upon internal and external stimuli whose target genes and master regulators are hardwired at the genomic level to specifically undergo post-transcriptional regulation. Given the higher relevance of AS in the response to different stresses when compared to animals, this molecular hardwiring is likely required for a proper environmental response in A. thaliana.


2014 ◽  
Author(s):  
Sean Ruddy ◽  
Marla Johnson ◽  
Elizabeth Purdom

The prevalence of sequencing experiments in genomics has led to an increased use of methods for count data in analyzing high-throughput genomic data. Shrinkage methods remain important for improving the performance of statistical methods. A common example is that of gene expression data, where the counts per gene are often modeled as some form of an over-dispersed Poisson. In this case, shrinkage estimates of the per-gene dispersion parameter have led to improved estimation of dispersion in the case of a small number of samples. We address a different count setting introduced by the use of sequencing data: comparing differential proportional usage via an over-dispersed binomial model. This is motivated by our interest in testing for differential exon skipping in mRNA-Seq experiments. We introduce a novel method that is developed by modeling the dispersion based on the double binomial distribution proposed by Efron (1986). Our method (WEB-Seq) is an empirical Bayes strategy for producing a shrunken estimate of dispersion that effectively detects differential proportional usage and has close ties to the weighted-likelihood strategy of edgeR developed for gene expression data (Robinson and Smyth, 2007; Robinson et al., 2010). We analyze its behavior on simulated data sets as well as real data and show that our method is fast, powerful and gives accurate control of the FDR compared to alternative approaches. We provide an implementation of our methods in the R package DoubleExpSeq, available on CRAN.


Blood ◽  
1992 ◽  
Vol 79 (12) ◽  
pp. 3212-3218 ◽  
Author(s):  
A Kato ◽  
K Yamamoto ◽  
S Miyazaki ◽  
SM Jung ◽  
M Moroi ◽  
...  

Abstract The genetic basis for Glanzmann's thrombasthenia (GT) was elucidated on a compound heterozygote with glycoprotein (GP)IIb gene: an opal mutation at the end of exon 17 (CGA→TGA) results in only a trace amount of GPIIb mRNA, and a splicing mutation at the acceptor site of exon 26 (CAG→GAG) causes an in-frame, exon skipping process from exon 25 to 27. This aberrant transcript encodes a single-chain polypeptide characterized by a 42-amino acid deletion, which includes the proteolytic cleavage site(s) and a unique, proline-rich region at the location corresponding to the carboxyl-terminal of the normal GPIIb alpha-chain. These characteristics are shared by a previously reported defective GPIIb molecule, which is neither assembled with GPIIIa nor transported to the cellular surface. Despite its normal transcription level, expression of the present defective GPIIb molecule was significantly decreased (approximately 6% of the control level). Because the precursor GPIIb molecule is assembled with GPIIIa in the endoplasmic reticulum (ER) and its processing, as well as stability, is dependent on the GPIIIa subunit, the defective GPIIb molecule may be rapidly degraded by the intrinsic quality control system of the ER due to its inability to form a stable heterodimer complex as a consequence of its misfolded structure. Although we did not confirm that the GPIIIa genes of this individual were normal, GPIIIa may be secondarily decreased (approximately 11% of control), because a large part of it could not be complexed, making it vulnerable to proteolysis. To elucidate the molecular basis for GT, we propose here a classification of GT based on the biosynthetic pathway of the GPIIb-IIIa complex.


2020 ◽  
Vol 10 (10) ◽  
pp. 3797-3810
Author(s):  
Manishi Pandey ◽  
Gary D. Stormo ◽  
Susan K. Dutcher

Genome-wide analysis of transcriptome data in Chlamydomonas reinhardtii shows periodic patterns in gene expression levels when cultures are grown under alternating light and dark cycles so that G1 of the cell cycle occurs in the light phase and S/M/G0 occurs during the dark phase. However, alternative splicing, a process that enables a greater protein diversity from a limited set of genes, remains largely unexplored by previous transcriptome-based studies in C. reinhardtii. In this study, we used existing longitudinal RNA-seq data obtained during the light-dark cycle to investigate the changes in the alternative splicing pattern and found that 3277 genes (19.75% of 17,746 genes) undergo alternative splicing. These splicing events include Alternative 5′ (Alt 5′), Alternative 3′ (Alt 3′) and Exon skipping (ES) events, which are referred to collectively as alternative site selection (ASS) events, and Intron retention (IR) events. By clustering analysis, we identified a subset of events (26 ASS events and 10 IR events) that show periodic changes in the splicing pattern during the cell cycle. About two-thirds of these 36 genes either introduce a premature termination codon (PTC) or introduce insertions or deletions into functional domains of the proteins, which implicates splicing in altering gene function. These findings suggest that alternative splicing is also regulated during the Chlamydomonas cell cycle, although not as extensively as changes in gene expression. The longitudinal changes in the alternative splicing pattern during the cell cycle captured by this study provide an important resource to investigate alternative splicing in genes of interest during the cell cycle in Chlamydomonas reinhardtii and other eukaryotes.


Cancers ◽  
2019 ◽  
Vol 11 (3) ◽  
pp. 295 ◽  
Author(s):  
Elisa Gelli ◽  
Mara Colombo ◽  
Anna Pinto ◽  
Giovanna De Vecchi ◽  
Claudia Foglia ◽  
...  

Highly penetrant variants of BRCA1/2 genes are involved in hereditary predisposition to breast and ovarian cancer. The detection of pathogenic BRCA variants has a considerable clinical impact, allowing appropriate cancer-risk management. However, a major drawback is represented by the identification of variants of uncertain significance (VUS). Many VUS potentially affect mRNA splicing, making transcript analysis an essential step for the definition of their pathogenicity. Here, we characterize the impact on splicing of ten BRCA1/2 variants. Aberrant splicing patterns were demonstrated for eight variants whose alternative transcripts were fully characterized. Different events were observed, including exon skipping, intron retention, and usage of de novo and cryptic splice sites. Transcripts with premature stop codons or in-frame loss of functionally important residues were generated. Partial/complete splicing effect and quantitative contribution of different isoforms were assessed, leading to variant classification according to Evidence-based Network for the Interpretation of Mutant Alleles (ENIGMA) consortium guidelines. Two variants could be classified as pathogenic and two as likely benign, while due to a partial splicing effect, six variants remained of uncertain significance. The association with an undefined tumor risk justifies caution in recommending aggressive risk-reduction treatments, but prevents the possibility of receiving personalized therapies with potential beneficial effect. This indicates the need for applying additional approaches for the analysis of variants resistant to classification by gene transcript analyses.

