ASGAL: Aligning RNA-Seq Data to a Splicing Graph to Detect Novel Alternative Splicing Events

2018 ◽  
Author(s):  
Luca Denti ◽  
Raffaella Rizzi ◽  
Stefano Beretta ◽  
Gianluca Della Vedova ◽  
Marco Previtali ◽  
...  

Abstract Background: While the reconstruction of transcripts from a sample of RNA-Seq data is a computationally expensive and complicated task, the detection of splicing events from RNA-Seq data and a gene annotation is computationally feasible. The latter task, which is adequate for many transcriptome analyses, is usually achieved by aligning the reads to a reference genome, followed by comparing the alignments with a gene annotation, often implicitly represented by a graph: the splicing graph. Results: We present ASGAL (Alternative Splicing Graph ALigner): a tool for mapping RNA-Seq data to the splicing graph, with the main goal of detecting novel alternative splicing events. ASGAL takes as input the annotated transcripts of a gene and an RNA-Seq sample, and it computes (1) the spliced alignments of each read, and (2) a list of novel events with respect to the gene annotation. Conclusions: An experimental analysis shows that, by aligning reads directly to the splicing graph, ASGAL better predicts alternative splicing events when compared to tools requiring spliced alignments of the RNA-Seq data to a reference genome. To the best of our knowledge, ASGAL is the first tool that detects novel alternative splicing events by directly aligning reads to a splicing graph. Availability: Source code, documentation, and data are available for download at http://asgal.algolab.eu.
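To make the notion of a splicing graph concrete, here is a minimal sketch in Python. It is not ASGAL's implementation: the abstract does not describe the tool's data structures or alignment algorithm, so the function names, the exon representation as `(start, end)` tuples, and the junction check are all illustrative assumptions. The sketch only shows how annotated transcripts of one gene can be merged into a graph whose nodes are exons and whose edges are annotated exon-to-exon junctions.

```python
from collections import defaultdict


def build_splicing_graph(transcripts):
    """Build a simplified splicing graph from annotated transcripts.

    `transcripts` maps a transcript ID to its exons, each a (start, end)
    tuple. Nodes are the distinct exons; an edge (a, b) means exon b
    directly follows exon a in at least one annotated transcript.
    """
    nodes = set()
    edges = defaultdict(set)
    for exons in transcripts.values():
        ordered = sorted(exons)  # genomic order within the transcript
        nodes.update(ordered)
        for left, right in zip(ordered, ordered[1:]):
            edges[left].add(right)
    return nodes, edges


def is_novel_junction(edges, left, right):
    """Hypothetical helper: True if the junction left->right is not annotated."""
    return right not in edges.get(left, set())


# Two annotated isoforms of a toy gene; t2 skips the middle exon.
annotation = {
    "t1": [(100, 200), (300, 400), (500, 600)],
    "t2": [(100, 200), (500, 600)],
}
nodes, edges = build_splicing_graph(annotation)
```

In this toy model, a read whose alignment joins two exons that are not connected by an edge would be evidence for an event absent from the annotation, which is the kind of signal the abstract refers to as a novel event.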

2021 ◽  
Vol 22 (9) ◽  
pp. 4468
Author(s):  
Naima Ahmed Fahmi ◽  
Heba Nassereddeen ◽  
Jaewoong Chang ◽  
Meeyeon Park ◽  
Hsinsung Yeh ◽  
...  

(1) Background: A simplistic understanding of the central dogma falls short in correlating the number of genes in the genome to the number of proteins in the proteome. Post-transcriptional alternative splicing contributes to the complexity of the proteome and is critical in understanding gene expression. mRNA-sequencing (RNA-seq) has been widely used to study the transcriptome and provides an opportunity to detect alternative splicing events among different biological conditions. Despite the popularity of studying transcriptome variants with RNA-seq, few efficient and user-friendly bioinformatics tools have been developed for the genome-wide detection and visualization of alternative splicing events. (2) Results: We propose AS-Quant (Alternative Splicing Quantitation), a robust program to identify alternative splicing events from RNA-seq data. We then extended AS-Quant to visualize the splicing events with short-read coverage plots along with complete gene annotation. The tool works in three major steps: (i) calculate the read coverage of the potential spliced exons and the corresponding gene; (ii) categorize the events into five different categories according to the annotation, and assess the significance of the events between two biological conditions; (iii) generate the short-read coverage plot for user-specified splicing events. Our extensive experiments on simulated and real datasets demonstrate that AS-Quant outperforms three other widely used baselines, SUPPA2, rMATS, and diffSplice, for detecting alternative splicing events. Moreover, the significant alternative splicing events identified by AS-Quant between two biological contexts were validated by RT-PCR experiments. (3) Availability: AS-Quant is implemented in Python 3.0. Source code and a comprehensive user’s manual are freely available online.
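Step (i) above can be illustrated with a short Python sketch. It is only an assumption-laden example, not AS-Quant's code: the abstract does not say how coverage is summarized, so the use of mean per-base depth, the half-open interval convention, and the names `mean_coverage` and `exon_to_gene_ratio` are all hypothetical.

```python
def mean_coverage(per_base, start, end):
    """Mean read depth over the half-open interval [start, end)."""
    window = per_base[start:end]
    return sum(window) / len(window) if window else 0.0


def exon_to_gene_ratio(per_base, exon, gene):
    """Coverage of a candidate exon relative to the coverage of its gene."""
    exon_cov = mean_coverage(per_base, *exon)
    gene_cov = mean_coverage(per_base, *gene)
    return exon_cov / gene_cov if gene_cov else 0.0


# Toy per-base depth for a 20 bp gene whose middle exon is poorly covered.
depth = [10] * 10 + [2] * 5 + [10] * 5
ratio = exon_to_gene_ratio(depth, exon=(10, 15), gene=(0, 20))
```

A low ratio in one condition and a high ratio in another would suggest differential inclusion of the exon; how AS-Quant actually tests significance in step (ii) is not stated in the abstract and is not modeled here.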


Author(s):  
Naima Ahmed Fahmi ◽  
Hsin-Sung Yeh ◽  
Jae-Woong Chang ◽  
Heba Nassereddeen ◽  
Deliang Fan ◽  
...  

Abstract A simplistic understanding of the central dogma falls short in correlating the number of genes in the genome to the number of proteins in the proteome. Post-transcriptional alternative splicing contributes to the complexity of the proteome and is critical in understanding gene expression. mRNA-sequencing (RNA-seq) has been widely used to study the transcriptome and provides an opportunity to detect alternative splicing events among different biological conditions. Despite the popularity of studying transcriptome variants with RNA-seq, few efficient and user-friendly bioinformatics tools have been developed for the genome-wide detection and visualization of alternative splicing events. We have developed AS-Quant (Alternative Splicing Quantitation), a robust program to identify alternative splicing events and visualize the short-read coverage with gene annotations. AS-Quant works in three steps: (i) calculate the read coverage of the potential splicing exons and the corresponding gene; (ii) categorize the splicing events into five different types based on annotation, and assess the significance of the events between two biological conditions; (iii) generate the short-read coverage plot with a complete gene annotation for user-specified splicing events. To evaluate the performance, two significant alternative splicing events identified by AS-Quant between two biological contexts were validated by RT-PCR. Implementation: AS-Quant is implemented in Python. Source code and a comprehensive user’s manual are freely available at https://github.com/CompbioLabUCF/AS-Quant


PLoS ONE ◽  
2015 ◽  
Vol 10 (4) ◽  
pp. e0125702 ◽  
Author(s):  
Mei Yang ◽  
Liming Xu ◽  
Yanling Liu ◽  
Pingfang Yang

2017 ◽  
Vol 7 (1) ◽  
Author(s):  
Anne-Laure Bougé ◽  
Eva Murauer ◽  
Emmanuelle Beyne ◽  
Julie Miro ◽  
Jessica Varilh ◽  
...  

Abstract We have analysed the splicing pattern of the human Duchenne Muscular Dystrophy (DMD) transcript in normal skeletal muscle. To achieve the depth of coverage required for the analysis of this lowly expressed gene in muscle, we designed a targeted RNA-Seq procedure that combines amplification of the full-length 11.3 kb DMD cDNA sequence and 454 sequencing technology. A high and uniform coverage of the cDNA sequence was obtained, which allowed us to draw up a reliable inventory of the physiological alternative splicing events in the muscular DMD transcript. In contrast to previous assumptions, we found that most of the 79 DMD exons are constitutively spliced in skeletal muscle. Only 12 alternative splicing events were identified, all present at a very low level. These include previously known exon skipping events but also newly described pseudoexon inclusions and alternative 3′ splice sites, of which one is the first functional NAGNAG splice site reported in the DMD gene. This study provides the first RNA-Seq-based reference of the DMD splicing pattern in skeletal muscle and reports on an experimental procedure well suited to detect condition-specific differences in this low-abundance transcript that may prove useful for diagnostic, research or RNA-based therapeutic applications.


Blood ◽  
2014 ◽  
Vol 124 (21) ◽  
pp. 638-638 ◽  
Author(s):  
Naim Rashid ◽  
Stephane Minvielle ◽  
Florence Magrangeas ◽  
Mehmet Kemal Samur ◽  
Alice Clynen ◽  
...  

Abstract Alternative splicing is an important post-transcriptional change that alters gene function. Misregulation of alternative splicing has been implicated in a number of disease processes including cancer. Here we have analyzed alternative splicing in myeloma using high-throughput RNA-seq. Our analytic pipeline for RNA-seq data used in this investigation not only provides information on expression levels for genes, but also provides information on the expression of known splice variants of genes (isoforms), and can identify novel exon-level events across individuals (i.e. exon skipping events). We conducted a study of 328 newly diagnosed patients with multiple myeloma treated homogeneously with a novel agent combination containing lenalidomide, bortezomib and dexamethasone with or without high-dose melphalan followed by lenalidomide maintenance in the IFM/DFCI study. RNA isolated from purified CD138+ MM cells collected at the time of diagnosis and from 18 normal donor plasma cells was processed by RNA-seq (100 million paired-end reads on Illumina HiSeq) and analyzed using a custom computational and statistical pipeline. Following read alignment to hg19, we utilized RSEM to quantify both gene-level and isoform-level expression of known ENSEMBL transcripts. We then implemented a novel testing approach based on compositional regression to discover genes that show significant isoform switching between the 328 MM samples and 18 Normal Plasma Cell (NPC) samples from healthy donors. Using various programs and their modifications, we also identified novel alternative splicing events, such as exon skipping and mutually exclusive exon usage, among others. Patient data for MM characteristics, cytogenetics and FISH as well as clinical survival outcomes were also analyzed and correlated with genomic data. We observed over 600 genes showing significant changes in relative isoform abundances (isoform switching) between MM and normal samples.
A number of previously characterized genes including MYCL1 (adj. p = 0.0014) and CCND3 (adj. p = 0.0013), and MAP kinase-related genes (MAP3K8, MAPKAPK2, MAPKAPK3, MAP4K4) exhibited significant isoform switching compared to normal, in addition to some not well characterized genes. Genes showing the greatest magnitude of isoform switching include MEFV (adj. p = 2.7 × 10⁻⁵), which shows a two-fold change in the relative major isoform abundance compared to normal and has previously been shown to have a role in lymphoid neoplasms. We applied hierarchical clustering to the isoforms showing significant changes in isoform switching and identified 4 distinct clusters, which are currently being investigated for correlation with clinical subtypes of MM. Exon-level analyses of alternative splicing events, such as exon skipping, are currently underway. Clinical data including MM characteristics, cytogenetics, FISH and survival outcomes were available for a subset of 265 patients. We found that 109 genes showed significant isoform switching between t(4;14) and non-t(4;14) patients, such as CD44 (adj. p = 1.8 × 10⁻⁶) and WHSC1 (adj. p = 5.1 × 10⁻²⁸). Comparing del17p (28 in total) and non-del17p patients, we found no significant splicing changes after multiple testing adjustment. Of these genes, only a subset (40%) were shown to be differentially expressed in terms of total gene expression, suggesting the importance of examining alternative splicing events in addition to total gene expression. With respect to treatment response, we compared the expression of gene isoforms between patients achieving complete response (CR) versus others and identified 38 isoforms associated with response to treatment (adj. p-value < 0.05), with SEPT9, SLC2A5, and UBX6 having the strongest associations (adj. p-value < 3 × 10⁻⁴).
Using a univariate Cox regression model, 4 spliced isoforms relating to 3 genes were identified as having significant correlation with event-free survival (EFS) (FDR-adjusted Cox p-value < 0.05). We are now integrating the gene expression data with altered splicing data to develop an integrated survival model. In summary, this study highlights the significant frequency and the biological and clinical importance of alternative splicing in MM, and points to the need for evaluation of not only the expression level of genes but also post-transcriptional modifications. The genes identified here are important targets for therapy as well as possible immune modulation. Disclosures: Moreau: Celgene Corporation: Honoraria, Membership on an entity's Board of Directors or advisory committees.
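The adjusted and FDR-adjusted p-values quoted in this abstract come from a multiple-testing correction that the abstract does not name. As a hedged illustration only, the sketch below assumes the Benjamini-Hochberg procedure, a common choice for FDR control; the function name and the toy p-values are invented for the example and are not data from the study.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Toy raw p-values for four isoforms (illustrative, not from the study).
raw = [0.01, 0.04, 0.03, 0.005]
adj = benjamini_hochberg(raw)
```

An isoform would then be reported as significant when its adjusted value falls below the chosen threshold, for example 0.05 as in the abstract.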


2017 ◽  
Author(s):  
Juan P. Romero ◽  
María Ortiz-Estévez ◽  
Ander Muniategui ◽  
Soraya Carrancio ◽  
Fernando J. de Miguel ◽  
...  

Abstract RNA-seq is a reference technology for determining alternative splicing at the genome-wide level. Exon arrays remain widely used for the analysis of gene expression, but show a poor validation rate with regard to splicing events. Commercial arrays that include probes within exon junctions have been developed in order to overcome this problem. We compare the performance of RNA-seq (Illumina HiSeq) and junction arrays (Affymetrix Human Transcriptome array) for the analysis of transcript splicing events. Three different breast cancer cell lines were treated with CX-4945, a drug that severely affects splicing. To enable a direct comparison of the two platforms, we adapted EventPointer, an algorithm that detects and labels alternative splicing events using junction arrays, to work also on RNA-seq data. Common results and discrepancies between the technologies were validated and/or resolved by over 200 PCR experiments. As might be expected, RNA-seq appears superior in cases where the technologies disagree, and is able to discover novel splicing events beyond the limitations of physical probe-sets. We observe a high degree of coherence between the two technologies, however, with correlation of EventPointer results over 0.90. Through decimation, the detection power of the junction arrays is equivalent to RNA-seq with up to 60 million reads. Our results suggest, therefore, that exon-junction arrays are a viable alternative to RNA-seq for detection of alternative splicing events when focusing on well-described transcriptional regions.


2021 ◽  
Author(s):  
Wenbin Guo ◽  
Max Coulter ◽  
Robbie Waugh ◽  
Runxuan Zhang

High-quality transcriptome assembly using short reads from RNA-seq data still heavily relies upon reference-based approaches, of which the primary step is to align RNA-seq reads to a single reference genome of haploid sequence. However, it is increasingly apparent that while different genotypes within a species share core genes, they also contain variable numbers of specific genes that are only present in a subset of individuals. Using a common reference may thus lead to a loss of genotype-specific information in the assembled transcript dataset and the generation of erroneous, incomplete or misleading transcriptomics analysis results. With the recent development of pan-genome information in many species, it is important that we understand the limitations of single-genotype references for transcriptomics analysis. In this study, we quantitatively evaluated the advantages of using genotype-specific reference genomes for transcriptome assembly and analysis using cultivated barley as a model. We mapped barley cultivar Barke RNA-seq reads to the Barke genome and to the cultivar Morex genome (common barley genome reference) to construct a genotype-specific Reference Transcript Dataset (sRTD) and a common Reference Transcript Dataset (cRTD), respectively. We compared the two RTDs according to their transcript diversity, transcript sequence and structure similarity and the accuracy they provided for transcript quantification and differential expression analysis. Our evaluation shows that the sRTD has a significantly higher diversity of transcripts and alternative splicing events. Despite using a high-quality reference genome for assembly of the cRTD, we miss ca. 40% of the transcripts present in the sRTD, and the cRTD only has ca. 70% true assemblies. We found that the sRTD is more accurate for transcript quantification as well as differential expression and differential alternative splicing analysis.
However, gene level quantification and comparative expression analysis are less affected by the source RTD, which indicates that analysing transcriptomic data at the gene level may be a reasonable compromise when a high-quality genotype-specific reference is not available.


2011 ◽  
Vol 41 (10) ◽  
pp. 1016-1023
Author(s):  
Ying ZHANG ◽  
DuanQing WANG ◽  
Tao HE ◽  
YaOu HU ◽  
YuMin WANG ◽  
...  

2019 ◽  
Author(s):  
Paola Bonizzoni ◽  
Tamara Ceccato ◽  
Gianluca Della Vedova ◽  
Luca Denti ◽  
Yuri Pirola ◽  
...  

Recent advances in high-throughput RNA-Seq technologies make it possible to produce massive datasets. When a study focuses only on a handful of genes, most reads are not relevant and degrade the performance of the tools used to analyze the data. Removing such useless reads from the input dataset leads to improved efficiency without compromising the results of the study. To this aim, in this paper we introduce a novel computational problem, called gene assignment, and we propose an efficient alignment-free approach to solve it. Given an RNA-Seq sample and a panel of genes, a gene assignment consists in extracting from the sample the reads that most probably were sequenced from those genes. The problem becomes more complicated when the sample exhibits evidence of novel alternative splicing events. We implemented our approach in a tool called Shark and assessed its effectiveness in speeding up differential splicing analysis pipelines. This evaluation shows that Shark is able to significantly improve the performance of RNA-Seq analysis tools without having any impact on the final results. The tool is distributed as a stand-alone module and the software is freely available at https://github.com/AlgoLab/shark.
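The gene assignment problem described above can be sketched in a few lines of Python. This is not Shark's algorithm: the abstract says only that the approach is alignment-free, so the use of shared k-mers, the parameters `k` and `min_fraction`, and the function names are assumptions chosen for illustration. The sketch also ignores reverse-complement strands and sequencing errors.

```python
def kmers(seq, k):
    """Set of all length-k substrings of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def assign_reads(reads, gene_seqs, k=5, min_fraction=0.6):
    """Keep reads that share enough k-mers with at least one gene.

    Returns a dict mapping each retained read ID to the set of genes
    for which at least `min_fraction` of the read's k-mers occur in
    that gene's sequence.
    """
    # Index every gene k-mer so each read k-mer is a single lookup.
    index = {}
    for gene, seq in gene_seqs.items():
        for kmer in kmers(seq, k):
            index.setdefault(kmer, set()).add(gene)

    assignments = {}
    for read_id, read in reads.items():
        read_kmers = kmers(read, k)
        if not read_kmers:
            continue  # read shorter than k
        hits = {}
        for kmer in read_kmers:
            for gene in index.get(kmer, ()):
                hits[gene] = hits.get(gene, 0) + 1
        kept = {g for g, n in hits.items() if n / len(read_kmers) >= min_fraction}
        if kept:
            assignments[read_id] = kept
    return assignments


panel = {"geneA": "ACGTACGGTCATGCA"}
sample = {"r1": "ACGGTCATG", "r2": "TTTTTTTTT"}
assigned = assign_reads(sample, panel)
```

Requiring only a fraction of matching k-mers, rather than all of them, is one simple way to tolerate reads that span a junction absent from the annotation, which is the complication the abstract mentions for novel splicing events.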

