Shark: fishing in a sample to discard useless RNA-Seq reads

2019 ◽  
Author(s):  
Paola Bonizzoni ◽  
Tamara Ceccato ◽  
Gianluca Della Vedova ◽  
Luca Denti ◽  
Yuri Pirola ◽  
...  

Recent advances in high-throughput RNA-Seq technologies make it possible to produce massive datasets. When a study focuses only on a handful of genes, most reads are not relevant and degrade the performance of the tools used to analyze the data. Removing such useless reads from the input dataset leads to improved efficiency without compromising the results of the study. To this aim, in this paper we introduce a novel computational problem, called gene assignment, and we propose an efficient alignment-free approach to solve it. Given an RNA-Seq sample and a panel of genes, a gene assignment consists in extracting from the sample the reads that were most probably sequenced from those genes. The problem becomes more complicated when the sample exhibits evidence of novel alternative splicing events. We implemented our approach in a tool called Shark and assessed its effectiveness in speeding up differential splicing analysis pipelines. This evaluation shows that Shark is able to significantly improve the performance of RNA-Seq analysis tools without having any impact on the final results. The tool is distributed as a stand-alone module and the software is freely available at https://github.com/AlgoLab/shark.
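The gene assignment problem described above can be illustrated with a small sketch. The abstract only says the approach is alignment-free; the k-mer counting used here, the function names (`kmers`, `build_index`, `assign_reads`) and the parameters `k` and `min_fraction` are all assumptions made for illustration, not Shark's actual method or interface.

```python
from collections import defaultdict


def kmers(seq, k):
    """Yield every substring of length k from seq (uppercased)."""
    seq = seq.upper()
    for i in range(len(seq) - k + 1):
        yield seq[i:i + k]


def build_index(genes, k):
    """Map each k-mer to the set of gene names whose sequence contains it."""
    index = defaultdict(set)
    for name, seq in genes.items():
        for km in kmers(seq, k):
            index[km].add(name)
    return index


def assign_reads(reads, genes, k=5, min_fraction=0.6):
    """Assign each read to the gene sharing the most k-mers with it.

    A read is kept only if at least min_fraction of its k-mers occur in
    the best-matching gene (hypothetical threshold); other reads are
    discarded as irrelevant to the gene panel.
    """
    index = build_index(genes, k)
    assignment = {}
    for read_id, seq in reads.items():
        total = len(seq) - k + 1
        if total <= 0:
            continue  # read shorter than k: no evidence either way
        hits = defaultdict(int)
        for km in kmers(seq, k):
            for gene in index.get(km, ()):
                hits[gene] += 1
        if not hits:
            continue
        best_gene, best_count = max(hits.items(), key=lambda kv: kv[1])
        if best_count / total >= min_fraction:
            assignment[read_id] = best_gene
    return assignment


genes = {
    "geneA": "ACGTACGTTAGCTAGCTAGGATCC",
    "geneB": "TTTTGGGGCCCCAAAATTTTGGGG",
}
reads = {
    "r1": "ACGTTAGCTAGC",   # taken from geneA
    "r2": "GGGGCCCCAAAA",   # taken from geneB
    "r3": "CATCATCATCAT",   # unrelated to the panel
}
result = assign_reads(reads, genes)
```

Here `result` keeps `r1` and `r2` and drops `r3`. A fractional threshold rather than exact matching is one plausible way to tolerate reads that span novel splice junctions, where only part of the read matches the annotated sequence.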

Author(s):  
Luca Denti ◽  
Yuri Pirola ◽  
Marco Previtali ◽  
Tamara Ceccato ◽  
Gianluca Della Vedova ◽  
...  

Abstract Motivation: Recent advances in high-throughput RNA-Seq technologies make it possible to produce massive datasets. When a study focuses only on a handful of genes, most reads are not relevant and degrade the performance of the tools used to analyze the data. Removing irrelevant reads from the input dataset leads to improved efficiency without compromising the results of the study. Results: We introduce a novel computational problem, called gene assignment, and we propose an efficient alignment-free approach to solve it. Given an RNA-Seq sample and a panel of genes, a gene assignment consists in extracting from the sample the reads that were most probably sequenced from those genes. The problem becomes more complicated when the sample exhibits evidence of novel alternative splicing events. We implemented our approach in a tool called Shark and assessed its effectiveness in speeding up differential splicing analysis pipelines. This evaluation shows that Shark is able to significantly improve the performance of RNA-Seq analysis tools without having any impact on the final results. Availability and implementation: The tool is distributed as a stand-alone module and the software is freely available at https://github.com/AlgoLab/shark. Supplementary information: Supplementary data are available at Bioinformatics online.


PLoS ONE ◽  
2015 ◽  
Vol 10 (4) ◽  
pp. e0125702 ◽  
Author(s):  
Mei Yang ◽  
Liming Xu ◽  
Yanling Liu ◽  
Pingfang Yang

2017 ◽  
Vol 7 (1) ◽  
Author(s):  
Anne-Laure Bougé ◽  
Eva Murauer ◽  
Emmanuelle Beyne ◽  
Julie Miro ◽  
Jessica Varilh ◽  
...  

Abstract We have analysed the splicing pattern of the human Duchenne Muscular Dystrophy (DMD) transcript in normal skeletal muscle. To achieve the depth of coverage required for the analysis of this lowly expressed gene in muscle, we designed a targeted RNA-Seq procedure that combines amplification of the full-length 11.3 kb DMD cDNA sequence and 454 sequencing technology. A high and uniform coverage of the cDNA sequence was obtained, which allowed us to draw up a reliable inventory of the physiological alternative splicing events in the muscular DMD transcript. In contrast to previous assumptions, we found that most of the 79 DMD exons are constitutively spliced in skeletal muscle. Only 12 alternative splicing events were identified, all present at a very low level. These include previously known exon skipping events but also newly described pseudoexon inclusions and alternative 3′ splice sites, one of which is the first functional NAGNAG splice site reported in the DMD gene. This study provides the first RNA-Seq-based reference of the DMD splicing pattern in skeletal muscle and reports on an experimental procedure well suited to detecting condition-specific differences in this low-abundance transcript, which may prove useful for diagnostic, research or RNA-based therapeutic applications.


2018 ◽  
Author(s):  
Luca Denti ◽  
Raffaella Rizzi ◽  
Stefano Beretta ◽  
Gianluca Della Vedova ◽  
Marco Previtali ◽  
...  

Abstract Background: While the reconstruction of transcripts from a sample of RNA-Seq data is a computationally expensive and complicated task, the detection of splicing events from RNA-Seq data and a gene annotation is computationally feasible. The latter task, which is adequate for many transcriptome analyses, is usually achieved by aligning the reads to a reference genome and then comparing the alignments with a gene annotation, often implicitly represented by a graph: the splicing graph. Results: We present ASGAL (Alternative Splicing Graph ALigner): a tool for mapping RNA-Seq data to the splicing graph, with the main goal of detecting novel alternative splicing events. ASGAL receives as input the annotated transcripts of a gene and an RNA-Seq sample, and it computes (1) the spliced alignments of each read, and (2) a list of novel events with respect to the gene annotation. Conclusions: An experimental analysis shows that, by aligning reads directly to the splicing graph, ASGAL better predicts alternative splicing events when compared to tools requiring spliced alignments of the RNA-Seq data to a reference genome. To the best of our knowledge, ASGAL is the first tool that detects novel alternative splicing events by directly aligning reads to a splicing graph. Availability: Source code, documentation, and data are available for download at http://asgal.algolab.eu.
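To make the splicing graph idea above concrete, here is a minimal sketch that builds such a graph from annotated transcripts and checks whether a junction observed in a read is already annotated. It is not ASGAL's algorithm: the exon representation as `(start, end)` tuples, the function names and the three labels are assumptions chosen for illustration, and the read-to-graph alignment step itself is omitted.

```python
def build_splicing_graph(transcripts):
    """Build a splicing graph from annotated transcripts.

    transcripts maps a transcript name to a list of exons, each an
    inclusive (start, end) tuple. Nodes are the distinct exons; an edge
    links two exons that are consecutive in at least one transcript.
    """
    exons = set()
    edges = set()
    for exon_list in transcripts.values():
        ordered = sorted(exon_list)
        exons.update(ordered)
        edges.update(zip(ordered, ordered[1:]))
    return exons, edges


def classify_junction(junction, exons, edges):
    """Classify a junction (donor_end, acceptor_start) seen in a read."""
    donor, acceptor = junction
    annotated = {(left[1], right[0]) for left, right in edges}
    if junction in annotated:
        return "annotated"
    donor_known = any(exon[1] == donor for exon in exons)
    acceptor_known = any(exon[0] == acceptor for exon in exons)
    if donor_known and acceptor_known:
        # Both splice sites are annotated but never joined, which is
        # the signature of events such as exon skipping.
        return "novel combination of annotated splice sites"
    return "novel splice site"


transcripts = {"t1": [(100, 200), (300, 400), (500, 600)]}
exons, edges = build_splicing_graph(transcripts)
labels = [
    classify_junction((200, 300), exons, edges),
    classify_junction((200, 500), exons, edges),
    classify_junction((250, 300), exons, edges),
]
```

With this single annotated transcript, the junction `(200, 500)` skips the middle exon and is reported as a novel combination, while `(250, 300)` uses a donor site that appears in no annotated exon.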


Blood ◽  
2014 ◽  
Vol 124 (21) ◽  
pp. 638-638 ◽  
Author(s):  
Naim Rashid ◽  
Stephane Minvielle ◽  
Florence Magrangeas ◽  
Mehmet Kemal Samur ◽  
Alice Clynen ◽  
...  

Abstract Alternative splicing is an important post-transcriptional change that alters gene function. Misregulation of alternative splicing has been implicated in a number of disease processes including cancer. Here we have analyzed alternative splicing in myeloma using high-throughput RNA-seq. The analytic pipeline for RNA-seq data used in this investigation not only provides information on expression levels for genes, but also provides information on the expression of known splice variants of genes (isoforms), and can identify novel exon-level events across individuals (i.e. exon skipping events). We conducted a study of 328 newly-diagnosed patients with multiple myeloma treated homogeneously with a novel agent combination containing lenalidomide, bortezomib and dexamethasone with or without high-dose melphalan followed by lenalidomide maintenance in the IFM/DFCI study. RNA isolated from purified CD138+ MM cells collected at the time of diagnosis and from 18 normal donor plasma cells was processed by RNA-seq (100 million paired-end reads on Illumina HiSeq) and analyzed using a custom computational and statistical pipeline. Following read alignment to hg19, we utilized RSEM to quantify both gene-level and isoform-level expression of known ENSEMBL transcripts. We then implemented a novel testing approach based on compositional regression to discover genes that show significant isoform switching between the 328 MM samples and 18 Normal Plasma Cell (NPC) samples from healthy donors. Using various programs and their modifications, we also identified novel alternative splicing events, such as exon skipping and mutually exclusive exon usage, among others. Patient data for MM characteristics, cytogenetics and FISH, as well as clinical survival outcomes, were also analyzed and correlated with genomic data. We observed over 600 genes showing significant changes in relative isoform abundances (isoform switching) between MM and normal samples. 
A number of previously characterized genes including MYCL1 (adj. p = 0.0014) and CCND3 (adj. p = 0.0013), and MAP kinase-related genes (MAP3K8, MAPKAPK2, MAPKAPK3, MAP4K4) exhibited significant isoform switching compared to normal, in addition to some not well characterized genes. Genes showing the greatest magnitude of isoform switching include MEFV (adj. p = 2.7 x 10^-5), which shows a two-fold change in the relative major isoform abundance compared to normal and has previously been shown to have a role in lymphoid neoplasms. We applied hierarchical clustering to the isoforms showing significant changes in isoform switching and identified 4 distinct clusters, which are currently being investigated for correlation with clinical subtypes of MM. Exon-level analyses of alternative splicing events, such as exon skipping, are currently underway. Clinical data including MM characteristics, cytogenetics, FISH and survival outcomes were available for a subset of 265 patients. We found that 109 genes showed significant isoform switching between t(4;14) and non-t(4;14) patients, such as CD44 (adj. p = 1.8 x 10^-6) and WHSC1 (adj. p = 5.1 x 10^-28). Comparing del17p (28 in total) and non-del17p patients, we found no significant splicing changes after multiple testing adjustment. Of these genes, only a subset (40%) were shown to be differentially expressed in terms of total gene expression, suggesting the importance of examining alternative splicing events in addition to total gene expression. With respect to treatment response, we compared the expression of gene isoforms between patients achieving complete response (CR) versus others and identified 38 isoforms associated with response to treatment (adj. p-value < 0.05), with SEPT9, SLC2A5, and UBX6 having the strongest associations (adj. p-value < 3 x 10^-4). 
Using a univariate Cox regression model, 4 spliced isoforms relating to 3 genes were identified as having significant correlation with event-free survival (EFS) (FDR-adjusted Cox p-value < 0.05). We are now integrating the gene expression data with altered splicing data to develop an integrated survival model. In summary, this study highlights the significant frequency and the biological and clinical importance of alternative splicing in MM, and points to the need to evaluate not only the expression level of genes but also post-transcriptional modifications. The genes identified here are important targets for therapy as well as possible immune modulation. Disclosures Moreau: Celgene Corporation: Honoraria, Membership on an entity's Board of Directors or advisory committees.


2017 ◽  
Author(s):  
Juan P. Romero ◽  
María Ortiz-Estévez ◽  
Ander Muniategui ◽  
Soraya Carrancio ◽  
Fernando J. de Miguel ◽  
...  

Abstract RNA-seq is a reference technology for determining alternative splicing at genome-wide level. Exon arrays remain widely used for the analysis of gene expression, but show poor validation rate with regard to splicing events. Commercial arrays that include probes within exon junctions have been developed in order to overcome this problem. We compare the performance of RNA-seq (Illumina HiSeq) and junction arrays (Affymetrix Human Transcriptome array) for the analysis of transcript splicing events. Three different breast cancer cell lines were treated with CX-4945, a drug that severely affects splicing. To enable a direct comparison of the two platforms, we adapted EventPointer, an algorithm that detects and labels alternative splicing events using junction arrays, to work also on RNA-seq data. Common results and discrepancies between the technologies were validated and/or resolved by over 200 PCR experiments. As might be expected, RNA-seq appears superior in cases where the technologies disagree, and is able to discover novel splicing events beyond the limitations of physical probe-sets. We observe a high degree of coherence between the two technologies, however, with correlation of EventPointer results over 0.90. Through decimation, the detection power of the junction arrays is equivalent to RNA-seq with up to 60 million reads. Our results suggest, therefore, that exon-junction arrays are a viable alternative to RNA-seq for detection of alternative splicing events when focusing on well-described transcriptional regions.


2011 ◽  
Vol 41 (10) ◽  
pp. 1016-1023
Author(s):  
Ying ZHANG ◽  
DuanQing WANG ◽  
Tao HE ◽  
YaOu HU ◽  
YuMin WANG ◽  
...  

2016 ◽  
Vol 32 (12) ◽  
pp. 1840-1847 ◽  
Author(s):  
André Kahles ◽  
Cheng Soon Ong ◽  
Yi Zhong ◽  
Gunnar Rätsch

Blood ◽  
2010 ◽  
Vol 116 (21) ◽  
pp. 1929-1929
Author(s):  
Parantu K Shah ◽  
Hervé Avet-Loiseau ◽  
Stephane Minvielle ◽  
Samir B. Amin ◽  
Florence Magrangeas ◽  
...  

Abstract 1929 Considerable efforts have been spent evaluating the impact of global gene expression profile on clinical outcome. Although significant correlation has been described between outcome and expressed gene signature, the overall predictability of such models has reached a plateau. Biologically this is expected, as gene function is modulated at multiple levels. Besides the change in level of expression, post-transcriptional changes such as alternative splicing alter the specificity of gene function and may affect the eventual outcome. Although various genes have normal alternatively spliced forms, dysregulation of alternative splicing that alters protein function has been implicated in a number of disease processes including cancer. We have observed a significant level of dysregulated splicing events in multiple myeloma (MM). We hypothesize that a combined model that includes dysregulated splicing events, besides the level of expressed genes, may provide a superior survival model in MM. To develop a combined model we have hybridized RNA isolated from CD138+ purified MM cells collected at the time of diagnosis from 170 newly-diagnosed patients treated homogeneously in tandem transplantation IFM trials, 23 MM cell lines and 6 healthy donors on Affymetrix Exon 1.0 ST GeneChip arrays. The exon array not only provides an accurate measure of expression levels for genes, but also allows simultaneous identification of alternative splicing events. Pre-processing and normalization methods in aroma.affymetrix and the robust multichip analysis model in FIRMA, followed by t-tests with Benjamini-Hochberg multiple hypothesis correction, were used to identify differential expression and alternative splicing, respectively. We identified 1454 differentially expressed genes and 759 differential splicing events between healthy donors and MM patients, and 5476 differentially expressed genes and 4012 differential splicing events between healthy donors and MM cell lines. 
There are 1071 differentially expressed genes and 286 alternatively spliced exons shared between MM samples and MM cell lines. Univariate survival analysis using FIRMA scores of exons identified a total of 89 genes with more than 10 alternative splicing events between healthy donors and MM patients associated with survival with Cox proportional hazard model and log-rank tests. We have now built 3 different survival models considering: 1) gene expression only, 2) alternative splicing only, and 3) a combined model integrating gene expression and alternative splicing events. We utilized refined regularized variable selection methods to handle this high-dimensional feature space. Our analysis suggests that the composite model using gene expression and alternative splicing information performs significantly better than the gene expression only model in identifying high-risk patients, when the data were divided by median or quartiles. Specifically, the difference in overall survival is 32.6 months to 38.5 months using the median survival, and 18 months vs 23 months for median event-free survival. We are currently in the process of validating the combined model. Our data suggest the need for inclusion of modifiers of the transcriptome to develop a comprehensive model that will have higher predictive power for risk stratification as well as for selection of therapeutic intervention. Disclosures: Anderson: Millennium Pharmaceuticals: Honoraria, Membership on an entity's Board of Directors or advisory committees, Research Funding, Speakers Bureau. Munshi: Millennium Pharmaceuticals: Honoraria, Speakers Bureau.
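The Benjamini-Hochberg correction mentioned in this abstract can be sketched in a few lines. This is the standard step-up procedure written out in plain Python for illustration; it is not the authors' pipeline, and the function name and example p-values are hypothetical.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values, in the input order.

    For m tests, the p-value at ascending rank r is scaled by m / r;
    a running minimum taken from the largest rank downwards keeps the
    adjusted values monotone and capped at 1.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical raw p-values from four per-exon t-tests.
raw = [0.01, 0.04, 0.03, 0.005]
adj = benjamini_hochberg(raw)
significant = [a < 0.05 for a in adj]
```

Controlling the false discovery rate in this way is less conservative than a Bonferroni correction, which matters when thousands of genes or exons are tested at once.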

