Clustering of mRNA-Seq data for detection of alternative splicing patterns

2015
Author(s): Marla Johnson, Elizabeth Purdom

Current sequencing of mRNA can provide estimates of the levels of individual isoforms within the cell, where isoforms are the distinct mRNA products or proteins created by a gene. Many standard statistical methods commonly used for analyzing gene expression levels have yet to be adapted to take advantage of this additional information. One novel question is whether we can find groupings or clusters of samples that are distinguished not by their gene expression but by their isoform usage. Such clusters in tumors, for example, could be the result of shared disruption to the splicing system that creates the different isoforms. We propose a novel approach to clustering mRNA-Seq data that identifies clusters of samples with common isoform usage. We show via simulation that our methods are more sensitive in finding clusters of similar alternative splicing patterns than standard clustering techniques applied directly to the estimates of isoform levels. We further demonstrate that clustering on isoform usage is more accurate than clustering directly on isoform levels by examining real data containing a technical artifact that resulted in different batches having different isoform usage patterns.
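The distinction between isoform levels and isoform usage can be illustrated with a small sketch. It is not the authors' method, which the abstract does not specify. It assumes that "usage" means each isoform's share of its gene's total level, and the function name `isoform_usage` and the toy numbers are made up for illustration.

```python
import numpy as np

def isoform_usage(levels, gene_of_isoform):
    """Convert isoform levels (samples x isoforms) into within-gene proportions.

    Assumption: usage = isoform level / total level of its gene.
    Levels are assumed to be non-negative.
    """
    levels = np.asarray(levels, dtype=float)
    gene_of_isoform = np.asarray(gene_of_isoform)
    usage = np.zeros_like(levels)
    for gene in np.unique(gene_of_isoform):
        cols = gene_of_isoform == gene
        block = levels[:, cols]
        totals = block.sum(axis=1, keepdims=True)
        # Where a gene is not expressed, divide by 1 so usage stays 0.
        safe_totals = np.where(totals > 0, totals, 1.0)
        usage[:, cols] = block / safe_totals
    return usage

# Toy data: one gene with two isoforms, three samples.
levels = np.array([
    [10.0, 30.0],    # sample A
    [100.0, 300.0],  # sample B: same usage as A, ten times the expression
    [30.0, 10.0],    # sample C: same total as A, opposite usage
])
gene_of_isoform = ["g1", "g1"]
usage = isoform_usage(levels, gene_of_isoform)

# Euclidean distances from sample A, on raw levels and on usage.
dist_levels = np.linalg.norm(levels - levels[0], axis=1)
dist_usage = np.linalg.norm(usage - usage[0], axis=1)
print(dist_levels)
print(dist_usage)
```

On raw levels, sample A sits closer to C than to B because total expression dominates the distance. On usage, A and B coincide and C stands apart, which is the kind of grouping the abstract is after.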

2021
Author(s): Pavel V. Mazin, Philipp Khaitovich, Margarida Cardoso-Moreira, Henrik Kaessmann

Alternative splicing (AS) is pervasive in mammalian genomes, yet cross-species comparisons have been largely restricted to adult tissues and the functionality of most AS events remains unclear. We assessed AS patterns across pre- and postnatal development of seven organs in six mammals and a bird. Our analyses revealed that developmentally dynamic AS events, which are especially prevalent in the brain, are substantially more conserved than nondynamic ones. Cassette exons with increasing inclusion frequencies during development show the strongest signals of conserved and regulated AS. Newly emerged cassette exons are typically incorporated late in testis development, but those retained during evolution are predominantly brain specific. Our work suggests that an intricate interplay of programs controlling gene expression levels and AS is fundamental to organ development, especially for the brain and heart. In these regulatory networks, AS affords substantial functional diversification of genes through the generation of tissue- and time-specific isoforms from broadly expressed genes.


Agronomy, 2021, Vol 11 (1), pp. 92
Author(s): Joon Seon Lee, Lexuan Gao, Laura Melissa Guzman, Loren H. Rieseberg

Approximately 10% of agricultural land is subject to periodic flooding, which reduces the growth, survivorship, and yield of most crops, reinforcing the need to understand and enhance flooding resistance in our crops. Here, we generated RNA-Seq data from leaf and root tissue of domesticated sunflower to explore differences in gene expression and alternative splicing (AS) between a resistant and susceptible cultivar under both flooding and control conditions and at three time points. Using a combination of mixed model and gene co-expression analyses, we were able to separate general responses of sunflower to flooding stress from those that contribute to the greater tolerance of the resistant line. Both cultivars responded to flooding stress by upregulating expression levels of known submergence responsive genes, such as alcohol dehydrogenases, and slowing metabolism-related activities. Differential AS reinforced expression differences, with reduced AS frequencies typically observed for genes with upregulated expression. Significant differences were found between the genotypes, including earlier and stronger upregulation of the alcohol fermentation pathway and a more rapid return to pre-flooding gene expression levels in the resistant genotype. Our results show how changes in the timing of gene expression following both the induction of flooding and release from flooding stress contribute to increased flooding tolerance.


2019, Vol 28 (16), pp. 2763-2774
Author(s): Nicola Jeffery, Sarah Richardson, David Chambers, Noel G Morgan, Lorna W Harries

Changes to islet cell identity in response to type 2 diabetes (T2D) have been reported in rodent models, but are less well characterized in humans. We assessed the effects of aspects of the diabetic microenvironment on hormone staining, total gene expression, splicing regulation and the alternative splicing patterns of key genes in EndoC-βH1 human beta cells. Genes encoding islet hormones [somatostatin (SST), insulin (INS), glucagon (GCG)], differentiation markers [Forkhead box O1 (FOXO1), Paired box 6, SRY box 9, NK6 Homeobox 1, NK6 Homeobox 2] and cell stress markers (DNA damage inducible transcript 3, FOXO1) were dysregulated in stressed EndoC-βH1 cells, as were some serine arginine rich splicing factor splicing activator and heterogeneous ribonucleoprotein particle inhibitor genes. Whole transcriptome analysis of primary T2D islets and matched controls demonstrated dysregulated splicing for ~25% of splicing events, of which genes themselves involved in messenger ribonucleic acid processing and regulation of gene expression comprised the largest group. Approximately 5% of EndoC-βH1 cells exposed to these factors gained SST positivity in vitro. An increased area of SST staining was also observed ex vivo in pancreas sections recovered at autopsy from donors with type 1 diabetes (T1D) or T2D (9.3% for T1D and 3% for T2D, respectively, compared with 1% in controls). Removal of the stressful stimulus or treatment with the AKT Serine/Threonine kinase inhibitor SH-6 restored splicing factor expression and reversed both hormone staining effects and patterns of gene expression. This suggests that reversible changes in hormone expression may occur during exposure to diabetomimetic cellular stressors, which may be mediated by changes in splicing regulation.


2021
Author(s): Philipp Weiler, Koen Van den Berge, Kelly Street, Simone Tiberi

Technological developments have led to an explosion of high-throughput single-cell data, which are revealing unprecedented perspectives on cell identity. Recently, significant attention has focused on using single-cell RNA-sequencing (scRNA-seq) data to investigate dynamic cellular processes, such as cell differentiation, cell cycle and cell (de)activation. Trajectory inference methods estimate a trajectory, a collection of differentiation paths of a dynamic system, by ordering cells along the paths of such a dynamic process. While trajectory inference tools typically work with gene expression levels, common scRNA-seq protocols allow the identification and quantification of unspliced pre-mRNAs and mature spliced mRNAs for each gene. By exploiting the abundance of unspliced and spliced mRNA, one can infer the RNA velocity of individual cells, i.e., the time derivative of the gene expression state of cells. Whereas traditional trajectory inference methods reconstruct cellular dynamics given a population of cells of varying maturity, RNA velocity relies on a dynamical model describing splicing dynamics. Here, we first discuss conceptual and theoretical aspects of both approaches, then illustrate how they can be combined, and finally present an example use case on real data.
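The idea of velocity as a time derivative can be made concrete with a minimal sketch. The abstract does not spell out its dynamical model, so this assumes the commonly used splicing kinetics in which spliced mRNA s changes as ds/dt = beta * u - gamma * s, with u the unspliced abundance, beta the splicing rate and gamma the degradation rate. The function name `spliced_velocity`, the default beta = 1 and the toy values are illustrative assumptions.

```python
import numpy as np

def spliced_velocity(unspliced, spliced, beta=1.0, gamma=None):
    """Per-cell velocity of one gene under ds/dt = beta * u - gamma * s.

    If gamma is not given, it is estimated by a least-squares fit through
    the origin, assuming the cells lie near steady state (beta * u = gamma * s).
    This is a simplification; published tools fit gamma more carefully.
    """
    u = np.asarray(unspliced, dtype=float)
    s = np.asarray(spliced, dtype=float)
    if gamma is None:
        # Slope of u against s through the origin equals gamma / beta.
        gamma = beta * np.dot(u, s) / np.dot(s, s)
    return beta * u - gamma * s

# Toy example: three cells, one gene, degradation rate fixed by hand.
u = np.array([1.0, 2.0, 0.5])
s = np.array([2.0, 2.0, 2.0])
velocity = spliced_velocity(u, s, beta=1.0, gamma=0.5)
print(velocity)
```

Under this model, a positive value means the gene has more unspliced mRNA than steady state predicts, so spliced mRNA is about to rise. A negative value means it is about to fall.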


2006, Vol 04 (04), pp. 911-993
Author(s): HAIFENG LI, XIN CHEN, KESHU ZHANG, TAO JIANG

A large number of biclustering methods have been proposed to detect patterns in gene expression data. Each of these methods tries to find some type of bicluster, but none can discover all types of patterns in the data. Furthermore, researchers have to design new algorithms in order to find new types of biclusters or patterns that interest biologists. In this paper, we propose a novel approach for biclustering that, in general, can be used to discover all computable patterns in gene expression data. The method is based on the theory of Kolmogorov complexity. More precisely, we use Kolmogorov complexity to measure the randomness of submatrices as the merit of biclusters, because randomness is by nature a lack of regularity, and regularity is a property common to all types of patterns. On the basis of the algorithmic probability measure, we develop a Markov Chain Monte Carlo algorithm to search for biclusters. Our method can also be easily extended to solve the problems of conventional clustering and checkerboard-type biclustering. Preliminary experiments on simulated as well as real data show that our approach is very versatile and promising.
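Kolmogorov complexity itself is uncomputable, so any practical use needs a stand-in. The sketch below is not the authors' measure, which rests on algorithmic probability. It uses the zlib-compressed size of a submatrix as a crude upper-bound proxy, and it assumes the expression values have already been discretized to integers in 0..255. The function name `compressed_size` is hypothetical.

```python
import zlib
import numpy as np

def compressed_size(submatrix):
    """Bytes needed by zlib to store a discretized submatrix.

    A smaller size suggests more regularity, i.e. a stronger pattern.
    Assumption: entries are integers in 0..255 (one byte each).
    """
    data = np.ascontiguousarray(submatrix, dtype=np.uint8).tobytes()
    return len(zlib.compress(data, 9))

rng = np.random.default_rng(0)
noise = rng.integers(0, 256, size=(32, 32), dtype=np.uint8)  # no pattern
constant = np.full((32, 32), 7, dtype=np.uint8)              # constant bicluster
# Additive pattern: each row is the previous row shifted up by 1.
additive = (np.arange(32)[:, None] + np.arange(32)[None, :]).astype(np.uint8)

for name, block in [("noise", noise), ("constant", constant), ("additive", additive)]:
    print(name, compressed_size(block))
```

A search procedure, such as the Markov Chain Monte Carlo algorithm the abstract mentions, would then favor submatrices with a low score. A general-purpose compressor only detects the regularities it was built for, which is one reason this remains a rough proxy.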


2005, Vol 15 (04), pp. 311-322
Author(s): CARLA S. MÖLLER-LEVET, HUJUN YIN

In this paper, a novel approach is introduced for modeling and clustering gene expression time-series. Radial basis function neural networks are used to produce a generalized and smooth characterization of the expression time-series. A co-expression coefficient is defined to evaluate the similarities of the models based on their temporal shapes and the distribution of the time points. The profiles are grouped using a fuzzy clustering algorithm that incorporates the proposed co-expression coefficient as its metric. Results on artificial and real data are presented to illustrate the advantages of the metric and method in grouping temporal profiles. The proposed metric has also been compared with the commonly used correlation coefficient under the same procedures, and the results show that the proposed method produces more biologically relevant clusters.
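The fuzzy clustering step can be sketched with the standard fuzzy c-means membership update. The abstract does not define the co-expression coefficient, so this sketch swaps in plain Euclidean distance; the function name `fuzzy_memberships`, the fuzzifier m = 2 and the toy profiles are assumptions for illustration only.

```python
import numpy as np

def fuzzy_memberships(profiles, centers, m=2.0):
    """Standard fuzzy c-means membership of each profile in each cluster.

    u[i, k] = d[i, k]^(-2/(m-1)) / sum_j d[i, j]^(-2/(m-1)),
    where d is the distance from profile i to cluster center k.
    Euclidean distance stands in for the paper's co-expression metric.
    """
    profiles = np.asarray(profiles, dtype=float)
    centers = np.asarray(centers, dtype=float)
    dist = np.linalg.norm(profiles[:, None, :] - centers[None, :, :], axis=2)
    dist = np.maximum(dist, 1e-12)  # avoid division by zero at a center
    weights = dist ** (-2.0 / (m - 1.0))
    return weights / weights.sum(axis=1, keepdims=True)

# Toy profiles (two time points each) and two cluster centers.
profiles = np.array([[0.0, 0.0], [1.0, 0.0], [4.0, 0.0]])
centers = np.array([[0.0, 0.0], [4.0, 0.0]])
memberships = fuzzy_memberships(profiles, centers)
print(memberships)
```

Unlike hard clustering, each profile receives a graded membership in every cluster, and the memberships in each row sum to one. The middle profile here belongs mostly, but not entirely, to the first cluster.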


2020, Vol 21 (1)
Author(s): Xiangnan Xu, Samantha M. Solon-Biet, Alistair Senior, David Raubenheimer, Stephen J. Simpson, ...

Background: Nutrigenomics aims at understanding the interaction between nutrition and gene information. Due to the complex interactions of nutrients and genes, their relationship exhibits non-linearity. One of the most effective and efficient methods to explore their relationship is the nutritional geometry framework, which fits a response surface for the gene expression over two prespecified nutrition variables. However, when the number of nutrients involved is large, it is challenging to find combinations of informative nutrients with respect to a certain gene and to test whether the relationship is stronger than chance. Methods for identifying informative combinations are essential to understanding the relationship between nutrients and genes. Results: We introduce Local Consistency Nutrition to Graphics (LC-N2G), a novel approach for ranking and identifying combinations of nutrients with gene expression. In LC-N2G, we first propose a model-free quantity called the Local Consistency (LC) statistic to measure whether there is a non-random relationship between combinations of nutrients and gene expression measurements based on (1) the similarity between samples in the nutrient space and (2) their difference in gene expression. Then combinations with small LC are selected and a permutation test is performed to evaluate their significance. Finally, the response surfaces are generated for the subset of significant relationships. Evaluation on simulated data and real data shows that LC-N2G can accurately find combinations that are correlated with gene expression. Conclusion: LC-N2G is practically powerful for identifying the informative nutrition variables correlated with gene expression. Therefore, LC-N2G is important in the area of nutrigenomics for understanding the relationship between nutrition and gene expression information.
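The select-then-permute logic can be sketched as follows. The abstract does not give the formula for the LC statistic, so the statistic here is a hypothetical stand-in: the mean absolute expression difference between each sample and its k nearest neighbours in nutrient space, where small values indicate a consistent relationship. The names `neighbour_inconsistency` and `permutation_p_value`, the choice k = 3 and the simulated data are all assumptions.

```python
import numpy as np

def neighbour_inconsistency(nutrients, expression, k=3):
    """Stand-in statistic (NOT the published LC formula).

    Mean absolute expression difference between each sample and its
    k nearest neighbours in nutrient space; small means consistent.
    """
    x = np.asarray(nutrients, dtype=float)
    y = np.asarray(expression, dtype=float)
    dist = np.linalg.norm(x[:, None, :] - x[None, :, :], axis=2)
    np.fill_diagonal(dist, np.inf)  # a sample is not its own neighbour
    nearest = np.argsort(dist, axis=1)[:, :k]
    return float(np.mean(np.abs(y[:, None] - y[nearest])))

def permutation_p_value(nutrients, expression, k=3, n_perm=999, seed=0):
    """Share of shuffled datasets at least as consistent as the observed one."""
    rng = np.random.default_rng(seed)
    y = np.asarray(expression, dtype=float)
    observed = neighbour_inconsistency(nutrients, y, k)
    hits = 0
    for _ in range(n_perm):
        # Shuffling expression breaks any link to the nutrient values.
        if neighbour_inconsistency(nutrients, rng.permutation(y), k) <= observed:
            hits += 1
    return (1 + hits) / (1 + n_perm)

# Simulated example: expression depends smoothly on two nutrients.
rng = np.random.default_rng(1)
nutrients = rng.uniform(size=(40, 2))
expression = nutrients[:, 0] + nutrients[:, 1]
p_value = permutation_p_value(nutrients, expression)
print(p_value)
```

The add-one correction in the p-value keeps it strictly positive, a common convention for permutation tests. In a full pipeline this test would be run for each candidate nutrient combination before fitting response surfaces to the significant ones.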


2021, Vol 22 (19), pp. 10213
Author(s): Immanuel D. Green, Renjing Liu, Justin J. L. Wong

Vascular smooth muscle cells (VSMCs) display extraordinary phenotypic plasticity. This allows them to differentiate or dedifferentiate, depending on environmental cues. The ability to ‘switch’ between a quiescent contractile phenotype and a highly proliferative synthetic state makes VSMCs primary mediators of vascular repair and remodelling. When their plasticity is pathological, it can lead to cardiovascular diseases such as atherosclerosis and restenosis. Coinciding with significant technological and conceptual innovations in RNA biology, there has been a growing focus on the role of alternative splicing in VSMC gene expression regulation. Herein, we review how alternative splicing and its regulatory factors are involved in generating protein diversity and altering gene expression levels in VSMC plasticity. Moreover, we explore how recent advancements in the development of splicing-modulating therapies may be applied to VSMC-related pathologies.


F1000Research, 2019, Vol 8, pp. 1925
Author(s): Kenton Ko, Jeremy Guenther, Nicholas Ostan, Joshua Powles

Background: Four distinct rhomboid genes appear to function in Arabidopsis plastids, two “active” types from the secretases and presenilin-like associated rhomboid-like (PARL) categories (At1g25290 and At5g25752) and two “inactive” rhomboid forms (At1g74130 and At1g74140). The number of working rhomboids is further increased by alternative splicing, with two variants reported for At1g25290 and three for At1g74130. Since At1g25290 and At1g74130 exist as alternative splice variants, it would be necessary to assess the splicing patterns of the other two plastid rhomboid genes, At5g25752 and At1g74140, before studying the Arabidopsis plastid rhomboid system as a whole. Methods: This study thus specifically focused on an analysis of the At1g74140 transcript population using various RT-PCR strategies. Results: The exon mapping results indicate splicing patterns different from those of the close relative At1g74130, despite similarity between the exonic sequences. The splicing patterns indicate a high level of sequence “discontinuity” in the At1g74140 transcript population, with a significant portion of the discontinuity being generated by two regions of the gene. Conclusion: The overall discontinuous splicing pattern of At1g74140 may be reflective of its mode of involvement in activities like controlling gene expression.


2020, Vol 477 (16), pp. 3091-3104
Author(s): Luciana E. Giono, Alberto R. Kornblihtt

Gene expression is an intricately regulated process that underlies cell differentiation, the maintenance of cell identity and the cellular responses to environmental changes. Alternative splicing, the process by which multiple functionally distinct transcripts are generated from a single gene, is one of the main mechanisms that contribute to expanding the coding capacity of genomes and help explain the level of complexity achieved by higher organisms. Eukaryotic transcription is subject to multiple layers of regulation, both intrinsic (such as promoter structure) and dynamic, allowing the cell to respond to internal and external signals. Similarly, alternative splicing choices are affected by all of these aspects, mainly through the regulation of transcription elongation, making it a regulatory knob on a par with the regulation of gene expression levels. This review aims to recapitulate some of the history and stepping-stones that led to the paradigms held today about transcription and splicing regulation, with a major focus on transcription elongation and its effect on alternative splicing.

