FishExp: a comprehensive database and analysis platform for gene expression and alternative splicing of fish species

The volume of publicly archived RNA-seq data has grown exponentially, but its valuable information has not yet been fully discovered and utilized, especially for alternative splicing. This is true for fish species, which play important roles in ecology, research, and the food industry. To address this, we present FishExp, a web-based data platform covering gene expression and alternative splicing in 26,081 RNA-seq experiments from 44 fishes. In addition to searching by gene identifiers and symbols, FishExp allows users to query the data using various functional terms and BLAST alignment. Notably, users can customize experiments and tools to perform differential/specific expression and alternative splicing analyses, which are accompanied by functional enrichment results. The results of retrieval and analysis can be visualized on gene-, transcript- and splicing event-level webpages in a highly interactive and intuitive manner. The manually curated sample information, uniform data processing and visualization tools make it efficient for users to gain new insights from these large datasets. All data in FishExp can be downloaded for more in-depth analysis. FishExp is freely accessible at https://bioinfo.njau.edu.cn/fishExp.
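
As a rough illustration of what an alternative splicing profile quantifies, the sketch below computes percent spliced in (PSI) for a skipped-exon event from junction read counts. The function name, the read counts, and the choice to average the two inclusion junctions are illustrative assumptions; the abstract does not describe how FishExp itself quantifies splicing events.

```python
def percent_spliced_in(inc_upstream, inc_downstream, skipping):
    """Estimate PSI for a skipped-exon event from junction read counts.

    inc_upstream / inc_downstream: reads spanning the two junctions that
    include the exon; skipping: reads spanning the junction that skips it.
    Returns None when no junction reads support the event.
    """
    # The inclusion isoform has two junctions, so average them to keep
    # inclusion and skipping evidence on a comparable scale (an assumption).
    inclusion = (inc_upstream + inc_downstream) / 2
    total = inclusion + skipping
    if total == 0:
        return None
    return inclusion / total


# Hypothetical counts for one event in two samples.
psi_a = percent_spliced_in(40, 44, 14)  # exon mostly included
psi_b = percent_spliced_in(6, 10, 32)   # exon mostly skipped
delta_psi = psi_a - psi_b
print(f"PSI A={psi_a:.2f}, PSI B={psi_b:.2f}, dPSI={delta_psi:.2f}")
```

A difference in PSI between two groups of experiments is one simple way to express a differential splicing result for a single event.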

FungiExp: A Comprehensive Platform For Exploring Fungal Gene Expression and Alternative Splicing Based On 35,821 RNA-Seq Experiments From 220 Fungi

10.21203/rs.3.rs-618004/v1 ◽

2021 ◽

Author(s):

Jinding Liu ◽

Fei Yin ◽

Kun Lang ◽

Wencai Jie ◽

Suxu Tan ◽

...

Keyword(s):

Gene Expression ◽

Alternative Splicing ◽

Sequence Similarity ◽

Expression Regulation ◽

Rna Seq ◽

Specific Expression ◽

Data Accessibility ◽

Wide Range ◽

Fungal Gene Expression ◽

Fungal Gene

Abstract Background: RNA-seq has become a standard tool in biology and has produced large and diverse transcriptomic datasets for users to explore fungal expression regulation. Fungal alternative splicing, which is attracting increasing attention because of evolutionary adaptations to changing external conditions, has not been thoroughly investigated in previous studies, unlike that of animals and plants. However, analyses of RNA-seq datasets are made difficult by the heterogeneity of study designs and by complex bioinformatics approaches. Comprehensive analyses of these published datasets should contribute new insights into fungal expression regulation. Results: We have developed a web-based platform called FungiExp hosting fungal gene expression levels and alternative splicing profiles in 35,821 curated RNA-seq experiments from 220 species. It allows users to perform retrieval via diverse terms and sequence similarity. Moreover, users can customize experimental groups to perform differential and specific expression analyses. The wide range of data visualizations is an additional important feature that should help users intuitively understand retrieval and analysis results. Conclusions: With its uniform data processing, easy data accessibility, convenient retrieval, and analysis functions, FungiExp is a valuable resource and tool that allows users to (re)use published RNA-seq datasets. It is accessible at http://bioinfo.njau.edu.cn/fungiExp.

Zea mays RNA-seq estimated transcript abundances are strongly affected by read mapping bias

BMC Genomics ◽

10.1186/s12864-021-07577-3 ◽

2021 ◽

Vol 22 (1) ◽

Author(s):

Shuhua Zhan ◽

Cortland Griswold ◽

Lewis Lukens

Keyword(s):

Gene Expression ◽

Zea Mays ◽

Reference Genome ◽

Transcript Abundance ◽

Gene Transcript ◽

Rna Seq ◽

Individual Genome ◽

Abundance Estimates ◽

Mapping Bias ◽

Quantify Gene Expression

Abstract Background: Genetic variation for gene expression is a source of phenotypic variation for natural and agricultural species. The common approach to map and to quantify gene expression from genetically distinct individuals is to assign their RNA-seq reads to a single reference genome. However, RNA-seq reads from alleles dissimilar to this reference genome may fail to map correctly, causing transcript levels to be underestimated. Presently, the extent of this mapping problem is not clear, particularly in highly diverse species. We investigated whether mapping bias occurred and whether chromosomal features were associated with mapping bias. Zea mays presents a model species to assess these questions, given that it has genotypically distinct and well-studied genetic lines. Results: In Zea mays, the inbred B73 genome is the standard reference genome and template for RNA-seq read assignments. In the absence of mapping bias, B73 and a second inbred line, Mo17, would each have an approximately equal number of regulatory alleles that increase gene expression. Remarkably, Mo17 had 2–4 times fewer such positively acting alleles than did B73 when RNA-seq reads were aligned to the B73 reference genome. Reciprocally, over one-half of the B73 alleles that increased gene expression were not detected when reads were aligned to the Mo17 genome template. Genes at dissimilar chromosomal ends were strongly affected by mapping bias, and genes at more similar pericentromeric regions were less affected. Biased transcript estimates were higher in untranslated regions and lower in splice junctions. Bias occurred across software and alignment parameters. Conclusions: Mapping bias very strongly affects gene transcript abundance estimates in maize, and bias varies across chromosomal features. Individual genome or transcriptome templates are likely necessary for accurate transcript estimation across genetically variable individuals in maize and other species.

ExpressionPlot: a web-based framework for analysis of RNA-Seq and microarray gene expression data

Genome Biology ◽

10.1186/gb-2011-12-7-r69 ◽

2011 ◽

Vol 12 (7) ◽

Cited By ~ 27

Author(s):

Brad A Friedman ◽

Tom Maniatis

Keyword(s):

Gene Expression ◽

Gene Expression Data ◽

Microarray Gene Expression Data ◽

Expression Data ◽

Rna Seq ◽

Microarray Gene Expression ◽

Web Based ◽

Microarray Gene

RASflow: An RNA-Seq Analysis Workflow with Snakemake

10.1101/839191 ◽

2019 ◽

Author(s):

Xiaokang Zhang ◽

Inge Jonassen

Keyword(s):

Gene Expression ◽

Management System ◽

Workflow Management ◽

Model Organisms ◽

Gene Transcript ◽

Rna Seq ◽

Public Data ◽

Wide Range ◽

Analysis Workflow ◽

Programming Skills

Abstract Background: With the cost of DNA sequencing decreasing, increasing amounts of RNA-Seq data are being generated, giving novel insight into gene expression and regulation. Prior to analysis of gene expression, the RNA-Seq data has to be processed through a number of steps resulting in a quantification of expression of each gene / transcript in each of the analyzed samples. A number of workflows are available to help researchers perform these steps on their own data, or on public data to take advantage of novel software or reference data in data re-analysis. However, many of the existing workflows are limited to specific types of studies. We therefore aimed to develop a maximally general workflow, applicable to a wide range of data and analysis approaches and at the same time supporting research on both model and non-model organisms. Furthermore, we aimed to make the workflow usable also for users with limited programming skills. Results: Utilizing the workflow management system Snakemake and the package management system Conda, we have developed a modular, flexible and user-friendly RNA-Seq analysis pipeline: RNA-Seq Analysis Snakemake Workflow (RASflow). Utilizing Snakemake and Conda alleviates challenges with library dependencies and version conflicts and also supports reproducibility. To be applicable for a wide variety of applications, RASflow supports mapping of reads to both genomic and transcriptomic assemblies. RASflow has a broad range of potential users: it can be applied by researchers interested in any organism, and since it requires no programming skills, it can be used by researchers with different backgrounds. RASflow is an open source tool, and source code as well as documentation, tutorials and example data sets can be found on GitHub: https://github.com/zhxiaokang/RASflow. Conclusions: RASflow is a simple and reliable RNA-Seq analysis workflow that provides a complete package for RNA-Seq analysis.
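
To make the "quantification of expression of each gene / transcript" step concrete, here is a minimal sketch that converts raw read counts into transcripts per million (TPM). TPM is used only as one common expression unit; the abstract does not state which unit or tool RASflow uses, and the gene names, counts, and lengths below are hypothetical.

```python
def counts_to_tpm(counts, lengths_bp):
    """Convert raw read counts to transcripts per million (TPM)."""
    if len(counts) != len(lengths_bp):
        raise ValueError("counts and lengths_bp must have the same length")
    # Reads per kilobase: normalise each count by feature length.
    rpk = [c / (length / 1000) for c, length in zip(counts, lengths_bp)]
    scale = sum(rpk)
    if scale == 0:
        return [0.0] * len(counts)
    # Rescale so that the values of one sample sum to one million.
    return [r / scale * 1_000_000 for r in rpk]


# Hypothetical genes with made-up counts and lengths (in base pairs).
genes = ["geneA", "geneB", "geneC"]
counts = [100, 300, 600]
lengths = [1000, 3000, 2000]
tpm = counts_to_tpm(counts, lengths)
for gene, value in zip(genes, tpm):
    print(f"{gene}\t{value:.1f}")
```

Because the values of each sample sum to one million, TPM makes within-sample proportions comparable across samples, although it does not replace the count-based models typically used for differential expression testing.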

Mapping Tumor-Specific Expression QTLs in Impure Tumor Samples

10.1101/136614 ◽

2017 ◽

Cited By ~ 3

Author(s):

Douglas R. Wilson ◽

Wei Sun ◽

Joseph G. Ibrahim

Keyword(s):

Gene Expression ◽

Type I Error ◽

The Cancer Genome Atlas ◽

Type I ◽

Eqtl Mapping ◽

Rna Seq ◽

Specific Expression ◽

Normal Cells ◽

Technology Application ◽

Tumor Tissues

Abstract The study of gene expression quantitative trait loci (eQTL) is an effective approach to illuminate the functional roles of genetic variants. Computational methods have been developed for eQTL mapping using gene expression data from microarray or RNA-seq technology. Application of these methods for eQTL mapping in tumor tissues is problematic because tumor tissues are composed of both tumor and infiltrating normal cells (e.g. immune cells) and eQTL effects may vary between tumor and infiltrating normal cells. To address this challenge, we have developed a new method for eQTL mapping using RNA-seq data from tumor samples. Our method separately estimates the eQTL effects in tumor and infiltrating normal cells using both total expression and allele-specific expression (ASE). We demonstrate that our method controls type I error rate and has higher power than some alternative approaches. We applied our method to study RNA-seq data from The Cancer Genome Atlas and illustrated the similarities and differences of eQTL effects in tumor and normal cells.
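
The idea of separating tumor and normal contributions can be sketched with a much simpler model than the one in the abstract. Assuming each sample's tumor purity is known and that bulk expression is a purity-weighted average of tumor and normal expression, an ordinary least-squares line recovers both components, and fitting it per genotype group exposes a tumor-specific effect. This is not the authors' method, which models total and allele-specific read counts; the function name and all numbers are hypothetical.

```python
def fit_mixture(purity, expression):
    """Least-squares fit of y = mu_normal + purity * (mu_tumor - mu_normal)."""
    n = len(purity)
    mean_p = sum(purity) / n
    mean_y = sum(expression) / n
    sxx = sum((p - mean_p) ** 2 for p in purity)
    if sxx == 0:
        raise ValueError("purity must vary across samples")
    sxy = sum((p - mean_p) * (y - mean_y) for p, y in zip(purity, expression))
    slope = sxy / sxx                    # estimates mu_tumor - mu_normal
    mu_normal = mean_y - slope * mean_p  # intercept: expression at purity 0
    mu_tumor = mu_normal + slope         # fitted value at purity 1
    return mu_tumor, mu_normal


# Hypothetical, noise-free bulk expression for two genotype groups.
purity = [0.2, 0.5, 0.8, 0.9]
expr_aa = [6.0, 7.5, 9.0, 9.5]    # generated with tumor=10, normal=5
expr_ab = [6.8, 9.5, 12.2, 13.1]  # generated with tumor=14, normal=5

tumor_aa, normal_aa = fit_mixture(purity, expr_aa)
tumor_ab, normal_ab = fit_mixture(purity, expr_ab)
print("tumor effect :", round(tumor_ab - tumor_aa, 6))
print("normal effect:", round(normal_ab - normal_aa, 6))
```

In this toy data the genotype changes expression only in the tumor component, which a naive comparison of bulk expression would dilute in low-purity samples.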

Dysregulation of Splicing in Multiple Myeloma: The Splicing Factor SRSF1 Supports MM Cell Proliferation Via Splicing Control

Blood ◽

10.1182/blood-2018-99-118845 ◽

2018 ◽

Vol 132 (Supplement 1) ◽

pp. 4500-4500

Author(s):

Mariateresa Fulciniti ◽

Michael A Lopez ◽

Anil Aktas Samur ◽

Eugenio Morelli ◽

Hervé Avet-Loiseau ◽

...

Keyword(s):

Gene Expression ◽

Multiple Myeloma ◽

Alternative Splicing ◽

Board Of Directors ◽

Research Funding ◽

Splicing Factor ◽

Splicing Factors ◽

Rna Seq ◽

Advisory Committees ◽

Disease Biology

Abstract Gene expression profiling has provided interesting insights into disease biology, helped develop new risk stratification, and identified novel druggable targets in multiple myeloma (MM). However, alternative pre-mRNA splicing (AS), one of the key transcriptome modifiers, also has a significant impact. Spliced variants increase transcriptomic complexity, and their misregulation affects disease behavior, impacting therapeutic considerations in various disease processes including cancer. Our large, well-annotated deep RNA sequencing data from purified MM cells from 420 newly-diagnosed patients treated homogeneously identified 1534 genes with one or more splicing events observed in at least 10% of patients. The median number of alternative splicing events per patient was 595 (range 223 - 2735). These global alternative splicing events in MM involve aberrant splicing of critical growth and survival genes and affect disease biology as well as overall survival. Moreover, the decrease of cell viability observed in a large panel of MM cell lines after inhibition of splicing at the pre-mRNA complex and stalling at the A complex confirmed that MM cells are exquisitely sensitive to pharmacological inhibition of splicing. Based on these data, we further focused on understanding the molecular mechanisms driving aberrant alternative splicing in MM. An increasing body of evidence indicates that altered expression of regulatory splicing factors (SF) can have oncogenic properties by impacting AS of cancer-associated genes. We used our large RNA-seq dataset to create a genome-wide global alterations map of SF and identified several splicing factors significantly dysregulated in MM compared to normal plasma cells with impact on clinical outcome. 
The splicing factor Serine and Arginine Rich Splicing Factor 1 (SRSF1), which regulates initiation of spliceosome assembly, was selected for further evaluation, as its impact on clinical outcome was confirmed in two additional independent myeloma datasets. In gain-of-function (GOF) studies, enforced expression of SRSF1 in MM cells significantly increased proliferation, especially in the presence of bone marrow stromal cells; conversely, in loss-of-function (LOF) studies, downregulation of SRSF1 using stable or doxy-inducible shRNA systems significantly inhibited MM cell proliferation and survival over time. We utilized SRSF1 mutants to dissect the mechanisms involved in SRSF1-mediated MM growth induction and observed that the growth-promoting effect of SRSF1 in MM cells was mainly due to its splicing activity. We next investigated the impact of SRSF1 on allelic isoforms of specific gene targets by RNA-seq in LOF studies and confirmed the findings in GOF studies. Splicing profiles showed widespread changes in AS induced by SRSF1 knockdown. The most recurrent splicing events, as compared to control cells, were skipped exon (SE) and alternative first (AF) exon splicing. SE splice events were primarily upregulated, whereas AF splice events were evenly upregulated and downregulated. Genes in which splicing events in these categories occurred mostly did not show a significant difference in overall gene expression level compared to control following SRSF1 depletion. When analyzing cellular functions of SRSF1-regulated splicing events, we found that SRSF1 knockdown affects genes in the RNA processing pathway as well as genes involved in cancer-related functions such as mTOR and MYC-related pathways. Splicing analysis was corroborated with immunoprecipitation (IP) followed by mass spectrometry (MS) analysis of T7-tagged SRSF1 MM cells. 
We have observed increased levels of SRSF phosphorylation, which regulates its subcellular localization and activity, in MM cell lines and primary patient MM cells compared to normal donor PBMCs. Moreover, we evaluated the chemical compound TG003, an inhibitor of Cdc2-like kinase (CLK) 1 and 4 that regulate splicing by fine-tuning the phosphorylation of SR proteins. Treatment with TG003 decreased SRSF1 phosphorylation, preventing spliceosome assembly and inducing a dose-dependent inhibition of MM cell viability. In conclusion, we provide mechanistic insights into myeloma-related splicing dysregulation and establish SRSF1 as a tumor-promoting gene with therapeutic potential. Disclosures Avet-Loiseau: Janssen: Consultancy, Membership on an entity's Board of Directors or advisory committees; Celgene: Consultancy, Membership on an entity's Board of Directors or advisory committees, Research Funding; Sanofi: Consultancy, Membership on an entity's Board of Directors or advisory committees, Research Funding; Abbvie: Membership on an entity's Board of Directors or advisory committees; Amgen: Consultancy, Membership on an entity's Board of Directors or advisory committees, Research Funding; Takeda: Membership on an entity's Board of Directors or advisory committees, Research Funding. Munshi: OncoPep: Other: Board of director.

A computational method for direct imputation of cell type-specific expression profiles and cellular compositions from bulk-tissue RNA-Seq in brain disorders

10.1101/2020.05.28.121483 ◽

2020 ◽

Author(s):

Abolfazl Doostparast Torshizi ◽

Jubao Duan ◽

Kai Wang

Keyword(s):

Gene Expression ◽

Expression Profiles ◽

Complex Diseases ◽

Specific Gene ◽

Cellular Composition ◽

Rna Seq ◽

Cell Type ◽

Specific Expression ◽

Cell Type Specific Expression ◽

Cell Type Specific

Abstract The importance of cell type-specific gene expression in disease-relevant tissues is increasingly recognized in genetic studies of complex diseases. However, the vast majority of gene expression studies are conducted on bulk tissues, necessitating computational approaches to infer biological insights on cell type-specific contributions to diseases. Several computational methods are available for cell type deconvolution (that is, inference of cellular composition) from bulk RNA-Seq data, but they cannot impute cell type-specific expression profiles. We hypothesize that with external prior information such as single cell RNA-seq (scRNA-seq) and population-wide expression profiles, it can be computationally tractable and identifiable to estimate both cellular composition and cell type-specific expression from bulk RNA-Seq data. Here we introduce CellR, which addresses cross-individual gene expression variations by employing genome-wide tissue-wise expression signatures from GTEx to adjust the weights of cell-specific gene markers. It then transforms the deconvolution problem into a linear programming model while taking into account inter/intra cellular correlations, and uses a multi-variate stochastic search algorithm to estimate the expression level of each gene in each cell type. Extensive analyses on several complex diseases such as schizophrenia, Alzheimer’s disease, Huntington’s disease, and type 2 diabetes validated the efficiency of CellR, while revealing how specific cell types contribute to different diseases. We conducted numerical simulations on human cerebellum to generate pseudo-bulk RNA-seq data and demonstrated its efficiency in inferring cell-specific expression profiles. Moreover, we inferred cell-specific expression levels from bulk RNA-seq data on schizophrenia and computed differentially expressed genes within certain cell types. 
Using the predicted gene expression profile of excitatory neurons, we were able to reproduce our recently published findings on TCF4 being a master regulator in schizophrenia and showed how this gene and its targets are enriched in excitatory neurons. In summary, CellR not only compares favorably (in both accuracy and stability of inference) against competing approaches on inferring cellular composition from bulk RNA-seq data, but also allows direct imputation of cell type-specific gene expression, opening new doors to re-analyze gene expression data on bulk tissues in complex diseases.
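
The pseudo-bulk simulation and the deconvolution idea can be illustrated on a toy scale. The sketch below mixes two cell-type expression profiles into a pseudo-bulk sample and then recovers the mixing fraction with a closed-form least-squares estimate. This is a deliberately simplified stand-in, not CellR's linear programming model or its stochastic search; the cell-type names, profiles, and proportions are hypothetical.

```python
def make_pseudo_bulk(profiles, proportions):
    """Mix cell-type expression profiles into one pseudo-bulk profile."""
    n_genes = len(next(iter(profiles.values())))
    return [
        sum(proportions[ct] * profiles[ct][g] for ct in profiles)
        for g in range(n_genes)
    ]


def estimate_two_type_fraction(bulk, profile_a, profile_b):
    """Least-squares fraction f of type A when bulk = f*A + (1 - f)*B."""
    diff = [a - b for a, b in zip(profile_a, profile_b)]
    denom = sum(d * d for d in diff)
    if denom == 0:
        raise ValueError("profiles are identical; fraction is not identifiable")
    f = sum((y - b) * d for y, b, d in zip(bulk, profile_b, diff)) / denom
    # Clamp to the valid range of a proportion.
    return min(1.0, max(0.0, f))


# Hypothetical marker-gene profiles for two cell types (four genes).
profiles = {"neuron": [10.0, 0.0, 4.0, 6.0], "glia": [2.0, 8.0, 4.0, 1.0]}
true_props = {"neuron": 0.7, "glia": 0.3}

bulk = make_pseudo_bulk(profiles, true_props)
neuron_fraction = estimate_two_type_fraction(
    bulk, profiles["neuron"], profiles["glia"]
)
print(f"estimated neuron fraction: {neuron_fraction:.3f}")
```

With more than two cell types or noisy data, the same objective is usually solved with constrained regression rather than a closed form, which is where methods such as the one described in the abstract differ.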

Alternative Splicing Is a Frequent Event and Impacts Clinical Outcome in Myeloma: A Large RNA-Seq Data Analysis of Newly-Diagnosed Myeloma Patients

Blood ◽

10.1182/blood.v124.21.638.638 ◽

2014 ◽

Vol 124 (21) ◽

pp. 638-638 ◽

Cited By ~ 2

Author(s):

Naim Rashid ◽

Stephane Minvielle ◽

Florence Magrangeas ◽

Mehmet Kemal Samur ◽

Alice Clynen ◽

...

Keyword(s):

Gene Expression ◽

Alternative Splicing ◽

Exon Skipping ◽

P Value ◽

Newly Diagnosed ◽

Rna Seq ◽

Exon Level ◽

Total Gene Expression ◽

Isoform Switching ◽

Alternative Splicing Events

Abstract Alternative splicing is an important post-transcriptional change that alters gene function. Misregulation of alternative splicing has been implicated in a number of disease processes including cancer. Here we have analyzed alternative splicing in myeloma using high throughput RNA-seq. Our analytic pipeline for RNA-seq data used in this investigation not only provides information on expression levels for genes, but also provides information on the expression of known splice variants of genes (isoforms), and can identify novel exon level events across individuals (i.e. exon skipping events). We conducted a study of 328 newly-diagnosed patients with multiple myeloma treated homogeneously with a novel agent combination containing lenalidomide, bortezomib and dexamethasone with or without high-dose melphalan followed by lenalidomide maintenance in the IFM/DFCI study. RNA isolated from purified CD138+ MM cells collected at the time of diagnosis and from 18 normal donor plasma cells was processed by RNA-seq (100 million paired end reads on Illumina HiSeq) and analyzed using a custom computational and statistical pipeline. Following read alignment to hg19, we utilized RSEM to quantify both gene-level and isoform-level expression of known ENSEMBL transcripts. We then implemented a novel testing approach based on compositional regression to discover genes that show significant isoform switching between the 328 MM samples and 18 Normal Plasma Cell (NPC) samples from healthy donors. Using various programs and their modifications, we also identified novel alternative splicing events, such as exon skipping and mutually exclusive exon usage, among others. Patient data for MM characteristics, cytogenetics and FISH as well as clinical survival outcomes were also analyzed and correlated with genomic data. We observed over 600 genes showing significant changes in relative isoform abundances (isoform switching) between MM and normal samples. 
A number of previously characterized genes including MYCL1 (adj. p = 0.0014) and CCND3 (adj. p = 0.0013), and MAP kinase-related genes (MAP3K8, MAPKAPK2, MAPKAPK3, MAP4K4) exhibited significant isoform switching compared to normal, in addition to some not well characterized genes. Genes showing the greatest magnitude of isoform switching include MEFV (adj. p = 2.7 × 10⁻⁵), which showed a two-fold change in the relative major isoform abundance compared to normal and has previously been shown to have a role in lymphoid neoplasms. We applied hierarchical clustering to the isoforms showing significant changes in isoform switching and identified 4 distinct clusters, which are currently being investigated for correlation with clinical subtypes of MM. Exon level analyses of alternative splicing events, such as exon skipping, are currently underway. Clinical data including MM characteristics, cytogenetics, FISH and survival outcomes were available for a subset of 265 patients. We found that 109 genes showed significant isoform switching between t(4;14) and non-t(4;14) patients, such as CD44 (adj. p = 1.8 × 10⁻⁶) and WHSC1 (adj. p = 5.1 × 10⁻²⁸). Comparing del17p (28 in total) and non-del17p patients, we found no significant splicing changes after multiple testing adjustment. Of these genes, only a subset (40%) were shown to be differentially expressed in terms of total gene expression, suggesting the importance of examining alternative splicing events in addition to total gene expression. With respect to treatment response, we compared the expression of gene isoforms between patients achieving complete response (CR) versus others and identified 38 isoforms associated with response to treatment (adj. p value < 0.05), with SEPT9, SLC2A5, and UBX6 having the strongest associations (adj. p value < 3 × 10⁻⁴). 
Using a univariate Cox regression model, 4 spliced isoforms relating to 3 genes were identified as having significant correlation with event-free survival (EFS) (FDR-adjusted Cox p value < 0.05). We are now integrating the gene expression data with altered splicing data to develop an integrated survival model. In summary, this study highlights the significant frequency and the biological and clinical importance of alternative splicing in MM and points to the need for evaluation of not only the expression level of genes but also post-transcriptional modifications. The genes identified here are important targets for therapy as well as possible immune modulation. Disclosures Moreau: Celgene Corporation: Honoraria, Membership on an entity's Board of Directors or advisory committees.
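
The adjusted p-values reported above come from multiple testing correction across many genes or isoforms. A minimal sketch of one widely used procedure, the Benjamini-Hochberg false discovery rate adjustment, is shown below. The abstract does not name the procedure the authors used, so Benjamini-Hochberg is an assumption here, and the raw p-values are made up.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(pvalues)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so that adjusted values stay
    # monotone: each one is capped by the adjusted value ranked above it.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        candidate = pvalues[idx] * m / rank
        running_min = min(running_min, candidate)
        adjusted[idx] = running_min
    return adjusted


# Hypothetical raw p-values for five isoforms.
raw = [0.001, 0.008, 0.039, 0.041, 0.20]
adj = benjamini_hochberg(raw)
significant = [p < 0.05 for p in adj]
for r, a, s in zip(raw, adj, significant):
    print(f"raw={r:.3f}  adjusted={a:.5f}  significant={s}")
```

Note that two isoforms with raw p-values just under 0.05 are no longer significant after adjustment, which is why the abstract distinguishes raw from adjusted values.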

Identification of Cis-Regulatory Sequences Controlling Pollen-Specific Expression of Hydroxyproline-Rich Glycoprotein Genes in Arabidopsis thaliana

Plants ◽

10.3390/plants9121751 ◽

2020 ◽

Vol 9 (12) ◽

pp. 1751

Author(s):

Yichao Li ◽

Maxwell Mullin ◽

Yingnan Zhang ◽

Frank Drews ◽

Lonnie R. Welch ◽

...

Keyword(s):

Gene Expression ◽

Arabidopsis Thaliana ◽

Motif Discovery ◽

Plant Cell Wall ◽

Integrative Analysis ◽

Regulatory Sequences ◽

Rna Seq ◽

Specific Expression ◽

Promoter Sequences ◽

Regulatory Motifs

Hydroxyproline-rich glycoproteins (HRGPs) are a superfamily of plant cell wall structural proteins that function in various aspects of plant growth and development, including pollen tube growth. We have previously characterized protein sequence signatures for three family members in the HRGP superfamily: the hyperglycosylated arabinogalactan-proteins (AGPs), the moderately glycosylated extensins (EXTs), and the lightly glycosylated proline-rich proteins (PRPs). However, the mechanism of pollen-specific HRGP gene expression remains unexplored. To this end, we developed an integrative analysis pipeline combining RNA-seq gene expression and promoter sequences to identify cis-regulatory motifs responsible for pollen-specific expression of HRGP genes in Arabidopsis thaliana. Specifically, we mined public RNA-seq datasets and identified 13 pollen-specific HRGP genes. Ensemble motif discovery identified 15 conserved promoter elements between A. thaliana and A. lyrata. Motif scanning revealed two pollen-related transcription factors: GATA12 and the brassinosteroid (BR) signaling pathway regulator BZR1. Finally, we performed a regression analysis and demonstrated that the 15 motifs provided a good model of HRGP gene expression in pollen (R = 0.61). In conclusion, we performed the first integrative analysis of cis-regulatory motifs in pollen-specific HRGP genes, revealing important insights into transcriptional regulation in pollen tissue.
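
The motif scanning step can be sketched as counting consensus matches in promoter sequences on both strands. The example below is a simplified illustration, not the authors' pipeline: the consensus `WGATAR` is used only as a generic GATA-like pattern, and the promoter sequences and gene names are invented.

```python
import re

# IUPAC nucleotide codes mapped to regular-expression character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "W": "[AT]", "S": "[CG]",
    "K": "[GT]", "M": "[AC]", "N": "[ACGT]",
}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an uppercase DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def count_motif(promoter, consensus):
    """Count consensus matches on both strands, allowing overlaps.

    Palindromic consensus sites would be counted once per strand.
    """
    # A lookahead makes findall report every start position, so
    # overlapping occurrences are not skipped.
    pattern = re.compile("(?=" + "".join(IUPAC[b] for b in consensus) + ")")
    seq = promoter.upper()
    forward = len(pattern.findall(seq))
    reverse = len(pattern.findall(reverse_complement(seq)))
    return forward + reverse


# Hypothetical promoter fragments.
promoters = {
    "geneX": "TTAGATAAGCCTTATCTGG",
    "geneY": "CCCCGGGGCCCC",
}
motif_counts = {gene: count_motif(seq, "WGATAR") for gene, seq in promoters.items()}
print(motif_counts)
```

Per-gene motif counts of this kind could then serve as predictors in a regression against pollen expression, which is the general shape of the analysis the abstract reports.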
