Ranked Choice Voting for Representative Transcripts with TRaCE

Mapping Intimacies ◽ 10.1101/2020.12.15.422742 ◽ 2020

Author(s): Andrew J Olson ◽ Doreen Ware

Keyword(s): Expression Data ◽ Rna Seq ◽ Protein Coding ◽ Alternative Transcripts ◽ Multiple Transcripts ◽ Protein Length ◽ Expression Atlas ◽ Per Gene ◽ Transcript Evidence ◽ Annotate Protein

Genome sequencing projects annotate protein-coding gene models with multiple transcripts, aiming to represent all of the available transcript evidence. However, downstream analyses often operate on only one representative transcript per gene locus, sometimes known as the canonical transcript. To choose canonical transcripts, TRaCE (Transcript Ranking and Canonical Election) holds an 'election' in which a set of RNA-seq samples rank transcripts by annotation edit distance. These sample-specific votes are tallied along with other criteria such as protein length and InterPro domain coverage. The winner is selected as the canonical transcript, but the election proceeds through multiple rounds of voting to order all the transcripts by relevance. Based on the set of expression data provided, TRaCE can identify the most common isoforms from a broad expression atlas or prioritize alternative transcripts expressed in specific contexts.
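
The election described in this abstract can be sketched roughly as follows. This is a minimal illustration, not the TRaCE implementation: the function name `rank_transcripts`, the input layout, and the exact tie-breaking order (vote count, then protein length, then domain coverage) are assumptions made for the example. Each sample votes for the remaining transcript with the lowest annotation edit distance, the round winner is removed, and voting repeats until every transcript has a rank.

```python
from collections import Counter


def rank_transcripts(aed_by_sample, protein_length, domain_coverage):
    """Order the transcripts of one gene by repeated plurality elections.

    aed_by_sample: list of dicts, one per RNA-seq sample, mapping
        transcript id -> annotation edit distance (lower is better).
    protein_length, domain_coverage: dicts keyed by transcript id, used
        here only as tie-breakers (an assumption of this sketch).
    """
    remaining = set(protein_length)
    ranking = []
    while remaining:
        votes = Counter()
        for aed in aed_by_sample:
            candidates = [t for t in remaining if t in aed]
            if candidates:
                # Each sample votes for its best-supported remaining transcript.
                votes[min(candidates, key=lambda t: (aed[t], t))] += 1
        # Most votes wins the round; ties fall back to the other criteria.
        winner = max(
            remaining,
            key=lambda t: (votes[t], protein_length[t], domain_coverage.get(t, 0), t),
        )
        ranking.append(winner)
        remaining.remove(winner)
    return ranking


samples = [
    {"T1": 0.10, "T2": 0.40, "T3": 0.90},
    {"T1": 0.20, "T2": 0.10, "T3": 0.80},
    {"T1": 0.05, "T2": 0.30, "T3": 0.70},
]
lengths = {"T1": 300, "T2": 320, "T3": 150}
coverage = {"T1": 250, "T2": 250, "T3": 100}
print(rank_transcripts(samples, lengths, coverage))
```

The first element of the returned list plays the role of the canonical transcript. Supplying samples from a single tissue or condition instead of a broad atlas would shift the votes toward context-specific isoforms.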

Combining DGE and RNA-sequencing data to identify new polyA+ non-coding transcripts in the human genome

Nucleic Acids Research ◽ 10.1093/nar/gkt1300 ◽ 2013 ◽ Vol 42 (5) ◽ pp. 2820-2832 ◽ Cited By ~ 14

Author(s): Nicolas Philippe ◽ Elias Bou Samra ◽ Anthony Boureux ◽ Alban Mancheron ◽ Florence Rufflé ◽ ...

Keyword(s): Human Genome ◽ Rna Sequencing ◽ Dynamic Range ◽ Tiling Array ◽ Expression Data ◽ Rna Seq ◽ Sequencing Data ◽ Data Set ◽ Protein Coding ◽ Protein Coding Genes

Recent sequencing technologies that allow massive parallel production of short reads are the method of choice for transcriptome analysis. Particularly, digital gene expression (DGE) technologies produce a large dynamic range of expression data by generating short tag signatures for each cell transcript. These tags can be mapped back to a reference genome to identify new transcribed regions that can be further covered by RNA-sequencing (RNA-Seq) reads. Here, we applied an integrated bioinformatics approach that combines DGE tags, RNA-Seq, tiling array expression data and species-comparison to explore new transcriptional regions and their specific biological features, particularly tissue expression or conservation. We analysed tags from a large DGE data set (designated as ‘TranscriRef’). We then annotated 750 000 tags that were uniquely mapped to the human genome according to Ensembl. We retained transcripts originating from both DNA strands and categorized tags corresponding to protein-coding genes, antisense, intronic- or intergenic-transcribed regions and computed their overlap with annotated non-coding transcripts. Using this bioinformatics approach, we identified ∼34 000 novel transcribed regions located outside the boundaries of known protein-coding genes. As demonstrated using sequencing data from human pluripotent stem cells for biological validation, the method could be easily applied for the selection of tissue-specific candidate transcripts. DigitagCT is available at http://cractools.gforge.inria.fr/softwares/digitagct.

The mouse Gene Expression Database (GXD): 2021 update

Nucleic Acids Research ◽ 10.1093/nar/gkaa914 ◽ 2020 ◽ Vol 49 (D1) ◽ pp. D924-D931 ◽ Cited By ~ 1

Author(s): Richard M Baldarelli ◽ Constance M Smith ◽ Jacqueline H Finger ◽ Terry F Hayamizu ◽ Ingeborg J McCright ◽ ...

Keyword(s): Gene Expression ◽ Large Scale ◽ Mouse Genome Informatics ◽ Expression Data ◽ Rna Seq ◽ Gene Expression Database ◽ Heat Map ◽ Developmental Gene Expression ◽ Related Information ◽ Expression Atlas

The Gene Expression Database (GXD; www.informatics.jax.org/expression.shtml) is an extensive and well-curated community resource of mouse developmental gene expression information. For many years, GXD has collected and integrated data from RNA in situ hybridization, immunohistochemistry, RT-PCR, northern blot, and western blot experiments through curation of the scientific literature and by collaborations with large-scale expression projects. Since our last report in 2019, we have continued to acquire these classical types of expression data; developed a searchable index of RNA-Seq and microarray experiments that allows users to quickly and reliably find specific mouse expression studies in ArrayExpress (https://www.ebi.ac.uk/arrayexpress/) and GEO (https://www.ncbi.nlm.nih.gov/geo/); and expanded GXD to include RNA-Seq data. Uniformly processed RNA-Seq data are imported from the EBI Expression Atlas and then integrated with the other types of expression data in GXD, and with the genetic, functional, phenotypic and disease-related information in Mouse Genome Informatics (MGI). This integration has made the RNA-Seq data accessible via GXD’s enhanced searching and filtering capabilities. Further, we have embedded the Morpheus heat map utility into the GXD user interface to provide additional tools for display and analysis of RNA-Seq data, including heat map visualization, sorting, filtering, hierarchical clustering, nearest neighbors analysis and visual enrichment.

Uncertainty in RNA-seq gene expression data

10.1101/445601 ◽ 2018

Author(s): Sonali Arora ◽ Siobhan S. Pattwell ◽ Eric C. Holland ◽ Hamid Bolouri

Keyword(s): Human Tumor ◽ Expression Data ◽ Rna Seq ◽ Sequencing Data ◽ Protein Coding ◽ Disease Biomarkers ◽ Normal Tissues ◽ Protein Coding Genes ◽ Abundance Estimates ◽ Using Data

RNA-sequencing data is widely used to identify disease biomarkers and therapeutic targets. Here, using data from five RNA-seq processing pipelines applied to 6,690 human tumor and normal tissues, we show that for >12% of protein-coding genes, in at least 1% of samples, current best-in-class RNA-seq processing pipelines differ in their abundance estimates by more than four-fold using the same samples and the same set of RNA-seq reads, raising clinical concern.
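
The kind of comparison summarized in this abstract can be illustrated with a small sketch. It is not the authors' analysis code: the function name `fraction_discordant_genes`, the nested-list input layout, and the pseudocount added before taking ratios are all assumptions made for the example. For each gene and sample it takes the ratio of the largest to the smallest abundance estimate across pipelines, flags ratios above a fold threshold, and reports the fraction of genes flagged in at least a given share of samples.

```python
def fraction_discordant_genes(estimates, fold=4.0, min_sample_frac=0.01, pseudocount=1.0):
    """Fraction of genes whose estimates disagree across pipelines.

    estimates: list of matrices, one per pipeline, where
        matrix[gene][sample] is an abundance estimate for the same
        genes and samples in every pipeline.
    pseudocount: added to every value to avoid division by zero
        (an assumption of this sketch, not taken from the abstract).
    """
    n_genes = len(estimates[0])
    n_samples = len(estimates[0][0])
    discordant = 0
    for g in range(n_genes):
        flagged = 0
        for s in range(n_samples):
            values = [matrix[g][s] + pseudocount for matrix in estimates]
            # Largest disagreement between any two pipelines for this gene/sample.
            if max(values) / min(values) > fold:
                flagged += 1
        if flagged / n_samples >= min_sample_frac:
            discordant += 1
    return discordant / n_genes


pipeline_a = [[10, 10], [100, 100]]
pipeline_b = [[10, 99], [100, 110]]
print(fraction_discordant_genes([pipeline_a, pipeline_b]))
```

In this toy input only the first gene shows a more than four-fold disagreement, and only in its second sample.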

Integrated modeling of protein-coding genes in the Manduca sexta genome using RNA-seq data from the biochemical model insect

10.1603/ice.2016.110841 ◽ 2016 ◽ Cited By ~ 1

Author(s): Xiaolong Cao

Keyword(s): Integrated Modeling ◽ Rna Seq ◽ Protein Coding ◽ Protein Coding Genes ◽ Biochemical Model

Reprogramming mRNA Expression in Response to Defect in RNA Polymerase III Assembly in the Yeast Saccharomyces cerevisiae

International Journal of Molecular Sciences ◽ 10.3390/ijms22147298 ◽ 2021 ◽ Vol 22 (14) ◽ pp. 7298

Author(s): Izabela Rudzińska ◽ Małgorzata Cieśla ◽ Tomasz W. Turowski ◽ Alicja Armatowska ◽ Ewa Leśniewska ◽ ...

Keyword(s): Mrna Expression ◽ Ribosome Biogenesis ◽ Rna Polymerase Iii ◽ Mrna Levels ◽ Rna Seq ◽ Yeast Saccharomyces Cerevisiae ◽ Protein Coding ◽ General Transcription Factor ◽ Pol I ◽ Pol Iii

The coordinated transcription of the genome is the fundamental mechanism in molecular biology. Transcription in eukaryotes is carried out by three main RNA polymerases: Pol I, II, and III. One basic problem is how a decrease in tRNA levels, by downregulating Pol III efficiency, influences the expression pattern of protein-coding genes. The purpose of this study was to determine the mRNA levels in the yeast mutant rpc128-1007 and its overdose suppressors, RBS1 and PRT1. The rpc128-1007 mutant prevents assembly of the Pol III complex and functionally mimics similar mutations in human Pol III, which cause hypomyelinating leukodystrophies. We applied RNA-seq followed by the hierarchical clustering of our complete RNA-seq transcriptome and functional analysis of genes from the clusters. mRNA upregulation in rpc128-1007 cells was generally stronger than downregulation. The observed induction of mRNA expression was mostly indirect and resulted from the derepression of general transcription factor Gcn4, differently modulated by suppressor genes. rpc128-1007 mutation, regardless of the presence of suppressors, also resulted in a weak increase in the expression of ribosome biogenesis genes. mRNA genes that were downregulated by the reduction of Pol III assembly comprise the proteasome complex. In summary, our results provide the regulatory links affected by Pol III assembly that contribute differently to cellular fitness.

Differential Expression of BARD1 Isoforms in Melanoma

Genes ◽ 10.3390/genes12020320 ◽ 2021 ◽ Vol 12 (2) ◽ pp. 320

Author(s): Lorissa I. McDougall ◽ Ryan M. Powell ◽ Magdalena Ratajska ◽ Chi F. Lynch-Sutherland ◽ Sultana Mehbuba Hossain ◽ ...

Keyword(s): Long Range ◽ Patient Outcomes ◽ Splice Variants ◽ Significant Proportion ◽ Nanopore Sequencing ◽ Rna Seq ◽ Splice Isoforms ◽ Tissue Samples ◽ Multiple Transcripts ◽ Mrna Variants

Melanoma comprises <5% of cutaneous malignancies, yet it causes a significant proportion of skin cancer-related deaths worldwide. While new therapies for melanoma have been developed, not all patients respond well. Thus, further research is required to better predict patient outcomes. Using long-range nanopore sequencing, RT-qPCR, and RNA sequencing analyses, we examined the transcription of BARD1 splice isoforms in melanoma cell lines and patient tissue samples. Seventy-six BARD1 mRNA variants were identified in total, with several previously characterised isoforms (γ, φ, δ, ε, and η) contributing to a large proportion of the expressed transcripts. In addition, we identified four novel splice events, namely, Δ(E3_E9), ▼(i8), IVS10+131▼46, and IVS10▼176, occurring in various combinations in multiple transcripts. We found that short-read RNA-Seq analyses were limited in their ability to predict isoforms containing multiple non-contiguous splicing events, as compared to long-range nanopore sequencing. These studies suggest that further investigations into the functional significance of the identified BARD1 splice variants in melanoma are warranted.

Non-Coding RNA Signatures of B-Cell Acute Lymphoblastic Leukemia

International Journal of Molecular Sciences ◽ 10.3390/ijms22052683 ◽ 2021 ◽ Vol 22 (5) ◽ pp. 2683

Author(s): Princess D. Rodriguez ◽ Hana Paculova ◽ Sophie Kogut ◽ Jessica Heath ◽ Hilde Schjerven ◽ ...

Keyword(s): Acute Lymphoblastic Leukemia ◽ B Cell ◽ Molecular Mechanisms ◽ Lymphoblastic Leukemia ◽ Transcriptome Profiling ◽ Rna Seq ◽ Protein Coding ◽ Non Coding Rna ◽ Technological Developments ◽ Cell Acute Lymphoblastic Leukemia

Non-coding RNAs (ncRNAs) comprise a diverse class of non-protein coding transcripts that regulate critical cellular processes associated with cancer. Advances in RNA-sequencing (RNA-Seq) have led to the characterization of non-coding RNA expression across different types of human cancers. Through comprehensive RNA-Seq profiling, a growing number of studies demonstrate that ncRNAs, including long non-coding RNA (lncRNAs) and microRNAs (miRNA), play central roles in progenitor B-cell acute lymphoblastic leukemia (B-ALL) pathogenesis. Furthermore, due to their central roles in cellular homeostasis and their potential as biomarkers, the study of ncRNAs continues to provide new insight into the molecular mechanisms of B-ALL. This article reviews the ncRNA signatures reported for all B-ALL subtypes, focusing on technological developments in transcriptome profiling and recently discovered examples of ncRNAs with biologic and therapeutic relevance in B-ALL.

Meta-analysis of RNA-seq expression data across species, tissues and studies

Genome Biology ◽ 10.1186/s13059-015-0853-4 ◽ 2015 ◽ Vol 16 (1) ◽ Cited By ~ 64

Author(s): Peter H. Sudmant ◽ Maria S. Alexis ◽ Christopher B. Burge

Keyword(s): Meta Analysis ◽ Expression Data ◽ Rna Seq

Annotation of snoRNA abundance across human tissues reveals complex snoRNA-host gene relationships

Genome Biology ◽ 10.1186/s13059-021-02391-2 ◽ 2021 ◽ Vol 22 (1)

Author(s): Étienne Fafard-Couture ◽ Danny Bergeron ◽ Sonia Couture ◽ Sherif Abou-Elela ◽ Michelle S. Scott

Keyword(s): Housekeeping Genes ◽ Host Gene ◽ Rna Modification ◽ Human Tissues ◽ Rna Seq ◽ Healthy Human ◽ Protein Coding ◽ Conservation Level ◽ Nucleolar Rnas ◽ Host Genes

Background: Small nucleolar RNAs (snoRNAs) are mid-size non-coding RNAs required for ribosomal RNA modification, implying a ubiquitous tissue distribution linked to ribosome synthesis. However, increasing numbers of studies identify extra-ribosomal roles of snoRNAs in modulating gene expression, suggesting more complex snoRNA abundance patterns. Therefore, there is a great need for mapping the snoRNome in different human tissues as the blueprint for snoRNA functions. Results: We used a low structure bias RNA-Seq approach to accurately quantify snoRNAs and compare them to the entire transcriptome in seven healthy human tissues (breast, ovary, prostate, testis, skeletal muscle, liver, and brain). We identify 475 expressed snoRNAs categorized in two abundance classes that differ significantly in their function, conservation level, and correlation with their host gene: 390 snoRNAs are uniformly expressed and 85 are enriched in the brain or reproductive tissues. Most tissue-enriched snoRNAs are embedded in lncRNAs and display strong correlation of abundance with them, whereas uniformly expressed snoRNAs are mostly embedded in protein-coding host genes and are mainly non- or anticorrelated with them. Fifty-nine percent of the non-correlated or anticorrelated protein-coding host gene/snoRNA pairs feature dual-initiation promoters, compared to only 16% of the correlated non-coding host gene/snoRNA pairs. Conclusions: Our results demonstrate that snoRNAs are not a single homogeneous group of housekeeping genes but include highly regulated tissue-enriched RNAs. Indeed, our work indicates that the architecture of snoRNA host genes varies to uncouple the host and snoRNA expressions in order to meet the different snoRNA abundance levels and functional needs of human tissues.
