Phenotype-tissue expression and exploration (PTEE) resource facilitates the choice of tissue for RNA-seq-based clinical genetics studies

Abstract Background: RNA-seq is emerging as a valuable method for clinical genetics. The transcriptome is “dynamic” and tissue-specific, but the tissues typically probed for analysis (TA) differ from the tissue of interest (TI) based on pathophysiology. Results: We developed Phenotype-Tissue Expression and Exploration (PTEE), a tool to facilitate the decision about the most suitable TA for RNA-seq. We integrated phenotype-annotated genes and used 54 tissues from GTEx to perform correlation analyses and identify expressed genes and transcripts between TAs and TIs. We identified skeletal muscle as the most appropriate TA to inquire for cardiac arrhythmia genes and skin as a good proxy to study neurodevelopmental disorders. We also explored RNA-seq limitations and show that on-off switching of gene expression during ontogenesis or circadian rhythm can cause blind spots for RNA-seq-based analyses. Conclusions: PTEE aids the identification of tissues suitable for RNA-seq for a given pathology to increase the success rate of diagnosis and gene discovery. PTEE is freely available at https://bioinf.eva.mpg.de/PTEE/
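As a rough illustration of the kind of tissue comparison the abstract describes, the sketch below ranks candidate tissues to analyze (TA) against a tissue of interest (TI) for a phenotype gene set. It is not PTEE's implementation: the function names, the log2(TPM + 1) transform, the choice of Pearson correlation, the 1 TPM expression cutoff, and all gene names and expression values are assumptions made up for this example.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def rank_proxy_tissues(expr, tissue_of_interest, genes, min_tpm=1.0):
    """Rank candidate TAs by similarity to the TI over a gene set.

    expr maps tissue -> {gene: median TPM} (hypothetical input layout).
    Returns (tissue, correlation, fraction of TI-expressed genes that are
    also expressed in the TA), sorted by correlation, highest first.
    """
    ti = expr[tissue_of_interest]
    ti_log = [math.log2(ti[g] + 1) for g in genes]
    ti_expressed = [g for g in genes if ti[g] >= min_tpm]
    results = []
    for tissue, profile in expr.items():
        if tissue == tissue_of_interest:
            continue
        ta_log = [math.log2(profile[g] + 1) for g in genes]
        shared = sum(1 for g in ti_expressed if profile[g] >= min_tpm)
        fraction = shared / len(ti_expressed) if ti_expressed else 0.0
        results.append((tissue, pearson(ti_log, ta_log), fraction))
    return sorted(results, key=lambda item: item[1], reverse=True)


# Toy values invented for illustration only; they are not GTEx data.
expr = {
    "heart": {"GENE_A": 50.0, "GENE_B": 20.0, "GENE_C": 5.0, "GENE_D": 0.2},
    "skeletal_muscle": {"GENE_A": 40.0, "GENE_B": 15.0, "GENE_C": 3.0, "GENE_D": 0.1},
    "whole_blood": {"GENE_A": 0.5, "GENE_B": 30.0, "GENE_C": 0.0, "GENE_D": 8.0},
}
ranking = rank_proxy_tissues(expr, "heart", ["GENE_A", "GENE_B", "GENE_C", "GENE_D"])
```

Reporting the fraction of TI-expressed genes detected in the TA alongside the correlation reflects the abstract's point that a proxy tissue is only useful if the genes of interest are actually expressed in it.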


Single-nucleus RNA-seq and FISH identify coordinated transcriptional activity in mammalian myofibers

Nature Communications ◽

10.1038/s41467-020-18789-8 ◽

2020 ◽

Vol 11 (1) ◽

Cited By ~ 2

Author(s):

Matthieu Dos Santos ◽

Stéphanie Backer ◽

Benjamin Saintpierre ◽

Brigitte Izac ◽

Muriel Andrieu ◽

...

Keyword(s):

Gene Expression ◽

Skeletal Muscle ◽

Transcription Factors ◽

Neuromuscular Junction ◽

Heavy Chain ◽

Transcriptional Activity ◽

Rna Seq ◽

Hybrid Fibers ◽

Adult Mice ◽

Single Nucleus

Abstract Skeletal muscle fibers are large syncytia but it is currently unknown whether gene expression is coordinately regulated in their numerous nuclei. Here we show by snRNA-seq and snATAC-seq that slow, fast, myotendinous and neuromuscular junction myonuclei each have different transcriptional programs, associated with distinct chromatin states and combinations of transcription factors. In adult mice, identified myofiber types predominantly express either a slow or one of the three fast isoforms of Myosin heavy chain (MYH) proteins, while a small number of hybrid fibers can express more than one MYH. By snRNA-seq and FISH, we show that the majority of myonuclei within a myofiber are synchronized, coordinately expressing only one fast Myh isoform with a preferential panel of muscle-specific genes. Importantly, this coordination of expression occurs early during post-natal development and depends on innervation. These findings highlight a previously undefined mechanism of coordination of gene expression in a syncytium.


RNA sequencing analysis for profiling activation of cancer-associated molecular pathways.

Journal of Clinical Oncology ◽

10.1200/jco.2019.37.15_suppl.e13032 ◽

2019 ◽

Vol 37 (15_suppl) ◽

pp. e13032-e13032 ◽

Cited By ~ 2

Author(s):

Anton Buzdin ◽

Andrew Garazha ◽

Maxim Sorokin ◽

Alex Glusker ◽

Alexey Aleshin ◽

...

Keyword(s):

Gene Expression ◽

Original Data ◽

Tissue Expression ◽

Molecular Pathways ◽

Sequencing Analysis ◽

Rna Seq ◽

Sequencing Data ◽

Healthy Human ◽

Tissue Samples ◽

Normal Tissues

e13032 Background: Intracellular molecular pathways (IMPs) control all major events in the living cell. They are considered hotspots in contemporary oncology because knowledge of IMP activation is essential for understanding mechanisms of molecular pathogenesis in oncology. Profiling IMPs requires RNA-seq data for tumors and for a collection of reference normal tissues. However, there is currently a shortage of such profiles for normal tissues from healthy human donors, uniformly profiled in a single series of experiments. GTEx, the largest dataset of normal profiles, is only partly accessible through dbGaP. In the TCGA database, norms are adjacent to surgically removed tumors and may be affected by tumor-linked growth factors, inflammation and altered vascularization. ENCODE datasets were obtained from autopsies of normal tissues, but they can’t form statistically significant reference groups. Methods: Tissue samples representing 20 organs were taken from healthy human donors killed in road accidents, no later than 36 hours after death; blood samples were taken from healthy volunteers. Gene expression was profiled in RNA-seq experiments using the same reagents, equipment and protocols. Bioinformatic algorithms for IMP analysis were developed and validated using experimental and public gene expression datasets. Results: From original sequencing data we constructed the biggest fully open reference expression database of normal human tissues, including 465 profiles, termed Oncobox Atlas of Normal Tissue Expression (ANTE, original data: GSE120795). We next developed a method termed Oncobox for interrogating activation of IMPs in human cancers. It includes modules of expression data harmonization and comparison and an algorithm for automatic annotation of molecular pathways. The Oncobox system enables accurate scoring of thousands of molecular pathways using RNA-seq data.
Oncobox pathway analysis is also applicable for quantitative proteomics and microRNA data in oncology. Conclusions: The Oncobox system can be used for a plethora of applications in cancer research including finding differentially regulated genes and IMPs, and for discovery of new pathway-related diagnostic and prognostic biomarkers.


Basal Contamination of Sequencing: Lessons from the GTEx dataset

10.1101/602367 ◽

2019 ◽

Author(s):

Tim O. Nieuwenhuis ◽

Stephanie Yang ◽

Rohan X. Verma ◽

Vamsee Pillalamarri ◽

Dan E. Arking ◽

...

Keyword(s):

Gene Expression ◽

Tissue Expression ◽

Rna Seq ◽

Data Set ◽

Low Level ◽

Allelic Differences ◽

Highly Expressed Genes ◽

Next Generation Sequencing Ngs ◽

Cell Data ◽

Generation Sequencing

Abstract One of the challenges of next generation sequencing (NGS) is read contamination. We used the Genotype-Tissue Expression (GTEx) project, a large, diverse, and robustly generated dataset, to understand the factors that contribute to contamination. We obtained GTEx datasets and technical metadata and validating RNA-Seq from other studies. Of 48 analyzed tissues in GTEx, 26 had variant co-expression clusters of four known highly expressed and pancreas-enriched genes (PRSS1, PNLIP, CLPS, and/or CELA3A). Fourteen additional highly expressed genes from other tissues also indicated contamination. Sample contamination by non-native genes was associated with a sample being sequenced on the same day as a tissue that natively expressed those genes. This was highly significant for pancreas and esophagus genes (linear model, p=9.5e-237 and p=5e-260 respectively). Nine SNPs in four genes shown to contaminate non-native tissues demonstrated allelic differences between DNA-based genotypes and contaminated sample RNA-based genotypes, validating the contamination. Low-level contamination affected 4,497 (39.6%) samples (defined as ≥10 PRSS1 TPM). It also led to eQTL assignments in inappropriate tissues among these 18 genes. We note this type of contamination occurs widely, impacting bulk and single cell data set analysis. In conclusion, highly expressed, tissue-enriched genes basally contaminate GTEx and other datasets, impacting analyses. Awareness of this process is necessary to avoid assigning inaccurate importance to low-level gene expression in inappropriate tissues and cells.
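The low-level contamination criterion in the abstract can be pictured as a simple filter. The sketch below flags samples from tissues other than pancreas whose PRSS1 expression reaches 10 TPM, reading the stated definition as "at least 10 PRSS1 TPM". The sample layout, function name, and all values are hypothetical and are not taken from the authors' pipeline or from GTEx.

```python
def flag_low_level_contamination(samples, marker="PRSS1",
                                 native_tissue="Pancreas", threshold_tpm=10.0):
    """Return IDs of samples from non-native tissues whose marker TPM meets the threshold.

    samples is a list of dicts with keys "id", "tissue" and "tpm"
    (gene -> TPM); this layout is an assumption for illustration.
    """
    flagged = []
    for sample in samples:
        if sample["tissue"] == native_tissue:
            continue  # native expression of the marker is expected here
        if sample["tpm"].get(marker, 0.0) >= threshold_tpm:
            flagged.append(sample["id"])
    return flagged


# Invented example values, not real measurements.
samples = [
    {"id": "S1", "tissue": "Pancreas", "tpm": {"PRSS1": 25000.0}},
    {"id": "S2", "tissue": "Lung", "tpm": {"PRSS1": 12.5}},
    {"id": "S3", "tissue": "Lung", "tpm": {"PRSS1": 0.3}},
    {"id": "S4", "tissue": "Esophagus", "tpm": {"PRSS1": 10.0}},
]
flagged = flag_low_level_contamination(samples)
```

A filter like this only marks candidates; the abstract's own validation relied on additional evidence such as sequencing date and allelic differences between DNA-based and RNA-based genotypes.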


PL-008 Adaptation of skeletal muscle to aerobic exercise: specific transcriptome response to acute exercise and training

Exercise Biochemistry Review ◽

10.14428/ebr.v1i1.8113 ◽

2018 ◽

Vol 1 (1) ◽

Author(s):

Daniil Popov ◽

Pavel Makhnovskii ◽

Evgeny Lysenko ◽

Olga Vinogradova

Keyword(s):

Gene Expression ◽

Skeletal Muscle ◽

Circadian Rhythm ◽

Human Skeletal Muscle ◽

Acute Exercise ◽

Aerobic Training ◽

Contractile Activity ◽

Training Programme ◽

Specific Response ◽

Transcriptome Response

Objective: A variety of processes, including circadian rhythm and systemic factors, affect the expression of many genes in skeletal muscle over the course of a day. Therefore, post-exercise gene expression depends on many factors: contractile activity per se as well as circadian rhythm, nerve activity, concentration of different substances in blood, feeding and fasting. In our study, we investigated changes in the transcriptome specific to contractile activity in untrained and trained (after an aerobic training programme) human skeletal muscle. The second goal was to examine the effect of aerobic training on gene expression in muscle in the basal state. Methods: Seven untrained males performed the one-legged knee extension exercise (for 60 min) with the same relative intensity before and after a 2-month aerobic training programme (1 h/day, 5/week). Biopsy samples were taken at rest (basal state, 48 h after the previous exercise) and 1 and 4 h after one-legged exercise from m. vastus lateralis of either leg. This approach allowed us to evaluate specific changes in the transcriptome associated with contractile activity. RNA sequencing (84 samples in total; ~42 million reads/sample) was performed on a HiSeq 2500 (Illumina). Results: Two months of aerobic training increased the aerobic capacity of the knee-extensor muscles (power at anaerobic threshold in incremental one-legged and cycling tests), the maximum rate of ADP-stimulated mitochondrial respiration in permeabilized muscle fibres and amounts of oxidative phosphorylation proteins. After one-legged exercise, expression of many genes was changed in exercised muscle (~1500) as well as in non-exercised muscle (~400). Pronounced changes in gene expression in non-exercised muscle may be associated with many factors, including circadian rhythm (result of GO analysis). To examine transcriptome changes specific to contractile activity, the difference in gene expression between legs was examined.
In untrained muscle, one-legged exercise changed expression of ~1200 genes specific to contractile activity at each time point. Despite the same relative intensity of one-legged exercise, the transcriptomic response in trained muscle was markedly lower (~300 genes) compared to untrained muscle. We observed a strong overlap between transcriptomic responses (~250 genes) and particularly between enriched transcription factor binding sites in promoters of these genes in untrained and trained muscles. These sets of genes and transcription factors play the key role in adaptation of muscle to contractile activity independently of the level of muscular fitness. Surprisingly, 2 months of aerobic training changed the expression of more than 1500 genes in the basal state. Notably, these genes demonstrated a small overlap (~200 genes) with genes related to the specific response to acute exercise. Moreover, these genes were associated with significantly different biological processes than genes related to the specific response to acute exercise. Conclusions: Changes in the transcriptome specific to contractile activity in untrained and trained human skeletal muscle were revealed for the first time. After 2 months of aerobic training, the specific transcriptome response to acute exercise became much less pronounced. A computational approach reveals common transcription factors important for adaptation of both untrained and trained muscle. We found that adaptation of muscle to aerobic training is associated not only with the transitory changes in gene expression after each exercise, but also with marked changes in the transcriptome in the basal state. This work was supported by the Russian Science Foundation (141500768).


Genetic Variants Associated mRNA Stability in Lung

10.21203/rs.3.rs-770241/v1 ◽

2021 ◽

Author(s):

Jian-Rong Li ◽

Mabel Tang ◽

Yafang Li ◽

Christopher I Amos ◽

Chao Cheng

Keyword(s):

Gene Expression ◽

Mrna Stability ◽

Genetic Variants ◽

Molecular Mechanisms ◽

Rna Binding ◽

Tissue Expression ◽

Rna Seq ◽

Genetic Traits ◽

Expression Levels ◽

Gene Expression Levels

Abstract Background: Expression quantitative trait loci (eQTL) analyses have been widely used to identify genetic variants associated with gene expression levels to understand what molecular mechanisms underlie genetic traits. The resultant eQTLs might affect the expression of associated genes through transcriptional or post-transcriptional regulation. In this study, we attempt to distinguish these two types of regulation by identifying genetic variants associated with mRNA stability of genes (stQTLs). Results: Here, we present a computational framework that takes advantage of recently developed methods to infer the mRNA stability of genes based on RNA-seq data and performs association analysis to identify stQTLs. Using the Genotype-Tissue Expression (GTEx) lung RNA-Seq data, we identified a total of 142,801 stQTLs for 3,942 genes and 186,132 eQTLs for 4,751 genes from 15,122,700 genetic variants for 13,476 genes, respectively. Interestingly, our results indicated that stQTLs were enriched in the CDS and 3’UTR regions, while eQTLs were enriched in the CDS, 3’UTR, 5’UTR, and upstream regions. We also found that stQTLs are more likely than eQTLs to overlap with RNA binding protein (RBP) and microRNA (miRNA) binding sites. Our analyses demonstrate that simultaneous identification of stQTLs and eQTLs can provide more mechanistic insight into the association between genetic variants and gene expression levels.


RNA sequencing (RNA-seq) analysis of gene expression provides new insights into hindlimb unloading-induced skeletal muscle atrophy

Annals of Translational Medicine ◽

10.21037/atm-20-7400 ◽

2020 ◽

Vol 8 (23) ◽

pp. 1595-1595

Author(s):

Qihao Cui ◽

Hua Yang ◽

Yuming Gu ◽

Chenyu Zong ◽

Xin Chen ◽

...

Keyword(s):

Gene Expression ◽

Skeletal Muscle ◽

Rna Sequencing ◽

Muscle Atrophy ◽

Hindlimb Unloading ◽

Skeletal Muscle Atrophy ◽

Rna Seq


Mechanical Stress Affects Circadian Rhythm in Skeletal Muscle (C2C12 Myoblasts) by Reducing Per/Cry Gene Expression and Increasing Bmal1 Gene Expression

Medical Science Monitor ◽

10.12659/msm.928359 ◽

2021 ◽

Vol 27 ◽

Author(s):

Mengjia Wang ◽

Da Yu ◽

Lichun Zheng ◽

Bing Hong ◽

Houxuan Li ◽

...

Keyword(s):

Gene Expression ◽

Skeletal Muscle ◽

Circadian Rhythm ◽

Mechanical Stress ◽

C2c12 Myoblasts ◽

Cry Gene


Normalization of Gene Expression Data Revisited: The Three Viewpoints of the Transcriptome in Human Skeletal Muscle Undergoing Load-induced Hypertrophy and Why They Matter

10.21203/rs.3.rs-1008326/v1 ◽

2021 ◽

Author(s):

Yusuf Khan ◽

Daniel Hammarström ◽

Stian Ellefsen ◽

Rafi Ahmad

Keyword(s):

Gene Expression ◽

Skeletal Muscle ◽

Sample Size ◽

Gene Expression Data ◽

Human Skeletal Muscle ◽

Expression Data ◽

Rna Seq ◽

Library Size ◽

Total Rna ◽

Cellular Models

Abstract Background: The biological relevance and accuracy of gene expression data depend on the adequacy of data normalization. This is due both to its role in resolving and accounting for technical variation and errors, and to its defining role in shaping the viewpoint of biological interpretations. Still, normalization is often treated in a serendipitous manner. This is especially true for the viewpoint perspective, which may be particularly decisive for conclusions in studies involving pronounced cellular plasticity. In this study, we highlight the consequences of using three fundamentally different modes of normalization for interpreting RNA-seq data from human skeletal muscle undergoing exercise-training-induced growth. Briefly, 25 participants conducted 12 weeks of high-load resistance training. Muscle biopsy specimens were sampled from m. vastus lateralis before, after two weeks of training (week 2) and after the intervention (week 12), and were subsequently analyzed using RNA-seq. Transcript counts were modeled as i) per-library-size, ii) per-total-RNA, and iii) per-sample-size (per-mg-tissue). Results: Initially, the three modes of transcript modeling led to the identification of three unique sets of stable genes, which displayed differential expression profiles. Specifically, genes showing stable expression across samples in the per-library-size dataset displayed training-associated increases in per-total-RNA and per-sample-size datasets. These gene sets were then used for normalization of the entire dataset, providing transcript abundance estimates corresponding to each of the three biological viewpoints (i.e., per-library-size, per-total-RNA, and per-sample-size). The different normalization modes led to different conclusions, measured as training-associated changes in transcript expression.
Briefly, for 28% and 24% of the transcripts, training was associated with changes in expression in per-total-RNA and per-sample-size scenarios, but not in the per-library-size scenario. At week 2, this led to opposite conclusions for 5% of the transcripts between per-library-size and per-sample-size datasets (↑ vs. ↓, respectively). Conclusion: Scientists should be explicit with their choice of normalization strategies and should interpret the results of gene expression analyses with caution. This is particularly important for data sets involving a limited number of genes or involving growing or differentiating cellular models, where the risk of biased conclusions is pronounced.


Diagnosing Cornelia de Lange syndrome and related neurodevelopmental disorders using RNA-sequencing

10.1101/19008300 ◽

2019 ◽

Author(s):

Stefan Rentas ◽

Komal S. Rathi ◽

Maninder Kaur ◽

Pichai Raman ◽

Ian D. Krantz ◽

...

Keyword(s):

Rna Sequencing ◽

Neurodevelopmental Disorders ◽

Diagnostic Testing ◽

Tissue Expression ◽

Cornelia De Lange Syndrome ◽

Rna Seq ◽

Genetic Syndromes ◽

Gene Testing ◽

Mendelian Gene ◽

Cornelia De Lange

Abstract Purpose: Neurodevelopmental phenotypes represent major indications for children undergoing clinical exome sequencing. However, 50% of cases remain undiagnosed even upon exome reanalysis. Here we show RNA sequencing (RNA-seq) on human B lymphoblastoid cell lines (LCL) is highly suitable for neurodevelopmental Mendelian gene testing and demonstrate the utility of this approach in suspected cases of Cornelia de Lange syndrome (CdLS). Methods: Genotype-Tissue Expression project transcriptome data for LCL, blood, and brain were assessed for neurodevelopmental Mendelian gene expression. Detection of abnormal splicing and pathogenic variants in these genes was performed with a novel RNA-seq diagnostic pipeline and using a validation CdLS-LCL cohort (n=10) and test cohort of patients who carry a clinical diagnosis of CdLS but negative genetic testing (n=5). Results: LCLs share isoform diversity of brain tissue for a large subset of neurodevelopmental genes and express 1.8-fold more of these genes compared to blood (LCL, n=1706; whole blood, n=917). This enables testing of over 1000 genetic syndromes. The RNA-seq pipeline had 90% sensitivity for detecting pathogenic events and revealed novel diagnoses such as abnormal splice products in NIPBL and pathogenic coding variants in BRD4 and ANKRD11. Conclusion: The LCL transcriptome enables robust frontline and/or reflexive diagnostic testing for neurodevelopmental disorders.


Robust normalization and transformation techniques for constructing gene coexpression networks from RNA-seq data

10.1101/2020.09.22.308577 ◽

2020 ◽

Author(s):

Kayla A Johnson ◽

Arjun Krishnan

Keyword(s):

Gene Expression ◽

Tissue Expression ◽

Specific Gene ◽

Expression Data ◽

Rna Seq ◽

Functional Relationships ◽

Gene Coexpression ◽

Network Transformation ◽

Coexpression Networks ◽

The Impact

Abstract Background: Constructing gene coexpression networks is a powerful approach for analyzing high-throughput gene expression data towards module identification, gene function prediction, and disease-gene prioritization. While optimal workflows for constructing coexpression networks (including good choices for data pre-processing, normalization, and network transformation) have been developed for microarray-based expression data, such well-tested choices do not exist for RNA-seq data. Almost all studies that compare data processing/normalization methods for RNA-seq focus on the end goal of determining differential gene expression. Results: Here, we present a comprehensive benchmarking and analysis of 30 different workflows, each with a unique set of normalization and network transformation methods, for constructing coexpression networks from RNA-seq datasets. We tested these workflows on both large, homogeneous datasets (Genotype-Tissue Expression project) and small, heterogeneous datasets from various labs (submitted to the Sequence Read Archive). We analyzed the workflows in terms of aggregate performance, individual method choices, and the impact of multiple dataset experimental factors. Our results demonstrate that between-sample normalization has the biggest impact, with trimmed mean of M-values or upper quartile normalization producing networks that most accurately recapitulate known tissue-naive and tissue-specific gene functional relationships. Conclusions: Based on this work, we provide concrete recommendations on robust procedures for building an accurate coexpression network from an RNA-seq dataset. In addition, researchers can examine all the results in great detail at https://krishnanlab.github.io/norm_for_RNAseq_coexp to make appropriate choices for coexpression analysis based on the experimental factors of their RNA-seq dataset.
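To make the between-sample normalization step named in the abstract more concrete, here is a minimal sketch of upper quartile scaling, one of the two methods the abstract highlights. It is a simplified illustration and not the authors' benchmarked workflow: the function names, the choice to compute the 75th percentile over nonzero counts with linear interpolation, the rescaling to the mean upper quartile, and the toy counts are all assumptions of this example.

```python
def upper_quartile(values):
    """75th percentile (linear interpolation) of the nonzero values."""
    nonzero = sorted(v for v in values if v > 0)
    if not nonzero:
        raise ValueError("sample has no nonzero counts")
    position = 0.75 * (len(nonzero) - 1)
    lower = int(position)
    upper = min(lower + 1, len(nonzero) - 1)
    return nonzero[lower] + (position - lower) * (nonzero[upper] - nonzero[lower])


def upper_quartile_normalize(counts):
    """Scale each sample so that all samples share the same upper quartile.

    counts maps sample name -> list of per-gene counts (same gene order in
    every sample). Each sample is divided by its own upper quartile and
    multiplied by the mean upper quartile across samples.
    """
    quartiles = {sample: upper_quartile(c) for sample, c in counts.items()}
    target = sum(quartiles.values()) / len(quartiles)
    return {
        sample: [v * target / quartiles[sample] for v in c]
        for sample, c in counts.items()
    }


# Toy counts invented for illustration: sample_2 is sample_1 sequenced twice as deeply.
counts = {
    "sample_1": [0, 10, 20, 30, 40],
    "sample_2": [0, 20, 40, 60, 80],
}
normalized = upper_quartile_normalize(counts)
```

In this toy case the two samples differ only in sequencing depth, so the scaled profiles coincide; with real data the scaling removes depth differences while leaving biological differences between samples in place.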
