Tissue heterogeneity is prevalent in gene expression studies

2021, Vol 3 (3)
Author(s): Gregor Sturm, Markus List, Jitao David Zhang

Abstract Lack of reproducibility in gene expression studies is a serious issue being actively addressed by the biomedical research community. Besides established factors such as batch effects and incorrect sample annotations, we recently reported tissue heterogeneity, a consequence of unintentionally profiling cells that originate from tissues other than the tissue of interest, as a source of variance. Although tissue heterogeneity exacerbates irreproducibility, its prevalence in gene expression data remains unknown. Here, we systematically analyse 2 667 publicly available gene expression datasets covering 76 576 samples. Using two independent data compendia and a reproducible, open-source software pipeline, we find that tissue heterogeneity affects between 1 and 40% of the samples, depending on the tissue type. We discover both cases of severe heterogeneity, which may be caused by mistakes in annotation or sample handling, and cases of moderate heterogeneity, which are likely caused by tissue infiltration or sample contamination. Our analysis establishes tissue heterogeneity as a widespread phenomenon in publicly available gene expression datasets and an important source of variance that should not be ignored. Consequently, we advocate the application of quality-control methods such as BioQC to detect tissue heterogeneity prior to mining or analysing gene expression data.
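To our understanding, BioQC scores each sample by comparing the expression of a tissue signature's genes against all other genes in that sample with a Wilcoxon-Mann-Whitney test. The function below is a minimal pure-Python sketch of that idea for a single sample, reporting the Mann-Whitney U statistic scaled to [0, 1]; it is not BioQC's implementation, and the function names, gene names and values are hypothetical.

```python
from typing import Dict, List, Set


def rank_with_ties(values: List[float]) -> List[float]:
    """Return 1-based ranks, giving tied values the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of 1-based positions i..j
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def signature_auc(sample: Dict[str, float], signature: Set[str]) -> float:
    """Mann-Whitney U of signature genes vs. background genes, scaled to [0, 1].

    0.5 means no enrichment; values near 1 mean the signature genes are
    among the most highly expressed genes in this sample.
    """
    genes = list(sample)
    ranks = rank_with_ties([sample[g] for g in genes])
    in_sig = [g in signature for g in genes]
    n1 = sum(in_sig)
    n2 = len(genes) - n1
    if n1 == 0 or n2 == 0:
        raise ValueError("signature must split genes into two non-empty groups")
    rank_sum = sum(r for r, s in zip(ranks, in_sig) if s)
    u = rank_sum - n1 * (n1 + 1) / 2
    return u / (n1 * n2)
```

For example, `signature_auc({"PRSS1": 9.0, "CPA1": 8.5, "ACTB": 7.0, "GAPDH": 6.5, "ALB": 2.0, "TTR": 1.0}, {"PRSS1", "CPA1"})` returns 1.0, because both hypothetical pancreas markers outrank every background gene. Scanning many tissue signatures per sample and flagging unexpectedly high scores for tissues other than the annotated one is the kind of heterogeneity check the abstract describes.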

2020
Author(s): Gregor Sturm, Markus List, Jitao David Zhang

Background: Lack of reproducibility in gene expression studies has recently attracted much attention within and beyond the biomedical research community. Previous efforts have identified many underlying factors, such as batch effects and incorrect sample annotations. Recently, tissue heterogeneity, a consequence of unintentionally profiling cells that originate from tissues other than the tissue of interest, was proposed as a source of variance that exacerbates irreproducibility and is commonly ignored. Results: Here, we systematically analyzed 2,692 publicly available gene expression datasets, including 78,332 samples, for tissue heterogeneity. We found that tissue heterogeneity affects on average 5-15% of the samples, depending on the tissue type. We distinguish cases of severe heterogeneity, which may be caused by mistakes in annotation or sample handling, from cases of moderate heterogeneity, which are more likely caused by tissue infiltration or sample contamination. Conclusions: Tissue heterogeneity is a widespread issue in publicly available gene expression datasets and thus an important source of variance that should not be ignored. We advocate the application of quality control methods such as BioQC to detect tissue heterogeneity prior to mining or analyzing gene expression data.


2019, Vol 16 (3)
Author(s): Nimisha Asati, Abhinav Mishra, Ankita Shukla, Tiratha Raj Singh

Abstract Gene expression studies have revealed a large degree of variability in gene expression patterns, particularly in tissues, even among genetically identical individuals. This variability can help reveal the components that fluctuate most during disease. Many microarray studies of prostate cancer have since been conducted, but their results vary across studies. To better understand the genetic and biological regulatory mechanisms of prostate cancer, we conducted a meta-analysis of three major pathways in prostate cancer, i.e. the androgen receptor (AR), mechanistic target of rapamycin (mTOR) and mitogen-activated protein kinase (MAPK) pathways. The meta-analysis used human gene expression data from prostate cancer. Twelve datasets covering the AR, mTOR and MAPK pathways were analysed, and the meta-analysis identified thirteen potential biomarkers. These findings were based on quantitative data analysis using several tools. We also found various interconnections among the pathways studied. Our study suggests that microarray analysis of gene expression data, together with pathway-level connections, allows detection of potential predictors that may prove to be putative therapeutic targets with biological and functional significance in the progression of prostate cancer.
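The abstract does not state how evidence from the twelve datasets was combined. Purely as an illustration of one common way to combine per-dataset p-values for a single gene, here is Fisher's method; it is a swapped-in technique, not necessarily the one the authors used, and it assumes the datasets are independent.

```python
import math
from typing import Sequence


def fisher_combined_p(p_values: Sequence[float]) -> float:
    """Combine independent p-values with Fisher's method.

    X = -2 * sum(ln p_i) follows a chi-square distribution with 2k degrees
    of freedom under the null; for even df the upper tail has a closed form:
    P(chi2_2k > X) = exp(-X/2) * sum_{i=0}^{k-1} (X/2)^i / i!
    """
    if not p_values:
        raise ValueError("need at least one p-value")
    if any(not (0.0 < p <= 1.0) for p in p_values):
        raise ValueError("p-values must lie in (0, 1]")
    k = len(p_values)
    half_x = -sum(math.log(p) for p in p_values)  # this is X / 2
    term = 1.0   # (X/2)^0 / 0!
    total = 1.0
    for i in range(1, k):
        term *= half_x / i  # builds (X/2)^i / i! incrementally
        total += term
    return math.exp(-half_x) * total
```

With a single p-value the function returns that p-value unchanged; two moderate p-values of 0.5 combine to roughly 0.60, whereas consistently small p-values such as 0.01, 0.02 and 0.03 combine to well below 0.01.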


Entropy, 2021, Vol 23 (8), pp. 945
Author(s): Samarendra Das, Shesh N. Rai

Genome-wide expression profiling is a powerful genomic technology for quantifying the expression dynamics of genes across a genome. In gene expression studies, gene set analysis has become the method of choice for gaining insight into the biology underlying diseases or stresses in plants. It also reduces the complexity of the statistical analysis and increases the explanatory power of the results obtained from differential expression analysis. Gene set analysis approaches are well developed for microarray and RNA-seq gene expression data. These approaches mainly analyze gene sets defined by Gene Ontology or pathway annotations. However, in plant biology such methods may not establish any formal relationship between genotype and phenotype, because most traits are quantitative and polygenic. Existing Quantitative Trait Loci (QTL)-based gene set analysis approaches rely only on over-representation analysis of selected genes and ignore the associated gene scores. We therefore developed a new statistical approach, GSQSeq, to analyze gene sets using trait-enriched QTL data. This approach takes the genes' differential expression scores into account when analyzing gene sets. The method's performance was tested on five gene expression datasets from real crop gene expression studies. Our results indicate that trait-specific gene set analysis was more robust and successful with the proposed approach than with existing techniques. The method also provides a valuable platform for integrating gene expression data with QTL data.
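The abstract contrasts over-representation analysis, which only asks whether selected genes fall in a set, with approaches that use each gene's differential expression score. GSQSeq's own statistic is not described here, so the sketch below shows only that general idea: a competitive test of one gene set (for example, genes in a trait-enriched QTL region) whose statistic is the mean gene score, with a gene-sampling permutation null. The statistic, null model and names are assumptions for illustration.

```python
import random
from statistics import mean
from typing import Dict, Set


def gene_set_permutation_p(scores: Dict[str, float], gene_set: Set[str],
                           n_perm: int = 10000, seed: int = 0) -> float:
    """Competitive gene-set test using gene-level scores (illustrative only).

    Statistic: mean score of the set's genes.
    Null: mean score of randomly drawn gene sets of the same size.
    """
    members = [g for g in gene_set if g in scores]
    if not members:
        raise ValueError("no set genes have scores")
    observed = mean(scores[g] for g in members)
    all_scores = list(scores.values())
    rng = random.Random(seed)  # seeded for reproducibility
    hits = 0
    for _ in range(n_perm):
        if mean(rng.sample(all_scores, len(members))) >= observed:
            hits += 1
    # Add-one correction keeps the estimated p-value away from exactly zero.
    return (hits + 1) / (n_perm + 1)
```

Because every gene contributes its score, a set whose genes are only moderately but consistently differentially expressed can still stand out, which a hard selected/not-selected cut would miss.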


Plants, 2021, Vol 10 (7), pp. 1272
Author(s): Judit Tajti, Magda Pál, Tibor Janda

Oat (Avena sativa L.) is a widely cultivated cereal with high nutritional value, grown mainly in temperate regions. The number of studies dealing with gene expression changes in oat continues to increase, and to obtain reliable RT-qPCR results it is essential to establish and use reference genes that are as little influenced by experimental conditions as possible. However, no detailed study has been conducted on reference genes in different tissues of oat under diverse abiotic stress conditions. In our work, nine candidate reference genes (ACT, TUB, CYP, GAPD, UBC, EF1, TBP, ADPR, PGD) were chosen and analysed by four statistical methods (geNorm, NormFinder, BestKeeper, RefFinder). Samples were taken from two tissues (leaves and roots) of 13-day-old oat plants exposed to five abiotic stresses (drought, salt, heavy metal, low and high temperatures). ADPR was the top-rated reference gene across all samples, while different genes proved to be the most stable depending on the tissue type and treatment combination. In general, TUB and EF1 were the most affected by the treatments. Validation of the reference genes was carried out by PAL expression analysis, which further confirmed their reliability. These results can contribute to reliable gene expression studies in cultivated oat in future research.
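One of the four methods named above, geNorm, ranks candidate reference genes by a stability measure M: for each gene, the average of its pairwise variations with every other candidate, where the pairwise variation is the standard deviation of the two genes' log2 expression ratios across samples. A minimal sketch, assuming linear-scale relative quantities and the sample standard deviation; it omits geNorm's stepwise exclusion of the least stable gene, and the toy values in the usage note are invented.

```python
import math
from statistics import stdev
from typing import Dict, List


def genorm_m(expr: Dict[str, List[float]]) -> Dict[str, float]:
    """geNorm-style stability measure M per gene (lower = more stable).

    expr maps gene -> relative expression quantities (linear scale) for the
    same samples in the same order; at least two samples are required.
    """
    genes = list(expr)
    if len(genes) < 2:
        raise ValueError("need at least two candidate genes")
    m = {}
    for j in genes:
        pairwise = []
        for k in genes:
            if k == j:
                continue
            # Pairwise variation: SD of log2(expr_j / expr_k) across samples.
            log_ratios = [math.log2(a / b) for a, b in zip(expr[j], expr[k])]
            pairwise.append(stdev(log_ratios))
        m[j] = sum(pairwise) / len(pairwise)
    return m
```

With the invented profiles `{"A": [1, 2, 4, 8], "B": [2, 4, 8, 16], "C": [1, 1, 1, 1]}`, genes A and B keep a constant ratio and therefore share the lowest M, while C receives the highest M because its ratio to both other genes varies across samples.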


BMC Genomics, 2017, Vol 18 (1)
Author(s): Jitao David Zhang, Klas Hatje, Gregor Sturm, Clemens Broger, Martin Ebeling, ...

Blood, 2011, Vol 118 (21), pp. 2779-2779
Author(s): Naomi Galili, Pablo Tamayo, Olga B Botvinnik, Jill P Mesirov, Jennifer Zikria, ...

Abstract 2779 Interpretation of gene expression studies in MDS has been especially challenging due to the heterogeneity of the cell lineages that comprise the malignant clone. In attempting to overcome these difficulties, we have used a bedside-to-bench approach to define an expression signature that may identify patients likely to respond. Ezatiostat hydrochloride (TLK199) is an inhibitor of glutathione S-transferase, an enzyme that is over-expressed in many cancers, and has been shown in vitro to stimulate growth and differentiation of hematopoietic progenitor cells and to induce apoptosis in leukemia cells. Based on multilineage responses in low-Int1 MDS patients in our phase 2 study of oral TLK199, a multi-institutional phase 2 study was conducted in low-Int1 patients. Response was evaluated by International Working Group (IWG 2006) criteria. With IRB approval, pre-therapy bone marrow mononuclear cells of patients treated with TLK199 were analyzed for gene expression on the Illumina HT12v4 whole-genome array. RNA isolated from the marrow mononuclear cells was available for 9 responders (R) and 21 non-responders (NR). Five R and 13 NR were randomly chosen to create a training set, with the intent to use the remaining samples later for model testing. We identified the top 100 differentially expressed genes using a sensitive metric based on normalized mutual information. We also performed single-sample Gene Set Enrichment Analysis to find the most salient differences in pathways and biological processes between R and NR. Of special note are the 4 microRNAs differentially expressed between R and NR. Three miRNAs are under-expressed (miR-129, miR-802 and miR-548e) and one (miR-155) is over-expressed in R. Reduced expression of miR-129 has been reported in solid tumors, and over-expression of miR-129 has been shown to have anti-proliferative activity in cell lines.
SOX4 is a target gene of miR-129, and reduced expression of miR-129 results in concomitant up-regulation of SOX4 mRNA; SOX4 can function as both an oncogene and a tumor suppressor gene depending on tumor lineage. Over-expression of SOX4 inhibited cytokine-induced granulocyte maturation in the myeloid 32Dcl3 cell line, suggesting a possible role in MDS. MiR-802 targets the receptor for angiotensin II, and when its expression is decreased, angiotensin II activity increases. It has recently been shown that angiotensin is a pro-inflammatory mediator that participates in apoptosis and angiogenesis and promotes mitochondrial dysfunction, all characteristics of MDS. In addition, the transcription factor ZFHX3, a predicted target of miR-802, is a negative regulator of c-MYB, which has been shown to be up-regulated in all subtypes of MDS. Similarly, c-MYB is a predicted target of miR-155, which is over-expressed in TLK199 responders. MiR-155 was shown to be over-expressed in marrow cells of a subset of human AML patients. Of particular note are the studies showing that sustained expression of miR-155 in mouse hematopoietic stem cells causes a myeloproliferative/myelodysplastic disorder. Subsequent pathway analysis of these expression data revealed that a JNK gene set, as defined from the GEO dataset GDSS8081, was consistently under-expressed in responders and over-expressed in non-responders. TLK199 has been shown to induce JUN/JNK by binding to glutathione S-transferase, a key inhibitor of this pathway. The expression data confirm that patients whose pre-therapy marrow shows under-expression of the JNK gene set are precisely those who benefit from this drug therapy, and that patients who already over-express these genes are unlikely to respond.
This study highlights two important points: 1) using a bedside-to-bench strategy yielded a signature that distinguished responders from non-responders; and 2) the signature identified genes and signaling pathways that shed light on both the biology of the disease and the mechanism of action of the drug. In conclusion, if these results are confirmed in the test set, we will use the signature in a future prospective study to preselect MDS patients for therapy with this promising drug. Disclosures: Brown: Telik, Inc.: Employment, Equity Ownership.
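The abstract ranks genes with "a sensitive metric based on the normalized mutual information" but does not give its exact form. As a generic illustration only, the sketch below computes one standard normalization, I(X;Y) / sqrt(H(X) H(Y)), between a gene's discretized expression and the responder labels; the crude median split and all names are assumptions, not the authors' metric.

```python
import math
from collections import Counter
from typing import Hashable, List, Sequence


def entropy(labels: Sequence[Hashable]) -> float:
    """Shannon entropy (natural log) of a discrete label vector."""
    n = len(labels)
    return -sum(c / n * math.log(c / n) for c in Counter(labels).values())


def normalized_mutual_information(x: Sequence[Hashable],
                                  y: Sequence[Hashable]) -> float:
    """I(X;Y) / sqrt(H(X) * H(Y)) for two discrete label vectors."""
    if len(x) != len(y) or not x:
        raise ValueError("inputs must be non-empty and of equal length")
    hx, hy = entropy(x), entropy(y)
    if hx == 0 or hy == 0:
        return 0.0  # a constant vector carries no information
    h_xy = entropy(list(zip(x, y)))  # joint entropy
    mi = hx + hy - h_xy
    return mi / math.sqrt(hx * hy)


def median_split(values: Sequence[float]) -> List[str]:
    """Label values at or above the upper-middle value 'high', others 'low'."""
    threshold = sorted(values)[len(values) // 2]
    return ["high" if v >= threshold else "low" for v in values]
```

A gene whose (hypothetical) expression of `[9.0, 8.0, 2.0, 1.0]` splits cleanly along responder labels `["R", "R", "NR", "NR"]` scores 1.0, while an expression pattern unrelated to response scores 0. Ranking all genes by such a score on the training set and keeping the top 100 mirrors the selection step described above.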


2019
Author(s): Ashkaun Razmara, Shannon E. Ellis, Dustin J. Sokolowski, Sean Davis, Michael D. Wilson, ...

Abstract The usability of publicly available gene expression data is often limited by the availability of high-quality, standardized biological phenotype and experimental condition information (“metadata”). We released the recount2 project, which involved re-processing ∼70,000 samples in the Sequence Read Archive (SRA), Genotype-Tissue Expression (GTEx), and The Cancer Genome Atlas (TCGA) projects. While samples from the latter two projects are well characterized with extensive metadata, the ∼50,000 RNA-seq samples from SRA in recount2 are inconsistently annotated. Tissue type, sex, and library type can be estimated from the RNA sequencing (RNA-seq) data itself. However, more detailed and harder-to-predict metadata, such as age and diagnosis, should ideally be provided by the labs that deposit the data. To facilitate analyses of human brain tissue data, we have complemented phenotype predictions by manually constructing a uniformly curated database of public RNA-seq samples present in SRA and recount2. We describe the reproducible curation process used to construct recount-brain, which involves systematic review of the primary manuscripts and can serve as a guide for annotating other studies and tissues. We further expanded recount-brain by merging it with GTEx and TCGA brain samples and by linking to controlled vocabulary terms for tissue, Brodmann area and disease. We also illustrate how to integrate the sample metadata in recount-brain with the gene expression data in recount2 to perform differential expression analysis. We then provide three analysis examples involving modeling postmortem interval, glioblastoma, and meta-analyses across GTEx and TCGA. Overall, recount-brain facilitates expression analyses and improves their reproducibility, as individual researchers do not have to manually curate the sample metadata.
recount-brain is available via the add_metadata() function from the recount Bioconductor package at bioconductor.org/packages/recount.
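In practice recount-brain is used from R through the recount package's add_metadata() function. The Python below is only a language-neutral sketch of the integration step the abstract describes: joining curated per-sample metadata to expression values by sample ID and summarizing expression per metadata group. The data layout, sample IDs, field names and function name are hypothetical and do not reflect recount-brain's actual schema, and group means are only a first look, not a differential expression test.

```python
from statistics import mean
from typing import Dict, List


def group_means_by_metadata(expression: Dict[str, Dict[str, float]],
                            metadata: Dict[str, Dict[str, str]],
                            field: str) -> Dict[str, Dict[str, float]]:
    """Average each gene's expression within groups defined by one metadata field.

    expression: gene -> {sample_id: value}
    metadata:   sample_id -> {field_name: value}
    Samples lacking the field (i.e. not curated for it) are skipped.
    """
    groups: Dict[str, List[str]] = {}
    for sample_id, fields in metadata.items():
        if field in fields:
            groups.setdefault(fields[field], []).append(sample_id)
    result: Dict[str, Dict[str, float]] = {}
    for gene, by_sample in expression.items():
        result[gene] = {
            level: mean(by_sample[s] for s in samples if s in by_sample)
            for level, samples in groups.items()
            if any(s in by_sample for s in samples)  # skip groups with no data
        }
    return result
```

For instance, with hypothetical samples S1 and S2 annotated as "glioblastoma", S3 as "control" and S4 left uncurated, the function returns one mean per annotated group and silently ignores S4, which is exactly the gap that manual curation is meant to close.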


2017
Author(s): Weiguang Mao, Elena Zaslavsky, Boris M. Hartmann, Stuart C. Sealfon, Maria Chikina

Abstract A major challenge in gene expression analysis is to accurately infer relevant biological insight, such as regulation of cell type proportions or pathways, from global gene expression studies. We present a general solution to this problem that outperforms available cell proportion inference algorithms and is more broadly useful for automatically identifying specific pathways that regulate gene expression. Our method improves replicability and biological insight when applied to trans-eQTL identification.
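The abstract does not describe the authors' method. For orientation only, the sketch below shows the simplest baseline that cell-proportion inference builds on, a linear mixture model: a bulk profile is modeled as p times a reference profile for cell type A plus (1 - p) times one for cell type B, and minimizing the squared error gives p = <bulk - b, a - b> / ||a - b||^2, clipped to [0, 1]. This two-cell-type least-squares estimate is a swapped-in illustration, not the method presented in the abstract, and the profiles are hypothetical.

```python
from typing import Sequence


def two_cell_type_proportion(bulk: Sequence[float], ref_a: Sequence[float],
                             ref_b: Sequence[float]) -> float:
    """Least-squares p in bulk ~ p * ref_a + (1 - p) * ref_b, clipped to [0, 1]."""
    if not (len(bulk) == len(ref_a) == len(ref_b)):
        raise ValueError("profiles must cover the same genes")
    diff = [a - b for a, b in zip(ref_a, ref_b)]
    denom = sum(d * d for d in diff)
    if denom == 0:
        raise ValueError("reference profiles are identical")
    # Projection of (bulk - ref_b) onto (ref_a - ref_b).
    num = sum((y - b) * d for y, b, d in zip(bulk, ref_b, diff))
    return min(1.0, max(0.0, num / denom))
```

For example, a bulk profile `[3, 7, 5]` built as 30% of reference `[10, 0, 5]` plus 70% of reference `[0, 10, 5]` yields an estimate of 0.3. Real deconvolution methods handle many cell types, noise and missing references, which is where more general approaches such as the one in this abstract come in.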


2022, Vol 12 (1)
Author(s): Juliana Albano de Guimarães, Bidossessi Wilfried Hounpke, Bruna Duarte, Ana Luiza Mylla Boso, Marina Gonçalves Monteiro Viturino, ...

Abstract Pterygium is a common ocular surface condition frequently associated with irritative symptoms. Neither the precise identity of its critical triggers nor the hierarchical relationship between all the elements involved in the pathogenesis of this disease has yet been elucidated. Meta-analysis of gene expression studies represents a novel strategy capable of identifying key pathogenic mediators and therapeutic targets in complex diseases. Samples from nine patients were collected during surgery after photo documentation and clinical characterization of pterygia. Gene expression experiments were performed using the Human Clariom D Assay gene chip. Differential gene expression analysis between active and atrophic pterygia was performed using the limma package, adjusting for age. In addition, a meta-analysis was performed including recent gene expression studies available in the Gene Expression Omnibus public repository. Two datasets including samples from adults with pterygium and controls fulfilled our inclusion criteria. Meta-analysis was performed using the Rank Product algorithm of the RankProd package. Gene set analysis was performed using ClueGO, and the transcription factor regulatory network was predicted using appropriate bioinformatics tools. Finally, a miRNA-mRNA regulatory network was reconstructed using the up-regulated genes identified in the gene set analysis of the meta-analysis and their interacting miRNAs from the Brazilian cohort expression data. The meta-analysis identified 154 up-regulated and 58 down-regulated genes. A gene set analysis of the top up-regulated genes revealed an over-representation of pathways associated with remodeling of the extracellular matrix. Other pathways represented in the network included formation of cornified envelopes and unsaturated fatty acid metabolic processes.
The miRNA-mRNA target prediction network, also reconstructed from the set of up-regulated genes in the gene ontology and biological pathways network, showed that 17 target genes were negatively correlated with their interacting miRNAs in the Brazilian cohort expression data. Once again, the main identified cluster involved extracellular matrix remodeling mechanisms, while the second cluster involved formation of the cornified envelope, establishment of the skin barrier and unsaturated fatty acid metabolic processes. Comparing active with atrophic pterygium in the Brazilian cohort data identified genes differentially expressed between these two presentations of the condition. Our results reveal differentially expressed genes not only in pterygium but also in active compared with atrophic pterygium, and suggest new insights into the pathophysiology of pterygium.
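The Rank Product statistic used by the RankProd package can be sketched as follows: within each study, genes are ranked by fold change (rank 1 = most up-regulated), and each gene's rank product is the geometric mean of its ranks across studies, so genes ranked near the top everywhere get small values. This minimal sketch computes only the statistic; RankProd additionally assesses significance by permutation, which is omitted here, and the gene names and fold changes in the usage note are hypothetical.

```python
import math
from typing import Dict, List


def rank_products(fold_changes: List[Dict[str, float]]) -> Dict[str, float]:
    """Rank product per gene across studies (smaller = more consistently up).

    Only genes measured in every study are scored.
    """
    if not fold_changes:
        raise ValueError("need at least one study")
    common = set(fold_changes[0]).intersection(*fold_changes[1:])
    log_rank_sum = {g: 0.0 for g in common}
    for study in fold_changes:
        # Rank 1 = largest fold change; ties broken by gene name for determinism.
        ordered = sorted(common, key=lambda g: (-study[g], g))
        for rank, g in enumerate(ordered, start=1):
            log_rank_sum[g] += math.log(rank)
    k = len(fold_changes)
    # Geometric mean of ranks, computed in log space to avoid overflow.
    return {g: math.exp(total / k) for g, total in log_rank_sum.items()}
```

With two hypothetical studies in which gene g1 always has the largest fold change and g3 the smallest, g1 receives a rank product of 1.0 and g3 one of 3.0. Down-regulated genes are handled symmetrically by ranking with the smallest fold change first.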

