Statistical Approach of Gene Set Analysis with Quantitative Trait Loci for Crop Gene Expression Studies

Genome-wide expression profiling is a powerful genomic technology for quantifying the expression dynamics of genes in a genome. In gene expression studies, gene set analysis has become the first choice for gaining insight into the underlying biology of diseases or stresses in plants. It also reduces the complexity of statistical analysis and enhances the explanatory power of the results obtained from the primary downstream differential expression analysis. Gene set analysis approaches are well developed for microarray and RNA-seq gene expression data. These approaches mainly focus on analyzing gene sets with gene ontology or pathway annotation data. However, in plant biology, such methods may not establish any formal relationship between genotypes and phenotypes, as most traits are quantitative and controlled by polygenes. The existing Quantitative Trait Loci (QTL)-based gene set analysis approaches focus only on over-representation analysis of the selected genes while ignoring their associated gene scores. Therefore, we developed an innovative statistical approach, GSQSeq, to analyze gene sets with trait-enriched QTL data. This approach considers the differential expression scores associated with genes while analyzing the gene sets. The performance of the developed method was tested on five crop gene expression datasets obtained from real crop gene expression studies. Our analytical results indicated that trait-specific analysis of gene sets was more robust and successful with the proposed approach than with existing techniques. Further, the developed method provides a valuable platform for integrating gene expression data with QTL data.
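
For orientation, the over-representation analysis that the abstract contrasts with GSQSeq can be sketched as a hypergeometric test. This is a minimal illustration, not the GSQSeq method: the function name, the argument names and the gene counts are hypothetical, and the test deliberately uses only gene membership, which is the limitation the abstract describes (gene scores are ignored).

```python
from math import comb


def hypergeom_upper_tail(k, K, n, N):
    """Return P(X >= k) for a hypergeometric draw.

    N: total genes in the background (assumed universe)
    K: background genes that fall inside the trait QTL regions
    n: genes selected as differentially expressed
    k: selected genes that fall inside the QTL regions
    """
    upper = min(K, n)
    total = comb(N, n)
    favourable = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return favourable / total


# Hypothetical counts: 1000 genes, 50 in QTL regions, 100 selected, 12 of those in QTL regions.
p_value = hypergeom_upper_tail(k=12, K=50, n=100, N=1000)
print(f"over-representation p-value: {p_value:.4g}")
```

Because every selected gene counts equally here, a gene with a large differential expression score and one that barely passed the selection cutoff contribute the same evidence.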

Transcriptomics and network analysis highlight potential pathways in the pathogenesis of pterygium

Scientific Reports ◽

10.1038/s41598-021-04248-x ◽

2022 ◽

Vol 12 (1) ◽

Author(s):

Juliana Albano de Guimarães ◽

Bidossessi Wilfried Hounpke ◽

Bruna Duarte ◽

Ana Luiza Mylla Boso ◽

Marina Gonçalves Monteiro Viturino ◽

...

Keyword(s):

Gene Expression ◽

Extracellular Matrix ◽

Regulatory Network ◽

Meta Analysis ◽

Gene Set Analysis ◽

Expression Data ◽

Gene Set ◽

Expression Studies ◽

Gene Expression Studies ◽

Brazilian Cohort

Abstract: Pterygium is a common ocular surface condition frequently associated with irritative symptoms. The precise identity of its critical triggers as well as the hierarchical relationship between all the elements involved in the pathogenesis of this disease are not yet elucidated. Meta-analysis of gene expression studies represents a novel strategy capable of identifying key pathogenic mediators and therapeutic targets in complex diseases. Samples from nine patients were collected during surgery after photo documentation and clinical characterization of pterygia. Gene expression experiments were performed using the Human Clariom D Assay gene chip. Differential gene expression analysis between active and atrophic pterygia was performed using the limma package after adjusting for age. In addition, a meta-analysis was performed including recent gene expression studies available at the Gene Expression Omnibus public repository. Two databases including samples from adults with pterygium and controls fulfilled our inclusion criteria. Meta-analysis was performed using the Rank Product algorithm of the RankProd package. Gene set analysis was performed using ClueGO and the transcription factor regulatory network prediction was performed using appropriate bioinformatics tools. Finally, a miRNA-mRNA regulatory network was reconstructed using up-regulated genes identified in the gene set analysis from the meta-analysis and their interacting miRNAs from the Brazilian cohort expression data. The meta-analysis identified 154 up-regulated and 58 down-regulated genes. A gene set analysis with the top up-regulated genes showed an overrepresentation of pathways associated with remodeling of extracellular matrix. Other pathways represented in the network included formation of cornified envelopes and unsaturated fatty acid metabolic processes.
The miRNA-mRNA target prediction network, also reconstructed based on the set of up-regulated genes presented in the gene ontology and biological pathways network, showed that 17 target genes were negatively correlated with their interacting miRNAs from the Brazilian cohort expression data. Once again, the main identified cluster involved extracellular matrix remodeling mechanisms, while the second cluster involved formation of cornified envelope, establishment of skin barrier and unsaturated fatty acid metabolic process. Differential expression comparing active pterygium with atrophic pterygium using data generated from the Brazilian cohort identified differentially expressed genes between the two forms of presentation of this condition. Our results reveal differentially expressed genes not only in pterygium, but also in active pterygium when compared to the atrophic ones. New insights in relation to pterygium’s pathophysiology are suggested.
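
The rank-based meta-analysis mentioned in the abstract can be illustrated with a small sketch of the rank product statistic: each gene is ranked within every study and the ranks are combined by their geometric mean. This is a simplified illustration under stated assumptions, not the RankProd package itself. The function name and the toy fold changes are hypothetical, and the significance assessment that a real analysis needs is omitted.

```python
from math import prod


def rank_products(fold_changes_by_study):
    """Combine per-study rankings into one rank product per gene.

    fold_changes_by_study: list of dicts mapping gene -> log fold change,
    one dict per study. Rank 1 is the most up-regulated gene in a study.
    Only genes measured in every study are kept.
    """
    genes = set(fold_changes_by_study[0])
    for study in fold_changes_by_study[1:]:
        genes &= set(study)

    ranks = {gene: [] for gene in genes}
    for study in fold_changes_by_study:
        # Sort by descending fold change; gene name breaks ties deterministically.
        ordered = sorted(genes, key=lambda g: (-study[g], g))
        for position, gene in enumerate(ordered, start=1):
            ranks[gene].append(position)

    k = len(fold_changes_by_study)
    return {gene: prod(r) ** (1.0 / k) for gene, r in ranks.items()}


# Toy data (hypothetical): two studies, three genes.
studies = [
    {"A": 2.0, "B": 0.5, "C": -1.0},
    {"A": 1.5, "B": 1.8, "C": -0.2},
]
rp = rank_products(studies)
for gene in sorted(rp, key=rp.get):
    print(gene, round(rp[gene], 3))
```

A small rank product means a gene sits near the top of the list in most studies, which is why the statistic suits combining datasets measured on different platforms.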

Gene Set Correlation Analysis and Visualization Using Gene Expression Data

Current Bioinformatics ◽

10.2174/1574893615999200629124444 ◽

2020 ◽

Vol 15 ◽

Author(s):

Chen-An Tsai ◽

James J. Chen

Keyword(s):

Gene Expression ◽

Correlation Analysis ◽

Gene Expression Data ◽

Differentially Expressed Gene ◽

Differentially Expressed ◽

Superior Performance ◽

Expression Data ◽

Gene Set ◽

Gene Sets ◽

Set Correlation

Background: Gene set enrichment analyses (GSEA) provide a useful and powerful approach to identify differentially expressed gene sets with prior biological knowledge. Several GSEA algorithms have been proposed to perform enrichment analyses on groups of genes. However, many of these algorithms have focused on identification of differentially expressed gene sets in a given phenotype. Objective: In this paper, we propose a gene set analytic framework, Gene Set Correlation Analysis (GSCoA), that simultaneously measures within- and between-gene-set variation to identify sets of genes enriched for differential expression and highly co-related pathways. Methods: We apply co-inertia analysis to cross-gene-set comparisons in gene expression data to measure the co-structure of expression profiles in pairs of gene sets. Co-inertia analysis (CIA) is a multivariate method for identifying trends or co-relationships in multiple datasets that contain the same samples. The objective of CIA is to seek ordinations (dimension reduction diagrams) of two gene sets such that the squared covariance between the projections of the gene sets on successive axes is maximized. Simulation studies illustrate that CIA offers superior performance in identifying co-relationships between gene sets in all simulation settings when compared to correlation-based gene set methods. Results and Conclusion: We also combine between-gene-set CIA and GSEA to discover the relationships between gene sets significantly associated with phenotypes. In addition, we provide a graphical technique for visualizing and simultaneously exploring the associations between and within gene sets and their interaction and network. We then demonstrate integration of within- and between-gene-set variation using CIA and GSEA, applied to the p53 gene expression data using the c2 curated gene sets.
Ultimately, the GSCoA approach provides an attractive tool for identification and visualization of novel associations between pairs of gene sets by integrating co-relationships between gene sets into gene set analysis.
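
The co-inertia objective described in the abstract can be written out as follows. This is a simplified statement under assumptions that the abstract does not specify: X (n × p) and Y (n × q) denote the column-centered expression matrices of two gene sets measured on the same n samples, with uniform sample weights 1/n and unit gene weights. The symbols are introduced here for illustration only.

```latex
\[
(u_1, v_1) \;=\; \arg\max_{\|u\| = \|v\| = 1} \operatorname{cov}^2(Xu,\, Yv)
\;=\; \arg\max_{\|u\| = \|v\| = 1} \left( \tfrac{1}{n}\, u^{\top} X^{\top} Y v \right)^{2}
\]
\[
\tfrac{1}{n} X^{\top} Y = U \Sigma V^{\top}, \qquad
u_k = U_{\cdot k}, \quad v_k = V_{\cdot k}, \quad
\operatorname{cov}^2(Xu_k,\, Yv_k) = \sigma_k^{2}
\]
\[
\text{total co-inertia} \;=\; \sum_k \sigma_k^{2} \;=\; \left\| \tfrac{1}{n} X^{\top} Y \right\|_F^{2}
\]
```

Under these assumptions the successive axes are the singular vectors of the cross-covariance matrix, so the first few singular values indicate how much shared structure a pair of gene sets carries.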

Gene Expression Studies May Identify Lower Risk Myelodysplastic Syndrome Patients Likely to Respond to Therapy with Ezatiostat Hydrochloride (TLK199)

Blood ◽

10.1182/blood.v118.21.2779.2779 ◽

2011 ◽

Vol 118 (21) ◽

pp. 2779-2779

Author(s):

Naomi Galili ◽

Pablo Tamayo ◽

Olga B Botvinnik ◽

Jill P Mesirov ◽

Jennifer Zikria ◽

...

Keyword(s):

Gene Expression ◽

Angiotensin II ◽

Mononuclear Cells ◽

Expression Data ◽

Phase 2 ◽

Glutathione S Transferase ◽

Gene Set ◽

Expression Studies ◽

Phase 2 Study ◽

Gene Expression Studies

Abstract 2779: Interpretation of gene expression studies in MDS has been especially challenging due to the heterogeneity of the cell lineages that comprise the malignant clone. In attempting to overcome these difficulties, we have used a bedside-to-bench approach to define an expression signature that may identify patients likely to respond. Ezatiostat hydrochloride (TLK199) is an inhibitor of glutathione S-transferase, an enzyme that is over-expressed in many cancers, and has been shown in vitro to stimulate growth and differentiation of hematopoietic progenitor cells and to induce apoptosis in leukemia cells. Based on multilineage responses in low-Int1 MDS patients in our phase 2 study of oral TLK199, a multi-institutional phase 2 study was conducted in low-Int1 patients. Response was evaluated by International Working Group (IWG 2006) criteria. Pre-therapy bone marrow mononuclear cells of patients treated with TLK199 were analyzed for gene expression on the Illumina HT12v4 whole genome array with IRB approval. RNA isolated from the marrow mononuclear cells was available for 9 responders (R) and 21 non-responders (NR). Five R and 13 NR were randomly chosen to create a training set, with the intent to later use the remaining samples for model testing. We identified the top 100 differentially expressed genes using a sensitive metric based on the normalized mutual information. We also performed single-sample Gene Set Enrichment Analysis to find the most salient differences in terms of pathways and biological processes between R and NR. Of special note are the 4 microRNAs differentially expressed between R and NR. Three miRNAs are under-expressed (miR-129, 802 and 548e) and one (miR-155) is over-expressed in R. Reduced expression of miR-129 has been reported in solid tumors, and when over-expressed, miR-129 has been shown to have anti-proliferative activity in cell lines.
SOX4 is a target gene for miR-129, and reduced expression of miR-129 results in concomitant up-regulation of SOX4 mRNA, which can function as both an oncogene and a tumor suppressor gene depending on tumor lineage. Over-expression of SOX4 inhibited cytokine-induced granulocyte maturation in the myeloid 32Dcl3 cell line, suggesting a possible role in MDS. MiR-802 targets the receptor for angiotensin II, and when its expression is decreased there is increased angiotensin II activity. It has recently been shown that angiotensin is a pro-inflammatory mediator that participates in apoptosis and angiogenesis and promotes mitochondrial dysfunction, all characteristics of MDS. In addition, the transcription factor ZFHX3, a predicted target of miR-802, is a negative regulator of c-MYB, which has been shown to be up-regulated in all subtypes of MDS. Similarly, c-MYB is a predicted target of miR-155, which is over-expressed in TLK199 responders. MiR-155 was shown to be over-expressed in marrow cells of a subset of human AML patients. Of particular note are the studies showing that sustained expression of miR-155 in mouse hematopoietic stem cells causes a myeloproliferative/myelodysplastic disorder. Subsequent pathway analysis of this expression data revealed that a JNK gene set as defined from the GEO dataset GDSS8081 was consistently under-expressed in responders and over-expressed in non-responders. TLK199 has been shown to induce JUN/JNK by binding to glutathione S-transferase, a key inhibitor of this pathway. The expression data confirm that patients whose pre-therapy marrow shows under-expression of the JNK gene set are precisely those who benefit from this drug therapy, and that patients who already over-express these genes are unlikely to respond.
This study highlights two important points: 1) using a bedside-to-bench strategy yielded a signature that distinguished responders from non-responders, and 2) the signature identified genes and signaling pathways that shed light on both the biology of the disease and the mechanism of action of the drug. In conclusion, if these results are confirmed in the test set, we will use the signature in a future prospective study to preselect MDS patients for therapy with this promising drug. Disclosures: Brown: Telik, Inc.: Employment, Equity Ownership.

Gene Expression Studies to Identify Significant Genes in AR, MTOR, MAPK Pathways and their Overlapping Regulatory Role in Prostate Cancer

Journal of Integrative Bioinformatics ◽

10.1515/jib-2018-0080 ◽

2019 ◽

Vol 16 (3) ◽

Author(s):

Nimisha Asati ◽

Abhinav Mishra ◽

Ankita Shukla ◽

Tiratha Raj Singh

Keyword(s):

Gene Expression ◽

Prostate Cancer ◽

Gene Expression Data ◽

Meta Analysis ◽

Expression Patterns ◽

Mitogen Activated Protein Kinase ◽

Expression Data ◽

MAPK Pathways ◽

Expression Studies ◽

Gene Expression Studies

Abstract: Gene expression studies have revealed a large degree of variability in gene expression patterns, particularly in tissues, even in genetically identical individuals. Such studies help to reveal the components that fluctuate most during the disease condition. With the advent of gene expression studies, many microarray studies have been conducted in prostate cancer, but the results have varied across studies. To better understand the genetic and biological regulatory mechanisms of prostate cancer, we conducted a meta-analysis of three major pathways, i.e. androgen receptor (AR), mechanistic target of rapamycin (mTOR) and Mitogen-Activated Protein Kinase (MAPK), in prostate cancer. The meta-analysis was performed on human gene expression data from prostate cancer. Twelve datasets comprising AR, mTOR, and MAPK pathways were taken for analysis, from which thirteen potential biomarkers were identified through meta-analysis. These findings were compiled based upon quantitative data analysis using different tools. Also, various interconnections were found amongst the pathways under study. Our study suggests that microarray analysis of gene expression data and their pathway-level connections allows detection of potential predictors that can prove to be putative therapeutic targets with biological and functional significance in the progression of prostate cancer.

Correction: Time-Course Gene Set Analysis for Longitudinal Gene Expression Data

PLoS Computational Biology ◽

10.1371/journal.pcbi.1004446 ◽

2015 ◽

Vol 11 (8) ◽

pp. e1004446

Author(s):

Keyword(s):

Gene Expression ◽

Gene Expression Data ◽

Time Course ◽

Gene Set Analysis ◽

Expression Data ◽

Gene Set

Tissue heterogeneity is prevalent in gene expression studies

10.1101/2020.12.02.407809 ◽

2020 ◽

Author(s):

Gregor Sturm ◽

Markus List ◽

Jitao David Zhang

Keyword(s):

Gene Expression ◽

Gene Expression Data ◽

Tissue Type ◽

Control Methods ◽

Expression Data ◽

Batch Effects ◽

Tissue Heterogeneity ◽

Sample Handling ◽

Expression Studies ◽

Gene Expression Studies

Background: Lack of reproducibility in gene expression studies has recently attracted much attention in and beyond the biomedical research community. Previous efforts have identified many underlying factors, such as batch effects and incorrect sample annotations. Recently, tissue heterogeneity, a consequence of unintended profiling of cells of other origins than the tissue of interest, was proposed as a source of variance that exacerbates irreproducibility and is commonly ignored. Results: Here, we systematically analyzed 2,692 publicly available gene expression datasets including 78,332 samples for tissue heterogeneity. We found a prevalence of tissue heterogeneity in gene expression data that affects on average 5-15% of the samples, depending on the tissue type. We distinguish cases of severe heterogeneity, which may be caused by mistakes in annotation or sample handling, from cases of moderate heterogeneity, which are more likely caused by tissue infiltration or sample contamination. Conclusions: Tissue heterogeneity is a widespread issue in publicly available gene expression datasets and thus an important source of variance that should not be ignored. We advocate the application of quality control methods such as BioQC to detect tissue heterogeneity prior to mining or analysing gene expression data.

Genes identified in rodent studies of alcohol intake are enriched for heritability of human substance use

10.1101/2021.03.22.436527 ◽

2021 ◽

Author(s):

Spencer B. Huggett ◽

Emma C. Johnson ◽

Alexander S. Hatoum ◽

Dongbing Lai ◽

Jason A. Bubier ◽

...

Keyword(s):

Gene Expression ◽

Smoking Cessation ◽

Substance Use ◽

Alcohol Use ◽

Gene Set ◽

Problematic Alcohol ◽

Gene Sets ◽

Expression Studies ◽

Gene Expression Studies ◽

Problematic Alcohol Use

Background: Rodent paradigms and human genome-wide association studies (GWASs) on drug use have the potential to provide biological insight into the pathophysiology of addiction. Methods: Using GeneWeaver, we created rodent alcohol and nicotine gene-sets derived from 19 gene expression studies on alcohol and nicotine outcomes. We partitioned the SNP-heritability of these gene-sets using four large human GWASs: 1) alcoholic drinks per week, 2) problematic alcohol use, 3) cigarettes per day and 4) smoking cessation. We benchmarked our findings with curated human alcoholism and nicotine addiction gene-sets and performed specificity analyses using other rodent gene-sets (e.g., locomotor behavior) and other human GWASs (e.g., height). Results: The rodent alcohol gene-set was enriched for heritability of drinks per week, cigarettes per day, and smoking cessation, but not problematic alcohol use. However, the rodent nicotine gene-set was not significantly associated with any of these traits. Both rodent gene-sets showed enrichment for several non-substance use GWASs, and the extent of this relationship tended to increase as a function of trait heritability. In general, larger gene-sets demonstrated more significant enrichment. Finally, when evaluating human traits with similar heritabilities, both rodent gene-sets showed greater enrichment for substance use traits. Conclusion: Our results suggest that rodent gene expression studies can help to identify genes that capture heritability of substance use traits in humans, yet the specificity to human substance use was less than expected due to various factors such as the genetic architecture of a trait. We outline various limitations, interpretations and considerations for future research.

GSMA: Gene Set Matrix Analysis, An Automated Method for Rapid Hypothesis Testing of Gene Expression Data

Bioinformatics and Biology Insights ◽

10.1177/117793220700100003 ◽

2007 ◽

Vol 1 ◽

pp. 117793220700100 ◽

Cited By ~ 1

Author(s):

Chris Cheadle ◽

Tonya Watkins ◽

Jinshui Fan ◽

Marc A. Williams ◽

Steven Georas ◽

...

Keyword(s):

Gene Expression ◽

Hypothesis Testing ◽

Gene Expression Data ◽

Expression Patterns ◽

Matrix Analysis ◽

Global Changes ◽

Expression Data ◽

Gene Set ◽

Automated Method ◽

Gene Sets

Background: Microarray technology has become highly valuable for identifying complex global changes in gene expression patterns. The assignment of functional information to these complex patterns remains a challenging task in effectively interpreting data and correlating results from across experiments, projects and laboratories. Methods which allow the rapid and robust evaluation of multiple functional hypotheses increase the power of individual researchers to mine gene expression data more efficiently. Results: We have developed gene set matrix analysis (GSMA) as a useful method for the rapid testing of group-wise up- or down-regulation of gene expression simultaneously for multiple lists of genes (gene sets) against entire distributions of gene expression changes (datasets) for single or multiple experiments. The utility of GSMA lies in its flexibility to rapidly poll gene sets, related by known biological function or designated solely by the end-user, against large numbers of datasets simultaneously. Conclusions: GSMA provides a simple and straightforward method for hypothesis testing in which genes are tested by groups across multiple datasets for patterns of expression enrichment.

Covariance thresholding to detect differentially co-expressed genes from microarray gene expression data

Journal of Bioinformatics and Computational Biology ◽

10.1142/s021972002050002x ◽

2020 ◽

Vol 18 (01) ◽

pp. 2050002

Author(s):

Mingyu Oh ◽

Kipoong Kim ◽

Hokeun Sun

Keyword(s):

Gene Expression ◽

Gene Expression Data ◽

Microarray Gene Expression Data ◽

Gene Set Analysis ◽

Expression Data ◽

Microarray Gene Expression ◽

Expression Levels ◽

Gene Set ◽

Significant Difference ◽

Microarray Gene

Gene set analysis aims to identify differentially expressed or co-expressed genes within a biological pathway between two experimental conditions, so that it can eventually reveal biological processes and pathways involved in disease development. In the last few decades, various statistical and computational methods have been proposed to improve the statistical power of gene set analysis. In recent years, much attention has been paid to differentially co-expressed genes, since they can potentially be disease-related genes without a significant difference in average expression levels between two conditions. In this paper, we propose a new statistical method to identify differentially co-expressed genes from microarray gene expression data. The proposed method first estimates co-expression levels of paired genes using covariance regularization by thresholding, and then evaluates the significance of the difference in covariance estimates between two conditions. Through extensive simulation studies, we demonstrated that the proposed method is more powerful than the existing mainstream methods for detecting co-expressed genes. We also applied it to various microarray gene expression datasets related to mutant p53 transcriptional activity and to epithelium and stroma in breast cancer.
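
The first step described in the abstract, covariance regularization by thresholding, can be sketched as follows. This is a minimal illustration assuming simple hard thresholding with a user-chosen cutoff. The function names, the cutoff `lam` and the toy data are hypothetical, and the significance evaluation of the difference is not reproduced here because the abstract does not describe it.

```python
def sample_covariance(rows):
    """Sample covariance matrix; rows is a list of samples, one value per gene."""
    n = len(rows)
    p = len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(p)]
    return [
        [
            sum((r[i] - means[i]) * (r[j] - means[j]) for r in rows) / (n - 1)
            for j in range(p)
        ]
        for i in range(p)
    ]


def hard_threshold(cov, lam):
    """Set off-diagonal entries with absolute value <= lam to zero; keep the diagonal."""
    p = len(cov)
    return [
        [cov[i][j] if i == j or abs(cov[i][j]) > lam else 0.0 for j in range(p)]
        for i in range(p)
    ]


def coexpression_difference(rows_a, rows_b, lam):
    """Entry-wise difference of the thresholded covariances of two conditions."""
    a = hard_threshold(sample_covariance(rows_a), lam)
    b = hard_threshold(sample_covariance(rows_b), lam)
    p = len(a)
    return [[a[i][j] - b[i][j] for j in range(p)] for i in range(p)]


# Toy data (hypothetical): genes 0 and 1 move together in condition A and oppositely in B.
condition_a = [[1, 1, 0], [2, 2, 1], [3, 3, 0], [4, 4, 1]]
condition_b = [[1, 4, 0], [2, 3, 1], [3, 2, 0], [4, 1, 1]]
diff = coexpression_difference(condition_a, condition_b, lam=0.5)
for row in diff:
    print([round(value, 3) for value in row])
```

In the toy data the mean expression of every gene is identical in both conditions, so only the change in co-expression between genes 0 and 1 separates them, which is the situation the abstract highlights.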
