TOXPANEL: A Gene-Set Analysis Tool to Assess Liver and Kidney Injuries

Gene-set analysis is commonly used to identify trends in gene expression when cells, tissues, organs, or organisms are subjected to conditions that differ from those within the normal physiological range. However, tools for gene-set analysis to assess liver and kidney injury responses are less common. Furthermore, most websites for gene-set analysis lack the option for users to customize their gene-set database. Here, we present the ToxPanel website, which allows users to perform gene-set analysis to assess liver and kidney injuries using activation scores based on gene-expression fold-change values. The results are graphically presented to assess constituent injury phenotypes (histopathology), with interactive result tables that identify the main contributing genes to a given signal. In addition, ToxPanel offers the flexibility to analyze any set of custom genes based on gene fold-change values. ToxPanel is publicly available online at https://toxpanel.bhsai.org. ToxPanel allows users to access our previously developed liver and kidney injury gene sets, which we have shown in previous work to yield robust results that correlate with the degree of injury. Users can also test and validate their customized gene sets using the ToxPanel website.
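The abstract does not spell out how the activation scores are computed, so the following is only a minimal sketch of one plausible fold-change-based score, not ToxPanel's actual method. It assumes the score is the mean absolute log2 fold change of the measured genes in a set, and that "main contributing genes" are those with the largest absolute fold change. The function names, gene labels, and values are all hypothetical.

```python
from statistics import mean


def gene_set_score(log2_fc, gene_set):
    """Mean absolute log2 fold change over the set's measured genes (assumed score)."""
    measured = [g for g in gene_set if g in log2_fc]
    if not measured:
        raise ValueError("no gene from the set was measured")
    return mean(abs(log2_fc[g]) for g in measured)


def top_contributors(log2_fc, gene_set, n=3):
    """Measured set genes ranked by absolute log2 fold change, largest first."""
    measured = [g for g in gene_set if g in log2_fc]
    return sorted(measured, key=lambda g: abs(log2_fc[g]), reverse=True)[:n]


# Made-up fold changes and a made-up gene set, for illustration only.
log2_fc = {"geneA": 3.0, "geneB": -2.0, "geneC": 1.0, "geneD": 0.0}
injury_set = ["geneA", "geneB", "geneC", "geneE"]  # geneE was not measured

print(gene_set_score(log2_fc, injury_set))
print(top_contributors(log2_fc, injury_set, n=2))
```

Taking absolute values treats up- and down-regulation as equal evidence of a response; a signed score would be a different design choice.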


Disease and phenotype gene set analysis of disease-based gene expression in mouse and human

Physiological Genomics ◽

10.1152/physiolgenomics.00008.2010 ◽

2010 ◽

Vol 42A (2) ◽

pp. 162-167 ◽

Cited By ~ 11

Author(s):

Supriyo De ◽

Yongqing Zhang ◽

John R. Garner ◽

S. Alex Wang ◽

Kevin G. Becker

Keyword(s):

Gene Expression ◽

Human Disease ◽

Complex Disease ◽

Population Based ◽

Gene Set Analysis ◽

Common Disease ◽

Analysis Tool ◽

Gene Set ◽

Disease Phenotypes ◽

Gene Sets

The genetic contributions to common disease and complex disease phenotypes are pleiotropic, multifactorial, and combinatorial. Gene set analysis is a computational approach used in the analysis of microarray data to rapidly query gene combinations and multifactorial processes. Here we use novel gene sets based on population-based human genetic associations in common human disease or experimental genetic mouse models to analyze disease-related microarray studies. We developed a web-based analysis tool that uses these novel disease- and phenotype-related gene sets to analyze microarray-based gene expression data. These gene sets show disease and phenotype specificity in a species-specific and cross-species fashion. In this way, we integrate population-based common human disease genetics, mouse genetically determined phenotypes, and disease or phenotype structured ontologies, with gene expression studies relevant to human disease. This may aid in the translation of large-scale high-throughput datasets into the context of clinically relevant disease phenotypes.


Excitatory/inhibitory imbalance in autism: the role of glutamate and GABA gene-sets in symptoms and cortical brain structure

10.1101/2021.12.20.473501 ◽

2021 ◽

Author(s):

Viola Hollestein ◽

Geert Poelmans ◽

Natalie Forde ◽

Christian F Beckmann ◽

Christine Ecker ◽

...

Keyword(s):

Gene Expression ◽

Sensory Processing ◽

Brain Structure ◽

Symptom Severity ◽

Expression Profiles ◽

Brain Regions ◽

Gene Set Analysis ◽

Gene Set ◽

Gene Sets ◽

Autism Symptomatology

Background: The excitatory/inhibitory (E/I) imbalance hypothesis posits that an imbalance between excitatory (glutamatergic) and inhibitory (GABAergic) mechanisms underlies the behavioral characteristics of autism spectrum disorder (autism). However, how E/I imbalance arises and how it may differ across autism symptomatology and brain regions is not well understood. Methods: We used innovative analysis methods, combining competitive gene-set analysis and gene-expression profiles in relation to cortical thickness (CT), to investigate the relationship between genetic variance, brain structure, and autism symptomatology of participants from the EU-AIMS LEAP cohort (autism=360, male/female=259/101; neurotypical control participants=279, male/female=178/101) aged 6 to 30 years. Competitive gene-set analysis investigated associations of glutamatergic and GABAergic signaling pathway gene-sets with clinical measures and with CT. Additionally, we investigated expression profiles of the genes within those sets throughout the brain and how those profiles relate to differences in CT between autistic and neurotypical control participants in the same regions. Results: The glutamate gene-set was associated with all autism symptom severity scores on the Autism Diagnostic Observation Schedule-2 (ADOS-2) and the Autism Diagnostic Interview-Revised (ADI-R) within the autistic group, while the GABA set was associated with sensory processing measures (using the SSP subscales) across all participants. Brain regions with greater gene expression of both glutamate and GABA genes showed greater differences in CT between autistic and neurotypical control participants. Conclusions: Our results suggest crucial roles for glutamate and GABA genes in autism symptomatology as well as CT, where GABA is more strongly associated with sensory processing and glutamate more with autism symptom severity.


Linear Combination Test for Hierarchical Gene Set Analysis

Statistical Applications in Genetics and Molecular Biology ◽

10.2202/1544-6115.1641 ◽

2011 ◽

Vol 10 (1) ◽

Cited By ~ 8

Author(s):

Xiaoming Wang ◽

Irina Dinu ◽

Wei Liu ◽

Yutaka Yasui

Keyword(s):

Gene Expression ◽

Linear Combination ◽

Covariance Matrix ◽

Gene Set Analysis ◽

Gene Set ◽

Hotelling's T2 ◽

Gene Sets ◽

Combination Test ◽

Linear Combination Test

Gene-set analysis (GSA) aims to identify sets of differentially expressed genes by a phenotype in DNA microarray studies. Challenges occur due to the salient characteristics of the data: (1) the number of genes is far larger than the number of observations; (2) gene expression measurements, especially within each gene set, can be highly correlated; and (3) the number of gene sets that can be examined is large and increasing rapidly. These challenges call for gene-set testing procedures that have both efficiency in computation for large GSAs and high power in the presence of the high correlation. We propose a new GSA approach called Linear Combination Test (LCT), incorporating the covariance matrix estimator of gene expression into the test statistic. The proposed LCT and two other GSA methods, a modification of Hotelling’s T2 using a shrinkage covariance matrix and our SAM-GS (Dinu et al. 2007), the two methods that have been reported by Tsai and Chen (2009) to perform best in terms of power, are evaluated in simulation studies and a real microarray study. The LCT method is more computationally efficient than the modified Hotelling’s T2 and approximates the superb power of the modified Hotelling’s T2. LCT is slightly faster than SAM-GS, but more powerful, due to incorporating the covariance matrix estimator. An extra step to enhance the interpretation of GSA results is also proposed in the form of a hierarchical LC (HLC) testing procedure, providing scientists useful hierarchical information on gene sets that LCT identified as differentially expressed. Availability: Free R code to perform the LCT-GSA and HLC tests is available at http://www.ualberta.ca/~yyasui/homepage.html.


Distance-correlation based gene set analysis in longitudinal studies

Statistical Applications in Genetics and Molecular Biology ◽

10.1515/sagmb-2017-0053 ◽

2018 ◽

Vol 17 (1) ◽

Author(s):

Jiehuan Sun ◽

Jose D. Herazo-Maya ◽

Xiu Huang ◽

Naftali Kaminski ◽

Hongyu Zhao

Keyword(s):

Gene Expression ◽

Disease Progression ◽

Clinical Outcomes ◽

Expression Profiles ◽

Gene Set Analysis ◽

Related Gene ◽

Disease Etiology ◽

Distance Correlation ◽

Gene Set ◽

Gene Sets

Longitudinal gene expression profiles of subjects are collected in some clinical studies to monitor disease progression and understand disease etiology. The identification of gene sets that have coordinated changes with relevant clinical outcomes over time from these data could provide significant insights into the molecular basis of disease progression and lead to better treatments. In this article, we propose a Distance-Correlation based Gene Set Analysis (dcGSA) method for longitudinal gene expression data. dcGSA is a non-parametric approach, statistically robust, and can capture both linear and nonlinear relationships between gene sets and clinical outcomes. In addition, dcGSA is able to identify related gene sets in cases where the effects of gene sets on clinical outcomes differ across subjects due to subject heterogeneity, remove the confounding effects of some unobserved time-invariant covariates, and allow the assessment of associations between gene sets and multiple related outcomes simultaneously. Through extensive simulation studies, we demonstrate that dcGSA is more powerful in detecting relevant genes than other commonly used gene set analysis methods. When dcGSA is applied to a real dataset on systemic lupus erythematosus, we are able to identify more disease-related gene sets than other methods.
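To illustrate why distance correlation can capture nonlinear relationships, here is a minimal sketch of the standard sample distance correlation for two one-dimensional samples. It is not the dcGSA procedure itself, which additionally handles longitudinal structure, multivariate gene sets, and multiple outcomes. The function names and the toy data are assumptions made for illustration.

```python
import math


def _centered_distances(values):
    """Double-centered matrix of pairwise absolute differences."""
    n = len(values)
    d = [[abs(a - b) for b in values] for a in values]
    row_mean = [sum(row) / n for row in d]  # equals the column means (symmetric)
    grand_mean = sum(row_mean) / n
    return [
        [d[i][j] - row_mean[i] - row_mean[j] + grand_mean for j in range(n)]
        for i in range(n)
    ]


def distance_correlation(x, y):
    """Sample distance correlation of two equal-length numeric sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length, at least 2")
    n = len(x)
    a = _centered_distances(x)
    b = _centered_distances(y)
    dcov2 = sum(a[i][j] * b[i][j] for i in range(n) for j in range(n)) / n**2
    dvar_x = sum(a[i][j] ** 2 for i in range(n) for j in range(n)) / n**2
    dvar_y = sum(b[i][j] ** 2 for i in range(n) for j in range(n)) / n**2
    denom = math.sqrt(dvar_x * dvar_y)
    if denom == 0.0:
        return 0.0  # one of the samples is constant
    return math.sqrt(max(dcov2, 0.0) / denom)


# Toy data: y depends on x only through x**2, so the Pearson correlation is 0.
x = [-2.0, -1.0, 0.0, 1.0, 2.0]
y = [v * v for v in x]
print(distance_correlation(x, y))                   # clearly above zero
print(distance_correlation(x, [2 * v for v in x]))  # linear case, close to 1
```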


Gene-set activity toolbox (GAT): A platform for microarray-based cancer diagnosis using an integrative gene-set analysis approach

Journal of Bioinformatics and Computational Biology ◽

10.1142/s0219720016500153 ◽

2016 ◽

Vol 14 (04) ◽

pp. 1650015 ◽

Cited By ~ 3

Author(s):

Worrawat Engchuan ◽

Asawin Meechai ◽

Sissades Tongsima ◽

Narumol Doungpan ◽

Jonathan H. Chan

Keyword(s):

Gene Expression ◽

Expression Analysis ◽

Gene Expression Analysis ◽

Complex Disease ◽

Classification Model ◽

Gene Set Analysis ◽

Diagnostic Model ◽

Analysis Tool ◽

Gene Set ◽

Analysis Methods

Cancer is a complex disease that cannot be diagnosed reliably using only single gene expression analysis. Using gene-set analysis on high throughput gene expression profiling controlled by various environmental factors is a commonly adopted technique used by the cancer research community. This work develops a comprehensive gene expression analysis tool, the gene-set activity toolbox (GAT), that is implemented with a data retriever, traditional data pre-processing, several gene-set analysis methods, network visualization, and data mining tools. The gene-set analysis methods are used to identify subsets of phenotype-relevant genes that will be used to build a classification model. To evaluate GAT performance, we performed a cross-dataset validation study on three common cancers, namely colorectal, breast, and lung cancer. The results show that GAT can be used to build a reasonable disease diagnostic model and the predicted markers have biological relevance. GAT can be accessed from http://gat.sit.kmutt.ac.th where GAT’s Java library for gene-set analysis, simple classification and a database with three cancer benchmark datasets can be downloaded.


Statistical Approach of Gene Set Analysis with Quantitative Trait Loci for Crop Gene Expression Studies

Entropy ◽

10.3390/e23080945 ◽

2021 ◽

Vol 23 (8) ◽

pp. 945

Author(s):

Samarendra Das ◽

Shesh N. Rai

Keyword(s):

Gene Expression ◽

Gene Expression Data ◽

Statistical Approach ◽

Gene Set Analysis ◽

Expression Data ◽

Expression Study ◽

Gene Set ◽

Gene Sets ◽

Expression Studies ◽

Gene Expression Studies

Genome-wide expression studies are a powerful genomic technology for quantifying the expression dynamics of genes in a genome. In gene expression studies, gene set analysis has become the first choice to gain insights into the underlying biology of diseases or stresses in plants. It also reduces the complexity of statistical analysis and enhances the explanatory power of the results obtained from the primary downstream differential expression analysis. Gene set analysis approaches are well developed for microarray and RNA-seq gene expression data analysis. These approaches mainly focus on analyzing the gene sets with gene ontology or pathway annotation data. However, in plant biology, such methods may not establish any formal relationship between the genotypes and the phenotypes, as most of the traits are quantitative and controlled by polygenes. The existing Quantitative Trait Loci (QTL)-based gene set analysis approaches only focus on the over-representation analysis of the selected genes while ignoring their associated gene scores. Therefore, we developed an innovative statistical approach, GSQSeq, to analyze the gene sets with trait-enriched QTL data. This approach considers the associated differential expression scores of genes while analyzing the gene sets. The performance of the developed method was tested on five different crop gene expression datasets obtained from real crop gene expression studies. Our analytical results indicated that the trait-specific analysis of gene sets was more robust and successful through the proposed approach than existing techniques. Further, the developed method provides a valuable platform for integrating the gene expression data with QTL data.
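For context, the over-representation analysis that the abstract contrasts with GSQSeq can be sketched as a one-sided hypergeometric test. This is the generic textbook form, not the GSQSeq method and not any specific QTL tool. The function name and the counts in the example are hypothetical. Note that the test uses only gene counts, which is exactly the limitation the abstract points out: the per-gene scores never enter the calculation.

```python
from math import comb


def ora_p_value(n_background, n_in_set, n_selected, n_overlap):
    """P(X >= n_overlap) when n_selected genes are drawn without replacement
    from n_background genes, of which n_in_set belong to the gene set."""
    upper = min(n_in_set, n_selected)
    favorable = sum(
        comb(n_in_set, i) * comb(n_background - n_in_set, n_selected - i)
        for i in range(n_overlap, upper + 1)
    )
    return favorable / comb(n_background, n_selected)


# Hypothetical counts: 10 background genes, 4 in the set (for example, genes
# inside a trait QTL), 3 genes selected as differentially expressed, 2 of
# which fall in the set.
print(ora_p_value(10, 4, 3, 2))  # (36 + 4) / 120, about 0.333
```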


Utilizing Cancer - Functional Gene Set - Compound Networks to Identify Putative Drugs for Breast Cancer

Combinatorial Chemistry & High Throughput Screening ◽

10.2174/1574888x13666180105125347 ◽

2018 ◽

Vol 21 (2) ◽

pp. 74-83

Author(s):

Tzu-Hung Hsiao ◽

Yu-Chiao Chiu ◽

Yu-Heng Chen ◽

Yu-Ching Hsu ◽

Hung-I Harry Chen ◽

...

Keyword(s):

Breast Cancer ◽

Gene Expression ◽

Cancer Therapy ◽

Cancer Treatment ◽

Cancer Survival ◽

Expression Profiles ◽

Functional Gene ◽

Gene Set Enrichment Analysis ◽

Gene Set ◽

Gene Sets

Aim and Objective: The number of anticancer drugs available currently is limited, and some of them have low treatment response rates. Moreover, developing a new drug for cancer therapy is labor intensive and sometimes cost prohibitive. Therefore, “repositioning” of known cancer treatment compounds can speed up the development time and potentially increase the response rate of cancer therapy. This study proposes a systems biology method for identifying new compound candidates for cancer treatment in two separate procedures. Materials and Methods: First, a “gene set–compound” network was constructed by conducting gene set enrichment analysis on the expression profile of responses to a compound. Second, survival analyses were applied to gene expression profiles derived from four breast cancer patient cohorts to identify gene sets that are associated with cancer survival. A “cancer–functional gene set–compound” network was constructed, and candidate anticancer compounds were identified. Through the use of breast cancer as an example, 162 breast cancer survival-associated gene sets and 172 putative compounds were obtained. Results: We demonstrated how to utilize the clinical relevance of previous studies through gene sets and then connect it to candidate compounds by using gene expression data from the Connectivity Map. Specifically, we chose a gene set derived from a stem cell study to demonstrate its association with breast cancer prognosis and discussed six new compounds that can increase the expression of the gene set after the treatment. Conclusion: Our method can effectively identify compounds with a potential to be “repositioned” for cancer treatment according to their active mechanisms and their association with patients’ survival time.


Gene Set Correlation Analysis and Visualization Using Gene Expression Data

Current Bioinformatics ◽

10.2174/1574893615999200629124444 ◽

2020 ◽

Vol 15 ◽

Author(s):

Chen-An Tsai ◽

James J. Chen

Keyword(s):

Gene Expression ◽

Correlation Analysis ◽

Gene Expression Data ◽

Differentially Expressed Gene ◽

Differentially Expressed ◽

Superior Performance ◽

Expression Data ◽

Gene Set ◽

Gene Sets ◽

Set Correlation

Background: Gene set enrichment analyses (GSEA) provide a useful and powerful approach to identify differentially expressed gene sets with prior biological knowledge. Several GSEA algorithms have been proposed to perform enrichment analyses on groups of genes. However, many of these algorithms have focused on identification of differentially expressed gene sets in a given phenotype. Objective: In this paper, we propose a gene set analytic framework, Gene Set Correlation Analysis (GSCoA), that simultaneously measures within- and between-gene-set variation to identify sets of genes enriched for differential expression and highly co-related pathways. Methods: We apply co-inertia analysis to the comparisons of cross-gene sets in gene expression data to measure the co-structure of expression profiles in pairs of gene sets. Co-inertia analysis (CIA) is a multivariate method to identify trends or co-relationships in multiple datasets that contain the same samples. The objective of CIA is to seek ordinations (dimension reduction diagrams) of two gene sets such that the squared covariance between the projections of the gene sets on successive axes is maximized. Simulation studies illustrate that CIA offers superior performance in identifying co-relationships between gene sets in all simulation settings when compared to correlation-based gene set methods. Result and Conclusion: We also combine between-gene-set CIA and GSEA to discover the relationships between gene sets significantly associated with phenotypes. In addition, we provide a graphical technique for visualizing and simultaneously exploring the associations between and within gene sets and their interaction and network. We then demonstrate integration of within- and between-gene-set variation using CIA and GSEA, applied to the p53 gene expression data using the c2 curated gene sets. Ultimately, the GSCoA approach provides an attractive tool for identification and visualization of novel associations between pairs of gene sets by integrating co-relationships between gene sets into gene set analysis.
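The co-inertia objective stated in the Methods can be written out as follows. This is a simplified sketch under assumptions that are not in the abstract: uniform sample weights, identity column metrics, and column-centered data matrices. The symbols X, Y, u, v, and s are introduced here for illustration only.

```latex
% X: n x p matrix (n samples by p genes of the first set), columns centered
% Y: n x q matrix (same n samples by q genes of the second set), columns centered
\[
(u_1, v_1) = \arg\max_{\|u\| = \|v\| = 1} \operatorname{cov}^2(Xu,\, Yv),
\qquad
\operatorname{cov}(Xu,\, Yv) = \frac{1}{n}\, u^{\top} X^{\top} Y v .
\]
% Successive axes are orthogonal to the earlier ones. With the singular value
% decomposition (1/n) X^T Y = U diag(s_1, s_2, ...) V^T, s_1 >= s_2 >= ... >= 0:
\[
u_k = U_{\cdot k}, \qquad v_k = V_{\cdot k}, \qquad
\operatorname{cov}^2(Xu_k,\, Yv_k) = s_k^2 .
\]
```

Under these assumptions, the total co-inertia between the two gene sets is the sum of the squared singular values, so a pair of sets with strongly co-varying expression yields large leading values of s.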


Enhancing gene set enrichment using networks

F1000Research ◽

10.12688/f1000research.17824.2 ◽

2019 ◽

Vol 8 ◽

pp. 129 ◽

Cited By ~ 1

Author(s):

Michael Prummer

Keyword(s):

Biological Function ◽

Automated Analysis ◽

Gene Set Analysis ◽

Molecular Pathways ◽

Human Intervention ◽

Gene Set Enrichment ◽

Topological Information ◽

Gene Set ◽

Gene Sets ◽

Differential Gene

Differential gene expression (DGE) studies often suffer from poor interpretability of their primary results, i.e., thousands of differentially expressed genes. This has led to the introduction of gene set analysis (GSA) methods that aim at identifying interpretable global effects by grouping genes into sets of common context, such as molecular pathways, biological function, or tissue localization. In practice, GSA often results in hundreds of differentially regulated gene sets. Similar to the genes they contain, gene sets are often regulated in a correlative fashion because they share many of their genes or they describe related processes. Using this kind of neighborhood information to construct networks of gene sets makes it possible to identify highly connected sub-networks as well as poorly connected islands or singletons. We show here how topological information and other network features can be used to filter and prioritize gene sets in routine DGE studies. Community detection in combination with automatic labeling and the network representation of gene set clusters further constitute an appealing and intuitive visualization of GSA results. The RICHNET workflow described here does not require human intervention and can thus be conveniently incorporated in automated analysis pipelines.
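The idea of linking gene sets that share genes can be sketched as below. This is not the RICHNET workflow: it assumes a Jaccard-overlap edge rule with an arbitrary threshold, and it uses plain connected components as a simple stand-in for community detection. All function names, set names, and gene labels are hypothetical.

```python
from itertools import combinations


def jaccard(a, b):
    """Shared genes divided by all genes in either set."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def gene_set_network(gene_sets, threshold=0.3):
    """Adjacency map linking gene sets whose Jaccard overlap reaches the threshold."""
    adjacency = {name: set() for name in gene_sets}
    for s, t in combinations(gene_sets, 2):
        if jaccard(gene_sets[s], gene_sets[t]) >= threshold:
            adjacency[s].add(t)
            adjacency[t].add(s)
    return adjacency


def connected_components(adjacency):
    """Groups of mutually reachable gene sets; singletons come out as size-1 groups."""
    seen, components = set(), []
    for start in adjacency:
        if start in seen:
            continue
        component, stack = set(), [start]
        while stack:
            node = stack.pop()
            if node in component:
                continue
            component.add(node)
            stack.extend(adjacency[node] - component)
        seen |= component
        components.append(component)
    return components


# Made-up gene sets: A, B, and C chain together through shared genes; D is isolated.
gene_sets = {
    "A": {"g1", "g2", "g3"},
    "B": {"g2", "g3", "g4"},
    "C": {"g3", "g4", "g5"},
    "D": {"g9", "g10"},
}
clusters = connected_components(gene_set_network(gene_sets))
print(sorted(sorted(c) for c in clusters))
```

Here A and C overlap too little to be linked directly but still end up in one cluster through B, which is the kind of neighborhood structure a network view exposes.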


Measuring consistency among gene set analysis methods: A systematic study

Journal of Bioinformatics and Computational Biology ◽

10.1142/s0219720019400109 ◽

2019 ◽

Vol 17 (05) ◽

pp. 1940010 ◽

Cited By ~ 1

Author(s):

Farhad Maleki ◽

Katie L. Ovens ◽

Daniel J. Hogan ◽

Elham Rezaei ◽

Alan M. Rosenberg ◽

...

Keyword(s):

Gene Set Analysis ◽

RNA-Seq ◽

Systematic Analysis ◽

Gene Set ◽

Large Gene ◽

Analysis Methods ◽

Gene Sets ◽

Significant Gene ◽

Biological Insight ◽

Relevant Gene

Gene set analysis is a quantitative approach for generating biological insight from gene expression datasets. The abundance of gene set analysis methods speaks to their popularity, but raises the question of the extent to which results are affected by the choice of method. Our systematic analysis of 13 popular methods using 6 different datasets, from both DNA microarray and RNA-Seq origin, shows that this choice matters a great deal. We observed that the overall number of gene sets reported by each method differed by up to 2 orders of magnitude, and there was a bias toward reporting large gene sets with some methods. Furthermore, there was substantial disagreement between the 20 most statistically significant gene sets reported by the methods. This was also observed when expanding to the 100 most statistically significant reported gene sets. For different datasets of the same phenotype/condition, the top 20 and top 100 most significant results also showed little to no agreement even when using the same method. GAGE, PAGE, and ORA were the only methods able to achieve relatively high reproducibility when comparing the 20 and 100 most statistically significant gene sets. Biological validation on a juvenile idiopathic arthritis (JIA) dataset showed wide variation in terms of the relevance of the top 20 and top 100 most significant gene sets to known biology of the disease, where GAGE predicted the most relevant gene sets, followed by GSEA, ORA, and PAGE.
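Agreement between the top-ranked gene sets of two methods can be quantified in several ways. The sketch below assumes the Jaccard index of the two top-k lists as the measure, which may differ from the one used in the study. The function name, method labels, and rankings are made up for illustration.

```python
def top_k_agreement(ranked_a, ranked_b, k=20):
    """Jaccard index of the k most significant gene sets from two rankings."""
    top_a, top_b = set(ranked_a[:k]), set(ranked_b[:k])
    union = top_a | top_b
    return len(top_a & top_b) / len(union) if union else 0.0


# Hypothetical rankings (most significant first) from two hypothetical methods.
method_1 = ["set1", "set2", "set3", "set4"]
method_2 = ["set3", "set5", "set1", "set2"]

print(top_k_agreement(method_1, method_2, k=3))  # 2 shared of 4 distinct -> 0.5
```

A value of 1.0 means the two methods report the same top-k gene sets regardless of order, and 0.0 means they share none.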
