Heritability enrichment of specifically expressed genes identifies disease-relevant tissues and cell types

Mapping Intimacies ◽

10.1101/103069 ◽

2017 ◽

Cited By ~ 19

Author(s):

Hilary K. Finucane ◽

Yakir A. Reshef ◽

Verneri Anttila ◽

Kamil Slowikowski ◽

Alexander Gusev ◽

...

Keyword(s):

Gene Expression ◽

Gene Expression Data ◽

Complex Disease ◽

Genome Wide Association Study ◽

Ex Vivo ◽

Cell Types ◽

Inhibitory Neurons ◽

Biliary Cirrhosis ◽

Expression Data ◽

Specific Expression

Genetics can provide a systematic approach to discovering the tissues and cell types relevant for a complex disease or trait. Identifying these tissues and cell types is critical for following up on non-coding allelic function, developing ex-vivo models, and identifying therapeutic targets. Here, we analyze gene expression data from several sources, including the GTEx and PsychENCODE consortia, together with genome-wide association study (GWAS) summary statistics for 48 diseases and traits with an average sample size of 169,331, to identify disease-relevant tissues and cell types. We develop and apply an approach that uses stratified LD score regression to test whether disease heritability is enriched in regions surrounding genes with the highest specific expression in a given tissue. We detect tissue-specific enrichments at FDR < 5% for 34 diseases and traits across a broad range of tissues that recapitulate known biology. In our analysis of traits with observed central nervous system enrichment, we detect an enrichment of neurons over other brain cell types for several brain-related traits, enrichment of inhibitory over excitatory neurons for bipolar disorder but excitatory over inhibitory neurons for schizophrenia and body mass index, and enrichments in the cortex for schizophrenia and in the striatum for migraine. In our analysis of traits with observed immunological enrichment, we identify enrichments of T cells for asthma and eczema, B cells for primary biliary cirrhosis, and myeloid cells for Alzheimer's disease, which we validated with independent chromatin data. Our results demonstrate that our polygenic approach is a powerful way to leverage gene expression data for interpreting GWAS signal.
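The abstract above describes testing heritability enrichment around genes with the highest specific expression in a tissue. As a rough illustration of only the gene-selection step, the sketch below ranks genes by a simple specificity score and keeps the top fraction. The score (focal-tissue expression minus the mean of the other tissues), the `fraction` default, the function name, and the toy data are all assumptions made for illustration; they are not taken from the paper, and the regression step itself is not shown.

```python
def top_specific_genes(expression, tissue, fraction=0.1):
    """Return the genes most specifically expressed in `tissue`.

    `expression` maps gene -> {tissue name -> expression level}.
    The score used here is hypothetical: expression in the focal
    tissue minus the mean expression across all other tissues.
    """
    scores = {}
    for gene, by_tissue in expression.items():
        others = [v for t, v in by_tissue.items() if t != tissue]
        scores[gene] = by_tissue[tissue] - sum(others) / len(others)
    # Keep at least one gene, even for very small inputs.
    n_keep = max(1, int(len(scores) * fraction))
    ranked = sorted(scores, key=scores.get, reverse=True)
    return ranked[:n_keep]


# Toy data, invented for illustration only.
expression = {
    "G1": {"brain": 9.0, "liver": 1.0, "blood": 2.0},
    "G2": {"brain": 5.0, "liver": 5.0, "blood": 5.0},
    "G3": {"brain": 2.0, "liver": 8.0, "blood": 1.0},
    "G4": {"brain": 6.0, "liver": 2.0, "blood": 3.0},
}
brain_genes = top_specific_genes(expression, "brain", fraction=0.5)
```

In an analysis of this kind, the selected gene set would then define the genomic regions whose heritability enrichment is tested.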

GiniClust: detecting rare cell types from single-cell gene expression data with Gini index

Genome Biology ◽

10.1186/s13059-016-1010-4 ◽

2016 ◽

Vol 17 (1) ◽

Cited By ~ 126

Author(s):

Lan Jiang ◽

Huidong Chen ◽

Luca Pinello ◽

Guo-Cheng Yuan

Keyword(s):

Gene Expression ◽

Single Cell ◽

Gene Expression Data ◽

Gini Index ◽

Cell Types ◽

Expression Data ◽

Cell Gene Expression ◽

Cell Gene

Identification of stable reference genes in differentiating human pluripotent stem cells

Physiological Genomics ◽

10.1152/physiolgenomics.00130.2014 ◽

2015 ◽

Vol 47 (6) ◽

pp. 232-239 ◽

Cited By ~ 11

Author(s):

Gustav Holmgren ◽

Nidal Ghosheh ◽

Xianmin Zeng ◽

Yalda Bogestål ◽

Peter Sartipy ◽

...

Keyword(s):

Gene Expression ◽

Stem Cells ◽

Gene Expression Data ◽

Pluripotent Stem Cells ◽

Reference Genes ◽

Cell Types ◽

Human Pluripotent Stem Cells ◽

Expression Data ◽

Large Variability ◽

The Stability

Reference genes, often referred to as housekeeping genes (HKGs), are frequently used to normalize gene expression data based on the assumption that they are expressed at a constant level in the cells. However, several studies have shown that there may be a large variability in the gene expression levels of HKGs in various cell types. In a previous study, employing human embryonic stem cells (hESCs) subjected to spontaneous differentiation, we observed that the expression of commonly used HKGs varied to a degree that rendered them inappropriate to use as reference genes under those experimental settings. Here we present a substantially extended study of the HKG signature in human pluripotent stem cells (hPSCs), including nine global gene expression datasets from both hESCs and human induced pluripotent stem cells, obtained during directed differentiation toward endoderm, mesoderm, and ectoderm derivatives. Sets of stably expressed genes were compiled, and a handful of genes (e.g., EID2, ZNF324B, CAPN10, and RABEP2) were identified as generally applicable reference genes in hPSCs across all cell lines and experimental conditions. The stability in gene expression profiles was confirmed by reverse transcription quantitative PCR analysis. Taken together, the current results suggest that differentiating hPSCs have a distinct HKG signature, which in some aspects is different from somatic cell types, and underscore the necessity to validate the stability of reference genes under the actual experimental setup used. In addition, the novel putative HKGs identified in this study can preferentially be used for normalization of gene expression data obtained from differentiating hPSCs.
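One simple way to think about "stably expressed" candidates is to rank genes by how little their expression varies across samples. The sketch below uses the coefficient of variation for that purpose. This metric is an assumption chosen for illustration, since the abstract does not state which stability measure the authors used, and the gene names and values are invented.

```python
from statistics import mean, pstdev


def coefficient_of_variation(values):
    """Population standard deviation divided by the mean."""
    m = mean(values)
    if m == 0:
        raise ValueError("mean expression is zero; CV is undefined")
    return pstdev(values) / m


def rank_by_stability(expression):
    """Sort gene names from most stable (lowest CV) to least stable."""
    return sorted(expression, key=lambda g: coefficient_of_variation(expression[g]))


# Hypothetical expression levels across four samples.
expression = {
    "GENE_A": [10.0, 10.5, 9.8, 10.2],
    "GENE_B": [5.0, 12.0, 2.0, 20.0],
    "GENE_C": [8.0, 8.0, 8.0, 8.0],
}
ranking = rank_by_stability(expression)
```

A gene near the top of such a ranking would still need validation under the actual experimental setup, as the abstract stresses.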

Simultaneous enumeration of cancer and immune cell types from bulk tumor gene expression data

eLife ◽

10.7554/elife.26476 ◽

2017 ◽

Vol 6 ◽

Cited By ~ 107

Author(s):

Julien Racle ◽

Kaat de Jonge ◽

Petra Baumgaertner ◽

Daniel E Speiser ◽

David Gfeller

Keyword(s):

Gene Expression ◽

Gene Expression Data ◽

Immune Cell ◽

Expression Profiles ◽

Cell Types ◽

Response To Therapy ◽

Expression Data ◽

Cell Type ◽

Tumor Gene Expression ◽

Tumor Gene

Immune cells infiltrating tumors can have an important impact on tumor progression and response to therapy. We present an efficient algorithm to simultaneously estimate the fraction of cancer and immune cell types from bulk tumor gene expression data. Our method integrates novel gene expression profiles from each major non-malignant cell type found in tumors, renormalization based on cell-type-specific mRNA content, and the ability to consider uncharacterized and possibly highly variable cell types. Feasibility is demonstrated by validation with flow cytometry, immunohistochemistry and single-cell RNA-Seq analyses of human melanoma and colorectal tumor specimens. Altogether, our work not only improves accuracy but also broadens the scope of absolute cell fraction predictions from tumor gene expression data, and provides a unique novel experimental benchmark for immunogenomics analyses in cancer research (http://epic.gfellerlab.org).
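The renormalization idea in the abstract above can be illustrated with a small sketch: proportions estimated from expression reflect each cell type's share of mRNA, so they are divided by an mRNA-per-cell value and rescaled to obtain cell fractions, with the unexplained remainder assigned to an uncharacterized group. The function name, the `"other"` label, the default mRNA content for uncharacterized cells, and all numbers are assumptions for illustration; this is not the EPIC implementation.

```python
def mrna_to_cell_fractions(mrna_props, mrna_per_cell, other_mrna_per_cell=1.0):
    """Convert mRNA proportions into cell fractions.

    `mrna_props` maps cell type -> estimated share of total mRNA
    (the shares may sum to less than 1).
    `mrna_per_cell` maps cell type -> relative mRNA content per cell.
    The remainder is attributed to a hypothetical "other" group.
    """
    known = sum(mrna_props.values())
    if known > 1.0 + 1e-9:
        raise ValueError("mRNA proportions sum to more than 1")
    props = dict(mrna_props)
    props["other"] = max(0.0, 1.0 - known)
    content = dict(mrna_per_cell)
    content["other"] = other_mrna_per_cell
    # Cells with less mRNA each need more cells to explain the same share.
    weighted = {k: props[k] / content[k] for k in props}
    total = sum(weighted.values())
    return {k: v / total for k, v in weighted.items()}


# Invented values: immune cells carry half the mRNA of the "other" cells.
fractions = mrna_to_cell_fractions(
    {"T_cell": 0.2, "B_cell": 0.1},
    {"T_cell": 0.5, "B_cell": 0.5},
)
```

With these toy inputs the T cell fraction comes out larger than its mRNA share, because each T cell is assumed to contribute less mRNA than an uncharacterized cell.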

EPIC: A Tool to Estimate the Proportions of Different Cell Types from Bulk Gene Expression Data

Bioinformatics for Cancer Immunotherapy - Methods in Molecular Biology ◽

10.1007/978-1-0716-0327-7_17 ◽

2020 ◽

pp. 233-248

Author(s):

Julien Racle ◽

David Gfeller

Keyword(s):

Gene Expression ◽

Gene Expression Data ◽

Cell Types ◽

Expression Data ◽

Different Cell Types

Bayesian log-normal deconvolution for enhanced in silico microdissection of bulk gene expression data

Nature Communications ◽

10.1038/s41467-021-26328-2 ◽

2021 ◽

Vol 12 (1) ◽

Author(s):

Bárbara Andrade Barbosa ◽

Saskia D. van Asten ◽

Ji Won Oh ◽

Arantza Farina-Sarasqueta ◽

Joanne Verheij ◽

...

Keyword(s):

Gene Expression ◽

Gene Expression Data ◽

Expression Profiles ◽

Gene Expression Profiles ◽

Cell Types ◽

Expression Data ◽

Cell Type ◽

Expression Variability ◽

Variable Nature ◽

Log Normal

Deconvolution of bulk gene expression profiles into the cellular components is pivotal to portraying tissue’s complex cellular make-up, such as the tumor microenvironment. However, the inherently variable nature of gene expression requires a comprehensive statistical model and reliable prior knowledge of individual cell types that can be obtained from single-cell RNA sequencing. We introduce BLADE (Bayesian Log-normAl Deconvolution), a unified Bayesian framework to estimate both cellular composition and gene expression profiles for each cell type. Unlike previous comprehensive statistical approaches, BLADE can handle > 20 types of cells due to the efficient variational inference. Throughout an intensive evaluation with > 700 simulated and real datasets, BLADE demonstrated enhanced robustness against gene expression variability and better completeness than conventional methods, in particular, to reconstruct gene expression profiles of each cell type. In summary, BLADE is a powerful tool to unravel heterogeneous cellular activity in complex biological systems from standard bulk gene expression data.

BLADE: Bayesian Log-normAl DEconvolution for enhanced in silico microdissection of bulk gene expression data

10.21203/rs.3.rs-123595/v1 ◽

2020 ◽

Author(s):

Bárbara Andrade Barbosa ◽

Saskia van Asten ◽

Ji-won Oh ◽

Arantza Fariña-Sarasqueta ◽

Joanne Verheij ◽

...

Keyword(s):

Gene Expression ◽

Gene Expression Data ◽

Expression Profiles ◽

Gene Expression Profiles ◽

Cell Types ◽

Expression Data ◽

Cell Type ◽

Expression Variability ◽

Variable Nature ◽

Log Normal

High-resolution deconvolution of bulk gene expression profiles is pivotal to characterizing the complex cellular make-up of tissues, such as the tumor microenvironment. Single-cell RNA-seq provides reliable prior knowledge for deconvolution; however, a comprehensive statistical model is required for efficient utilization due to the inherently variable nature of gene expression. We introduce BLADE (Bayesian Log-normAl Deconvolution), a comprehensive probabilistic framework to estimate both cellular make-up and gene expression profiles of each cell type in each sample. Unlike previous comprehensive statistical approaches, BLADE can handle >20 cell types thanks to the efficient variational inference. Throughout an intensive evaluation using >700 datasets, BLADE showed enhanced robustness against gene expression variability and better completeness than conventional methods, in particular to reconstruct gene expression profiles of each cell type. All in all, BLADE is a powerful tool to unravel heterogeneous cellular activity in complex biological systems based on standard bulk gene expression data.

Comprehensive analysis of immune cell enrichment in the tumor microenvironment of head and neck squamous cell carcinoma

Scientific Reports ◽

10.1038/s41598-021-95718-9 ◽

2021 ◽

Vol 11 (1) ◽

Author(s):

Ikko Mito ◽

Hideyuki Takahashi ◽

Reika Kawabata-Iwakawa ◽

Shota Ida ◽

Hiroe Tada ◽

...

Keyword(s):

Gene Expression ◽

T Cells ◽

Tumor Microenvironment ◽

Head And Neck ◽

Gene Expression Data ◽

Immune Cell ◽

Cell Types ◽

Expression Data ◽

Immune Microenvironment ◽

Cancer Tissues

Head and neck squamous cell carcinoma (HNSCC) is highly infiltrated by immune cells, including tumor-infiltrating lymphocytes and myeloid lineage cells. In the tumor microenvironment, tumor cells orchestrate a highly immunosuppressive microenvironment by secreting immunosuppressive mediators, expressing immune checkpoint ligands, and downregulating human leukocyte antigen expression. In the present study, we aimed to comprehensively profile the immune microenvironment of HNSCC using gene expression data obtained from a public database. We calculated enrichment scores of 33 immune cell types based on gene expression data of HNSCC tissues and adjacent non-cancer tissues. Based on these scores, we performed non-supervised clustering and identified three immune signatures: cold, lymphocyte, and myeloid/dendritic cell (DC). We then compared the clinical and biological features of the three signatures. Among HNSCC and non-cancer tissues, human papillomavirus (HPV)-positive HNSCCs exhibited the highest scores in various immune cell types, including CD4+ T cells, CD8+ T cells, B cells, plasma cells, basophils, and their subpopulations. Among the three immune signatures, the proportions of HPV-positive tumors, oropharyngeal cancers, early T tumors, and N factor positive cases were significantly higher in the lymphocyte signature than in other signatures. Among the three signatures, the lymphocyte signature showed the longest overall survival (OS), especially in HPV-positive patients, whereas the myeloid/DC signature demonstrated the shortest OS in these patients. Gene set enrichment analysis revealed the upregulation of several pathways related to inflammatory and proinflammatory responses in the lymphocyte signature. The expression of PRF1, IFNG, GZMB, CXCL9, CXCL10, PDCD1, LAG3, CTLA4, HAVCR2, and TIGIT was the highest in the lymphocyte signature. Meanwhile, the expression of PD-1 ligand genes CD274 and PDCD1LG2 was highest in the myeloid/DC signature. Herein, our findings revealed the transcriptomic landscape of the immune microenvironment that closely reflects the clinical and biological significance of HNSCC, indicating that molecular profiling of the immune microenvironment can be employed to develop novel biomarkers and precision immunotherapies for HNSCC.

Novel modelling of clustering for enhanced classification performance on gene expression data

International Journal of Electrical and Computer Engineering (IJECE) ◽

10.11591/ijece.v10i2.pp2060-2068 ◽

2020 ◽

Vol 10 (2) ◽

pp. 2060

Author(s):

Sudha V. ◽

Girijamma H. A.

Keyword(s):

Gene Expression ◽

Computational Complexity ◽

Gene Expression Data ◽

Complex Disease ◽

Classification Performance ◽

Expression Data ◽

Microarray Database ◽

Conventional Procedure ◽

Cancer Review ◽

Eigen Value

Gene expression data is widely used for its capacity to reveal various disease conditions. However, the conventional procedure for extracting gene expression data itself introduces various artifacts that pose challenges in diagnosing and classifying a complex disease such as cancer. A review of existing research indicates that few classification approaches have proven to be standard with respect to higher accuracy and applicability to gene expression data, apart from the unaddressed problem of computational complexity. Therefore, the proposed manuscript introduces a novel and simplified model that uses the Graph Fourier Transform, eigenvalues, and eigenvectors to offer better classification performance, taking as a case study a microarray database, which is one typical example of gene expression data. The study outcome shows that the proposed system offers comparatively better accuracy and reduced computational complexity compared with existing clustering approaches.

Decision letter: Simultaneous enumeration of cancer and immune cell types from bulk tumor gene expression data

10.7554/elife.26476.048 ◽

2017 ◽

Keyword(s):

Gene Expression ◽

Gene Expression Data ◽

Immune Cell ◽

Cell Types ◽

Expression Data ◽

Tumor Gene Expression ◽

Tumor Gene

A Novel Biomarker Identification Approach for Gastric Cancer Using Gene Expression and DNA Methylation Dataset

Frontiers in Genetics ◽

10.3389/fgene.2021.644378 ◽

2021 ◽

Vol 12 ◽

Author(s):

Ge Zhang ◽

Zijing Xue ◽

Chaokun Yan ◽

Jianlin Wang ◽

Huimin Luo

Keyword(s):

Gene Expression ◽

Gastric Cancer ◽

Dna Methylation ◽

Feature Selection ◽

Gene Expression Data ◽

Complex Disease ◽

Biological Data ◽

Computational Method ◽

Superior Performance ◽

Expression Data

As one type of complex disease, gastric cancer has a high mortality rate, and there are few effective treatments for patients at an advanced stage. With the development of biological technology, a large amount of multi-omics data on gastric cancer has been generated, which enables computational methods to discover potential biomarkers of gastric cancer. This will be very important for detecting gastric cancer at earlier stages and thus assisting in providing timely treatment. However, most biological data are high-dimensional with a low sample size and are hard to process directly without feature selection. Besides, using only a single type of omics data, such as gene expression data, provides limited evidence for investigating gastric cancer associated biomarkers. In this research, gene expression data and DNA methylation data are integrated to analyze gastric cancer, and a feature selection approach is proposed to identify the possible biomarkers of gastric cancer. After the original data are pre-processed, mutual information (MI) is applied to select the top genes. Then, fold change (FC) and the t-test are adopted to identify differentially expressed genes (DEGs). In particular, the false discovery rate (FDR) is introduced to adjust p-values and further screen genes. For the chosen genes, a deep neural network (DNN) model is utilized as the classifier to measure the quality of classification. The experimental results show that the approach can achieve superior performance in terms of accuracy and other metrics. Biological analysis of the chosen genes further validates the effectiveness of the approach.
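The FDR step mentioned in the abstract above can be illustrated with the Benjamini-Hochberg procedure, a widely used way to adjust p-values for multiple testing. The abstract does not name the procedure the authors applied, so the choice of Benjamini-Hochberg here is an assumption, and the p-values are invented.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    n = len(p_values)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(n), key=lambda i: p_values[i])
    adjusted = [0.0] * n
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(n, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * n / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical p-values from per-gene t-tests.
p_values = [0.01, 0.04, 0.03, 0.20]
q_values = benjamini_hochberg(p_values)
# Indices of genes that pass a 5% FDR threshold.
selected = [i for i, q in enumerate(q_values) if q < 0.05]
```

With these toy values only the first gene passes the 5% threshold, although three of the four raw p-values are below 0.05.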