Comprehensive Benchmarking and Integration of Tumour Microenvironment Cell Estimation Methods

2018 ◽  
Author(s):  
Alejandro Jiménez-Sánchez ◽  
Oliver Cast ◽  
Martin Miller

ABSTRACT The tumour microenvironment comprises complex cellular compositions and interactions between cancer, immune, and stromal components, all of which play crucial roles in cancer. Various computational approaches have been developed during the last decade that estimate the relative abundance of different cell types in an unbiased manner using bulk tumour RNA data. However, a comparison that objectively evaluates the performance of these approaches against one another has not been conducted. Here we benchmarked six widely used tools and gene sets: Bindea et al. gene sets, Davoli et al. gene sets, CIBERSORT, MCP-counter, TIMER, and xCell. We also introduce ConsensusTME, a consensus approach that uses the union of the genes that the six tools use for cell estimation and corrects for tumour-type specificity. We benchmarked the seven tools using TCGA DNA-derived purity scores (33 tumour types), methylation-derived leukocyte scores (30 tumour types), H&E deep-learning-derived lymphocyte counts (13 tumour types), and individual benchmark data sets (PBMCs and 2 tumour types). Although none of the seven tools outperformed the others in every benchmark, ConsensusTME ranked consistently well in all cancer-related benchmarks, making it the top-performing method overall. Computational methods that provide robust and accurate estimates of non-cancerous cell populations in the tumour microenvironment from bulk tumour expression data are important tools that can advance our understanding of tumour, immune, and stroma interactions, with potential clinical application if high-accuracy estimates are achieved.
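The consensus step can be illustrated with a minimal Python sketch: for each cell type, take the union of the marker genes that every tool assigns to it, then score each sample. The tool names, gene sets and expression values are hypothetical, and the mean z-score used for scoring is a deliberate simplification; it is not ConsensusTME's actual scoring procedure or its tumour-type correction.

```python
from statistics import mean, pstdev

# Hypothetical marker gene sets from two tools (illustrative, not real signatures).
TOOL_SIGNATURES = {
    "toolA": {"T cells": {"CD3D", "CD3E", "CD2"}, "B cells": {"CD19", "MS4A1"}},
    "toolB": {"T cells": {"CD3E", "CD3G", "TRAC"}, "B cells": {"MS4A1", "CD79A"}},
}


def consensus_signatures(tool_signatures):
    """Union of the genes that any tool assigns to each cell type."""
    merged = {}
    for signatures in tool_signatures.values():
        for cell_type, genes in signatures.items():
            merged.setdefault(cell_type, set()).update(genes)
    return merged


def zscore_rows(expr):
    """Standardise each gene across samples; zero-variance genes become 0."""
    out = {}
    for gene, values in expr.items():
        mu, sd = mean(values), pstdev(values)
        out[gene] = [(v - mu) / sd if sd > 0 else 0.0 for v in values]
    return out


def score_samples(expr, signatures):
    """Per sample, the mean z-score of the signature genes present in `expr`."""
    z = zscore_rows(expr)
    n_samples = len(next(iter(expr.values())))
    scores = {}
    for cell_type, genes in signatures.items():
        present = [g for g in genes if g in z]
        if not present:
            continue  # no marker measured: leave this cell type unscored
        scores[cell_type] = [mean(z[g][i] for g in present) for i in range(n_samples)]
    return scores
```

Genes missing from the expression matrix are simply skipped, which is one reason a larger union of markers can make scores more robust across platforms.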

2008 ◽  
Vol 6 ◽  
pp. CIN.S606 ◽  
Author(s):  
Attila Frigyesi ◽  
Mattias Höglund

Non-negative matrix factorization (NMF) is a relatively new approach to analyzing gene expression data that models data by additive combinations of non-negative basis vectors (metagenes). The non-negativity constraint makes biological sense, as genes may either be expressed or not but never show negative expression. We applied NMF to five different microarray data sets. We estimated the appropriate number of metagenes by comparing the residual error of NMF reconstruction of the data to that of NMF reconstruction of permuted data, thus finding when a given solution contained more information than noise. This analysis also revealed that NMF could not factorize one of the data sets in a meaningful way. We used GO categories and predefined gene sets to evaluate the biological significance of the obtained metagenes. By analyzing metagenes specific for the same GO categories, we could show that individual metagenes activated different aspects of the same biological processes. Several of the obtained metagenes correlated with tumor subtypes and with tumors carrying characteristic chromosomal translocations, indicating that metagenes may correspond to specific disease entities. Hence, NMF extracts biologically relevant structures from microarray expression data and may thus contribute to a deeper understanding of tumor behavior.
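The rank-selection idea can be sketched in Python with NumPy. This assumes Lee-Seung multiplicative updates for NMF and one possible reading of the stopping rule: keep adding metagenes while each extra metagene reduces the residual more on the real data than on data whose genes were permuted across samples. The function names and the exact criterion are assumptions, not the authors' published procedure.

```python
import numpy as np


def nmf(V, k, n_iter=1000, seed=0, eps=1e-9):
    """Lee-Seung multiplicative updates minimising the Frobenius norm ||V - WH||."""
    rng = np.random.default_rng(seed)
    n_genes, n_samples = V.shape
    W = rng.random((n_genes, k)) + eps  # strictly positive start
    H = rng.random((k, n_samples)) + eps
    for _ in range(n_iter):
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        W *= (V @ H.T) / (W @ H @ H.T + eps)
    return W, H


def residual(V, k, seed=0):
    W, H = nmf(V, k, seed=seed)
    return np.linalg.norm(V - W @ H)  # Frobenius norm for a 2-D array


def permute_within_genes(V, rng):
    """Shuffle each gene's values across samples, destroying co-expression."""
    return np.array([rng.permutation(row) for row in V])


def choose_rank(V, max_k, seed=0):
    """Largest k at which the extra metagene still cuts the residual more on
    the real data than on permuted data (assumed reading of the criterion)."""
    rng = np.random.default_rng(seed)
    Vp = permute_within_genes(V, rng)
    real = [residual(V, k, seed) for k in range(1, max_k + 1)]
    perm = [residual(Vp, k, seed) for k in range(1, max_k + 1)]
    best = 1
    for k in range(2, max_k + 1):
        gain_real = real[k - 2] - real[k - 1]
        gain_perm = perm[k - 2] - perm[k - 1]
        if gain_real > gain_perm:
            best = k
        else:
            break  # the extra metagene only fits noise
    return best
```

Permuting each gene independently keeps its value distribution but removes correlated structure, so the permuted residual curve serves as a noise baseline.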


2020 ◽  
Author(s):  
Mohammed Samer Shaban ◽  
Christin Müller ◽  
Christin Mayr-Buro ◽  
Hendrik Weiser ◽  
Benadict Vincent Albert ◽  
...  

Abstract Coronaviruses (CoVs) are important human pathogens for which no specific treatment is available. Here, we provide evidence that pharmacological reprogramming of ER stress pathways can be exploited to suppress CoV replication. We found that the ER stress inducer thapsigargin efficiently inhibits coronavirus (HCoV-229E, MERS-CoV, SARS-CoV-2) replication in different cell types, (partially) restores the virus-induced translational shut-down, and counteracts the CoV-mediated downregulation of IRE1α and the ER chaperone BiP. Proteome-wide data sets revealed specific pathways, protein networks and components that likely mediate the thapsigargin-induced antiviral state, including HERPUD1, an essential factor of ER quality control, and ER-associated protein degradation complexes. The data show that thapsigargin hits a central mechanism required for CoV replication, suggesting that thapsigargin (or derivatives thereof) may be developed into broad-spectrum anti-CoV drugs.
One Sentence Summary / Running title: Suppression of coronavirus replication through thapsigargin-regulated ER stress, ERQC / ERAD and metabolic pathways


2019 ◽  
Author(s):  
Ludwig Geistlinger ◽  
Gergely Csaba ◽  
Mara Santarelli ◽  
Marcel Ramos ◽  
Lucas Schiffer ◽  
...  

Abstract
Background: Although gene set enrichment analysis has become an integral part of high-throughput gene expression data analysis, the assessment of enrichment methods remains rudimentary and ad hoc. In the absence of suitable gold standards, evaluations are commonly restricted to selected data sets and biological reasoning on the relevance of resulting enriched gene sets. However, this is typically incomplete and biased towards the goals of individual investigations.
Results: We present a general framework for standardized and structured benchmarking of enrichment methods based on defined criteria for applicability, gene set prioritization, and detection of relevant processes. This framework incorporates a curated compendium of 75 expression data sets investigating 42 different human diseases. The compendium features microarray and RNA-seq measurements, and each data set is associated with a precompiled GO/KEGG relevance ranking for the corresponding disease under investigation. We perform a comprehensive assessment of 10 major enrichment methods on the benchmark compendium, identifying significant differences in (i) runtime and applicability to RNA-seq data, (ii) fraction of enriched gene sets depending on the type of null hypothesis tested, and (iii) recovery of the a priori defined relevance rankings. Based on these findings, we make practical recommendations on (i) how methods originally developed for microarray data can efficiently be applied to RNA-seq data, (ii) how to interpret results depending on the type of gene set test conducted, and (iii) which methods are best suited to effectively prioritize gene sets with high relevance for the phenotype investigated.
Conclusion: We carried out a systematic assessment of existing enrichment methods and identified the best-performing methods, but also general shortcomings in how gene set analysis is currently conducted. We provide a directly executable benchmark system for straightforward assessment of additional enrichment methods.
Availability: http://bioconductor.org/packages/GSEABenchmarkeR
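GSEABenchmarkeR is an R/Bioconductor package, and its API is not reproduced here. As a hedged illustration of what a classic enrichment method computes, the Python sketch below performs over-representation analysis (ORA) with a one-sided hypergeometric test. Because genes are the sampling unit, ORA tests a competitive null hypothesis (genes in the set are no more often differentially expressed than other genes), which is one of the null-hypothesis types whose effect on the fraction of enriched sets the abstract describes. The function names are illustrative.

```python
from math import comb


def hypergeom_sf(k, N, K, n):
    """P(X >= k) for X ~ Hypergeometric(population N, K successes, n draws)."""
    total = comb(N, n)
    upper = min(K, n)
    return sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1)) / total


def ora(de_genes, gene_sets, universe):
    """Over-representation analysis: one-sided hypergeometric p-value per gene set,
    returned sorted from most to least significant."""
    universe = set(universe)
    de = set(de_genes) & universe
    results = {}
    for name, members in gene_sets.items():
        members = set(members) & universe  # only genes that could have been measured
        overlap = len(de & members)
        results[name] = hypergeom_sf(overlap, len(universe), len(members), len(de))
    return dict(sorted(results.items(), key=lambda item: item[1]))
```

Restricting both the differentially expressed genes and the gene sets to the measured universe matters: an inflated universe makes every overlap look more surprising than it is.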


2004 ◽  
Vol 3 (1) ◽  
pp. 1-29 ◽  
Author(s):  
Jörg Rahnenführer ◽  
Francisco S Domingues ◽  
Jochen Maydt ◽  
Thomas Lengauer

We present a statistical approach to scoring changes in activity of metabolic pathways from gene expression data. The method identifies the biologically relevant pathways with corresponding statistical significance. Based on gene expression data alone, only local structures of genetic networks can be recovered. Instead of inferring such a network, we propose a hypothesis-based approach. We use given knowledge about biological networks to improve the sensitivity and interpretability of findings from microarray experiments.
Recently introduced methods test whether members of predefined gene sets are enriched in a list of top-ranked genes in a microarray study. We improve this approach by defining scores that depend on all members of the gene set and that also take pairwise co-regulation of these genes into account. We calculate the significance of co-regulation of gene sets with a nonparametric permutation test. The method is validated on two data sets, and its biological relevance is discussed. It turns out that useful measures for co-regulation of genes in a pathway can be identified adaptively.
We refine our method in two aspects specific to pathways. First, to overcome the ambiguity of enzyme-to-gene mappings for a fixed pathway, we introduce algorithms for selecting the best-fitting gene for a specific enzyme in a specific condition. In selected cases, functional assignment of genes to pathways is feasible. Second, the sensitivity of detecting relevant pathways is improved by integrating information about pathway topology. The distance between two enzymes is measured by the number of reactions needed to connect them, and enzyme pairs with a smaller distance receive a higher weight in the score calculation.
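A minimal Python sketch of a topology-weighted co-regulation score with a permutation test follows. Several choices are assumptions rather than the authors' definitions: genes map one-to-one to enzymes, co-regulation is Pearson correlation, a pair's weight is 1/d where d is the number of reaction steps between the enzymes (found by breadth-first search over a hypothetical list of enzyme links), and the null distribution comes from shuffling each gene's values across samples independently.

```python
import random
from collections import deque
from itertools import combinations
from statistics import mean


def pearson(x, y):
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5 if sxx > 0 and syy > 0 else 0.0


def reaction_distances(edges, source):
    """BFS hop counts from `source`; each edge links enzymes one reaction apart."""
    graph = {}
    for a, b in edges:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)
    dist, queue = {source: 0}, deque([source])
    while queue:
        node = queue.popleft()
        for nxt in graph.get(node, ()):
            if nxt not in dist:
                dist[nxt] = dist[node] + 1
                queue.append(nxt)
    return dist


def coregulation_score(expr, genes, edges):
    """Weighted mean pairwise correlation with weight 1/d (assumed weighting)."""
    num = den = 0.0
    for a, b in combinations(genes, 2):
        d = reaction_distances(edges, a).get(b)
        w = 1.0 / d if d else 0.0  # unconnected pairs get zero weight
        num += w * pearson(expr[a], expr[b])
        den += w
    return num / den if den else 0.0


def permutation_pvalue(expr, genes, edges, n_perm=999, seed=0):
    """Fraction of sample-shuffled data sets scoring at least as high as observed."""
    rng = random.Random(seed)
    observed = coregulation_score(expr, genes, edges)
    hits = 0
    for _ in range(n_perm):
        shuffled = {g: rng.sample(expr[g], len(expr[g])) for g in genes}
        if coregulation_score(shuffled, genes, edges) >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)
```

Adding one to both numerator and denominator keeps the permutation p-value strictly positive, which is the usual convention for Monte Carlo tests.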


Author(s):  
Keunsoo Kang ◽  
Lothar Hennighausen

Abstract The signal transducer and activator of transcription (STAT) family is activated by cytokines and conveys biochemical signals to the genome through binding to specific regulatory sequences, called IFN-γ-activated sequence (GAS) motifs. As common GAS motifs (TTCnnnGAA) contain only six conserved nucleotides, the mammalian genome harbors hundreds of thousands of copies of this sequence. However, it is not possible to predict which specific GAS motifs bind to STATs and are of functional significance. Here, we apply several layers of statistical, bioinformatics and experimental analyses to narrow down the number of GAS sites that might be of biological relevance. In particular, we determined the number of bona fide GAS motifs by utilizing publicly available genome-wide STAT5 ChIP-seq data sets. Fewer than 10% of GAS motifs within the mouse genome are recognized by STAT5 in vivo, and only a small portion of them are shared across different cell types. However, even bona fide STAT5 binding did not predict that the respective gene was under cytokine-STAT control. Therefore, additional bioinformatics, genomic and epigenetic parameters, such as patterns of histone modifications, are required to more reliably predict the behavior of cytokine-STAT regulatory networks.
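Scanning a sequence for the GAS consensus TTCnnnGAA is a simple pattern match, sketched below in Python. The consensus is its own reverse complement, so scanning one strand reports the sites on both strands; the zero-width lookahead lets overlapping sites be reported. As the abstract stresses, a sequence match alone says nothing about whether STAT5 actually binds there. The function names are illustrative.

```python
import re

# TTCnnnGAA with "n" = any base; the lookahead makes matches zero-width,
# so overlapping sites are all found.
GAS_PATTERN = re.compile(r"(?=(TTC[ACGT]{3}GAA))")


def reverse_complement(seq):
    return seq.upper().translate(str.maketrans("ACGT", "TGCA"))[::-1]


def find_gas_motifs(seq):
    """0-based start positions of GAS consensus matches in `seq`."""
    return [m.start() for m in GAS_PATTERN.finditer(seq.upper())]
```

For example, `find_gas_motifs("GGTTCCGGGAACCTTCATGGAAT")` reports sites at positions 2 and 13, and the reverse complement of that sequence contains the same two sites at mirrored positions.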


Author(s):  
Weiwei Zhang ◽  
Hao Wu ◽  
Ziyi Li

Abstract
Motivation: It is common practice in epigenetics research to profile DNA methylation in tissue samples, which are usually mixtures of different cell types. To properly account for the mixture, estimating cell compositions has been recognized as an important first step. Many methods have been developed for quantifying cell compositions from DNA methylation data, but most have limited applicability because the reference or prior information they need is often lacking.
Results: We develop Tsisal, a novel complete deconvolution method that accurately estimates cell compositions from DNA methylation data without any prior knowledge of cell types or their proportions. Tsisal is a full pipeline that estimates the number of cell types and the cell compositions, and identifies cell-type-specific CpG sites. It can also assign cell type labels when a full or partial reference panel is available. Extensive simulation studies and analyses of seven real data sets demonstrate the favorable performance of our proposed method compared with existing deconvolution methods serving similar purposes.
Availability: The proposed method Tsisal is implemented as part of the R/Bioconductor package TOAST at https://bioconductor.org/packages/TOAST.
Supplementary information: Supplementary data are available at Bioinformatics online.
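Tsisal itself is reference-free and is meant to be used through the TOAST package. To make the underlying mixture model concrete, the Python/NumPy sketch below solves the simpler reference-based problem instead: given a reference panel R of methylation levels (CpG sites × cell types) and a bulk profile b ≈ R p, it estimates proportions p constrained to be non-negative and to sum to one, using projected gradient descent with a Euclidean projection onto the probability simplex. The reference values and function names are made up for illustration.

```python
import numpy as np


def project_to_simplex(v):
    """Euclidean projection of v onto {p : p >= 0, sum(p) = 1}."""
    u = np.sort(v)[::-1]
    css = np.cumsum(u)
    ks = np.arange(1, len(v) + 1)
    rho = np.nonzero(u * ks > (css - 1))[0][-1]
    theta = (css[rho] - 1) / (rho + 1)
    return np.maximum(v - theta, 0.0)


def estimate_proportions(R, b, n_iter=2000):
    """Minimise 0.5 * ||R p - b||^2 over the simplex by projected gradient."""
    k = R.shape[1]
    p = np.full(k, 1.0 / k)                   # start from equal proportions
    step = 1.0 / np.linalg.norm(R, 2) ** 2    # 1 / Lipschitz constant of the gradient
    for _ in range(n_iter):
        grad = R.T @ (R @ p - b)
        p = project_to_simplex(p - step * grad)
    return p


# Hypothetical reference: 6 CpG sites (rows) x 3 cell types (columns), beta values.
R = np.array([
    [0.9, 0.1, 0.1],
    [0.8, 0.2, 0.1],
    [0.1, 0.9, 0.1],
    [0.2, 0.8, 0.2],
    [0.1, 0.1, 0.9],
    [0.1, 0.2, 0.8],
])
```

With a noiseless mixture such as `b = R @ np.array([0.5, 0.3, 0.2])`, `estimate_proportions(R, b)` recovers the three proportions; complete deconvolution methods like Tsisal must additionally estimate R itself.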


2021 ◽  
Author(s):  
Sara Younes ◽  
Alva Annett ◽  
Patricia Stoll ◽  
Klev Diamanti ◽  
Linda Holmfeldt ◽  
...  

Abstract Transcriptomic analyses are commonly used to identify differentially expressed genes between patients and controls, or within individuals across disease courses. These methods, whilst effective, cannot capture the combinatorial effects of genes driving disease. We applied rule-based machine learning (RBML) models and rule networks (RN) to an existing paediatric Systemic Lupus Erythematosus (SLE) blood expression dataset, with the goal of developing gene networks that separate low and high disease activity (DA1 and DA3). The resulting model distinguished DA1 from DA3 with 81% accuracy, and unsupervised hierarchical clustering revealed additional subgroups indicative of the immune axis involved or the state of disease flare. These subgroups correlated with clinical variables, suggesting that the gene sets identified may further the understanding of gene networks that act in concert to drive disease progression. These included roles for genes i) induced by interferons (IFI35 and OTOF), ii) key to SLE cell types (KLRB1, encoding CD161), or iii) involved in autophagy and NF-κB pathway responses (CKAP4). As demonstrated here, RBML approaches have the potential to reveal novel gene patterns within a heterogeneous disease, facilitating clinical and therapeutic stratification of patients.
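The combinatorial idea behind rule-based models can be illustrated with a toy Python sketch that exhaustively searches two-gene IF-THEN rules over median-discretised expression. This is not the RBML/rule-network pipeline used in the study: the expression values in the example are hypothetical, and the reported accuracy is training accuracy, whereas a real model would need cross-validation.

```python
from itertools import combinations, product
from statistics import median


def discretise(samples, genes):
    """Label each gene 'high' or 'low' relative to its median across samples."""
    cut = {g: median(s[g] for s in samples) for g in genes}
    return [{g: ("high" if s[g] > cut[g] else "low") for g in genes} for s in samples]


def best_pair_rule(samples, labels, genes, positive="DA3", negative="DA1"):
    """Search rules 'IF a is la AND b is lb THEN positive ELSE negative' and
    return (training accuracy, (a, la), (b, lb)) for the first best rule."""
    levels = discretise(samples, genes)
    best = None
    for a, b in combinations(genes, 2):
        for la, lb in product(("low", "high"), repeat=2):
            preds = [positive if (x[a] == la and x[b] == lb) else negative for x in levels]
            acc = sum(p == y for p, y in zip(preds, labels)) / len(labels)
            if best is None or acc > best[0]:
                best = (acc, (a, la), (b, lb))
    return best
```

A rule such as "IF IFI35 is high AND OTOF is high THEN DA3" can separate samples that neither gene separates alone, which is the kind of interaction single-gene differential expression misses.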


2018 ◽  
Vol 115 (12) ◽  
pp. E2734-E2741 ◽  
Author(s):  
Chiahao Tsui ◽  
Carla Inouye ◽  
Michaella Levy ◽  
Andrew Lu ◽  
Laurence Florens ◽  
...  

Eukaryotic gene regulation is a complex process, often coordinated by the action of tens to hundreds of proteins. Although previous biochemical studies have identified many components of the basal machinery and various ancillary factors involved in gene regulation, numerous gene-specific regulators remain undiscovered. To comprehensively survey the proteome directing gene expression at a specific genomic locus of interest, we developed an in vitro nuclease-deficient Cas9 (dCas9)-targeted chromatin-based purification strategy, called “CLASP” (Cas9 locus-associated proteome), to identify and functionally test associated gene-regulatory factors. Our CLASP method, coupled to mass spectrometry and functional screens, can be efficiently adapted for isolating associated regulatory factors in an unbiased manner targeting multiple genomic loci across different cell types. Here, we applied our method to isolate the Drosophila melanogaster histone cluster in S2 cells to identify several factors including Vig and Vig2, two proteins that bind and regulate core histone H2A and H3 mRNA via interaction with their 3′ UTRs.


Author(s):  
Adrian L. Harris ◽  
Margaret Ashcroft

Oxygen is required for most multicellular, aerobic organisms to survive and function. The vasculature provides the conduit for delivering oxygen via haemoglobin in the blood to organs, tissues, and cells. In diseases such as cancer, low tissue oxygenation, or hypoxia, occurs in solid tumours because aberrant tumour vasculature supplies inadequate oxygen. Hypoxia is a key feature of most solid tumours and underlies many of the processes associated with cancer progression, including tumour cell survival and proliferation, genetic instability, immune responses, angiogenesis, invasion and metastasis, and metabolic adaptive responses. Solid tumours contain several different cell types that respond to hypoxia within the tumour microenvironment. Hypoxia-inducible factors (HIFs) are a highly evolutionarily conserved family of dimeric transcription factors that are central to mediating the cellular response to hypoxia by regulating the expression of a diverse array of targets. Hypoxia and HIF activation are associated with treatment failure, resistance, and poor clinical outcomes. This chapter will provide an overview of the role of hypoxia in cancer, outline the methods used to measure hypoxia clinically, and discuss the impact of hypoxia on current front-line therapies used to treat cancer.

