ChEA3: transcription factor enrichment analysis by orthogonal omics integration

2019 ◽  
Vol 47 (W1) ◽  
pp. W212-W224 ◽  
Author(s):  
Alexandra B Keenan ◽  
Denis Torre ◽  
Alexander Lachmann ◽  
Ariel K Leong ◽  
Megan L Wojciechowicz ◽  
...  

Abstract Identifying the transcription factors (TFs) responsible for observed changes in gene expression is an important step in understanding gene regulatory networks. ChIP-X Enrichment Analysis 3 (ChEA3) is a transcription factor enrichment analysis tool that ranks TFs associated with user-submitted gene sets. The ChEA3 background database contains a collection of gene set libraries generated from multiple sources including TF–gene co-expression from RNA-seq studies, TF–target associations from ChIP-seq experiments, and TF–gene co-occurrence computed from crowd-submitted gene lists. Enrichment results from these distinct sources are integrated to generate a composite rank that improves the prediction of the correct upstream TF compared to ranks produced by individual libraries. We compare ChEA3 with existing TF prediction tools and show that ChEA3 performs better. By integrating the ChEA3 libraries, we illuminate general transcription factor properties such as whether the TF behaves as an activator or a repressor. The ChEA3 web-server is available from https://amp.pharm.mssm.edu/ChEA3.
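The idea of combining per-library enrichment results into a composite rank can be illustrated with a minimal sketch. It assumes a simple mean-rank scheme over whichever libraries contain a given TF; the abstract does not specify the integration formula, so this is one plausible reading rather than the ChEA3 implementation. The library names, TF names, and p-values below are hypothetical.

```python
from statistics import mean

def rank_tfs(scores):
    """Rank TFs within one library: rank 1 = smallest score (e.g. a p-value)."""
    ordered = sorted(scores, key=scores.get)
    return {tf: position for position, tf in enumerate(ordered, start=1)}

def composite_mean_rank(library_scores):
    """Average each TF's rank over the libraries in which it appears."""
    ranks_by_tf = {}
    for scores in library_scores.values():
        for tf, rank in rank_tfs(scores).items():
            ranks_by_tf.setdefault(tf, []).append(rank)
    mean_ranks = {tf: mean(ranks) for tf, ranks in ranks_by_tf.items()}
    # Best (lowest) composite rank first.
    return sorted(mean_ranks.items(), key=lambda item: item[1])

# Hypothetical enrichment p-values for three libraries (assumed data).
toy_libraries = {
    "coexpression": {"TF_A": 0.001, "TF_B": 0.20, "TF_C": 0.03},
    "chipseq": {"TF_A": 0.04, "TF_B": 0.50, "TF_C": 0.01},
    "cooccurrence": {"TF_A": 0.002, "TF_C": 0.30},
}
composite = composite_mean_rank(toy_libraries)
```

Raw ranks from libraries of different sizes are not directly comparable, so a real integration would likely scale ranks by library size before averaging.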

2020 ◽  
Author(s):  
Jonathan D. Rubin ◽  
Jacob T. Stanley ◽  
Rutendo F. Sigauke ◽  
Cecilia B. Levandowski ◽  
Zachary L. Maas ◽  
...  

Abstract Detecting differential activation of transcription factors (TFs) in response to perturbation provides insight into cellular processes. Transcription Factor Enrichment Analysis (TFEA) is a robust and reliable computational method that detects differential activity of hundreds of TFs given any set of perturbation data. TFEA draws inspiration from GSEA and detects positional motif enrichment within a list of ranked regions of interest (ROIs). As ROIs are typically inferred from the data, we also introduce muMerge, a statistically principled method of generating a consensus list of ROIs from multiple replicates and conditions. TFEA is broadly applicable to data that informs on transcriptional regulation including nascent (e.g. PRO-Seq), CAGE, ChIP-Seq, and accessibility (e.g. ATAC-Seq). TFEA not only identifies the key regulators responding to a perturbation, but also temporally unravels regulatory networks with time series data. Consequently, TFEA serves as a hypothesis-generating tool that provides an easy, rigorous, and cost-effective means to broadly assess TF activity yielding new biological insights.
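A GSEA-style running-sum statistic over a ranked ROI list can be sketched as follows. This is a deliberate simplification and not the TFEA statistic itself: it treats each region as a binary motif hit or miss, ignores the distance of the motif from the region center, and does not estimate significance. The function name and input encoding are assumptions made for illustration.

```python
def enrichment_score(motif_hits):
    """Kolmogorov-Smirnov-style running sum over ranked regions.

    motif_hits: booleans ordered by ROI rank (most changed region first),
    True where the region contains an instance of the motif.
    Returns the running-sum deviation with the largest absolute value.
    """
    n_hits = sum(motif_hits)
    n_misses = len(motif_hits) - n_hits
    if n_hits == 0 or n_misses == 0:
        return 0.0  # no contrast between hits and misses
    step_up = 1.0 / n_hits
    step_down = 1.0 / n_misses
    running, best = 0.0, 0.0
    for hit in motif_hits:
        running += step_up if hit else -step_down
        if abs(running) > abs(best):
            best = running
    return best

# Motif hits concentrated at the top of the ranking give a positive score.
top_heavy = enrichment_score([True, True, True, False, False, False])
```

A positive score indicates motif hits concentrated among the top-ranked regions, a negative score indicates concentration at the bottom, and values near zero indicate no positional trend along the ranking.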


2018 ◽  
Author(s):  
Viren Amin ◽  
Murat Can Cobanoglu

Abstract We present EPEE (Effector and Perturbation Estimation Engine), a method for differential analysis of transcription factor (TF) activity from gene expression data. EPEE addresses two principal challenges in the field, namely incorporating context-specific TF-gene regulatory networks, and accounting for the fact that TF activity inference is intrinsically coupled for all TFs that share targets. Our validations in well-studied immune and cancer contexts show that addressing the overlap challenge and using state-of-the-art regulatory networks enable EPEE to consistently produce accurate results. (Accessible at: https://github.com/Cobanoglu-Lab/EPEE)


2020 ◽  
Vol 10 (10) ◽  
pp. 3675-3686 ◽  
Author(s):  
Sophie A. Harrington ◽  
Anna E. Backhaus ◽  
Ajit Singh ◽  
Keywan Hassani-Pak ◽  
Cristobal Uauy

Gene regulatory networks are powerful tools which facilitate hypothesis generation and candidate gene discovery. However, the extent to which the network predictions are biologically relevant is often unclear. Recently a GENIE3 network which predicted targets of wheat transcription factors was produced. Here we used an independent RNA-Seq dataset to test the predictions of the wheat GENIE3 network for the senescence-regulating transcription factor NAM-A1 (TraesCS6A02G108300). We re-analyzed the RNA-Seq data against the RefSeqv1.0 genome and identified a set of differentially expressed genes (DEGs) between the wild-type and nam-a1 mutant which recapitulated the known role of NAM-A1 in senescence and nutrient remobilisation. We found that the GENIE3-predicted target genes of NAM-A1 overlap significantly with the DEGs, more than would be expected by chance. Based on high levels of overlap between GENIE3-predicted target genes and the DEGs, we identified candidate senescence regulators. We then explored genome-wide trends in the network related to polyploidy and found that only homeologous transcription factors are likely to share predicted targets in common. However, homeologs which vary in expression levels across tissues are less likely to share predicted targets than those that do not, suggesting that they may be more likely to act in distinct pathways. This work demonstrates that the wheat GENIE3 network can provide biologically-relevant predictions of transcription factor targets, which can be used for candidate gene prediction and for global analyses of transcription factor function. The GENIE3 network has now been integrated into the KnetMiner web application, facilitating its use in future studies.
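Whether predicted targets overlap with DEGs "more than would be expected by chance" is commonly assessed with a hypergeometric tail probability. The sketch below assumes that test; the abstract does not say which test the authors used. The function name and the counts in the usage line are hypothetical.

```python
from math import comb

def hypergeom_overlap_pvalue(universe, n_predicted, n_degs, n_overlap):
    """P(overlap >= n_overlap) when n_predicted genes are drawn at random
    from a universe of genes that contains n_degs differentially expressed genes."""
    max_overlap = min(n_predicted, n_degs)
    total = comb(universe, n_predicted)
    tail = sum(
        comb(n_degs, k) * comb(universe - n_degs, n_predicted - k)
        for k in range(n_overlap, max_overlap + 1)
    )
    return tail / total

# Hypothetical counts: 1000 expressed genes, 50 predicted targets,
# 100 DEGs, 15 genes in both sets.
p_value = hypergeom_overlap_pvalue(1000, 50, 100, 15)
```

The choice of gene universe (for example, all expressed genes rather than all annotated genes) strongly affects the result and should match the set from which both gene lists could have been drawn.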


2019 ◽  
Author(s):  
Ludwig Geistlinger ◽  
Gergely Csaba ◽  
Mara Santarelli ◽  
Marcel Ramos ◽  
Lucas Schiffer ◽  
...  

Abstract Background Although gene set enrichment analysis has become an integral part of high-throughput gene expression data analysis, the assessment of enrichment methods remains rudimentary and ad hoc. In the absence of suitable gold standards, evaluations are commonly restricted to selected data sets and biological reasoning on the relevance of resulting enriched gene sets. However, this is typically incomplete and biased towards the goals of individual investigations. Results We present a general framework for standardized and structured benchmarking of enrichment methods based on defined criteria for applicability, gene set prioritization, and detection of relevant processes. This framework incorporates a curated compendium of 75 expression data sets investigating 42 different human diseases. The compendium features microarray and RNA-seq measurements, and each dataset is associated with a precompiled GO/KEGG relevance ranking for the corresponding disease under investigation. We perform a comprehensive assessment of 10 major enrichment methods on the benchmark compendium, identifying significant differences in (i) runtime and applicability to RNA-seq data, (ii) fraction of enriched gene sets depending on the type of null hypothesis tested, and (iii) recovery of the a priori defined relevance rankings. Based on these findings, we make practical recommendations on (i) how methods originally developed for microarray data can efficiently be applied to RNA-seq data, (ii) how to interpret results depending on the type of gene set test conducted, and (iii) which methods are best suited to effectively prioritize gene sets with high relevance for the phenotype investigated. Conclusion We carried out a systematic assessment of existing enrichment methods, and identified best performing methods, but also general shortcomings in how gene set analysis is currently conducted. We provide a directly executable benchmark system for straightforward assessment of additional enrichment methods. Availability http://bioconductor.org/packages/GSEABenchmarkeR


2017 ◽  
Author(s):  
Yijie Wang ◽  
Dong-Yeon Cho ◽  
Hangnoh Lee ◽  
Justin Fear ◽  
Brian Oliver ◽  
...  

Abstract Understanding gene regulation is a fundamental step towards understanding how cells function and respond to environmental cues and perturbations. An important step in this direction is the ability to infer the transcription factor (TF)-gene regulatory network (GRN). However, gene regulatory networks are typically constructed disregarding the fact that regulatory programs are conditioned on tissue type, developmental stage, sex, and other factors. Because they lack biological context specificity, these context-agnostic networks may not reveal the precise actions of genes in the specific biological system of interest. Collecting the multitude of features required for reliable construction of GRNs, such as physical features (TF binding, chromatin accessibility) and functional features (correlation of expression or chromatin patterns), for every context of interest is costly. Therefore, we need methods that are able to use knowledge about a context-agnostic network (or a network constructed in a related context) to construct a context-specific regulatory network. To address this challenge, we developed a computational approach that uses expression data obtained in a specific biological context (such as a particular developmental stage, sex, or tissue type) together with a GRN constructed in a different but related context (alternatively, an incomplete or noisy network for the same context) to construct a context-specific GRN. Our method, NetREX, is inspired by network component analysis (NCA), which estimates TF activities and their influences on target genes given a predetermined topology of a TF-gene network. To predict a network under a different condition, NetREX removes the restriction that the topology of the TF-gene network is fixed and allows edges to be added to and removed from that network. To solve the corresponding optimization problem, which is non-convex and non-smooth, we provide a general mathematical framework allowing use of the recently proposed Proximal Alternative Linearized Maximization technique and prove that our formulation has the properties required for convergence. We tested NetREX on simulated data and subsequently applied it to gene expression data in adult females from 99 hemizygotic lines of the Drosophila deletion (DrosDel) panel. The networks predicted by NetREX showed higher biological consistency than alternative approaches. In addition, we used the list of recently identified targets of the Doublesex (DSX) transcription factor to demonstrate the predictive power of our method.


2019 ◽  
Vol 18 (1) ◽  
Author(s):  
Liu-An Zhuo ◽  
Yi-Tao Wen ◽  
Yong Wang ◽  
Zhi-Fang Liang ◽  
Gang Wu ◽  
...  

Abstract Background Long noncoding RNAs (lncRNAs) are involved in numerous physiological functions. However, their mechanisms in acute myocardial infarction (AMI) are not well understood. Methods We performed an RNA-seq analysis to explore the molecular mechanism of AMI by constructing a lncRNA-miRNA-mRNA axis based on the ceRNA hypothesis. The target microRNA data were used to design a global AMI triple network. Thereafter, a functional enrichment analysis and clustering topological analyses were conducted by using the triple network. The expression of lncRNA SNHG8, SOCS3 and ICAM1 was measured by qRT-PCR. The prognostic values of lncRNA SNHG8, SOCS3 and ICAM1 were evaluated using a receiver operating characteristic (ROC) curve. Results An AMI lncRNA-miRNA-mRNA network was constructed that included two mRNAs, one miRNA and one lncRNA. After RT-PCR validation of lncRNA SNHG8, SOCS3 and ICAM1 between the AMI and normal samples, only lncRNA SNHG8 had significant diagnostic value for further analysis. The ROC curve showed that SNHG8 presented an AUC of 0.850, while the AUC of SOCS3 was 0.633 and that of ICAM1 was 0.594. After a pairwise comparison, we found that SNHG8 was statistically significant (P = 0.002 for SNHG8 vs. ICAM1; P = 0.031 for SNHG8 vs. SOCS3). The results of a functional enrichment analysis of the interacting genes and microRNAs showed that the shared lncRNA SNHG8 may be a new factor in AMI. Conclusions Our investigation of the lncRNA-miRNA-mRNA regulatory networks in AMI revealed a novel lncRNA, lncRNA SNHG8, as a risk factor for AMI and expanded our understanding of the mechanisms involved in the pathogenesis of AMI.
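The AUC values reported above can be read as the probability that a randomly chosen AMI sample has a higher marker value than a randomly chosen normal sample. A minimal sketch of that computation follows, assuming higher expression indicates disease. The function name and the expression values are hypothetical and are not the study's data.

```python
def roc_auc(case_values, control_values):
    """AUC as the fraction of case/control pairs in which the case scores higher.

    Ties count one half. This equals the Mann-Whitney U statistic
    divided by (number of cases * number of controls).
    """
    wins = 0.0
    for case in case_values:
        for control in control_values:
            if case > control:
                wins += 1.0
            elif case == control:
                wins += 0.5
    return wins / (len(case_values) * len(control_values))

# Hypothetical relative expression levels (assumed data).
auc = roc_auc([2.0, 4.0], [1.0, 3.0])
```

An AUC of 0.5 corresponds to a marker with no discriminating ability, which is why values such as 0.594 indicate weak diagnostic value compared with 0.850.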


2021 ◽  
Vol 11 (1) ◽  
Author(s):  
Jeong-Hyun Kang

Abstract This study aimed to investigate immune-related pathophysiology of the temporomandibular joint (TMJ) osteoarthritis (OA) in young females by analyzing transcriptional profiles of peripheral blood mononuclear cells. RNA-sequencing (RNA-seq) was conducted on 24 young females with TMJ OA (mean age 19.3 ± 3.1 years) (RNAOA) and 11 age- and sex-matched healthy controls (mean age 20.5 ± 3.7 years) (CON). RNA-seq datasets were analyzed to identify genes, pathways, and regulatory networks involved in the development of TMJ OA. RNA-seq data analysis revealed 41 differentially expressed genes (DEGs) between RNAOA and CON. A total of 16 gene ontology (GO) terms, three in the molecular function category and 13 in the biological process category, were annotated. Through ingenuity pathway analysis (IPA), 21 annotated categories of diseases and functions were identified. There were six hub genes which showed significant results in both GO enrichment analysis and IPA, namely HLA-C, HLA-F, CXCL8, IL11RA, IL13RA1, and FCGR3B. The young females with TMJ OA showed alterations of the genes related to immune function in the blood, and some of the changes may reflect inflammation, autoimmunity, and abnormal T cell functions.


2019 ◽  
Vol 37 (15_suppl) ◽  
pp. e15593-e15593
Author(s):  
Huat Chye Lim ◽  
John Dozier Gordan

e15593 Background: HBV replication contributes to HCC initiation and is associated with worse patient outcomes. Prior tumor genomic studies of HBV-positive and -negative (HBV+/-) HCC have used detection of HBV surface antigen (HBsAg) in serum to annotate HBV status. However, a substantial proportion of HBsAg+ patients lack HBV replication in tumor, suggesting a potentially distinct patient subset. In this study, we determined HBV status by measuring tumor HBV RNA, a proxy for active replication. We then investigated HBV RNA+/- association with somatic mutations, gene sets, homologous recombination deficiency (HRD) and tumor mutation burden (TMB). Methods: RNA-Seq data for 371 HCC tumors were obtained from TCGA. Tumors were classified as HBV RNA+ if they harbored more than 1 HBV RNA read per million human reads, as measured using GATK PathSeq software. Associations between HBV RNA status and somatic mutations, gene sets, HRD and TMB were investigated. HRD score was calculated as the sum of 3 independent HRD measures (large scale state transitions, loss of heterozygosity and telomeric allelic imbalance). Results: HBV RNA+ status was associated with a higher rate of nonsynonymous somatic mutations in multiple genes, including the tumor suppressors TP53, CDKN2A, CHD5 and TET1, as well as AXIN2 and the proto-oncogene BCL11A ( p < 0.05 for all), while HBV RNA- status was associated with a higher rate of nonsynonymous mutations in the chromatin modifier BAP1 ( p = 0.03). In gene set enrichment analysis of normalized RNA-Seq expression data, HBV RNA+ status was associated with increased transcription of DNA repair genes, as well as genes upregulated by mTORC1 and MYC (FDR < 0.03 for all). HBV RNA status was also associated with HRD score (22.19 for HBV RNA+ vs. 15.97 for HBV RNA-, p = 1e-6), but not with TMB. A substantial subset of HBV RNA+ patients (33/100) were not annotated as HBV+ in the TCGA clinical database. 
Conclusions: HBV status based on tumor HBV RNA detection identifies a genetically distinct subset within all HBV-infected HCC patients that is associated with nonsynonymous somatic mutations in several genes and differential transcription of gene sets, some of which have not been previously reported, as well as with HRD score. These findings suggest potential for differential responsiveness to targeted therapies.
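The two quantitative definitions in the Methods (the HBV RNA+ threshold and the HRD score as a sum of three measures) can be written out as a short sketch. The function and parameter names are assumptions made for illustration, and the read counts and component values in the usage lines are hypothetical.

```python
def is_hbv_rna_positive(hbv_reads, human_reads, threshold_per_million=1.0):
    """True when HBV reads per million human reads exceed the threshold.

    The comparison is cross-multiplied to avoid floating-point error
    at the boundary.
    """
    if human_reads <= 0:
        raise ValueError("human_reads must be positive")
    return hbv_reads * 1_000_000 > threshold_per_million * human_reads

def hrd_score(lst, loh, tai):
    """Sum of the three HRD measures: large-scale state transitions (lst),
    loss of heterozygosity (loh), and telomeric allelic imbalance (tai)."""
    return lst + loh + tai

# Hypothetical tumor: 100 HBV reads among 50 million human reads.
positive = is_hbv_rna_positive(100, 50_000_000)
score = hrd_score(10, 7, 5)
```

Because the rule is "more than 1" read per million, a tumor with exactly 1 HBV read per million human reads is classified as HBV RNA-negative under this sketch.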


2019 ◽  
Vol 35 (23) ◽  
pp. 5018-5029
Author(s):  
Viren Amin ◽  
Didem Ağaç ◽  
Spencer D Barnes ◽  
Murat Can Çobanoğlu

Abstract Motivation Activity of transcriptional regulators is crucial in elucidating the mechanism of phenotypes. However, regulatory activity hypotheses are difficult to test experimentally. Therefore, we need accurate and reliable computational methods for regulator activity inference. There is extensive work in this area; however, current methods have difficulty with one or more of the following: resolving activity of TFs with overlapping regulons, reflecting known regulatory relationships, or flexible modeling of TF activity over the regulon. Results We present Effector and Perturbation Estimation Engine (EPEE), a method for differential analysis of transcription factor (TF) activity from gene expression data. EPEE addresses each of these principal challenges in the field. Firstly, EPEE collectively models all TF activity in a single multivariate model, thereby accounting for the intrinsic coupling among TFs that share targets, which is highly frequent. Secondly, EPEE incorporates context-specific TF-gene regulatory networks and therefore adapts the analysis to each biological context. Finally, EPEE can flexibly reflect different regulatory activity of a single TF among its potential targets. This allows the flexibility to implicitly recover other regulatory influences such as co-activators or repressors. We comparatively validated EPEE in 15 datasets from three well-studied contexts, namely immunology, cancer, and hematopoiesis. We show that addressing the aforementioned challenges enables EPEE to outperform alternative methods and reliably produce accurate results. Availability and implementation https://github.com/Cobanoglu-Lab/EPEE. Supplementary information Supplementary data are available at Bioinformatics online.


2019 ◽  
Vol 47 (W1) ◽  
pp. W571-W577 ◽  
Author(s):  
Alexander Lachmann ◽  
Brian M Schilder ◽  
Megan L Wojciechowicz ◽  
Denis Torre ◽  
Maxim V Kuleshov ◽  
...  

Abstract The frequency by which genes are studied correlates with the prior knowledge accumulated about them. This leads to an imbalance in research attention where some genes are highly investigated while others are ignored. Geneshot is a search engine developed to illuminate this gap and to promote attention to the under-studied genome. Through a simple web interface, Geneshot enables researchers to enter arbitrary search terms, to receive ranked lists of genes relevant to the search terms. Returned ranked gene lists contain genes that were previously published in association with the search terms, as well as genes predicted to be associated with the terms based on data integration from multiple sources. The search results are presented with interactive visualizations. To predict gene function, Geneshot utilizes gene–gene similarity matrices from processed RNA-seq data, or from gene–gene co-occurrence data obtained from multiple sources. In addition, Geneshot can be used to analyze the novelty of gene sets and augment gene sets with additional relevant genes. The Geneshot web-server and API are freely and openly available from https://amp.pharm.mssm.edu/geneshot.

