TFEA.ChIP: a tool kit for transcription factor binding site enrichment analysis capitalizing on ChIP-seq datasets

Laura Puente-Santamaria; Wyeth W Wasserman; Luis del Peso

doi:10.1093/bioinformatics/btz573

Bioinformatics ◽

10.1093/bioinformatics/btz573 ◽

2019 ◽

Vol 35 (24) ◽

pp. 5339-5340 ◽

Cited By ~ 8

Author(s):

Laura Puente-Santamaria ◽

Wyeth W Wasserman ◽

Luis del Peso

Keyword(s):

Genomic Analysis ◽

Enrichment Analysis ◽

R Package ◽

Supplementary Information ◽

Web Based ◽

Factor Binding Site ◽

Gene Sets ◽

Transcription Regulators ◽

Computational Identification ◽

On Chip

Abstract Summary The computational identification of the transcription factors (TFs) [more generally, transcription regulators, (TR)] responsible for the co-regulation of a specific set of genes is a common problem found in genomic analysis. Herein, we describe TFEA.ChIP, a tool that makes use of ChIP-seq datasets to estimate and visualize TR enrichment in gene lists representing transcriptional profiles. We validated TFEA.ChIP using a wide variety of gene sets representing signatures of genetic and chemical perturbations as input and found that the relevant TR was correctly identified in 126 of a total of 174 analyzed. Comparison with other TR enrichment tools demonstrates that TFEA.ChIP is a highly customizable package with outstanding performance. Availability and implementation TFEA.ChIP is implemented as an R package available at Bioconductor https://www.bioconductor.org/packages/devel/bioc/html/TFEA.ChIP.html and GitHub https://github.com/LauraPS1/TFEA.ChIP_downloads. A web-based GUI to the package is also available at https://www.iib.uam.es/TFEA.ChIP/ Supplementary information Supplementary data are available at Bioinformatics online.

WormExp: a web-based application for a Caenorhabditis elegans-specific gene expression enrichment analysis

Bioinformatics ◽

10.1093/bioinformatics/btv667 ◽

2015 ◽

Vol 32 (6) ◽

pp. 943-945 ◽

Cited By ~ 35

Author(s):

Wentao Yang ◽

Katja Dierking ◽

Hinrich Schulenburg

Keyword(s):

Gene Expression ◽

Caenorhabditis Elegans ◽

Enrichment Analysis ◽

Supplementary Information ◽

Specific Gene ◽

Data Sets ◽

Complete Collection ◽

Web Based ◽

Expression Of Genes ◽

Gene Sets

Abstract Motivation: A particular challenge of the current omics age is to make sense of the inferred differential expression of genes and proteins. The most common approach is to perform a gene ontology (GO) enrichment analysis, thereby relying on a database that has been extracted from a variety of organisms and that can therefore only yield reliable information on evolutionarily conserved functions. Results: Here we present a web-based application for taxon-specific gene set exploration and enrichment analysis, which is expected to yield novel functional insights into newly determined gene sets. The approach is based on the complete collection of curated high-throughput gene expression data sets for the model nematode Caenorhabditis elegans, including 1786 gene sets from more than 350 studies. Availability and implementation: WormExp is available at http://wormexp.zoologie.uni-kiel.de. Contacts: [email protected] Supplementary information: Supplementary data are available at Bioinformatics online.

NoRCE: non-coding RNA sets cis enrichment tool

BMC Bioinformatics ◽

10.1186/s12859-021-04112-9 ◽

2021 ◽

Vol 22 (1) ◽

Author(s):

Gulden Olgun ◽

Afshan Nabi ◽

Oznur Tastan

Keyword(s):

Expression Patterns ◽

Target Prediction ◽

Enrichment Analysis ◽

Fruit Fly ◽

Relevant Information ◽

R Package ◽

Data Repository ◽

Biologically Relevant ◽

Gene Sets ◽

Data Files

Abstract Background While some non-coding RNAs (ncRNAs) are assigned critical regulatory roles, most remain functionally uncharacterized. This presents a challenge whenever an interesting set of ncRNAs needs to be analyzed in a functional context. Transcripts located close-by on the genome are often regulated together. This genomic proximity on the sequence can hint at a functional association. Results We present a tool, NoRCE, that performs cis enrichment analysis for a given set of ncRNAs. Enrichment is carried out using the functional annotations of the coding genes located proximal to the input ncRNAs. Other biologically relevant information such as topologically associating domain (TAD) boundaries, co-expression patterns, and miRNA target prediction information can be incorporated to conduct a richer enrichment analysis. To this end, NoRCE includes several relevant datasets as part of its data repository, including cell-line specific TAD boundaries, functional gene sets, and expression data for coding & ncRNAs specific to cancer. Additionally, the users can utilize custom data files in their investigation. Enrichment results can be retrieved in a tabular format or visualized in several different ways. NoRCE is currently available for the following species: human, mouse, rat, zebrafish, fruit fly, worm, and yeast. Conclusions NoRCE is a platform-independent, user-friendly, comprehensive R package that can be used to gain insight into the functional importance of a list of ncRNAs of any type. The tool offers flexibility to conduct the users’ preferred set of analyses by designing their own pipeline of analysis. NoRCE is available in Bioconductor and https://github.com/guldenolgun/NoRCE.

Lipid Mini-On: mining and ontology tool for enrichment analysis of lipidomic data

Bioinformatics ◽

10.1093/bioinformatics/btz250 ◽

2019 ◽

Vol 35 (21) ◽

pp. 4507-4508 ◽

Cited By ~ 9

Author(s):

Geremy Clair ◽

Sarah Reehl ◽

Kelly G Stratton ◽

Matthew E Monroe ◽

Malak M Tfaily ◽

...

Keyword(s):

Peat Soil ◽

Enrichment Analysis ◽

R Package ◽

Lipid Classes ◽

Supplementary Information ◽

Mass Spec ◽

Shiny App ◽

Lung Endothelial Cells ◽

Lipid Enrichment ◽

The Individual

Abstract Summary Here we introduce Lipid Mini-On, an open-source tool that performs lipid enrichment analyses and visualizations of lipidomics data. Lipid Mini-On uses a text-mining process to bin individual lipid names into multiple lipid ontology groups based on the classification (e.g. LipidMaps) and other characteristics, such as chain length. Lipid Mini-On provides users with the capability to conduct enrichment analysis of the lipid ontology terms using a Shiny app with options of five statistical approaches. Lipid classes can be added to customize the user’s database and remain updated as new lipid classes are discovered. Visualization of results is available for all classification options (e.g. lipid subclass and individual fatty acid chains). Results are also visualized through an editable network of relationships between the individual lipids and their associated lipid ontology terms. The utility of the tool is demonstrated using biological (e.g. human lung endothelial cells) and environmental (e.g. peat soil) samples. Availability and implementation Rodin (R package: https://github.com/PNNL-Comp-Mass-Spec/Rodin), Lipid Mini-On Shiny app (https://github.com/PNNL-Comp-Mass-Spec/LipidMiniOn) and Lipid Mini-On online tool (https://omicstools.pnnl.gov/shiny/lipid-mini-on/). Supplementary information Supplementary data are available at Bioinformatics online.

TF2LncRNA: Identifying Common Transcription Factors for a List of lncRNA Genes from ChIP-Seq Data

BioMed Research International ◽

10.1155/2014/317642 ◽

2014 ◽

Vol 2014 ◽

pp. 1-5 ◽

Cited By ~ 27

Author(s):

Qinghua Jiang ◽

Jixuan Wang ◽

Yadong Wang ◽

Rui Ma ◽

Xiaoliang Wu ◽

...

Keyword(s):

Transcription Factors ◽

Cell Line ◽

Binding Sites ◽

Regulatory Region ◽

Specific Cell ◽

Web Based ◽

Genomic Technologies ◽

Number Of Binding Sites ◽

Transcription Regulators ◽

On Chip

High-throughput genomic technologies like lncRNA microarray and RNA-Seq often generate a set of lncRNAs of interest, yet little is known about the transcriptional regulation of the set of lncRNA genes. Here, based on ChIP-Seq peak lists of transcription factors (TFs) from ENCODE and annotated human lncRNAs from GENCODE, we developed a web-based interface titled “TF2lncRNA,” where TF peaks from each ChIP-Seq experiment are crossed with the genomic coordinates of a set of input lncRNAs, to identify which TFs present a statistically significant number of binding sites (peaks) within the regulatory region of the input lncRNA genes. The input can be a set of coexpressed lncRNA genes or any other cluster of lncRNA genes. Users can thus infer which TFs are likely to be common transcription regulators of the set of lncRNAs. In addition, users can retrieve all lncRNAs potentially regulated by a specific TF in a specific cell line of interest or retrieve all TFs that have one or more binding sites in the regulatory region of a given lncRNA in the specific cell line. TF2LncRNA is an efficient and easy-to-use web-based tool.
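The peak-crossing step described in the abstract above can be illustrated with a minimal sketch. It is not the TF2LncRNA implementation: the function name `count_tf_hits`, the tuple layouts, and the half-open coordinate convention are all assumptions made for illustration, and how the regulatory region of each lncRNA is defined is left to the caller.

```python
from collections import defaultdict


def count_tf_hits(peaks, regions):
    """Map each TF to the set of genes whose region overlaps one of its peaks.

    peaks   -- iterable of (tf, chrom, start, end) tuples (hypothetical layout)
    regions -- iterable of (gene, chrom, start, end) tuples (hypothetical layout)
    Coordinates are treated as half-open intervals [start, end).
    """
    # Group the regulatory regions by chromosome so each peak is only
    # compared against regions on the same chromosome.
    by_chrom = defaultdict(list)
    for gene, chrom, start, end in regions:
        by_chrom[chrom].append((start, end, gene))

    hits = defaultdict(set)
    for tf, chrom, p_start, p_end in peaks:
        for r_start, r_end, gene in by_chrom.get(chrom, []):
            # Two half-open intervals overlap when each starts before
            # the other ends.
            if p_start < r_end and r_start < p_end:
                hits[tf].add(gene)
    return dict(hits)


# Toy data, invented for this example only.
example_peaks = [
    ("TF_A", "chr1", 100, 200),
    ("TF_A", "chr1", 5000, 5100),
    ("TF_B", "chr2", 50, 60),
]
example_regions = [
    ("lnc1", "chr1", 150, 1150),
    ("lnc2", "chr1", 4000, 5000),
    ("lnc3", "chr2", 0, 100),
]
example_hits = count_tf_hits(example_peaks, example_regions)
```

The per-TF gene sets returned here would then feed a significance test of the kind the abstract mentions; the linear scan over regions is chosen for readability, and an interval tree or sorted sweep would be the usual choice for genome-scale inputs.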

PhenoExam: an R package and Web application for the examination of phenotypes linked to genes and gene sets

10.1101/2021.06.29.450324 ◽

2021 ◽

Author(s):

Alejandro Cisterna García ◽

Aurora González-Vidal ◽

Daniel Ruiz Villa ◽

Jordi Ortiz Murillo ◽

Alicia Gómez-Pascual ◽

...

Keyword(s):

Web Application ◽

Enrichment Analysis ◽

R Package ◽

Web Interface ◽

Gene Set ◽

New Genes ◽

Gene Sets ◽

Phenotype Analysis ◽

New Gene ◽

Early Onset Parkinson’S Disease

Gene set based phenotype enrichment analysis (detecting phenotypic terms that emerge as significant in a set of genes) can improve the rate of genetic diagnoses, among other research purposes. To facilitate diverse phenotype analysis, we developed PhenoExam, a freely available R package for tool developers and a web interface for users, which (1) performs phenotype and disease enrichment analysis on a gene set; (2) measures statistically significant phenotype similarities between gene sets; and (3) detects significant differential phenotypes or disease terms across different databases. PhenoExam achieves these tasks by integrating databases or resources such as the HPO, MGD, CRISPRbrain, CTD, ClinGen, CGI, OrphaNET, UniProt, PsyGeNET, and Genomics England Panel App. PhenoExam accepts both human and mouse genes as input. We developed PhenoExam to assist a variety of users, including clinicians, computational biologists and geneticists. It can be used to support the validation of new gene-to-disease discoveries, and in the detection of differential phenotypes between two gene sets (a phenotype linked to one of the gene sets but not to the other) that are useful for differential diagnosis and to improve genetic panels. We validated PhenoExam performance through simulations and its application to real cases. We demonstrate that PhenoExam is effective in distinguishing gene sets or Mendelian diseases with very similar phenotypes through projecting the disease-causing genes into their annotation-based phenotypic spaces. We also tested the tool with early onset Parkinson's disease and dystonia genes, to show phenotype-level similarities but also potentially interesting differences. More specifically, we used PhenoExam to validate computationally predicted new genes potentially associated with epilepsy. Therefore, PhenoExam effectively discovers links between phenotypic terms across annotation databases through effective integration.
The R package is available at https://github.com/alexcis95/PhenoExam and the Web tool is accessible at https://snca.atica.um.es/PhenoExamWeb/.

simGWAS: a fast method for simulation of large scale case–control GWAS summary statistics

Bioinformatics ◽

10.1093/bioinformatics/bty898 ◽

2018 ◽

Vol 35 (11) ◽

pp. 1901-1906 ◽

Cited By ~ 4

Author(s):

Mary D Fortune ◽

Chris Wallace

Keyword(s):

Large Scale ◽

Simulated Data ◽

Enrichment Analysis ◽

R Package ◽

Gene Set Enrichment Analysis ◽

Supplementary Information ◽

Intermediate Step ◽

Fast Method ◽

Summary Statistics ◽

Causal Variants

Abstract Motivation Methods for analysis of GWAS summary statistics have encouraged data sharing and democratized the analysis of different diseases. Ideal validation for such methods is application to simulated data, where some ‘truth’ is known. As GWAS increase in size, so does the computational complexity of such evaluations; standard practice repeatedly simulates and analyses genotype data for all individuals in an example study. Results We have developed a novel method based on an alternative approach, directly simulating GWAS summary data, without individual data as an intermediate step. We mathematically derive the expected statistics for any set of causal variants and their effect sizes, conditional upon control haplotype frequencies (available from public reference datasets). Simulation of GWAS summary output can be conducted independently of sample size by simulating random variates about these expected values. Across a range of scenarios, our method produces very similar output to that from simulating individual genotypes, with a substantial gain in speed even for modest sample sizes. Fast simulation of GWAS summary statistics will enable more complete and rapid evaluation of summary statistic methods as well as opening new potential avenues of research in fine mapping and gene set enrichment analysis. Availability and implementation Our method is available under a GPL license as an R package from http://github.com/chr1swallace/simGWAS. Supplementary information Supplementary data are available at Bioinformatics online.
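The idea of "simulating random variates about these expected values" can be sketched as follows. This is a deliberately simplified illustration and not the simGWAS method: it assumes the expected Z scores are already given, and it draws each variant independently with unit variance, ignoring any correlation between variants. The function name `simulate_z_scores` and its parameters are hypothetical.

```python
import random


def simulate_z_scores(expected_z, n_reps, seed=0):
    """Draw n_reps replicate vectors of Z scores around expected_z.

    expected_z -- list of expected Z scores, one per variant (assumed given)
    n_reps     -- number of simulated summary-statistic replicates
    seed       -- seed for a private random generator, for reproducibility

    Simplifying assumption: variants are sampled independently from
    Normal(expected, 1); correlation between variants is not modelled.
    """
    rng = random.Random(seed)
    return [[rng.gauss(mu, 1.0) for mu in expected_z] for _ in range(n_reps)]


# Toy expected values, invented for this example: one variant with a
# strong expected signal between two with none.
toy_expected = [0.0, 4.0, 0.0]
toy_replicates = simulate_z_scores(toy_expected, n_reps=2000, seed=1)
```

The cost of each replicate depends only on the number of variants, not on the number of individuals, which is the property the abstract highlights.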

simGWAS: a fast method for simulation of large scale case-control GWAS summary statistics

10.1101/313023 ◽

2018 ◽

Cited By ~ 1

Author(s):

Mary D. Fortune ◽

Chris Wallace

Keyword(s):

Large Scale ◽

Simulated Data ◽

Enrichment Analysis ◽

R Package ◽

Gene Set Enrichment Analysis ◽

Supplementary Information ◽

Intermediate Step ◽

Fast Method ◽

Summary Statistics ◽

Causal Variants

Abstract Motivation Methods for analysis of GWAS summary statistics have encouraged data sharing and democratised the analysis of different diseases. Ideal validation for such methods is application to simulated data, where some “truth” is known. As GWAS increase in size, so does the computational complexity of such evaluations; standard practice repeatedly simulates and analyses genotype data for all individuals in an example study. Results We have developed a novel method based on an alternative approach, directly simulating GWAS summary data, without individual data as an intermediate step. We mathematically derive the expected statistics for any set of causal variants and their effect sizes, conditional upon control haplotype frequencies (available from public reference datasets). Simulation of GWAS summary output can be conducted independently of sample size by simulating random variates about these expected values. Across a range of scenarios, our method produces very similar output to that from simulating individual genotypes, with a substantial gain in speed even for modest sample sizes. Fast simulation of GWAS summary statistics will enable more complete and rapid evaluation of summary statistic methods as well as opening new potential avenues of research in fine mapping and gene set enrichment analysis. Availability and Implementation Our method is available under a GPL license as an R package from http://github.com/chr1swallace/[email protected] Information Supplementary Information is appended.

ClusterMine: a Knowledge-integrated Clustering Approach based on Expression Profiles of Gene Sets

10.1101/255711 ◽

2018 ◽

Author(s):

Hong-Dong Li ◽

Yunpei Xu ◽

Xiaoshu Zhu ◽

Quan Liu ◽

Gilbert S. Omenn ◽

...

Keyword(s):

Expression Profiles ◽

R Package ◽

Biological Data ◽

Supplementary Information ◽

Consensus Clustering ◽

Cluster Membership ◽

Link Type ◽

Novel Approach ◽

Gene Sets ◽

Biological Interpretation

Abstract Motivation Clustering analysis is essential for understanding complex biological data. In widely used methods such as hierarchical clustering (HC) and consensus clustering (CC), expression profiles of all genes are often used to assess similarity between samples for clustering. These methods output sample clusters, but are not able to provide information about which gene sets (functions) contribute most to the clustering. So interpretability of their results is limited. We hypothesized that integrating prior knowledge of annotated biological processes would not only achieve satisfying clustering performance but also, more importantly, enable potential biological interpretation of clusters. Results Here we report ClusterMine, a novel approach that identifies clusters by assessing functional similarity between samples through integrating known annotated gene sets, e.g., in Gene Ontology. In addition to outputting cluster membership of each sample as conventional approaches do, it outputs gene sets that are most likely to contribute to the clustering, a feature facilitating biological interpretation. Using three cancer datasets, two single cell RNA-sequencing based cell differentiation datasets, one cell cycle dataset and two datasets of cells of different tissue origins, we found that ClusterMine achieved similar or better clustering performance and that top-scored gene sets prioritized by ClusterMine are biologically relevant. Implementation and availability ClusterMine is implemented as an R package and is freely available at: www.genemine.org/[email protected] Information Supplementary data are available at Bioinformatics online.

MEScan: a powerful statistical framework for genome-scale mutual exclusivity analysis of cancer mutations

Bioinformatics ◽

10.1093/bioinformatics/btaa957 ◽

2020 ◽

Author(s):

Sisheng Liu ◽

Jinpeng Liu ◽

Yanqi Xie ◽

Tingting Zhai ◽

Eugene W Hinderer ◽

...

Keyword(s):

Mutation Rate ◽

De Novo ◽

R Package ◽

Supplementary Information ◽

Driver Mutations ◽

Mutual Exclusivity ◽

Statistical Framework ◽

Gene Sets ◽

Genome Wide ◽

Background Mutation Rate

Abstract Motivation Cancer somatic driver mutations associated with genes within a pathway often show a mutually exclusive pattern across a cohort of patients. This mutually exclusive mutational signal has been frequently used to distinguish driver from passenger mutations and to investigate relationships among driver mutations. Current methods for de novo discovery of mutually exclusive mutational patterns are limited because the heterogeneity in background mutation rate can confound mutational patterns, and the presence of highly mutated genes can lead to spurious patterns. In addition, most methods only focus on a limited number of pre-selected genes and are unable to perform genome-wide analysis due to computational inefficiency. Results We introduce a statistical framework, MEScan, for accurate and efficient mutual exclusivity analysis at the genomic scale. Our framework contains a fast and powerful statistical test for mutual exclusivity with adjustment of the background mutation rate and impact of highly mutated genes, and a multi-step procedure for genome-wide screening with the control of false discovery rate. We demonstrate that MEScan more accurately identifies mutually exclusive gene sets than existing methods and is at least two orders of magnitude faster than most methods. By applying MEScan to data from four different cancer types and pan-cancer, we have identified several biologically meaningful mutually exclusive gene sets. Availability and implementation MEScan is available as an R package at https://github.com/MarkeyBBSRF/MEScan. Supplementary information Supplementary data are available at Bioinformatics online.

netGO: R-Shiny package for network-integrated pathway enrichment analysis

Bioinformatics ◽

10.1093/bioinformatics/btaa077 ◽

2020 ◽

Vol 36 (10) ◽

pp. 3283-3285

Author(s):

Jinhwan Kim ◽

Sora Yoon ◽

Dougu Nam

Keyword(s):

Target Genes ◽

Genome Wide Association Study ◽

Enrichment Analysis ◽

Supplementary Information ◽

Pathway Enrichment Analysis ◽

Pathway Enrichment ◽

Pathway Gene ◽

Gene Sets ◽

R Shiny ◽

Integrated Pathway

Abstract Summary We present an R-Shiny package, netGO, for novel network-integrated pathway enrichment analysis. The conventional Fisher’s exact test (FET) considers the extent of overlap between target genes and pathway gene-sets, while recent network-based analysis tools consider only network interactions between the two. netGO implements an intuitive framework to integrate both the overlap and networks into a single score, and adaptively resamples genes based on network degrees to assess the pathway enrichment. In benchmark tests for gene expression and genome-wide association study (GWAS) data, netGO captured the relevant gene-sets better than existing tools, especially when analyzing a small number of genes. Specifically, netGO provides user-interactive visualization of the target genes, enriched gene-set and their network interactions for both netGO and FET results for further analysis. For this visualization, we also developed a standalone R-Shiny package shinyCyJS to connect R-shiny and the JavaScript version of cytoscape. Availability and implementation netGO R-Shiny package is freely available from github, https://github.com/unistbig/netGO. Supplementary information Supplementary data are available at Bioinformatics online.
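The conventional Fisher's exact test (FET) baseline mentioned in the abstract above, which scores only the overlap between target genes and a pathway gene-set, can be sketched as a one-sided hypergeometric tail. This illustrates the baseline only, not netGO's network-integrated score or its resampling scheme; the function name `fisher_overrep_pvalue` and its parameter names are assumptions made for this example.

```python
from math import comb


def fisher_overrep_pvalue(n_universe, n_pathway, n_targets, n_overlap):
    """One-sided Fisher's exact test p-value for over-representation.

    Returns P(X >= n_overlap), where X is the number of pathway genes
    found when n_targets genes are drawn without replacement from a
    universe of n_universe genes containing n_pathway pathway genes.
    """
    max_overlap = min(n_pathway, n_targets)
    # Number of ways to choose any target set of this size.
    total = comb(n_universe, n_targets)
    # Sum the hypergeometric probabilities for every overlap at least
    # as large as the observed one.
    tail = sum(
        comb(n_pathway, k) * comb(n_universe - n_pathway, n_targets - k)
        for k in range(n_overlap, max_overlap + 1)
    )
    return tail / total


# Hypothetical numbers: 20000 background genes, a 100-gene pathway,
# 50 target genes, 5 of which fall in the pathway.
p_example = fisher_overrep_pvalue(20000, 100, 50, 5)
```

Exact integer arithmetic via `math.comb` keeps the sketch dependency-free; for routine use, an established statistics library would be the more practical choice.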
