Lipid Mini-On: mining and ontology tool for enrichment analysis of lipidomic data

Geremy Clair; Sarah Reehl; Kelly G Stratton; Matthew E Monroe; Malak M Tfaily; Charles Ansong; Jennifer E Kyle

doi:10.1093/bioinformatics/btz250

Bioinformatics ◽

10.1093/bioinformatics/btz250 ◽

2019 ◽

Vol 35 (21) ◽

pp. 4507-4508 ◽

Cited By ~ 9

Author(s):

Geremy Clair ◽

Sarah Reehl ◽

Kelly G Stratton ◽

Matthew E Monroe ◽

Malak M Tfaily ◽

...

Keyword(s):

Peat Soil ◽

Enrichment Analysis ◽

R Package ◽

Lipid Classes ◽

Supplementary Information ◽

Mass Spec ◽

Shiny App ◽

Lung Endothelial Cells ◽

Lipid Enrichment ◽

The Individual

Abstract Summary Here we introduce Lipid Mini-On, an open-source tool that performs lipid enrichment analyses and visualizations of lipidomics data. Lipid Mini-On uses a text-mining process to bin individual lipid names into multiple lipid ontology groups based on the classification (e.g. LipidMaps) and other characteristics, such as chain length. Lipid Mini-On provides users with the capability to conduct enrichment analysis of the lipid ontology terms using a Shiny app with options of five statistical approaches. Lipid classes can be added to customize the user’s database and remain updated as new lipid classes are discovered. Visualization of results is available for all classification options (e.g. lipid subclass and individual fatty acid chains). Results are also visualized through an editable network of relationships between the individual lipids and their associated lipid ontology terms. The utility of the tool is demonstrated using biological (e.g. human lung endothelial cells) and environmental (e.g. peat soil) samples. Availability and implementation Rodin (R package: https://github.com/PNNL-Comp-Mass-Spec/Rodin), Lipid Mini-On Shiny app (https://github.com/PNNL-Comp-Mass-Spec/LipidMiniOn) and Lipid Mini-On online tool (https://omicstools.pnnl.gov/shiny/lipid-mini-on/). Supplementary information Supplementary data are available at Bioinformatics online.
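The two steps described in the abstract above (binning lipid names into ontology terms by text mining, then testing those terms for enrichment) can be sketched as follows. This is a minimal illustration and not the Rodin or Lipid Mini-On implementation: the name pattern, the term labels, and the helper names `ontology_terms`, `hypergeom_upper_tail` and `enrich` are all assumptions made for the example, and a one-sided hypergeometric test stands in for whichever of the tool's five statistical approaches a user would select.

```python
import re
from math import comb

# Hypothetical pattern for names such as "PC(16:0/18:1)": a class prefix,
# then the fatty acid chains in parentheses. Real lipid nomenclature is richer.
LIPID_RE = re.compile(r"^(?P<cls>[A-Za-z]+)\((?P<chains>[^)]*)\)$")
CHAIN_RE = re.compile(r"(\d+):(\d+)")


def ontology_terms(name):
    """Bin one lipid name into illustrative ontology terms."""
    match = LIPID_RE.match(name.strip())
    if not match:
        return set()
    terms = {"class:" + match.group("cls")}
    chains = CHAIN_RE.findall(match.group("chains"))
    for carbons, bonds in chains:
        terms.add(f"chain:{carbons}:{bonds}")
    if chains:
        terms.add("total_carbons:" + str(sum(int(c) for c, _ in chains)))
        terms.add("total_double_bonds:" + str(sum(int(b) for _, b in chains)))
    return terms


def hypergeom_upper_tail(k, K, n, N):
    """P(X >= k) when n lipids are drawn from N, of which K carry the term."""
    upper = min(K, n)
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return hits / comb(N, n)


def enrich(query, universe):
    """Return {term: p-value} for every term carried by a query lipid."""
    uni_terms = {name: ontology_terms(name) for name in universe}
    query = [q for q in query if q in uni_terms]
    results = {}
    for term in set().union(*(uni_terms[q] for q in query)):
        K = sum(term in terms for terms in uni_terms.values())
        k = sum(term in uni_terms[q] for q in query)
        results[term] = hypergeom_upper_tail(k, K, len(query), len(universe))
    return results


# Made-up lipid names, used only to exercise the functions.
universe = ["PC(16:0/18:1)", "PC(18:0/18:1)", "PE(16:0/18:1)",
            "PE(18:0/20:4)", "TG(16:0/18:1/18:2)", "SM(d18:1/16:0)"]
query = ["PC(16:0/18:1)", "PC(18:0/18:1)"]
pvalues = enrich(query, universe)
```

Here `pvalues["class:PC"]` is the probability of drawing both PC lipids by chance when picking two of the six names. In a real analysis the universe would be every lipid identified in the experiment, and the p-values would need correction for multiple testing.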

TFEA.ChIP: a tool kit for transcription factor binding site enrichment analysis capitalizing on ChIP-seq datasets

Bioinformatics ◽

10.1093/bioinformatics/btz573 ◽

2019 ◽

Vol 35 (24) ◽

pp. 5339-5340 ◽

Cited By ~ 8

Author(s):

Laura Puente-Santamaria ◽

Wyeth W Wasserman ◽

Luis del Peso

Keyword(s):

Genomic Analysis ◽

Enrichment Analysis ◽

R Package ◽

Supplementary Information ◽

Web Based ◽

Factor Binding Site ◽

Gene Sets ◽

Transcription Regulators ◽

Computational Identification ◽

On Chip

Abstract Summary The computational identification of the transcription factors (TFs) [more generally, transcription regulators, (TR)] responsible for the co-regulation of a specific set of genes is a common problem found in genomic analysis. Herein, we describe TFEA.ChIP, a tool that makes use of ChIP-seq datasets to estimate and visualize TR enrichment in gene lists representing transcriptional profiles. We validated TFEA.ChIP using a wide variety of gene sets representing signatures of genetic and chemical perturbations as input and found that the relevant TR was correctly identified in 126 of a total of 174 analyzed. Comparison with other TR enrichment tools demonstrates that TFEA.ChIP is a highly customizable package with an outstanding performance. Availability and implementation TFEA.ChIP is implemented as an R package available at Bioconductor https://www.bioconductor.org/packages/devel/bioc/html/TFEA.ChIP.html and GitHub https://github.com/LauraPS1/TFEA.ChIP_downloads. A web-based GUI to the package is also available at https://www.iib.uam.es/TFEA.ChIP/ Supplementary information Supplementary data are available at Bioinformatics online.

ngsReports: a Bioconductor package for managing FastQC reports and other NGS related log files

Bioinformatics ◽

10.1093/bioinformatics/btz937 ◽

2019 ◽

Vol 36 (8) ◽

pp. 2587-2588 ◽

Cited By ~ 10

Author(s):

Christopher M Ward ◽

Thu-Hien To ◽

Stephen M Pederson

Keyword(s):

Quality Control ◽

R Package ◽

Supplementary Information ◽

Bioconductor Package ◽

Supplementary Data ◽

Large Sample ◽

Log Files ◽

Shiny App ◽

Next Generation Sequencing Ngs ◽

Generation Sequencing

Abstract Motivation High-throughput next-generation sequencing (NGS) has become exceedingly cheap, enabling studies with large sample numbers. Quality control (QC) is an essential stage in analytic pipelines, and the outputs of popular bioinformatics tools such as FastQC and Picard can provide information on individual samples. Although these tools provide considerable power when carrying out QC, large sample numbers can make inspection of all samples and identification of systemic bias a challenge. Results We present ngsReports, an R package designed for the management and visualization of NGS reports from within an R environment. The available methods allow direct import into R of FastQC reports along with outputs from other tools. Visualization can be carried out across many samples using default, highly customizable plots with options to perform hierarchical clustering to quickly identify outlier libraries. Moreover, these can be displayed in an interactive shiny app or HTML report for ease of analysis. Availability and implementation The ngsReports package is available on Bioconductor and the GUI shiny app is available at https://github.com/UofABioinformaticsHub/shinyNgsreports. Supplementary information Supplementary data are available at Bioinformatics online.

DiffNetFDR: differential network analysis with false discovery rate control

Bioinformatics ◽

10.1093/bioinformatics/btz051 ◽

2019 ◽

Vol 35 (17) ◽

pp. 3184-3186

Author(s):

Xiao-Fei Zhang ◽

Le Ou-Yang ◽

Shuo Yang ◽

Xiaohua Hu ◽

Hong Yan

Keyword(s):

False Discovery Rate ◽

Graphical Models ◽

Biological Significance ◽

R Package ◽

Supplementary Information ◽

Gaussian Graphical Models ◽

Multiple Testing Procedure ◽

False Discovery ◽

Differential Network ◽

Shiny App

Abstract Summary To identify biological network rewiring under different conditions, we develop a user-friendly R package, named DiffNetFDR, to implement two methods developed for testing the difference in different Gaussian graphical models. Compared to existing tools, our methods have the following features: (i) they are based on Gaussian graphical models which can capture the changes of conditional dependencies; (ii) they determine the tuning parameters in a data-driven manner; (iii) they take a multiple testing procedure to control the overall false discovery rate; and (iv) our approach defines the differential network based on partial correlation coefficients so that the spurious differential edges caused by the variants of conditional variances can be excluded. We also develop a Shiny application to provide easier analysis and visualization. Simulation studies are conducted to evaluate the performance of our methods. We also apply our methods to two real gene expression datasets. The effectiveness of our methods is validated by the biological significance of the identified differential networks. Availability and implementation R package and Shiny app are available at https://github.com/Zhangxf-ccnu/DiffNetFDR. Supplementary information Supplementary data are available at Bioinformatics online.
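Feature (iii) above, a multiple testing procedure that controls the overall false discovery rate, can be illustrated with the generic Benjamini-Hochberg step-up rule applied to per-edge p-values. This is a stand-in chosen for illustration and is not claimed to be the procedure DiffNetFDR implements; the function name `benjamini_hochberg` and the example p-values are assumptions.

```python
def benjamini_hochberg(pvalues, alpha=0.05):
    """Return booleans, True where the hypothesis is rejected at FDR level alpha."""
    m = len(pvalues)
    # Indices of the p-values from smallest to largest.
    order = sorted(range(m), key=lambda i: pvalues[i])
    # Find the largest 1-based rank k with p_(k) <= k * alpha / m.
    cutoff = 0
    for rank, idx in enumerate(order, start=1):
        if pvalues[idx] <= rank * alpha / m:
            cutoff = rank
    # Reject every hypothesis ranked at or below that cutoff.
    rejected = [False] * m
    for idx in order[:cutoff]:
        rejected[idx] = True
    return rejected


# Hypothetical p-values, one per candidate differential edge.
edges = [("A", "B"), ("A", "C"), ("B", "C"), ("C", "D")]
edge_pvalues = [0.001, 0.04, 0.03, 0.5]
keep = benjamini_hochberg(edge_pvalues, alpha=0.05)
differential_edges = [e for e, flag in zip(edges, keep) if flag]
```

With these made-up values only the edge between A and B is kept: its p-value falls below the first-rank threshold of 0.0125, while none of the larger p-values meets the threshold for its own rank.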

EnImpute: imputing dropout events in single-cell RNA-sequencing data via ensemble learning

Bioinformatics ◽

10.1093/bioinformatics/btz435 ◽

2019 ◽

Vol 35 (22) ◽

pp. 4827-4829 ◽

Cited By ~ 6

Author(s):

Xiao-Fei Zhang ◽

Le Ou-Yang ◽

Shuo Yang ◽

Xing-Ming Zhao ◽

Xiaohua Hu ◽

...

Keyword(s):

Single Cell ◽

Rna Sequencing ◽

Ensemble Learning ◽

R Package ◽

Supplementary Information ◽

Sequencing Data ◽

Single Cell Rna Sequencing ◽

The Individual ◽

Downstream Analysis ◽

Shiny Application

Abstract Summary Imputation of dropout events that may mislead downstream analyses is a key step in analyzing single-cell RNA-sequencing (scRNA-seq) data. We develop EnImpute, an R package that introduces an ensemble learning method for imputing dropout events in scRNA-seq data. EnImpute combines the results obtained from multiple imputation methods to generate a more accurate result. A Shiny application is developed to provide easier implementation and visualization. Experimental results show that EnImpute outperforms the individual state-of-the-art methods in almost all situations. EnImpute is useful for correcting the noisy scRNA-seq data before performing downstream analysis. Availability and implementation The R package and Shiny application are available through GitHub at https://github.com/Zhangxf-ccnu/EnImpute. Supplementary information Supplementary data are available at Bioinformatics online.
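The idea of combining the outputs of several imputation methods into one result can be sketched as an entry-wise robust average. The abstract does not say how EnImpute combines its base methods, so the trimmed mean used here, the helper names `trimmed_mean` and `ensemble_impute`, and the toy matrices are assumptions for illustration only.

```python
def trimmed_mean(values, trim=0.2):
    """Mean after dropping the lowest and highest `trim` fraction of values."""
    ordered = sorted(values)
    k = int(len(ordered) * trim)
    # Fall back to the plain mean if trimming would remove everything.
    kept = ordered[k:len(ordered) - k] if len(ordered) > 2 * k else ordered
    return sum(kept) / len(kept)


def ensemble_impute(matrices, trim=0.2):
    """Combine same-shaped gene-by-cell matrices entry by entry."""
    n_rows, n_cols = len(matrices[0]), len(matrices[0][0])
    return [
        [trimmed_mean([m[i][j] for m in matrices], trim) for j in range(n_cols)]
        for i in range(n_rows)
    ]


# Toy outputs of three hypothetical imputation methods (1 gene x 2 cells).
method_a = [[0.0, 2.0]]
method_b = [[1.0, 2.0]]
method_c = [[5.0, 2.0]]
combined = ensemble_impute([method_a, method_b, method_c], trim=0.34)
```

Trimming before averaging keeps a single method's outlying estimate (the 5.0 above) from dominating the combined value, which is one plausible reason an ensemble could be more accurate than its individual members.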

simGWAS: a fast method for simulation of large scale case–control GWAS summary statistics

Bioinformatics ◽

10.1093/bioinformatics/bty898 ◽

2018 ◽

Vol 35 (11) ◽

pp. 1901-1906 ◽

Cited By ~ 4

Author(s):

Mary D Fortune ◽

Chris Wallace

Keyword(s):

Large Scale ◽

Simulated Data ◽

Enrichment Analysis ◽

R Package ◽

Gene Set Enrichment Analysis ◽

Supplementary Information ◽

Intermediate Step ◽

Fast Method ◽

Summary Statistics ◽

Causal Variants

Abstract Motivation Methods for analysis of GWAS summary statistics have encouraged data sharing and democratized the analysis of different diseases. Ideal validation for such methods is application to simulated data, where some ‘truth’ is known. As GWAS increase in size, so does the computational complexity of such evaluations; standard practice repeatedly simulates and analyses genotype data for all individuals in an example study. Results We have developed a novel method based on an alternative approach, directly simulating GWAS summary data, without individual data as an intermediate step. We mathematically derive the expected statistics for any set of causal variants and their effect sizes, conditional upon control haplotype frequencies (available from public reference datasets). Simulation of GWAS summary output can be conducted independently of sample size by simulating random variates about these expected values. Across a range of scenarios, our method produces very similar output to that from simulating individual genotypes with a substantial gain in speed even for modest sample sizes. Fast simulation of GWAS summary statistics will enable more complete and rapid evaluation of summary statistic methods as well as opening new potential avenues of research in fine mapping and gene set enrichment analysis. Availability and implementation Our method is available under a GPL license as an R package from http://github.com/chr1swallace/simGWAS. Supplementary information Supplementary data are available at Bioinformatics online.
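The final simulation step, drawing random variates about the expected statistics, can be sketched as sampling Z-score vectors from a multivariate normal distribution centred on the expected values. Everything specific here is an assumption for illustration: the expected Z-scores are simply given as input (the derivation from causal variants and haplotype frequencies is not reproduced), the correlation between SNPs is taken to be a supplied LD matrix, and the function names `cholesky` and `simulate_z` are hypothetical and unrelated to the simGWAS API.

```python
import math
import random


def cholesky(matrix):
    """Lower-triangular factor L of a positive-definite matrix, so L L^T = matrix."""
    n = len(matrix)
    lower = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(lower[i][k] * lower[j][k] for k in range(j))
            if i == j:
                lower[i][j] = math.sqrt(matrix[i][i] - s)
            else:
                lower[i][j] = (matrix[i][j] - s) / lower[j][j]
    return lower


def simulate_z(expected_z, ld, n_sims, seed=0):
    """Draw n_sims Z-score vectors with mean expected_z and covariance ld."""
    rng = random.Random(seed)
    lower = cholesky(ld)
    n = len(expected_z)
    draws = []
    for _ in range(n_sims):
        # Independent standard normal noise, then correlated through L.
        noise = [rng.gauss(0.0, 1.0) for _ in range(n)]
        draws.append([
            expected_z[i] + sum(lower[i][k] * noise[k] for k in range(i + 1))
            for i in range(n)
        ])
    return draws


# Made-up expected Z-scores for three SNPs and an assumed LD correlation matrix.
expected_z = [5.0, 4.0, 0.0]
ld = [[1.0, 0.8, 0.1],
      [0.8, 1.0, 0.2],
      [0.1, 0.2, 1.0]]
draws = simulate_z(expected_z, ld, n_sims=2000, seed=1)
```

Each draw costs a few arithmetic operations per SNP regardless of how many individuals the hypothetical study contains, which reflects the abstract's point that simulation can be conducted independently of sample size.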

simGWAS: a fast method for simulation of large scale case-control GWAS summary statistics

10.1101/313023 ◽

2018 ◽

Cited By ~ 1

Author(s):

Mary D. Fortune ◽

Chris Wallace

Keyword(s):

Large Scale ◽

Simulated Data ◽

Enrichment Analysis ◽

R Package ◽

Gene Set Enrichment Analysis ◽

Supplementary Information ◽

Intermediate Step ◽

Fast Method ◽

Summary Statistics ◽

Causal Variants

Abstract Motivation Methods for analysis of GWAS summary statistics have encouraged data sharing and democratised the analysis of different diseases. Ideal validation for such methods is application to simulated data, where some “truth” is known. As GWAS increase in size, so does the computational complexity of such evaluations; standard practice repeatedly simulates and analyses genotype data for all individuals in an example study. Results We have developed a novel method based on an alternative approach, directly simulating GWAS summary data, without individual data as an intermediate step. We mathematically derive the expected statistics for any set of causal variants and their effect sizes, conditional upon control haplotype frequencies (available from public reference datasets). Simulation of GWAS summary output can be conducted independently of sample size by simulating random variates about these expected values. Across a range of scenarios, our method produces very similar output to that from simulating individual genotypes with a substantial gain in speed even for modest sample sizes. Fast simulation of GWAS summary statistics will enable more complete and rapid evaluation of summary statistic methods as well as opening new potential avenues of research in fine mapping and gene set enrichment analysis. Availability and Implementation Our method is available under a GPL license as an R package from http://github.com/chr1swallace/[email protected] Information Supplementary Information is appended.

Varanto: variant enrichment analysis and annotation

Bioinformatics ◽

10.1093/bioinformatics/btz046 ◽

2019 ◽

Vol 35 (17) ◽

pp. 3154-3156 ◽

Cited By ~ 1

Author(s):

Oskari Timonen ◽

Mikko Särkkä ◽

Tibor Fülöp ◽

Anton Mattsson ◽

Juha Kekäläinen ◽

...

Keyword(s):

Association Studies ◽

Enrichment Analysis ◽

Genetic Variations ◽

Supplementary Information ◽

Genome Wide Association Studies ◽

Nucleotide Polymorphisms ◽

Genome Wide ◽

Shiny App ◽

Specific Trait ◽

Diverse Data

Abstract Summary Genome-wide association studies (GWAS) aim to identify associations of genetic variations such as single-nucleotide polymorphisms (SNPs) to a specific trait or a disease. Identifying common themes such as pathways, biological processes and disease associations is needed to further explore and interpret these results. Varanto is a novel web tool for annotating, visualizing and analyzing human genetic variations using diverse data sources. Varanto can be used to query a set of input variations, retrieve their associated variation and gene level annotations, perform annotation enrichment analysis and visualize the results. Availability and implementation The Varanto web app is developed with R and implemented as a Shiny app with a PostgreSQL database and is freely available at http://bioinformatics.uef.fi/varanto. Source code for the tool is available at https://github.com/oqe/varanto. Supplementary information Supplementary data are available at Bioinformatics online.

deTS: tissue-specific enrichment analysis to decode tissue specificity

Bioinformatics ◽

10.1093/bioinformatics/btz138 ◽

2019 ◽

Vol 35 (19) ◽

pp. 3842-3845 ◽

Cited By ~ 8

Author(s):

Guangsheng Pei ◽

Yulin Dai ◽

Zhongming Zhao ◽

Peilin Jia

Keyword(s):

Expression Profiles ◽

Association Studies ◽

Gene Expression Profiles ◽

Enrichment Analysis ◽

R Package ◽

Supplementary Information ◽

Genome Wide Association Studies ◽

Tissue Specific ◽

Genome Wide ◽

Specific Regulation

Abstract Motivation Diseases and traits are under dynamic tissue-specific regulation. However, heterogeneous tissues are often collected in biomedical studies, which reduces the power in the identification of disease-associated variants and gene expression profiles. Results We present deTS, an R package, to conduct tissue-specific enrichment analysis with two built-in reference panels. Statistical methods are developed and implemented for detecting tissue-specific genes and for enrichment testing of different forms of query data. Our applications using multi-trait genome-wide association studies data and cancer expression data showed that deTS could effectively identify the most relevant tissues for each query trait or sample, providing insights for future studies. Availability and implementation https://github.com/bsml320/deTS and CRAN https://cran.r-project.org/web/packages/deTS/ Supplementary information Supplementary data are available at Bioinformatics online.

Transcriptogramer: an R/Bioconductor package for transcriptional analysis based on protein–protein interaction

Bioinformatics ◽

10.1093/bioinformatics/btz007 ◽

2019 ◽

Vol 35 (16) ◽

pp. 2875-2876 ◽

Cited By ~ 2

Author(s):

Diego A A Morais ◽

Rita M C Almeida ◽

Rodrigo J S Dalmolin

Keyword(s):

Differential Expression ◽

Protein Interaction ◽

Topological Analysis ◽

Gene List ◽

Enrichment Analysis ◽

R Package ◽

Transcriptional Analysis ◽

Supplementary Information ◽

Protein Protein Interaction ◽

Data Platform

Abstract Motivation Several freely available tools perform analysis using algorithms developed to identify significant variation of gene expression individually. The transcriptogramer R package uses protein–protein interaction to perform differential expression of functionally associated genes. The software assesses the expression profile of entire genetic systems and reveals which biological systems are significantly altered in case-control designed transcriptome experiments. Results The R/Bioconductor transcriptogramer package projects expression values on an ordered gene list to perform topological analysis, differential expression and gene ontology enrichment analysis, independently of data platform or operating system. Availability and implementation http://bioconductor.org/packages/transcriptogramer. Supplementary information Supplementary data are available at Bioinformatics online.

Conditional canonical correlation estimation based on covariates with random forests

Bioinformatics ◽

10.1093/bioinformatics/btab158 ◽

2021 ◽

Author(s):

Cansu Alakuş ◽

Denis Larocque ◽

Sébastien Jacquemont ◽

Fanny Barlaam ◽

Charles-Olivier Martin ◽

...

Keyword(s):

Correlation Analysis ◽

Canonical Correlation Analysis ◽

Canonical Correlation ◽

R Package ◽

Significance Test ◽

Supplementary Information ◽

Canonical Correlations ◽

Correlation Estimation ◽

The Individual ◽

Individual Trees

Abstract Motivation Investigating the relationships between two sets of variables helps to understand their interactions and can be done with canonical correlation analysis (CCA). However, the correlation between the two sets can sometimes depend on a third set of covariates, often subject-related ones such as age, gender or other clinical measures. In this case, applying CCA to the whole population is not optimal and methods to estimate conditional CCA, given the covariates, can be useful. Results We propose a new method called Random Forest with Canonical Correlation Analysis (RFCCA) to estimate the conditional canonical correlations between two sets of variables given subject-related covariates. The individual trees in the forest are built with a splitting rule specifically designed to partition the data to maximize the canonical correlation heterogeneity between child nodes. We also propose a significance test to detect the global effect of the covariates on the relationship between two sets of variables. The performance of the proposed method and the global significance test is evaluated through simulation studies that show it provides accurate canonical correlation estimations and well-controlled Type-1 error. We also show an application of the proposed method with EEG data. Availability and implementation RFCCA is implemented in a freely available R package on CRAN (https://CRAN.R-project.org/package=RFCCA). Supplementary information Supplementary data are available at Bioinformatics online.
