PheWAS-ME: A web-app for interactive exploration of multimorbidity patterns in PheWAS

Abstract. Summary: Electronic health records (EHRs) linked with a DNA biobank provide unprecedented opportunities to use big data for biomedical research in precision medicine. The Phenome-wide association study (PheWAS) is a widely used technique for high-throughput evaluation of relationships between a set of genetic variants and a large collection of clinical phenotypes recorded in EHRs. PheWAS analyses are typically presented as static tables and charts of summary statistics obtained from statistical tests of association between pairs of a genetic variant and individual phenotypes. Comorbidities are common and typically lead to complex, multivariate gene-disease association signals that are challenging to interpret. Discovering and interrogating multimorbidity patterns and their influence in PheWAS is difficult and time-consuming. Here, we present a web application to visualize individual-level genotype and phenotype data side-by-side with PheWAS analysis results in an interactive dashboard, allowing researchers to explore multimorbidity patterns and their associations with a genetic variant of interest. We expect this application to enrich PheWAS analyses by illuminating clinical multimorbidity patterns present in the data. Availability: A demo PheWAS-ME application is publicly available at https://prod.tbilab.org/phewas_me/. A sample simulated dataset is provided for exploration, with the option to upload custom PheWAS results and corresponding individual-level data. The source code is available as an R package on GitHub (https://github.com/tbilab/multimorbidity_explorer).

PheWAS-ME: a web-app for interactive exploration of multimorbidity patterns in PheWAS

Bioinformatics ◽

10.1093/bioinformatics/btaa870 ◽

2020 ◽

Author(s):

Nick Strayer ◽

Jana K Shirey-Rice ◽

Yu Shyr ◽

Joshua C Denny ◽

Jill M Pulley ◽

...

Keyword(s):

Genetic Variant ◽

Statistical Tests ◽

R Package ◽

Supplementary Information ◽

Health Records ◽

Individual Level ◽

Level Data ◽

Phenotype Data ◽

Tests Of Association ◽

Web App

Abstract. Summary: Electronic health records (EHRs) linked with a DNA biobank provide unprecedented opportunities for biomedical research in precision medicine. The Phenome-wide association study (PheWAS) is a widely used technique for the evaluation of relationships between genetic variants and a large collection of clinical phenotypes recorded in EHRs. PheWAS analyses are typically presented as static tables and charts of summary statistics obtained from statistical tests of association between a genetic variant and individual phenotypes. Comorbidities are common and typically lead to complex, multivariate gene–disease association signals that are challenging to interpret. Discovering and interrogating multimorbidity patterns and their influence in PheWAS is difficult and time-consuming. We present PheWAS-ME: an interactive dashboard to visualize individual-level genotype and phenotype data side-by-side with PheWAS analysis results, allowing researchers to explore multimorbidity patterns and their associations with a genetic variant of interest. We expect this application to enrich PheWAS analyses by illuminating clinical multimorbidity patterns present in the data. Availability and implementation: A demo PheWAS-ME application is publicly available at https://prod.tbilab.org/phewas_me/. Sample datasets are provided for exploration with the option to upload custom PheWAS results and corresponding individual-level data. Online versions of the appendices are available at https://prod.tbilab.org/phewas_me_info/. The source code is available as an R package on GitHub (https://github.com/tbilab/multimorbidity_explorer). Supplementary information: Supplementary data are available at Bioinformatics online.

AB0210 ACREULAR: AN R PACKAGE FOR THE CALCULATION AND VISUALISATION OF ACR/EULAR RELATED RHEUMATOID ARTHRITIS MEASURES

Annals of the Rheumatic Diseases ◽

10.1136/annrheumdis-2020-eular.2326 ◽

2020 ◽

Vol 79 (Suppl 1) ◽

pp. 1405.1-1406

Author(s):

F. Morton ◽

J. Nijjar ◽

C. Goodyear ◽

D. Porter

Keyword(s):

Rheumatoid Arthritis ◽

Functional Status ◽

Rheumatic Diseases ◽

Web Application ◽

R Package ◽

Diagnostic Classification ◽

Microsoft Excel ◽

Link Type ◽

Large Joint ◽

Programming Skills

Background: The American College of Rheumatology (ACR) and the European League Against Rheumatism (EULAR) individually and collaboratively have produced/recommended diagnostic classification, response and functional status criteria for a range of different rheumatic diseases. While there are a number of different resources available for performing these calculations individually, currently there are no tools available that we are aware of to easily calculate these values for whole patient cohorts.
Objectives: To develop a new software tool, which will enable both data analysts and also researchers and clinicians without programming skills to calculate ACR/EULAR related measures for a number of different rheumatic diseases.
Methods: Criteria that had been developed by ACR and/or EULAR that had been approved for the diagnostic classification, measurement of treatment response and functional status in patients with rheumatoid arthritis were identified. Methods were created using the R programming language to allow the calculation of these criteria, which were incorporated into an R package. Additionally, an R/Shiny web application was developed to enable the calculations to be performed via a web browser using data presented as CSV or Microsoft Excel files.
Results: acreular is a freely available, open source R package (downloadable from https://github.com/fragla/acreular) that facilitates the calculation of ACR/EULAR related RA measures for whole patient cohorts. Measures, such as the ACR/EULAR (2010) RA classification criteria, can be determined using precalculated values for each component (small/large joint counts, duration in days, normal/abnormal acute-phase reactants, negative/low/high serology classification) or by providing “raw” data (small/large joint counts, onset/assessment dates, ESR/CRP and CCP/RF laboratory values). Other measures, including EULAR response and ACR20/50/70 response, can also be calculated by providing the required information.
The accompanying web application is included as part of the R package but is also externally hosted at https://fragla.shinyapps.io/shiny-acreular. This enables researchers and clinicians without any programming skills to easily calculate these measures by uploading either a Microsoft Excel or CSV file containing their data. Furthermore, the web application allows the incorporation of additional study covariates, enabling the automatic calculation of multigroup comparative statistics and the visualisation of the data through a number of different plots, both of which can be downloaded.
Figure 1. The Data tab following the upload of data. Criteria are calculated by selecting the appropriate checkbox.
Figure 2. A density plot of DAS28 scores grouped by ACR/EULAR 2010 RA classification. Statistical analysis has been performed and shows a significant difference in DAS28 score between the two groups.
Conclusion: The acreular R package facilitates the easy calculation of ACR/EULAR RA related disease measures for whole patient cohorts. Calculations can be performed either from within R or by using the accompanying web application, which also enables the graphical visualisation of data and the calculation of comparative statistics. We plan to further develop the package by adding additional RA related criteria and by adding ACR/EULAR related measures for other rheumatic disorders.
Disclosure of Interests: Fraser Morton: None declared, Jagtar Nijjar Shareholder of: GlaxoSmithKline plc, Consultant of: Janssen Pharmaceuticals UK, Employee of: GlaxoSmithKline plc, Paid instructor for: Janssen Pharmaceuticals UK, Speakers bureau: Janssen Pharmaceuticals UK, AbbVie, Carl Goodyear: None declared, Duncan Porter: None declared

Meffil: efficient normalisation and analysis of very large DNA methylation samples

10.1101/125963 ◽

2017 ◽

Cited By ~ 17

Author(s):

Josine Min ◽

Gibran Hemani ◽

George Davey Smith ◽

Caroline Relton ◽

Matthew Suderman

Keyword(s):

Dna Methylation ◽

Association Studies ◽

R Package ◽

Individual Level ◽

Technological Advances ◽

Level Data ◽

Fixed And Random Effects ◽

R Packages ◽

Meta Analyses ◽

Dramatic Growth

Abstract. Background: Technological advances in high throughput DNA methylation microarrays have allowed dramatic growth of a new branch of epigenetic epidemiology. DNA methylation datasets are growing ever larger in terms of the number of samples profiled, the extent of genome coverage, and the number of studies being meta-analysed. Novel computational solutions are required to efficiently handle these data. Methods: We have developed meffil, an R package designed to quality control, normalize and perform epigenome-wide association studies (EWAS) efficiently on large samples of Illumina Infinium HumanMethylation450 and MethylationEPIC BeadChip microarrays. We tested meffil by applying it to 6000 450k microarrays generated from blood collected for two different datasets, Accessible Resource for Integrative Epigenomic Studies (ARIES) and The Genetics of Overweight Young Adults (GOYA) study. Results: A complete reimplementation of functional normalization minimizes computational memory requirements to 5% of that required by other R packages, without increasing running time. Incorporating fixed and random effects alongside functional normalization, and automated estimation of functional normalisation parameters reduces technical variation in DNA methylation levels, thus reducing false positive associations and improving power. We also demonstrate that the ability to normalize datasets distributed across physically different locations without sharing any biologically-based individual-level data may reduce heterogeneity in meta-analyses of epigenome-wide association studies. However, we show that when batch is perfectly confounded with cases and controls, functional normalization is unable to prevent spurious associations. Conclusions: meffil is available online (https://github.com/perishky/meffil/) along with tutorials covering typical use cases.

LandScape: a web application for interactive genomic summary visualization

10.1101/866087 ◽

2019 ◽

Author(s):

Wenlong Jia ◽

Hechen Li ◽

Shiying Li ◽

Shuaicheng Li

Keyword(s):

Genetic Information ◽

Web Application ◽

Genomic Research ◽

File Format ◽

Data Types ◽

Web Based ◽

Link Type ◽

Level Data ◽

Real Time Visualization ◽

Information Landscape

Abstract. Summary: Visualizing integrated-level data from genomic research remains a challenge, as it requires sufficient coding skills and experience. Here, we present LandScape, a web-based application for interactive and real-time visualization of summarized genetic information. LandScape utilizes a well-designed file format that is capable of handling various data types, and offers a series of built-in functions to customize the appearance, explore results, and export high-quality diagrams that are available for publication. Availability and implementation: LandScape is deployed at bio.oviz.org/demo-project/analyses/landscape for online use. Documentation and demo data are freely available on this website and GitHub (github.com/Nobel-Justin/Oviz-Bio-demo).

ORFhunteR: an accurate approach for the automatic identification and annotation of open reading frames in human mRNA molecules

10.1101/2021.02.05.429963 ◽

2021 ◽

Author(s):

Vasily V. Grinev ◽

Mikalai M. Yatskou ◽

Victor V. Skakun ◽

Maryna K. Chepeleva ◽

Petr V. Nazarov

Keyword(s):

Single Molecule ◽

Web Application ◽

R Package ◽

Nucleotide Sequences ◽

Open Reading Frames ◽

Classification Model ◽

Automatic Identification ◽

Large Set ◽

Link Type ◽

Reading Frames

Abstract. Motivation: Modern methods of whole transcriptome sequencing accurately recover nucleotide sequences of RNA molecules present in cells and allow for determining their quantitative abundances. The coding potential of such molecules can be estimated using open reading frame (ORF) finding algorithms, implemented in a number of software packages. However, these algorithms show somewhat limited accuracy, are intended for single-molecule analysis and do not allow selecting proper ORFs in the case of long mRNAs containing multiple ORF candidates. Results: We developed a computational approach, a corresponding machine learning model and a package dedicated to automatic identification of the ORFs in large sets of human mRNA molecules. It is based on vectorization of nucleotide sequences into features, followed by classification using a random forest. The predictive model was validated on sets of human mRNA molecules from the NCBI RefSeq and Ensembl databases and demonstrated almost 95% accuracy in detecting true ORFs. The developed methods and pre-trained classification model were implemented in the ORFhunteR computational tool, which performs automatic identification of true ORFs among large sets of human mRNA molecules. Availability and implementation: The developed open-source R package ORFhunteR is available for the community at the GitHub repository (https://github.com/rfctbio-bsu/ORFhunteR), from Bioconductor (https://bioconductor.org/packages/devel/bioc/html/ORFhunteR.html) and as a web application (http://orfhunter.bsu.by).
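
To make the notion of "multiple ORF candidates" concrete, here is a minimal Python sketch that enumerates start-to-stop candidates in the three forward frames of an mRNA sequence. It is not ORFhunteR's method (the package is written in R and ranks candidates with a random forest); the function name `find_orf_candidates` and the `min_codons` parameter are illustrative assumptions.

```python
# Illustrative sketch only: enumerates ORF candidates, does not classify them.
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orf_candidates(seq, min_codons=1):
    """Return (start, end) index pairs of ATG...stop candidates.

    Coordinates are 0-based, end-exclusive, and include the stop codon.
    For each stop codon, only the first in-frame ATG upstream of it
    (after the previous in-frame stop) is reported.
    """
    seq = seq.upper().replace("U", "T")  # accept RNA or DNA alphabets
    candidates = []
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOP_CODONS:
                # Number of codons before the stop codon.
                if (i - start) // 3 >= min_codons:
                    candidates.append((start, i + 3))
                start = None
    return candidates
```

For the toy sequence `"CCATGAAATAGGG"`, the sketch returns `[(2, 11)]`, i.e. the span `ATGAAATAG`. A long mRNA typically yields many such spans, which is the selection problem the abstract describes.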

Using the MR-Base platform to investigate risk factors and drug targets for thousands of phenotypes

Wellcome Open Research ◽

10.12688/wellcomeopenres.15334.2 ◽

2019 ◽

Vol 4 ◽

pp. 113 ◽

Cited By ~ 12

Author(s):

Venexia M Walker ◽

Neil M Davies ◽

Gibran Hemani ◽

Jie Zheng ◽

Philip C Haycock ◽

...

Keyword(s):

Risk Factors ◽

Web Application ◽

Drug Targets ◽

Genome Wide Association Study ◽

Mendelian Randomization ◽

Causal Effect ◽

R Package ◽

Sensitivity Analyses ◽

Link Type ◽

Study Results

Mendelian randomization (MR) estimates the causal effect of exposures on outcomes by exploiting genetic variation to address confounding and reverse causation. This method has a broad range of applications, including investigating risk factors and appraising potential targets for intervention. MR-Base has become established as a freely accessible, online platform, which combines a database of complete genome-wide association study results with an interface for performing Mendelian randomization and sensitivity analyses. This allows the user to explore millions of potentially causal associations. MR-Base is available as a web application or as an R package. The technical aspects of the tool have previously been documented in the literature. The present article is complementary to this as it focuses on the applied aspects. Specifically, we describe how MR-Base can be used in several ways, including to perform novel causal analyses, replicate results and enable transparency, amongst others. We also present three use cases, which demonstrate important applications of Mendelian randomization and highlight the benefits of using MR-Base for these types of analyses.
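
As a rough illustration of the kind of estimate Mendelian randomization produces from summary statistics, the Python sketch below computes a per-variant Wald ratio and a fixed-effect inverse-variance weighted (IVW) estimate with first-order weights. This is a generic textbook formulation, not MR-Base's API; the function names and argument names are assumptions made for the example.

```python
from math import sqrt


def wald_ratio(beta_exposure, beta_outcome):
    """Causal estimate from one variant: outcome effect / exposure effect."""
    return beta_outcome / beta_exposure


def ivw_estimate(beta_exposure, beta_outcome, se_outcome):
    """Fixed-effect IVW estimate across variants (first-order weights).

    Each argument is a sequence with one entry per genetic variant.
    Returns (estimate, standard_error).
    """
    num = sum(bx * by / se ** 2
              for bx, by, se in zip(beta_exposure, beta_outcome, se_outcome))
    den = sum(bx ** 2 / se ** 2
              for bx, se in zip(beta_exposure, se_outcome))
    return num / den, sqrt(1.0 / den)
```

When every variant implies the same Wald ratio, the IVW estimate equals that ratio; variants with smaller outcome standard errors otherwise receive more weight.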

MetENP/MetENPWeb: An R package and web application for metabolomics enrichment and pathway analysis in Metabolomics Workbench

10.1101/2020.11.20.391912 ◽

2020 ◽

Author(s):

Kumari Sonal Choudhary ◽

Eoin Fahy ◽

Kevin Coakley ◽

Manish Sud ◽

Mano R Maurya ◽

...

Keyword(s):

Pathway Analysis ◽

Web Application ◽

Enrichment Analysis ◽

R Package ◽

Pathway Enrichment Analysis ◽

Pathway Enrichment ◽

Kegg Pathways ◽

Link Type ◽

Species Specific ◽

User Friendly

Abstract: With the advent of high throughput mass spectrometric methods, metabolomics has emerged as an essential area of research in biomedicine with the potential to provide deep biological insights into normal and diseased functions in physiology. However, to achieve the potential offered by metabolomics measures, there is a need for biologist-friendly integrative analysis tools that can transform data into mechanisms that relate to phenotypes. Here, we describe MetENP, an R package, and a user-friendly web application deployed at the Metabolomics Workbench site extending the metabolomics enrichment analysis to include species-specific pathway analysis, pathway enrichment scores, gene-enzyme information, and enzymatic activities of the significantly altered metabolites. MetENP provides a highly customizable workflow through various user-specified options and includes support for all metabolite species with available KEGG pathways. MetENPweb is a web application for calculating metabolite and pathway enrichment analysis. Availability and Implementation: The MetENP package is freely available from the Metabolomics Workbench GitHub (https://github.com/metabolomicsworkbench/MetENP); the web application is freely available at https://www.metabolomicsworkbench.org/data/analyze.php.

LAB-AID (Laboratory Automated Interrogation of Data): an interactive web application for visualization of multi-level data from biological experiments

10.1101/763318 ◽

2019 ◽

Author(s):

Zrinko Kozic ◽

Sam Booker ◽

Owen Dando ◽

Giles Hardingham ◽

Peter Kind

Keyword(s):

Experimental Data ◽

Web Application ◽

Laboratory Experiments ◽

Statistical Tests ◽

Hierarchical Data ◽

Batch Effects ◽

Considerable Effort ◽

Interactive Tool ◽

Level Data ◽

Multi Level

Abstract: A key step in understanding the results of biological experiments is visualization of the data. Many laboratory experiments contain a range of measurements that exist within a hierarchy of interdependence. An automated way to visualise and interrogate experimental data would: 1) lead to improved understanding of the results, 2) help to determine which statistical tests should be performed, and 3) easily identify outliers and sources of batch effects. Unfortunately, existing graphing solutions often demand expertise in programming, require considerable effort to import and examine such multi-level data, or are unnecessarily complex for the task at hand. Here we present LAB-AID (Laboratory Automated Interrogation of Data), an interactive tool specifically designed to automatically visualize and query hierarchical data resulting from biological experiments.

SCOPIT: sample size calculations for single-cell sequencing experiments

BMC Bioinformatics ◽

10.1186/s12859-019-3167-9 ◽

2019 ◽

Vol 20 (1) ◽

Cited By ~ 6

Author(s):

Alexander Davis ◽

Ruli Gao ◽

Nicholas E. Navin

Keyword(s):

Single Cell ◽

Web Application ◽

Multinomial Distribution ◽

R Package ◽

Cell Type ◽

Single Cell Sequencing ◽

Link Type ◽

Dna And Rna ◽

Sample Size Calculations ◽

Number Of Cells

Abstract. Background: In single cell DNA and RNA sequencing experiments, the number of cells to sequence must be decided before running an experiment, and afterwards, it is necessary to decide whether sufficient cells were sampled. These questions can be addressed by calculating the probability of sampling at least a defined number of cells from each subpopulation (cell type or cancer clone). Results: We developed an interactive web application called SCOPIT (Single-Cell One-sided Probability Interactive Tool), which calculates the required probabilities using a multinomial distribution (www.navinlab.com/SCOPIT). In addition, we created an R package called pmultinom for scripting these calculations. Conclusions: Our tool for fast multinomial calculations provides a simple and intuitive procedure for prospectively planning single-cell experiments or retrospectively evaluating if sufficient numbers of cells have been sequenced. The web application can be accessed at navinlab.com/SCOPIT.
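
The probability described above can be sketched in a few lines of Python. The example below computes, exactly, the chance that every subpopulation contributes at least `min_cells` cells when `n_cells` are drawn from a multinomial distribution. It is an independent illustration, not the algorithm used by SCOPIT or pmultinom; the function names are assumptions, and the dynamic programme is only practical for modest cell counts.

```python
from math import exp, lgamma, log


def _binom_pmf(k, n, q):
    """Binomial probability of k successes in n trials, for 0 < q < 1."""
    log_p = (lgamma(n + 1) - lgamma(k + 1) - lgamma(n - k + 1)
             + k * log(q) + (n - k) * log(1.0 - q))
    return exp(log_p)


def prob_all_sampled(n_cells, freqs, min_cells):
    """P(each subpopulation has >= min_cells cells among n_cells sampled).

    freqs: subpopulation frequencies, all positive and summing to 1.
    Subpopulations are handled one at a time: given the cells not yet
    assigned, the count for the next one is binomial.
    """
    if any(p <= 0 for p in freqs) or abs(sum(freqs) - 1.0) > 1e-9:
        raise ValueError("freqs must be positive and sum to 1")
    # remaining[r] = P(r cells still unassigned AND all earlier
    # subpopulations already have at least min_cells cells)
    remaining = {n_cells: 1.0}
    mass_left = 1.0
    for p in freqs[:-1]:
        q = p / mass_left  # conditional frequency among unassigned cells
        nxt = {}
        for r, pr in remaining.items():
            for k in range(min_cells, r + 1):
                nxt[r - k] = nxt.get(r - k, 0.0) + pr * _binom_pmf(k, r, q)
        remaining = nxt
        mass_left -= p
    # All leftover cells belong to the last subpopulation.
    return sum(pr for r, pr in remaining.items() if r >= min_cells)
```

Scanning `n_cells` upward until the returned probability exceeds a chosen threshold (for example 0.95) gives a prospective sample size, while evaluating it at the number of cells already sequenced gives the retrospective check.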

Summix: A method for detecting and adjusting for population structure in genetic summary data

10.1101/2021.02.03.429446 ◽

2021 ◽

Author(s):

IS Arriaga-MacKenzie ◽

G Matesi ◽

S Chen ◽

A Ronco ◽

KM Marker ◽

...

Keyword(s):

Population Structure ◽

South Asian ◽

R Package ◽

Individual Level ◽

Level Data ◽

Causal Variants ◽

High Utility ◽

Ancestry Proportions ◽

Summary Data ◽

Reference Samples

Abstract: Publicly available genetic summary data have high utility in research and the clinic including prioritizing putative causal variants, polygenic scoring, and leveraging common controls. However, summarizing individual-level data can mask population structure resulting in confounding, reduced power, and incorrect prioritization of putative causal variants. This limits the utility of publicly available data, especially for understudied or admixed populations where additional research and resources are most needed. While several methods exist to estimate ancestry in individual-level data, methods to estimate ancestry proportions in summary data are lacking. Here, we present Summix, a method to efficiently deconvolute ancestry and provide ancestry-adjusted allele frequencies from summary data. Using continental reference ancestry, African (AFR), Non-Finnish European (EUR), East Asian (EAS), Indigenous American (IAM), South Asian (SAS), we obtain accurate and precise estimates (within 0.1%) for all simulation scenarios. We apply Summix to gnomAD v2.1 exome and genome groups and subgroups finding heterogeneous continental ancestry for several groups including African/African American (∼84% AFR, ∼14% EUR) and American/Latinx (∼4% AFR, ∼5% EAS, ∼43% EUR, ∼46% IAM). Compared to the unadjusted gnomAD AFs, Summix’s ancestry-adjusted AFs more closely match respective African and Latinx reference samples. Even on modern, dense panels of summary statistics, Summix yields results in seconds allowing for estimation of confidence intervals via block bootstrap. Given an accompanying R package, Summix increases the utility and equity of public genetic resources, empowering novel research opportunities.
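
One simple way to picture ancestry deconvolution from summary data is as a mixture fit: observed allele frequencies are modelled as a weighted average of reference-panel frequencies, and the weights are the ancestry proportions. The Python sketch below handles only the two-reference case, where a least-squares fit has a closed form. The abstract does not state Summix's objective or optimizer, so the least-squares framing, the function name and the argument names are all assumptions for illustration; the multi-ancestry case would need a constrained numerical optimizer.

```python
def two_way_ancestry_proportion(observed_af, ref_a_af, ref_b_af):
    """Least-squares proportion of reference A in a two-way mixture.

    Model per SNP: observed = pi * ref_a + (1 - pi) * ref_b.
    Each argument is a sequence of allele frequencies, one per SNP.
    The estimate is clipped to the valid range [0, 1].
    """
    num = sum((o - b) * (a - b)
              for o, a, b in zip(observed_af, ref_a_af, ref_b_af))
    den = sum((a - b) ** 2 for a, b in zip(ref_a_af, ref_b_af))
    if den == 0:
        raise ValueError("reference panels are identical at every SNP")
    return min(1.0, max(0.0, num / den))
```

With noiseless frequencies generated from a known mixture, the function recovers the mixing proportion; with real data, SNPs whose reference frequencies differ most between panels carry the most information.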
