gwasurvivr: an R package for genome wide survival analysis

ABSTRACTSummaryTo address the limited software options for performing survival analyses with millions of SNPs, we developed gwasurvivr, an R/Bioconductor package with a simple interface for conducting genome wide survival analyses using VCF (outputted from Michigan or Sanger imputation servers), IMPUTE2 or PLINK files. To decrease the number of iterations needed for convergence when optimizing the parameter estimates in the Cox model we modified the R package survival; covariates in the model are first fit without the SNP, and those parameter estimates are used as initial points. We benchmarked gwasurvivr with other software capable of conducting genome wide survival analysis (genipe, SurvivalGWAS_SV, and GWASTools). gwasurvivr is significantly faster and shows better scalability as sample size, number of SNPs and number of covariates increases.Availability and implementationgwasurvivr, including source code, documentation, and vignette are available at: http://bioconductor.org/packages/gwasurvivrContactAbbas Rizvi, [email protected]; Lara E Sucheston-Campbell, [email protected] information: Supplementary data are available at https://github.com/suchestoncampbelllab/gwasurvivr_manuscript

Download Full-text

MutSpot: detection of non-coding mutation hotspots in cancer genomes

10.1101/740944 ◽

2019 ◽

Cited By ~ 1

Author(s):

Yu Amanda Guo ◽

Mei Mei Chang ◽

Anders Jacobsen Skanderup

Keyword(s):

Somatic Mutations ◽

R Package ◽

Supplementary Information ◽

Patient Specific ◽

Supplementary Data ◽

Link Type ◽

Genome Wide ◽

Cancer Genomes ◽

User Friendly ◽

Regulatory Dna

AbstractSummaryRecurrence and clustering of somatic mutations (hotspots) in cancer genomes may indicate positive selection and involvement in tumorigenesis. MutSpot performs genome-wide inference of mutation hotspots in non-coding and regulatory DNA of cancer genomes. MutSpot performs feature selection across hundreds of epigenetic and sequence features followed by estimation of position and patient-specific background somatic mutation probabilities. MutSpot is user-friendly, works on a standard workstation, and scales to thousands of cancer genomes.Availability and implementationMutSpot is implemented as an R package and is available at https://github.com/skandlab/MutSpot/Supplementary informationSupplementary data are available at https://github.com/skandlab/MutSpot/

Download Full-text

ampir: an R package for fast genome-wide prediction of antimicrobial peptides

10.1101/2020.05.07.082412 ◽

2020 ◽

Author(s):

Legana C.H.W Fingerhut ◽

David J. Miller ◽

Jan M. Strugnell ◽

Norelle L. Daly ◽

Ira R. Cooke

Keyword(s):

Antimicrobial Peptides ◽

Pharmaceutical Research ◽

R Package ◽

Supplementary Information ◽

Link Type ◽

Genome Data ◽

Classification Framework ◽

Genome Wide ◽

Genes Encoding ◽

Feature Calculation

AbstractSummaryAntimicrobial peptides (AMPs) are key components of the innate immune system that protect against pathogens, regulate the microbiome, and are promising targets for pharmaceutical research. Computational tools based on machine learning have the potential to aid discovery of genes encoding novel AMPs but existing approaches are not designed for genome-wide scans. To facilitate such genome-wide discovery of AMPs we developed a fast and accurate AMP classification framework, ampir. ampir is designed for high throughput, integrates well with existing bioinformatics pipelines, and has much higher classification accuracy than existing methods when applied to whole genome data.Availability and Implementationampir is implemented primarily in R with core feature calculation methods written in C++. Release versions are available via CRAN and work on all major operating systems. The development version is maintained at https://github.com/legana/[email protected]; [email protected] informationSupplementary data are available at https://github.com/legana/amp_pub

Download Full-text

hypeR: An R Package for Geneset Enrichment Workflows

10.1101/656637 ◽

2019 ◽

Cited By ~ 1

Author(s):

Anthony Federico ◽

Stefano Monti

Keyword(s):

High Throughput Sequencing ◽

R Package ◽

Supplementary Information ◽

Sequencing Data ◽

Wide Audience ◽

Popular Method ◽

Link Type ◽

High Throughput Sequencing Data ◽

One Stop ◽

Recent Version

ABSTRACTSummaryGeneset enrichment is a popular method for annotating high-throughput sequencing data. Existing tools fall short in providing the flexibility to tackle the varied challenges researchers face in such analyses, particularly when analyzing many signatures across multiple experiments. We present a comprehensive R package for geneset enrichment workflows that offers multiple enrichment, visualization, and sharing methods in addition to novel features such as hierarchical geneset analysis and built-in markdown reporting. hypeR is a one-stop solution to performing geneset enrichment for a wide audience and range of use cases.Availability and implementationThe most recent version of the package is available at https://github.com/montilab/hypeR.Supplementary informationComprehensive documentation and tutorials, are available at https://montilab.github.io/hypeR-docs.

Download Full-text

CASAS: Cancer Survival Analysis Suite, a web based application

F1000Research ◽

10.12688/f1000research.11830.1 ◽

2017 ◽

Vol 6 ◽

pp. 919 ◽

Cited By ~ 6

Author(s):

Manali Rupji ◽

Xinyan Zhang ◽

Jeanne Kowalski

Keyword(s):

Survival Analysis ◽

Cancer Survival ◽

Univariate Analysis ◽

Multiple Cancer ◽

Web Based ◽

Survival Analyses ◽

Link Type ◽

Kaplan Meier ◽

Hazard Ratios ◽

One Stop

We present CASAS, a shiny R based tool for interactive survival analysis and visualization of results. The tool provides a web-based one stop shop to perform the following types of survival analysis: quantile, landmark and competing risks, in addition to standard survival analysis. The interface makes it easy to perform such survival analyses and obtain results using the interactive Kaplan-Meier and cumulative incidence plots. Univariate analysis can be performed on one or several user specified variable(s) simultaneously, the results of which are displayed in a single table that includes log rank p-values and hazard ratios along with their significance. For several quantile survival analyses from multiple cancer types, a single summary grid is constructed. The CASAS package has been implemented in R and is available via http://shinygispa.winship.emory.edu/CASAS/. The developmental repository is available at https://github.com/manalirupji/CASAS/.

Download Full-text

bGWAS: an R package to perform Bayesian genome wide association studies

Bioinformatics ◽

10.1093/bioinformatics/btaa549 ◽

2020 ◽

Vol 36 (15) ◽

pp. 4374-4376

Author(s):

Ninon Mounier ◽

Zoltán Kutalik

Keyword(s):

Mendelian Randomization ◽

Causal Effect ◽

Association Studies ◽

R Package ◽

Genome Wide Association ◽

Supplementary Information ◽

Genome Wide Association Studies ◽

Biological Mechanisms ◽

Genome Wide ◽

Related Risk

Abstract Summary Increasing sample size is not the only strategy to improve discovery in Genome Wide Association Studies (GWASs) and we propose here an approach that leverages published studies of related traits to improve inference. Our Bayesian GWAS method derives informative prior effects by leveraging GWASs of related risk factors and their causal effect estimates on the focal trait using multivariable Mendelian randomization. These prior effects are combined with the observed effects to yield Bayes Factors, posterior and direct effects. The approach not only increases power, but also has the potential to dissect direct and indirect biological mechanisms. Availability and implementation bGWAS package is freely available under a GPL-2 License, and can be accessed, alongside with user guides and tutorials, from https://github.com/n-mounier/bGWAS. Supplementary information Supplementary data are available at Bioinformatics online.

Download Full-text

Qtlizer: comprehensive QTL annotation of GWAS results

Scientific Reports ◽

10.1038/s41598-020-75770-7 ◽

2020 ◽

Vol 10 (1) ◽

Author(s):

Matthias Munz ◽

Inken Wohlers ◽

Eric Simon ◽

Tobias Reinberger ◽

Hauke Busch ◽

...

Keyword(s):

Association Studies ◽

Housekeeping Genes ◽

R Package ◽

Genome Wide Association Studies ◽

Protein Abundance ◽

Base Pairs ◽

Link Type ◽

Genome Wide ◽

Wide Range ◽

Distance Limit

AbstractExploration of genetic variant-to-gene relationships by quantitative trait loci such as expression QTLs is a frequently used tool in genome-wide association studies. However, the wide range of public QTL databases and the lack of batch annotation features complicate a comprehensive annotation of GWAS results. In this work, we introduce the tool “Qtlizer” for annotating lists of variants in human with associated changes in gene expression and protein abundance using an integrated database of published QTLs. Features include incorporation of variants in linkage disequilibrium and reverse search by gene names. Analyzing the database for base pair distances between best significant eQTLs and their affected genes suggests that the commonly used cis-distance limit of 1,000,000 base pairs might be too restrictive, implicating a substantial amount of wrongly and yet undetected eQTLs. We also ranked genes with respect to the maximum number of tissue-specific eQTL studies in which a most significant eQTL signal was consistent. For the top 100 genes we observed the strongest enrichment with housekeeping genes (P = 2 × 10–6) and with the 10% highest expressed genes (P = 0.005) after grouping eQTLs by r2 > 0.95, underlining the relevance of LD information in eQTL analyses. Qtlizer can be accessed via https://genehopper.de/qtlizer or by using the respective Bioconductor R-package (https://doi.org/10.18129/B9.bioc.Qtlizer).

Download Full-text

Multi-SNP mediation intersection-union test

Bioinformatics ◽

10.1093/bioinformatics/btz285 ◽

2019 ◽

Vol 35 (22) ◽

pp. 4724-4729 ◽

Cited By ~ 4

Author(s):

Wujuan Zhong ◽

Cassandra N Spracklen ◽

Karen L Mohlke ◽

Xiaojing Zheng ◽

Jason Fine ◽

...

Keyword(s):

Association Studies ◽

R Package ◽

Alternative Methods ◽

Supplementary Information ◽

Genome Wide Association Studies ◽

Mediation Effects ◽

Coding Regions ◽

Genome Wide ◽

Plasma Adiponectin Level ◽

Intersection Union Test

Abstract Summary Tens of thousands of reproducibly identified GWAS (Genome-Wide Association Studies) variants, with the vast majority falling in non-coding regions resulting in no eventual protein products, call urgently for mechanistic interpretations. Although numerous methods exist, there are few, if any methods, for simultaneously testing the mediation effects of multiple correlated SNPs via some mediator (e.g. the expression of a gene in the neighborhood) on phenotypic outcome. We propose multi-SNP mediation intersection-union test (SMUT) to fill in this methodological gap. Our extensive simulations demonstrate the validity of SMUT as well as substantial, up to 92%, power gains over alternative methods. In addition, SMUT confirmed known mediators in a real dataset of Finns for plasma adiponectin level, which were missed by many alternative methods. We believe SMUT will become a useful tool to generate mechanistic hypotheses underlying GWAS variants, facilitating functional follow-up. Availability and implementation The R package SMUT is publicly available from CRAN at https://CRAN.R-project.org/package=SMUT. Supplementary information Supplementary data are available at Bioinformatics online.

Download Full-text

blupADC: An R package and shiny toolkit for comprehensive genetic data analysis in animal and plant breeding

10.1101/2021.09.09.459557 ◽

2021 ◽

Author(s):

Quanshun Mei ◽

Chuanke Fu ◽

Jieling Li ◽

Shuhong Zhao ◽

Tao Xiang

Keyword(s):

Genetic Analysis ◽

Plant Breeding ◽

Genomic Data ◽

R Package ◽

Genotype Imputation ◽

Supplementary Information ◽

Composition Analysis ◽

Relationship Matrix ◽

Link Type ◽

Plant Breeding Program

AbstractSummaryGenetic analysis is a systematic and complex procedure in animal and plant breeding. With fast development of high-throughput genotyping techniques and algorithms, animal and plant breeding has entered into a genomic era. However, there is a lack of software, which can be used to process comprehensive genetic analyses, in the routine animal and plant breeding program. To make the whole genetic analysis in animal and plant breeding straightforward, we developed a powerful, robust and fast R package that includes genomic data format conversion, genomic data quality control and genotype imputation, breed composition analysis, pedigree tracing, analysis and visualization, pedigree-based and genomic-based relationship matrix construction, and genomic evaluation. In addition, to simplify the application of this package, we also developed a shiny toolkit for users.Availability and implementationblupADC is developed primarily in R with core functions written in C++. The development version is maintained at https://github.com/TXiang-lab/blupADC.Supplementary informationSupplementary data are available online

Download Full-text

pyseer: a comprehensive tool for microbial pangenome-wide association studies

10.1101/266312 ◽

2018 ◽

Cited By ~ 1

Author(s):

John A Lees ◽

Marco Galardini ◽

Stephen D Bentley ◽

Jeffrey N Weiser ◽

Jukka Corander

Keyword(s):

Input Data ◽

Association Studies ◽

Genome Wide Association ◽

Supplementary Information ◽

Genome Wide Association Studies ◽

Supplementary Data ◽

New Methods ◽

Link Type ◽

Genome Wide

AbstractSummaryGenome-wide association studies (GWAS) in microbes face different challenges to eukaryotes and have been addressed by a number of different methods. pyseer brings these techniques together in one package tailored to microbial GWAS, allows greater flexibility of the input data used, and adds new methods to interpret the association results.Availability and Implementationpyseer is written in python and is freely available at https://github.com/mgalardini/pyseer, or can be installed through pip. Documentation and a tutorial are available at http://[email protected] and [email protected] informationSupplementary data are available online.

Download Full-text

VarGen: an R package for disease-associated variant discovery and annotation

Bioinformatics ◽

10.1093/bioinformatics/btz930 ◽

2019 ◽

Vol 36 (8) ◽

pp. 2626-2627

Author(s):

Corentin Molitor ◽

Matt Brember ◽

Fady Mohareb

Keyword(s):

Association Studies ◽

Genetic Disorders ◽

R Package ◽

Tissue Expression ◽

Mendelian Inheritance ◽

Supplementary Information ◽

Genome Wide Association Studies ◽

Variant Discovery ◽

Genome Wide ◽

High Quality Information

Abstract Summary Over the past decade, there has been an exponential increase in the amount of disease-related genomic data available in public databases. However, this high-quality information is spread across independent sources and researchers often need to access these separately. Hence, there is a growing need for tools that gather and compile this information in an easy and automated manner. Here, we present ‘VarGen’, an easy-to-use, customizable R package that fetches, annotates and rank variants related to diseases and genetic disorders, using a collection public databases (viz. Online Mendelian Inheritance in Man, the Functional Annotation of the Mammalian genome 5, the Genotype-Tissue Expression and the Genome Wide Association Studies catalog). This package is also capable of annotating these variants to identify the most impactful ones. We expect that this tool will benefit the research of variant-disease relationships. Availability and implementation VarGen is open-source and freely available via GitHub: https://github.com/MCorentin/VarGen. The software is implemented as an R package and is supported on Linux, MacOS and Windows. Supplementary information Supplementary data are available at Bioinformatics online.

Download Full-text