blupADC: An R package and shiny toolkit for comprehensive genetic data analysis in animal and plant breeding

2021
Author(s): Quanshun Mei, Chuanke Fu, Jieling Li, Shuhong Zhao, Tao Xiang

Abstract
Summary: Genetic analysis is a systematic and complex procedure in animal and plant breeding. With the rapid development of high-throughput genotyping techniques and algorithms, animal and plant breeding has entered the genomic era. However, routine animal and plant breeding programs lack software that can carry out comprehensive genetic analyses. To make the whole genetic analysis workflow in animal and plant breeding straightforward, we developed a powerful, robust and fast R package that covers genomic data format conversion, genomic data quality control and genotype imputation, breed composition analysis, pedigree tracing, analysis and visualization, construction of pedigree-based and genomic-based relationship matrices, and genomic evaluation. In addition, to simplify the use of this package, we developed a Shiny toolkit for users.
Availability and implementation: blupADC is developed primarily in R, with core functions written in C++. The development version is maintained at https://github.com/TXiang-lab/blupADC.
Supplementary information: Supplementary data are available online.
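
blupADC lists pedigree-based relationship matrix construction among its features. To make concrete what that step computes, the sketch below implements the classic tabular method for the additive (numerator) relationship matrix A in plain Python. It is an independent illustration, not blupADC's R/C++ code; the function name `numerator_relationship_matrix` and the input format (0-based parent indices, `None` for an unknown parent, parents listed before their offspring) are assumptions.

```python
def numerator_relationship_matrix(pedigree):
    """Additive relationship matrix A by the tabular method.

    pedigree: list of (sire, dam) tuples, one per animal, where each parent is
    the 0-based index of an earlier animal or None if unknown (hypothetical
    input format). Parents must appear before their offspring.
    """
    n = len(pedigree)
    A = [[0.0] * n for _ in range(n)]
    for i, (s, d) in enumerate(pedigree):
        # Diagonal: 1 + F_i, where the inbreeding coefficient F_i is half the
        # relationship between the two parents (0 if either is unknown).
        both_known = s is not None and d is not None
        A[i][i] = 1.0 + (0.5 * A[s][d] if both_known else 0.0)
        for j in range(i):
            # Off-diagonal: average relationship of animal j with i's parents.
            a_js = A[j][s] if s is not None else 0.0
            a_jd = A[j][d] if d is not None else 0.0
            A[i][j] = A[j][i] = 0.5 * (a_js + a_jd)
    return A


# Usage: two unrelated founders (0, 1), their offspring 2, and animal 3 from
# mating founder 0 with its own offspring 2 (inbreeding coefficient 0.25).
A = numerator_relationship_matrix([(None, None), (None, None), (0, 1), (0, 2)])
print(A[3][3])  # 1.25
```

Processing animals in pedigree order guarantees that every relationship needed for row i has already been filled in, which is why the method requires parents to precede offspring.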

2019
Author(s): Anthony Federico, Stefano Monti

Abstract
Summary: Geneset enrichment is a popular method for annotating high-throughput sequencing data. Existing tools fall short in providing the flexibility to tackle the varied challenges researchers face in such analyses, particularly when analyzing many signatures across multiple experiments. We present hypeR, a comprehensive R package for geneset enrichment workflows that offers multiple enrichment, visualization and sharing methods, in addition to novel features such as hierarchical geneset analysis and built-in markdown reporting. hypeR is a one-stop solution for performing geneset enrichment for a wide audience and range of use cases.
Availability and implementation: The most recent version of the package is available at https://github.com/montilab/hypeR.
Supplementary information: Comprehensive documentation and tutorials are available at https://montilab.github.io/hypeR-docs.
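
hypeR offers several enrichment methods; a hypergeometric over-representation test is one common choice. The following is a minimal sketch of such a test in plain Python, given here for illustration only: it is not hypeR's implementation, and the function name `hypergeometric_enrichment` and its set-based inputs are assumptions.

```python
from math import comb


def hypergeometric_enrichment(signature, geneset, background):
    """Upper-tail hypergeometric p-value for over-representation of `geneset`
    among the `signature` genes, relative to a `background` gene universe."""
    background = set(background)
    signature = set(signature) & background  # ignore genes outside the universe
    geneset = set(geneset) & background
    N = len(background)            # population size
    K = len(geneset)               # genes in the geneset ("successes")
    n = len(signature)             # genes drawn (the signature)
    k = len(signature & geneset)   # observed overlap
    # P(X >= k) for X ~ Hypergeometric(N, K, n); math.comb returns 0 when the
    # second argument exceeds the first, so impossible terms vanish.
    upper = sum(comb(K, x) * comb(N - K, n - x) for x in range(k, min(K, n) + 1))
    return upper / comb(N, n)


# Usage: 20 background genes, a 5-gene set, a 4-gene signature overlapping it in 3.
background = [f"g{i}" for i in range(20)]
geneset = ["g0", "g1", "g2", "g3", "g4"]
p = hypergeometric_enrichment(["g0", "g1", "g2", "g10"], geneset, background)
print(p)  # 155/4845, about 0.032
```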


2018
Author(s): Abbas A Rizvi, Ezgi Karaesmen, Martin Morgan, Leah Preus, Junke Wang, ...

Abstract
Summary: To address the limited software options for performing survival analyses with millions of SNPs, we developed gwasurvivr, an R/Bioconductor package with a simple interface for conducting genome-wide survival analyses using VCF files (output by the Michigan or Sanger imputation servers), IMPUTE2 files or PLINK files. To decrease the number of iterations needed for convergence when optimizing the parameter estimates in the Cox model, we modified the R package survival: the covariates in the model are first fit without the SNP, and those parameter estimates are used as initial points. We benchmarked gwasurvivr against other software capable of conducting genome-wide survival analysis (genipe, SurvivalGWAS_SV and GWASTools). gwasurvivr is significantly faster and shows better scalability as sample size, number of SNPs and number of covariates increase.
Availability and implementation: gwasurvivr, including source code, documentation and vignette, is available at http://bioconductor.org/packages/gwasurvivr.
Contact: Abbas Rizvi; Lara E Sucheston-Campbell.
Supplementary information: Supplementary data are available at https://github.com/suchestoncampbelllab/gwasurvivr_manuscript.
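
The warm-start strategy described above can be sketched as follows: fit a Cox model with covariates only, then start the covariates-plus-SNP fit from those estimates with the SNP coefficient at 0. This is an independent pure-Python illustration, not gwasurvivr's modified survival code. It assumes no tied event times (so the Breslow partial likelihood needs no tie correction), uses Newton-Raphson with step-halving, and all function names are hypothetical.

```python
import math


def cox_fit_stats(beta, X, times, events):
    """Partial log-likelihood, score vector and observed information matrix
    of a Cox model (Breslow form; assumes no tied event times)."""
    p = len(beta)
    order = sorted(range(len(times)), key=lambda i: times[i])
    ll, score = 0.0, [0.0] * p
    info = [[0.0] * p for _ in range(p)]
    for pos, i in enumerate(order):
        if not events[i]:
            continue
        risk = order[pos:]  # subjects still at risk at times[i]
        w = [math.exp(sum(b * x for b, x in zip(beta, X[j]))) for j in risk]
        s0 = sum(w)
        mean = [sum(wj * X[j][a] for wj, j in zip(w, risk)) / s0 for a in range(p)]
        ll += sum(b * x for b, x in zip(beta, X[i])) - math.log(s0)
        for a in range(p):
            score[a] += X[i][a] - mean[a]
            for c in range(p):
                s2 = sum(wj * X[j][a] * X[j][c] for wj, j in zip(w, risk)) / s0
                info[a][c] += s2 - mean[a] * mean[c]
    return ll, score, info


def solve(M, v):
    """Solve M x = v by Gaussian elimination with partial pivoting."""
    n = len(v)
    A = [list(row) + [v[r]] for r, row in enumerate(M)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(A[r][col]))
        A[col], A[piv] = A[piv], A[col]
        for r in range(col + 1, n):
            f = A[r][col] / A[col][col]
            for c in range(col, n + 1):
                A[r][c] -= f * A[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (A[r][n] - sum(A[r][c] * x[c] for c in range(r + 1, n))) / A[r][r]
    return x


def fit_cox(X, times, events, beta0, tol=1e-8, max_iter=50):
    """Newton-Raphson from beta0; returns (estimates, iterations used)."""
    beta = list(beta0)
    ll, score, info = cox_fit_stats(beta, X, times, events)
    for it in range(1, max_iter + 1):
        step = solve(info, score)
        for _ in range(30):  # halve the step until the likelihood does not drop
            new = [b + s for b, s in zip(beta, step)]
            new_ll, new_score, new_info = cox_fit_stats(new, X, times, events)
            if new_ll >= ll:
                break
            step = [s / 2 for s in step]
        beta, ll, score, info = new, new_ll, new_score, new_info
        if max(abs(s) for s in step) < tol:
            return beta, it
    raise RuntimeError("Newton-Raphson did not converge")


def fit_snp_model(covariates, snp, times, events, beta_cov=None):
    """Covariates + SNP model, warm-started at the covariate-only estimates."""
    if beta_cov is None:  # in a GWAS this fit is done once and reused per SNP
        beta_cov, _ = fit_cox(covariates, times, events, [0.0] * len(covariates[0]))
    X = [list(row) + [g] for row, g in zip(covariates, snp)]
    return fit_cox(X, times, events, list(beta_cov) + [0.0])
```

Because the covariate effects usually change little when one SNP is added, the warm start typically begins close to the optimum; the covariate-only fit is computed once and reused, so its cost is paid only once per analysis.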


Author(s): Grégoire Versmée, Laura Versmée, Mikaël Dusenne, Niloofar Jalali, Paul Avillach

Abstract
Summary: Under the Genomic Data Sharing Policy issued in August 2007, the National Institutes of Health (NIH) has supported several repositories, such as the database of Genotypes and Phenotypes (dbGaP). dbGaP is an online repository that provides access to large-scale genetic and phenotypic datasets from more than 1,000 studies. However, navigating the website and understanding the relationships between studies are not easy tasks, and decrypting the files is a complex procedure. In this study, we propose dbgap2x, an R package that covers a broad range of functions for searching dbGaP studies, exploring the characteristics of a study and easily decrypting files from dbGaP.
Availability and implementation: dbgap2x is an R package with code available at https://github.com/gversmee/dbgap2x. A containerized version, including the package, a Jupyter server and an example Notebook, is available at https://hub.docker.com/r/gversmee/dbgap2x.
Supplementary information: Supplementary data are available at Bioinformatics online.


2017
Author(s): Florian Privé, Hugues Aschard, Michael G.B. Blum

Abstract
Motivation: Genome-wide datasets produced for association studies have dramatically increased in size over the past few years, with modern datasets commonly including millions of variants measured in tens of thousands of individuals. This increase in data size is a major challenge that severely slows down genomic analyses. Specialized software has been developed for every part of the analysis pipeline to handle large genomic data. However, combining all these tools into a single data analysis pipeline might be technically difficult.
Results: Here we present two R packages, bigstatsr and bigsnpr, that allow large-scale genomic data to be managed and analyzed within a single comprehensive framework. To address large data size, the packages use memory-mapping to access data matrices stored on disk instead of in RAM. For data pre-processing and data analysis, the packages integrate most of the commonly used tools, either through transparent system calls to existing software or through updated or improved implementations of existing methods. In particular, the packages implement a fast derivation of Principal Component Analysis, functions to remove SNPs in Linkage Disequilibrium, and algorithms to learn Polygenic Risk Scores on millions of SNPs. We illustrate applications of the two R packages by analyzing a case-control genomic dataset for celiac disease, performing an association study and computing Polygenic Risk Scores. Finally, we demonstrate the scalability of the R packages by analyzing a simulated genome-wide dataset including 500,000 individuals and 1 million markers on a single desktop computer.
Availability: https://privefl.github.io/bigstatsr/ and https://privefl.github.io/bigsnpr/
Supplementary information: Supplementary data are available at Bioinformatics online.
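
The memory-mapping idea can be illustrated with Python's standard `mmap` module. This is a minimal sketch, not the bigstatsr file-backed matrix format: the one-byte-per-genotype, column-major file layout and the helper names `write_genotypes` and `allele_frequencies` are assumptions made for illustration.

```python
import mmap
import os
import tempfile


def write_genotypes(path, columns):
    """Write a genotype matrix to disk one SNP column at a time (column-major),
    one unsigned byte per genotype (0, 1 or 2 copies of the alternate allele).
    This file layout is a hypothetical example format."""
    with open(path, "wb") as f:
        for col in columns:
            f.write(bytes(col))


def allele_frequencies(path, n_individuals):
    """Alternate-allele frequency of every SNP. The file is memory-mapped, so
    the operating system pages data in on demand and only one SNP column at a
    time is copied into Python, instead of reading the whole matrix into RAM."""
    with open(path, "rb") as f, mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
        n_snps = len(mm) // n_individuals
        freqs = []
        for j in range(n_snps):
            column = mm[j * n_individuals:(j + 1) * n_individuals]  # bytes of SNP j
            freqs.append(sum(column) / (2 * n_individuals))
        return freqs


# Usage: 3 SNPs x 4 individuals stored in a temporary file.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "genotypes.bin")
    write_genotypes(path, [[0, 1, 2, 1], [2, 2, 2, 2], [0, 0, 0, 1]])
    freqs = allele_frequencies(path, n_individuals=4)
print(freqs)  # [0.5, 1.0, 0.125]
```

Storing columns contiguously makes per-SNP computations read one contiguous block of the file, which suits column-wise statistics such as allele frequencies.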


2020
Author(s): Theodore G. Drivas, Anastasia Lucas, Marylyn D. Ritchie

Summary: Genomic studies increasingly integrate expression quantitative trait loci (eQTL) information into their analysis pipelines, but few tools exist for visualizing colocalization between eQTL and GWAS results. To address this issue, we developed the intuitive R package eQTpLot, which takes GWAS and eQTL summary statistics as input and generates a series of plots visualizing colocalization, correlation and enrichment between eQTL and GWAS signals for a given gene-trait pair. We believe eQTpLot will prove a useful tool for investigators seeking a convenient and customizable visualization of genomic data colocalization.
Availability and implementation: The eQTpLot R package and tutorial are available at https://github.com/RitchieLab/
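
One simple way to quantify the correlation between eQTL and GWAS signals is a Pearson correlation of -log10(P) over the variants the two summary-statistics files share. The sketch below shows that computation in plain Python; eQTpLot's own correlation measure may differ, and the function name and the dictionary input format (variant ID mapped to P value) are assumptions.

```python
import math


def signal_correlation(gwas_p, eqtl_p):
    """Pearson correlation of -log10(P) between GWAS and eQTL results,
    computed over variants present in both dicts (variant ID -> P value)."""
    shared = sorted(set(gwas_p) & set(eqtl_p))
    if len(shared) < 3:
        raise ValueError("need at least three shared variants")
    x = [-math.log10(gwas_p[v]) for v in shared]
    y = [-math.log10(eqtl_p[v]) for v in shared]
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Usage: rs4 and rs5 are present in only one dataset and are ignored.
gwas = {"rs1": 1e-8, "rs2": 1e-4, "rs3": 1e-2, "rs4": 0.5}
eqtl = {"rs1": 1e-16, "rs2": 1e-8, "rs3": 1e-4, "rs5": 0.1}
r = signal_correlation(gwas, eqtl)  # close to 1: strong GWAS hits are strong eQTLs
```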


2018
Author(s): Hong-Dong Li, Yunpei Xu, Xiaoshu Zhu, Quan Liu, Gilbert S. Omenn, ...

Abstract
Motivation: Clustering analysis is essential for understanding complex biological data. In widely used methods such as hierarchical clustering (HC) and consensus clustering (CC), the expression profiles of all genes are often used to assess similarity between samples. These methods output sample clusters but cannot indicate which gene sets (functions) contribute most to the clustering, which limits the interpretability of their results. We hypothesized that integrating prior knowledge of annotated biological processes would not only achieve satisfactory clustering performance but also, more importantly, enable biological interpretation of the clusters.
Results: Here we report ClusterMine, a novel approach that identifies clusters by assessing functional similarity between samples through integrating known annotated gene sets, e.g., from Gene Ontology. In addition to outputting the cluster membership of each sample, as conventional approaches do, it outputs the gene sets that are most likely to contribute to the clustering, a feature that facilitates biological interpretation. Using three cancer datasets, two single-cell RNA-sequencing-based cell differentiation datasets, one cell cycle dataset and two datasets of cells of different tissue origins, we found that ClusterMine achieved similar or better clustering performance and that the top-scored gene sets prioritized by ClusterMine are biologically relevant.
Implementation and availability: ClusterMine is implemented as an R package and is freely available at www.genemine.org/
Supplementary information: Supplementary data are available at Bioinformatics online.
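
The idea of measuring functional rather than gene-level similarity can be sketched by first summarizing each sample as a profile of gene-set scores and then correlating those profiles. This is an illustration of the general idea only, not ClusterMine's scoring or clustering algorithm; scoring a gene set by the mean expression of its members, and the function names, are assumptions.

```python
import math


def geneset_scores(expression, genesets):
    """Score every gene set in every sample as the mean expression of its
    member genes found in `expression` ({gene: [value per sample]})."""
    n_samples = len(next(iter(expression.values())))
    scores = {}
    for name, genes in genesets.items():
        present = [g for g in genes if g in expression]
        if not present:  # skip gene sets with no measured genes
            continue
        scores[name] = [sum(expression[g][s] for g in present) / len(present)
                        for s in range(n_samples)]
    return scores


def pearson(a, b):
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    sab = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    saa = sum((x - ma) ** 2 for x in a)
    sbb = sum((y - mb) ** 2 for y in b)
    return sab / math.sqrt(saa * sbb)


def functional_similarity(expression, genesets):
    """Sample-by-sample Pearson correlation of gene-set score profiles."""
    scores = geneset_scores(expression, genesets)
    names = sorted(scores)
    n_samples = len(scores[names[0]])
    profiles = [[scores[k][s] for k in names] for s in range(n_samples)]
    return [[pearson(p, q) for q in profiles] for p in profiles]


# Usage: 4 genes x 3 samples; "set4" has no measured genes and is skipped.
expression = {"A": [1, 2, 5], "B": [1, 2, 5], "C": [5, 10, 1], "D": [5, 10, 1]}
genesets = {"set1": ["A", "B"], "set2": ["C", "D"], "set3": ["A", "C"], "set4": ["Z"]}
sim = functional_similarity(expression, genesets)
```

The resulting matrix could feed any standard clustering method; because each coordinate of a profile is a named gene set, the sets that differ most between clusters remain directly inspectable.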


2018
Author(s): Josie Hayes, William M. B. Edmands, Yukiko Yano, Hasmik Grigoryan, Courtney Schiffman, ...

Abstract
Summary: Liquid chromatography-high resolution mass spectrometry (LC-HRMS) has been used to establish a method, referred to as ‘adductomics’, for characterisation of putative protein adducts at selected loci in human serum albumin (HSA). Applications of this method have been limited by the lack of software for untargeted analysis of modified peptides in protein digests. Here we present adductomicsR, an open-source R package for processing LC-HRMS data from the analysis of adducted HSA peptides. The software interrogates mass spectra to correct for retention-time drift and to discover and quantify putative adducts, as well as a housekeeping peptide and an internal standard.
Availability and implementation: adductomicsR is written in R and is publicly available at https://github.com/JosieLHayes/adductomicsR, which includes a vignette with example data.
Supplementary information: mzXML files for the vignette and test dataset are available in an associated data package, adductData (https://github.com/JosieLHayes/adductData).
Section: Applications Note
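
A simple form of retention-time drift correction shifts every feature in a run by the offset observed for an internal standard in that run. The sketch below illustrates that constant-offset model in plain Python; adductomicsR's actual drift correction may use a different model, and the function name and nested-dictionary input format are assumptions.

```python
def correct_rt_drift(runs, standard, reference_rt):
    """Shift every feature's retention time in each run by the offset of the
    internal standard from its reference retention time (constant-offset
    drift model). runs: {run_id: {feature: retention time in minutes}}."""
    corrected = {}
    for run_id, features in runs.items():
        offset = features[standard] - reference_rt
        corrected[run_id] = {f: rt - offset for f, rt in features.items()}
    return corrected


# Usage: the internal standard "IS" elutes late in run r1 and early in run r2.
runs = {"r1": {"IS": 10.5, "pepA": 20.5}, "r2": {"IS": 9.75, "pepA": 19.75}}
corrected = correct_rt_drift(runs, standard="IS", reference_rt=10.0)
print(corrected["r1"]["pepA"], corrected["r2"]["pepA"])  # 20.0 20.0
```

A single offset assumes the drift is uniform across the gradient; drift that varies with elution time would need several reference compounds and an interpolated correction.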


2017
Author(s): Shannon E. Ellis, Leonardo Collado-Torres, Jeffrey T. Leek

Abstract
Background: Publicly available genomic data are a valuable resource for studying normal human variation and disease, but these data are often not well labeled or annotated. The lack of phenotype information for public genomic data severely limits their utility for addressing targeted biological questions.
Results: We develop an in silico phenotyping approach for predicting critical missing annotation directly from genomic measurements, using well-annotated genomic and phenotypic data produced by consortia like TCGA and GTEx as training data. We apply in silico phenotyping to a set of 70,000 RNA-seq samples that we recently processed on a common pipeline as part of the recount2 project (https://jhubiostatistics.shinyapps.io/recount/). We use gene expression data to build and evaluate predictors for both biological phenotypes (sex, tissue, sample source) and experimental conditions (sequencing strategy). We demonstrate how these predictions can be used to study cross-sample properties of public genomic data, select genomic projects with specific characteristics, and perform downstream analyses using predicted phenotypes. The methods to perform phenotype prediction are available in the phenopredict R package (https://github.com/leekgroup/phenopredict), and the predictions for recount2 are available from the recount R package (https://bioconductor.org/packages/release/bioc/html/recount.html).
Conclusion: By leveraging massive public data sets to generate a well-phenotyped set of expression data for more than 70,000 human samples, we make expression data available for use on a scale that was not previously feasible.
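
The core idea of in silico phenotyping (train on well-annotated samples, then predict labels for unannotated ones from expression alone) can be illustrated with a nearest-centroid classifier. This is a deliberately simple stand-in, not necessarily the model phenopredict fits; the function names and the toy two-gene data are hypothetical.

```python
import math


def train_centroids(samples, labels):
    """Mean expression profile for each phenotype label in the training data."""
    sums, counts = {}, {}
    for x, y in zip(samples, labels):
        acc = sums.setdefault(y, [0.0] * len(x))
        for i, v in enumerate(x):
            acc[i] += v
        counts[y] = counts.get(y, 0) + 1
    return {y: [v / counts[y] for v in acc] for y, acc in sums.items()}


def predict_label(centroids, x):
    """Assign the label whose centroid is closest in Euclidean distance."""
    return min(centroids, key=lambda y: math.dist(centroids[y], x))


# Usage: hypothetical expression of two marker genes in four annotated samples.
train = [[9.0, 0.5], [8.0, 1.0], [0.5, 7.0], [1.0, 9.0]]
labels = ["female", "female", "male", "male"]
centroids = train_centroids(train, labels)
print(predict_label(centroids, [7.5, 0.8]))  # female
```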


2019
Author(s): Yu Amanda Guo, Mei Mei Chang, Anders Jacobsen Skanderup

Abstract
Summary: Recurrence and clustering of somatic mutations (hotspots) in cancer genomes may indicate positive selection and involvement in tumorigenesis. MutSpot performs genome-wide inference of mutation hotspots in non-coding and regulatory DNA of cancer genomes. MutSpot performs feature selection across hundreds of epigenetic and sequence features, followed by estimation of position- and patient-specific background somatic mutation probabilities. MutSpot is user-friendly, works on a standard workstation and scales to thousands of cancer genomes.
Availability and implementation: MutSpot is implemented as an R package and is available at https://github.com/skandlab/MutSpot/
Supplementary information: Supplementary data are available at https://github.com/skandlab/MutSpot/
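
One way position- and patient-specific background probabilities can be turned into a hotspot p-value: if patient i independently carries a mutation at a site with probability p_i, the number of mutated patients follows a Poisson-binomial distribution, whose upper tail can be computed exactly by dynamic programming. The sketch below illustrates this under that independence assumption; it is not MutSpot's actual test, and the function name is hypothetical.

```python
def poisson_binomial_upper_tail(probs, k):
    """P(X >= k), where X counts successes among independent Bernoulli trials
    with success probabilities `probs` (here: per-patient background
    probabilities that a given site is mutated)."""
    dist = [1.0]  # dist[m] = P(exactly m mutated patients among those seen so far)
    for p in probs:
        new = [0.0] * (len(dist) + 1)
        for m, q in enumerate(dist):
            new[m] += q * (1 - p)   # this patient is not mutated
            new[m + 1] += q * p     # this patient is mutated
        dist = new
    return sum(dist[k:])


# Usage: three patients, each with background probability 0.5, two observed mutated.
p_value = poisson_binomial_upper_tail([0.5, 0.5, 0.5], 2)
print(p_value)  # 0.5
```

The dynamic program costs O(n^2) for n patients, and unlike a plain binomial test it lets every patient carry a different background rate.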


2021
Author(s): Isaac Fink, Richard J. Abdill, Ran Blekhman, Laura Grieneisen

Abstract
Summary: A key aspect of microbiome research is the analysis of longitudinal dynamics using time series data. A method is needed to visualize both the proportional and the absolute change in the abundance of multiple taxa across multiple subjects over time. We developed BiomeHorizon, an open-source R package that visualizes longitudinal compositional microbiome data using horizon plots.
Availability and implementation: BiomeHorizon is available at https://github.com/blekhmanlab/biomehorizon/ and is released under the MIT license. A guide with step-by-step instructions for using the package is provided at https://blekhmanlab.github.io/biomehorizon/. The guide also provides code to reproduce all plots in this manuscript.
Supplementary information: None
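
Horizon plots save vertical space by slicing each time series into bands of equal height and overlaying the bands, with negative deviations drawn in a contrasting color. The band-splitting step can be sketched as follows. This is a generic illustration of the technique, not BiomeHorizon's code; how the package centers or scales abundances before banding is not shown, and the function name is hypothetical.

```python
def horizon_bands(value, band_width, n_bands):
    """Split one value into horizon-plot layers. Layer k holds the part of
    |value| lying between k*band_width and (k+1)*band_width; anything above
    n_bands*band_width is clipped. The returned sign selects the positive or
    negative color scheme."""
    magnitude = abs(value)
    layers = [min(max(magnitude - k * band_width, 0.0), band_width)
              for k in range(n_bands)]
    return (1 if value >= 0 else -1), layers


# Usage: a value of 2.5 with unit-height bands fills two bands and half of a third.
sign, layers = horizon_bands(2.5, 1.0, 3)
print(sign, layers)  # 1 [1.0, 1.0, 0.5]
```

Drawing every layer from the same baseline, with darker shades for higher layers, is what compresses a tall series into a single band-high strip.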

