decorate: differential epigenetic correlation test

2020 ◽  
Vol 36 (9) ◽  
pp. 2856-2861
Author(s):  
Gabriel E Hoffman ◽  
Jaroslav Bendl ◽  
Kiran Girdhar ◽  
Panos Roussos

Abstract Motivation Identifying correlated epigenetic features and finding differences in correlation between individuals with disease compared to controls can give novel insight into disease biology. This framework has been successful in analysis of gene expression data, but application to epigenetic data has been limited by the computational cost, lack of scalable software and lack of robust statistical tests. Results Decorate, differential epigenetic correlation test, identifies correlated epigenetic features and finds clusters of features that are differentially correlated between two or more subsets of the data. The software scales to genome-wide datasets of epigenetic assays on hundreds of individuals. We apply decorate to four large-scale datasets of DNA methylation, ATAC-seq and histone modification ChIP-seq. Availability and implementation The decorate R package is available from https://github.com/GabrielHoffman/decorate. Supplementary information Supplementary data are available at Bioinformatics online.
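
For orientation, the idea of comparing a correlation between two groups can be sketched with the classical Fisher z-transformation test for a single pair of features. This is a textbook test chosen for illustration, not decorate's own procedure, which the abstract does not specify. The function names `pearson` and `fisher_z_diff` are hypothetical, and the sketch is written in Python rather than R.

```python
import math
from statistics import NormalDist


def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def fisher_z_diff(r1, n1, r2, n2):
    """Two-sided test of H0: rho1 == rho2 for two independent groups.

    r1, r2 are sample correlations (strictly between -1 and 1);
    n1, n2 are the group sizes (each must exceed 3).
    Returns (z statistic, p-value).
    """
    if n1 <= 3 or n2 <= 3:
        raise ValueError("each group needs more than 3 samples")
    # Fisher transformation makes the sampling distribution roughly normal.
    z1 = math.atanh(r1)
    z2 = math.atanh(r2)
    se = math.sqrt(1.0 / (n1 - 3) + 1.0 / (n2 - 3))
    z = (z1 - z2) / se
    p = 2.0 * (1.0 - NormalDist().cdf(abs(z)))
    return z, p


# Hypothetical example: correlation of two features in cases vs. controls.
z_stat, p_value = fisher_z_diff(0.8, 103, 0.2, 103)
```

A genome-wide analysis would need to repeat such a comparison over many feature pairs and correct for multiple testing, which is where the computational cost mentioned in the abstract comes from.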

2019 ◽  
Vol 36 (3) ◽  
pp. 773-781 ◽  
Author(s):  
Hannah De los Santos ◽  
Emily J Collins ◽  
Catherine Mann ◽  
April W Sagan ◽  
Meaghan S Jankowski ◽  
...  

Abstract Motivation Time courses utilizing genome scale data are a common approach to identifying the biological pathways that are controlled by the circadian clock, an important regulator of organismal fitness. However, the methods used to detect circadian oscillations in these datasets are not able to accommodate changes in the amplitude of the oscillations over time, leading to an underestimation of the impact of the clock on biological systems. Results We have created a program to efficaciously identify oscillations in large-scale datasets, called the Extended Circadian Harmonic Oscillator application, or ECHO. ECHO utilizes an extended solution of the fixed amplitude oscillator that incorporates the amplitude change coefficient. Employing synthetic datasets, we determined that ECHO outperforms existing methods in detecting rhythms with decreasing oscillation amplitudes and in recovering phase shift. Rhythms with changing amplitudes identified from published biological datasets revealed distinct functions from those oscillations that were harmonic, suggesting purposeful biologic regulation to create this subtype of circadian rhythms. Availability and implementation ECHO’s full interface is available at https://github.com/delosh653/ECHO. An R package for this functionality, echo.find, can be downloaded at https://CRAN.R-project.org/package=echo.find. Supplementary information Supplementary data are available at Bioinformatics online.
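
A minimal sketch of what an oscillator with an amplitude change coefficient can look like, assuming the damped-cosine form x(t) = A·exp(-γt/2)·cos(ωt + φ) + y. The abstract does not give the equation, so this form, the function name and the parameter names are assumptions made for illustration (and the sketch is in Python, not ECHO's R code).

```python
import math


def extended_oscillator(t, amplitude, gamma, period, phase, offset):
    """Value at time t of a cosine whose amplitude changes over time.

    gamma is the assumed amplitude change coefficient:
      gamma == 0 -> fixed-amplitude (harmonic) oscillation
      gamma  > 0 -> amplitude decreases over time
      gamma  < 0 -> amplitude increases over time
    """
    omega = 2.0 * math.pi / period  # angular frequency
    envelope = amplitude * math.exp(-gamma * t / 2.0)
    return envelope * math.cos(omega * t + phase) + offset


# Hypothetical 48-hour time course sampled every 2 hours, 24-hour period.
times = [2.0 * i for i in range(25)]
harmonic = [extended_oscillator(t, 2.0, 0.0, 24.0, 0.0, 1.0) for t in times]
damped = [extended_oscillator(t, 2.0, 0.1, 24.0, 0.0, 1.0) for t in times]
```

Fitting such a model to data would estimate γ alongside amplitude, period and phase. A method that fixes γ at zero cannot represent the `damped` series above, which is the limitation the abstract describes.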


2019 ◽  
Author(s):  
Debajyoti Sinha ◽  
Pradyumn Sinha ◽  
Ritwik Saha ◽  
Sanghamitra Bandyopadhyay ◽  
Debarka Sengupta

Abstract Summary DropClust leverages Locality Sensitive Hashing (LSH) to speed up clustering of large-scale single-cell expression data. Here we present the improved dropClust, a complete R package that is fast, interoperable and minimally resource intensive. The new dropClust features a novel batch effect removal algorithm that allows integrative analysis of single-cell RNA-seq (scRNA-seq) datasets. Availability and implementation dropClust is freely available at https://github.com/debsin/dropClust as an R package. A lightweight online version of dropClust is available at https://debsinha.shinyapps.io/dropClust/. Supplementary information Supplementary data are available at Bioinformatics online.


2017 ◽  
Author(s):  
Patryk Orzechowski ◽  
Artur Pańszczyk ◽  
Xiuzhen Huang ◽  
Jason H. Moore

Abstract Motivation Biclustering (also called co-clustering) is an unsupervised technique for the simultaneous analysis of the rows and columns of an input matrix. Since its first application to gene expression data, multiple algorithms have been proposed. Only a handful of them have provided accurate results and been fast enough to be suitable for large-scale genomic datasets. Results In this paper we introduce a Bioconductor package with a parallel version of the UniBic biclustering algorithm, one of the most accurate biclustering methods developed so far. For convenience, we have wrapped the algorithm in an R package called runibic. The package includes: (1) a parallel version that is several times faster than the original sequential algorithm, (2) much more efficient memory management, (3) modularity, which allows new methods to be built on top of the provided one, and (4) integration with modern Bioconductor packages such as SummarizedExperiment, ExpressionSet and biclust. Availability The package is implemented in R (3.4) and will be available in the new release of Bioconductor (3.6). Currently it can be downloaded from http://github.com/athril/runibic/. Supplementary information Supplementary information is available in the vignette of the package.


2018 ◽  
Vol 35 (14) ◽  
pp. 2512-2514 ◽  
Author(s):  
Bongsong Kim ◽  
Xinbin Dai ◽  
Wenchao Zhang ◽  
Zhaohong Zhuang ◽  
Darlene L Sanchez ◽  
...  

Abstract Summary We present GWASpro, a high-performance web server for the analyses of large-scale genome-wide association studies (GWAS). GWASpro was developed to provide data analyses for large-scale molecular genetic data, coupled with complex replicated experimental designs such as found in plant science investigations and to overcome the steep learning curves of existing GWAS software tools. GWASpro supports building complex design matrices, by which complex experimental designs that may include replications, treatments, locations and times, can be accounted for in the linear mixed model. GWASpro is optimized to handle GWAS data that may consist of up to 10 million markers and 10 000 samples from replicable lines or hybrids. GWASpro provides an interface that significantly reduces the learning curve for new GWAS investigators. Availability and implementation GWASpro is freely available at https://bioinfo.noble.org/GWASPRO. Supplementary information Supplementary data are available at Bioinformatics online.


2020 ◽  
Vol 36 (15) ◽  
pp. 4374-4376
Author(s):  
Ninon Mounier ◽  
Zoltán Kutalik

Abstract Summary Increasing sample size is not the only strategy to improve discovery in Genome Wide Association Studies (GWASs), and we propose here an approach that leverages published studies of related traits to improve inference. Our Bayesian GWAS method derives informative prior effects by leveraging GWASs of related risk factors and their causal effect estimates on the focal trait using multivariable Mendelian randomization. These prior effects are combined with the observed effects to yield Bayes Factors, posterior and direct effects. The approach not only increases power, but also has the potential to dissect direct and indirect biological mechanisms. Availability and implementation The bGWAS package is freely available under a GPL-2 License, and can be accessed, along with user guides and tutorials, from https://github.com/n-mounier/bGWAS. Supplementary information Supplementary data are available at Bioinformatics online.


2019 ◽  
Vol 11 (8) ◽  
pp. 2078-2098 ◽  
Author(s):  
Shu-Ye Jiang ◽  
Jingjing Jin ◽  
Rajani Sarojam ◽  
Srinivasan Ramachandran

Abstract Terpenes are organic compounds and play important roles in plant growth and development as well as in mediating interactions of plants with the environment. Terpene synthases (TPSs) are the key enzymes responsible for the biosynthesis of terpenes. Although the TPS family has been identified and characterized genome-wide in some species, limited information is available regarding the evolution, expansion, and retention mechanisms occurring in this gene family. We performed a genome-wide identification of the TPS family members in 50 sequenced genomes. We also characterized the TPS family from aromatic spearmint and basil plants using RNA-Seq data. No TPSs were identified in algae genomes but the remaining plant species encoded various numbers of the family members ranging from 2 to 79 full-length TPSs. Some species showed lineage-specific expansion of certain subfamilies, which might have contributed toward species or ecotype divergence or environmental adaptation. A large-scale family expansion was observed mainly in dicot and monocot plants, which was accompanied by frequent domain loss. Both tandem and segmental duplication significantly contributed toward family expansion and expression divergence and played important roles in the survival of these expanded genes. Our data provide new insight into the TPS family expansion and evolution and suggest that TPSs might have originated from isoprenyl diphosphate synthase genes.


2020 ◽  
Vol 10 (1) ◽  
Author(s):  
Xi Chen ◽  
Jinghua Gu ◽  
Andrew F. Neuwald ◽  
Leena Hilakivi-Clarke ◽  
Robert Clarke ◽  
...  

Abstract Genome-wide transcription factor (TF) binding signal analyses reveal co-localization of TF binding sites, based on which cis-regulatory modules (CRMs) can be inferred. CRMs play a key role in understanding the cooperation of multiple TFs under specific conditions. However, the functions of CRMs and their effects on nearby gene transcription are highly dynamic and context-specific and therefore are challenging to characterize. BICORN (Bayesian Inference of COoperative Regulatory Network) builds a hierarchical Bayesian model and infers context-specific CRMs based on TF-gene binding events and gene expression data for a particular cell type. BICORN automatically searches for a list of candidate CRMs based on the input TF bindings at regulatory regions associated with genes of interest. Applying Gibbs sampling, BICORN iteratively estimates model parameters of CRMs, TF activities, and corresponding regulation on gene transcription, which it models as a sparse network of functional CRMs regulating target genes. The BICORN package is implemented in R (version 3.4 or later) and is publicly available on the CRAN server at https://cran.r-project.org/web/packages/BICORN/index.html.


2019 ◽  
Vol 36 (8) ◽  
pp. 2608-2610
Author(s):  
Aritro Nath ◽  
Jeremy Chang ◽  
R Stephanie Huang

Abstract Summary MicroRNAs (miRNAs) are critical post-transcriptional regulators of gene expression. Due to challenges in accurate profiling of small RNAs, a vast majority of public transcriptome datasets lack reliable miRNA profiles. However, the biological consequence of miRNA activity in the form of altered protein-coding gene (PCG) expression can be captured using machine-learning algorithms. Here, we present iMIRAGE (imputed miRNA activity from gene expression), a convenient tool to predict miRNA expression using PCG expression of the test datasets. The iMIRAGE package provides an integrated workflow for normalization and transformation of miRNA and PCG expression data, along with the option to utilize predicted miRNA targets to impute miRNA activity from independent test PCG datasets. Availability and implementation The iMIRAGE package for R, along with package documentation and vignette, is available at https://aritronath.github.io/iMIRAGE/index.html. Supplementary information Supplementary data are available at Bioinformatics online.


2019 ◽  
Vol 35 (22) ◽  
pp. 4724-4729 ◽  
Author(s):  
Wujuan Zhong ◽  
Cassandra N Spracklen ◽  
Karen L Mohlke ◽  
Xiaojing Zheng ◽  
Jason Fine ◽  
...  

Abstract Summary Tens of thousands of reproducibly identified GWAS (Genome-Wide Association Studies) variants, with the vast majority falling in non-coding regions resulting in no eventual protein products, call urgently for mechanistic interpretations. Although numerous methods exist, there are few, if any, methods for simultaneously testing the mediation effects of multiple correlated SNPs via some mediator (e.g. the expression of a gene in the neighborhood) on phenotypic outcome. We propose the multi-SNP mediation intersection-union test (SMUT) to fill this methodological gap. Our extensive simulations demonstrate the validity of SMUT as well as substantial power gains, up to 92%, over alternative methods. In addition, SMUT confirmed known mediators in a real dataset of Finns for plasma adiponectin level, which were missed by many alternative methods. We believe SMUT will become a useful tool to generate mechanistic hypotheses underlying GWAS variants, facilitating functional follow-up. Availability and implementation The R package SMUT is publicly available from CRAN at https://CRAN.R-project.org/package=SMUT. Supplementary information Supplementary data are available at Bioinformatics online.
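
The logic of an intersection-union test can be sketched as follows. Mediation requires every link in the chain (for example SNPs to mediator, and mediator to outcome) to be non-null, so the composite null hypothesis is rejected only if each component test rejects, which amounts to reporting the largest component p-value. The function name and the example p-values are hypothetical, and the abstract does not describe how SMUT computes its component p-values; the sketch is in Python rather than R.

```python
def intersection_union_pvalue(component_pvalues):
    """Combine component p-values with the intersection-union principle.

    The composite null ("at least one link is null") is rejected at level
    alpha only when every component p-value is below alpha, so the
    combined p-value is the maximum of the components.
    """
    pvalues = list(component_pvalues)
    if not pvalues:
        raise ValueError("at least one component p-value is required")
    for p in pvalues:
        if not 0.0 <= p <= 1.0:
            raise ValueError("p-values must lie in [0, 1]")
    return max(pvalues)


# Hypothetical component results for one candidate mediator.
p_snps_to_mediator = 0.001
p_mediator_to_outcome = 0.03
p_mediation = intersection_union_pvalue(
    [p_snps_to_mediator, p_mediator_to_outcome]
)
```

Taking the maximum needs no multiplicity correction across the components, because rejecting the composite null already demands that every component test reject.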


Author(s):  
Zachary B Abrams ◽  
Dwayne G Tally ◽  
Lynne V Abruzzo ◽  
Kevin R Coombes

Abstract Summary Cytogenetics data, or karyotypes, are among the most common clinically used forms of genetic data. Karyotypes are stored as standardized text strings using the International System for Human Cytogenomic Nomenclature (ISCN). Historically, these data have not been used in large-scale computational analyses due to limitations in the ISCN text format and structure. Recently developed computational tools such as CytoGPS have enabled large-scale computational analyses of karyotypes. To further enable such analyses, we have now developed RCytoGPS, an R package that takes JSON files generated from CytoGPS.org and converts them into objects in R. This conversion facilitates the analysis and visualizations of karyotype data. In effect this tool streamlines the process of performing large-scale karyotype analyses, thus advancing the field of computational cytogenetic pathology. Availability and Implementation Freely available at https://CRAN.R-project.org/package=RCytoGPS. The code for the underlying CytoGPS software can be found at https://github.com/i2-wustl/CytoGPS. Supplementary information There is no supplementary data.

