EWASex: an efficient R-package to predict sex in epigenome-wide association studies

Author(s):  
Jesper Beltoft Lund ◽  
Weilong Li ◽  
Afsaneh Mohammadnejad ◽  
Shuxia Li ◽  
Jan Baumbach ◽  
...  

Abstract Summary Epigenome-Wide Association Study (EWAS) has become a powerful approach to identify epigenetic variations associated with diseases or health traits. Sex is an important variable to include in EWAS to ensure unbiased data processing and statistical analysis. We introduce the R-package EWASex, which allows for fast and highly accurate sex estimation using DNA methylation data on a small set of CpG sites located on the X-chromosome under stable X-chromosome inactivation in females. Results Using different EWAS datasets, we demonstrate that EWASex outperforms the current state-of-the-art tools. With EWASex, we offer an efficient way to predict and to verify sex that can be easily implemented in any EWAS using blood samples or even other tissue types. It comes with pre-trained weights, so it works without prior sex labels and without access to raw data, which all currently available methods require. Availability and implementation The EWASex R-package along with tutorials, documentation and source code are available at https://github.com/Silver-Hawk/EWASex. Supplementary information Supplementary data are available at Bioinformatics online.
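
The idea of predicting sex from X-linked CpG sites can be illustrated with a minimal sketch. It assumes that, at sites under X-inactivation, female samples show intermediate methylation (one methylated and one unmethylated copy) while male samples show low methylation. The function name, the mean-based score and the 0.25 threshold are hypothetical choices for illustration; they are not the pre-trained weights or the interface of EWASex.

```python
from statistics import mean


def predict_sex(betas, threshold=0.25):
    """Classify one sample from beta values (0..1) at X-linked CpG sites.

    The threshold is a hypothetical cut-off chosen for this example,
    not a value taken from the EWASex package.
    """
    if not betas:
        raise ValueError("need at least one CpG beta value")
    score = mean(betas)  # average methylation across the selected sites
    return "female" if score >= threshold else "male"


# Toy beta values, invented for illustration.
female_like = [0.48, 0.52, 0.45, 0.50]
male_like = [0.05, 0.08, 0.03, 0.10]
print(predict_sex(female_like), predict_sex(male_like))
```

A real predictor would weight sites individually and calibrate the cut-off on labelled data; the sketch only shows why a small set of sites can be enough.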

Author(s):  
Andrew E Teschendorff ◽  
Alok K Maity ◽  
Xue Hu ◽  
Chen Weiyan ◽  
Matthias Lechner

Abstract Motivation An important task in the analysis of single-cell RNA-Seq data is the estimation of differentiation potency, as this can help identify stem-or-multipotent cells in non-temporal studies or in tissues where differentiation hierarchies are not well established. A key challenge in the estimation of single-cell potency is the need for a fast and accurate algorithm, scalable to large scRNA-Seq studies profiling millions of cells. Results Here, we present a single-cell potency measure, called Correlation of Connectome and Transcriptome (CCAT), which can return accurate single-cell potency estimates of a million cells in minutes, a 100-fold improvement over current state-of-the-art methods. We benchmark CCAT against 8 other single-cell potency models and across 28 scRNA-Seq studies, encompassing over 2 million cells, demonstrating accuracy comparable to the current state-of-the-art, at a significantly reduced computational cost, and with increased robustness to dropouts. Availability and implementation CCAT is part of the SCENT R-package, freely available from https://github.com/aet21/SCENT. Supplementary information Supplementary data are available at Bioinformatics online.
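
As the name suggests, the potency score correlates a cell's transcriptome with a per-gene connectome measure. The sketch below assumes the connectome is summarised as each gene's degree in a protein interaction network and that the correlation is Pearson's; both are assumptions made for illustration, and the function names are hypothetical rather than part of the SCENT package.

```python
import math


def pearson(x, y):
    """Plain Pearson correlation between two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation undefined for a constant sequence")
    return sxy / math.sqrt(sxx * syy)


def potency_score(expression, degree):
    """Correlate one cell's expression profile with network degree."""
    return pearson(expression, degree)


# Toy data: four genes with assumed network degrees.
degree = [1, 2, 5, 10]
cell_a = [0.5, 1.0, 2.5, 5.0]  # hub genes highly expressed
cell_b = [5.0, 2.5, 1.0, 0.5]  # hub genes lowly expressed
print(potency_score(cell_a, degree), potency_score(cell_b, degree))
```

Because the score is a single correlation per cell, its cost grows linearly with the number of genes, which is consistent with the speed the abstract emphasises.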


2020 ◽  
Vol 36 (15) ◽  
pp. 4374-4376
Author(s):  
Ninon Mounier ◽  
Zoltán Kutalik

Abstract Summary Increasing sample size is not the only strategy to improve discovery in Genome-Wide Association Studies (GWASs) and we propose here an approach that leverages published studies of related traits to improve inference. Our Bayesian GWAS method derives informative prior effects by leveraging GWASs of related risk factors and their causal effect estimates on the focal trait using multivariable Mendelian randomization. These prior effects are combined with the observed effects to yield Bayes Factors, posterior and direct effects. The approach not only increases power, but also has the potential to dissect direct and indirect biological mechanisms. Availability and implementation bGWAS package is freely available under a GPL-2 License, and can be accessed, along with user guides and tutorials, from https://github.com/n-mounier/bGWAS. Supplementary information Supplementary data are available at Bioinformatics online.
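
How a prior effect and an observed effect can combine into a Bayes Factor is sketched below with a textbook normal model. It assumes effects are on the z-score scale, that the prior effect of a variant is the sum of its risk-factor effects multiplied by the causal effect of each risk factor, and that the null model is a standard normal. The function names and the exact form of the ratio are illustrative assumptions and may differ from what the bGWAS package computes.

```python
import math


def normal_pdf(x, mean, sd):
    """Density of a normal distribution at x."""
    return math.exp(-0.5 * ((x - mean) / sd) ** 2) / (sd * math.sqrt(2 * math.pi))


def prior_effect(risk_factor_effects, causal_effects):
    """Prior effect: risk-factor effects weighted by their causal effects."""
    return sum(z * c for z, c in zip(risk_factor_effects, causal_effects))


def bayes_factor(observed, prior_mean, prior_se):
    """Ratio of the marginal likelihood under the prior to that under the null.

    Under the alternative, observed ~ N(prior_mean, 1 + prior_se**2);
    under the null, observed ~ N(0, 1).
    """
    alt = normal_pdf(observed, prior_mean, math.sqrt(1 + prior_se ** 2))
    null = normal_pdf(observed, 0.0, 1.0)
    return alt / null


# Toy numbers: two risk factors with assumed causal effects on the focal trait.
mu = prior_effect([4.0, -2.0], [0.5, 0.25])
bf = bayes_factor(observed=3.0, prior_mean=mu, prior_se=0.5)
print(mu, bf)
```

An observed effect that agrees with an informative prior yields a Bayes Factor above 1, which is how related-trait GWASs can add power without more samples.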


2019 ◽  
Vol 35 (22) ◽  
pp. 4764-4766 ◽  
Author(s):  
Jonathan Cairns ◽  
William R Orchard ◽  
Valeriya Malysheva ◽  
Mikhail Spivakov

Abstract Summary Capture Hi-C is a powerful approach for detecting chromosomal interactions involving, at least on one end, DNA regions of interest, such as gene promoters. We present Chicdiff, an R package for robust detection of differential interactions in Capture Hi-C data. Chicdiff enhances a state-of-the-art differential testing approach for count data with bespoke normalization and multiple testing procedures that account for specific statistical properties of Capture Hi-C. We validate Chicdiff on published Promoter Capture Hi-C data in human monocytes and CD4+ T cells, identifying multitudes of cell type-specific interactions, and confirming the overall positive association between promoter interactions and gene expression. Availability and implementation Chicdiff is implemented as an R package that is publicly available at https://github.com/RegulatoryGenomicsGroup/chicdiff. Supplementary information Supplementary data are available at Bioinformatics online.


2020 ◽  
Vol 36 (11) ◽  
pp. 3516-3521 ◽  
Author(s):  
Lixiang Zhang ◽  
Lin Lin ◽  
Jia Li

Abstract Motivation Cluster analysis is widely used to identify interesting subgroups in biomedical data. Since true class labels are unknown in the unsupervised setting, it is challenging to validate any cluster obtained computationally, an important problem barely addressed by the research community. Results We have developed a toolkit called covering point set (CPS) analysis to quantify uncertainty at the levels of individual clusters and overall partitions. Functions have been developed to effectively visualize the inherent variation in any cluster for data of high dimension, and provide a more comprehensive view of potentially interesting subgroups in the data. Applying it to three usage scenarios for biomedical data, we demonstrate that CPS analysis is more effective for evaluating uncertainty of clusters compared to state-of-the-art measurements. We also showcase how to use CPS analysis to select data generation technologies or visualization methods. Availability and implementation The method is implemented in an R package called OTclust, available on CRAN. Supplementary information Supplementary data are available at Bioinformatics online.


2015 ◽  
Vol 3 ◽  
pp. 449-460 ◽  
Author(s):  
Michael Roth ◽  
Mirella Lapata

Frame semantic representations have been useful in several applications ranging from text-to-scene generation, to question answering and social network analysis. Predicting such representations from raw text is, however, a challenging task and corresponding models are typically only trained on a small set of sentence-level annotations. In this paper, we present a semantic role labeling system that takes into account sentence and discourse context. We introduce several new features which we motivate based on linguistic insights and experimentally demonstrate that they lead to significant improvements over the current state-of-the-art in FrameNet-based semantic role labeling.


2019 ◽  
Vol 35 (19) ◽  
pp. 3701-3708 ◽  
Author(s):  
Gulnara R Svishcheva ◽  
Nadezhda M Belonogova ◽  
Irina V Zorkoltseva ◽  
Anatoly V Kirichenko ◽  
Tatiana I Axenovich

Abstract Motivation A huge number of genome-wide association studies (GWAS) summary statistics freely available in databases provide new material for gene-based association analysis aimed at identifying rare genetic variants. Only a few of the many popular gene-based methods developed for individual genotype and phenotype data are adapted for the practical use of the GWAS summary statistics as input. Results We analytically prove and numerically illustrate that all popular powerful methods developed for gene-based association analysis of individual phenotype and genotype data can be modified to utilize GWAS summary statistics. We have modified and implemented all of the popular methods, including burden and kernel machine-based tests, multiple and functional linear regression, principal components analysis and others, in the R package sumFREGAT. Using real summary statistics for coronary artery disease, we show that the new package is able to detect genes not found by the existing packages. Availability and implementation The R package sumFREGAT is freely and publicly available at: https://CRAN.R-project.org/package=sumFREGAT. Supplementary information Supplementary data are available at Bioinformatics online.
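
A burden test is one of the simpler cases where summary statistics suffice. The sketch below uses the standard result that a weighted sum of per-variant z-scores, divided by the square root of its variance under the null, is itself a z-score, with the variance obtained from the correlation (LD) matrix between variants. It assumes the weights already apply to the z-score scale; the function names are hypothetical and this is not the sumFREGAT interface.

```python
import math


def burden_z(z_scores, weights, ld):
    """Gene-level burden statistic from per-variant z-scores.

    ld is the variant-by-variant correlation matrix (list of lists).
    Under the null, sum(w_i * z_i) has variance w' * ld * w.
    """
    m = len(z_scores)
    if len(weights) != m or len(ld) != m or any(len(row) != m for row in ld):
        raise ValueError("dimension mismatch")
    numerator = sum(w * z for w, z in zip(weights, z_scores))
    variance = sum(weights[i] * ld[i][j] * weights[j]
                   for i in range(m) for j in range(m))
    return numerator / math.sqrt(variance)


def two_sided_p(z):
    """Two-sided p-value of a standard normal statistic."""
    return math.erfc(abs(z) / math.sqrt(2))


# Toy gene with three variants; values invented for illustration.
z = [2.0, 1.5, 1.0]
w = [1.0, 1.0, 1.0]
independent = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
correlated = [[1.0, 0.5, 0.5], [0.5, 1.0, 0.5], [0.5, 0.5, 1.0]]
print(burden_z(z, w, independent), burden_z(z, w, correlated))
```

Ignoring correlation between variants would overstate the evidence, which is why the LD matrix is a required input alongside the summary statistics.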


2019 ◽  
Vol 35 (18) ◽  
pp. 3514-3516 ◽  
Author(s):  
Danyue Dong ◽  
Yuan Tian ◽  
Shijie C Zheng ◽  
Andrew E Teschendorff

Abstract Motivation The biological interpretation of differentially methylated sites derived from Epigenome-Wide-Association Studies (EWAS) remains a significant challenge. Gene Set Enrichment Analysis (GSEA) is a general tool to aid biological interpretation, yet its correct and unbiased implementation in the EWAS context is difficult due to the differential probe representation of Illumina Infinium DNA methylation beadchips. Results We present a novel GSEA method, called ebGSEA, which ranks genes, not CpGs, according to the overall level of differential methylation, as assessed using all the probes mapping to the given gene. Applied to simulated and real EWAS data, we show how ebGSEA may exhibit higher sensitivity and specificity than the current state-of-the-art, whilst also avoiding differential probe representation bias. Thus, ebGSEA will be a useful additional tool to aid the interpretation of EWAS data. Availability and implementation ebGSEA is available from https://github.com/aet21/ebGSEA, and has been incorporated into the ChAMP Bioconductor package (https://www.bioconductor.org). Supplementary information Supplementary data are available at Bioinformatics online.


2019 ◽  
Vol 35 (22) ◽  
pp. 4724-4729 ◽  
Author(s):  
Wujuan Zhong ◽  
Cassandra N Spracklen ◽  
Karen L Mohlke ◽  
Xiaojing Zheng ◽  
Jason Fine ◽  
...  

Abstract Summary Tens of thousands of reproducibly identified GWAS (Genome-Wide Association Studies) variants, with the vast majority falling in non-coding regions resulting in no eventual protein products, call urgently for mechanistic interpretations. Although numerous methods exist, there are few, if any, methods for simultaneously testing the mediation effects of multiple correlated SNPs via some mediator (e.g. the expression of a gene in the neighborhood) on phenotypic outcome. We propose multi-SNP mediation intersection-union test (SMUT) to fill in this methodological gap. Our extensive simulations demonstrate the validity of SMUT as well as substantial, up to 92%, power gains over alternative methods. In addition, SMUT confirmed known mediators in a real dataset of Finns for plasma adiponectin level, which were missed by many alternative methods. We believe SMUT will become a useful tool to generate mechanistic hypotheses underlying GWAS variants, facilitating functional follow-up. Availability and implementation The R package SMUT is publicly available from CRAN at https://CRAN.R-project.org/package=SMUT. Supplementary information Supplementary data are available at Bioinformatics online.
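
The intersection-union principle behind the test's name can be shown in a few lines. Mediation requires both a SNP-to-mediator effect and a mediator-to-outcome effect, so the null hypothesis is that at least one of them is absent; an intersection-union test rejects only when both component tests reject, which makes the larger of the two p-values the overall p-value. The function and argument names are hypothetical, and how SMUT obtains the two component p-values for multiple correlated SNPs is not reproduced here.

```python
def intersection_union_p(p_snps_to_mediator, p_mediator_to_outcome):
    """Overall mediation p-value as the maximum of two component p-values."""
    for p in (p_snps_to_mediator, p_mediator_to_outcome):
        if not 0.0 <= p <= 1.0:
            raise ValueError("p-values must lie in [0, 1]")
    return max(p_snps_to_mediator, p_mediator_to_outcome)


# Toy p-values, invented for illustration.
print(intersection_union_p(0.001, 0.03))  # both paths supported
print(intersection_union_p(0.001, 0.40))  # second path not supported
```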


2020 ◽  
Vol 36 (Supplement_2) ◽  
pp. i643-i650
Author(s):  
Emilio Dorigatti ◽  
Benjamin Schubert

Abstract Motivation Conceptually, epitope-based vaccine design poses two distinct problems: (i) selecting the best epitopes to elicit the strongest possible immune response and (ii) arranging and linking them through short spacer sequences to string-of-beads vaccines, so that their recovery likelihood during antigen processing is maximized. Current state-of-the-art approaches solve this design problem sequentially. Consequently, such approaches are unable to capture the inter-dependencies between the two design steps, usually emphasizing theoretical immunogenicity over correct vaccine processing, thus resulting in vaccines with less effective immunogenicity in vivo. Results In this work, we present a computational approach based on linear programming, called JessEV, that solves both design steps simultaneously, making it possible to weigh the selection of a set of epitopes that have great immunogenic potential against their assembly into a string-of-beads construct that provides a high chance of recovery. We conducted Monte Carlo cleavage simulations to show that a fixed set of epitopes often cannot be assembled adequately, whereas selecting epitopes to accommodate proper cleavage requirements substantially improves their recovery probability and thus the effective immunogenicity, pathogen and population coverage of the resulting vaccines by at least 2-fold. Availability and implementation The software and the data analyzed are available at https://github.com/SchubertLab/JessEV. Supplementary information Supplementary data are available at Bioinformatics online.


Author(s):  
Alan E Murphy ◽  
Brian M Schilder ◽  
Nathan G Skene

Abstract Motivation Genome-wide association studies (GWAS) summary statistics have popularised and accelerated genetic research. However, a lack of standardisation of the file formats used has proven problematic when running secondary analysis tools or performing meta-analysis studies. Results To address this issue, we have developed MungeSumstats, a Bioconductor R package for the standardisation and quality control of GWAS summary statistics. MungeSumstats can handle the most common summary statistic formats, including variant call format (VCF), producing a reformatted, standardised, tabular summary statistic file, VCF or R native data object. Availability MungeSumstats is available on Bioconductor (v 3.13) and can also be found on GitHub at: https://neurogenomics.github.io/MungeSumstats Supplementary information The analysis deriving the most common summary statistic formats is available at: https://al-murphy.github.io/SumstatFormats
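
One part of such standardisation, mapping the many column names found in the wild onto a single scheme, can be sketched as a lookup. The alias table and the function name below are hypothetical and far smaller than any real mapping; they are not taken from MungeSumstats.

```python
# Hypothetical alias table for illustration only.
HEADER_ALIASES = {
    "SNP": {"SNP", "RSID", "MARKERNAME"},
    "CHR": {"CHR", "CHROM", "CHROMOSOME"},
    "BP": {"BP", "POS", "POSITION"},
    "P": {"P", "PVAL", "P_VALUE", "PVALUE"},
}


def standardise_header(columns):
    """Map known column-name aliases to a standard name; keep unknown names."""
    lookup = {alias: standard
              for standard, aliases in HEADER_ALIASES.items()
              for alias in aliases}
    return [lookup.get(col.strip().upper(), col) for col in columns]


print(standardise_header(["rsid", "chrom", "pos", "pval", "N"]))
```

Matching is case-insensitive, and unrecognised columns pass through unchanged so that no information is silently dropped.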

