MiDAS: Meaningful Immunogenetic Data at Scale

2021 ◽  
Vol 17 (7) ◽  
pp. e1009131
Author(s):  
Maciej Migdal ◽  
Dan Fu Ruan ◽  
William F. Forrest ◽  
Amir Horowitz ◽  
Christian Hammer

Human immunogenetic variation in the form of HLA and KIR types has been shown to be strongly associated with a multitude of immune-related phenotypes. However, association studies involving immunogenetic loci most commonly involve simple analyses of classical HLA allelic diversity, which limits the interpretability and reproducibility of results. Here we present MiDAS, a comprehensive R package for immunogenetic data transformation and statistical analysis. MiDAS recodes input data in the form of HLA alleles and KIR types into biologically meaningful variables, allowing HLA amino acid fine mapping, analyses of HLA evolutionary divergence, and analyses of experimentally validated HLA-KIR interactions. Further, MiDAS enables comprehensive statistical association analysis workflows with phenotypes of diverse measurement scales. MiDAS thus closes the gap between the inference of immunogenetic variation and its efficient utilization to make relevant discoveries related to immune and disease biology. It is freely available under an MIT license.
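The recoding step described above can be pictured with a small Python sketch. It is not MiDAS code (MiDAS is an R package, and its API is not shown here); the function names `reduce_resolution` and `allele_dosage` are hypothetical, and the sketch assumes allele names in the usual `GENE*field:field:...` form. It truncates allele calls to two fields and counts how many copies of each allele one individual carries, which is one simple kind of variable an association model could use. Expression suffixes such as `N` on a truncated field are simply dropped here, which a real tool would need to handle more carefully.

```python
from collections import Counter


def reduce_resolution(allele: str, fields: int = 2) -> str:
    """Truncate an HLA allele name such as 'A*01:01:01:01' to `fields` fields."""
    gene, _, rest = allele.partition("*")
    return gene + "*" + ":".join(rest.split(":")[:fields])


def allele_dosage(genotype: list[str], fields: int = 2) -> dict[str, int]:
    """Count the copies of each (resolution-reduced) allele in one individual."""
    return dict(Counter(reduce_resolution(a, fields) for a in genotype))


# Hypothetical allele calls for one individual (two alleles per gene).
sample = ["A*01:01:01:01", "A*02:01:01", "B*07:02", "B*07:02:01"]
dosage = allele_dosage(sample)
print(dosage)
```

In this toy input the two B calls collapse to the same two-field allele, so the individual is counted as homozygous at that resolution.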

2021 ◽  
Author(s):  
Maciej Migdal ◽  
Dan Fu Ruan ◽  
William F. Forrest ◽  
Amir Horowitz ◽  
Christian Hammer

Human immunogenetic variation in the form of HLA and KIR types has been shown to be strongly associated with a multitude of immune-related phenotypes. We present MiDAS, an R package enabling statistical association analysis and using immunogenetic data transformation functions for HLA amino acid fine mapping, analysis of HLA evolutionary divergence as well as HLA-KIR interactions. MiDAS closes the gap between inference of immunogenetic variation and its efficient utilization to make meaningful discoveries.


2017 ◽  
Author(s):  
Josine Min ◽  
Gibran Hemani ◽  
George Davey Smith ◽  
Caroline Relton ◽  
Matthew Suderman

Abstract. Background: Technological advances in high throughput DNA methylation microarrays have allowed dramatic growth of a new branch of epigenetic epidemiology. DNA methylation datasets are growing ever larger in terms of the number of samples profiled, the extent of genome coverage, and the number of studies being meta-analysed. Novel computational solutions are required to efficiently handle these data. Methods: We have developed meffil, an R package designed to quality control, normalize and perform epigenome-wide association studies (EWAS) efficiently on large samples of Illumina Infinium HumanMethylation450 and MethylationEPIC BeadChip microarrays. We tested meffil by applying it to 6000 450k microarrays generated from blood collected for two different datasets, Accessible Resource for Integrative Epigenomic Studies (ARIES) and The Genetics of Overweight Young Adults (GOYA) study. Results: A complete reimplementation of functional normalization minimizes computational memory requirements to 5% of that required by other R packages, without increasing running time. Incorporating fixed and random effects alongside functional normalization, and automated estimation of functional normalization parameters, reduces technical variation in DNA methylation levels, thus reducing false positive associations and improving power. We also demonstrate that the ability to normalize datasets distributed across physically different locations without sharing any biologically-based individual-level data may reduce heterogeneity in meta-analyses of epigenome-wide association studies. However, we show that when batch is perfectly confounded with cases and controls, functional normalization is unable to prevent spurious associations. Conclusions: meffil is available online (https://github.com/perishky/meffil/) along with tutorials covering typical use cases.


2021 ◽  
Author(s):  
Florian Privé ◽  
Bjarni J. Vilhjálmsson ◽  
Timothy S. H. Mak

Abstract: We present lassosum2, a new version of the polygenic score method lassosum, which we re-implement in the R package bigsnpr. This new version uses exactly the same input data as LDpred2 and is also very fast, which means that it can be run with almost no extra coding or computational time when already running LDpred2. It can also be more robust than LDpred2, e.g. when the GWAS sample size is substantially misspecified. Therefore, lassosum2 is complementary to LDpred2.
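The abstract does not spell out the method, so the following Python sketch only illustrates the general idea behind lasso-type polygenic scores fitted from summary statistics. It is not the bigsnpr implementation. It assumes the objective β'Rβ − 2β'r + 2λ‖β‖₁, where `r` holds SNP-trait correlations and `R` is an LD correlation matrix; the names `soft_threshold` and `lasso_summary` are hypothetical, and the extra shrinkage parameter used by lassosum is omitted.

```python
def soft_threshold(x: float, lam: float) -> float:
    """Shrink x toward zero by lam, returning 0 inside [-lam, lam]."""
    if x > lam:
        return x - lam
    if x < -lam:
        return x + lam
    return 0.0


def lasso_summary(r, R, lam, n_iter=100):
    """Coordinate descent for beta'R beta - 2 beta'r + 2*lam*|beta|_1.

    r: list of SNP-trait correlations (assumed input).
    R: LD correlation matrix as a list of lists (assumed input).
    """
    p = len(r)
    beta = [0.0] * p
    for _ in range(n_iter):
        for j in range(p):
            # Residual correlation of SNP j after removing the other SNPs' effects.
            partial = r[j] - sum(R[j][k] * beta[k] for k in range(p) if k != j)
            beta[j] = soft_threshold(partial, lam) / R[j][j]
    return beta


# Toy example with three uncorrelated SNPs (identity LD matrix).
identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
weights = lasso_summary([0.3, -0.05, 0.1], identity, lam=0.1)
print(weights)
```

With an identity LD matrix the update reduces to soft-thresholding each correlation, so small marginal effects are set exactly to zero.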


2020 ◽  
Vol 36 (15) ◽  
pp. 4374-4376
Author(s):  
Ninon Mounier ◽  
Zoltán Kutalik

Abstract. Summary: Increasing sample size is not the only strategy to improve discovery in Genome Wide Association Studies (GWASs), and we propose here an approach that leverages published studies of related traits to improve inference. Our Bayesian GWAS method derives informative prior effects by leveraging GWASs of related risk factors and their causal effect estimates on the focal trait using multivariable Mendelian randomization. These prior effects are combined with the observed effects to yield Bayes Factors, posterior and direct effects. The approach not only increases power, but also has the potential to dissect direct and indirect biological mechanisms. Availability and implementation: The bGWAS package is freely available under a GPL-2 License, and can be accessed, alongside user guides and tutorials, from https://github.com/n-mounier/bGWAS. Supplementary information: Supplementary data are available at Bioinformatics online.
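How a prior effect and an observed effect can be combined into a Bayes factor is sketched below in Python. This is a generic normal-normal Bayes factor on the z-score scale, written under stated assumptions, and may differ from the exact formula used in bGWAS: under the alternative the observed z-score is assumed to follow N(prior_mean, 1 + prior_var), and under the null N(0, 1). The function names are hypothetical.

```python
import math


def normal_pdf(x: float, mean: float, var: float) -> float:
    """Density of a normal distribution with the given mean and variance."""
    return math.exp(-((x - mean) ** 2) / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)


def bayes_factor(z_obs: float, prior_mean: float, prior_var: float) -> float:
    """Bayes factor of H1: z ~ N(prior_mean, 1 + prior_var) against H0: z ~ N(0, 1)."""
    return normal_pdf(z_obs, prior_mean, 1.0 + prior_var) / normal_pdf(z_obs, 0.0, 1.0)


# Same observed z-score, with and without an informative prior mean.
bf_informative = bayes_factor(3.0, prior_mean=3.0, prior_var=0.5)
bf_uninformative = bayes_factor(3.0, prior_mean=0.0, prior_var=0.5)
print(bf_informative, bf_uninformative)
```

In this toy setting, a prior that points in the same direction as the observed effect gives a larger Bayes factor than a prior centred on zero, which is the intuition behind gaining power from related traits.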


2020 ◽  
Vol 2 (4) ◽  
Author(s):  
Gerard A Bouland ◽  
Joline W J Beulens ◽  
Joey Nap ◽  
Arno R van der Slik ◽  
Arnaud Zaldumbide ◽  
...  

Abstract: Numerous large genome-wide association studies have been performed to understand the influence of genetics on traits. Many identified risk loci are in non-coding and intergenic regions, which complicates understanding how genes and their downstream pathways are influenced. An integrative data approach is required to understand the mechanism and consequences of identified risk loci. Here, we developed the R package CONQUER. Data for SNPs of interest are acquired from static and dynamic repositories (build GRCh38/hg38), including GTExPortal, Epigenomics Project, 4D genome database and genome browsers. All visualizations are fully interactive so that the user can immediately access the underlying data. CONQUER is a user-friendly tool for the integrative analysis of multiple SNPs, in which risk loci are seen not as individual risk factors but as a network of risk factors.


2020 ◽  
Vol 10 (1) ◽  
Author(s):  
Matthias Munz ◽  
Inken Wohlers ◽  
Eric Simon ◽  
Tobias Reinberger ◽  
Hauke Busch ◽  
...  

Abstract: Exploration of genetic variant-to-gene relationships by quantitative trait loci such as expression QTLs is a frequently used tool in genome-wide association studies. However, the wide range of public QTL databases and the lack of batch annotation features complicate a comprehensive annotation of GWAS results. In this work, we introduce the tool “Qtlizer” for annotating lists of human variants with associated changes in gene expression and protein abundance using an integrated database of published QTLs. Features include incorporation of variants in linkage disequilibrium and reverse search by gene names. Analyzing the database for base pair distances between best significant eQTLs and their affected genes suggests that the commonly used cis-distance limit of 1,000,000 base pairs might be too restrictive, implying that a substantial number of eQTLs are wrongly assigned or remain undetected. We also ranked genes with respect to the maximum number of tissue-specific eQTL studies in which a most significant eQTL signal was consistent. For the top 100 genes we observed the strongest enrichment with housekeeping genes (P = 2 × 10⁻⁶) and with the 10% highest expressed genes (P = 0.005) after grouping eQTLs by r² > 0.95, underlining the relevance of LD information in eQTL analyses. Qtlizer can be accessed via https://genehopper.de/qtlizer or by using the respective Bioconductor R-package (https://doi.org/10.18129/B9.bioc.Qtlizer).
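The cis-distance check discussed above can be sketched in a few lines of Python. The record layout (`pos`, `gene_start`, `gene_end`) and the function names are hypothetical, the variant and gene are assumed to lie on the same chromosome, and distance is measured to the nearest gene boundary, which may not match how Qtlizer defines it.

```python
CIS_LIMIT_BP = 1_000_000  # commonly used cis-distance limit named in the abstract


def distance_to_gene(variant_pos: int, gene_start: int, gene_end: int) -> int:
    """Base-pair distance from a variant to the nearest gene boundary (0 if inside)."""
    if variant_pos < gene_start:
        return gene_start - variant_pos
    if variant_pos > gene_end:
        return variant_pos - gene_end
    return 0


def beyond_cis_limit(pairs, limit: int = CIS_LIMIT_BP):
    """Return the eQTL-gene pairs whose distance exceeds the cis limit."""
    return [
        p for p in pairs
        if distance_to_gene(p["pos"], p["gene_start"], p["gene_end"]) > limit
    ]


# Hypothetical eQTL-gene pairs on one chromosome.
pairs = [
    {"id": "eqtl_1", "pos": 5_000_000, "gene_start": 5_200_000, "gene_end": 5_250_000},
    {"id": "eqtl_2", "pos": 1_000_000, "gene_start": 2_500_000, "gene_end": 2_600_000},
    {"id": "eqtl_3", "pos": 7_010_000, "gene_start": 7_000_000, "gene_end": 7_050_000},
]
missed = beyond_cis_limit(pairs)
print([p["id"] for p in missed])
```

Pairs returned by `beyond_cis_limit` are the ones a fixed 1,000,000 bp window would never test, which is the concern the abstract raises.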


Author(s):  
Hisashi Takamizawa ◽  
Yutaka Nishiyama

It is generally accepted that neutron irradiation embrittlement of reactor pressure vessels is caused by the irradiation-induced formation of solute clusters (SCs) and matrix damage (MD). In the present study, to analyze how the chemical composition of SCs contributes to irradiation embrittlement in the high-fluence region, a statistical analysis using the Bayesian nonparametric (BNP) method was performed on Japanese PWR surveillance data. The significance of the P, Si and Mn contents, which, unlike the Cu and Ni contents, are not necessarily included in embrittlement correlations, was evaluated. The BNP method can learn the complexity of the statistical model itself from the input data and infer predictions with an individual probability distribution for each prediction condition. The results suggested that, among the three examined elements, the Si content had the greatest effect on irradiation embrittlement in the high-fluence region.


2019 ◽  
Vol 35 (19) ◽  
pp. 3701-3708 ◽  
Author(s):  
Gulnara R Svishcheva ◽  
Nadezhda M Belonogova ◽  
Irina V Zorkoltseva ◽  
Anatoly V Kirichenko ◽  
Tatiana I Axenovich

Abstract. Motivation: A huge number of genome-wide association studies (GWAS) summary statistics freely available in databases provide new material for gene-based association analysis aimed at identifying rare genetic variants. Only a few of the many popular gene-based methods developed for individual genotype and phenotype data are adapted for the practical use of the GWAS summary statistics as input. Results: We analytically prove and numerically illustrate that all popular powerful methods developed for gene-based association analysis of individual phenotype and genotype data can be modified to utilize GWAS summary statistics. We have modified and implemented all of the popular methods, including burden and kernel machine-based tests, multiple and functional linear regression, principal components analysis and others, in the R package sumFREGAT. Using real summary statistics for coronary artery disease, we show that the new package is able to detect genes not found by the existing packages. Availability and implementation: The R package sumFREGAT is freely and publicly available at: https://CRAN.R-project.org/package=sumFREGAT. Supplementary information: Supplementary data are available at Bioinformatics online.
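As an illustration of how a gene-based test can run on summary statistics alone, the Python sketch below computes a simple burden statistic from per-variant z-scores and an LD correlation matrix. It is a simplified textbook form, not sumFREGAT code: the weights are assumed to apply directly to z-scores, the LD matrix is assumed to come from a reference panel, and the function names are hypothetical.

```python
import math


def burden_z(z, weights, ld):
    """Burden statistic Z = sum(w_j * z_j) / sqrt(w' R w) for one gene.

    z: per-variant GWAS z-scores; weights: per-variant weights;
    ld: LD correlation matrix R as a list of lists (all assumed inputs).
    """
    n = len(z)
    numerator = sum(w * zi for w, zi in zip(weights, z))
    variance = sum(weights[i] * ld[i][j] * weights[j] for i in range(n) for j in range(n))
    return numerator / math.sqrt(variance)


def two_sided_p(z_stat: float) -> float:
    """Two-sided p-value for a standard normal test statistic."""
    return math.erfc(abs(z_stat) / math.sqrt(2.0))


# Three variants in one gene, equal weights.
z_scores = [1.0, 2.0, 3.0]
equal_weights = [1.0, 1.0, 1.0]
independent = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
perfect_ld = [[1.0, 1.0, 1.0], [1.0, 1.0, 1.0], [1.0, 1.0, 1.0]]

z_indep = burden_z(z_scores, equal_weights, independent)
z_ld = burden_z(z_scores, equal_weights, perfect_ld)
print(z_indep, two_sided_p(z_indep))
print(z_ld, two_sided_p(z_ld))
```

The comparison shows why the LD matrix matters: the same z-scores give a weaker gene-level signal when the variants are fully correlated, because they then carry less independent evidence.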

