Multiple testing in genome-wide association studies via hidden Markov models

Multilocus haplotype analysis of candidate variants with genome-wide association study (GWAS) data may provide evidence of association with disease even when the individual loci themselves do not. Unfortunately, when a large number of candidate variants are investigated, identifying risk haplotypes can be very difficult. To meet this challenge, a number of approaches have been put forward in recent years. However, most of them are not directly linked to the disease penetrances of haplotypes and thus may not be efficient. To fill this gap, we propose a mixture model-based approach for detecting risk haplotypes. Under the mixture model, haplotypes are clustered directly according to their estimated disease penetrances. A theoretical justification of the model is provided. Furthermore, we introduce a hypothesis test for the haplotype inheritance patterns that underpin this model. The performance of the proposed approach is evaluated by simulations and real data analysis. The results show that the proposed approach outperforms an existing multiple testing method.
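To make "clustering haplotypes by estimated penetrance" concrete, here is a minimal sketch, not the authors' model. It assumes a two-component binomial mixture in which each haplotype's count of copies observed in cases (out of its total copies in cases plus controls) comes from either a low-risk or a high-risk component, with the case fraction standing in as a proxy for penetrance. EM then estimates the two component rates, the mixing weight, and each haplotype's posterior probability of belonging to the high-risk cluster. The function names and starting values are hypothetical.

```python
import math


def binom_logpmf(k, n, p):
    """Log binomial probability of k case copies out of n haplotype copies."""
    return (math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
            + k * math.log(p) + (n - k) * math.log(1 - p))


def _clip(x, eps=1e-6):
    # keep probabilities strictly inside (0, 1) so the logs stay finite
    return min(max(x, eps), 1 - eps)


def em_penetrance_mixture(cases, totals, p_init=(0.3, 0.7), w_init=0.5, iters=200):
    """Fit a hypothetical two-cluster binomial mixture to haplotype case counts.

    cases[i]  -- copies of haplotype i carried by cases
    totals[i] -- copies of haplotype i in cases plus controls
    Returns (p_low, p_high, weight_high, posterior_high).
    """
    p_lo, p_hi = p_init
    w = w_init
    post = [0.0] * len(cases)
    for _ in range(iters):
        # E-step: posterior probability that each haplotype is high-risk
        for i, (k, n) in enumerate(zip(cases, totals)):
            a = math.log(w) + binom_logpmf(k, n, p_hi)
            b = math.log(1 - w) + binom_logpmf(k, n, p_lo)
            m = max(a, b)  # log-sum-exp trick for numerical stability
            post[i] = math.exp(a - m) / (math.exp(a - m) + math.exp(b - m))
        # M-step: re-estimate the mixing weight and the two case fractions
        w = _clip(sum(post) / len(post))
        den_hi = sum(r * n for r, n in zip(post, totals))
        den_lo = sum((1 - r) * n for r, n in zip(post, totals))
        if den_hi > 0:
            p_hi = _clip(sum(r * k for r, k in zip(post, cases)) / den_hi)
        if den_lo > 0:
            p_lo = _clip(sum((1 - r) * k for r, k in zip(post, cases)) / den_lo)
    return p_lo, p_hi, w, post
```

For example, `em_penetrance_mixture([10, 12, 9, 40, 42], [50] * 5)` separates the last two haplotypes into the high-risk cluster. Clustering on a fitted rate rather than testing each haplotype separately is what ties the grouping directly to (a proxy for) penetrance.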

Controlling the false discovery rate in GWAS with population structure

10.1101/2020.08.04.236703 ◽

2020 ◽

Author(s):

Matteo Sesia ◽

Stephen Bates ◽

Emmanuel Candès ◽

Jonathan Marchini ◽

Chiara Sabatti

Keyword(s):

Population Structure ◽

False Discovery Rate ◽

Markov Models ◽

State Of The Art ◽

Hidden Markov ◽

Association Studies ◽

Genome Wide Association Studies ◽

False Discovery ◽

Identical By Descent ◽

Genome Wide

This paper proposes a novel statistical method to address population structure in genome-wide association studies while controlling the false discovery rate, overcoming some limitations of existing approaches. Our solution accounts for linkage disequilibrium and diverse ancestries by combining conditional testing via knockoffs with hidden Markov models from state-of-the-art phasing methods. Furthermore, we account for familial relatedness by describing the joint distribution of haplotypes sharing long identical-by-descent segments with a generalized hidden Markov model. Extensive simulations affirm the validity of this method, while applications to UK Biobank phenotypes yield many more discoveries than BOLT-LMM, most of which are confirmed by the Japan Biobank and FinnGen data.
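Generating HMM-based knockoffs is beyond a short example, but the final FDR-controlling selection step can be sketched. Assuming each variant (or group of variants) already has a knockoff statistic W_j, where large positive values suggest a real signal and null statistics are equally likely to be positive or negative, the knockoff+ filter of Barber and Candès picks the smallest threshold whose estimated false discovery proportion is at most the target level q. This is a generic illustration of that filter, not the authors' software; the function names are hypothetical.

```python
import math


def knockoff_plus_threshold(W, q):
    """Knockoff+ threshold: the smallest t in {|W_j| : W_j != 0} such that
    (1 + #{j : W_j <= -t}) / max(1, #{j : W_j >= t}) <= q.
    Returns math.inf when no threshold qualifies (nothing is selected)."""
    for t in sorted({abs(w) for w in W if w != 0}):
        # negative statistics beyond -t estimate the number of false positives
        fdp_hat = (1 + sum(w <= -t for w in W)) / max(1, sum(w >= t for w in W))
        if fdp_hat <= q:
            return t
    return math.inf


def knockoff_select(W, q):
    """Indices of variables selected by the knockoff+ filter at level q."""
    t = knockoff_plus_threshold(W, q)
    return [j for j, w in enumerate(W) if w >= t]
```

With `W = [5, 4, 3.5, 3, 2.5, -0.5, 0.2, -1, 0.1, 2]` and `q = 0.2`, the threshold is 2 and six variables are selected. The "+1" in the numerator is what makes the guarantee hold for the FDR itself rather than a modified version of it.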

Increasing the Efficiency of Genome-wide Association Mapping via Hidden Markov Models

10.1101/039099 ◽

2016 ◽

Author(s):

Hong Gao ◽

Hua Tang ◽

Carlos Bustamante

Keyword(s):

Markov Model ◽

Hidden Markov Model ◽

Large Scale ◽

Hidden Markov ◽

Association Studies ◽

Genome Wide Association ◽

Trend Test ◽

Genome Wide Association Studies ◽

Data Set ◽

Genome Wide

With the rapid production of high-dimensional genetic data, one major challenge in genome-wide association studies is to develop effective and efficient statistical tools that overcome the low power to detect causal SNPs with low to moderate susceptibility effects, which are often obscured by substantial background noise. Here we present a novel method for reducing background noise and improving detection power in genome-wide association studies. The approach uses a hidden Markov model and its derivative, the Markov hidden Markov model, to estimate the posterior probability that a marker is in an associated state. We conducted extensive simulations based on human whole-genome genotype data from the GlaxoSmithKline-POPRES project to calibrate the sensitivity and specificity of our method, and compared it with popular approaches for detecting positive signals, including the χ² test for association and the Cochran-Armitage trend test. Our simulation results suggest that at very low false positive rates (< 10⁻⁶), our method reaches a power of 0.9 and is more powerful than the other approaches compared when the allelic effect of the causal variant is non-additive or unknown. Application of our method to the data set generated by the Wellcome Trust Case Control Consortium, using 14,000 cases and 3,000 controls, confirmed its power and efficiency in the context of large-scale genome-wide association studies.
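The quantity named here, the posterior probability that a marker is in an associated state, is what the forward-backward algorithm computes for a hidden Markov model. The sketch below is a simplified illustration, not the authors' HMM or MHMM. It assumes two hidden states along ordered markers (null and associated), per-marker z-scores that are N(0, 1) under the null and an equal mixture of N(mu, 1) and N(-mu, 1) under association, and fixed, hypothetical transition probabilities `p_enter` and `p_exit` rather than parameters estimated from data.

```python
import math


def normal_pdf(x, mu=0.0):
    """Density of N(mu, 1) at x."""
    return math.exp(-0.5 * (x - mu) ** 2) / math.sqrt(2 * math.pi)


def posterior_associated(z, mu=3.0, p_enter=0.01, p_exit=0.2):
    """Scaled forward-backward for a two-state HMM over ordered markers.

    State 0 = null, state 1 = associated. Returns P(state_i = 1 | z) per marker.
    """
    n = len(z)
    # emission likelihoods: null N(0,1); associated 0.5*N(mu,1) + 0.5*N(-mu,1)
    e = [(normal_pdf(x), 0.5 * (normal_pdf(x, mu) + normal_pdf(x, -mu))) for x in z]
    A = [[1 - p_enter, p_enter], [p_exit, 1 - p_exit]]  # A[r][s] = P(next=s | cur=r)
    pi1 = p_enter / (p_enter + p_exit)  # start from the stationary distribution
    start = [1 - pi1, pi1]

    # forward pass, rescaling alpha at every marker to avoid underflow
    alpha, scale = [], []
    for i in range(n):
        row = []
        for s in (0, 1):
            if i == 0:
                prior = start[s]
            else:
                prior = sum(alpha[i - 1][r] * A[r][s] for r in (0, 1))
            row.append(prior * e[i][s])
        c = row[0] + row[1]
        scale.append(c)
        alpha.append([v / c for v in row])

    # backward pass reusing the forward scaling constants
    beta = [[1.0, 1.0] for _ in range(n)]
    for i in range(n - 2, -1, -1):
        for r in (0, 1):
            beta[i][r] = sum(A[r][s] * e[i + 1][s] * beta[i + 1][s] for s in (0, 1)) / scale[i + 1]

    # posterior of the associated state at each marker
    post = []
    for i in range(n):
        g0, g1 = alpha[i][0] * beta[i][0], alpha[i][1] * beta[i][1]
        post.append(g1 / (g0 + g1))
    return post
```

Because neighboring markers share information through the transition matrix, a run of moderately large z-scores raises the posterior of each marker in the run, which is one way an HMM can suppress isolated background noise.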

A hidden Markov random field model for genome-wide association studies

Biostatistics ◽

10.1093/biostatistics/kxp043 ◽

2009 ◽

Vol 11 (1) ◽

pp. 139-150 ◽

Cited By ~ 20

Author(s):

H. Li ◽

Z. Wei ◽

J. Maris

Keyword(s):

Markov Random Field ◽

Field Model ◽

Hidden Markov ◽

Association Studies ◽

Genome Wide Association ◽

Genome Wide Association Studies ◽

Markov Random Field Model ◽

Markov Random ◽

Genome Wide ◽

Hidden Markov Random Field

A Novel Hidden Markov Model for Genome-Wide Association Studies

2017 IEEE International Conference on Software Quality, Reliability and Security Companion (QRS-C) ◽

10.1109/qrs-c.2017.86 ◽

2017 ◽

Author(s):

Junli Yang ◽

Bo Song ◽

Bing Yan ◽

Guoqiang Li

Keyword(s):

Markov Model ◽

Hidden Markov Model ◽

Hidden Markov ◽

Association Studies ◽

Genome Wide Association ◽

Genome Wide Association Studies ◽

Genome Wide

networkGWAS: A network-based approach for genome-wide association studies in structured populations

10.1101/2021.11.11.468206 ◽

2021 ◽

Author(s):

Giulia Muzio ◽

Leslie O'Bray ◽

Laetitia Meng-Papaxanthos ◽

Juliane Klatt ◽

Karsten Borgwardt

Keyword(s):

Genetic Markers ◽

Complex Traits ◽

Multiple Testing ◽

Association Studies ◽

Search Space ◽

Structured Populations ◽

Genome Wide Association ◽

Genome Wide Association Studies ◽

Multiple Testing Correction ◽

Genome Wide

While the search for associations between genetic markers and complex traits has discovered tens of thousands of trait-related genetic variants, the vast majority of these explain only a tiny fraction of observed phenotypic variation. One possible strategy to detect stronger associations is to aggregate the effects of several genetic markers and to test entire genes, pathways, or (sub)networks of genes for association with a phenotype. The latter, network-based genome-wide association studies, in particular suffer from a huge search space and an inherent multiple testing problem. As a consequence, current approaches either rely on greedy feature selection, thereby risking missing relevant associations, and/or neglect multiple testing correction, which can lead to an abundance of false positive findings. To address these shortcomings of network-based genome-wide association studies, we propose networkGWAS, a computationally efficient and statistically sound approach to gene-based genome-wide association studies based on mixed models and neighborhood aggregation. It allows for population structure correction and yields well-calibrated p-values, which we obtain through a block permutation scheme. networkGWAS successfully detects known or plausible associations on simulated rare variants from H. sapiens data, as well as on semi-simulated and real data with common variants from A. thaliana, and enables the systematic combination of gene-based genome-wide association studies with biological network information.
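Neighborhood aggregation and a block permutation p-value can be illustrated in a few lines. The sketch below is a deliberately simplified stand-in, not the networkGWAS implementation, which fits mixed models on genotype data. It assumes each gene already has a nonnegative association score, aggregates each gene's score with those of its direct network neighbors, and computes empirical p-values by shuffling scores only among genes that share a block label. How blocks are formed (the hypothetical `blocks` argument) is an assumption; the paper's own permutation scheme may define them differently.

```python
import random


def neighborhood_stats(scores, adj):
    """Aggregate each gene's score with those of its direct network neighbors."""
    return [scores[g] + sum(scores[h] for h in adj[g]) for g in range(len(scores))]


def block_permutation_pvalues(scores, adj, blocks, n_perm=999, seed=0):
    """Empirical p-values for neighborhood statistics, where node scores are
    shuffled only among nodes with the same (hypothetical) block label."""
    rng = random.Random(seed)
    observed = neighborhood_stats(scores, adj)
    # group node indices by block label
    groups = {}
    for node, b in enumerate(blocks):
        groups.setdefault(b, []).append(node)
    exceed = [0] * len(scores)
    for _ in range(n_perm):
        perm = list(scores)
        for members in groups.values():
            vals = [scores[m] for m in members]
            rng.shuffle(vals)  # permute scores within this block only
            for m, v in zip(members, vals):
                perm[m] = v
        null = neighborhood_stats(perm, adj)
        for g, (nv, ov) in enumerate(zip(null, observed)):
            if nv >= ov:
                exceed[g] += 1
    # add-one correction keeps p-values strictly positive
    return [(1 + k) / (1 + n_perm) for k in exceed]
```

Restricting shuffles to blocks keeps the permutation null closer to whatever structure the blocks encode (for instance, genes of similar network degree could share a block), at the cost of fewer distinct permutations; with every gene in its own block, nothing moves and every p-value is 1.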
