Logistic regression protects against population structure in genetic association studies

2005 ◽  
Vol 16 (2) ◽  
pp. 290-296 ◽  
Author(s):  
E. Setakis

2004 ◽  
Vol 36 (5) ◽  
pp. 512-517 ◽  
Author(s):  
Jonathan Marchini ◽  
Lon R Cardon ◽  
Michael S Phillips ◽  
Peter Donnelly


2019 ◽  
Author(s):  
Yiqi Yao ◽  
Alejandro Ochoa

Abstract: Modern genetic association studies require modeling population structure and family relatedness in order to calculate correct statistics. Principal Components Analysis (PCA) is one of the most common approaches for modeling this population structure, but the Linear Mixed-Effects Model (LMM) is now believed by many to be a superior model. However, previous comparisons have been limited by testing PCA without varying the number of principal components (PCs), by simulating unrealistically simple population structures, and by not always measuring both type-I error control and predictive power. In this work, we thoroughly evaluate PCA with a varying number of PCs alongside LMM in various realistic scenarios, including admixture together with family structure, measuring both null p-value uniformity and the area under the precision-recall curve. We find that PCA performs as well as LMM when enough PCs are used and the sample size is large, and that it is remarkably robust to extreme numbers of PCs. However, we observe decreased performance for PCA relative to LMM when sample sizes are small and when there is family structure, although LMM performance is highly variable. Altogether, our work suggests that PCA is a favorable approach for association studies when sample sizes are large and no close relatives exist in the data, and that a hybrid approach of LMM with PCs may be the best of both worlds.
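The two evaluation criteria named in this abstract can be illustrated with a minimal sketch. The abstract does not give exact formulas, so the code below assumes a Kolmogorov-Smirnov-style distance as the measure of null p-value uniformity and average precision as an estimate of the area under the precision-recall curve. The function names are hypothetical and are not taken from the paper.

```python
import numpy as np


def uniformity_deviation(null_pvalues):
    """Largest gap between the empirical CDF of the p-values and the
    Uniform(0, 1) CDF, i.e. the Kolmogorov-Smirnov distance.
    Assumed metric: the abstract only says "null p-value uniformity"."""
    p = np.sort(np.asarray(null_pvalues, dtype=float))
    n = p.size
    upper = np.arange(1, n + 1) / n  # ECDF value just after each sorted p-value
    lower = np.arange(0, n) / n      # ECDF value just before each sorted p-value
    return float(max(np.max(upper - p), np.max(p - lower)))


def average_precision(scores, is_causal):
    """Mean of the precision at each causal locus when loci are ranked
    by decreasing score (a common estimate of the area under the
    precision-recall curve)."""
    scores = np.asarray(scores, dtype=float)
    order = np.argsort(-scores, kind="stable")
    hits = np.asarray(is_causal, dtype=bool)[order]
    precision = np.cumsum(hits) / np.arange(1, hits.size + 1)
    return float(precision[hits].mean())


# Toy usage: evenly spread null p-values, and a ranking of four loci
# of which the first and third are causal.
null_p = (np.arange(1, 101) - 0.5) / 100
print(uniformity_deviation(null_p))
print(average_precision([0.9, 0.8, 0.1, 0.05], [1, 0, 1, 0]))
```

A smaller deviation indicates better-calibrated null p-values, and a larger average precision indicates that causal loci are ranked ahead of non-causal ones. The scores could be, for example, negative log p-values from either a PCA-adjusted regression or an LMM.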



2018 ◽  
Author(s):  
Matthew P. Conomos ◽  
Alex P. Reiner ◽  
Mary Sara McPeek ◽  
Timothy A. Thornton

Abstract: Linear mixed models (LMMs) have become the standard approach for genetic association testing in the presence of sample structure. However, the performance of LMMs has primarily been evaluated in relatively homogeneous populations of European ancestry, despite many of the recent genetic association studies including samples from worldwide populations with diverse ancestries. In this paper, we demonstrate that existing LMM methods can have systematic miscalibration of association test statistics genome-wide in samples with heterogeneous ancestry, resulting in both increased type-I error rates and a loss of power. Furthermore, we show that this miscalibration arises due to varying allele frequency differences across the genome among populations. To overcome this problem, we developed LMM-OPS, an LMM approach which orthogonally partitions diverse genetic structure into two components: distant population structure and recent genetic relatedness. In simulation studies with real and simulated genotype data, we demonstrate that LMM-OPS is appropriately calibrated in the presence of ancestry heterogeneity and outperforms existing LMM approaches, including EMMAX, GCTA, and GEMMA. We conduct a GWAS of white blood cell (WBC) count in an admixed sample of 3,551 Hispanic/Latino American women from the Women’s Health Initiative SNP Health Association Resource where LMM-OPS detects genome-wide significant associations with corresponding p-values that are one or more orders of magnitude smaller than those from competing LMM methods. We also identify a genome-wide significant association with regulatory variant rs2814778 in the DARC gene on chromosome 1, which generalizes to Hispanic/Latino Americans a previous association with reduced WBC count identified in African Americans.



2011 ◽  
Vol 4 (3) ◽  
pp. 317-326 ◽  
Author(s):  
David B. Allison ◽  
Nita A. Limdi ◽  
Nianjun Liu ◽  
Amit Patki ◽  
Hongyu Zhao


2003 ◽  
Vol 4 (4) ◽  
pp. 431-441 ◽  
Author(s):  
Elad Ziv ◽  
Esteban González Burchard


2014 ◽  
Author(s):  
Matthew P Conomos ◽  
Michael B Miller ◽  
Timothy A Thornton

Population structure inference with genetic data has been motivated by a variety of applications in population genetics and genetic association studies. Several approaches have been proposed for the identification of genetic ancestry differences in samples where study participants are assumed to be unrelated, including principal components analysis (PCA), multi-dimensional scaling (MDS), and model-based methods for proportional ancestry estimation. Many genetic studies, however, include individuals with some degree of relatedness, and existing methods for inferring genetic ancestry fail in related samples. We present a method, PC-AiR, for robust population structure inference in the presence of known or cryptic relatedness. PC-AiR utilizes genome-screen data and an efficient algorithm to identify a diverse subset of unrelated individuals that is representative of all ancestries in the sample. The PC-AiR method directly performs PCA on the identified ancestry representative subset and then predicts components of variation for all remaining individuals based on genetic similarities. In simulation studies and in applications to real data from Phase III of the HapMap Project, we demonstrate that PC-AiR provides a substantial improvement over existing approaches for population structure inference in related samples. We also demonstrate significant efficiency gains, where a single axis of variation from PC-AiR provides better prediction of ancestry in a variety of structure settings than using ten (or more) components of variation from widely used PCA and MDS approaches. Finally, we illustrate that PC-AiR can provide improved population stratification correction over existing methods in genetic association studies with population structure and relatedness.
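The second step described in this abstract, running PCA on an unrelated subset and then predicting components for everyone else, can be sketched as follows. This is only an illustration under stated assumptions: the unrelated subset is taken as given (the PC-AiR algorithm for selecting it is not reproduced here), genotypes are coded as 0/1/2 allele counts, and the function name `pcs_from_unrelated_subset` is hypothetical.

```python
import numpy as np


def pcs_from_unrelated_subset(genotypes, unrelated_idx, n_pcs=2):
    """PCA on a reference subset, then projection of all individuals.

    genotypes: (individuals x SNPs) array of 0/1/2 allele counts.
    unrelated_idx: indices of the reference (unrelated) individuals,
                   assumed to be supplied by the caller.
    Returns an (individuals x n_pcs) array of component scores.
    """
    G = np.asarray(genotypes, dtype=float)
    ref = G[unrelated_idx]

    # Allele frequencies are estimated from the reference subset only.
    freq = ref.mean(axis=0) / 2.0
    keep = (freq > 0) & (freq < 1)  # drop SNPs monomorphic in the subset
    scale = np.sqrt(2.0 * freq[keep] * (1.0 - freq[keep]))

    # Standardize every individual with the reference frequencies.
    Z_all = (G[:, keep] - 2.0 * freq[keep]) / scale
    Z_ref = Z_all[unrelated_idx]

    # SNP loadings come from the SVD of the reference subset alone.
    _, _, Vt = np.linalg.svd(Z_ref, full_matrices=False)
    loadings = Vt[:n_pcs].T

    # Reference individuals get their PCA scores; the remaining
    # individuals are projected onto the same axes.
    return Z_all @ loadings


# Toy usage: 20 reference individuals plus one extra individual who
# shares the genotype of reference individual 3.
rng = np.random.default_rng(0)
G_ref = rng.integers(0, 3, size=(20, 50))
G_all = np.vstack([G_ref, G_ref[3:4]])
pcs = pcs_from_unrelated_subset(G_all, np.arange(20), n_pcs=2)
print(pcs.shape)
```

Because the loadings are estimated without the related individuals, family clusters cannot drive the leading axes, which is the motivation the abstract gives for the approach.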



2010 ◽  
Vol 19 (3) ◽  
pp. 347-352 ◽  
Author(s):  
Jeroen R Huyghe ◽  
Erik Fransen ◽  
Samuli Hannula ◽  
Lut Van Laer ◽  
Els Van Eyken ◽  
...  


2016 ◽  
Vol 98 (4) ◽  
pp. 653-666 ◽  
Author(s):  
Han Chen ◽  
Chaolong Wang ◽  
Matthew P. Conomos ◽  
Adrienne M. Stilp ◽  
Zilin Li ◽  
...  


Author(s):  
Pantelis G Bagos ◽  
Georgios K Nikolopoulos

We propose here a simple and robust approach for meta-analysis of molecular association studies. Making use of the binary structure of the data, and by treating the genotypes as independent variables in a logistic regression, we apply a simple and commonly used methodology that performs satisfactorily while remaining very flexible. We present simple tests for detecting heterogeneity and describe a random-effects extension of the method that allows for between-study heterogeneity. We also derive simple tests for assessing the most plausible genetic model of inheritance and its between-study heterogeneity, as well as for adjusting for covariates. The methodology introduced here is easily extended to polytomous or continuous outcomes and to loci with more than two alleles. We apply the methodology to several published meta-analyses of genetic association studies with very encouraging results. The main advantages of the proposed methodology are its flexibility and ease of use; at the same time, it covers almost every aspect of a meta-analysis, providing overall estimates without the need for multiple comparisons. We anticipate that this simple method will be used in future meta-analyses of genetic association studies. A STATA command performing all the available computations is available at http://bioinformatics.biol.uoa.gr/~pbagos/metagen/.
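The core idea of treating genotypes as independent variables in a logistic regression can be sketched for a single study. This is not the authors' STATA command and it omits the study-level terms a meta-analysis would need; it assumes a co-dominant coding with genotype AA as the reference, made-up genotype counts, and a hypothetical helper `fit_logistic` that fits the model by Newton-Raphson on grouped data.

```python
import numpy as np


def fit_logistic(X, y, counts, n_iter=25):
    """Newton-Raphson fit of a logistic regression on grouped data.

    X: design matrix, one row per (outcome, genotype) cell.
    y: 1 for case cells, 0 for control cells.
    counts: number of subjects in each cell.
    Returns coefficient estimates and their Wald standard errors.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    counts = np.asarray(counts, dtype=float)
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        mu = 1.0 / (1.0 + np.exp(-X @ beta))
        grad = X.T @ (counts * (y - mu))
        hess = X.T @ (X * (counts * mu * (1.0 - mu))[:, None])
        beta = beta + np.linalg.solve(hess, grad)
    # Standard errors from the information matrix at the final estimate.
    mu = 1.0 / (1.0 + np.exp(-X @ beta))
    hess = X.T @ (X * (counts * mu * (1.0 - mu))[:, None])
    se = np.sqrt(np.diag(np.linalg.inv(hess)))
    return beta, se


# Illustrative (made-up) counts for genotypes AA, Aa, aa.
cases = [100, 80, 20]
controls = [120, 60, 20]

# Columns: intercept, indicator for Aa, indicator for aa.
cells = [[1, 0, 0], [1, 1, 0], [1, 0, 1]]
X = cells + cells
y = [1, 1, 1, 0, 0, 0]
beta, se = fit_logistic(X, y, cases + controls)

# Exponentiated genotype coefficients are odds ratios relative to AA.
print(np.exp(beta[1:]))
```

With one indicator per non-reference genotype the model is saturated, so the fitted coefficients reproduce the log odds ratios computed directly from the genotype table. Constraining the coefficients (for example, forcing the aa effect to be twice the Aa effect) would correspond to testing a specific genetic model of inheritance.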


