Accounting for population structure and relatedness in gene expression genome-wide association testing using a mixed-model approach

2009 ◽  
Author(s):  
Samantha Bateman ◽  
Greg Gibson ◽  
Youssef Idaghdour ◽  
Wendy Czika ◽  
Kelci Miclaus ◽  
...  
2012 ◽  
Vol 44 (9) ◽  
pp. 1066-1071 ◽  
Author(s):  
Arthur Korte ◽  
Bjarni J Vilhjálmsson ◽  
Vincent Segura ◽  
Alexander Platt ◽  
Quan Long ◽  
...  

2012 ◽  
Vol 44 (7) ◽  
pp. 825-830 ◽  
Author(s):  
Vincent Segura ◽  
Bjarni J Vilhjálmsson ◽  
Alexander Platt ◽  
Arthur Korte ◽  
Ümit Seren ◽  
...  

2017 ◽  
Author(s):  
Haohan Wang ◽  
Bryon Aragam ◽  
Eric P. Xing

Abstract: A fundamental challenge in modern datasets of ever-increasing dimensionality is variable selection, which has taken on renewed interest recently due to the growth of biological and medical datasets with complex, non-i.i.d. structures. Naïvely applying classical variable selection methods such as the Lasso to such datasets may lead to a large number of false discoveries. Motivated by genome-wide association studies in genetics, we study the problem of variable selection for datasets arising from multiple subpopulations, when this underlying population structure is unknown to the researcher. We propose a unified framework for sparse variable selection that adaptively corrects for population structure via a low-rank linear mixed model. Most importantly, the proposed method does not require prior knowledge of sample structure in the data and adaptively selects a covariance structure of the correct complexity. Through extensive experiments, we illustrate the effectiveness of this framework over existing methods. Further, we test our method on three different genomic datasets from plants, mice, and humans, and discuss the knowledge we discover with our method.


2019 ◽  
Vol 9 (1) ◽  
Author(s):  
Douglas W. Yao ◽  
Nikolas G. Balanis ◽  
Eleazar Eskin ◽  
Thomas G. Graeber

2020 ◽  
Author(s):  
Sarah W. Curtis ◽  
Daniel Chang ◽  
Myoung Keun Lee ◽  
John R. Shaffer ◽  
Karlijne Indencleef ◽  
...  

Abstract: Nonsyndromic orofacial clefts (OFCs) are the most common craniofacial birth defect in humans and, like many complex traits, OFCs are phenotypically and etiologically heterogeneous. The phenotypic heterogeneity of OFCs extends beyond the structures affected by the cleft (e.g., cleft lip (CL) and cleft lip and palate (CLP)) to other features, such as the severity of the cleft. Here, we focus on bilateral and unilateral clefts as one dimension of OFC severity. Unilateral clefts are more frequent than bilateral clefts for both CL and CLP, but the genetic architecture of these subtypes is not well understood, and it is not known if genetic variants predispose to the formation of one subtype over another. Therefore, we tested for subtype-specific genetic associations in 44 bilateral CL (BCL) cases, 434 unilateral CL (UCL) cases, 530 bilateral CLP (BCLP) cases, 1123 unilateral CLP (UCLP) cases, and unrelated controls (N = 1626), using the mixed-model approach implemented in GENESIS. While no novel loci were found in subtype-specific analyses comparing cases to controls, the genetic architecture of UCL was distinct compared to BCL, with 43.8% of suggestive loci (p < 1.0×10⁻⁵) having non-overlapping confidence intervals between the two subtypes. To further understand the genetic risk factors for severity differences, we then performed a genome-wide scan for modifiers using a similar mixed-model approach and found one genome-wide significant modifier locus on 20p11 (p = 7.53×10⁻⁹), 300 kb downstream of PAX1, associated with higher odds of BCL compared to UCL, which also replicated in an independent cohort (p = 0.0018) and showed no effect in BCLP (p > 0.05). We further found that SNPs at this locus were associated with normal human nasal shape. Taken together, these results suggest bilateral and unilateral clefts may have differences in their genetic architecture, especially between CL and CLP. Moreover, our results suggest BCL, the rarest form of OFC, may be genetically distinct from the other OFC subtypes. This expands our understanding of genetic modifiers for subtypes of OFCs and further elucidates the genetic mechanisms behind the phenotypic heterogeneity in OFCs.


2013 ◽  
Vol 7 (1) ◽  
pp. 27-33 ◽  
Author(s):  
Gengxin Li ◽  
Hongjiang Zhu

With the availability of high-density genomic data containing millions of single nucleotide polymorphisms and tens or hundreds of thousands of individuals, genetic association studies can identify variants contributing to complex traits on a genome-wide scale. However, genome-wide association studies are confounded by spurious associations when sample structure (population structure, family structure, and cryptic relatedness) is not properly accounted for. The absence of a complete genealogy of the population in genome-wide association study models has strongly motivated the development of new methods to correct the inflation of false positives. In this process, linear mixed model based approaches, which have the advantage of capturing multilevel relatedness, have gained considerable ground. We summarize the current literature dealing with sample structure, and our review focuses on the following four areas: (i) approaches for handling population structure in genome-wide association studies; (ii) linear mixed model based approaches in genome-wide association studies; (iii) the performance of linear mixed model based approaches in genome-wide association studies; and (iv) unsolved issues and future work for linear mixed model based approaches.
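
For orientation, the kind of linear mixed model association test that the abstract above refers to can be sketched as follows. This is a generic, minimal illustration and not the method of any paper listed on this page. The function names `kinship` and `lmm_snp_test` are hypothetical, and the heritability-like parameter `h2` is assumed to be known in advance, whereas real tools typically estimate the variance components (for example by REML).

```python
import numpy as np

def kinship(genotypes):
    """Genomic relationship matrix K = Z Z^T / m from an (n, m) genotype matrix.

    Columns (markers) are centred and scaled; monomorphic markers are dropped.
    """
    G = np.asarray(genotypes, dtype=float)
    sd = G.std(axis=0)
    keep = sd > 0                          # monomorphic markers carry no information
    Z = (G[:, keep] - G[:, keep].mean(axis=0)) / sd[keep]
    return Z @ Z.T / keep.sum()

def lmm_snp_test(y, snp, K, h2):
    """Generalised least squares test of one SNP under V = h2*K + (1-h2)*I.

    h2 is ASSUMED known here (an illustration-only simplification).
    Returns the SNP effect estimate and its standard error.
    """
    y = np.asarray(y, dtype=float)
    n = len(y)
    V = h2 * K + (1.0 - h2) * np.eye(n)    # covariance up to a scale factor
    X = np.column_stack([np.ones(n), snp]) # intercept + SNP dosage
    Vinv_X = np.linalg.solve(V, X)
    XtVinvX = X.T @ Vinv_X
    beta = np.linalg.solve(XtVinvX, Vinv_X.T @ y)
    resid = y - X @ beta
    # Residual scale estimate; clipped at zero to guard against rounding.
    sigma2 = max(resid @ np.linalg.solve(V, resid), 0.0) / (n - 2)
    se = np.sqrt(sigma2 * np.linalg.inv(XtVinvX)[1, 1])
    return beta[1], se

# Toy usage on simulated genotypes (0/1/2 allele counts).
rng = np.random.default_rng(0)
G = rng.binomial(2, 0.5, size=(40, 200))
K = kinship(G)
y = 1.0 + 2.0 * G[:, 0] + rng.normal(size=40)
effect, stderr = lmm_snp_test(y, G[:, 0], K, h2=0.5)
```

The relatedness among samples enters only through `K`, so related or stratified individuals are down-weighted by the covariance `V` instead of being treated as independent observations.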


Author(s):  
Meng Luo ◽  
Shiliang Gu

Abstract: During the past decades, genome-wide association studies (GWAS) have been used to successfully identify tens of thousands of genetic variants associated with complex traits in humans, animals, and plants. All common genome-wide association (GWA) methods rely on population structure correction to avoid false genotype-phenotype associations. However, population structure correction is a stringent penalization, which also impedes the identification of real associations. Here, we used recent statistical advances and proposed iterative screen regression (ISR), which enables simultaneous multiple-marker associations and is shown to appropriately correct for population stratification and cryptic relatedness in GWAS. Results from analyses of simulated data suggest that the proposed ISR method performs well in terms of power (sensitivity) versus FDR (false discovery rate) and specificity, and with less bias (higher accuracy) in effect (PVE) estimation than the existing multi-loci (mixed) model and the single-locus (mixed) model. We also show the practicality of our approach by applying it to rice, outbred mice, and A. thaliana datasets. It identified several new causal loci that other methods did not detect. Our ISR provides an alternative for multi-loci GWAS, and the implementation is computationally efficient, making the analysis of large datasets practicable (n > 100,000).

