Permutation methods for assessing significance in binary trait association mapping with structured samples

2018 ◽  
Author(s):  
Joelle Mbatchou ◽  
Mark Abney ◽  
Mary Sara McPeek

Abstract: In genetic association analysis of complex traits, permutation testing can be a valuable tool for assessing significance when the asymptotic distribution of the test statistic is unknown or not well-approximated. This commonly arises when the association test statistic is itself a function of multiple correlated statistics, e.g., in tests of gene-set, pathway or genome-wide significance, as well as omnibus tests that combine test statistics that perform well in different scenarios. For genetic association testing in samples with population structure and/or relatedness, use of naive permutation can lead to inflated type 1 error. To address this in quantitative traits, the MVNpermute method was developed. However, for association mapping of a binary trait, the relationship between the mean and variance makes both naive permutation and the MVNpermute method invalid. We propose BRASS, a permutation method for binary trait association mapping in samples that have related individuals and/or population structure. BRASS allows for covariates, ascertainment, and simultaneous testing of multiple markers, and it accommodates a wide range of test statistics. We use an estimating equation approach that can be viewed as a hybrid of logistic regression and linear mixed-effects model methods, and we use a combination of principal components and a genetic relatedness matrix to account for sample structure. We show in simulation studies that BRASS maintains correct control of type 1 error in a range of scenarios that include population structure, familial relatedness, ascertainment and phenotype model misspecification. In these settings, only BRASS maintains correct control of type 1 error, performing far better than all other methods. We apply BRASS to two genome-wide analyses in domestic dog, one for elbow dysplasia (ED) in 82 breeds and another for idiopathic epilepsy (IE) in the Irish Wolfhound breed. We detect significant association of IE with SNPs in a previously-identified chromosome 4 region that contains multiple candidate genes.

Author summary: Permutation testing is commonly used when distributional assumptions cannot be made or do not apply, or when performing a multiple testing correction, e.g., to assess region-wide or genome-wide significance in association mapping studies. Naively permuting the data is only valid under the assumption of exchangeability, which, in the presence of sample structure and polygenicity, typically does not hold. Linear mixed-model based approaches have been proposed for permutation-based tests with continuous traits that can also adjust for sample structure; however, these may not remain valid when applied to binary traits, as key features of binary data are not well accounted for. We propose BRASS, a permutation-based testing method for binary data that incorporates important characteristics of binary data in the trait model, can accommodate relevant covariates and ascertainment, and adjusts for the presence of structure in the sample. We demonstrate the use of this approach in the context of correcting for multiple testing in two genome-wide association studies in domestic dog: one for elbow dysplasia and one for idiopathic epilepsy.
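For readers unfamiliar with the baseline being criticized, the following is a minimal sketch of naive permutation for a multi-marker (maximum) statistic. It is not BRASS and is not taken from the paper: the function names, the mean-difference statistic, and the toy data are all assumptions made for illustration. As the abstract notes, this scheme is valid only when individuals are exchangeable, which typically fails under population structure or relatedness.

```python
import random


def max_abs_mean_diff(trait, markers):
    """Largest |mean genotype in cases - mean genotype in controls| over markers.

    Hypothetical test statistic chosen for illustration only.
    """
    cases = [i for i, y in enumerate(trait) if y == 1]
    controls = [i for i, y in enumerate(trait) if y == 0]
    best = 0.0
    for genotypes in markers:
        case_mean = sum(genotypes[i] for i in cases) / len(cases)
        control_mean = sum(genotypes[i] for i in controls) / len(controls)
        best = max(best, abs(case_mean - control_mean))
    return best


def naive_permutation_pvalue(trait, markers, n_perm=199, seed=0):
    """Permutation p-value for the max statistic, shuffling the binary trait.

    Assumes exchangeable individuals; the count includes the observed
    statistic, so the smallest possible p-value is 1 / (n_perm + 1).
    """
    rng = random.Random(seed)
    observed = max_abs_mean_diff(trait, markers)
    shuffled = list(trait)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # breaks any trait-genotype association
        if max_abs_mean_diff(shuffled, markers) >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)


# Toy data: 10 cases, 10 controls, two markers (the first tracks the trait).
trait = [1] * 10 + [0] * 10
markers = [list(trait), [0, 1] * 10]
p_value = naive_permutation_pvalue(trait, markers)
```

Because the maximum is taken over all markers within each permutation, the resulting p-value is already adjusted for testing multiple correlated markers, which is the use case the abstract describes.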

2019 ◽  
Author(s):  
Yasin Kaymaz ◽  
Cliff I. Oduor ◽  
Ozkan Aydemir ◽  
Micah A. Luftig ◽  
Juliana A. Otieno ◽  
...  

Abstract: Endemic Burkitt lymphoma (eBL), the most prevalent pediatric cancer in sub-Saharan Africa, is associated with malaria and Epstein-Barr virus (EBV). In order to better understand the role of EBV in eBL, we improved viral DNA enrichment methods and generated a total of 98 new EBV genomes from both eBL cases (N=58) and healthy controls (N=40) residing in the same geographic region in Kenya. Comparing cases and controls, we found that EBV type 1 was significantly associated with eBL with 74.5% of patients (41/55) versus 47.5% of healthy children (19/40) carrying type 1 (OR=3.24, 95% CI=1.36-7.71, P=0.007). Controlling for EBV type, we also performed a genome-wide association study identifying 6 nonsynonymous variants in the genes EBNA1, EBNA2, BcLF1, and BARF1 that were enriched in eBL patients. Additionally, we observed that viruses isolated from plasma of eBL patients were identical to their tumor counterpart, consistent with circulating viral DNA originating from the tumor. We also detected three intertypic recombinants carrying type 1 EBNA2 and type 2 EBNA3 regions as well as one novel genome with a 20 kb deletion resulting in the loss of multiple lytic and virion genes. Comparing EBV types, genes show differential variation rates as type 1 appears to be more divergent. Besides, type 2 demonstrates novel substructures. Overall, our findings address the complexities of EBV population structure and provide new insight into viral variation, which has the potential to influence eBL oncogenesis.

Key Points: EBV type 1 is more prevalent in eBL patients compared to the geographically matched healthy control group. Genome-wide association analysis between cases and controls identifies 6 eBL-associated nonsynonymous variants in EBNA1, EBNA2, BcLF1, and BARF1 genes. Analysis of population structure reveals that EBV type 2 exists as two genomic subgroups.
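The reported odds ratio can be checked from the counts given in the abstract (41 of 55 patients and 19 of 40 controls carrying type 1). The sketch below assumes a standard Wald interval on the log odds ratio; the abstract does not say which interval method the authors used, and the function name is hypothetical. Under that assumption the counts reproduce the reported values.

```python
import math


def odds_ratio_wald_ci(a, b, c, d, z=1.96):
    """Odds ratio and Wald confidence interval for a 2x2 table.

    a, b: exposed / unexposed counts among cases
    c, d: exposed / unexposed counts among controls
    z:    normal quantile (1.96 for an approximate 95% interval)
    """
    odds_ratio = (a * d) / (b * c)
    # Standard error of log(OR) under the usual large-sample approximation.
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper


# Counts from the abstract: 41/55 cases and 19/40 controls carry EBV type 1.
or_, lo, hi = odds_ratio_wald_ci(a=41, b=55 - 41, c=19, d=40 - 19)
print(f"OR={or_:.2f}, 95% CI={lo:.2f}-{hi:.2f}")  # OR=3.24, 95% CI=1.36-7.71
```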


Author(s):  
Shuo Jiao

This chapter presents set-based approaches that focus on identifying G × E interactions rather than set-based approaches that are based primarily on detecting G main effects (e.g., via marginal effects). The author reviews both his own research and the development of his Set Based Gene EnviRonment InterAction test (SBERIA), as well as another set-based G × E approach referred to as GESAT. GESAT extends the variance component test of the SNP-set Kernel Association Test (SKAT) to evaluate G × E effects while incorporating the main SNP effects as covariates. While both of these approaches (SBERIA and GESAT) have outperformed other benchmark methods (e.g., likelihood ratio test) and have been demonstrated to retain the appropriate Type 1 error rate, in this chapter the author conducts simulation studies to compare findings for SBERIA and GESAT approaches, and identifies associated strengths and limitations of the respective methods.


2017 ◽  
Author(s):  
Haohan Wang ◽  
Xiang Liu ◽  
Yunpeng Xiao ◽  
Ming Xu ◽  
Eric P. Xing

Abstract: Genome-wide association studies have presented a promising way to understand the association between human genomes and complex traits. Many simple polymorphic loci have been shown to explain a significant fraction of phenotypic variability. However, challenges remain in the non-triviality of explaining complex traits associated with multifactorial genetic loci, especially considering the confounding factors caused by population structure, family structure, and cryptic relatedness. In this paper, we propose a Squared-LMM (LMM2) model, aiming to jointly correct population and genetic confounding factors. We offer two strategies of utilizing LMM2 for association mapping: 1) It serves as an extension of univariate LMM, which could effectively correct population structure, but considers each SNP in isolation. 2) It is integrated with the multivariate regression model to discover association relationships between complex traits and multifactorial genetic loci. We refer to this second model as sparse Squared-LMM (sLMM2). Further, we extend LMM2/sLMM2 by raising the power of our squared model to the LMMn/sLMMn model. We demonstrate the practical use of our model with synthetic phenotypic variants generated from genetic loci of Arabidopsis thaliana. The experiment shows that our method achieves a more accurate and significant prediction on the association relationship between traits and loci. We also evaluate our models on collected phenotypes and genotypes with the number of candidate genes that the models could discover. The results suggest the potential and promising usage of our method in genome-wide association studies.


2020 ◽  
Vol 7 (1) ◽  
Author(s):  
J. M. Belanger ◽  
T. R. Famula ◽  
L. C. Gershony ◽  
M. K. Palij ◽  
A. M. Oberbauer

Abstract

Background: Idiopathic epilepsy (IE) is a common neurological disorder in the domestic dog, and is defined as repeated seizure activity having no identifiable underlying cause. Some breeds, such as the Belgian shepherd dog, have a greater prevalence of the disorder. Previous studies in this and other breeds have identified ADAM23 as a gene that confers risk of IE, although additional loci are known to exist. The present study sought to identify additional loci that influence IE in the Belgian shepherd dog.

Results: Genome-wide association studies (GWAS) revealed a significant association between IE and CFA 14 (p < 1.03E−08) and a suggestive association on CFA 37 (p < 2.91E−06) in a region in linkage disequilibrium with ADAM23. Logistic regression identified a 2-loci model that demonstrated interaction between the two chromosomal regions that when combined predicted IE risk with high sensitivity.

Conclusions: Two interacting loci, one each on CFAs 14 and 37, predictive of IE in the Belgian shepherd were identified. The loci are adjacent to potential candidate genes associated with neurological function. Further exploration of the region is warranted to identify causal variants underlying the association. Additionally, although the two loci were very good at predicting IE, they failed to capture all the risk, indicating additional loci or incomplete penetrance are also likely contributing to IE expression in the Belgian shepherd dog.


2016 ◽  
Author(s):  
Calum J. Maclean ◽  
Brian P.H. Metzger ◽  
Jian-Rong Yang ◽  
Wei-Chin Ho ◽  
Bryan Moyers ◽  
...  

Abstract: The budding yeast Saccharomyces cerevisiae is the best studied eukaryote in molecular and cell biology, but its utility for understanding the genetic basis of natural phenotypic variation is limited by the inefficiency of association mapping owing to strong and complex population structure. To facilitate association mapping, we analyzed 190 high-quality genomes of diverse strains, including 85 newly sequenced ones, to uncover yeast’s population structure that varies substantially among genomic regions. We identified 181 yeast genes that are absent from the reference genome and demonstrated their expression and role in important functions such as drug resistance. We then simultaneously measured the growth rates of over 4500 lab strains, each deficient of a nonessential gene, and 81 natural strains across multiple environments using a unique DNA barcode present in each strain. We combined the genome-wide reverse genetic information with genome-wide association analysis to determine potential genomic regions of importance to environmental adaptations, and for a subset experimentally validated their role by reciprocal hemizygosity tests. The resources provided permit efficient and reliable association mapping in yeast and significantly enhance its value as a model for understanding the genetic mechanisms of phenotypic polymorphism and evolution.


2020 ◽  
Vol 15 (2) ◽  
pp. 121-134
Author(s):  
Sonu Bharti

The content of cardiotonic arjunolic acid in Terminalia arjuna varies among the population. We studied the population structure and the association between the molecular markers and its active ingredient among 140 plants collected from various agroclimatic zones in India. Large variation was detected for arjunolic acid in this study, showing the suitability of the genotypes. The maximum arjunolic acid content was approximately 238 per cent higher than the lowest value for the genotypes and was found to be considerably correlated with bark thickness, bark fresh weight and bark dry weight. The population structure studies described the existence of nine subpopulations. As the distance increased between the associated markers, linkage disequilibrium (LD) reduction and a considerable reduction in LD decay were ascertained. Eleven QTL regions associated with arjunolic acid were identified from a genome-wide marker-trait association study. Fine-scale resolution detected significant LD among 3.4 per cent RAPD paired loci and 8.7 per cent ISSR paired loci and 6.7 per cent RAPD paired loci and 13.3 per cent ISSR paired loci. Importantly, LD decay was found to start at a distance of >20 bp from the loci on the genome of T. arjuna accessions. Finally, association mapping (AM) in arjun showed tight linkage to OPT09, which can be a possible substitute for QTL mapping methodology.


2022 ◽  
Vol 54 (4) ◽  
Author(s):  
Rizwan Qaiser ◽  
Zahid Akram ◽  
Shahzad Asad ◽  
Inam-Ul Haq ◽  
Saad Imran Malik ◽  
...  

Author(s):  
Tao Wang

The importance of the gene × gene (G × G) and gene × environment (G × E) interaction has been widely recognized. It is statistically challenging to account for interactions in the analysis of genome-wide association data. In this chapter, we introduce a gene-based method for modeling G × G and G × E interactions under the regression framework. We evaluate the type 1 error rate and power of this new method by simulations. We apply this method to the endometrial cancer case-control dataset.

