A Simple Approximation To The Bias Of Gene-Environment Interactions In Case/Control Studies With Silent Disease

2018 ◽  
Author(s):  
Iryna Lobach ◽  
Joshua Sampson ◽  
Siarhei Lobach ◽  
Alexander Alekseyenko ◽  
Alexandra Pryatinska ◽  
...  

Abstract: One of the most important research areas in case-control Genome-Wide Association Studies is to determine how the effect of a genotype varies across the environment, that is, to measure the gene-environment interaction (GxE). We consider the scenario in which some of the “healthy” controls actually have the disease and the frequency of these latent cases varies by the environmental variable of interest. In this scenario, performing logistic regression of clinically defined case status on the genetic variant, environmental variable, and their interaction will result in biased estimates of the GxE interaction. Here, we derive a general theoretical approximation to the bias in the estimates of the GxE interaction and show, through extensive simulation, that this approximation is accurate in finite samples. Moreover, we apply this approximation to evaluate the bias in the effect estimates of the genetic variants related to mitochondrial proteins in a large-scale Prostate Cancer study.
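To make the source of the bias concrete, the following minimal sketch computes the large-sample GxE interaction that a saturated logistic model would recover from the clinical diagnosis when part of the truly diseased population is undiagnosed. It is not the authors' approximation. It assumes a binary genotype and a binary environment, and the parameter values and the `miss` probabilities are hypothetical. Sampling cases and controls on the clinical diagnosis shifts only the intercept, so the interaction contrast is unaffected by the sampling itself.

```python
import math


def expit(x):
    return 1.0 / (1.0 + math.exp(-x))


def logit(p):
    return math.log(p / (1.0 - p))


def observed_interaction(b0, b_g, b_e, b_ge, miss):
    """Interaction contrast on the logit scale for the clinical diagnosis.

    miss[e] is the assumed probability that a truly diseased person with
    environment e is undiagnosed and is therefore counted as a control.
    """
    def p_obs(g, e):
        p_true = expit(b0 + b_g * g + b_e * e + b_ge * g * e)
        return p_true * (1.0 - miss[e])

    return (logit(p_obs(1, 1)) - logit(p_obs(1, 0))
            - logit(p_obs(0, 1)) + logit(p_obs(0, 0)))


true_b_ge = 0.4  # hypothetical true interaction (log-odds scale)

# No latent cases: the contrast equals the true interaction.
bias_none = observed_interaction(-1.0, 0.5, 0.3, true_b_ge, {0: 0.0, 1: 0.0}) - true_b_ge

# Latent cases are more frequent when e = 1: the contrast is biased.
bias_latent = observed_interaction(-1.0, 0.5, 0.3, true_b_ge, {0: 0.1, 1: 0.4}) - true_b_ge

print(bias_none, bias_latent)
```

With these hypothetical values the interaction is attenuated noticeably even though the disease model itself is unchanged, because the undiagnosed fraction differs between the two environmental strata.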

2019 ◽  
Author(s):  
Rounak Dey ◽  
Seunggeun Lee

Abstract: In genome-wide association studies (GWASs), genotype log-odds ratios (LORs) quantify the effects of the variants on the binary phenotypes, and calculating the genotype LORs for all of the markers is required for several downstream analyses. Calculating genotype LORs at a genome-wide scale is computationally challenging, especially when analyzing large-scale biobank data, which involves performing thousands of GWASs phenome-wide. Since most of the binary phenotypes in biobank-based studies have unbalanced (case : control = 1 : 10) or often extremely unbalanced (case : control = 1 : 100) case-control ratios, the existing methods cannot provide a scalable and accurate way to estimate the genotype LORs. The traditional logistic regression provides biased LOR estimates in such situations. Although the Firth bias correction method can provide unbiased LOR estimates, it is not scalable for genome-wide or phenome-wide scale association analyses typically used in biobank-based studies, especially when the number of non-genetic covariates is large. On the other hand, the saddlepoint approximation-based test (fastSPA), which can provide accurate p values and is scalable to analyse large-scale biobank data, does not provide the genotype LOR estimates as it is a score-based test. Here, we propose a scalable method based on score statistics to accurately estimate the genotype LORs, adjusting for non-genetic covariates. Compared with the Firth method, our proposed method reduces the computational complexity from O(nK² + K³) to O(n), where n is the sample size and K is the number of non-genetic covariates. Our method is ~10x faster than the Firth method when 15 covariates are being adjusted for. Through extensive numerical simulations, we show that the proposed method is both scalable and accurate in estimating the genotype ORs in genome-wide or phenome-wide scale.
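As a rough illustration of why score statistics allow an O(n) estimate per variant, the sketch below computes a textbook one-step approximation to the genotype LOR: a single Newton step from the intercept-only null model, i.e. the score divided by its null variance. This is not the estimator proposed in the abstract. It assumes no covariates, and the function name `one_step_lor` and the toy data are hypothetical.

```python
def one_step_lor(genotypes, phenotypes):
    """One-step score-based approximation to the genotype log-odds ratio.

    Starts from the intercept-only null model and takes one Newton step:
    beta is approximated by U / V, where U is the score statistic and V
    is its variance under the null. Each sum is a single pass over n.
    """
    n = len(genotypes)
    g_bar = sum(genotypes) / n
    y_bar = sum(phenotypes) / n
    score = sum((g - g_bar) * (y - y_bar)
                for g, y in zip(genotypes, phenotypes))
    info = y_bar * (1.0 - y_bar) * sum((g - g_bar) ** 2 for g in genotypes)
    return score / info


# Toy data: 3 of 4 carriers are cases, 1 of 4 non-carriers is a case.
g = [0, 0, 0, 0, 1, 1, 1, 1]
y = [0, 0, 0, 1, 0, 1, 1, 1]
beta = one_step_lor(g, y)
print(beta)
```

For this toy 2x2 table the maximum-likelihood LOR is log 9 (about 2.20), so the one-step value is close but not identical. With unbalanced case-control ratios and rare variants such simple approximations degrade, which is the setting the abstract addresses.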


2012 ◽  
Vol 6 ◽  
pp. BBI.S9867 ◽  
Author(s):  
Guanjie Chen ◽  
Ao Yuan ◽  
Jie Zhou ◽  
Amy R. Bentley ◽  
Adebowale Adeyemo ◽  
...  

Missing heritability is still a challenge for Genome Wide Association Studies (GWAS). Gene-gene interactions may partially explain this residual genetic influence and contribute broadly to complex disease. To analyze the gene-gene interactions in case-control studies of complex disease, we propose a simple, non-parametric method that utilizes the F-statistic. This approach consists of three steps. First, we examine the joint distribution of a pair of SNPs in cases and controls separately. Second, an F-test is used to evaluate the ratio of dependence in cases to that of controls. Finally, results are adjusted for multiple tests. This method was used to evaluate gene-gene interactions that are associated with risk of Type 2 Diabetes among African Americans in the Howard University Family Study. We identified 18 gene-gene interactions (P < 0.0001). Compared with the commonly used logistic regression method, we demonstrate that the F-ratio test is an efficient approach to measuring gene-gene interactions, especially for studies with limited sample size.


2017 ◽  
Author(s):  
Wei Zhou ◽  
Jonas B. Nielsen ◽  
Lars G. Fritsche ◽  
Rounak Dey ◽  
Maiken E. Gabrielsen ◽  
...  

Abstract: In genome-wide association studies (GWAS) for thousands of phenotypes in large biobanks, most binary traits have substantially fewer cases than controls. Both of the widely used approaches, the linear mixed model and the recently proposed logistic mixed model, perform poorly, producing large type I error rates, in the analysis of phenotypes with unbalanced case-control ratios. Here we propose a scalable and accurate generalized mixed model association test that uses the saddlepoint approximation (SPA) to calibrate the distribution of score test statistics. This method, SAIGE, provides accurate p-values even when case-control ratios are extremely unbalanced. It uses state-of-the-art optimization strategies to reduce the computational time and memory cost of fitting the generalized mixed model. The computational cost depends linearly on sample size, so the method is applicable to GWAS of thousands of phenotypes in large biobanks. Through the analysis of UK Biobank data of 408,961 white British European-ancestry samples for >1400 binary phenotypes, we show that SAIGE can efficiently analyze large sample data, controlling for unbalanced case-control ratios and sample relatedness.
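The idea of calibrating a score statistic with a saddlepoint approximation can be sketched in a simplified setting. The code below is not SAIGE: it assumes independent samples with known null case probabilities `mu`, no mixed model, and no continuity correction. It applies the Barndorff-Nielsen form of the saddlepoint tail formula to the score statistic S = Σ gᵢ(Yᵢ − μᵢ) and compares it with the normal approximation. All names and the toy data are hypothetical.

```python
import math


def normal_sf(z):
    """Upper tail probability of the standard normal distribution."""
    return 0.5 * math.erfc(z / math.sqrt(2.0))


def spa_upper_tail(g, mu, s):
    """Saddlepoint approximation to the upper tail of S at s.

    S = sum of g_i * (Y_i - mu_i), with Y_i independent Bernoulli(mu_i).
    Assumes g_i >= 0 and 0 < s < sum of g_i * (1 - mu_i).
    """
    def case_prob(gi, m, t):  # tilted probability of Y_i = 1
        e = m * math.exp(gi * t)
        return e / (1.0 - m + e)

    def k0(t):  # cumulant generating function of S
        return sum(math.log(1.0 - m + m * math.exp(gi * t)) - gi * m * t
                   for gi, m in zip(g, mu))

    def k1(t):  # first derivative of k0
        return sum(gi * (case_prob(gi, m, t) - m) for gi, m in zip(g, mu))

    def k2(t):  # second derivative of k0
        return sum(gi * gi * case_prob(gi, m, t) * (1.0 - case_prob(gi, m, t))
                   for gi, m in zip(g, mu))

    # Solve k1(t) = s by bisection; k1 is increasing in t.
    lo, hi = 0.0, 1.0
    while k1(hi) < s:
        hi *= 2.0
    for _ in range(100):
        mid = 0.5 * (lo + hi)
        if k1(mid) < s:
            lo = mid
        else:
            hi = mid
    t = 0.5 * (lo + hi)

    w = math.sqrt(2.0 * (t * s - k0(t)))
    v = t * math.sqrt(k2(t))
    return normal_sf(w + math.log(v / w) / w)


# Toy data: 200 samples, 1% case probability, 20 carriers, 2 carrier cases.
g = [1] * 20 + [0] * 180
mu = [0.01] * 200
s = 2 - sum(gi * m for gi, m in zip(g, mu))  # observed score
var = sum(gi * gi * m * (1.0 - m) for gi, m in zip(g, mu))

p_normal = normal_sf(s / math.sqrt(var))
p_spa = spa_upper_tail(g, mu, s)
print(p_normal, p_spa)
```

In this toy example the normal approximation yields a far smaller p-value than the saddlepoint approximation, which illustrates how a normal reference distribution can inflate type I error when cases are rare.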


2019 ◽  
Author(s):  
Zhe Wang ◽  
Han Chen ◽  
Traci M. Bartz ◽  
Lawrence F. Bielak ◽  
Daniel I. Chasman ◽  
...  

Abstract: Background: Alcohol intake influences plasma lipid levels and such effects may be modulated by genetic variants. Objective: We aimed to characterize the role of aggregated rare and low-frequency variants in gene by alcohol consumption interactions associated with fasting plasma lipid levels. Design: In the Cohorts for Heart and Aging Research in Genomic Epidemiology (CHARGE) consortium, fasting plasma triglycerides (TG), and high- and low-density lipoprotein cholesterol (HDL-c and LDL-c) were measured in 34,153 European Americans from five discovery studies and 32,275 individuals from six replication studies. Rare and low-frequency protein coding variants (minor allele frequency ≤ 5%) measured by an exome array were aggregated by genes and evaluated by a gene-environment interaction (GxE) test and a joint test of genetic main and GxE interaction effects. Two dichotomous self-reported alcohol consumption variables, current drinker, defined as any recurrent drinking behavior, and regular drinker, defined as the subset of current drinkers who consume at least two drinks per week, were considered. Results: We discovered and replicated 21 gene-lipid associations at 13 known lipid loci through the joint test. Eight loci (PCSK9, LPA, LPL, LIPG, ANGPTL4, APOB, APOC3 and CD300LG) remained significant after conditioning on the common index single nucleotide polymorphism (SNP) identified by previous genome-wide association studies, suggesting an independent role for rare and low-frequency variants at these loci. One significant gene-alcohol interaction on TG was discovered at a Bonferroni corrected significance level (p-value < 5×10⁻⁵) and replicated (p-value < 0.013 for the interaction test) in SMC5. Conclusions: In conclusion, this study applied new gene-based statistical approaches to uncover the role of rare and low-frequency variants in gene-alcohol consumption interactions on lipid levels.


2020 ◽  
Vol 3 (2) ◽  
pp. 25-30
Author(s):  
Renata Zunec

Coronavirus disease 2019 (COVID-19), caused by severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), is reported to vary across different populations in the prevalence of infection, in the death rate of patients, in the severity of symptoms and in the drug response of patients. Among host genetic factors that can influence all these attributes, the human leukocyte antigen (HLA) genetic system stands out as one of the leading candidates. Case-control studies, large-scale population-based studies, as well as experimental bioinformatics studies are of utmost importance to confirm the HLA susceptibility spectrum of COVID-19. This review presents the results of the first case-control and epidemiological studies performed in several populations, early after the pandemic outbreak. The results point to several susceptible and protective HLA allele and haplotype associations with COVID-19, some of which might be of interest for future studies in Croatia, due to their common presence in the population. However, further investigations from around the world, as numerous as possible, are needed to confirm or refute these preliminary results.


2018 ◽  
Author(s):  
Omer Weissbrod ◽  
Jonathan Flint ◽  
Saharon Rosset

Abstract: Methods that estimate heritability and genetic correlations from genome-wide association studies have proven to be powerful tools for investigating the genetic architecture of common diseases and exposing unexpected relationships between disorders. Many relevant studies employ a case-control design, yet most methods are primarily geared towards analyzing quantitative traits. Here we investigate the validity of three common methods for estimating genetic heritability and genetic correlation. We find that the Phenotype-Correlation-Genotype-Correlation (PCGC) approach is the only method that can estimate both quantities accurately in the presence of important non-genetic risk factors, such as age and sex. We extend PCGC to work with summary statistics that take the case-control sampling into account, and demonstrate that our new method, PCGC-s, accurately estimates both heritability and genetic correlations and can be applied to large data sets without requiring individual-level genotypic or phenotypic information. Finally, we use PCGC-s to estimate the genetic correlation between schizophrenia and bipolar disorder, and demonstrate that previous estimates are biased due to incorrect handling of sex as a strong risk factor. PCGC-s is available at https://github.com/omerwe/PCGCs.


2020 ◽  
Vol 87 (2) ◽  
pp. 57-64
Author(s):  
Filippova Tamara Vladimirovna ◽  
Khafizov Kamil Faridovich ◽  
Rudenko Vadim Igorevich ◽  
Rapoport Leonid Mikhailovich ◽  
Tsarichenko Dmitry Georgievich ◽  
...  

The article summarizes the findings of Russian and international studies of the genetic aspects of polygenic urolithiasis associated with impairment of calcium metabolism. The article analyzes the genetic risk factors of polygenic nephrolithiasis that show significant association with the disease in case-control studies and Genome-Wide Association Studies (16 genes). We described the gene functions involved in concrement formation in polygenic nephrolithiasis. The modern molecular and genetic technologies (DNA microarray, high-throughput DNA sequencing, etc.) enable identification of the genetic predisposition to a specific disease, realization of the individualized treatment of the patient, and carrying out timely preventive measures among the proband’s relatives.


2018 ◽  
Author(s):  
Lorin Crawford ◽  
Xiang Zhou

Abstract: Epistasis, commonly defined as the interaction between genetic loci, is an important contributor to the genetic architecture underlying many complex traits and common diseases. Most existing epistatic mapping methods in genome-wide association studies explicitly search over all pairwise or higher-order interactions. However, due to the potentially large search space and the resulting multiple testing burden, these conventional approaches often suffer from heavy computational cost and low statistical power. A recently proposed attractive alternative for mapping epistasis focuses instead on detecting marginal epistasis, which is defined as the combined pairwise interaction effects between a given variant and all other variants. By searching for marginal epistatic effects, one can identify genetic variants that are involved in epistasis without the need to identify the exact partners with which the variants interact, thus potentially alleviating much of the statistical and computational burden associated with conventional epistatic mapping procedures. However, previous marginal epistatic mapping methods are based on quantitative trait models. As we will show here, these lack statistical power in case-control studies. Here, we develop a liability threshold mixed model that extends marginal epistatic mapping to case-control studies. Our method properly accounts for case-control ascertainment and the binary nature of case-control data. We refer to this method as the liability threshold marginal epistasis test (LT-MAPIT). With simulations, we illustrate the benefits of LT-MAPIT in terms of providing effective type I error control, and being more powerful than both existing marginal epistatic mapping methods and conventional explicit search-based approaches in case-control data. We finally apply LT-MAPIT to identify both marginal and pairwise epistasis in seven complex diseases from the Wellcome Trust Case Control Consortium (WTCCC) 1 study.


2011 ◽  
Vol 8 (2) ◽  
pp. 204-221 ◽  
Author(s):  
Gürkan Üstünkar ◽  
Yeşim Aydın Son

Summary: Recently, there has been increasing research to discover genomic biomarkers, haplotypes, and potentially other variables that together contribute to the development of diseases. Single Nucleotide Polymorphisms (SNPs) are the most common form of genomic variation and they can represent an individual’s genetic variability in greatest detail. Genome-wide association studies (GWAS) of SNPs, which are high-dimensional case-control studies, are among the most promising approaches for identifying disease-causing variants. METU-SNP is a Java-based integrated desktop application specifically designed for the prioritization of SNP biomarkers and the discovery of genes and pathways related to diseases via analysis of GWAS case-control data. Outputs of METU-SNP can easily be used in downstream biomarker research to support the prediction and diagnosis of diseases and other personalized medical approaches. Here, we introduce and describe the system functionality and architecture of METU-SNP. We believe that METU-SNP will help researchers with the reliable identification of SNPs that are involved in the etiology of complex diseases, ultimately supporting the development of personalized medicine approaches and targeted drug discoveries.

