Technical Note: Efficient and accurate estimation of genotype odds ratios in biobank-based unbalanced case-control studies

2019 ◽  
Author(s):  
Rounak Dey ◽  
Seunggeun Lee

Abstract In genome-wide association studies (GWASs), genotype log-odds ratios (LORs) quantify the effects of the variants on the binary phenotypes, and calculating the genotype LORs for all of the markers is required for several downstream analyses. Calculating genotype LORs at a genome-wide scale is computationally challenging, especially when analyzing large-scale biobank data, which involves performing thousands of GWASs phenome-wide. Since most of the binary phenotypes in biobank-based studies have unbalanced (case : control = 1 : 10) or often extremely unbalanced (case : control = 1 : 100) case-control ratios, the existing methods cannot provide a scalable and accurate way to estimate the genotype LORs. Traditional logistic regression provides biased LOR estimates in such situations. Although the Firth bias correction method can provide unbiased LOR estimates, it is not scalable for the genome-wide or phenome-wide association analyses typically used in biobank-based studies, especially when the number of non-genetic covariates is large. On the other hand, the saddlepoint approximation-based test (fastSPA), which can provide accurate p values and is scalable to large-scale biobank data, does not provide genotype LOR estimates because it is a score-based test. Here, we propose a scalable method based on score statistics to accurately estimate the genotype LORs, adjusting for non-genetic covariates. Compared to the Firth method, our proposed method reduces the computational complexity from O(nK^2 + K^3) to O(n), where n is the sample size and K is the number of non-genetic covariates. Our method is ~10x faster than the Firth method when 15 covariates are being adjusted for. Through extensive numerical simulations, we show that the proposed method is both scalable and accurate in estimating the genotype ORs at genome-wide or phenome-wide scale.
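
As an illustrative aside, the sketch below shows how a score statistic can be turned into an approximate LOR at all. It is a generic textbook one-step (score divided by information) approximation computed from a covariate-only null model, and it is not the estimator proposed in the abstract above, whose details are not given here. The function names `fit_null_logistic`, `one_step_lor`, and `simulate`, as well as all simulation settings, are assumptions made for the example. This plain one-step estimate may well be inaccurate under the strongly unbalanced case-control ratios the abstract targets.

```python
import numpy as np


def fit_null_logistic(X, y, n_iter=25):
    """Fit a covariate-only logistic model by Newton-Raphson; return fitted means."""
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        mu = 1.0 / (1.0 + np.exp(-X @ beta))
        w = mu * (1.0 - mu)
        # Newton step: (X' W X)^{-1} X' (y - mu)
        step = np.linalg.solve(X.T @ (X * w[:, None]), X.T @ (y - mu))
        beta += step
        if np.max(np.abs(step)) < 1e-10:
            break
    return 1.0 / (1.0 + np.exp(-X @ beta))


def one_step_lor(g, X, y, mu):
    """One-step LOR approximation (score / information) for genotype vector g."""
    w = mu * (1.0 - mu)
    # Remove the part of g explained by the covariates (weighted projection).
    coef = np.linalg.solve(X.T @ (X * w[:, None]), X.T @ (w * g))
    g_adj = g - X @ coef
    score = g_adj @ (y - mu)          # score statistic for the genotype effect
    info = np.sum(w * g_adj ** 2)     # its variance under the null model
    return score / info, 1.0 / np.sqrt(info)


def simulate(n=20000, beta_g=0.3, seed=0):
    """Hypothetical, roughly balanced toy data with one covariate."""
    rng = np.random.default_rng(seed)
    X = np.column_stack([np.ones(n), rng.normal(size=n)])
    g = rng.binomial(2, 0.3, size=n).astype(float)
    eta = -0.2 + 0.5 * X[:, 1] + beta_g * g
    y = rng.binomial(1, 1.0 / (1.0 + np.exp(-eta))).astype(float)
    return g, X, y


if __name__ == "__main__":
    g, X, y = simulate()
    mu = fit_null_logistic(X, y)      # fitted once per phenotype
    lor, se = one_step_lor(g, X, y, mu)
    print(f"one-step LOR estimate: {lor:.3f} (SE {se:.3f})")
```

In this sketch the null model is fitted once per phenotype and reused for every variant; only the projection and two sums are repeated per variant, which is what makes score-based approaches attractive at genome-wide scale.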

2017 ◽  
Author(s):  
Carlo Maj ◽  
Elena Milanesi ◽  
Massimo Gennarelli ◽  
Luciano Milanesi ◽  
Ivan Merelli

In complex phenotypes (e.g., psychiatric diseases), single-locus tests, commonly performed in Genome-Wide Association Studies, have proven to be limited in discovering strong gene associations. A growing body of evidence suggests that epistatic non-linear effects may be responsible for complex phenotypes arising from the interaction of different biological factors. A major issue in epistasis analysis is the computational burden due to the huge number of statistical tests to be performed when considering all the potential genotype combinations. In this work, we developed a computationally efficient pipeline to investigate the presence of epistasis at a genome-wide scale in bipolar disorder, which is a typical example of a complex phenotype with a relevant but unexplained genetic background. By running our pipeline, we were able to identify 13 epistatic interactions between variants located in genes potentially involved in biological processes associated with the analyzed phenotype.


2020 ◽  
Vol 7 (11) ◽  
Author(s):  
Guillaume Butler-Laporte ◽  
Devin Kreuzer ◽  
Tomoko Nakanishi ◽  
Adil Harroud ◽  
Vincenzo Forgetta ◽  
...  

Abstract Background Infectious diseases are causally related to a large array of noncommunicable diseases (NCDs). Identifying genetic determinants of infections and antibody-mediated immune responses may shed light on this relationship and provide therapeutic targets for drug and vaccine development. Methods We used the UK Biobank cohort of up to 10 000 serological measurements of infectious diseases and genome-wide genotyping. We used data on 13 pathogens to define 46 phenotypes: 15 seropositivity case–control phenotypes and 31 quantitative antibody measurement phenotypes. For each of these, we performed genome-wide association studies (GWAS) using the fastGWA linear mixed model package and human leukocyte antigen (HLA) classical allele and amino acid residue association analyses using Lasso regression for variable selection. Results We included a total of 8735 individuals for case–control phenotypes, and an average (range) of 4286 (276–8555) samples per quantitative analysis. Fourteen of the GWAS yielded a genome-wide significant (P < 5 × 10^-8) locus at the major histocompatibility complex (MHC) on chromosome 6. Outside the MHC, we found a total of 60 loci, multiple associated with Epstein-Barr virus (EBV)–related NCDs (eg, RASA3, MED12L, and IRF4). FUT2 was also identified as an important gene for polyomaviridae. HLA analysis highlighted the importance of DRB1*09:01, DQB1*02:01, DQA1*01:02, and DQA1*03:01 in EBV serologies and of DRB1*15:01 in polyomaviridae. Conclusions We have identified multiple genetic variants associated with antibody immune response to 13 infections, many of which are biologically plausible therapeutic or vaccine targets. This may help prioritize future research and drug development.


2020 ◽  
Vol 87 (2) ◽  
pp. 57-64
Author(s):  
Filippova Tamara Vladimirovna ◽  
Khafizov Kamil Faridovich ◽  
Rudenko Vadim Igorevich ◽  
Rapoport Leonid Mikhailovich ◽  
Tsarichenko Dmitry Georgievich ◽  
...  

The article summarizes the findings of Russian and international studies of the genetic aspects of polygenic urolithiasis associated with impaired calcium metabolism. It analyzes the genetic risk factors of polygenic nephrolithiasis that show significant association with the disease in case-control studies and Genome-Wide Association Studies (16 genes). We describe the functions of the genes involved in concrement formation in polygenic nephrolithiasis. Modern molecular and genetic technologies (DNA microarrays, high-throughput DNA sequencing, etc.) enable identification of a genetic predisposition to a specific disease, individualized treatment of the patient, and timely preventive measures among the proband’s relatives.


2013 ◽  
Vol 20 (6) ◽  
pp. 875-887 ◽  
Author(s):  
Anja Rudolph ◽  
Rebecca Hein ◽  
Sara Lindström ◽  
Lars Beckmann ◽  
Sabine Behrens ◽  
...  

Women using menopausal hormone therapy (MHT) are at increased risk of developing breast cancer (BC). To detect genetic modifiers of the association between current use of MHT and BC risk, we conducted a meta-analysis of four genome-wide case-only studies followed by replication in 11 case–control studies. We used a case-only design to assess interactions between single-nucleotide polymorphisms (SNPs) and current MHT use on risk of overall and lobular BC. The discovery stage included 2920 cases (541 lobular) from four genome-wide association studies. The top 1391 SNPs showing P values for interaction (Pint) <3.0×10−3 were selected for replication using pooled case–control data from 11 studies of the Breast Cancer Association Consortium, including 7689 cases (676 lobular) and 9266 controls. Fixed-effects meta-analysis was used to derive combined Pint. No SNP reached genome-wide significance in either the discovery or combined stage. We observed effect modification of current MHT use on overall BC risk by two SNPs on chr13 near POMP (combined Pint≤8.9×10−6), two SNPs in SLC25A21 (combined Pint≤4.8×10−5), and three SNPs in PLCG2 (combined Pint≤4.5×10−5). The association between current MHT use and lobular BC risk was potentially modified by one SNP in TMEFF2 (combined Pint≤2.7×10−5), one SNP in CD80 (combined Pint≤8.2×10−6), three SNPs on chr17 near TMEM132E (combined Pint≤2.2×10−6), and two SNPs on chr18 near SLC25A52 (combined Pint≤4.6×10−5). In conclusion, polymorphisms in genes related to solute transportation in mitochondria, transmembrane signaling, and immune cell activation potentially modify the BC risk associated with current use of MHT. These findings warrant replication in independent studies.
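
For readers unfamiliar with the fixed-effects meta-analysis mentioned in the abstract above, here is a minimal Python sketch of the standard inverse-variance-weighted version. It is offered as an illustration only: the abstract does not state which software or exact weighting the authors used, and the function name `fixed_effects_meta` and the input numbers are hypothetical.

```python
import math


def fixed_effects_meta(betas, ses):
    """Inverse-variance-weighted fixed-effects combination of per-study estimates.

    betas: per-study effect estimates (e.g., log interaction odds ratios)
    ses:   their standard errors
    Returns the combined estimate, its standard error, and a two-sided
    p value from the normal approximation.
    """
    weights = [1.0 / se ** 2 for se in ses]
    total = sum(weights)
    beta = sum(w * b for w, b in zip(weights, betas)) / total
    se = math.sqrt(1.0 / total)
    z = beta / se
    p = math.erfc(abs(z) / math.sqrt(2.0))  # two-sided normal p value
    return beta, se, p


if __name__ == "__main__":
    # Hypothetical per-study log odds ratios and standard errors.
    beta, se, p = fixed_effects_meta([0.2, 0.4], [0.1, 0.1])
    print(f"combined beta = {beta:.3f}, SE = {se:.4f}, p = {p:.2e}")
```

Studies with smaller standard errors receive larger weights, so the combined estimate is dominated by the most precise studies; the fixed-effects model assumes a single shared true effect across all of them.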


2018 ◽  
Author(s):  
Iryna Lobach ◽  
Joshua Sampson ◽  
Siarhei Lobach ◽  
Alexander Alekseyenko ◽  
Alexandra Pryatinska ◽  
...  

Abstract One of the most important research areas in case-control Genome-Wide Association Studies is to determine how the effect of a genotype varies across the environment, that is, to measure the gene-environment interaction (GxE). We consider the scenario in which some of the “healthy” controls actually have the disease and the frequency of these latent cases varies by the environmental variable of interest. In this scenario, performing logistic regression of clinically defined case status on the genetic variant, environmental variable, and their interaction will result in biased estimates of the GxE interaction. Here, we derive a general theoretical approximation to the bias in the estimates of the GxE interaction and show, through extensive simulation, that this approximation is accurate in finite samples. Moreover, we apply this approximation to evaluate the bias in the effect estimates of the genetic variants related to mitochondrial proteins in a large-scale Prostate Cancer study.


2018 ◽  
Author(s):  
Lorin Crawford ◽  
Xiang Zhou

Abstract Epistasis, commonly defined as the interaction between genetic loci, is an important contributor to the genetic architecture underlying many complex traits and common diseases. Most existing epistatic mapping methods in genome-wide association studies explicitly search over all pairwise or higher-order interactions. However, due to the potentially large search space and the resulting multiple testing burden, these conventional approaches often suffer from heavy computational cost and low statistical power. A recently proposed attractive alternative for mapping epistasis focuses instead on detecting marginal epistasis, which is defined as the combined pairwise interaction effects between a given variant and all other variants. By searching for marginal epistatic effects, one can identify genetic variants that are involved in epistasis without the need to identify the exact partners with which the variants interact, thus potentially alleviating much of the statistical and computational burden associated with conventional epistatic mapping procedures. However, previous marginal epistatic mapping methods are based on quantitative trait models. As we will show here, these lack statistical power in case-control studies. Here, we develop a liability threshold mixed model that extends marginal epistatic mapping to case-control studies. Our method properly accounts for case-control ascertainment and the binary nature of case-control data. We refer to this method as the liability threshold marginal epistasis test (LT-MAPIT). With simulations, we illustrate the benefits of LT-MAPIT in terms of providing effective type I error control, and being more powerful than both existing marginal epistatic mapping methods and conventional explicit search-based approaches in case-control data. We finally apply LT-MAPIT to identify both marginal and pairwise epistasis in seven complex diseases from the Wellcome Trust Case Control Consortium (WTCCC) 1 study.


2011 ◽  
Vol 8 (2) ◽  
pp. 204-221 ◽  
Author(s):  
Gürkan Üstünkar ◽  
Yeşim Aydın Son

Summary Recently, there has been increasing research to discover genomic biomarkers, haplotypes, and potentially other variables that together contribute to the development of diseases. Single Nucleotide Polymorphisms (SNPs) are the most common form of genomic variation and can represent an individual’s genetic variability in the greatest detail. Genome-wide association studies (GWAS) of SNPs, which are high-dimensional case-control studies, are among the most promising approaches for identifying disease-causing variants. METU-SNP is a Java-based integrated desktop application specifically designed for the prioritization of SNP biomarkers and the discovery of genes and pathways related to diseases via analysis of GWAS case-control data. Outputs of METU-SNP can easily be used in downstream biomarker research to support the prediction and diagnosis of diseases and other personalized medical approaches. Here, we introduce and describe the system functionality and architecture of METU-SNP. We believe that METU-SNP will help researchers with the reliable identification of SNPs that are involved in the etiology of complex diseases, ultimately supporting the development of personalized medicine approaches and targeted drug discoveries.


2020 ◽  
Vol 4 ◽  
pp. 247054702092484 ◽  
Author(s):  
Frank R. Wendt ◽  
Gita A. Pathak ◽  
Daniel S. Tylee ◽  
Aranyak Goswami ◽  
Renato Polimanti

Genome-wide association studies (GWAS) have been performed for many psychiatric disorders and revealed a complex polygenic architecture linking mental and physical health phenotypes. Psychiatric diagnoses are often heterogeneous, and several layers of trait heterogeneity may contribute to detection of genetic risks per disorder or across multiple disorders. In this review, we discuss these heterogeneities and their consequences on the discovery of risk loci using large-scale genetic data. We primarily highlight the ways in which sex and diagnostic complexity contribute to risk locus discovery in schizophrenia, bipolar disorder, attention deficit hyperactivity disorder, autism spectrum disorder, posttraumatic stress disorder, major depressive disorder, obsessive-compulsive disorder, Tourette’s syndrome and chronic tic disorder, anxiety disorders, suicidality, feeding and eating disorders, and substance use disorders. Genetic data have also facilitated the discovery of clinically relevant subphenotypes, which are also described here. Collectively, GWAS of psychiatric disorders revealed that understanding heterogeneity, polygenicity, and pleiotropy is critical to translating genetic findings into treatment strategies.



