Technical Note: Efficient and accurate estimation of genotype odds ratios in biobank-based unbalanced case-control studies

Mapping Intimacies ◽

10.1101/646018 ◽

2019 ◽

Author(s):

Rounak Dey ◽

Seunggeun Lee

Keyword(s):

Large Scale ◽

Association Studies ◽

Case Control ◽

Accurate Estimation ◽

Genome Wide Association Studies ◽

Odds Ratios ◽

Case Control Studies ◽

Genome Wide ◽

A Genome ◽

Wide Scale

Abstract In genome-wide association studies (GWASs), genotype log-odds ratios (LORs) quantify the effects of variants on binary phenotypes, and several downstream analyses require genotype LORs for all markers. Calculating genotype LORs at a genome-wide scale is computationally challenging, especially for large-scale biobank data, where thousands of GWASs are performed phenome-wide. Because most binary phenotypes in biobank-based studies have unbalanced (case : control = 1 : 10) or even extremely unbalanced (case : control = 1 : 100) case-control ratios, existing methods cannot estimate genotype LORs both scalably and accurately. Traditional logistic regression provides biased LOR estimates in such situations. Although the Firth bias-correction method can provide unbiased LOR estimates, it does not scale to the genome-wide or phenome-wide association analyses typical of biobank-based studies, especially when the number of non-genetic covariates is large. On the other hand, the saddlepoint approximation-based test (fastSPA), which provides accurate p values and scales to large biobank data, does not provide genotype LOR estimates because it is a score-based test. Here, we propose a scalable method based on score statistics to accurately estimate genotype LORs while adjusting for non-genetic covariates. Compared with the Firth method, our method reduces the computational complexity from O(nK² + K³) to O(n), where n is the sample size and K is the number of non-genetic covariates. Our method is about 10 times faster than the Firth method when 15 covariates are adjusted for. Through extensive numerical simulations, we show that the proposed method is both scalable and accurate in estimating genotype ORs at genome-wide and phenome-wide scales.
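
To illustrate why a score-based LOR estimate costs only O(n) per variant, here is a minimal Python sketch of a generic one-step estimate, U/V (a single Newton step from β = 0). It is not the authors' method, which adds corrections for unbalanced case-control ratios and non-genetic covariates. The sketch assumes an intercept-only null model with no covariates, and the function name `score_one_step_lor` is hypothetical.

```python
from typing import Sequence


def score_one_step_lor(genotypes: Sequence[float], phenotypes: Sequence[int]) -> float:
    """One-step (score-based) approximation of the genotype log-odds ratio.

    Hypothetical helper. Assumes an intercept-only null model (no non-genetic
    covariates), so the fitted null case probability is the same for every
    sample. Each sum is a single pass over the data, so the cost is O(n).
    """
    n = len(genotypes)
    if n == 0 or n != len(phenotypes):
        raise ValueError("genotypes and phenotypes must be non-empty and of equal length")
    mu = sum(phenotypes) / n  # fitted null probability = case fraction
    if mu in (0.0, 1.0):
        raise ValueError("need both cases and controls")
    g_bar = sum(genotypes) / n
    # Score for the genotype effect, evaluated at beta = 0.
    u = sum(g * (y - mu) for g, y in zip(genotypes, phenotypes))
    # Information for beta after projecting out the intercept.
    v = mu * (1 - mu) * sum((g - g_bar) ** 2 for g in genotypes)
    if v == 0:
        raise ValueError("monomorphic variant: genotype has no variance")
    return u / v


if __name__ == "__main__":
    # Toy data (hypothetical): allele dosages 0/1/2 and case (1) / control (0) status.
    g = [0, 0, 1, 1, 2, 2]
    y = [0, 0, 0, 1, 1, 1]
    print(score_one_step_lor(g, y))  # 2.0
```

Estimators of this one-step kind are generally accurate only for small effects. Under strong case-control imbalance they can be biased, which is the gap the abstract says the proposed method addresses.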

A General Framework for Two-Stage Analysis of Genome-wide Association Studies and Its Application to Case-Control Studies

The American Journal of Human Genetics ◽

10.1016/j.ajhg.2012.03.007 ◽

2012 ◽

Vol 90 (5) ◽

pp. 760-773 ◽

Cited By ~ 15

Author(s):

James M.S. Wason ◽

Frank Dudbridge

Keyword(s):

General Framework ◽

Association Studies ◽

Case Control ◽

Genome Wide Association ◽

Genome Wide Association Studies ◽

Case Control Studies ◽

Two Stage ◽

Genome Wide ◽

Stage Analysis

Epistasis analysis reveals associations between gene variants and bipolar disorder

10.7287/peerj.preprints.3242v1 ◽

2017 ◽

Author(s):

Carlo Maj ◽

Elena Milanesi ◽

Massimo Gennarelli ◽

Luciano Milanesi ◽

Ivan Merelli

Keyword(s):

Bipolar Disorder ◽

Association Studies ◽

Statistical Tests ◽

Genome Wide Association Studies ◽

Epistasis Analysis ◽

Complex Phenotypes ◽

Genome Wide ◽

A Genome ◽

Gene Associations ◽

Wide Scale

In complex phenotypes (e.g., psychiatric diseases), single-locus tests, as commonly performed in genome-wide association studies, have proven limited in discovering strong gene associations. A growing body of evidence suggests that non-linear epistatic effects may be responsible for complex phenotypes arising from the interaction of different biological factors. A major issue in epistasis analysis is the computational burden caused by the huge number of statistical tests required when all potential genotype combinations are considered. In this work, we developed a computationally efficient pipeline to investigate epistasis at a genome-wide scale in bipolar disorder, a typical example of a complex phenotype with a relevant but unexplained genetic background. Running our pipeline, we identified 13 epistatic interactions between variants located in genes potentially involved in biological processes associated with the analyzed phenotype.
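
To give a sense of the testing burden described above, the sketch below counts the tests in an exhaustive pairwise scan and computes the corresponding Bonferroni per-test threshold. The variant count of 500,000 is a hypothetical round number, not taken from the study. The function names are illustrative, and Bonferroni is only one of several possible multiple-testing corrections.

```python
def n_pairwise_tests(m: int) -> int:
    """Number of unordered variant pairs (m choose 2) in an exhaustive pairwise scan."""
    if m < 0:
        raise ValueError("m must be non-negative")
    return m * (m - 1) // 2


def bonferroni_threshold(alpha: float, n_tests: int) -> float:
    """Per-test significance level that controls the family-wise error rate at alpha."""
    return alpha / n_tests


if __name__ == "__main__":
    m = 500_000  # hypothetical number of genotyped variants
    pairs = n_pairwise_tests(m)
    print(pairs)                              # 124999750000
    print(bonferroni_threshold(0.05, pairs))  # roughly 4e-13
```

The count grows quadratically in m, so both the runtime and the stringency of the per-test threshold worsen rapidly as more variants are included.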

Genetic Determinants of Antibody-Mediated Immune Responses to Infectious Diseases Agents: A Genome-Wide and HLA Association Study

Open Forum Infectious Diseases ◽

10.1093/ofid/ofaa450 ◽

2020 ◽

Vol 7 (11) ◽

Author(s):

Guillaume Butler-Laporte ◽

Devin Kreuzer ◽

Tomoko Nakanishi ◽

Adil Harroud ◽

Vincenzo Forgetta ◽

...

Keyword(s):

Infectious Diseases ◽

Immune Responses ◽

Vaccine Development ◽

Association Studies ◽

Case Control ◽

Genome Wide Association Studies ◽

Genetic Determinants ◽

HLA Association ◽

Genome Wide ◽

A Genome

Abstract Background Infectious diseases are causally related to a large array of noncommunicable diseases (NCDs). Identifying genetic determinants of infections and antibody-mediated immune responses may shed light on this relationship and provide therapeutic targets for drug and vaccine development. Methods We used the UK Biobank cohort of up to 10 000 serological measurements of infectious diseases and genome-wide genotyping. We used data on 13 pathogens to define 46 phenotypes: 15 seropositivity case–control phenotypes and 31 quantitative antibody measurement phenotypes. For each of these, we performed genome-wide association studies (GWAS) using the fastGWA linear mixed model package, and human leukocyte antigen (HLA) classical allele and amino acid residue association analyses using Lasso regression for variable selection. Results We included a total of 8735 individuals for case–control phenotypes, and an average (range) of 4286 (276–8555) samples per quantitative analysis. Fourteen of the GWAS yielded a genome-wide significant (P < 5 × 10−8) locus at the major histocompatibility complex (MHC) on chromosome 6. Outside the MHC, we found a total of 60 loci, including several associated with Epstein-Barr virus (EBV)–related NCDs (eg, RASA3, MED12L, and IRF4). FUT2 was also identified as an important gene for polyomaviridae. HLA analysis highlighted the importance of DRB1*09:01, DQB1*02:01, DQA1*01:02, and DQA1*03:01 in EBV serologies and of DRB1*15:01 in polyomaviridae. Conclusions We have identified multiple genetic variants associated with antibody immune responses to 13 infections, many of which are biologically plausible therapeutic or vaccine targets. This may help prioritize future research and drug development.

Genetic factors of polygenic urolithiasis

Urologia Journal ◽

10.1177/0391560319898375 ◽

2020 ◽

Vol 87 (2) ◽

pp. 57-64

Author(s):

Filippova Tamara Vladimirovna ◽

Khafizov Kamil Faridovich ◽

Rudenko Vadim Igorevich ◽

Rapoport Leonid Mikhailovich ◽

Tsarichenko Dmitry Georgievich ◽

...

Keyword(s):

Genetic Factors ◽

Preventive Measures ◽

Association Studies ◽

Case Control ◽

International Studies ◽

Genome Wide Association Studies ◽

Case Control Studies ◽

Genome Wide ◽

Genetic Technologies ◽

High-Throughput DNA Sequencing

The article summarizes the findings of Russian and international studies on the genetic aspects of polygenic urolithiasis associated with impaired calcium metabolism. It analyzes the genetic risk factors of polygenic nephrolithiasis (16 genes) that show significant association with the disease in case-control studies and genome-wide association studies. We describe the functions of the genes involved in stone (concrement) formation in polygenic nephrolithiasis. Modern molecular genetic technologies (DNA microarrays, high-throughput DNA sequencing, etc.) enable identification of the genetic predisposition to a specific disease, individualized treatment of the patient, and timely preventive measures among the proband’s relatives.

Genetic modifiers of menopausal hormone replacement therapy and breast cancer risk: a genome-wide interaction study

Endocrine Related Cancer ◽

10.1530/erc-13-0349 ◽

2013 ◽

Vol 20 (6) ◽

pp. 875-887 ◽

Cited By ~ 16

Author(s):

Anja Rudolph ◽

Rebecca Hein ◽

Sara Lindström ◽

Lars Beckmann ◽

Sabine Behrens ◽

...

Keyword(s):

Breast Cancer ◽

Fixed Effects ◽

Association Studies ◽

Meta Analysis ◽

Case Control ◽

Genome Wide Association Studies ◽

Genetic Modifiers ◽

Genome Wide ◽

A Genome ◽

Increased Risk

Women using menopausal hormone therapy (MHT) are at increased risk of developing breast cancer (BC). To detect genetic modifiers of the association between current MHT use and BC risk, we conducted a meta-analysis of four genome-wide case-only studies followed by replication in 11 case–control studies. We used a case-only design to assess interactions between single-nucleotide polymorphisms (SNPs) and current MHT use on the risk of overall and lobular BC. The discovery stage included 2920 cases (541 lobular) from four genome-wide association studies. The top 1391 SNPs with P values for interaction (Pint) <3.0×10−3 were selected for replication using pooled case–control data from 11 studies of the Breast Cancer Association Consortium, including 7689 cases (676 lobular) and 9266 controls. Fixed-effects meta-analysis was used to derive the combined Pint. No SNP reached genome-wide significance in either the discovery or the combined stage. We observed effect modification of current MHT use on overall BC risk by two SNPs on chr13 near POMP (combined Pint≤8.9×10−6), two SNPs in SLC25A21 (combined Pint≤4.8×10−5), and three SNPs in PLCG2 (combined Pint≤4.5×10−5). The association between current MHT use and lobular BC risk was potentially modified by one SNP in TMEFF2 (combined Pint≤2.7×10−5), one SNP in CD80 (combined Pint≤8.2×10−6), three SNPs on chr17 near TMEM132E (combined Pint≤2.2×10−6), and two SNPs on chr18 near SLC25A52 (combined Pint≤4.6×10−5). In conclusion, polymorphisms in genes related to solute transport in mitochondria, transmembrane signaling, and immune cell activation may modify the BC risk associated with current MHT use. These findings warrant replication in independent studies.
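
A case-only design relies on SNP genotype and exposure (here, current MHT use) being independent in the source population. Under that assumption, and for a rare disease, the genotype-exposure odds ratio among cases estimates the multiplicative interaction odds ratio. The minimal sketch below does this for a single 2×2 table of cases. The binary carrier coding, the unadjusted Wald interval, and the function name are assumptions made for illustration; the abstract does not describe the exact model used in the study.

```python
import math
from typing import Tuple


def case_only_interaction_or(
    exposed_carriers: int,
    exposed_noncarriers: int,
    unexposed_carriers: int,
    unexposed_noncarriers: int,
    z: float = 1.96,
) -> Tuple[float, float, float]:
    """Case-only estimate of a multiplicative SNP x exposure interaction odds ratio.

    Hypothetical helper; all four counts are among cases only. Returns
    (OR, lower, upper), where the bounds form a Wald interval on the log scale.
    """
    a, b = exposed_carriers, exposed_noncarriers
    c, d = unexposed_carriers, unexposed_noncarriers
    if min(a, b, c, d) <= 0:
        raise ValueError("all cells must be positive for the Wald interval")
    # Odds of carrying the SNP among exposed cases divided by the same odds among unexposed cases.
    log_or = math.log((a * d) / (b * c))
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    return math.exp(log_or), math.exp(log_or - z * se), math.exp(log_or + z * se)


if __name__ == "__main__":
    # Hypothetical counts among cases only.
    print(case_only_interaction_or(10, 20, 30, 40))  # OR = (10*40)/(20*30), about 0.667
```

Because controls are not used, the design can be more efficient than a case-control interaction test. However, it becomes biased if genotype and exposure are associated in the population.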

A Simple Approximation To The Bias Of Gene-Environment Interactions In Case/Control Studies With Silent Disease

10.1101/444596 ◽

2018 ◽

Author(s):

Iryna Lobach ◽

Joshua Sampson ◽

Siarhei Lobach ◽

Alexander Alekseyenko ◽

Alexandra Pryatinska ◽

...

Keyword(s):

Large Scale ◽

Association Studies ◽

Environmental Variable ◽

Case Control ◽

Environment Interaction ◽

Genome Wide Association Studies ◽

Case Control Studies ◽

Finite Samples ◽

Research Areas ◽

GxE Interaction

Abstract One of the most important research areas in case-control genome-wide association studies is determining how the effect of a genotype varies across environments, that is, measuring the gene-environment (GxE) interaction. We consider the scenario in which some of the “healthy” controls actually have the disease and the frequency of these latent cases varies with the environmental variable of interest. In this scenario, logistic regression of clinically defined case status on the genetic variant, the environmental variable, and their interaction yields biased estimates of the GxE interaction. Here, we derive a general theoretical approximation to the bias in the GxE interaction estimates and show, through extensive simulation, that this approximation is accurate in finite samples. Moreover, we apply this approximation to evaluate the bias in the effect estimates of genetic variants related to mitochondrial proteins in a large-scale prostate cancer study.
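
The deterministic sketch below shows how latent cases among controls can create a spurious GxE interaction. It is not the authors' general approximation, and it rests on several simplifying assumptions: binary genotype and environment, a saturated logistic model (so the interaction log-OR equals the difference of the stratum-specific genotype log-ORs), and latent cases that share the true cases' carrier frequency. The function names and input layout are hypothetical.

```python
import math
from typing import Tuple

Stratum = Tuple[float, float, float]  # (p_case, p_ctrl, latent_frac), hypothetical layout


def stratum_log_or(p_case: float, p_ctrl: float, latent_frac: float) -> float:
    """Observed genotype log-OR (cases vs clinically defined controls) in one stratum.

    p_case, p_ctrl: carrier frequency among true cases and true controls.
    latent_frac: fraction of clinically defined controls who are latent cases;
    assumed (for illustration) to have the same carrier frequency as true cases.
    """
    if not (0 < p_case < 1 and 0 < p_ctrl < 1 and 0 <= latent_frac < 1):
        raise ValueError("frequencies must lie in (0, 1) and latent_frac in [0, 1)")
    q = (1 - latent_frac) * p_ctrl + latent_frac * p_case  # carrier freq. in observed controls
    return math.log(p_case / (1 - p_case)) - math.log(q / (1 - q))


def observed_gxe(stratum_e1: Stratum, stratum_e0: Stratum) -> float:
    """Observed GxE interaction on the log-OR scale: difference of stratum log-ORs."""
    return stratum_log_or(*stratum_e1) - stratum_log_or(*stratum_e0)


if __name__ == "__main__":
    # True OR = 3 in both strata, so the true interaction is 0, but 20% of the
    # controls are latent cases only when E = 1.
    print(observed_gxe((0.5, 0.25, 0.2), (0.5, 0.25, 0.0)))  # about -0.251, i.e. log(7/9)
```

The latent cases make controls in the E = 1 stratum look more like cases, which attenuates that stratum's odds ratio (from 3 to 7/3). The result is a nonzero interaction estimate even though no true GxE effect exists.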

Genome-wide Marginal Epistatic Association Mapping in Case-Control Studies

10.1101/374983 ◽

2018 ◽

Cited By ~ 1

Author(s):

Lorin Crawford ◽

Xiang Zhou

Keyword(s):

Complex Traits ◽

Statistical Power ◽

Association Studies ◽

Computational Cost ◽

Case Control ◽

Type I ◽

Genome Wide Association Studies ◽

Case Control Studies ◽

Control Data ◽

Genome Wide

Abstract Epistasis, commonly defined as the interaction between genetic loci, is an important contributor to the genetic architecture underlying many complex traits and common diseases. Most existing epistatic mapping methods in genome-wide association studies explicitly search over all pairwise or higher-order interactions. However, because of the potentially large search space and the resulting multiple-testing burden, these conventional approaches often suffer from heavy computational cost and low statistical power. A recently proposed alternative for mapping epistasis focuses instead on detecting marginal epistasis, defined as the combined pairwise interaction effects between a given variant and all other variants. By searching for marginal epistatic effects, one can identify genetic variants involved in epistasis without identifying the exact partners with which they interact, potentially alleviating much of the statistical and computational burden of conventional epistatic mapping. However, previous marginal epistatic mapping methods are based on quantitative trait models, which, as we show here, lack statistical power in case-control studies. We therefore develop a liability threshold mixed model that extends marginal epistatic mapping to case-control studies. Our method properly accounts for case-control ascertainment and the binary nature of case-control data. We refer to this method as the liability threshold marginal epistasis test (LT-MAPIT). With simulations, we show that LT-MAPIT provides effective type I error control and is more powerful than both existing marginal epistatic mapping methods and conventional explicit search-based approaches in case-control data. Finally, we apply LT-MAPIT to identify both marginal and pairwise epistasis in seven complex diseases from the Wellcome Trust Case Control Consortium (WTCCC) 1 study.
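
Marginal epistasis, as defined above, can be written as a single random effect. Suppose variant k interacts with every other variant j with effect α_j. The combined interaction effect for sample i is then m_i = x_ik Σ_{j≠k} x_ij α_j. If we further assume independent α_j ~ N(0, ω²/(p−1)), then Cov(m_i, m_l) = ω² x_ik x_lk Σ_{j≠k} x_ij x_lj / (p−1), which in matrix form is ω² D_k K_{−k} D_k with D_k = diag(x_k) and K_{−k} = X_{−k}X_{−k}ᵀ/(p−1). This derivation is ours, and the sketch below computes that covariance with ω² = 1. It does not reproduce LT-MAPIT's liability-scale model or its estimation procedure, and the function name is hypothetical.

```python
from typing import List

Matrix = List[List[float]]


def marginal_epistasis_kernel(X: Matrix, k: int) -> Matrix:
    """Covariance (with omega^2 = 1) of the marginal epistatic effect of variant k.

    Hypothetical helper. X is an n x p genotype matrix (rows = samples,
    columns = variants), assumed already centered/standardized. Entry (i, l) is
    x_ik * x_lk * sum_{j != k} x_ij * x_lj / (p - 1). Cost is O(n^2 p).
    """
    n, p = len(X), len(X[0])
    if p < 2:
        raise ValueError("need at least two variants")
    K = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for l in range(i, n):  # fill the upper triangle and mirror it (K is symmetric)
            # Linear-kernel term over all variants except k.
            s = sum(X[i][j] * X[l][j] for j in range(p) if j != k)
            K[i][l] = K[l][i] = X[i][k] * X[l][k] * s / (p - 1)
    return K


if __name__ == "__main__":
    # Toy genotype matrix (hypothetical): 2 samples x 3 variants.
    X = [[1.0, 2.0, 0.0], [-1.0, 1.0, 1.0]]
    print(marginal_epistasis_kernel(X, 0))  # [[2.0, -1.0], [-1.0, 1.0]]
```

Testing whether the variance component ω² is zero gives one test per variant (m tests in total) rather than one per pair, which is the source of the computational and multiple-testing savings described in the abstract.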

METU-SNP: An Integrated Software System for SNP-Complex Disease Association Analysis

Journal of Integrative Bioinformatics ◽

10.1515/jib-2011-187 ◽

2011 ◽

Vol 8 (2) ◽

pp. 204-221 ◽

Cited By ~ 3

Author(s):

Gürkan Üstünkar ◽

Yeşim Aydın Son

Keyword(s):

Association Studies ◽

Case Control ◽

Genome Wide Association Studies ◽

Nucleotide Polymorphisms ◽

Case Control Studies ◽

Single Nucleotide ◽

High Dimensional Case ◽

Genome Wide ◽

Desktop Application ◽

Integrated Software

Summary Recently, there has been increasing research aimed at discovering genomic biomarkers, haplotypes, and potentially other variables that together contribute to the development of diseases. Single nucleotide polymorphisms (SNPs) are the most common form of genomic variation and can represent an individual’s genetic variability in the greatest detail. Genome-wide association studies (GWAS) of SNPs, which are high-dimensional case-control studies, are among the most promising approaches for identifying disease-causing variants. METU-SNP is a Java-based integrated desktop application specifically designed for prioritizing SNP biomarkers and discovering disease-related genes and pathways through analysis of GWAS case-control data. Outputs of METU-SNP can readily be used in downstream biomarker research to support disease prediction and diagnosis and other personalized medical approaches. Here, we introduce and describe the functionality and architecture of METU-SNP. We believe that METU-SNP will help researchers reliably identify SNPs involved in the etiology of complex diseases, ultimately supporting the development of personalized medicine approaches and targeted drug discovery.

Heterogeneity and Polygenicity in Psychiatric Disorders: A Genome-Wide Perspective

Chronic Stress ◽

10.1177/2470547020924844 ◽

2020 ◽

Vol 4 ◽

pp. 247054702092484 ◽

Cited By ~ 1

Author(s):

Frank R. Wendt ◽

Gita A. Pathak ◽

Daniel S. Tylee ◽

Aranyak Goswami ◽

Renato Polimanti

Keyword(s):

Psychiatric Disorders ◽

Large Scale ◽

Association Studies ◽

Genetic Data ◽

Autism Spectrum ◽

Genome Wide Association Studies ◽

Compulsive Disorder ◽

Genome Wide ◽

A Genome ◽

Chronic Tic Disorder

Genome-wide association studies (GWAS) have been performed for many psychiatric disorders and have revealed a complex polygenic architecture linking mental and physical health phenotypes. Psychiatric diagnoses are often heterogeneous, and several layers of trait heterogeneity may affect the detection of genetic risks for individual disorders or across multiple disorders. In this review, we discuss these heterogeneities and their consequences for the discovery of risk loci using large-scale genetic data. We primarily highlight how sex and diagnostic complexity affect risk locus discovery in schizophrenia, bipolar disorder, attention deficit hyperactivity disorder, autism spectrum disorder, posttraumatic stress disorder, major depressive disorder, obsessive-compulsive disorder, Tourette’s syndrome and chronic tic disorder, anxiety disorders, suicidality, feeding and eating disorders, and substance use disorders. Genetic data have also facilitated the discovery of clinically relevant subphenotypes, which we describe here as well. Collectively, GWAS of psychiatric disorders have revealed that understanding heterogeneity, polygenicity, and pleiotropy is critical for translating genetic findings into treatment strategies.
