transferGWAS: GWAS of images using deep transfer learning

Medical images can provide rich information about diseases and their biology. However, investigating their association with genetic variation requires non-standard methods. We propose transferGWAS, a novel approach to perform genome-wide association studies directly on full medical images. First, we learn semantically meaningful representations of the images based on a transfer learning task, during which a deep neural network is trained on independent but similar data. Then, we perform genetic association tests with these representations. We validate the type I error rates and power of transferGWAS in simulation studies of synthetic images. Then we apply transferGWAS in a genome-wide association study of retinal fundus images from the UK Biobank. This first-of-a-kind GWAS of full imaging data yielded 60 genomic regions associated with retinal fundus images, of which 7 are novel candidate loci for eye-related traits and diseases.
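The two stages described in this abstract (image representations first, association tests second) can be sketched on toy data. This is only an illustration under stated assumptions: the embeddings are simulated rather than produced by a pretrained network, the PCA reduction step and the helper names `top_principal_components` and `snp_association` are hypothetical, and the p-value uses a normal approximation rather than whatever test the authors used.

```python
import math
import numpy as np

def top_principal_components(embeddings, n_components):
    """Project centered embeddings onto their leading principal components."""
    centered = embeddings - embeddings.mean(axis=0)
    # Rows of vt are the principal axes of the centered matrix.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:n_components].T

def snp_association(phenotype, genotype):
    """Simple linear regression of one phenotype on one SNP.

    Returns (beta, z, p), with a two-sided p-value from a normal approximation.
    """
    g = genotype - genotype.mean()
    y = phenotype - phenotype.mean()
    beta = (g @ y) / (g @ g)
    resid = y - beta * g
    se = math.sqrt((resid @ resid) / (len(y) - 2) / (g @ g))
    z = beta / se
    p = math.erfc(abs(z) / math.sqrt(2.0))
    return beta, z, p

# Toy data (assumption): one SNP shifts a latent image factor that
# dominates the variance of the simulated embeddings.
rng = np.random.default_rng(0)
n_samples, n_features = 1000, 32
genotype = rng.binomial(2, 0.3, size=n_samples).astype(float)
latent = 0.5 * genotype + rng.normal(size=n_samples)
direction = rng.normal(size=n_features)
direction /= np.linalg.norm(direction)
embeddings = 3.0 * np.outer(latent, direction) + rng.normal(size=(n_samples, n_features))

# Stage 1: reduce the embeddings. Stage 2: test the SNP against PC1.
pcs = top_principal_components(embeddings, n_components=2)
beta, z, p_value = snp_association(pcs[:, 0], genotype)
print(f"PC1 vs SNP: z = {z:.2f}, p = {p_value:.2e}")
```

In a real analysis the second stage would be repeated for every variant and adjusted for covariates; the sketch tests a single SNP with no covariates.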


The effect of different sets of critical values on type I error rates in tiled regression for genome-wide association studies

International Journal of Data Mining and Bioinformatics ◽ 10.1504/ijdmb.2016.080030 ◽ 2016 ◽ Vol 16 (2) ◽ pp. 111

Author(s): Heejong Sung ◽ Jeremy A. Sabourin ◽ Alexa J.M. Sorant ◽ Alexander F. Wilson

Keyword(s): Type I Error ◽ Association Studies ◽ Error Rates ◽ Critical Values ◽ Genome Wide Association ◽ Type I ◽ Genome Wide Association Studies ◽ Type I Error Rates ◽ Genome Wide


Classification of Lesions in Retinal Fundus Images for Diabetic Retinopathy Using Transfer Learning

2019 International Conference on Information Technology (ICIT) ◽ 10.1109/icit48102.2019.00067 ◽ 2019 ◽ Cited By ~ 2

Author(s): Siddharth Gupta ◽ Avnish Panwar ◽ Silky Goel ◽ Ankush Mittal ◽ Rahul Nijhawan ◽ ...

Keyword(s): Diabetic Retinopathy ◽ Transfer Learning ◽ Fundus Images ◽ Retinal Fundus Images ◽ Retinal Fundus


Efficient identification of context dependent subgroups of risk from genome-wide association studies

Statistical Applications in Genetics and Molecular Biology ◽ 10.1515/sagmb-2013-0062 ◽ 2014 ◽ Vol 13 (2) ◽ Cited By ~ 1

Author(s): Greg Dyson ◽ Charles F. Sing

Keyword(s): Prediction Models ◽ Disease Risk ◽ Hypothesis Test ◽ Association Studies ◽ Genome Wide Association ◽ Computational Time ◽ Type I ◽ Genome Wide Association Studies ◽ Genomic Variations ◽ Genome Wide

We have developed a modified Patient Rule-Induction Method (PRIM) as an alternative strategy for analyzing representative samples of non-experimental human data to estimate and test the role of genomic variations as predictors of disease risk in etiologically heterogeneous sub-samples. A computational limit of the proposed strategy is encountered when the number of genomic variations (predictor variables) under study is large (>500) because permutations are used to generate a null distribution to test the significance of a term (defined by values of particular variables) that characterizes a sub-sample of individuals through the peeling and pasting processes. As an alternative, in this paper we introduce a theoretical strategy that facilitates the quick calculation of Type I and Type II errors in the evaluation of terms in the peeling and pasting processes carried out in the execution of a PRIM analysis; these errors are under-estimated and non-existent, respectively, when a permutation-based hypothesis test is employed. The resultant savings in computational time make possible the consideration of larger numbers of genomic variations (an example genome-wide association study is given) in the selection of statistically significant terms in the formulation of PRIM prediction models.
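To make the computational cost mentioned in this abstract concrete, here is a minimal sketch of a generic permutation test for one candidate subgroup. It is not the PRIM procedure itself: the test statistic (a difference in mean outcome), the function name `permutation_p_value`, and the toy data are all assumptions chosen for illustration.

```python
import numpy as np

def permutation_p_value(outcome, in_subgroup, n_permutations=2000, seed=0):
    """Two-sided permutation p-value for a difference in mean outcome.

    outcome: 1-D array of outcomes (for example, 0/1 disease status).
    in_subgroup: boolean mask marking the individuals a term selects.
    """
    rng = np.random.default_rng(seed)
    observed = outcome[in_subgroup].mean() - outcome[~in_subgroup].mean()
    exceed = 0
    for _ in range(n_permutations):
        # Shuffling outcomes breaks any link between outcome and subgroup.
        shuffled = rng.permutation(outcome)
        stat = shuffled[in_subgroup].mean() - shuffled[~in_subgroup].mean()
        if abs(stat) >= abs(observed):
            exceed += 1
    # Add-one correction keeps the estimate strictly positive.
    return (exceed + 1) / (n_permutations + 1)

# Toy data (assumption): every case falls inside the candidate subgroup.
outcome = np.concatenate([np.ones(50), np.zeros(150)])
in_subgroup = np.concatenate([np.ones(50, dtype=bool), np.zeros(150, dtype=bool)])
p_perm = permutation_p_value(outcome, in_subgroup)
print(f"permutation p-value: {p_perm:.5f}")
```

The smallest p-value such a test can report is 1 / (n_permutations + 1), so resolving small p-values for many candidate terms requires many permutations per term, which is the cost a closed-form calculation avoids.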


The effect of different sets of critical values on type I error rates in tiled regression for genome-wide association studies

International Journal of Data Mining and Bioinformatics ◽ 10.1504/ijdmb.2016.10000871 ◽ 2016 ◽ Vol 16 (2) ◽ pp. 111

Author(s): Alexander F. Wilson ◽ Heejong Sung ◽ Jeremy A. Sabourin ◽ Alexa J.M. Sorant

Keyword(s): Type I Error ◽ Association Studies ◽ Error Rates ◽ Critical Values ◽ Genome Wide Association ◽ Type I ◽ Genome Wide Association Studies ◽ Type I Error Rates ◽ Genome Wide


Fast and Accurate Genome-Wide Association Test of Multiple Quantitative Traits

Computational and Mathematical Methods in Medicine ◽ 10.1155/2018/2564531 ◽ 2018 ◽ Vol 2018 ◽ pp. 1-9 ◽ Cited By ~ 1

Author(s): Baolin Wu ◽ James S. Pankow

Keyword(s): Quantitative Traits ◽ Association Studies ◽ R Package ◽ Genome Wide Association ◽ Type I ◽ Genome Wide Association Studies ◽ Competitive Performance ◽ Multiple Traits ◽ Atherosclerosis Risk In Communities ◽ Genome Wide

Multiple correlated traits are often collected in genetic studies. By jointly analyzing multiple traits, we can increase power by aggregating multiple weak effects and reveal additional insights into the genetic architecture of complex human diseases. In this article, we propose a multivariate linear regression-based method to test the joint association of multiple quantitative traits. It flexibly accommodates any covariates, has very accurate control of type I errors, and offers very competitive performance. We also discuss fast and accurate p-value computation, especially for genome-wide association studies with small-to-medium sample sizes. We demonstrate through extensive numerical studies that the proposed method has competitive performance. Its usefulness is further illustrated with application to genome-wide association analysis of diabetes-related traits in the Atherosclerosis Risk in Communities (ARIC) study. We found some very interesting associations with diabetes traits which have not been reported before. We implemented the proposed methods in a publicly available R package.
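A joint test of one SNP against several quantitative traits can be sketched as follows. This is a textbook large-sample statistic, not necessarily the authors' method or their p-value computation: it equals n times the R² from regressing the genotype on the traits and is approximately chi-square with K degrees of freedom under the null. The function name `joint_trait_statistic`, the intercept-only covariate adjustment, and the simulated data are assumptions.

```python
import math
import numpy as np

def joint_trait_statistic(traits, genotype):
    """Return n * r' R^{-1} r for one SNP and K traits.

    traits: (n, K) array; genotype: (n,) array. Both are mean-centered,
    which corresponds to adjusting for an intercept only.
    """
    n = traits.shape[0]
    y = traits - traits.mean(axis=0)
    g = genotype - genotype.mean()
    # r: correlation of the SNP with each trait; trait_corr: K x K trait correlations.
    r = (y.T @ g) / (np.linalg.norm(y, axis=0) * np.linalg.norm(g))
    trait_corr = np.corrcoef(y, rowvar=False)
    return n * float(r @ np.linalg.solve(trait_corr, r))

# Toy data (assumption): two correlated traits, both shifted by the SNP.
rng = np.random.default_rng(1)
n = 2000
snp = rng.binomial(2, 0.3, size=n).astype(float)
shared = rng.normal(size=n)
trait_matrix = np.column_stack([
    0.5 * snp + shared + rng.normal(size=n),
    0.5 * snp + shared + rng.normal(size=n),
])

stat = joint_trait_statistic(trait_matrix, snp)
# With exactly two traits, the chi-square(2) tail probability is exp(-x / 2).
p_joint = math.exp(-stat / 2.0)
print(f"joint statistic = {stat:.1f}, p = {p_joint:.2e}")
```

The closed-form tail used here holds only for two traits; other trait counts need a general chi-square survival function.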


ComPaSS-GWAS: A method to reduce type I error in genome-wide association studies when replication data are not available

Genetic Epidemiology ◽ 10.1002/gepi.22168 ◽ 2018 ◽ Vol 43 (1) ◽ pp. 102-111 ◽ Cited By ~ 4

Author(s): Jeremy A. Sabourin ◽ Cheryl D. Cropp ◽ Heejong Sung ◽ Lawrence C. Brody ◽ Joan E. Bailey-Wilson ◽ ...

Keyword(s): Type I Error ◽ Association Studies ◽ Genome Wide Association ◽ Type I ◽ Genome Wide Association Studies ◽ Genome Wide


Power and type I error rate of false discovery rate approaches in genome-wide association studies

BMC Genetics ◽ 10.1186/1471-2156-6-s1-s134 ◽ 2005 ◽ Vol 6 (Suppl 1) ◽ pp. S134 ◽ Cited By ~ 58

Author(s): Qiong Yang ◽ Jing Cui ◽ Irmarie Chazaro ◽ L Adrienne Cupples ◽ Serkalem Demissie

Keyword(s): False Discovery Rate ◽ Error Rate ◽ Type I Error ◽ Association Studies ◽ Genome Wide Association ◽ Type I ◽ Genome Wide Association Studies ◽ Type I Error Rate ◽ False Discovery ◽ Genome Wide


Statistical Learning Methods Applicable to Genome-Wide Association Studies on Unbalanced Case-Control Disease Data

Genes ◽ 10.3390/genes12050736 ◽ 2021 ◽ Vol 12 (5) ◽ pp. 736

Author(s): Xiaotian Dai ◽ Guifang Fu ◽ Shaofei Zhao ◽ Yifei Zeng

Keyword(s): Type I Error ◽ Association Studies ◽ Case Control ◽ Error Rates ◽ Genome Wide Association ◽ Type I ◽ Genome Wide Association Studies ◽ Learning Approaches ◽ Genome Wide ◽ Control Disease

Although imbalance between case and control groups is prevalent in genome-wide association studies (GWAS), it is often overlooked. This imbalance is becoming more significant and urgent as the rapid growth of biobanks and electronic health records has enabled the collection of thousands of phenotypes from large cohorts, in particular for diseases with low prevalence. The unbalanced binary traits pose serious challenges to traditional statistical methods in terms of both genomic selection and disease prediction. For example, the well-established linear mixed models (LMM) yield inflated type I error rates in the presence of unbalanced case-control ratios. In this article, we review multiple statistical approaches that have been developed to overcome the inaccuracy caused by the unbalanced case-control ratio, and comment on the advantages and limitations of each approach. In addition, we explore the potential for applying several powerful and popular state-of-the-art machine-learning approaches that have not yet been applied to the GWAS field. This review paves the way for better analysis and understanding of the unbalanced case-control disease data in GWAS.


A response to Yurko et al: H-MAGMA, inheriting a shaky statistical foundation, yields excess false positives

10.1101/2020.09.25.310722 ◽ 2020

Author(s): Christiaan de Leeuw ◽ Nancy Y. A. Sey ◽ Danielle Posthuma ◽ Hyejung Won

Keyword(s): Psychiatric Disorder ◽ Association Studies ◽ Genome Wide Association ◽ Type I ◽ Genome Wide Association Studies ◽ Analysis Model ◽ Type I Errors ◽ Risk Genes ◽ Genomic Annotation ◽ Genome Wide

Hi-C coupled multimarker analysis of genomic annotation (H-MAGMA) was initially developed to advance MAGMA by assigning non-coding SNPs to their cognate genes based on three-dimensional chromatin architecture. Yurko and colleagues raised concerns that the SNP-wise mean gene-analysis model of MAGMA may allow inflation in type I errors. Accordingly, we updated MAGMA and found that the updated version (MAGMA v.1.08) effectively controls for error rate inflation. Intrigued by this result, H-MAGMA was also updated by implementing MAGMA v.1.08. As expected, H-MAGMA v.1.08 detected a smaller set of risk genes than its original version (v.1.07), but the overall statistical architecture remained largely unchanged between v.1.07 and v.1.08. H-MAGMA v.1.08 was then applied to genome-wide association studies (GWAS) of five psychiatric disorders, from which we recapitulated our previous findings that psychiatric disorder risk genes display neuronal and prenatal enrichment. Therefore, issues raised by Yurko and colleagues can be overcome by using (H-)MAGMA v.1.08.


Efficient mixed model approach for large-scale genome-wide association studies of ordinal categorical phenotypes

10.1101/2020.10.09.333146 ◽ 2020

Author(s): Wenjian Bi ◽ Wei Zhou ◽ Rounak Dey ◽ Bhramar Mukherjee ◽ Joshua N Sampson ◽ ...

Keyword(s): Mixed Model ◽ Type I Error ◽ Association Studies ◽ Error Rates ◽ Genome Wide Association ◽ Alternative Methods ◽ Type I ◽ Genome Wide Association Studies ◽ Type I Error Rates ◽ Genome Wide

In genome-wide association studies (GWAS), ordinal categorical phenotypes are widely used to measure human behaviors, satisfaction, and preferences. However, due to the lack of analysis tools, methods designed for binary and quantitative traits have often been used inappropriately to analyze categorical phenotypes, which inflates type I error rates or reduces power. To accurately model the dependence of an ordinal categorical phenotype on covariates, we propose an efficient mixed model association test, Proportional Odds Logistic Mixed Model (POLMM). POLMM is demonstrated to be computationally efficient for analyzing large datasets with hundreds of thousands of genetically related samples, can control type I error rates at a stringent significance level regardless of the phenotypic distribution, and is more powerful than alternative methods. We applied POLMM to 258 ordinal categorical phenotypes on array-genotypes and imputed samples from 408,961 individuals in UK Biobank. In total, we identified 5,885 genome-wide significant variants, of which 424 variants (7.2%) are rare variants with MAF < 0.01.
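The proportional odds structure that POLMM is named after can be illustrated with the fixed-effects part of a cumulative logit model. This sketch omits the random effect that makes POLMM a mixed model, and the sign convention P(Y <= j) = sigmoid(cutpoint_j - eta), the function name `category_probabilities`, and the example values are assumptions, not the authors' parameterization.

```python
import numpy as np

def category_probabilities(eta, cutpoints):
    """P(Y = j) for each individual under a cumulative logit model.

    eta: (n,) linear predictors (covariate and genotype effects).
    cutpoints: (J - 1,) increasing thresholds shared by all individuals.
    Returns an (n, J) array whose rows sum to 1.
    """
    eta = np.asarray(eta, dtype=float)
    cutpoints = np.asarray(cutpoints, dtype=float)
    # Cumulative probabilities P(Y <= j) for j = 1 .. J - 1.
    cumulative = 1.0 / (1.0 + np.exp(-(cutpoints[None, :] - eta[:, None])))
    # Pad with P(Y <= 0) = 0 and P(Y <= J) = 1, then difference adjacent columns.
    zeros = np.zeros((len(eta), 1))
    ones = np.ones((len(eta), 1))
    padded = np.hstack([zeros, cumulative, ones])
    return np.diff(padded, axis=1)

# Three individuals with increasing linear predictors, four ordered categories.
probs = category_probabilities([-1.0, 0.0, 1.0], [-1.0, 0.5, 2.0])
print(np.round(probs, 3))
```

Because every category shares the same linear predictor, a larger eta shifts probability toward higher categories without changing the cutpoints, which is what "proportional odds" refers to.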
