Chances and challenges of machine learning based disease classification in genetic association studies illustrated on age-related macular degeneration

Imaging technology and machine learning algorithms for disease classification set the stage for high-throughput phenotyping and promising new avenues for genome-wide association studies (GWAS). Despite emerging algorithms, there has been no successful application in GWAS so far. We established machine learning based disease classification in genetic association analysis as a misclassification problem. To evaluate chances and challenges, we performed a GWAS based on automated classification of age-related macular degeneration (AMD) in UK Biobank (images from 135,500 eyes; 68,400 persons). We quantified misclassification of automatically derived AMD in internal validation data (images from 4,001 eyes; 2,013 persons) and developed a maximum likelihood approach (MLA) to account for it when estimating genetic association. We demonstrate that our MLA guards against bias and artefacts in simulation studies. By combining a GWAS on automatically derived AMD classification and our MLA in UK Biobank data, we were able to dissect true association (ARMS2/HTRA1, CFH) from artefacts (near HERC2) and to identify eye color as a relevant source of misclassification. On this example of AMD, we are able to provide a proof-of-concept that a GWAS using machine learning derived disease classification yields relevant results and that misclassification needs to be considered in the analysis. These findings generalize to other phenotypes and also emphasize the utility of genetic data for understanding the misclassification structure of machine learning algorithms.
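To make the misclassification idea concrete, here is a minimal sketch in Python. It is not the authors' MLA: it assumes non-differential misclassification with a fixed, known sensitivity and specificity, whereas the abstract describes quantifying misclassification from validation data. The function names, the toy genotype counts, and the parameter values are all hypothetical. The observed-case probability is modeled as sens * p + (1 - spec) * (1 - p), where p is the logistic probability of true disease, and the likelihood is maximized by plain gradient ascent.

```python
import math


def expit(z):
    """Logistic function."""
    return 1.0 / (1.0 + math.exp(-z))


def observed_prob(eta, sens, spec):
    """Probability of being *classified* as a case, given linear predictor eta."""
    p = expit(eta)
    return sens * p + (1.0 - spec) * (1.0 - p)


def fit(groups, sens=1.0, spec=1.0, step=0.5, n_iter=20000):
    """Maximum likelihood fit of (intercept, slope) by gradient ascent.

    groups: list of (genotype, observed_cases, n_total).
    sens = spec = 1 gives the naive logistic regression that ignores
    misclassification.
    """
    n_all = sum(n for _, _, n in groups)
    b0 = b1 = 0.0
    for _ in range(n_iter):
        g0 = g1 = 0.0
        for x, cases, n in groups:
            p = expit(b0 + b1 * x)
            q = sens * p + (1.0 - spec) * (1.0 - p)
            # Derivative of the group's binomial log-likelihood w.r.t. eta.
            d = (cases / q - (n - cases) / (1.0 - q))
            d *= (sens + spec - 1.0) * p * (1.0 - p)
            g0 += d
            g1 += d * x
        b0 += step * g0 / n_all
        b1 += step * g1 / n_all
    return b0, b1


# Hypothetical toy setting: expected case counts per genotype (0, 1, 2 risk
# alleles) under a true log odds ratio of 0.5 and an imperfect classifier.
TRUE_B0, TRUE_B1 = -2.0, 0.5
SENS, SPEC = 0.8, 0.95
sizes = {0: 5000, 1: 4000, 2: 1000}
groups = [
    (x, n * observed_prob(TRUE_B0 + TRUE_B1 * x, SENS, SPEC), n)
    for x, n in sizes.items()
]

naive = fit(groups)                # ignores misclassification
adjusted = fit(groups, SENS, SPEC) # accounts for it in the likelihood

print("naive slope:   ", round(naive[1], 3))
print("adjusted slope:", round(adjusted[1], 3))
```

In this toy setting the naive slope is pulled toward zero, while the adjusted fit recovers the value used to generate the counts. Using expected (fractional) counts instead of simulated individuals keeps the example deterministic.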

Chances and challenges of machine learning‐based disease classification in genetic association studies illustrated on age‐related macular degeneration

Genetic Epidemiology ◽

10.1002/gepi.22336 ◽

2020 ◽

Vol 44 (7) ◽

pp. 759-777

Author(s):

Felix Guenther ◽

Caroline Brandl ◽

Thomas W. Winkler ◽

Veronika Wanner ◽

Klaus Stark ◽

...

Keyword(s):

Machine Learning ◽

Macular Degeneration ◽

Genetic Association ◽

Association Studies ◽

Genetic Association Studies ◽

Disease Classification ◽

Age Related Macular Degeneration ◽

Age Related

Multiple similarly effective solutions exist for biomedical feature selection and classification problems

Scientific Reports ◽

10.1038/s41598-017-13184-8 ◽

2017 ◽

Vol 7 (1) ◽

Cited By ~ 9

Author(s):

Jiamei Liu ◽

Cheng Xu ◽

Weifeng Yang ◽

Yayun Shu ◽

Weiwei Zheng ◽

...

Keyword(s):

Machine Learning ◽

Feature Selection ◽

Association Studies ◽

Binary Classification ◽

Learning Algorithms ◽

Optimal Solution ◽

Machine Learning Algorithms ◽

Disease Classification ◽

Genome Wide Association Studies ◽

Classification Problems

Binary classification is widely used to support decisions on biomedical big data questions, such as clinical drug trials comparing treated participants with controls, and genome-wide association studies (GWASs) comparing participants with and without a phenotype. A machine learning model is trained for this purpose by optimizing its power to discriminate samples from the two groups. However, most classification algorithms tend to generate a single locally optimal solution that depends on the input dataset and on the mathematical assumptions made about it. Here we demonstrate, from the perspectives of both disease classification and feature selection, that multiple different solutions may have similar classification performance. Existing machine learning algorithms may therefore have caught one good fish while ignoring a whole shoal. Since most existing machine learning algorithms generate a solution by optimizing a mathematical objective, understanding the biological mechanisms behind a classification question may require considering both the generated solution and the ignored ones.
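The claim that several different solutions can perform about equally well can be illustrated with a small Python sketch. The data, feature names, the 0.5 threshold, and the 0.05 tolerance below are all hypothetical and are not taken from the paper. Two unrelated single-feature classifiers reach the same accuracy, so reporting only one "best" feature would hide the other.

```python
# Toy data: each row is (feature_a, feature_b, feature_c, label).
SAMPLES = [
    (0.1, 0.9, 0.6, 0),
    (0.2, 0.1, 0.4, 0),
    (0.3, 0.2, 0.7, 0),
    (0.6, 0.3, 0.3, 0),
    (0.4, 0.6, 0.6, 1),
    (0.7, 0.7, 0.4, 1),
    (0.8, 0.1, 0.7, 1),
    (0.9, 0.8, 0.3, 1),
]
FEATURES = ["feature_a", "feature_b", "feature_c"]


def accuracy(feature_index, threshold=0.5):
    """Accuracy of the rule 'predict 1 if the feature exceeds the threshold'."""
    correct = sum(
        1 for row in SAMPLES
        if int(row[feature_index] > threshold) == row[-1]
    )
    return correct / len(SAMPLES)


scores = {name: accuracy(i) for i, name in enumerate(FEATURES)}
best = max(scores.values())

# Keep every feature whose accuracy is within a small tolerance of the best,
# instead of returning only a single winner.
near_best = [name for name in FEATURES if best - scores[name] <= 0.05]

print(scores)
print("similarly effective solutions:", near_best)
```

Collecting every solution within a tolerance of the best score, rather than only the top one, is one simple way to surface the alternatives the abstract refers to.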

A Validated Phenotyping Algorithm for Genetic Association Studies in Age-related Macular Degeneration

Scientific Reports ◽

10.1038/srep12875 ◽

2015 ◽

Vol 5 (1) ◽

Cited By ~ 1

Author(s):

Joseph M. Simonett ◽

Mahsa A. Sohrab ◽

Jennifer Pacheco ◽

Loren L. Armstrong ◽

Margarita Rzhetskaya ◽

...

Keyword(s):

Macular Degeneration ◽

Genetic Association ◽

Association Studies ◽

Genetic Association Studies ◽

Age Related Macular Degeneration ◽

Age Related

Automated analysis of retinal imaging using machine learning techniques for computer vision

F1000Research ◽

10.12688/f1000research.8996.1 ◽

2016 ◽

Vol 5 ◽

pp. 1573 ◽

Cited By ~ 20

Author(s):

Jeffrey De Fauw ◽

Pearse Keane ◽

Nenad Tomasev ◽

Daniel Visentin ◽

George van den Driessche ◽

...

Keyword(s):

Machine Learning ◽

Diabetic Retinopathy ◽

Macular Degeneration ◽

Human Error ◽

Machine Learning Algorithms ◽

Age Related Macular Degeneration ◽

Machine Learning Techniques ◽

Sound Waves ◽

Age Related ◽

Fundus Photographs

There are almost two million people in the United Kingdom living with sight loss, including around 360,000 people who are registered as blind or partially sighted. Sight-threatening diseases, such as diabetic retinopathy and age-related macular degeneration, have contributed to the 40% increase in outpatient attendances in the last decade but are amenable to early detection and monitoring. With early and appropriate intervention, blindness may be prevented in many cases. Ophthalmic imaging provides a way to diagnose and objectively assess the progression of a number of pathologies including neovascular (“wet”) age-related macular degeneration (wet AMD) and diabetic retinopathy. Two methods of imaging are commonly used: digital photographs of the fundus (the ‘back’ of the eye) and Optical Coherence Tomography (OCT, a modality that uses light waves in a similar way to how ultrasound uses sound waves). Changes in population demographics and expectations and the changing pattern of chronic diseases create a rising demand for such imaging. Meanwhile, interrogation of such images is time-consuming, costly, and prone to human error. The application of novel analysis methods may provide a solution to these challenges. This research will focus on applying novel machine learning algorithms to automatic analysis of both digital fundus photographs and OCT in Moorfields Eye Hospital NHS Foundation Trust patients. Through analysis of the images used in ophthalmology, along with relevant clinical and demographic information, Google DeepMind Health will investigate the feasibility of automated grading of digital fundus photographs and OCT and provide novel quantitative measures for specific disease features and for monitoring therapeutic success.

Automated analysis of retinal imaging using machine learning techniques for computer vision

F1000Research ◽

10.12688/f1000research.8996.2 ◽

2017 ◽

Vol 5 ◽

pp. 1573 ◽

Cited By ~ 4

Author(s):

Jeffrey De Fauw ◽

Pearse Keane ◽

Nenad Tomasev ◽

Daniel Visentin ◽

George van den Driessche ◽

...

Keyword(s):

Machine Learning ◽

Diabetic Retinopathy ◽

Macular Degeneration ◽

Human Error ◽

Machine Learning Algorithms ◽

Age Related Macular Degeneration ◽

Machine Learning Techniques ◽

Sound Waves ◽

Age Related ◽

Fundus Photographs

There are almost two million people in the United Kingdom living with sight loss, including around 360,000 people who are registered as blind or partially sighted. Sight-threatening diseases, such as diabetic retinopathy and age-related macular degeneration, have contributed to the 40% increase in outpatient attendances in the last decade but are amenable to early detection and monitoring. With early and appropriate intervention, blindness may be prevented in many cases. Ophthalmic imaging provides a way to diagnose and objectively assess the progression of a number of pathologies including neovascular (“wet”) age-related macular degeneration (wet AMD) and diabetic retinopathy. Two methods of imaging are commonly used: digital photographs of the fundus (the ‘back’ of the eye) and Optical Coherence Tomography (OCT, a modality that uses light waves in a similar way to how ultrasound uses sound waves). Changes in population demographics and expectations and the changing pattern of chronic diseases create a rising demand for such imaging. Meanwhile, interrogation of such images is time-consuming, costly, and prone to human error. The application of novel analysis methods may provide a solution to these challenges. This research will focus on applying novel machine learning algorithms to automatic analysis of both digital fundus photographs and OCT in Moorfields Eye Hospital NHS Foundation Trust patients. Through analysis of the images used in ophthalmology, along with relevant clinical and demographic information, DeepMind Health will investigate the feasibility of automated grading of digital fundus photographs and OCT and provide novel quantitative measures for specific disease features and for monitoring therapeutic success.

Genome-Wide Association Studies-Based Machine Learning for Prediction of Age-Related Macular Degeneration Risk

Translational Vision Science & Technology ◽

10.1167/tvst.10.2.29 ◽

2021 ◽

Vol 10 (2) ◽

pp. 29

Author(s):

Qi Yan ◽

Yale Jiang ◽

Heng Huang ◽

Anand Swaroop ◽

Emily Y. Chew ◽

...

Keyword(s):

Machine Learning ◽

Macular Degeneration ◽

Association Studies ◽

Age Related Macular Degeneration ◽

Genome Wide Association ◽

Genome Wide Association Studies ◽

Age Related ◽

Genome Wide

Application of two machine learning algorithms to genetic association studies in the presence of covariates

BMC Genetics ◽

10.1186/1471-2156-9-71 ◽

2008 ◽

Vol 9 (1) ◽

Cited By ~ 9

Author(s):

Bareng AS Nonyane ◽

Andrea S Foulkes

Keyword(s):

Machine Learning ◽

Genetic Association ◽

Association Studies ◽

Learning Algorithms ◽

Genetic Association Studies ◽

Machine Learning Algorithms

Lung disease classification using machine learning algorithms

International Journal of Applied Mathematics Electronics and Computers ◽

10.18100/ijamec.799363 ◽

2020 ◽

Vol 8 (4) ◽

pp. 125-132

Author(s):

Murat AYKANAT ◽

Özkan KILIÇ ◽

Bahar KURT ◽

Sevgi Behiye SARYAL

Keyword(s):

Machine Learning ◽

Lung Disease ◽

Learning Algorithms ◽

Machine Learning Algorithms ◽

Disease Classification

Random Forests for Genetic Association Studies

Statistical Applications in Genetics and Molecular Biology ◽

10.2202/1544-6115.1691 ◽

2011 ◽

Vol 10 (1) ◽

Cited By ~ 85

Author(s):

Benjamin A Goldstein ◽

Eric C Polley ◽

Farren B. S. Briggs

Keyword(s):

Machine Learning ◽

Genetic Association ◽

Random Forests ◽

Learning Algorithm ◽

Association Studies ◽

Genetic Association Studies ◽

Machine Learning Algorithms ◽

Computationally Efficient ◽

Genetic Studies ◽

Variable Importance Measures

The Random Forests (RF) algorithm has become a commonly used machine learning algorithm for genetic association studies. It is well suited for genetic applications since it is both computationally efficient and models genetic causal mechanisms well. With its growing ubiquity, there has been inconsistent and less than optimal use of RF in the literature. The purpose of this review is to break down the theoretical and statistical basis of RF so that practitioners are able to apply it in their work. An emphasis is placed on showing how the various components contribute to bias and variance, as well as discussing variable importance measures. Applications specific to genetic studies are highlighted. To provide context, RF is compared to other commonly used machine learning algorithms.
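As a worked illustration of the variance point, the following is a standard textbook identity for an average of identically distributed predictors, not a formula quoted from this review. The symbols are assumptions introduced here: B is the number of trees, \sigma^2 the variance of a single tree's prediction T_b(x), and \rho the pairwise correlation between trees.

```latex
\operatorname{Var}\!\left(\frac{1}{B}\sum_{b=1}^{B} T_b(x)\right)
  = \rho\,\sigma^{2} + \frac{1-\rho}{B}\,\sigma^{2}
```

Under these assumptions, adding trees shrinks only the second term, so the remaining variance is governed by the correlation between trees. Randomly sampling the candidate variables at each split is the RF component that lowers that correlation.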
