Chances and challenges of machine learning based disease classification in genetic association studies illustrated on age-related macular degeneration

2019 ◽  
Author(s):  
Felix Günther ◽  
Caroline Brandl ◽  
Thomas W. Winkler ◽  
Veronika Wanner ◽  
Klaus Stark ◽  
...  

Abstract Imaging technology and machine learning algorithms for disease classification set the stage for high-throughput phenotyping and open promising new avenues for genome-wide association studies (GWAS). Despite emerging algorithms, there has been no successful application in GWAS so far. We framed machine learning based disease classification in genetic association analysis as a misclassification problem. To evaluate chances and challenges, we performed a GWAS based on automated classification of age-related macular degeneration (AMD) in UK Biobank (images from 135,500 eyes; 68,400 persons). We quantified misclassification of automatically derived AMD in internal validation data (images from 4,001 eyes; 2,013 persons) and developed a maximum likelihood approach (MLA) to account for it when estimating genetic association. We demonstrate in simulation studies that our MLA guards against bias and artefacts. By combining a GWAS on automatically derived AMD classification with our MLA in UK Biobank data, we were able to dissect true association (ARMS2/HTRA1, CFH) from artefacts (near HERC2) and to identify eye color as a relevant source of misclassification. Using AMD as an example, we provide proof of concept that a GWAS using machine learning derived disease classification yields relevant results and that misclassification needs to be considered in the analysis. These findings generalize to other phenotypes and also emphasize the utility of genetic data for understanding the misclassification structure of machine learning algorithms.
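The abstract does not spell out the likelihood, so the following is only a minimal sketch of one common way to account for outcome misclassification, not the authors' MLA. It assumes nondifferential misclassification with sensitivity `se` and specificity `sp` treated as known (for example, estimated from validation data such as the graded subset described above). Under that assumption the probability of an observed case label given genotype g is (1 - sp) + (se + sp - 1) * p(g), where p(g) is a logistic model for the true disease status. The function names, the simulated effect sizes and the Newton optimizer are hypothetical choices for illustration.

```python
import math
import random

def expit(x):
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)

def observed_prob(p, se, sp):
    """P(labelled as case) given true-case probability p (assumed nondifferential se/sp)."""
    return (1.0 - sp) + (se + sp - 1.0) * p

def loglik_and_grad(beta, g, y, se, sp, eps=1e-12):
    """Average log-likelihood of observed labels y and its gradient in (b0, b1)."""
    b0, b1 = beta
    ll = d0 = d1 = 0.0
    for gi, yi in zip(g, y):
        p = expit(b0 + b1 * gi)
        q = min(max(observed_prob(p, se, sp), eps), 1.0 - eps)
        ll += math.log(q) if yi else math.log(1.0 - q)
        # chain rule: dlogL/d(eta) = (y - q) / (q (1 - q)) * dq/d(eta)
        w = (yi - q) / (q * (1.0 - q)) * (se + sp - 1.0) * p * (1.0 - p)
        d0 += w
        d1 += w * gi
    n = len(y)
    return ll / n, (d0 / n, d1 / n)

def fit(g, y, se=1.0, sp=1.0, max_iter=50, tol=1e-10, h=1e-5):
    """Newton steps with a finite-difference Hessian and step halving.
    se = sp = 1 reduces to ordinary (naive) logistic regression on the labels."""
    beta = (0.0, 0.0)
    ll, grad = loglik_and_grad(beta, g, y, se, sp)
    for _ in range(max_iter):
        # central-difference Hessian of the analytic gradient
        H = [[0.0, 0.0], [0.0, 0.0]]
        for j in range(2):
            shift = [h if k == j else 0.0 for k in range(2)]
            g_up = loglik_and_grad([b + s for b, s in zip(beta, shift)], g, y, se, sp)[1]
            g_dn = loglik_and_grad([b - s for b, s in zip(beta, shift)], g, y, se, sp)[1]
            for i in range(2):
                H[i][j] = (g_up[i] - g_dn[i]) / (2.0 * h)
        det = H[0][0] * H[1][1] - H[0][1] * H[1][0]
        if det > 0 and H[0][0] < 0:
            # Newton direction -H^{-1} grad (H is negative definite here)
            step = (-(H[1][1] * grad[0] - H[0][1] * grad[1]) / det,
                    -(H[0][0] * grad[1] - H[1][0] * grad[0]) / det)
        else:
            step = grad  # fall back to plain gradient ascent
        t, accepted = 1.0, False
        while t > 1e-10:
            cand = (beta[0] + t * step[0], beta[1] + t * step[1])
            cand_ll, cand_grad = loglik_and_grad(cand, g, y, se, sp)
            if cand_ll > ll:
                accepted = True
                break
            t /= 2.0
        if not accepted:
            break  # no improving step found: treat as converged
        gain = cand_ll - ll
        beta, ll, grad = cand, cand_ll, cand_grad
        if gain < tol:
            break
    return beta, ll

def simulate(n, b0, b1, maf, se, sp, seed=1):
    """Hypothetical data: additive genotypes, true status, then misclassified labels."""
    rng = random.Random(seed)
    g, y = [], []
    for _ in range(n):
        gi = int(rng.random() < maf) + int(rng.random() < maf)  # 0/1/2 risk alleles
        diseased = rng.random() < expit(b0 + b1 * gi)
        # cases are detected with prob se; controls are mislabelled with prob 1 - sp
        label = (rng.random() < se) if diseased else (rng.random() >= sp)
        g.append(gi)
        y.append(int(label))
    return g, y

if __name__ == "__main__":
    g, y = simulate(5000, b0=-1.0, b1=0.5, maf=0.3, se=0.8, sp=0.9)
    naive, _ = fit(g, y)                      # ignores misclassification
    corrected, _ = fit(g, y, se=0.8, sp=0.9)  # uses the assumed se/sp
    print(f"naive b1 = {naive[1]:.3f}, corrected b1 = {corrected[1]:.3f}")
```

With nondifferential misclassification the naive slope is typically attenuated toward zero, and the corrected fit undoes this only to the extent that the assumed `se` and `sp` are accurate.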

2017 ◽  
Vol 7 (1) ◽  
Author(s):  
Jiamei Liu ◽  
Cheng Xu ◽  
Weifeng Yang ◽  
Yayun Shu ◽  
Weiwei Zheng ◽  
...  

Abstract Binary classification is widely used to support decisions on biomedical big data questions, such as comparing treated participants with controls in clinical drug trials, or participants with and without a phenotype in genome-wide association studies (GWASs). For this purpose, a machine learning model is trained to maximize its power to discriminate samples from the two groups. However, most classification algorithms produce a single locally optimal solution determined by the input dataset and the mathematical assumptions made about it. Here we demonstrate, for both disease classification and feature selection, that multiple different solutions may achieve similar classification performance. Existing machine learning algorithms may therefore have caught only one good fish while ignoring a whole shoal of others. Because most existing machine learning algorithms generate a solution by optimizing a mathematical objective, understanding the biological mechanisms behind a given classification question may require considering both the generated solution and the ignored ones.
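A toy illustration of how two different solutions can perform equally well (hypothetical simulated data, not the authors' datasets or method): a single latent signal drives the binary label, and two features are equally noisy copies of that signal. A one-feature threshold rule built on either copy reaches nearly the same accuracy, so a procedure that keeps only one of them silently discards an equally good alternative. The names `simulate_redundant` and `rule_accuracy` and the noise level are assumptions made for this sketch.

```python
import random

def simulate_redundant(n, noise=0.5, seed=7):
    """Hypothetical data: latent signal z drives the label and two redundant features."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        z = rng.gauss(0.0, 1.0)
        label = int(z + rng.gauss(0.0, noise) > 0)
        x1 = z + rng.gauss(0.0, noise)  # informative feature
        x2 = z + rng.gauss(0.0, noise)  # equally informative, redundant feature
        x3 = rng.gauss(0.0, 1.0)        # uninformative feature
        rows.append(((x1, x2, x3), label))
    return rows

def rule_accuracy(rows, j, cut=0.0):
    """Accuracy of the one-feature rule 'predict case if x_j > cut'."""
    return sum(int(x[j] > cut) == y for x, y in rows) / len(rows)

if __name__ == "__main__":
    rows = simulate_redundant(20000)
    for j, name in enumerate(["x1", "x2", "x3"]):
        print(name, round(rule_accuracy(rows, j), 3))
```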


2015 ◽  
Vol 5 (1) ◽  
Author(s):  
Joseph M. Simonett ◽  
Mahsa A. Sohrab ◽  
Jennifer Pacheco ◽  
Loren L. Armstrong ◽  
Margarita Rzhetskaya ◽  
...  

F1000Research ◽  
2016 ◽  
Vol 5 ◽  
pp. 1573 ◽  
Author(s):  
Jeffrey De Fauw ◽  
Pearse Keane ◽  
Nenad Tomasev ◽  
Daniel Visentin ◽  
George van den Driessche ◽  
...  

There are almost two million people in the United Kingdom living with sight loss, including around 360,000 people who are registered as blind or partially sighted. Sight-threatening diseases, such as diabetic retinopathy and age-related macular degeneration, have contributed to the 40% increase in outpatient attendances in the last decade but are amenable to early detection and monitoring. With early and appropriate intervention, blindness may be prevented in many cases. Ophthalmic imaging provides a way to diagnose and objectively assess the progression of a number of pathologies, including neovascular (“wet”) age-related macular degeneration (wet AMD) and diabetic retinopathy. Two methods of imaging are commonly used: digital photographs of the fundus (the ‘back’ of the eye) and Optical Coherence Tomography (OCT, a modality that uses light waves in a similar way to how ultrasound uses sound waves). Changes in population demographics and expectations, and the changing pattern of chronic diseases, create a rising demand for such imaging. Meanwhile, interrogation of such images is time-consuming, costly, and prone to human error. The application of novel analysis methods may provide a solution to these challenges. This research will focus on applying novel machine learning algorithms to the automatic analysis of both digital fundus photographs and OCT in Moorfields Eye Hospital NHS Foundation Trust patients. Through analysis of the images used in ophthalmology, along with relevant clinical and demographic information, Google DeepMind Health will investigate the feasibility of automated grading of digital fundus photographs and OCT and provide novel quantitative measures for specific disease features and for monitoring therapeutic success.


F1000Research ◽  
2017 ◽  
Vol 5 ◽  
pp. 1573 ◽  
Author(s):  
Jeffrey De Fauw ◽  
Pearse Keane ◽  
Nenad Tomasev ◽  
Daniel Visentin ◽  
George van den Driessche ◽  
...  

There are almost two million people in the United Kingdom living with sight loss, including around 360,000 people who are registered as blind or partially sighted. Sight-threatening diseases, such as diabetic retinopathy and age-related macular degeneration, have contributed to the 40% increase in outpatient attendances in the last decade but are amenable to early detection and monitoring. With early and appropriate intervention, blindness may be prevented in many cases. Ophthalmic imaging provides a way to diagnose and objectively assess the progression of a number of pathologies, including neovascular (“wet”) age-related macular degeneration (wet AMD) and diabetic retinopathy. Two methods of imaging are commonly used: digital photographs of the fundus (the ‘back’ of the eye) and Optical Coherence Tomography (OCT, a modality that uses light waves in a similar way to how ultrasound uses sound waves). Changes in population demographics and expectations, and the changing pattern of chronic diseases, create a rising demand for such imaging. Meanwhile, interrogation of such images is time-consuming, costly, and prone to human error. The application of novel analysis methods may provide a solution to these challenges. This research will focus on applying novel machine learning algorithms to the automatic analysis of both digital fundus photographs and OCT in Moorfields Eye Hospital NHS Foundation Trust patients. Through analysis of the images used in ophthalmology, along with relevant clinical and demographic information, DeepMind Health will investigate the feasibility of automated grading of digital fundus photographs and OCT and provide novel quantitative measures for specific disease features and for monitoring therapeutic success.


Author(s):  
Benjamin A Goldstein ◽  
Eric C Polley ◽  
Farren B. S. Briggs

The Random Forests (RF) algorithm has become a commonly used machine learning algorithm for genetic association studies. It is well suited for genetic applications since it is both computationally efficient and models genetic causal mechanisms well. With its growing ubiquity, there has been inconsistent and less-than-optimal use of RF in the literature. The purpose of this review is to break down the theoretical and statistical basis of RF so that practitioners are able to apply it in their work. An emphasis is placed on showing how the various components contribute to bias and variance, as well as on discussing variable importance measures. Applications specific to genetic studies are highlighted. To provide context, RF is compared to other commonly used machine learning algorithms.
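As a standard textbook illustration of the variance side of this trade-off (not necessarily how the review itself presents it), assume the forest averages B identically distributed trees T_1(x), ..., T_B(x), each with variance σ² and pairwise correlation ρ. The variance of the ensemble prediction is then:

$$
\operatorname{Var}\!\left(\frac{1}{B}\sum_{b=1}^{B} T_b(x)\right)
= \frac{1}{B^2}\left(B\sigma^2 + B(B-1)\rho\,\sigma^2\right)
= \rho\,\sigma^2 + \frac{1-\rho}{B}\,\sigma^2 .
$$

Growing more trees shrinks only the second term. The first term is reduced by decorrelating the trees, which is the role of bootstrap sampling and of drawing a random subset of candidate predictors (for example, SNPs) at each split, while the bias of the forest stays roughly that of a single deep tree.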

