Chances and challenges of machine learning‐based disease classification in genetic association studies illustrated on age‐related macular degeneration

2020 ◽  
Vol 44 (7) ◽  
pp. 759-777
Author(s):  
Felix Guenther ◽  
Caroline Brandl ◽  
Thomas W. Winkler ◽  
Veronika Wanner ◽  
Klaus Stark ◽  
...  
2019 ◽  
Author(s):  
Felix Günther ◽  
Caroline Brandl ◽  
Thomas W. Winkler ◽  
Veronika Wanner ◽  
Klaus Stark ◽  
...  

Abstract: Imaging technology and machine learning algorithms for disease classification set the stage for high-throughput phenotyping and promising new avenues for genome-wide association studies (GWAS). Despite emerging algorithms, there has been no successful application in GWAS so far. We established machine learning-based disease classification in genetic association analysis as a misclassification problem. To evaluate chances and challenges, we performed a GWAS based on automated classification of age-related macular degeneration (AMD) in UK Biobank (images from 135,500 eyes; 68,400 persons). We quantified misclassification of automatically derived AMD in internal validation data (images from 4,001 eyes; 2,013 persons) and developed a maximum likelihood approach (MLA) to account for it when estimating genetic association. We demonstrate in simulation studies that our MLA guards against bias and artefacts. By combining a GWAS on automatically derived AMD classification and our MLA in UK Biobank data, we were able to separate true associations (ARMS2/HTRA1, CFH) from artefacts (near HERC2) and to identify eye color as a relevant source of misclassification. Using AMD as an example, we provide a proof of concept that a GWAS using machine learning-derived disease classification yields relevant results and that misclassification needs to be considered in the analysis. These findings generalize to other phenotypes and also emphasize the utility of genetic data for understanding the misclassification structure of machine learning algorithms.
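
The misclassification problem described in this abstract can be illustrated with a minimal sketch. It is not the authors' MLA: it assumes that sensitivity and specificity are known constants and applies a simple closed-form correction to the classified-case probabilities. All function names and numbers below are hypothetical and chosen only for illustration.

```python
def observed_prob(p_true, sensitivity, specificity):
    """Probability of being classified as a case, given the true disease
    probability and the classifier's (assumed known) error rates."""
    return sensitivity * p_true + (1.0 - specificity) * (1.0 - p_true)


def corrected_prob(p_obs, sensitivity, specificity):
    """Invert observed_prob to recover the true disease probability,
    clipped to the valid range [0, 1]."""
    p = (p_obs - (1.0 - specificity)) / (sensitivity + specificity - 1.0)
    return min(1.0, max(0.0, p))


def odds_ratio(p_exposed, p_unexposed):
    """Odds ratio comparing two disease probabilities."""
    return (p_exposed / (1.0 - p_exposed)) / (p_unexposed / (1.0 - p_unexposed))


# Hypothetical true disease probabilities for non-carriers and carriers
# of a risk allele.
p_noncarrier, p_carrier = 0.05, 0.10
true_or = odds_ratio(p_carrier, p_noncarrier)

# Case 1: non-differential misclassification (same error rates for everyone).
# The naive odds ratio is biased toward 1; the correction recovers the truth.
se, sp = 0.80, 0.95
obs_noncarrier = observed_prob(p_noncarrier, se, sp)
obs_carrier = observed_prob(p_carrier, se, sp)
naive_or = odds_ratio(obs_carrier, obs_noncarrier)
adjusted_or = odds_ratio(
    corrected_prob(obs_carrier, se, sp),
    corrected_prob(obs_noncarrier, se, sp),
)

# Case 2: differential misclassification. The variant has no true effect,
# but specificity is lower in carriers (for example, if the variant affects
# a trait that changes image appearance). The naive analysis then shows
# a spurious association.
p_null = 0.05
artefact_or = odds_ratio(
    observed_prob(p_null, 0.80, 0.90),  # carriers: lower specificity
    observed_prob(p_null, 0.80, 0.95),  # non-carriers
)

print(f"true OR      = {true_or:.3f}")
print(f"naive OR     = {naive_or:.3f}")
print(f"adjusted OR  = {adjusted_or:.3f}")
print(f"artefact OR  = {artefact_or:.3f}")
```

The second case mirrors, in a simplified way, the kind of artefact the abstract attributes to misclassification that depends on eye color: an odds ratio above 1 arises even though the variant has no effect on true disease status.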


2015 ◽  
Vol 5 (1) ◽  
Author(s):  
Joseph M. Simonett ◽  
Mahsa A. Sohrab ◽  
Jennifer Pacheco ◽  
Loren L. Armstrong ◽  
Margarita Rzhetskaya ◽  
...  

2018 ◽  
Author(s):  
Jon M Laurent ◽  
Xin Fu ◽  
Sergei German ◽  
Matthew T Maurano ◽  
Kang Zhang ◽  
...  

Abstract: Age-related macular degeneration (AMD) is a leading cause of blindness in the developed world, especially in aging populations, and is therefore an important target for new therapeutic development. Recently, there have been several studies demonstrating strong associations between AMD and sites of heritable genetic variation at multiple loci, including a highly significant association at 10q26. The 10q26 risk region contains two genes, HTRA1 and ARMS2, both of which have been separately implicated as causative for the disease, as well as dozens of sites of non-coding variation. To date, no studies have successfully pinpointed which of these variant sites are functional in AMD, nor definitively identified which genes in the region are targets of such regulatory variation. In order to efficiently decipher which sites are functional in AMD phenotypes, we describe a general framework for combinatorial assembly of large ‘synthetic haplotypes’ along with delivery to relevant disease cell types for downstream functional analysis. We demonstrate the successful and highly efficient assembly of a first-draft 119 kb wild-type ‘assemblon’ covering the HTRA1/ARMS2 risk region. We further propose the parallelized assembly of a library of combinatorial variant synthetic haplotypes covering the region, delivery and analysis of which will identify functional sites and their effects, leading to an improved understanding of AMD development. We anticipate that the methodology proposed here is highly generalizable towards the difficult problem of identifying truly functional variants from those discovered via GWAS or other genetic association studies.


Author(s):  
Benjamin A Goldstein ◽  
Eric C Polley ◽  
Farren B. S. Briggs

The Random Forests (RF) algorithm has become a commonly used machine learning algorithm for genetic association studies. It is well suited for genetic applications since it is both computationally efficient and models genetic causal mechanisms well. With its growing ubiquity, there has been inconsistent and suboptimal use of RF in the literature. The purpose of this review is to break down the theoretical and statistical basis of RF so that practitioners are able to apply it in their work. An emphasis is placed on showing how the various components contribute to bias and variance, as well as discussing variable importance measures. Applications specific to genetic studies are highlighted. To provide context, RF is compared to other commonly used machine learning algorithms.
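
As a small illustration of how RF components relate to variance, the sketch below uses two standard textbook results rather than anything taken from the review itself: the variance of an average of identically distributed trees with pairwise correlation rho, and the expected fraction of samples left out of a bootstrap draw (the out-of-bag samples). The function names and example values are hypothetical.

```python
import math


def ensemble_variance(tree_variance, rho, n_trees):
    """Variance of the average of n_trees identically distributed predictors,
    each with variance tree_variance and pairwise correlation rho."""
    return rho * tree_variance + (1.0 - rho) * tree_variance / n_trees


def oob_fraction(n_samples):
    """Probability that a given sample is absent from a bootstrap draw of
    size n_samples taken with replacement."""
    return (1.0 - 1.0 / n_samples) ** n_samples


# Adding trees shrinks only the second term; the correlation term remains,
# which is why decorrelating trees (random feature subsets) matters.
for n_trees in (1, 10, 500):
    print(n_trees, round(ensemble_variance(1.0, 0.3, n_trees), 4))

# Roughly 37% of samples are out-of-bag for each tree in large samples.
print(round(oob_fraction(1000), 4), round(math.exp(-1), 4))
```

With a single tree the ensemble variance equals the tree variance, and with many trees it approaches rho times the tree variance, so more trees cannot remove variance that comes from correlation between trees.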


2020 ◽  
Vol 41 (6) ◽  
pp. 539-547
Author(s):  
Antonieta Martínez-Velasco ◽  
Andric C. Perez-Ortiz ◽  
Bani Antonio-Aguirre ◽  
Lourdes Martínez-Villaseñor ◽  
Esmeralda Lira-Romero ◽  
...  

2019 ◽  
Vol 9 (24) ◽  
pp. 5550
Author(s):  
Antonieta Martínez-Velasco ◽  
Lourdes Martínez-Villaseñor ◽  
Luis Miralles-Pechuán ◽  
Andric C. Perez-Ortiz ◽  
Juan C. Zenteno ◽  
...  

Age-related macular degeneration (AMD) is the leading cause of visual dysfunction and irreversible blindness in developed countries and a rising cause in underdeveloped countries. There is a current debate on whether cataracts are significant risk factors for AMD development, and research regarding this association is so far inconclusive. For this reason, we aimed to employ a machine-learning approach to analyze the relevance and importance of cataracts as a risk factor for AMD in a large cohort of Hispanics from Mexico. We conducted a nested case-control study of 119 cataract cases and 137 healthy unmatched controls focusing on clinical data from electronic medical records. Additionally, we studied two single nucleotide polymorphisms in the CFH gene previously associated with the disease in various populations as a positive control for our method. We next determined the most relevant variables and found the bivariate association between cataracts and AMD. Later, we used supervised machine-learning methods to replicate these findings without bias. To improve interpretability, we detected the five most relevant features and displayed them using a bar graph and a rule-based tree. Our findings suggest that bilateral cataracts are not a significant risk factor for AMD development among Hispanics from Mexico.

