GWAS-based Machine Learning for Prediction of Age-Related Macular Degeneration Risk
ABSTRACT
Numerous independent susceptibility variants for age-related macular degeneration (AMD) have been identified by genome-wide association studies (GWAS). Because advanced AMD is currently incurable, accurate prediction of a person’s AMD risk from genetic information is desirable for early diagnosis and clinical management. In this study, genotype data from 32,215 Caucasian individuals older than 50 years, obtained from the International AMD Genomics Consortium in dbGaP, were used to establish and validate prediction models for AMD risk using four machine learning approaches: neural network, lasso regression, support vector machine, and random forest. A standard logistic regression model based on a genetic risk score was also considered. As feature SNPs for the AMD prediction models, we selected the genome-wide significant SNPs from GWAS. All methods achieved good performance in a separate test dataset, both for distinguishing normal controls from advanced AMD cases (AUC = 0.81 to 0.82) and for distinguishing normal controls from any AMD (AUC = 0.78 to 0.79). By applying state-of-the-art machine learning approaches to the large AMD GWAS data, we established predictive models that can provide an accurate estimate of an individual’s AMD risk profile across the person’s lifespan based on comprehensive genetic information.
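To make the genetic risk score baseline and the AUC metric concrete, the following is a minimal sketch and not the pipeline used in this study. It assumes the score is a weighted sum of risk-allele counts (0, 1, or 2 per SNP) with per-SNP weights such as log odds ratios, which the abstract does not specify. The function names, weights, and the six-person toy cohort are all hypothetical and chosen only for illustration.

```python
import math


def genetic_risk_score(genotypes, weights):
    """Weighted sum of risk-allele counts (each genotype is 0, 1, or 2)."""
    return sum(g * w for g, w in zip(genotypes, weights))


def risk_probability(score, intercept, slope=1.0):
    """Logistic model mapping a risk score to a probability of disease."""
    return 1.0 / (1.0 + math.exp(-(intercept + slope * score)))


def auc(scores, labels):
    """AUC as the probability that a random case outscores a random control.

    Ties between a case and a control count as half a win.
    """
    cases = [s for s, y in zip(scores, labels) if y == 1]
    controls = [s for s, y in zip(scores, labels) if y == 0]
    wins = 0.0
    for c in cases:
        for k in controls:
            if c > k:
                wins += 1.0
            elif c == k:
                wins += 0.5
    return wins / (len(cases) * len(controls))


# Hypothetical per-SNP weights (assumed log odds ratios), not from the study.
weights = [0.9, 0.6, 0.3]

# Hypothetical toy cohort: (risk-allele counts per SNP, label); 1 = AMD case.
cohort = [
    ([0, 0, 1], 0),
    ([1, 0, 0], 0),
    ([0, 1, 1], 0),
    ([1, 1, 0], 1),
    ([2, 0, 1], 1),
    ([2, 2, 1], 1),
]

scores = [genetic_risk_score(g, weights) for g, _ in cohort]
labels = [y for _, y in cohort]
print("toy AUC:", auc(scores, labels))
```

In practice the intercept and slope of the logistic model would be fitted on a training set and the AUC computed on a separate test set, as the abstract describes; the pairwise AUC above is quadratic in sample size and is meant only to show what the metric measures.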