Genetic variants and their interactions in disease risk prediction – machine learning and network perspectives

Hypertension is a widespread chronic disease. Predicting hypertension risk supports early prevention and management, but doing so in practice requires a model that is both effective and easy to implement. This study evaluated and compared the performance of four machine learning algorithms in predicting the risk of hypertension from easy-to-collect risk factors. A dataset of 29,700 samples collected through physical examination was used for model training and testing. First, we identified easy-to-collect risk factors for hypertension through univariate logistic regression analysis. Then, based on the selected features, 10-fold cross-validation on the training set was used to tune the hyper-parameters of four models: random forest (RF), CatBoost, a multilayer perceptron (MLP) neural network and logistic regression (LR). Finally, model performance was evaluated on the test set by AUC, accuracy, sensitivity and specificity. The RF model outperformed the other three models, achieving an AUC of 0.92, an accuracy of 0.82, a sensitivity of 0.83 and a specificity of 0.81. In addition, body mass index (BMI), age, family history and waist circumference (WC) were the four primary risk factors for hypertension. These findings suggest that it is feasible to use machine learning algorithms, especially RF, to predict hypertension risk without clinical or genetic data. The technique can provide a non-invasive and economical way to support the prevention and management of hypertension in a large population.
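The four evaluation measures named in this abstract (AUC, accuracy, sensitivity and specificity) can be illustrated with a minimal sketch in plain Python. This is not the study's code: the function names, the 0.5 decision threshold and the toy labels and scores below are all assumptions made for illustration, and AUC is computed here by direct pairwise comparison, which is only practical for small samples.

```python
from typing import Dict, Sequence


def classification_metrics(
    y_true: Sequence[int], scores: Sequence[float], threshold: float = 0.5
) -> Dict[str, float]:
    """Accuracy, sensitivity and specificity at a fixed threshold.

    Assumes binary labels (1 = hypertensive, 0 = not) and that both
    classes are present, so neither denominator is zero.
    """
    tp = fp = tn = fn = 0
    for label, score in zip(y_true, scores):
        predicted = 1 if score >= threshold else 0
        if label == 1 and predicted == 1:
            tp += 1
        elif label == 1 and predicted == 0:
            fn += 1
        elif label == 0 and predicted == 1:
            fp += 1
        else:
            tn += 1
    return {
        "accuracy": (tp + tn) / (tp + tn + fp + fn),
        "sensitivity": tp / (tp + fn),  # true positive rate
        "specificity": tn / (tn + fp),  # true negative rate
    }


def auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """Area under the ROC curve via pairwise comparison.

    Equals the probability that a randomly chosen positive sample
    receives a higher score than a randomly chosen negative sample,
    with ties counted as one half.
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical toy data, not taken from the study.
toy_labels = [1, 1, 1, 0, 0, 0, 1, 0]
toy_scores = [0.9, 0.8, 0.4, 0.3, 0.6, 0.1, 0.7, 0.2]

print(classification_metrics(toy_labels, toy_scores))
# {'accuracy': 0.75, 'sensitivity': 0.75, 'specificity': 0.75}
print(auc(toy_labels, toy_scores))
# 0.9375
```

Unlike the other three measures, AUC does not depend on the chosen threshold, which is one reason it is commonly reported alongside them when comparing classifiers.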


Machine learning identifies six genetic variants and alterations in the Heart Atrial Appendage as key contributors to PD risk predictivity.

10.1101/2021.06.29.21259734 ◽ 2021

Author(s): Daniel Ho ◽ William Schierding ◽ Sophie L Farrow ◽ Antony Cooper ◽ Justin M. O'Sullivan ◽ ...

Keyword(s): Machine Learning ◽ Genetic Variants ◽ Disease Risk ◽ Meta Analysis ◽ Atrial Appendage ◽ Specific Expression ◽ Tissue Specific ◽ Transcriptional Changes ◽ The UK ◽ eQTL Data

Parkinson disease (PD) is a complex neurodegenerative disease with a range of causes and clinical presentations. Over 76 genetic loci (comprising 90 SNPs) have been associated with PD by the most recent GWAS meta-analysis. Most of these PD-associated variants are located in non-coding regions of the genome, which makes it difficult to determine their function and how they contribute to the aetiology of PD. We hypothesised that PD-associated genetic variants modulate disease risk through tissue-specific expression quantitative trait loci (eQTL) effects. We developed and validated a machine learning approach that integrated tissue-specific eQTL data on known PD-associated genetic variants with PD case and control genotypes from the Wellcome Trust Case Control Consortium, the UK Biobank, and NeuroX. In so doing, our analysis ranked the tissue-specific transcription effects of PD-associated genetic variants and estimated their relative contributions to PD risk. We identified roles for SNPs that are connected with INPP5P, CNTN1, GBA and SNCA in PD. Ranking the variants and tissue-specific eQTL effects that contributed most to the machine learning model suggested a key role in PD risk for two variants (rs7617877 and rs6808178) and for eQTL-associated transcriptional changes of EAF1-AS1 within the heart atrial appendage. Similarly, effects associated with eQTLs located within the brain cerebellum were also found to confer major PD risk. These findings warrant further mechanistic investigations to determine whether these transcriptional changes could act as early contributors to PD risk and disease development.


A Fast Fourier Transform-Coupled Machine Learning-Based Ensemble Model for Disease Risk Prediction Using a Real-Life Dataset

Advances in Knowledge Discovery and Data Mining - Lecture Notes in Computer Science ◽ 10.1007/978-3-319-57454-7_51 ◽ 2017 ◽ pp. 654-670 ◽ Cited By ~ 4

Author(s): Raid Lafta ◽ Ji Zhang ◽ Xiaohui Tao ◽ Yan Li ◽ Wessam Abbas ◽ ...

Keyword(s): Machine Learning ◽ Fourier Transform ◽ Fast Fourier Transform ◽ Risk Prediction ◽ Disease Risk ◽ Real Life ◽ Ensemble Model


A Structural Graph-Coupled Advanced Machine Learning Ensemble Model for Disease Risk Prediction in a Telehealthcare Environment

Studies in Big Data - Big Data in Engineering Applications ◽ 10.1007/978-981-10-8476-8_18 ◽ 2018 ◽ pp. 363-384

Author(s): Raid Lafta ◽ Ji Zhang ◽ Xiaohui Tao ◽ Yan Li ◽ Mohammed Diykh ◽ ...

Keyword(s): Machine Learning ◽ Risk Prediction ◽ Disease Risk ◽ Ensemble Model ◽ Structural Graph


Preliminary Cardiac Disease Risk Prediction Based on Medical and Behavioural Data Set Using Supervised Machine Learning Techniques

Indian Journal of Science and Technology ◽ 10.17485/ijst/2016/v9i31/96740 ◽ 2016 ◽ Vol 9 (31) ◽ Cited By ~ 3

Author(s): Thendral Puyalnithi ◽ V. Madhu Viswanatham

Keyword(s): Machine Learning ◽ Risk Prediction ◽ Cardiac Disease ◽ Disease Risk ◽ Supervised Machine Learning ◽ Machine Learning Techniques ◽ Data Set ◽ Learning Techniques


Bio inspired Ensemble Feature Selection (BEFS) Model with Machine Learning and Data Mining Algorithms for Disease Risk Prediction

2019 5th International Conference On Computing, Communication, Control And Automation (ICCUBEA) ◽ 10.1109/iccubea47591.2019.9129304 ◽ 2019 ◽ Cited By ~ 1

Author(s): Syed Javeed Pasha ◽ E. Syed Mohamed

Keyword(s): Machine Learning ◽ Data Mining ◽ Feature Selection ◽ Risk Prediction ◽ Disease Risk ◽ Data Mining Algorithms ◽ Mining Algorithms


Machine Learning Identifies Six Genetic Variants and Alterations in the Heart Atrial Appendage as Key Contributors to PD Risk Predictivity

Frontiers in Genetics ◽ 10.3389/fgene.2021.785436 ◽ 2022 ◽ Vol 12

Author(s): Daniel Ho ◽ William Schierding ◽ Sophie L. Farrow ◽ Antony A. Cooper ◽ Andreas W. Kempa-Liehr ◽ ...

Keyword(s): Machine Learning ◽ Genetic Variants ◽ Disease Risk ◽ Meta Analysis ◽ Atrial Appendage ◽ Specific Expression ◽ Tissue Specific ◽ Transcriptional Changes ◽ The UK ◽ eQTL Data

Parkinson’s disease (PD) is a complex neurodegenerative disease with a range of causes and clinical presentations. Over 76 genetic loci (comprising 90 SNPs) have been associated with PD by the most recent GWAS meta-analysis. Most of these PD-associated variants are located in non-coding regions of the genome, which makes it difficult to determine their function and how they contribute to the aetiology of PD. We hypothesised that PD-associated genetic variants modulate disease risk through tissue-specific expression quantitative trait loci (eQTL) effects. We developed and validated a machine learning approach that integrated tissue-specific eQTL data on known PD-associated genetic variants with PD case and control genotypes from the Wellcome Trust Case Control Consortium. In so doing, our analysis ranked the tissue-specific transcription effects of PD-associated genetic variants and estimated their relative contributions to PD risk. We identified roles for SNPs that are connected with INPP5P, CNTN1, GBA and SNCA in PD. Ranking the variants and tissue-specific eQTL effects that contributed most to the machine learning model suggested a key role in PD risk for two variants (rs7617877 and rs6808178) and for eQTL-associated transcriptional changes of EAF1-AS1 within the heart atrial appendage. Similarly, effects associated with eQTLs located within the Brain Cerebellum were also found to confer major PD risk. These findings were replicated in two additional, independent cohorts (the UK Biobank and NeuroX) and thus warrant further mechanistic investigations to determine whether these transcriptional changes could act as early contributors to PD risk and disease development.
