Eye-Color and Type-2 Diabetes Phenotype Prediction From Genotype Data Using Deep Learning Methods

Statistical Techniques ◽

Human Beings ◽

Eye Color ◽

Machine Learning Model ◽

Extreme Gradient Boosting

Abstract Background: Genotype-phenotype predictions are of great importance in genetics. These predictions can help to find the genetic mutations that cause variation in human beings. There are many approaches for finding the association, which can be broadly categorized into two classes: statistical techniques and machine learning. Statistical techniques are good for finding the actual SNPs causing variation, whereas machine learning techniques are good when we just want to classify people into different categories. In this article, we examined the eye-color and type-2 diabetes phenotypes. The proposed technique is a hybrid approach that takes some parts from statistical techniques and the rest from machine learning. Results: The main dataset for the eye-color phenotype consists of 806 people: 404 have blue-green eyes and 402 have brown eyes. After preprocessing, we generated 8 different datasets containing different numbers of SNPs, using the mutation difference and thresholding at each individual SNP. We calculated three types of mutation at each SNP: no mutation, partial mutation, and full mutation. The data were then transformed for machine learning algorithms. We used about 9 classifiers (RandomForest, Extreme Gradient Boosting, ANN, LSTM, GRU, BILSTM, 1DCNN, ensembles of ANN, and ensembles of LSTM), which gave best accuracies of 0.91, 0.9286, 0.945, 0.94, 0.94, 0.92, 0.95, and 0.96, respectively. Stacked ensembles of LSTM outperformed the other algorithms for 1560 SNPs, with an overall accuracy of 0.96, AUC = 0.98 for brown eyes, and AUC = 0.97 for blue-green eyes. The main dataset for type-2 diabetes consists of 107 people, of whom 30 are classified as cases and 74 as controls. We used different linear thresholds to find the optimal number of SNPs for classification. The final model gave an accuracy of 0.97. Conclusion: Genotype-phenotype predictions are very useful, especially in forensics. These predictions can help to identify the association of SNP variants with traits and diseases. Given more datasets, machine learning model predictions can be improved. Moreover, the non-linearity in the machine learning model and the combination of SNP mutations during training improve the prediction. We considered binary classification problems, but the proposed approach can be extended to multi-class classification.
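The abstract does not spell out how the three mutation types are encoded or how the "mutation difference" is thresholded, so the following is only a minimal sketch under stated assumptions: genotypes are two-letter allele strings, a reference allele is known for each SNP, the encoding is the count of non-reference alleles (0 = no mutation, 1 = partial, 2 = full), and a SNP is kept when its mean encoded value differs between the two classes by at least a chosen threshold. The function names, toy genotypes, and threshold are hypothetical and not taken from the paper.

```python
from statistics import mean


def encode_genotype(genotype: str, ref: str) -> int:
    """Count non-reference alleles: 0 = no mutation, 1 = partial, 2 = full."""
    return sum(1 for allele in genotype if allele != ref)


def select_snps(encoded, labels, threshold):
    """Return indices of SNPs whose class-wise mean encoding differs by >= threshold.

    This "mean difference" rule is an assumption made for illustration only.
    """
    classes = sorted(set(labels))
    if len(classes) != 2:
        raise ValueError("this sketch handles binary phenotypes only")
    first, second = classes
    selected = []
    for j in range(len(encoded[0])):
        mean_first = mean(row[j] for row, y in zip(encoded, labels) if y == first)
        mean_second = mean(row[j] for row, y in zip(encoded, labels) if y == second)
        if abs(mean_first - mean_second) >= threshold:
            selected.append(j)
    return selected


# Toy data (hypothetical): 4 people, 3 SNPs, one reference allele per SNP.
ref_alleles = ["A", "C", "G"]
genotypes = [
    ["AA", "CT", "GG"],
    ["AA", "CC", "GG"],
    ["GG", "CT", "GG"],
    ["AG", "CC", "GG"],
]
labels = ["brown", "brown", "blue_green", "blue_green"]

encoded = [
    [encode_genotype(g, r) for g, r in zip(person, ref_alleles)]
    for person in genotypes
]
selected = select_snps(encoded, labels, threshold=1.0)
print(encoded)
print(selected)
```

Varying the threshold yields feature sets of different sizes, which is one plausible reading of how several datasets with different numbers of SNPs could be produced before training a classifier.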

Eye-color and Type-2 diabetes phenotype prediction from genotype data using deep learning methods

BMC Bioinformatics ◽

10.1186/s12859-021-04077-9 ◽

2021 ◽

Vol 22 (1) ◽

Author(s):

Muhammad Muneeb ◽

Andreas Henschel

Keyword(s):

Machine Learning ◽

Type 2 Diabetes ◽

Learning Model ◽

Statistical Techniques ◽

Human Beings ◽

Eye Color ◽

Machine Learning Model ◽

Extreme Gradient Boosting

Author Correction: Geographically weighted machine learning model for untangling spatial heterogeneity of type 2 diabetes mellitus (T2D) prevalence in the USA

Scientific Reports ◽

10.1038/s41598-021-97279-3 ◽

2021 ◽

Vol 11 (1) ◽

Author(s):

Sarah Quiñones ◽

Aditya Goyal ◽

Zia U. Ahmed

Keyword(s):

Diabetes Mellitus ◽

Machine Learning ◽

Type 2 Diabetes ◽

Spatial Heterogeneity ◽

Learning Model ◽

Machine Learning Model ◽

The Usa

Geographically weighted machine learning model for untangling spatial heterogeneity of type 2 diabetes mellitus (T2D) prevalence in the USA

Scientific Reports ◽

10.1038/s41598-021-85381-5 ◽

2021 ◽

Vol 11 (1) ◽

Author(s):

Sarah Quiñones ◽

Aditya Goyal ◽

Zia U. Ahmed

Keyword(s):

Diabetes Mellitus ◽

Machine Learning ◽

Risk Factors ◽

Type 2 Diabetes ◽

Spatial Heterogeneity ◽

Learning Model ◽

Machine Learning Model ◽

Non Parametric

Abstract Type 2 diabetes mellitus (T2D) prevalence in the United States varies substantially across spatial and temporal scales, attributable to variations in socioeconomic and lifestyle risk factors. Understanding how the contributions of these risk factors to T2D vary would be of great benefit to intervention and treatment approaches to reduce or prevent T2D. Geographically-weighted random forest (GW-RF), a tree-based non-parametric machine learning model, may help explore and visualize the relationships between T2D and risk factors at the county level. GW-RF outputs are compared to global (RF and OLS) and local (GW-OLS) models for the years 2013–2017 using low education, poverty, obesity, physical inactivity, access to exercise, and food environment as inputs. Our results indicate that a non-parametric GW-RF model shows a high potential for explaining spatial heterogeneity of, and predicting, T2D prevalence over traditional local and global models when inputting six major risk factors. Some of these predictions, however, are marginal. These findings of spatial heterogeneity using GW-RF demonstrate the need to consider local factors in prevention approaches. Spatial analysis of T2D and associated risk factor prevalence offers useful information for targeting the geographic area for prevention and disease interventions.

Development and Validation of a Machine Learning Model Using Administrative Health Data to Predict Onset of Type 2 Diabetes

JAMA Network Open ◽

10.1001/jamanetworkopen.2021.11315 ◽

2021 ◽

Vol 4 (5) ◽

pp. e2111315

Author(s):

Mathieu Ravaut ◽

Vinyas Harish ◽

Hamed Sadeghi ◽

Kin Kwan Leung ◽

Maksims Volkovs ◽

...

Keyword(s):

Machine Learning ◽

Type 2 Diabetes ◽

Learning Model ◽

Health Data ◽

Administrative Health Data ◽

Machine Learning Model ◽

Development And Validation

1233-P: Prediction of Type 2 Diabetes Occurrence Using Machine Learning Model

Diabetes ◽

10.2337/db20-1233-p ◽

2020 ◽

Vol 69 (Supplement 1) ◽

pp. 1233-P

Author(s):

HENOCK M. DEBERNEH ◽

INTAEK KIM ◽

JAE HYUN PARK ◽

EUNSEOK CHA ◽

KYONG HYE JOUNG ◽

...

Keyword(s):

Machine Learning ◽

Type 2 Diabetes ◽

Learning Model ◽

Machine Learning Model

A patient network-based machine learning model for disease prediction: The case of type 2 diabetes mellitus

Applied Intelligence ◽

10.1007/s10489-021-02533-w ◽

2021 ◽

Author(s):

Haohui Lu ◽

Shahadat Uddin ◽

Farshid Hajati ◽

Mohammad Ali Moni ◽

Matloob Khushi

Keyword(s):

Diabetes Mellitus ◽

Machine Learning ◽

Type 2 Diabetes ◽

Learning Model ◽

Disease Prediction ◽

Machine Learning Model

Towards more Accessible Precision Medicine: Building a more Transferable Machine Learning Model to Support Prognostic Decisions for Micro- and Macrovascular Complications of Type 2 Diabetes Mellitus

Journal of Medical Systems ◽

10.1007/s10916-019-1321-6 ◽

2019 ◽

Vol 43 (7) ◽

Cited By ~ 3

Author(s):

Era Kim ◽

Pedro J. Caraballo ◽

M. Regina Castro ◽

David S. Pieczkiewicz ◽

Gyorgy J. Simon

Keyword(s):

Diabetes Mellitus ◽

Machine Learning ◽

Type 2 Diabetes ◽

Precision Medicine ◽

Learning Model ◽

Macrovascular Complications ◽

Machine Learning Model

A Latent Dirichlet Allocation and Fuzzy Clustering Based Machine Learning Model for Text Thesaurus

International Journal of Computers Communications & Control ◽

10.15837/ijccc.2020.2.3811 ◽

2020 ◽

Vol 15 (2) ◽

Author(s):

Jia Luo ◽

Dongwen Yu ◽

Zong Dai

Keyword(s):

Machine Learning ◽

Fuzzy Clustering ◽

Latent Dirichlet Allocation ◽

Learning Model ◽

Text Data ◽

Huge Data ◽

Machine Learning Model ◽

N Gram ◽

Dirichlet Allocation

It is not feasible to process huge amounts of structured and semi-structured data manually. This study aims to solve the problem of processing huge data through machine learning algorithms. We collected text data on the company’s public opinion through crawlers, used the Latent Dirichlet Allocation (LDA) algorithm to extract keywords from the text, and used fuzzy clustering to cluster the keywords into different topics. The topic keywords are used as a seed dictionary for new word discovery. To verify the efficiency of machine learning in new word discovery, algorithms based on association rules, N-Gram, PMI, and Word2vec were compared on new word discovery. The experimental results show that the Word2vec algorithm, which is based on a machine learning model, has the highest accuracy, recall, and F-value.
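The abstract names PMI as one of the baselines for new word discovery but gives no implementation details. As a rough illustration only, the textbook definition of pointwise mutual information for adjacent tokens, PMI(x, y) = log2(p(x, y) / (p(x) p(y))), can be sketched as follows. The function name and the toy sentence are hypothetical, and the study's actual candidate generation and scoring may differ.

```python
import math
from collections import Counter


def bigram_pmi(tokens):
    """Score each adjacent token pair with PMI = log2(p(x, y) / (p(x) * p(y)))."""
    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))
    n_unigrams = len(tokens)
    n_bigrams = len(tokens) - 1
    scores = {}
    for (x, y), count in bigrams.items():
        p_xy = count / n_bigrams
        p_x = unigrams[x] / n_unigrams
        p_y = unigrams[y] / n_unigrams
        scores[(x, y)] = math.log2(p_xy / (p_x * p_y))
    return scores


# Toy token stream (hypothetical); real input would be segmented crawler text.
tokens = "machine learning model beats rule model machine learning wins".split()
scores = bigram_pmi(tokens)
for pair, score in sorted(scores.items(), key=lambda item: -item[1]):
    print(pair, round(score, 3))
```

High-PMI pairs are candidates for being a single new word or phrase. Plain PMI tends to favor rare pairs, so practical pipelines usually add a minimum frequency cutoff before ranking.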

Predicting the Development of Type 2 Diabetes in a Large Australian Cohort Using Machine-Learning Techniques: Longitudinal Survey Study (Preprint)

10.2196/preprints.16850 ◽

2019 ◽

Author(s):

Lei Zhang ◽

Xianwen Shang ◽

Subhashaan Sreedharan ◽

Xixi Yan ◽

Jianbin Liu ◽

...

Keyword(s):

Machine Learning ◽

Type 2 Diabetes ◽

Risk Prediction ◽

Survey Study ◽

Gradient Boosting ◽

Diabetes Incidence ◽

Diabetes Onset ◽

Prediction Of Diabetes

BACKGROUND Previous conventional models for the prediction of diabetes could be updated by incorporating the increasing amount of health data available and new risk prediction methodology. OBJECTIVE We aimed to develop a substantially improved diabetes risk prediction model using sophisticated machine-learning algorithms based on a large retrospective population cohort of over 230,000 people who were enrolled in the study during 2006-2017. METHODS We collected demographic, medical, behavioral, and incidence data for type 2 diabetes mellitus (T2DM) in 236,684 diabetes-free participants recruited from the 45 and Up Study. We predicted and compared the risk of diabetes onset in these participants at 3, 5, 7, and 10 years based on three machine-learning approaches and the conventional regression model. RESULTS Overall, 6.05% (14,313/236,684) of the participants developed T2DM during an average 8.8-year follow-up period. The 10-year diabetes incidence in men was 8.30% (8.08%-8.49%), which was significantly higher (odds ratio 1.37, 95% CI 1.32-1.41) than that in women at 6.20% (6.00%-6.40%). The incidence of T2DM was doubled in individuals with obesity (men: 17.78% [17.05%-18.43%]; women: 14.59% [13.99%-15.17%]) compared with that of nonobese individuals. The gradient boosting machine model showed the best performance among the four models (area under the curve of 79% in 3-year prediction and 75% in 10-year prediction). All machine-learning models predicted BMI as the most significant factor contributing to diabetes onset, which explained 12%-50% of the variance in the prediction of diabetes. The model predicted that if BMI in obese and overweight participants could be hypothetically reduced to a healthy range, the 10-year probability of diabetes onset would be significantly reduced from 8.3% to 2.8% (P<.001). CONCLUSIONS A one-time self-reported survey can accurately predict the risk of diabetes using a machine-learning approach. Achieving a healthy BMI can significantly reduce the risk of developing T2DM.
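The headline figures in this abstract can be sanity-checked with simple arithmetic. The sketch below recomputes the overall incidence from the reported counts and a crude, unadjusted odds ratio from the reported 10-year incidences in men and women. The helper names are hypothetical, and the study's own odds ratio may come from a fitted model rather than from this two-proportion calculation.

```python
def odds(p: float) -> float:
    """Convert a probability into odds."""
    return p / (1.0 - p)


def odds_ratio(p_exposed: float, p_reference: float) -> float:
    """Crude odds ratio between two groups, given each group's event probability."""
    return odds(p_exposed) / odds(p_reference)


# Figures quoted in the abstract.
cases, participants = 14_313, 236_684
incidence_men, incidence_women = 0.0830, 0.0620

overall_incidence = cases / participants
or_men_vs_women = odds_ratio(incidence_men, incidence_women)

print(f"overall incidence: {overall_incidence:.2%}")
print(f"crude odds ratio (men vs women): {or_men_vs_women:.2f}")
```

Both values agree with the reported 6.05% and 1.37 after rounding.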

Detection and defense of cyberattacks on the machine learning control of robotic systems

The Journal of Defense Modeling and Simulation Applications Methodology Technology ◽

10.1177/15485129211043874 ◽

2021 ◽

pp. 154851292110438

Author(s):

George W Clark ◽

Todd R Andel ◽

J Todd McDonald ◽

Tom Johnsten ◽

Tom Thomas

Keyword(s):

Machine Learning ◽

Autonomous Vehicles ◽

Defense Mechanisms ◽

Autonomous Vehicle ◽

Learning Algorithms ◽

Learning Model ◽

Robotic Systems ◽

Machine Learning Model ◽

Attack Surface

Robotic systems are no longer simply built and designed to perform sequential repetitive tasks primarily in a static manufacturing environment. Systems such as autonomous vehicles make use of intricate machine learning algorithms to adapt their behavior to dynamic conditions in their operating environment. These machine learning algorithms provide an additional attack surface for an adversary to exploit in order to perform a cyberattack. Since an attack on robotic systems such as autonomous vehicles has the potential to cause great damage and harm to humans, it is essential that detection of and defenses against these attacks be explored. This paper discusses the plausibility of direct and indirect cyberattacks on a machine learning model through the use of a virtual autonomous vehicle operating in a simulation environment using a machine learning model for control. Using this vehicle, this paper proposes various methods for detecting cyberattacks on its machine learning model and discusses possible defense mechanisms to prevent such attacks.