Construction of Wine Quality Prediction Model based on Machine Learning Algorithm

2021 ◽  
Author(s):  
HAOYU Zhang ◽  
ZHILE Wang ◽  
JIAWEI He ◽  
JIJIAO Tong
2021 ◽  
Vol 8 (3) ◽  
pp. 209-221
Author(s):  
Li-Li Wei ◽  
Yue-Shuai Pan ◽  
Yan Zhang ◽  
Kai Chen ◽  
Hao-Yu Wang ◽  
...  

Abstract
Objective To study the application of a machine learning algorithm for predicting gestational diabetes mellitus (GDM) in early pregnancy.
Methods This study identified indicators related to GDM through a literature review and expert discussion. Pregnant women who had attended medical institutions for an antenatal examination from November 2017 to August 2018 were selected for analysis, and the collected indicators were retrospectively analyzed. Using Python, the indicators were classified and modeled with a random forest regression algorithm, and the performance of the prediction model was analyzed.
Results We obtained 4806 analyzable records from 1625 pregnant women. Among these, 3265 samples with all 67 indicators were used to establish data set F1; 4806 samples with 38 identical indicators were used to establish data set F2. F1 and F2 were each used to train the random forest algorithm. The overall predictive accuracy of the F1 model was 93.10%, the area under the receiver operating characteristic curve (AUC) was 0.66, and the predictive accuracy for GDM-positive cases was 37.10%. The corresponding values for the F2 model were 88.70%, 0.87, and 79.44%. The results thus showed that the F2 prediction model performed better than the F1 model. To explore the impact of the omitted indicators on GDM prediction, the F3 data set was established using the 3265 samples of F1 with the 38 indicators of F2. After training, the overall predictive accuracy of the F3 model was 91.60%, the AUC was 0.58, and the predictive accuracy for positive cases was 15.85%.
Conclusions In this study, a model for predicting GDM from several input variables (e.g., physical examination, past history, personal history, family history, and laboratory indicators) was established using a random forest regression algorithm. The trained prediction model performed well and is valuable as a reference for predicting GDM in women at an early stage of pregnancy. In addition, applying the random forest algorithm to the early prediction of GDM places certain requirements on the proportions of negative and positive cases in the sample data sets.
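
The gap between overall accuracy and positive-case accuracy reported above is typical of imbalanced data. The following minimal Python sketch illustrates how the three reported kinds of metric can be computed. It assumes that "predictive accuracy of positive cases" means the share of true positives that are correctly identified (sensitivity); the abstract does not define it. The function names and the toy data are hypothetical and are not taken from the study.

```python
def confusion_counts(y_true, y_pred):
    """Return (tp, fp, tn, fn) for binary labels coded as 0/1."""
    tp = fp = tn = fn = 0
    for t, p in zip(y_true, y_pred):
        if t == 1 and p == 1:
            tp += 1
        elif t == 0 and p == 1:
            fp += 1
        elif t == 0 and p == 0:
            tn += 1
        else:
            fn += 1
    return tp, fp, tn, fn


def overall_accuracy(y_true, y_pred):
    """Share of all samples that are classified correctly."""
    tp, fp, tn, fn = confusion_counts(y_true, y_pred)
    return (tp + tn) / (tp + fp + tn + fn)


def positive_case_accuracy(y_true, y_pred):
    """Share of true positives that are predicted positive (assumed meaning)."""
    tp, _, _, fn = confusion_counts(y_true, y_pred)
    return tp / (tp + fn)


def auc(y_true, scores):
    """AUC as the probability that a positive outranks a negative (ties count 0.5)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical toy set: 10 positive and 90 negative cases.
y_true = [1] * 10 + [0] * 90
# Hypothetical predictions: 3 of 10 positives found, 1 of 90 negatives misflagged.
y_pred = [1] * 3 + [0] * 7 + [1] * 1 + [0] * 89

print(overall_accuracy(y_true, y_pred))        # high despite most positives being missed
print(positive_case_accuracy(y_true, y_pred))  # low
```

In this toy set most positives are missed, yet overall accuracy stays high because negatives dominate the sample. This is one way to read the remark about the proportions of negative and positive cases.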


2021 ◽  
Author(s):  
Jincheng Yang

BACKGROUND Diabetes mellitus and cancer are among the leading causes of death worldwide; hyperglycemia contributes substantially to the risk of neoplastic transformation. Support vector machine (SVM) is a supervised learning method that analyzes data and recognizes patterns and is mainly used for statistical classification and regression.
OBJECTIVE Using adverse events reported for PD-1 or PD-L1 (programmed death 1 or programmed death ligand 1) inhibitors in post-marketing monitoring, we aimed to construct an effective machine learning algorithm that predicts, efficiently and rapidly, the probability of a hyperglycemic adverse reaction in patients treated with PD-1/PD-L1 inhibitors.
METHODS Raw data were downloaded from the US Food and Drug Administration Adverse Event Reporting System (FDA FAERS). Signals of the relationship between drug and adverse reaction were based on disproportionality analysis and Bayesian analysis. A multivariate pattern classification with SVM was used to construct a classifier that separates patients with hyperglycemic adverse reactions. Ten-fold cross-validation repeated three times within the training data (80% of the data) yielded the best SVM parameter values in R. The model was validated on each testing data set (20% of the data) and on two total drug data sets, using the predictor parameter variables gamma and nu.
RESULTS A total of 95918 case files were downloaded for 7 relevant drugs (cemiplimab, avelumab, durvalumab, atezolizumab, pembrolizumab, ipilimumab, nivolumab). The number-type/number-optimization method was selected to optimize the model. Both gamma and nu values correlated with case number and showed high adjusted r² in curve regressions (both r² > 0.95). Accuracy, F1 score, kappa, and sensitivity were greatly improved by the prediction model in the training data and the two total drug data sets.
CONCLUSIONS The SVM prediction model established here can non-invasively and precisely predict the occurrence of hyperglycemic adverse drug reactions (ADRs) in patients treated with PD-1/PD-L1 inhibitors. Such information is vital for overcoming ADRs and improving outcomes by distinguishing patients at high risk of hyperglycemia, and this machine learning algorithm can eventually add value to clinical decision making.
CLINICALTRIAL N/A
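
The disproportionality analysis mentioned in the methods can be illustrated with a small Python sketch. The abstract does not say which disproportionality measure was used; the reporting odds ratio (ROR) shown here is one common choice and is an assumption, as are the function name and the example counts.

```python
import math


def reporting_odds_ratio(a, b, c, d, z=1.96):
    """Reporting odds ratio with an approximate 95% confidence interval.

    2x2 table of spontaneous reports (all counts hypothetical here):
      a: target drug,  target event     b: target drug,  other events
      c: other drugs,  target event     d: other drugs,  other events
    """
    if min(a, b, c, d) <= 0:
        raise ValueError("all four cell counts must be positive")
    ror = (a * d) / (b * c)
    # Standard error of ln(ROR) from the four cell counts.
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(ror) - z * se)
    upper = math.exp(math.log(ror) + z * se)
    return ror, lower, upper


# Hypothetical counts, not taken from FAERS.
ror, lower, upper = reporting_odds_ratio(a=20, b=80, c=10, d=90)
print(ror, lower, upper)
```

A frequently used convention treats a lower confidence bound above 1 as a signal; whether the study applied this or another threshold is not stated in the abstract.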

