Comparison of Models Used to Predict Flight Delays at Jomo Kenyatta International Airport

Flight delays have negative socio-economic effects on passengers, airlines and airports, resulting in huge economic losses. Predicting them is therefore crucial to decision-making and proper management for all players in the aviation industry. The development of accurate prediction models for flight delays depends on the complexity of the air transport system and airport infrastructure, and hence may be country specific. However, no prediction models tailored to the Kenyan aviation industry exist, so there is a need to develop models suited to Kenyan aviation conditions. The objective of this study was to compare the predictive power of the developed models. Secondary data from Jomo Kenyatta International Airport (JKIA) was used in this study. The data collected included the day of the flight (Monday to Sunday), the month (January to December), the airline, the flight class (domestic or international), the season (summer or winter), the capacity of the aircraft, the flight ID (tail number) and whether the flight had flown at night or during the day. The data were analysed using R software. Three models were fitted: a Logistic model, a Support Vector Machine model and a Random Forest model. The strength and utility of the models were determined using bias-variance learning curves. The study revealed that the models predicted delays with different accuracies. The Random Forest model had a prediction accuracy of 68.99%, the Support Vector Machine (SVM) model 68.62%, and the Logistic Regression model 66.18%. The Random Forest model thus outperformed the SVM and Logistic Regression models by 0.37 and 2.81 percentage points, respectively. The SVM and Random Forest models do not assume a probability distribution for the response under investigation, which may explain why they performed better than logistic regression. The study recommends applying the Random Forest model to predict flight delays at JKIA.
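
As a quick check on the comparison, the margin between the best model and each of the others can be derived directly from the three reported accuracies. The sketch below is illustrative only; the helper name `margins_over` is hypothetical and not taken from the study, which used R.

```python
# Accuracies (in percent) as reported in the abstract.
accuracies = {
    "Random Forest": 68.99,
    "SVM": 68.62,
    "Logistic Regression": 66.18,
}


def margins_over(best, scores):
    """Return the percentage-point gap between `best` and every other model."""
    return {
        name: round(scores[best] - value, 2)
        for name, value in scores.items()
        if name != best
    }


# Pick the model with the highest accuracy, then measure its lead.
best_model = max(accuracies, key=accuracies.get)
gaps = margins_over(best_model, accuracies)

for name, gap in gaps.items():
    print(f"{best_model} leads {name} by {gap} percentage points")
```

The gaps are differences between two percentages, so they are expressed in percentage points rather than as accuracies in their own right.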

Identifying the Risk Factors Associated with Nursing Home Residents’ Pressure Ulcers Using Machine Learning Methods

International Journal of Environmental Research and Public Health ◽

10.3390/ijerph18062954 ◽

2021 ◽

Vol 18 (6) ◽

pp. 2954

Author(s):

Soo-Kyoung Lee ◽

Juh Hyun Shin ◽

Jinhyun Ahn ◽

Ji Yeon Lee ◽

Dong Eun Jang

Keyword(s):

Machine Learning ◽

Logistic Regression ◽

Random Forest ◽

Pressure Ulcers ◽

Nursing Home Residents ◽

Random Forest Model ◽

Support Vector ◽

Predictive Values ◽

Forest Model ◽

Linear Svm

Background: Machine learning (ML) can keep improving predictions and generating automated knowledge via data-driven predictors or decisions. Objective: The purpose of this study was to compare different ML methods, including random forest, logistic regression, linear support vector machine (SVM), polynomial SVM, radial SVM, and sigmoid SVM, in terms of their accuracy, sensitivity, specificity, negative predictive values, and positive predictive values by validating real datasets to predict factors for pressure ulcers (PUs). Methods: We applied representative ML algorithms (random forest, logistic regression, linear SVM, polynomial SVM, radial SVM, and sigmoid SVM) to develop a prediction model (N = 60). Results: The random forest model showed the greatest accuracy (0.814), followed by logistic regression (0.782), polynomial SVM (0.779), radial SVM (0.770), linear SVM (0.767), and sigmoid SVM (0.674). Conclusions: The random forest model showed the greatest accuracy for predicting PUs in nursing homes (NHs). Diverse factors that predict PUs in NHs, including NH characteristics and residents’ characteristics, were identified by the different ML methods. These factors should be considered to decrease PUs in NH residents.
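
The five measures named in the objective are all derived from the same confusion matrix. The sketch below shows the standard definitions; the function name and the counts are hypothetical and are not taken from the study.

```python
def classification_metrics(tp, fp, tn, fn):
    """Compute standard binary-classification measures from confusion counts.

    tp/fp/tn/fn are true positives, false positives, true negatives and
    false negatives. All denominators are assumed to be non-zero.
    """
    total = tp + fp + tn + fn
    return {
        "accuracy": (tp + tn) / total,     # share of all cases classified correctly
        "sensitivity": tp / (tp + fn),     # share of true cases that were detected
        "specificity": tn / (tn + fp),     # share of non-cases correctly ruled out
        "ppv": tp / (tp + fp),             # positive predictive value
        "npv": tn / (tn + fn),             # negative predictive value
    }


# Hypothetical counts, for illustration only.
example = classification_metrics(tp=40, fp=10, tn=45, fn=5)
for name, value in example.items():
    print(f"{name}: {value:.3f}")
```

Accuracy alone can hide an imbalance between sensitivity and specificity, which is one reason studies of this kind report all five measures together.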

Random Forest Model for Trip End Identification Using Cellular Phone and Points of Interest Data

Transportation Research Record Journal of the Transportation Research Board ◽

10.1177/03611981211031537 ◽

2021 ◽

pp. 036119812110315

Author(s):

Fei Yang ◽

Yanchen Wang ◽

Peter J. Jin ◽

Dingbang Li ◽

Zhenxing Yao

Keyword(s):

Random Forest ◽

Clustering Algorithm ◽

Subjective Experience ◽

Average Distance ◽

Cellular Phone ◽

Random Forest Model ◽

Support Vector ◽

Rule Based ◽

Forest Model ◽

Points Of Interest

Cellular phone data has been proven to be valuable in the analysis of residents’ travel patterns. Existing studies mostly identify the trip ends through rule-based or clustering algorithms. These methods largely depend on subjective experience and users’ communication behaviors. Moreover, limited by privacy policy, the accuracy of these methods is difficult to assess. In this paper, points of interest data is applied to supplement cellular phone data’s missing information generated by users’ behaviors. Specifically, a random forest model for trip end identification is proposed using multi-dimensional attributes. A field data acquisition test is designed and conducted with communication operators to implement synchronized cellular phone data and real trip information collection. The proposed identification approach is empirically evaluated with real trip information. Results show that the overall trip end detection precision and recall reach 95.2% and 88.7% with an average distance error of 269 m, and the time errors of the trip ends are less than 10 min. Compared with the rule-based approach, clustering algorithm, naive Bayes method, and support vector machine, the proposed method has better performance in accuracy and consistency.
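
Precision, recall, and average distance error of this kind can be computed once detected trip ends are matched to recorded ones. The sketch below is one possible way to do that, not the paper's procedure: the greedy nearest-neighbour matching, the 500 m threshold `max_dist_m`, and the coordinates are all assumptions made for illustration.

```python
import math


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two (lat, lon) points in degrees."""
    radius_m = 6371000.0  # mean Earth radius
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * radius_m * math.asin(math.sqrt(a))


def evaluate_trip_ends(detected, actual, max_dist_m=500.0):
    """Greedily match each detected trip end to the nearest unmatched actual one.

    A match counts as a true positive only if it lies within max_dist_m
    (an assumed threshold). Returns (precision, recall, mean distance error).
    """
    unmatched = list(actual)
    errors = []
    for point in detected:
        if not unmatched:
            break
        nearest = min(unmatched, key=lambda a: haversine_m(*point, *a))
        dist = haversine_m(*point, *nearest)
        if dist <= max_dist_m:
            errors.append(dist)
            unmatched.remove(nearest)
    tp = len(errors)
    precision = tp / len(detected) if detected else 0.0
    recall = tp / len(actual) if actual else 0.0
    mean_error = sum(errors) / tp if tp else float("nan")
    return precision, recall, mean_error


# Hypothetical coordinates, for illustration only.
actual = [(30.00, 104.0), (30.01, 104.0), (30.02, 104.0)]
detected = [(30.00, 104.0), (30.011, 104.0), (30.50, 104.0)]
precision, recall, mean_error = evaluate_trip_ends(detected, actual)
print(f"precision={precision:.3f} recall={recall:.3f} mean error={mean_error:.1f} m")
```

Here two of the three detections fall within the threshold, so precision and recall are both two thirds, and the distance error is averaged over matched pairs only.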

Heart Disease Prediction with Logistic Regression and Random Forest Model

European Journal of Biomedical and Life Sciences ◽

10.29013/elbls-21-1.2-24-33 ◽

2021 ◽

pp. 24-33

Author(s):

D. Tang

Keyword(s):

Logistic Regression ◽

Heart Disease ◽

Random Forest ◽

Random Forest Model ◽

Disease Prediction ◽

Forest Model

Wetland conversion risk assessment of East Kolkata Wetland: A Ramsar site using random forest and support vector machine model

Journal of Cleaner Production ◽

10.1016/j.jclepro.2020.123475 ◽

2020 ◽

Vol 275 ◽

pp. 123475

Author(s):

Sasanka Ghosh ◽

Arijit Das

Keyword(s):

Risk Assessment ◽

Support Vector Machine ◽

Random Forest ◽

Support Vector Machine Model ◽

Support Vector ◽

Ramsar Site ◽

Machine Model ◽

East Kolkata Wetland ◽

Wetland Conversion

Prediction Of Plastic Degrading Microbes

10.1101/2021.08.01.454681 ◽

2021 ◽

Author(s):

Hemalatha N ◽

Akhil Wilson ◽

Akhil Thankachan

Keyword(s):

Machine Learning ◽

Support Vector Machine ◽

Random Forest ◽

Decision Tree ◽

Nearest Neighbor ◽

Random Forest Model ◽

Support Vector ◽

K Nearest Neighbor ◽

Plastic Pollution ◽

Forest Model

Plastic pollution is one of the most challenging environmental problems, yet we cannot imagine a life without plastic. This paper deals with the prediction of plastic-degrading microbes using machine learning. We used Decision Tree, Random Forest, Support Vector Machine and K Nearest Neighbor algorithms to predict plastic-degrading microbes. Among the four classifiers, the Random Forest model gave the best accuracy, 99.1%.

The Random Forest Model Has the Best Accuracy Among the Four Pressure Ulcer Prediction Models Using Machine Learning Algorithms

Risk Management and Healthcare Policy ◽

10.2147/rmhp.s297838 ◽

2021 ◽

Vol 14 ◽

pp. 1175-1187

Author(s):

Jie Song ◽

Yuan Gao ◽

Pengbin Yin ◽

Yi Li ◽

Yang Li ◽

...

Keyword(s):

Machine Learning ◽

Random Forest ◽

Pressure Ulcer ◽

Prediction Models ◽

Learning Algorithms ◽

Machine Learning Algorithms ◽

Random Forest Model ◽

Forest Model

Machine Learning Approach Using Routine Immediate Postoperative Laboratory Values for Predicting Postoperative Mortality

Journal of Personalized Medicine ◽

10.3390/jpm11121271 ◽

2021 ◽

Vol 11 (12) ◽

pp. 1271

Author(s):

Jaehyeong Cho ◽

Jimyung Park ◽

Eugene Jeong ◽

Jihye Shin ◽

Sangjeong Ahn ◽

...

Keyword(s):

Machine Learning ◽

Random Forest ◽

Prediction Models ◽

External Validation ◽

Model Development ◽

Postoperative Mortality ◽

Random Forest Model ◽

Forest Model ◽

Laboratory Values ◽

Increased Risk

Background: Several prediction models have been proposed for preoperative risk stratification for mortality. However, few studies have investigated postoperative risk factors, which have a significant influence on survival after surgery. This study aimed to develop prediction models using routine immediate postoperative laboratory values for predicting postoperative mortality. Methods: Two tertiary hospital databases were used in this research: one for model development and another for external validation of the resulting models. The following algorithms were utilized for model development: LASSO logistic regression, random forest, deep neural network, and XGBoost. We built the models on the lab values from immediate postoperative blood tests and compared them with the SASA scoring system to demonstrate their efficacy. Results: There were 3817 patients who had immediate postoperative blood test values. All models trained on immediate postoperative lab values outperformed the SASA model. Furthermore, the developed random forest model had the best AUROC of 0.82 and AUPRC of 0.13, and the phosphorus level contributed the most to the random forest model. Conclusions: Machine learning models trained on routine immediate postoperative laboratory values outperformed previously published approaches in predicting 30-day postoperative mortality, indicating that they may be beneficial in identifying patients at increased risk of postoperative death.
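
AUROC and AUPRC measure different things, which helps explain how the two reported values can sit so far apart. The sketch below computes both with plain Python. It uses the pairwise-ranking definition of AUROC and average precision as one common estimator of AUPRC; the labels and scores are hypothetical and the estimator choice is an assumption, not the paper's stated method.

```python
def auroc(labels, scores):
    """Probability that a random positive is scored above a random negative.

    Ties count as half a win. Assumes at least one positive and one negative.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def average_precision(labels, scores):
    """Mean of the precision values at the rank of each positive case."""
    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])
    hits, total = 0, 0.0
    for rank, (_, y) in enumerate(ranked, start=1):
        if y == 1:
            hits += 1
            total += hits / rank
    return total / hits


# Hypothetical labels (1 = death) and model scores, for illustration only.
labels = [0, 0, 1, 0, 1, 0, 0, 0]
scores = [0.10, 0.40, 0.35, 0.20, 0.80, 0.05, 0.30, 0.15]
roc = auroc(labels, scores)
ap = average_precision(labels, scores)
print(f"AUROC={roc:.3f} average precision={ap:.3f}")
```

A classifier that ranks at random has an AUROC near 0.5 but an AUPRC near the event prevalence, so for a rare outcome a low AUPRC is not by itself a sign of a weak model.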

P1511 Artificial Intelligence (Random Forest) Predictive Survival Model in Hemodialysis. Data from the Andalusian Registry of Renal Diseases. SICATA

Nephrology Dialysis Transplantation ◽

10.1093/ndt/gfaa142.p1511 ◽

2020 ◽

Vol 35 (Supplement_3) ◽

Author(s):

Manuel Benítez Sánchez ◽

Guillermo Martín ◽

Luis Gil Sacaluga ◽

Maria Jose Garcia Cortes ◽

Sergio García Marcos ◽

...

Keyword(s):

Logistic Regression ◽

Random Forest ◽

Roc Curve ◽

Explanatory Models ◽

Multivariate Model ◽

Random Forest Model ◽

Predictor Variables ◽

Multivariate Logistic Regression ◽

Analytical Technique ◽

Forest Model

Background and Aims: Random Forest (RF) is an analytical technique of Artificial Intelligence (AI) that consists of an ensemble of trees built by bootstrapping (resampling with replacement). At each node a random subset of predictor variables is selected, and the best cut point among them is determined, so every split of the tree is based on a random sample of the predictors. The trees are grown as deep as possible. In the construction of each RF tree, part of the observations (approximately 37%) is not used. This part is called the out-of-bag (OOB) sample and is used to obtain an honest estimate of the predictive capacity of the model, so a separate validation sample is not required. In each analysis, a few hundred regression or classification trees are grown, depending on whether the response variable is numerical or qualitative, respectively. The result is an average of the repeated predictions of the model (bagging). RF also allows the importance of the predictor variables to be calculated, and the important variables can later be included in a multivariate regression model. Method: We analyzed 14750 records from 2011 to 2014 contained in the Information System of the Autonomous Transplant Coordination of Andalusia (SICATA), a system that includes clinical-epidemiological variables on anemia, bone metabolism, adequacy of dialysis and vascular access. 1911 patients presented the event of interest (exitus). Three predictive and explanatory models of survival were developed: (1) RF; (2) multivariate logistic regression; (3) multivariate logistic regression that includes the important variables from the preceding RF model. We compared them in terms of accuracy (AUC of the ROC curve). Results: The AUC of the ROC curve of the multivariate model without prior RF was 0.75. The AUC of the ROC curve of the multivariate model with prior RF was 0.81. The AUC of the ROC curve of the Random Forest model was 0.98. Conclusion: The Random Forest model achieves 98% discrimination for mortality in patients on hemodialysis, far superior to the classic multivariate analyses. The multivariate logistic regression performed with the important RF variables improves on the AUC of the previous model (0.81 vs. 0.75).
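
The figure of roughly 37% for the out-of-bag share follows from bootstrap sampling itself: when n observations are drawn with replacement n times, each observation is missed with probability (1 - 1/n)^n, which approaches 1/e (about 0.368) as n grows. The sketch below checks this both analytically and by simulation; the function names, trial count and seed are illustrative choices.

```python
import math
import random


def oob_fraction_exact(n):
    """Probability that a given observation is absent from a bootstrap sample of size n."""
    return (1 - 1 / n) ** n


def oob_fraction_simulated(n, trials=200, seed=0):
    """Average share of observations left out across repeated bootstrap samples."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    total = 0.0
    for _ in range(trials):
        drawn = {rng.randrange(n) for _ in range(n)}  # distinct indices drawn
        total += (n - len(drawn)) / n
    return total / trials


exact = oob_fraction_exact(14750)        # record count taken from the abstract
simulated = oob_fraction_simulated(1000)
print(f"exact={exact:.4f} simulated={simulated:.4f} 1/e={math.exp(-1):.4f}")
```

Because each tree never sees its own out-of-bag observations, predictions for them behave like predictions on held-out data, which is the basis for the OOB estimate described above.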

Application of data mining techniques and logistic regression to model drug use transition to injection: a case study in drug use treatment centers in Kermanshah Province, Iran

Substance Abuse Treatment Prevention and Policy ◽

10.1186/s13011-019-0242-1 ◽

2019 ◽

Vol 14 (1) ◽

Author(s):

Somayeh Najafi-Ghobadi ◽

Khadijeh Najafi-Ghobadi ◽

Lily Tapak ◽

Abbas Aghaei

Keyword(s):

Risk Factors ◽

Support Vector Machine ◽

Logistic Regression ◽

Drug Use ◽

Likelihood Ratio ◽

Support Vector Machine Model ◽

Positive Likelihood Ratio ◽

Support Vector ◽

Machine Model ◽

Factors Associated

Background: Drug injection has been increasing over the past decades all over the world. Hepatitis B and C viruses (HBV and HCV) are two common infections among people who inject drugs (PWID), and more than 60% of new human immunodeficiency virus (HIV) cases are PWID. Thus, investigating risk factors associated with drug use transition to injection is essential and was the aim of this research. Methods: We used a database from drug use treatment centers in Kermanshah Province (Iran) in 2013 that included 2098 records of people who use drugs (PWUD). Information on 29 potential risk factors that are commonly used in the literature on drug use was selected. We employed four classification methods (decision tree, neural network, support vector machine, and logistic regression) to determine factors affecting the decision of PWUD to transition to injection. Results: The average specificity of all models was over 84%. The support vector machine produced the highest specificity (0.9). This model also showed the highest total accuracy (0.91), sensitivity (0.94), positive likelihood ratio [1] and Kappa (0.94), and the smallest negative likelihood ratio (0). Therefore, important factors according to the support vector machine model were used for further interpretation. Conclusions: Based on the support vector machine model, the use of heroin, cocaine, and hallucinogens were identified as the three most important factors associated with drug use transition to injection. The results further indicated that PWUD with a history of imprisonment, or who use drugs out of curiosity or because of unemployment, are at higher risk. Unemployment and unreliable sources of income were other suggested factors of transition in this research.
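
Likelihood ratios and Cohen's kappa are less familiar than accuracy, so a short sketch of how they are derived from a confusion matrix may help. The definitions are the standard ones; the function name and the counts are hypothetical and are not the study's data.

```python
import math


def diagnostic_summary(tp, fp, tn, fn):
    """Return sensitivity, specificity, likelihood ratios and Cohen's kappa."""
    total = tp + fp + tn + fn
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    # LR+ is undefined when specificity is 1; report it as infinity in that case.
    lr_pos = sensitivity / (1 - specificity) if specificity < 1 else math.inf
    lr_neg = (1 - sensitivity) / specificity
    observed = (tp + tn) / total
    # Agreement expected by chance, from the row and column totals.
    expected = ((tp + fn) * (tp + fp) + (tn + fp) * (tn + fn)) / total ** 2
    kappa = (observed - expected) / (1 - expected)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "lr_pos": lr_pos,
        "lr_neg": lr_neg,
        "kappa": kappa,
    }


# Hypothetical counts, for illustration only.
summary = diagnostic_summary(tp=47, fp=5, tn=45, fn=3)
for name, value in summary.items():
    print(f"{name}: {value:.3f}")
```

A larger positive likelihood ratio and a smaller negative likelihood ratio both indicate a more informative classifier, while kappa corrects raw agreement for the agreement expected by chance.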

Application of Data Mining Technology in Risk Prediction of Metabolic Syndrome in Oil Workers

10.21203/rs.3.rs-31038/v1 ◽

2020 ◽

Author(s):

Jie Wang ◽

Chao Li ◽

Jing Li ◽

Sheng Qin ◽

Chunlei Liu ◽

...

Keyword(s):

Metabolic Syndrome ◽

Random Forest ◽

Risk Prediction ◽

Roc Curve ◽

Prediction Models ◽

Prediction Performance ◽

Random Forest Model ◽

Forest Model ◽

The Metabolic Syndrome ◽

Oil Workers

Background. The prevalence of metabolic syndrome continues to rise sharply worldwide, seriously threatening people's health. In this paper, three risk prediction models applicable to metabolic syndrome in oil workers were established, and the optimal model was identified through comparison. The optimal model can be used to identify people at high risk of metabolic syndrome as early as possible, to predict their risk, and to persuade them to change their adverse lifestyle so as to slow down and reduce the incidence of metabolic syndrome. Methods. A total of 1,468 workers from an oil company who participated in occupational health physical examinations from April 2017 to October 2018 were included in this study. We established a logistic regression model, a random forest model and a convolutional neural network model, and compared their prediction performance according to the F1 score, sensitivity, accuracy and other indicators. Results. The accuracy of the three models in the training set was 83.45%, 94.21% and 86.34%, the sensitivity was 78.47%, 94.62% and 81.30%, the F1 score was 0.79, 0.93 and 0.83, and the area under the ROC curve was 0.894, 0.987 and 0.935, respectively. In the test set, the accuracy was 76.72%, 80.66% and 78.69%, the sensitivity was 70.00%, 77.50% and 68.33%, the F1 score was 0.70, 0.76 and 0.71, and the area under the ROC curve was 0.797, 0.861 and 0.855, respectively. Conclusions. The study showed that the prediction performance of the random forest model is better than that of the other models. The model has higher application value, can better predict the risk of metabolic syndrome in oil workers, and provides a theoretical basis for the health management of oil workers.
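
Comparing training and test accuracy shows how much each model's performance drops on unseen data. The sketch below computes that gap from the accuracies reported above, assuming the values are listed in the order logistic regression, random forest, convolutional neural network, and adds the standard F1 formula; the function names are illustrative.

```python
# Accuracies (in percent) as reported in the abstract, assumed to follow the
# order in which the three models are introduced.
train_accuracy = {"Logistic regression": 83.45, "Random forest": 94.21, "CNN": 86.34}
test_accuracy = {"Logistic regression": 76.72, "Random forest": 80.66, "CNN": 78.69}


def generalization_gap(train, test):
    """Training accuracy minus test accuracy, in percentage points, per model."""
    return {name: round(train[name] - test[name], 2) for name in train}


def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


gap = generalization_gap(train_accuracy, test_accuracy)
for name, value in gap.items():
    print(f"{name}: {value} percentage points lower on the test set")
```

Under that ordering, the random forest has the highest test accuracy but also the largest drop from training to test, which is worth keeping in mind when reading its training-set figures.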
