Comparison of Models Used to Predict Flight Delays at Jomo Kenyatta International Airport

Author(s): P. K. Gachoki, M. M. Muraya

Flight delays have negative socio-economic effects on passengers, airlines and airports, resulting in huge economic losses. Their prediction is therefore crucial to decision-making by all players in the aviation industry. The development of accurate prediction models for flight delays depends on the complexity of the air transport system and the airport infrastructure, so such models may be country-specific. However, no prediction models tailored to the Kenyan aviation industry exist, and there is a need to develop models suited to Kenyan aviation conditions. The objective of this study was to compare the predictive power of the developed models. Secondary data from Jomo Kenyatta International Airport (JKIA) were used. The data collected included the day of the week of the flight (Monday to Sunday), the month (January to December), the airline, the flight class (domestic or international), the season (summer or winter), the capacity of the aircraft, the flight ID (tail number), and whether the flight operated at night or during the day. The data were analysed using R software. Three models were fitted: a logistic regression model, a Support Vector Machine (SVM) model and a Random Forest model. The strength and utility of the models were assessed using bias-variance learning curves. The models predicted delays with different accuracies: the Random Forest model had a prediction accuracy of 68.99%, the SVM model 68.62% and the logistic regression model 66.18%. The Random Forest model thus outperformed the SVM and logistic regression models by 0.37 and 2.81 percentage points, respectively. Unlike logistic regression, the SVM and Random Forest models do not assume a probability distribution for the response variable, which may explain why they performed better. The study recommends the Random Forest model for predicting flight delays at JKIA.
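The comparison above is in absolute terms, so the gaps are best read as percentage points rather than relative percentages. The following minimal sketch uses only the three accuracies reported in the abstract, identifies the best model and computes its margin over each of the others. The function name `margins_over` is hypothetical.

```python
# Test accuracies (%) as reported in the JKIA flight-delay abstract.
reported_accuracy = {
    "Random Forest": 68.99,
    "SVM": 68.62,
    "Logistic Regression": 66.18,
}


def margins_over(best: str, scores: dict[str, float]) -> dict[str, float]:
    """Absolute margin, in percentage points, of `best` over every other model."""
    return {
        name: round(scores[best] - score, 2)
        for name, score in scores.items()
        if name != best
    }


best_model = max(reported_accuracy, key=reported_accuracy.get)
print(best_model, margins_over(best_model, reported_accuracy))
```

A relative improvement would instead divide each margin by the weaker model's accuracy, which gives noticeably different numbers. That is why the unit matters when comparing models this close together.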

Author(s): Soo-Kyoung Lee, Juh Hyun Shin, Jinhyun Ahn, Ji Yeon Lee, Dong Eun Jang

Background: Machine learning (ML) can continually improve predictions and generate automated knowledge through data-driven predictors or decisions. Objective: The purpose of this study was to compare different ML methods (random forest, logistic regression, linear support vector machine (SVM), polynomial SVM, radial SVM and sigmoid SVM) in terms of their accuracy, sensitivity, specificity, negative predictive value and positive predictive value, using real datasets to predict factors for pressure ulcers (PUs). Methods: We applied representative ML algorithms (random forest, logistic regression, linear SVM, polynomial SVM, radial SVM and sigmoid SVM) to develop a prediction model (N = 60). Results: The random forest model showed the greatest accuracy (0.814), followed by logistic regression (0.782), polynomial SVM (0.779), radial SVM (0.770), linear SVM (0.767) and sigmoid SVM (0.674). Conclusions: The random forest model showed the greatest accuracy for predicting PUs in nursing homes (NHs). Diverse factors that predict PUs in NHs, including NH characteristics and residents' characteristics, were identified by the different ML methods. These factors should be considered when seeking to reduce PUs in NH residents.
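All five metrics compared in this study derive from the same 2×2 confusion matrix of predicted versus actual PU status. The following minimal sketch uses hypothetical labels (1 = PU, 0 = no PU), not the study's data, and shows how each metric is computed. The class and function names are illustrative.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionMatrix:
    tp: int  # predicted PU, actual PU
    fp: int  # predicted PU, no actual PU
    tn: int  # predicted no PU, no actual PU
    fn: int  # predicted no PU, actual PU

    def accuracy(self) -> float:
        return (self.tp + self.tn) / (self.tp + self.fp + self.tn + self.fn)

    def sensitivity(self) -> float:  # true-positive rate (recall)
        return self.tp / (self.tp + self.fn)

    def specificity(self) -> float:  # true-negative rate
        return self.tn / (self.tn + self.fp)

    def ppv(self) -> float:  # positive predictive value (precision)
        return self.tp / (self.tp + self.fp)

    def npv(self) -> float:  # negative predictive value
        return self.tn / (self.tn + self.fn)


def confusion_from_labels(y_true, y_pred) -> ConfusionMatrix:
    """Count the four outcome types for binary labels coded 0/1."""
    pairs = list(zip(y_true, y_pred))
    return ConfusionMatrix(
        tp=sum(t == 1 and p == 1 for t, p in pairs),
        fp=sum(t == 0 and p == 1 for t, p in pairs),
        tn=sum(t == 0 and p == 0 for t, p in pairs),
        fn=sum(t == 1 and p == 0 for t, p in pairs),
    )


# Hypothetical example: 4 residents with PUs, 4 without.
cm = confusion_from_labels([1, 1, 1, 0, 0, 0, 0, 1], [1, 0, 1, 0, 0, 1, 1, 1])
print(cm.accuracy(), cm.sensitivity(), cm.specificity(), cm.ppv(), cm.npv())
```

Unlike the other three metrics, the PPV and NPV depend on how common PUs are in the sample. For that reason the same classifier can show different predictive values in a different nursing-home population.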


Author(s): Fei Yang, Yanchen Wang, Peter J. Jin, Dingbang Li, Zhenxing Yao

Cellular phone data have proven valuable in analysing residents' travel patterns. Existing studies mostly identify trip ends with rule-based or clustering algorithms, which depend heavily on subjective experience and on users' communication behaviors. Moreover, because of privacy restrictions, the accuracy of these methods is difficult to assess. In this paper, point-of-interest (POI) data are used to supplement the information that is missing from cellular phone data because of users' behaviors. Specifically, a random forest model for trip end identification is proposed using multi-dimensional attributes. A field data acquisition test is designed and conducted with communication operators to collect synchronized cellular phone data and real trip information. The proposed identification approach is evaluated empirically against the real trip information. Results show that the overall trip end detection precision and recall reach 95.2% and 88.7%, with an average distance error of 269 m, and the time errors of the trip ends are less than 10 min. Compared with the rule-based approach, clustering algorithm, naive Bayes method and support vector machine, the proposed method performs better in accuracy and consistency.
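Precision, recall and average distance error all require deciding which detected trip end corresponds to which real one. The abstract does not say how that matching was done. The following minimal sketch therefore assumes greedy one-to-one nearest-neighbour matching on planar coordinates in metres, with a hypothetical 500 m tolerance. It is not the authors' procedure.

```python
import math

Point = tuple[float, float]  # (x, y) in metres on a local planar projection (assumption)


def evaluate_trip_ends(detected: list[Point], truth: list[Point], tolerance_m: float = 500.0):
    """Greedily match detected trip ends to ground-truth trip ends.

    A detection is a true positive if an as-yet-unmatched ground-truth trip end
    lies within `tolerance_m` (a hypothetical threshold). Returns
    (precision, recall, mean distance error of the matched pairs).
    """
    unmatched = list(truth)
    errors = []
    for d in detected:
        if not unmatched:
            break  # any remaining detections are false positives
        nearest = min(unmatched, key=lambda t: math.dist(d, t))
        dist = math.dist(d, nearest)
        if dist <= tolerance_m:
            errors.append(dist)
            unmatched.remove(nearest)
    tp = len(errors)
    precision = tp / len(detected) if detected else 0.0
    recall = tp / len(truth) if truth else 0.0
    mean_error = sum(errors) / tp if tp else float("nan")
    return precision, recall, mean_error


truth = [(0.0, 0.0), (1000.0, 0.0), (5000.0, 5000.0)]
detected = [(30.0, 40.0), (1300.0, 400.0), (5000.0, 5200.0), (9000.0, 9000.0)]
print(evaluate_trip_ends(detected, truth))
```

Greedy matching keeps the sketch short. An optimal assignment, for example the Hungarian algorithm, would avoid order-dependent matches when several candidates compete for the same ground-truth trip end.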


2021
Author(s): Hemalatha N, Akhil Wilson, Akhil Thankachan

Plastic pollution is one of the most challenging environmental problems, yet life without plastic is hard to imagine. This paper deals with the prediction of plastic-degrading microbes using machine learning. We used Decision Tree, Random Forest, Support Vector Machine and K-Nearest Neighbor algorithms to predict plastic-degrading microbes. Among the four classifiers, the Random Forest model gave the best accuracy, 99.1%.
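Of the four classifiers named, K-Nearest Neighbor is the simplest to state. It labels a new sample by majority vote among the k most similar training samples. The abstract does not describe the features used. The following minimal sketch assumes hypothetical two-dimensional numeric feature vectors, hypothetical class labels and Euclidean distance.

```python
import math
from collections import Counter


def knn_predict(train_X, train_y, query, k=3):
    """Classify `query` by majority vote among its k nearest training points (Euclidean)."""
    neighbours = sorted(
        zip(train_X, train_y), key=lambda pair: math.dist(pair[0], query)
    )[:k]
    votes = Counter(label for _, label in neighbours)
    return votes.most_common(1)[0][0]


# Hypothetical feature vectors and labels, for illustration only.
train_X = [(0.1, 0.2), (0.2, 0.1), (0.9, 0.8), (0.8, 0.9), (0.85, 0.75)]
train_y = ["non-degrader", "non-degrader", "degrader", "degrader", "degrader"]

print(knn_predict(train_X, train_y, (0.15, 0.15)))
print(knn_predict(train_X, train_y, (0.8, 0.8)))
```

An odd k avoids voting ties between two classes. In practice, features on different scales should be standardised first, because Euclidean distance is scale-sensitive.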


2021, Vol 11 (12), pp. 1271
Author(s): Jaehyeong Cho, Jimyung Park, Eugene Jeong, Jihye Shin, Sangjeong Ahn, ...

Background: Several prediction models have been proposed for preoperative risk stratification for mortality. However, few studies have investigated postoperative risk factors, which significantly influence survival after surgery. This study aimed to develop prediction models that use routine immediate postoperative laboratory values to predict postoperative mortality. Methods: Two tertiary hospital databases were used: one for model development and the other for external validation of the resulting models. The following algorithms were used for model development: LASSO logistic regression, random forest, deep neural network and XGBoost. We built the models on the laboratory values from immediate postoperative blood tests and compared them with the SASA scoring system to demonstrate their efficacy. Results: A total of 3817 patients had immediate postoperative blood test values. All models trained on immediate postoperative laboratory values outperformed the SASA model. The random forest model had the best AUROC (0.82) and AUPRC (0.13), and the phosphorus level contributed the most to the random forest model. Conclusions: Machine learning models trained on routine immediate postoperative laboratory values outperformed previously published approaches in predicting 30-day postoperative mortality, indicating that they may help identify patients at increased risk of postoperative death.
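AUROC and AUPRC summarise a model's risk scores differently. AUROC is the probability that a randomly chosen patient who died is scored above a randomly chosen survivor. AUPRC tracks precision across recall levels, and its chance-level baseline equals the prevalence of the positive class. A low AUPRC such as 0.13 is therefore not directly comparable with the AUROC when postoperative deaths are rare. The following minimal sketch uses hypothetical scores. It estimates AUPRC with the step-wise average-precision formula, one of several AUPRC estimators, and assumes no tied scores in that function.

```python
def auroc(y_true, scores):
    """Probability that a random positive outranks a random negative (ties count 1/2).

    Equivalent to the Mann-Whitney U statistic divided by n_pos * n_neg.
    """
    pos = [s for s, y in zip(scores, y_true) if y == 1]
    neg = [s for s, y in zip(scores, y_true) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def average_precision(y_true, scores):
    """Step-wise AUPRC estimate: mean precision at the rank of each positive.

    Assumes no tied scores, for simplicity.
    """
    ranked = sorted(zip(scores, y_true), key=lambda pair: pair[0], reverse=True)
    n_pos = sum(y_true)
    hits, total = 0, 0.0
    for rank, (_, y) in enumerate(ranked, start=1):
        if y == 1:
            hits += 1
            total += hits / rank  # precision at this cut-off
    return total / n_pos


# Hypothetical outcomes (1 = died within 30 days) and model risk scores.
y = [0, 0, 1, 1, 0, 1]
s = [0.1, 0.4, 0.35, 0.8, 0.2, 0.9]
print(auroc(y, s), average_precision(y, s))
```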


2020, Vol 35 (Supplement_3)
Author(s): Manuel Benítez Sánchez, Guillermo Martín, Luis Gil Sacaluga, Maria Jose Garcia Cortes, Sergio García Marcos, ...

Background and Aims: Random Forest (RF) is an artificial intelligence (AI) technique that consists of an ensemble of trees built by bootstrapping (resampling with replacement). At each node a random subset of the predictor variables is selected and the best cut point is determined among them, so each split of a tree is based on a random sample of the predictors. The trees are grown as deep as possible. Approximately 37% of the observations are not used in the construction of each RF tree. These form the out-of-bag (OOB) sample, which is used to obtain an honest estimate of the predictive capacity of the model, so a separate validation set is not required. In each analysis a few hundred regression or classification trees are grown, depending on whether the response variable is numerical or qualitative, respectively. The result aggregates the repeated predictions of the trees (bagging): by averaging for regression and, typically, by majority vote for classification. RF also allows the importance of the predictor variables to be calculated, and the important variables can later be included in a multivariate regression model. Method: We analysed 14750 records from 2011 to 2014 contained in the Information System of the Autonomous Transplant Coordination of Andalusia (SICATA), a system that includes clinical-epidemiological variables on anemia, bone metabolism, adequacy of dialysis and vascular access. In total, 1911 patients presented the event of interest (exitus). Three predictive and explanatory models of survival were developed: (1) RF; (2) multivariate logistic regression; and (3) multivariate logistic regression including the important variables from the RF model. We compared them in terms of accuracy (area under the ROC curve, AUC). Results: The AUC of the multivariate model without prior RF was 0.75, the AUC of the multivariate model with prior RF was 0.81, and the AUC of the Random Forest model was 0.98. Conclusion: The Random Forest model achieved 98% discrimination (AUC 0.98) for mortality in patients on hemodialysis, far superior to the classic multivariate analyses. Including the important RF variables in the multivariate logistic regression improved its AUC from 0.75 to 0.81.
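The "approximately 37%" figure follows from bootstrapping. Each observation is missed by one draw with probability 1 − 1/n, so it is missed by all n draws with probability (1 − 1/n)^n, which tends to 1/e ≈ 0.368. The following minimal sketch checks this by simulation and also shows the bagging aggregation step. The sample size, number of repetitions and seed are arbitrary choices, and the function names are illustrative.

```python
import math
import random
from collections import Counter
from statistics import mean


def oob_fraction(n: int, rng: random.Random) -> float:
    """Fraction of n observations never drawn in one bootstrap sample of size n."""
    drawn = {rng.randrange(n) for _ in range(n)}
    return 1 - len(drawn) / n


def aggregate(tree_predictions, task: str):
    """Bagging step: average for regression, majority vote for classification."""
    if task == "regression":
        return mean(tree_predictions)
    return Counter(tree_predictions).most_common(1)[0][0]


rng = random.Random(42)
n = 1000
simulated = sum(oob_fraction(n, rng) for _ in range(200)) / 200
theoretical = (1 - 1 / n) ** n  # tends to 1/e as n grows
print(f"simulated {simulated:.3f}, theoretical {theoretical:.3f}, 1/e {1 / math.e:.3f}")
```

Because each tree never sees its own OOB observations, predicting them with only those trees gives the "honest" error estimate described above.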


Author(s): Somayeh Najafi-Ghobadi, Khadijeh Najafi-Ghobadi, Lily Tapak, Abbas Aghaei

Background: Drug injection has been increasing all over the world over the past decades. Hepatitis B and C viruses (HBV and HCV) are two common infections among people who inject drugs (PWID), and PWID account for more than 60% of new human immunodeficiency virus (HIV) cases. Investigating the risk factors associated with the transition from drug use to injection is therefore essential and was the aim of this research. Methods: We used a 2013 database from drug use treatment centers in Kermanshah Province (Iran) that included 2098 records of people who use drugs (PWUD). Information on 29 potential risk factors commonly used in the drug use literature was selected. We employed four classification methods (decision tree, neural network, support vector machine and logistic regression) to determine the factors affecting the decision of PWUD to transition to injection. Results: The average specificity of all models was over 84%. The support vector machine produced the highest specificity (0.9). This model also showed the highest total accuracy (0.91), sensitivity (0.94), positive likelihood ratio [1] and Kappa (0.94), and the smallest negative likelihood ratio (0). The important factors according to the support vector machine model were therefore used for further interpretation. Conclusions: Based on the support vector machine model, the use of heroin, cocaine and hallucinogens were identified as the three most important factors associated with the transition to injection. The results further indicated that PWUD with a history of imprisonment, or who began using drugs out of curiosity or because of unemployment, are at higher risk. Unemployment and unreliable sources of income were other factors of transition suggested by this research.
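Likelihood ratios and Cohen's kappa, reported alongside accuracy above, are also computed from the confusion matrix. LR+ = sensitivity / (1 − specificity) and LR− = (1 − sensitivity) / specificity. Kappa corrects observed agreement for the agreement expected by chance from the marginal totals. The following minimal sketch uses hypothetical counts, not the study's data.

```python
import math


def likelihood_ratios(sensitivity: float, specificity: float) -> tuple[float, float]:
    """Return (LR+, LR-); LR+ is infinite when specificity is exactly 1."""
    lr_pos = sensitivity / (1 - specificity) if specificity < 1 else math.inf
    lr_neg = (1 - sensitivity) / specificity
    return lr_pos, lr_neg


def cohens_kappa(tp: int, fp: int, tn: int, fn: int) -> float:
    """Agreement between predictions and true labels, corrected for chance."""
    n = tp + fp + tn + fn
    p_observed = (tp + tn) / n
    # Chance agreement from the marginal totals of predictions and true labels.
    p_expected = ((tp + fp) * (tp + fn) + (tn + fn) * (tn + fp)) / n**2
    return (p_observed - p_expected) / (1 - p_expected)


# Hypothetical counts: 55 PWUD who transitioned to injection, 45 who did not.
tp, fp, tn, fn = 45, 5, 40, 10
sens, spec = tp / (tp + fn), tn / (tn + fp)
print(likelihood_ratios(sens, spec), cohens_kappa(tp, fp, tn, fn))
```

Kappa can never exceed observed accuracy, and an LR− of exactly 0 requires a sensitivity of exactly 1. Both properties are useful sanity checks when reading reported metric tables.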


2020
Author(s): Jie Wang, Chao Li, Jing Li, Sheng Qin, Chunlei Liu, ...

Background: The prevalence of metabolic syndrome continues to rise sharply worldwide, seriously threatening people's health. In this paper, three risk prediction models for metabolic syndrome in oil workers were established, and the optimal model was identified by comparison. The optimal model can be used to identify people at high risk of metabolic syndrome as early as possible, to predict their risk, and to persuade them to change adverse lifestyles so as to slow down and reduce the incidence of metabolic syndrome. Methods: A total of 1,468 workers from an oil company who underwent occupational health physical examinations from April 2017 to October 2018 were included in this study. We established a logistic regression model, a random forest model and a convolutional neural network model, and compared their prediction performance using the F1 score, sensitivity, accuracy and other indicators. Results: In the training set, the accuracies of the three models were 83.45%, 94.21% and 86.34%, the sensitivities were 78.47%, 94.62% and 81.30%, the F1 scores were 0.79, 0.93 and 0.83, and the areas under the ROC curve were 0.894, 0.987 and 0.935, respectively. In the test set, the accuracies were 76.72%, 80.66% and 78.69%, the sensitivities were 70.00%, 77.50% and 68.33%, the F1 scores were 0.70, 0.76 and 0.71, and the areas under the ROC curve were 0.797, 0.861 and 0.855, respectively. Conclusions: The random forest model predicted better than the other models and has greater application value: it can better predict the risk of metabolic syndrome in oil workers and provide a theoretical basis for their health management.
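The F1 score is the harmonic mean of precision and sensitivity (recall). Comparing training and test accuracy shows how much each model's performance drops on unseen data. The following minimal sketch uses only the accuracies reported above. By these figures, the random forest is best on the test set but also shows the largest train-to-test drop, which may indicate some overfitting. That reading is an interpretation, not a claim made in the abstract.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (sensitivity)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Reported accuracies (%): (training set, test set).
accuracy = {
    "Logistic regression": (83.45, 76.72),
    "Random forest": (94.21, 80.66),
    "Convolutional neural network": (86.34, 78.69),
}

# Drop in accuracy from training to test set, in percentage points.
gaps = {model: round(train - test, 2) for model, (train, test) in accuracy.items()}
for model, gap in gaps.items():
    print(f"{model}: train-test gap {gap:.2f} points")
```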

