Maximising Accuracy and Efficiency of Traffic Accident Prediction Combining Information Mining with Computational Intelligence Approaches and Decision Trees

Author(s):  
Tatiana Tambouratzis ◽  
Dora Souliou ◽  
Miltiadis Chalikias ◽  
Andreas Gregoriades

Abstract The development of universal methodologies for the accurate, efficient, and timely prediction of traffic accident location and severity constitutes a crucial endeavour. In this research, the best combinations of salient accident-related parameters and accurate accident severity prediction models are determined for the 2005 accident dataset compiled by the Republic of Cyprus Police. The optimal methodology involves: (a) information mining in the form of feature selection of the accident parameters that maximise prediction accuracy (implemented via scatter search), followed by feature extraction (implemented via principal component analysis) and selection of the minimal number of components that contain the salient information of the original parameters, which together bring about an overall 74.42% reduction in the dataset dimensionality; (b) accident severity prediction via probabilistic neural networks and random forests, both of which independently achieve over 96% correct prediction and a balanced proportion of under- and over-estimations of accident severity. An explanation of the superiority of the optimal combinations of parameters and models is given, as is a comparison with existing accident classification/prediction approaches.
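The component-selection step described in this abstract can be illustrated with a minimal sketch. It assumes that "salient information" is measured by cumulative explained variance, which the abstract does not state; the function names, the threshold, and the eigenvalues are hypothetical and are not taken from the paper.

```python
def min_components(eigenvalues, threshold=0.95):
    """Return the smallest k such that the k largest eigenvalues
    explain at least `threshold` of the total variance."""
    if not eigenvalues:
        raise ValueError("eigenvalues must be non-empty")
    ordered = sorted(eigenvalues, reverse=True)
    total = sum(ordered)
    cumulative = 0.0
    for k, value in enumerate(ordered, start=1):
        cumulative += value
        if cumulative / total >= threshold:
            return k
    return len(ordered)


def dimensionality_reduction(n_original, n_kept):
    """Percentage reduction in the number of dimensions."""
    return 100.0 * (n_original - n_kept) / n_original


# Made-up covariance eigenvalues for illustration only (not the paper's data).
eigenvalues = [40, 30, 20, 5, 3, 1, 0.5, 0.5]
k = min_components(eigenvalues, threshold=0.85)
print(k, dimensionality_reduction(len(eigenvalues), k))
```

With these toy values, three of the eight components pass the 85% threshold, a 62.5% reduction. The 74.42% figure in the abstract combines feature selection and extraction, which this sketch does not model.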

2021 ◽  
Vol 18 (1) ◽  
Author(s):  
Yingjie Qi ◽  
Jian-an Jia ◽  
Huiming Li ◽  
Nagen Wan ◽  
Shuqin Zhang ◽  
...  

Abstract Background It is important to distinguish coronavirus disease 2019 (COVID-19) patients in severe condition from those in moderate condition, so more effective predictors should be developed. Methods Clinical indicators of COVID-19 patients from two independent cohorts (training data: Hefei Cohort, 82 patients; validation data: Nanchang Cohort, 169 patients) were reviewed retrospectively. Sparse principal component analysis (SPCA) was performed on the Hefei Cohort and prediction models were derived. Prediction results were evaluated by receiver operating characteristic (ROC) curve and decision curve analysis (DCA) in both cohorts. Results SPCA on the Hefei Cohort revealed that the first 13 principal components (PCs) account for 80.8% of the total variance of the original data. PC1 and PC12 were significantly associated with disease severity, with odds ratios of 4.049 and 3.318, respectively. They were used to construct a prediction model, named Model-A. In disease severity prediction, Model-A gave the best prediction efficiency, with an area under the curve (AUC) of 0.867 and 0.835 in the Hefei and Nanchang Cohorts, respectively. A simplified version of Model-A, named the LMN index, gave prediction efficiency comparable to that of classical clinical markers, with an AUC of 0.837 and 0.800 in the training and validation cohorts, respectively. According to DCA, Model-A performed slightly better than the others, and the LMN index performed similarly to albumin or the neutrophil-to-lymphocyte ratio. Conclusions Prediction models produced by SPCA showed robust disease severity prediction efficiency for COVID-19 patients and have the potential for clinical application.
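The AUC values reported in this abstract summarise how well a model's score separates severe from moderate cases. A minimal sketch of one common way to compute AUC, as the fraction of (positive, negative) pairs ranked correctly, is shown here; the scores and labels are invented for illustration and the function is not the authors' code.

```python
def auc(scores, labels):
    """Area under the ROC curve, computed as the probability that a
    randomly chosen positive case scores higher than a randomly chosen
    negative case (ties count as half)."""
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Hypothetical model scores; label 1 = severe, 0 = moderate.
scores = [0.9, 0.8, 0.4, 0.35, 0.1]
labels = [1, 1, 0, 1, 0]
print(round(auc(scores, labels), 3))
```

Here five of the six positive/negative pairs are ordered correctly, so the AUC is 5/6. The pairwise loop is quadratic, which is fine for cohorts of the size quoted above but not for large datasets.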


2021 ◽  
Author(s):  
Anqi Shangguan ◽  
Lingxia Mu ◽  
Guo Xie ◽  
Chenglan Wang ◽  
Yang Jing ◽  
...  

2013 ◽  
Vol 2013 ◽  
pp. 1-9 ◽  
Author(s):  
Fang Zong ◽  
Hongguo Xu ◽  
Huiyong Zhang

The paper presents a comparison between two modeling techniques, Bayesian networks and regression models, by employing them in accident severity analysis. Three severity indicators, namely number of fatalities, number of injuries and property damage, are investigated with the two methods, and the major contributing factors and their effects are identified. The results indicate that the goodness of fit of the Bayesian network is higher than that of the regression models in accident severity modeling. This finding helps improve the accuracy of accident severity prediction. Study results can be applied to the prediction of accident severity, which is one of the essential steps in the accident management process. By recognizing the key influences, this research also provides suggestions for governments to take effective measures to reduce accident impacts and improve traffic safety.


2020 ◽  
Vol 2020 ◽  
pp. 1-13
Author(s):  
Jing Gan ◽  
Linheng Li ◽  
Dapeng Zhang ◽  
Ziwei Yi ◽  
Qiaojun Xiang

Traffic safety has always been an important issue in sustainable transportation development, and the prediction of traffic accident severity remains a crucial and challenging issue in the domain of traffic safety. A wide variety of forecasting models have been proposed to meet this challenge. These models gradually evolved from linear to nonlinear forms and from traditional statistical regression models to the currently popular machine learning models. Recently, a machine learning algorithm called Deep Forests, based on the decision tree ensemble and first proposed by a research team at Nanjing University, has attracted widespread attention. This algorithm was shown to be more accurate and robust than other machine learning algorithms. Motivated by this benefit, this study employs the UK road safety dataset to propose a novel method for predicting the severity of traffic accidents based on the Deep Forests algorithm. To verify the superiority of the proposed method, several prediction models based on other machine learning algorithms were implemented to predict traffic accident severity with the same dataset, and the prediction results show that the Deep Forests algorithm presents good stability, fewer hyper-parameters, and the highest accuracy under different levels of training data volume. It is expected that the findings from this study will be helpful for the establishment or improvement of an effective traffic safety system within a sustainable transportation system, which is of great significance for helping government managers to establish timely proactive strategies for traffic accident prevention and effectively improve road traffic safety.


2019 ◽  
Vol 20 (S6) ◽  
Author(s):  
Ping Zhang ◽  
Nicholas P. West ◽  
Pin-Yen Chen ◽  
Mike W. C. Thang ◽  
Gareth Price ◽  
...  

Abstract Background Principal components analysis (PCA) is often used to find characteristic patterns associated with certain diseases by reducing variable numbers before a predictive model is built, particularly when some variables are correlated. Usually, the first two or three components from PCA are used to determine whether individuals can be clustered into two classification groups based on pre-determined criteria: control and disease group. However, a combination of other components may exist which better distinguishes diseased individuals from healthy controls. Genetic algorithms (GAs) can be useful and efficient for searching for the best combination of variables to build a prediction model. This study aimed to develop a prediction model that combines PCA and a genetic algorithm (GA) for identifying sets of bacterial species associated with obesity and metabolic syndrome (Mets). Results The prediction models built using the combination of principal components (PCs) selected by GA were compared to the models built using the top PCs that explained the most variance in the sample and to models built with selected original variables. The advantages of combining PCA with GA were demonstrated. Conclusions The proposed algorithm overcomes the limitation of PCA for data analysis. It offers a new way to build prediction models that may improve the prediction accuracy. The variables included in the PCs that were selected by GA can be combined with flexibility for potential clinical applications. The algorithm can be useful for many biological studies where high dimensional data are collected with highly correlated variables.
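The idea of letting a genetic algorithm pick a subset of principal components, instead of simply taking the top ones, can be sketched as follows. This is a generic bit-mask GA with truncation selection, one-point crossover, bit-flip mutation and elitism, not the authors' algorithm. The `usefulness` scores and the fitness function are hypothetical stand-ins; in practice the fitness would be something like the cross-validated accuracy of a model built on the selected components.

```python
import random


def genetic_select(n_items, fitness, pop_size=20, generations=30,
                   mutation_rate=0.1, seed=0):
    """Search for a 0/1 mask over n_items (n_items >= 2) that maximises
    fitness(mask). Returns the best mask and the best fitness per generation."""
    rng = random.Random(seed)
    population = [[rng.randint(0, 1) for _ in range(n_items)]
                  for _ in range(pop_size)]
    history = []
    for _ in range(generations):
        ranked = sorted(population, key=fitness, reverse=True)
        history.append(fitness(ranked[0]))
        parents = ranked[:pop_size // 2]       # truncation selection
        children = [ranked[0][:]]              # elitism: keep the best mask
        while len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            cut = rng.randint(1, n_items - 1)  # one-point crossover
            child = a[:cut] + b[cut:]
            child = [bit ^ 1 if rng.random() < mutation_rate else bit
                     for bit in child]         # bit-flip mutation
            children.append(child)
        population = children
    best = max(population, key=fitness)
    return best, history + [fitness(best)]


# Hypothetical per-component contribution to a model's score (made up).
usefulness = [0.1, 0.5, -0.2, 0.4, -0.1, 0.3]


def toy_fitness(mask):
    """Toy stand-in for a model-quality score of the selected components."""
    return sum(u for u, bit in zip(usefulness, mask) if bit)


best_mask, history = genetic_select(len(usefulness), toy_fitness)
print(best_mask, history[-1])
```

Because the best mask of each generation is copied unchanged into the next, the best fitness never decreases from one generation to the next.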


Author(s):  
Marin Mandić ◽  
Goran Kraljević ◽  
Ivan Boban

Because of strong competition in the market, telecom operators are affected by churn, so it is very important for them to identify which users are likely to leave and switch to a competing telecom company. This research uses data on user behaviour from telecom systems to identify behavioural patterns and thereby recognize churn. A valuable contribution of this paper is a new definition of prepaid soft churn based on multiple conditions. During data preparation, useful attributes were selected using Principal Component Analysis (PCA). The attribute values were also normalized in order to balance the influence of all the attributes. A common problem with telecom churn prediction data is imbalance in the target variable. This is also the case in the data used in this paper, where the percentage of churners is 12%. Undersampling and oversampling were compared as methods for resolving the data imbalance problem. Data sets with undersampling and oversampling were used to train decision tree, logistic regression and neural network algorithms, yielding six prediction models for detecting the churn of prepaid telecom users. A performance analysis and comparison of the six developed data mining models was also performed.
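The two resampling strategies compared in this abstract can be sketched with random undersampling and random oversampling. The abstract does not say which variants the authors used, so this is an assumption; the function names and the toy labels are hypothetical, with only the 12% churn ratio taken from the text.

```python
import random
from collections import Counter


def _split_by_class(labels):
    """Return (majority indices, minority indices) for binary labels."""
    counts = Counter(labels)
    if len(counts) != 2:
        raise ValueError("expected exactly two classes")
    (majority, _), (minority, _) = counts.most_common(2)
    maj_idx = [i for i, y in enumerate(labels) if y == majority]
    min_idx = [i for i, y in enumerate(labels) if y == minority]
    return maj_idx, min_idx


def undersample(labels, seed=0):
    """Indices of a balanced sample: all minority rows plus an equally
    sized random subset of the majority rows."""
    rng = random.Random(seed)
    maj_idx, min_idx = _split_by_class(labels)
    return sorted(rng.sample(maj_idx, len(min_idx)) + min_idx)


def oversample(labels, seed=0):
    """Indices of a balanced sample: all rows plus minority rows drawn
    with replacement until both classes have the same size."""
    rng = random.Random(seed)
    maj_idx, min_idx = _split_by_class(labels)
    extra = [rng.choice(min_idx) for _ in range(len(maj_idx) - len(min_idx))]
    return sorted(maj_idx + min_idx + extra)


# Toy labels with 12% churners (1 = churner), mirroring the quoted ratio.
labels = [1] * 12 + [0] * 88
under = undersample(labels)
over = oversample(labels)
print(Counter(labels[i] for i in under))
print(Counter(labels[i] for i in over))
```

Both functions return row indices, so the same selection can be applied to the feature matrix. Resampling is normally applied to the training split only, so that evaluation still reflects the original class ratio.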

