Maximising Accuracy and Efficiency of Traffic Accident Prediction Combining Information Mining with Computational Intelligence Approaches and Decision Trees

Abstract The development of universal methodologies for the accurate, efficient, and timely prediction of traffic accident location and severity is a crucial endeavour. In this study, the best combinations of salient accident-related parameters and accurate accident severity prediction models are determined for the 2005 accident dataset compiled by the Republic of Cyprus Police. The optimal methodology involves: (a) information mining in the form of feature selection of the accident parameters that maximise prediction accuracy (implemented via scatter search), followed by feature extraction (implemented via principal component analysis) and selection of the minimal number of components that contain the salient information of the original parameters, which together bring about an overall 74.42% reduction in dataset dimensionality; (b) accident severity prediction via probabilistic neural networks and random forests, both of which independently achieve over 96% correct prediction and a balanced proportion of under- and over-estimations of accident severity. The superiority of the optimal combinations of parameters and models is explained, and a comparison with existing accident classification/prediction approaches is provided.
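The two evaluation criteria named in the abstract, the share of correct predictions and the balance between under- and over-estimated severity, can be illustrated with a minimal sketch. It assumes severity is coded as ordinal integers (higher means more severe); the function name `severity_error_profile` and the sample labels are hypothetical and are not taken from the paper.

```python
from collections import Counter


def severity_error_profile(actual, predicted):
    """Return the shares of correct, under-estimated and over-estimated cases.

    Assumption: severity levels are ordinal integers, higher = more severe.
    """
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    counts = Counter()
    for a, p in zip(actual, predicted):
        if p == a:
            counts["correct"] += 1
        elif p < a:
            counts["under"] += 1   # predicted less severe than it was
        else:
            counts["over"] += 1    # predicted more severe than it was
    n = len(actual)
    return {k: counts[k] / n for k in ("correct", "under", "over")}


# Made-up labels for illustration only: 1 = slight, 2 = serious, 3 = fatal.
actual = [1, 1, 2, 3, 2, 1, 3, 2, 1, 1]
predicted = [1, 1, 2, 3, 1, 1, 3, 3, 1, 1]
profile = severity_error_profile(actual, predicted)
print(profile)
```

A model is balanced in this sense when the "under" and "over" shares are close to each other, so its errors do not systematically lean towards milder or harsher severity.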


Traffic Accident Severity Prediction Using Naive Bayes Algorithm - A Case Study of Semarang Toll Road

IOP Conference Series Materials Science and Engineering ◽

10.1088/1757-899x/598/1/012089 ◽

2019 ◽

Vol 598 ◽

pp. 012089

Author(s):

W Budiawan ◽

S Saptadi ◽

Sriyanto ◽

C Tjioe ◽

T Phommachak

Keyword(s):

Traffic Accident ◽

Naive Bayes ◽

Naïve Bayes ◽

Toll Road ◽

Severity Prediction ◽

Accident Severity ◽

Bayes Algorithm


Lymphocyte–monocyte–neutrophil index: a predictor of severity of coronavirus disease 2019 patients produced by sparse principal component analysis

Virology Journal ◽

10.1186/s12985-021-01561-9 ◽

2021 ◽

Vol 18 (1) ◽

Author(s):

Yingjie Qi ◽

Jian-an Jia ◽

Huiming Li ◽

Nagen Wan ◽

Shuqin Zhang ◽

...

Keyword(s):

Principal Component Analysis ◽

Prediction Model ◽

Disease Severity ◽

Prediction Models ◽

Principal Component ◽

Component Analysis ◽

Sparse Principal Component Analysis ◽

Validation Data ◽

Severity Prediction ◽

Prediction Efficiency

Abstract Background: It is important to distinguish coronavirus disease 2019 (COVID-19) patients in severe condition from those in moderate condition, so more effective predictors are needed. Methods: Clinical indicators of COVID-19 patients from two independent cohorts (training data: Hefei Cohort, 82 patients; validation data: Nanchang Cohort, 169 patients) were reviewed retrospectively. Sparse principal component analysis (SPCA) was performed on the Hefei Cohort and prediction models were derived. Prediction results were evaluated by receiver operating characteristic curve and decision curve analysis (DCA) in both cohorts. Results: SPCA on the Hefei Cohort revealed that the first 13 principal components (PCs) account for 80.8% of the total variance of the original data. PC1 and PC12 were significantly associated with disease severity, with odds ratios of 4.049 and 3.318, respectively. They were used to construct a prediction model, named Model-A. In disease severity prediction, Model-A gave the best prediction efficiency, with an area under the curve (AUC) of 0.867 and 0.835 in the Hefei and Nanchang Cohorts, respectively. Model-A’s simplified version, named the LMN index, gave prediction efficiency comparable to that of classical clinical markers, with an AUC of 0.837 and 0.800 in the training and validation cohorts, respectively. According to DCA, Model-A performed slightly better than the others, and the LMN index performed similarly to albumin or the neutrophil-to-lymphocyte ratio. Conclusions: Prediction models produced by SPCA showed robust disease severity prediction efficiency for COVID-19 patients and have the potential for clinical application.
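The AUC values quoted above can be read as the probability that a randomly chosen severe patient receives a higher model score than a randomly chosen moderate one. The sketch below computes AUC in exactly that pairwise way. The function name `roc_auc` and the labels and scores are made up for illustration; they are not data from the study.

```python
def roc_auc(labels, scores):
    """AUC as P(score of a positive > score of a negative); ties count 0.5."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical example: 1 = severe, 0 = moderate; scores are invented.
labels = [1, 1, 1, 0, 0, 0, 0]
scores = [0.9, 0.7, 0.4, 0.6, 0.3, 0.2, 0.4]
auc = roc_auc(labels, scores)
print(auc)
```

The pairwise loop is quadratic in the number of patients, which is acceptable for cohorts of the size described here; a rank-based formula would be the usual choice for larger data.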


Traffic accident severity prediction based on oversampling and CNN for imbalanced data

10.23919/ccc52363.2021.9549759 ◽

2021 ◽

Author(s):

Anqi Shangguan ◽

Lingxia Mu ◽

Guo Xie ◽

Chenglan Wang ◽

Yang Jing ◽

...

Keyword(s):

Traffic Accident ◽

Imbalanced Data ◽

Severity Prediction ◽

Accident Severity


Prediction for Traffic Accident Severity: Comparing the Bayesian Network and Regression Models

Mathematical Problems in Engineering ◽

10.1155/2013/475194 ◽

2013 ◽

Vol 2013 ◽

pp. 1-9 ◽

Cited By ~ 20

Author(s):

Fang Zong ◽

Hongguo Xu ◽

Huiyong Zhang

Keyword(s):

Bayesian Network ◽

Traffic Safety ◽

Regression Models ◽

Traffic Accident ◽

Goodness Of Fit ◽

Study Results ◽

Severity Prediction ◽

Accident Management ◽

Modeling Techniques ◽

Accident Severity

The paper presents a comparison between two modeling techniques, Bayesian network and Regression models, by employing them in accident severity analysis. Three severity indicators, that is, number of fatalities, number of injuries and property damage, are investigated with the two methods, and the major contributing factors and their effects are identified. The results indicate that the goodness of fit of the Bayesian network is higher than that of the Regression models in accident severity modeling. This finding helps improve the accuracy of accident severity prediction. Study results can be applied to the prediction of accident severity, which is one of the essential steps in the accident management process. By identifying the key influences, this research also provides suggestions for governments to take effective measures to reduce accident impacts and improve traffic safety.


Real-Time Traffic Accident Severity Prediction Using Data Mining Technologies

2017 International Conference on Network and Information Systems for Computers (ICNISC) ◽

10.1109/icnisc.2017.00059 ◽

2017 ◽

Author(s):

Xiao-Ling Xia ◽

Bing Nan ◽

Cui Xu

Keyword(s):

Data Mining ◽

Real Time ◽

Traffic Accident ◽

Real Time Traffic ◽

Severity Prediction ◽

Accident Severity ◽

Using Data


An Alternative Method for Traffic Accident Severity Prediction: Using Deep Forests Algorithm

Journal of Advanced Transportation ◽

10.1155/2020/1257627 ◽

2020 ◽

Vol 2020 ◽

pp. 1-13

Author(s):

Jing Gan ◽

Linheng Li ◽

Dapeng Zhang ◽

Ziwei Yi ◽

Qiaojun Xiang

Keyword(s):

Machine Learning ◽

Traffic Safety ◽

Traffic Accident ◽

Learning Algorithm ◽

Sustainable Transportation ◽

Machine Learning Algorithm ◽

Statistical Regression ◽

Road Traffic Safety ◽

Severity Prediction ◽

Accident Severity

Traffic safety has always been an important issue in sustainable transportation development, and the prediction of traffic accident severity remains a crucial and challenging issue in the domain of traffic safety. A wide variety of forecasting models have been proposed to meet this challenge. These models gradually evolved from linear to nonlinear forms and from traditional statistical regression models to currently popular machine learning models. Recently, a machine learning algorithm called Deep Forests, based on the decision tree ensemble and first proposed by a research team at Nanjing University, has attracted widespread attention. This algorithm was shown to be more accurate and robust than other machine learning algorithms. Motivated by this benefit, this study employs the UK road safety dataset to propose a novel method for predicting the severity of traffic accidents based on the Deep Forests algorithm. To verify the superiority of the proposed method, several prediction models based on other machine learning algorithms were implemented to predict traffic accident severity with the same dataset. The prediction results show that the Deep Forests algorithm presents good stability, fewer hyper-parameters, and the highest accuracy under different levels of training data volume. It is expected that the findings from this study will be helpful for establishing or improving an effective traffic safety system within a sustainable transportation system, which is of great significance for helping government managers to establish timely proactive strategies in traffic accident prevention and effectively improve road traffic safety.


Traffic accident severity prediction and cognitive analysis using deep learning

Soft Computing ◽

10.1007/s00500-021-06515-5 ◽

2021 ◽

Author(s):

Thavavel Vaiyapuri ◽

Meenu Gupta

Keyword(s):

Deep Learning ◽

Traffic Accident ◽

Cognitive Analysis ◽

Severity Prediction ◽

Accident Severity


Traffic accident severity prediction using a novel multi-objective genetic algorithm

International Journal of Crashworthiness ◽

10.1080/13588265.2016.1275431 ◽

2017 ◽

Vol 22 (4) ◽

pp. 425-440 ◽

Cited By ~ 16

Author(s):

Seyed Hessam-Allah Hashmienejad ◽

Seyed Mohammad Hossein Hasheminejad

Keyword(s):

Genetic Algorithm ◽

Traffic Accident ◽

Multi Objective ◽

Multi Objective Genetic Algorithm ◽

Severity Prediction ◽

Accident Severity


Selection of microbial biomarkers with genetic algorithm and principal component analysis

BMC Bioinformatics ◽

10.1186/s12859-019-3001-4 ◽

2019 ◽

Vol 20 (S6) ◽

Cited By ~ 1

Author(s):

Ping Zhang ◽

Nicholas P. West ◽

Pin-Yen Chen ◽

Mike W. C. Thang ◽

Gareth Price ◽

...

Keyword(s):

Genetic Algorithm ◽

Prediction Model ◽

Principal Components ◽

Prediction Models ◽

Bacterial Species ◽

Principal Component ◽

Biological Studies ◽

Microbial Biomarkers ◽

Highly Correlated ◽

Selection Of

Abstract Background: Principal components analysis (PCA) is often used to find characteristic patterns associated with certain diseases by reducing the number of variables before a predictive model is built, particularly when some variables are correlated. Usually, the first two or three components from PCA are used to determine whether individuals can be clustered into two classification groups based on pre-determined criteria: control and disease group. However, a combination of other components may exist which better distinguishes diseased individuals from healthy controls. Genetic algorithms (GAs) can be useful and efficient for searching for the best combination of variables to build a prediction model. This study aimed to develop a prediction model that combines PCA and a genetic algorithm (GA) for identifying sets of bacterial species associated with obesity and metabolic syndrome (Mets). Results: The prediction models built using the combination of principal components (PCs) selected by GA were compared to the models built using the top PCs that explained the most variance in the sample and to models built with selected original variables. The advantages of combining PCA with GA were demonstrated. Conclusions: The proposed algorithm overcomes the limitation of PCA for data analysis. It offers a new way to build prediction models that may improve the prediction accuracy. The variables included in the PCs that were selected by GA can be combined with flexibility for potential clinical applications. The algorithm can be useful for many biological studies where high dimensional data are collected with highly correlated variables.
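The idea of letting a genetic algorithm pick which principal components to keep, instead of taking the top few by variance, can be sketched as a search over bitmasks. The sketch below is a generic GA with elitism, tournament selection, one-point crossover and bit-flip mutation. It is not the authors' algorithm, and the fitness function is a toy stand-in: the per-component `usefulness` values and the `penalty` are invented, whereas a real application would score each mask by the cross-validated accuracy of a model built on the selected components.

```python
import random


def ga_select(n_items, fitness, pop_size=20, generations=30,
              mutation_rate=0.1, seed=0):
    """Search bitmasks of length n_items (>= 2) for one maximising fitness."""
    rng = random.Random(seed)
    pop = [[rng.randint(0, 1) for _ in range(n_items)]
           for _ in range(pop_size)]
    best = max(pop, key=fitness)
    for _ in range(generations):
        nxt = [best[:]]  # elitism: carry the best mask forward unchanged
        while len(nxt) < pop_size:
            # Tournament selection: best of three random individuals.
            p1 = max(rng.sample(pop, 3), key=fitness)
            p2 = max(rng.sample(pop, 3), key=fitness)
            cut = rng.randrange(1, n_items)       # one-point crossover
            child = p1[:cut] + p2[cut:]
            # Bit-flip mutation with a small per-gene probability.
            child = [1 - g if rng.random() < mutation_rate else g
                     for g in child]
            nxt.append(child)
        pop = nxt
        best = max(pop, key=fitness)
    return best


# Toy fitness (assumption): each component has an invented usefulness,
# and every selected component costs a fixed penalty.
usefulness = [0.05, 0.30, 0.02, 0.25, 0.01, 0.20, 0.03, 0.01]
penalty = 0.04


def toy_fitness(mask):
    return sum(u * g for u, g in zip(usefulness, mask)) - penalty * sum(mask)


best_mask = ga_select(len(usefulness), toy_fitness)
print(best_mask, toy_fitness(best_mask))
```

Because the best mask is always carried into the next generation, the best fitness never decreases between generations; the search is still heuristic and is not guaranteed to reach the global optimum.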


Performance comparison of six Data mining models for soft churn customer prediction in Telecom

IJEEC - INTERNATIONAL JOURNAL OF ELECTRICAL ENGINEERING AND COMPUTING ◽

10.7251/ijeec1801029m ◽

2019 ◽

Vol 2 (1) ◽

Author(s):

Marin Mandić ◽

Goran Kraljević ◽

Ivan Boban

Keyword(s):

Data Mining ◽

Prediction Models ◽

Principal Component ◽

Performance Comparison ◽

Data Sets ◽

Network Algorithms ◽

Imbalance Problem ◽

Definition Of ◽

Multiple Conditions ◽

Selection Of

Due to high competition in the market, telecom operators are affected by churn, so it is very important for them to identify which users are likely to leave and switch to a competing telecom company. This research uses data on user behaviour from telecom systems to identify behavioural patterns and thereby recognize churn. A new definition of prepaid soft churn based on multiple conditions is a valuable contribution of this paper. During data preparation, useful attributes were selected using Principal Component Analysis (PCA). The attribute values were also normalized in order to balance the influence of all the attributes. A common problem with telecom churn prediction data is imbalance in the target variable. This is also the case in the data used in this paper, where the percentage of churners is 12%. Undersampling and oversampling were compared as methods for resolving the data imbalance problem. Data sets with undersampling and oversampling were used to train the decision tree, logistic regression and neural network algorithms, and six prediction models for detecting the churn of prepaid users in telecom were thereby created in this paper. Performance analysis and comparison of the six developed Data mining models was also performed.
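The two rebalancing strategies compared in the abstract can be sketched with plain random resampling. This is a minimal illustration only: the paper does not state which undersampling or oversampling variant was used, and the helper names, the `(user_id, label)` row layout and the 100-user data set with 12 churners are all hypothetical.

```python
import random


def _split(rows, label_of):
    """Split rows into (minority, majority) lists for a binary 0/1 label."""
    pos = [r for r in rows if label_of(r) == 1]
    neg = [r for r in rows if label_of(r) == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    return (pos, neg) if len(pos) <= len(neg) else (neg, pos)


def undersample(rows, label_of, seed=0):
    """Randomly drop majority rows until both classes have equal size."""
    rng = random.Random(seed)
    minority, majority = _split(rows, label_of)
    return minority + rng.sample(majority, len(minority))


def oversample(rows, label_of, seed=0):
    """Randomly duplicate minority rows until both classes have equal size."""
    rng = random.Random(seed)
    minority, majority = _split(rows, label_of)
    extra = rng.choices(minority, k=len(majority) - len(minority))
    return majority + minority + extra


# Hypothetical data: 100 users as (user_id, label), 12% labelled as churners.
rows = [(i, 1 if i < 12 else 0) for i in range(100)]
label = lambda r: r[1]

under = undersample(rows, label)
over = oversample(rows, label)
print(len(under), len(over))
```

Undersampling discards majority-class information, while oversampling repeats minority rows and can encourage overfitting to them. In either case, resampling is normally applied to the training split only, so that evaluation reflects the original class proportions.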
