Comparative Study of Machine Learning Classifiers for Modelling Road Traffic Accidents

Road traffic accidents (RTAs) are a major cause of injuries and fatalities worldwide. In recent years, there has been growing global interest in analysing and modelling RTA data to better understand and assess the causes and effects of accidents. This study analysed the performance of widely used machine learning classifiers using a real-life RTA dataset from Gauteng, South Africa. The study aimed to assess prediction model designs for RTAs to assist transport authorities and policymakers. It considered classifiers such as naïve Bayes, logistic regression, k-nearest neighbour, AdaBoost, support vector machine, and random forest (RF), together with five missing data methods. These classifiers were evaluated using five evaluation metrics: accuracy, root-mean-square error, precision, recall, and receiver operating characteristic curves. Furthermore, the assessment involved parameter adjustment and incorporated dimensionality reduction techniques. The empirical results and analyses show that the RF classifier, combined with multiple imputation by chained equations, yielded the best performance of the combinations tested.

Comparison of Prediction Models for Mortality Related to Injuries from Road Traffic Accidents after Correcting for Undersampling

International Journal of Environmental Research and Public Health ◽ 10.3390/ijerph18115604 ◽ 2021 ◽ Vol 18 (11) ◽ pp. 5604

Author(s): Yookyung Boo ◽ Youngjin Choi

Keyword(s): Traffic Accidents ◽ Road Traffic ◽ National Level ◽ Brier Score ◽ Support Vector ◽ Main Diagnosis ◽ Road Traffic Accidents ◽ Multiple Variables ◽ Type Of Injury ◽ Kernel Model

In this study, four models, namely logistic regression (LR), random forest (RF), linear support vector machine (SVM), and radial basis function (RBF)-SVM, were compared for their accuracy in determining mortality caused by road traffic injuries. They were tested using five years of national-level data from the Korea Disease Control and Prevention Agency’s (KDCA) National Hospital Discharge In-Depth Survey (2013 through to 2017). Model performance was measured for accuracy, precision, recall, F1 score, and Brier score metrics using classification analysis that included characteristics of patients, accidents, injuries, and illnesses. Due to the number of variables and differing units, the rates of survival and mortality related to road traffic accidents were imbalanced, so the data were corrected and standardized before the classification models’ performances were compared. Using the importance analysis, the main diagnosis, the type of injury, the site of the injury, the operation status, the type of accident, the role at the time of the accident, and the sex were selected as the analysis factors. The biggest contributing factor was the role in the accident, namely being the driver, and the major sites of the injuries were head injuries and deep injuries. Using the selected factors, comparisons of the classification performance of each model indicated that the RBF-SVM and RF models were superior to the others. Of the SVM models, the RBF kernel model was superior to the linear kernel model; it can be inferred that the performance of the high-dimensional transformed RBF model is superior when the dimension is complex because of the use of multiple variables. The findings suggest there are limitations to analyses involving imbalanced, multidimensional original data, such as data on road traffic mortality. Thus, analyses must be performed after imbalances are corrected.
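As a minimal sketch of one of the metrics named in this abstract, the Brier score is the mean squared difference between predicted probabilities and the observed 0/1 outcomes, so lower values are better. The function name `brier_score` and the toy outcomes and probabilities are illustrative assumptions, not data or code from the study.

```python
def brier_score(y_true, y_prob):
    """Mean squared difference between predicted probabilities and 0/1 outcomes."""
    if len(y_true) != len(y_prob) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum((p - y) ** 2 for y, p in zip(y_true, y_prob)) / len(y_true)


# Hypothetical imbalanced sample: 1 = death, 0 = survival.
y_true = [0, 0, 0, 1]
confident = [0.1, 0.2, 0.1, 0.9]       # a model that ranks the death case highest
always_survive = [0.0, 0.0, 0.0, 0.0]  # a model that ignores the minority class

print(brier_score(y_true, confident))
print(brier_score(y_true, always_survive))
```

On this toy sample the always-survive model is correct for three of four cases, yet its Brier score is worse than that of the calibrated model, which is one reason a probability-based metric is informative on imbalanced outcomes.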

Classifying Lensed Gravitational Waves in the Geometrical Optics Limit with Machine Learning

American Journal of Undergraduate Research ◽ 10.33697/ajur.2019.019 ◽ 2019 ◽ Vol 16 (2) ◽ pp. 5-16

Author(s): Amit Singh ◽ Ivan Li ◽ Otto Hannuksela ◽ Tjonnie Li ◽ Kyungmin Kim

Keyword(s): Machine Learning ◽ Support Vector Machine ◽ Gravitational Wave ◽ Gravitational Waves ◽ Geometrical Optics ◽ Supervised Machine Learning ◽ Support Vector ◽ Multi Layer Perceptron ◽ Machine Learning Classifiers ◽ Learning Classifiers

Gravitational waves are theorized to be gravitationally lensed when they propagate near massive objects. Such lensing effects cause potentially detectable repeated gravitational wave patterns in ground- and space-based gravitational wave detectors. These effects are difficult to discriminate when the lens is small and the repeated patterns superpose. Traditionally, matched filtering techniques are used to identify gravitational-wave signals, but we instead aim to utilize machine learning techniques to achieve this. In this work, we implement supervised machine learning classifiers (support vector machine, random forest, multi-layer perceptron) to discriminate such lensing patterns in gravitational wave data. We train classifiers with spectrograms of both lensed and unlensed waves using both point-mass and singular isothermal sphere lens models. As a result, classifiers return F1 scores ranging from 0.852 to 0.996, with precisions from 0.917 to 0.992 and recalls ranging from 0.796 to 1.000 depending on the type of classifier and lensing model used. This supports the idea that machine learning classifiers are able to correctly determine lensed gravitational wave signals. This also suggests that in the future, machine learning classifiers may be used as a possible alternative to identify lensed gravitational wave events and to allow us to study gravitational wave sources and massive astronomical objects through further analysis. KEYWORDS: Gravitational Waves; Gravitational Lensing; Geometrical Optics; Machine Learning; Classification; Support Vector Machine; Random Tree Forest; Multi-layer Perceptron
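The F1 score reported here is the harmonic mean of precision and recall. The sketch below applies that standard formula to the endpoints of the reported ranges; pairing the lowest precision with the lowest recall (and the highest with the highest) is an assumption made only for illustration, since the abstract does not say which values belong to the same classifier. Under that assumed pairing the formula gives about 0.852 and 0.996, matching the reported F1 range.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall (0.0 when both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Assumed pairing of the range endpoints quoted in the abstract.
low = f1_score(0.917, 0.796)
high = f1_score(0.992, 1.000)

print(round(low, 3), round(high, 3))
```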

Different Machine Learning Classifiers for Music Emotion Recognition

International Journal of Recent Technology and Engineering - 2 ◽ 10.35940/ijrte.d7833.118419 ◽ 2019 ◽ Vol 8 (4) ◽ pp. 2187-2191

Keyword(s): Machine Learning ◽ Emotion Recognition ◽ Naive Bayes ◽ Naïve Bayes ◽ Support Vector ◽ Bayes Classifier ◽ Promising Alternative ◽ Machine Learning Classifiers ◽ Learning Classifiers ◽ Statistical Metrics

Music is an essential part of life and the emotion carried by it is key to its perception and usage. Music Emotion Recognition (MER) is the task of identifying the emotion in musical tracks and classifying them accordingly. The objective of this research paper is to check the effectiveness of popular machine learning classifiers like XGBoost, Random Forest, Decision Trees, Support Vector Machine (SVM), K-Nearest-Neighbour (KNN) and Gaussian Naive Bayes on the task of MER. Using the MIREX-like dataset [17] to test these classifiers, the effects of oversampling algorithms like Synthetic Minority Oversampling Technique (SMOTE) [22] and Random Oversampling (ROS) were also verified. In all, the Gaussian Naive Bayes classifier gave the maximum accuracy of 40.33%. The other classifiers gave accuracies between 20.44% and 38.67%. Thus, a limit on the classification accuracy has been reached using these classifiers and also using traditional musical or statistical metrics derived from the music as input features. In view of this, deep learning-based approaches using Convolutional Neural Networks (CNNs) [13] and spectrograms of the music clips for MER are a promising alternative.
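The core idea behind SMOTE, one of the oversampling methods mentioned here, is to create synthetic minority samples by interpolating between a minority sample and one of its nearest minority neighbours. The sketch below is a simplified standard-library illustration of that idea only; the function name `smote_like`, its parameters, and the toy points are assumptions, and it is not the implementation used in the paper.

```python
import random


def smote_like(minority, n_new, k=2, seed=0):
    """Create n_new synthetic points, each lying on the segment between a
    randomly chosen minority sample and one of its k nearest minority neighbours."""
    if len(minority) < 2:
        raise ValueError("need at least two minority samples to interpolate")
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # Other minority samples, sorted by squared distance to the base sample.
        others = [p for p in minority if p is not base]
        others.sort(key=lambda p: sum((a - b) ** 2 for a, b in zip(base, p)))
        neighbour = rng.choice(others[:k])
        u = rng.random()  # position along the segment base -> neighbour
        synthetic.append(tuple(a + u * (b - a) for a, b in zip(base, neighbour)))
    return synthetic


# Hypothetical two-feature minority class.
minority = [(1.0, 1.0), (1.2, 0.9), (0.8, 1.1)]
new_points = smote_like(minority, n_new=4)
print(new_points)
```

Random Oversampling, by contrast, would simply duplicate existing minority samples rather than interpolate new ones.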

Business Intelligence and Data Warehouse Technologies for Traffic Accident Data Analysis in Botswana

10.5121/csit.2021.111720 ◽ 2021

Author(s): Monkgogi Mudongo ◽ Edwin Thuma ◽ Nkwebi Peace Motlogelwa ◽ Tebo Leburu-Dingalo ◽ Pulafela Majoo

Keyword(s): Data Warehouse ◽ Business Intelligence ◽ Traffic Accidents ◽ Road Traffic ◽ Road Traffic Accidents ◽ Managerial Decision ◽ Vehicle Registration ◽ Accident Severity ◽ Accident Data ◽ Causes Of Deaths

Road traffic accidents are a serious problem for the nation of Botswana. A large amount of money is used to compensate those who are affected by road accidents. Traffic accidents are one of the major causes of deaths in Botswana. It is important for relevant organizations to have a reliable source of data for accurate evaluation of traffic accidents. Similarly, data on vehicle registration must be transformed and be readily available to assist managerial decision makers. In this article, we deploy a Business Intelligence (BI) and Data Warehouse (DW) solution in an attempt to assist the relevant departments in their road traffic accidents and vehicle registration evaluation. In our evaluation of the traffic accidents, our findings suggest that across accident severity, Damage Only accidents had the most interesting recent trend with an 11.93% decrease in the last 3 years on record. Count of Accident Severity for Damage Only accidents dropped from 13,491 to 11,881 between 2018 and 2020 whilst Minor accidents experienced the longest period of growth. Most accidents take place in rural locations and more accidents take place during the weekend. At 28,439, Sunday had the highest number of accidents and was 47.59% higher than Wednesday, which had the lowest count of accidents at 19,269. The results for vehicle registration reveal that the number of vehicle registrations decreased for the last 3 years on record. The number of vehicles registered dropped from 65,535 to 24,457 during its steepest decline between 2019 and 2021.
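The percentages quoted in this abstract can be checked directly from the counts it gives. The short sketch below recomputes them; the helper name `pct_change` is an assumption for illustration. The third value, the relative drop in vehicle registrations, is derived here from the two counts and is not a figure stated in the abstract.

```python
def pct_change(old, new):
    """Percentage change from old to new (negative means a decrease)."""
    return (new - old) / old * 100


damage_only = pct_change(13_491, 11_881)          # Damage Only accidents, 2018 -> 2020
sunday_vs_wednesday = pct_change(19_269, 28_439)  # Sunday count relative to Wednesday
registrations = pct_change(65_535, 24_457)        # vehicle registrations, 2019 -> 2021

print(round(damage_only, 2))          # -11.93
print(round(sunday_vs_wednesday, 2))  # 47.59
print(round(registrations, 2))        # -62.68
```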

Investigating the Physics of Tokamak Global Stability with Interpretable Machine Learning Tools

Applied Sciences ◽ 10.3390/app10196683 ◽ 2020 ◽ Vol 10 (19) ◽ pp. 6683

Author(s): Andrea Murari ◽ Emmanuele Peluso ◽ Michele Lungaroni ◽ Riccardo Rossi ◽ Michela Gelfusa ◽ ...

Keyword(s): Machine Learning ◽ Data Mining ◽ Independent Learning ◽ Support Vector ◽ Learning Tools ◽ Feedback Systems ◽ Theoretical Understanding ◽ Machine Learning Classifiers ◽ Learning Classifiers ◽ Mining Tools

The inadequacies of basic physics models for disruption prediction have induced the community to increasingly rely on data mining tools. In the last decade, it has been shown how machine learning predictors can achieve a much better performance than those obtained with manually identified thresholds or empirical descriptions of the plasma stability limits. The main criticisms of these techniques focus therefore on two different but interrelated issues: poor “physics fidelity” and limited interpretability. Insufficient “physics fidelity” refers to the fact that the mathematical models of most data mining tools do not reflect the physics of the underlying phenomena. Moreover, they implement a black box approach to learning, which results in very poor interpretability of their outputs. To overcome or at least mitigate these limitations, a general methodology has been devised and tested, with the objective of combining the predictive capability of machine learning tools with the expression of the operational boundary in terms of traditional equations more suited to understanding the underlying physics. The proposed approach relies on the application of machine learning classifiers (such as Support Vector Machines or Classification Trees) and Symbolic Regression via Genetic Programming directly to experimental databases. The results are very encouraging. The obtained equations of the boundary between the safe and disruptive regions of the operational space present almost the same performance as the machine learning classifiers, based on completely independent learning techniques. Moreover, these models possess significantly better predictive power than traditional representations, such as the Hugill or the beta limit. More importantly, they are realistic and intuitive mathematical formulas, which are well suited to supporting theoretical understanding and to benchmarking empirical models. They can also be deployed easily and efficiently in real-time feedback systems.

Linear SVM-Based Android Malware Detection for Reliable IoT Services

Journal of Applied Mathematics ◽ 10.1155/2014/594501 ◽ 2014 ◽ Vol 2014 ◽ pp. 1-10 ◽ Cited By ~ 35

Author(s): Hyo-Sik Ham ◽ Hwan-Hee Kim ◽ Myung-Sup Kim ◽ Mi-Jung Choi

Keyword(s): Machine Learning ◽ Mobile Devices ◽ Malware Detection ◽ Information Leakage ◽ Support Vector ◽ Android Malware ◽ Machine Learning Classifiers ◽ Android Malware Detection ◽ Learning Classifiers ◽ Linear Svm

Currently, many Internet of Things (IoT) services are monitored and controlled through smartphone applications. By combining IoT with smartphones, many convenient IoT services have been provided to users. However, there are adverse underlying effects in such services including invasion of privacy and information leakage. In most cases, mobile devices have become cluttered with important personal user information as various services and contents are provided through them. Accordingly, attackers are expanding the scope of their attacks beyond the existing PC and Internet environment into mobile devices. In this paper, we apply a linear support vector machine (SVM) to detect Android malware and compare the malware detection performance of SVM with that of other machine learning classifiers. Through experimental validation, we show that the SVM outperforms other machine learning classifiers.

Empirical Modelling for exploring the factors contributing to disability severity from road traffic accidents in Thailand

ECTI Transactions on Computer and Information Technology (ECTI-CIT) ◽ 10.37936/ecti-cit.201262.54339 ◽ 1970 ◽ Vol 6 (2) ◽ pp. 176-185

Author(s): Jaratsri Rungrattanaubol ◽ Anamai Na-udom ◽ Antony Harfield

Keyword(s): Neural Network ◽ Data Mining ◽ Traffic Accidents ◽ Injury Severity ◽ Road Traffic ◽ Road Traffic Accidents ◽ Empirical Modelling ◽ Standard Data ◽ Accident Data ◽ Computer Based

This paper introduces a computer-based model for predicting the severity of injuries in road traffic accidents. Using accident data from surveys at hospitals in Thailand, standard data mining techniques were applied to train and test a multilayer perceptron neural network. The resulting neural network specification was loaded into an interactive environment called EDEN that enables further exploration of the computer-based model. Although the model can be used for the classification of accident data in terms of injury severity (in a similar way to other data mining tools), the EDEN tool enables deeper exploration of the underlying factors that might affect injury severity in road traffic accidents. The aim of this paper is to describe the development of the computer-based model and to demonstrate the potential of EDEN as an interactive tool for knowledge discovery.

A Hadoop Based Framework Integrating Machine Learning Classifiers for Anomaly Detection in the Internet of Things

Electronics ◽ 10.3390/electronics10161955 ◽ 2021 ◽ Vol 10 (16) ◽ pp. 1955

Author(s): Ikram Sumaiya Thaseen ◽ Vanitha Mohanraj ◽ Sakthivel Ramachandran ◽ Kishore Sanapala ◽ Sang-Soo Yeo

Keyword(s): Machine Learning ◽ Internet Of Things ◽ Experimental Analysis ◽ Parameter Tuning ◽ Computational Time ◽ Support Vector ◽ Learning Approaches ◽ K Nearest Neighbor ◽ Machine Learning Classifiers ◽ Learning Classifiers

In recent years, different botnet variants have targeted government and private organizations, and there is a crucial need to develop a robust framework for securing the IoT (Internet of Things) network. In this paper, a Hadoop based framework is proposed to identify malicious IoT traffic using a modified Tomek-link under-sampling integrated with automated hyperparameter tuning of machine learning classifiers. The novelty of this paper is to utilize a big data platform for benchmark IoT datasets to minimize computational time. The IoT benchmark datasets are loaded in the Hadoop Distributed File System (HDFS) environment. Three machine learning approaches, namely naive Bayes (NB), K-nearest neighbor (KNN), and support vector machine (SVM), are used for categorizing IoT traffic. Artificial immune network optimization is deployed during cross-validation to obtain the best classifier parameters. Experimental analysis is performed on the Hadoop platform. Average accuracies of 99% and 90% are obtained for the BoT_IoT and ToN_IoT datasets, respectively. The accuracy difference in the ToN-IoT dataset is due to the huge number of data samples captured at the edge layer and fog layer. However, in the BoT-IoT dataset only 5% of the training and test samples from the complete dataset are considered for experimental analysis as released by the dataset developers. The overall accuracy is improved by 19% in comparison with state-of-the-art techniques. The computational times for the huge datasets are reduced by 3–4 hours through MapReduce in HDFS.
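For orientation, a Tomek link in its standard form is a pair of samples from different classes that are each other's nearest neighbour; under-sampling then drops the majority-class member of each such pair. The sketch below illustrates only this standard definition on toy data. It is not the modified variant proposed in the paper, and the function names, labels, and points are assumptions.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest(i, points):
    """Index of the point closest to points[i], excluding itself."""
    return min((j for j in range(len(points)) if j != i),
               key=lambda j: sq_dist(points[i], points[j]))


def tomek_undersample(points, labels, majority):
    """Remove majority-class samples that take part in a Tomek link."""
    nn = [nearest(i, points) for i in range(len(points))]
    drop = set()
    for i, j in enumerate(nn):
        # Tomek link: mutual nearest neighbours carrying different labels.
        if nn[j] == i and labels[i] != labels[j]:
            for k in (i, j):
                if labels[k] == majority:
                    drop.add(k)
    keep = [k for k in range(len(points)) if k not in drop]
    return [points[k] for k in keep], [labels[k] for k in keep]


# Hypothetical traffic samples: one "attack" point sits next to a "normal" one.
points = [(0.0, 0.0), (0.1, 0.0), (1.0, 1.0), (1.1, 1.0), (2.0, 2.0)]
labels = ["normal", "attack", "normal", "normal", "normal"]
kept_points, kept_labels = tomek_undersample(points, labels, majority="normal")
print(kept_points, kept_labels)
```

In this toy case only the "normal" sample at the origin is removed, because it forms a mutual-nearest-neighbour pair with the single "attack" sample.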

Interrogating machine learning classifiers and dimensionality reduction techniques for radiomic prediction of glioma tumor grade.

Journal of Clinical Oncology ◽ 10.1200/jco.2018.36.15_suppl.2031 ◽ 2018 ◽ Vol 36 (15_suppl) ◽ pp. 2031-2031

Author(s): Kareem Wahid ◽ Aikaterini Kotrotsou ◽ Srishti Abrol ◽ Ahmed Hassan ◽ Nabil Elshafeey ◽ ...

Keyword(s): Machine Learning ◽ Dimensionality Reduction ◽ Tumor Grade ◽ Machine Learning Classifiers ◽ Reduction Techniques ◽ Learning Classifiers ◽ Dimensionality Reduction Techniques ◽ Glioma Tumor

Traffic Accidents Severity Prediction using Support Vector Machine Models

International Journal of Innovative Technology and Exploring Engineering - Special Issue ◽ 10.35940/ijitee.f4393.059720 ◽ 2020 ◽ Vol 9 (7) ◽ pp. 1345-1350

Keyword(s): Support Vector Machine ◽ Traffic Accidents ◽ Road Traffic ◽ Children And Youth ◽ Support Vector ◽ Road Traffic Accidents ◽ Linear Kernel ◽ Road Accidents ◽ Severity Prediction ◽ Decision Making Model

In recent years, road traffic accidents (RTA) have become one of the highest national health concerns worldwide. RTA have become the leading cause of losing lives among children and youth. Recent studies have proven that Data Mining Techniques can break down the complexity that prevails between RTA and corresponding factors. In this paper, Support Vector Machine (SVM) based on Radial basis function (RBF) and Linear Kernel Function is applied to predict fatal road accidents in Lebanon. The experimental results reveal that SVM using RBF gives the highest accuracy (86%) and the best AUC (86.6%). The obtained decision-making model claims to tackle the fatal RTA phenomenon.
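The two kernels compared in this abstract differ in how they measure similarity between feature vectors: the linear kernel is a plain dot product, while the RBF kernel decays with squared distance and so can express non-linear decision boundaries. The sketch below shows the standard definitions; the `gamma` value and the example vectors are assumptions, not settings from the paper.

```python
import math


def linear_kernel(x, y):
    """Dot product of two feature vectors."""
    return sum(a * b for a, b in zip(x, y))


def rbf_kernel(x, y, gamma=1.0):
    """exp(-gamma * ||x - y||^2); equals 1 for identical vectors, decays with distance."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, y)))


# Hypothetical standardized feature vectors for two accident records.
x = (1.0, 0.0)
y = (0.0, 1.0)

print(linear_kernel(x, y))  # orthogonal vectors: dot product is zero
print(rbf_kernel(x, y))     # exp(-2): similarity decays with distance
print(rbf_kernel(x, x))     # identical vectors give 1.0
```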
