Decision support for preventing safety violations

Aim. The aim of the paper is to examine the experience of reducing the effect of the human factor on business processes, to develop the structure and software of a decision support system for preventing safety violations by train drivers using machine learning, and to analyse the findings. Methods. The study uses machine learning, statistical analysis and expert analysis. The following machine learning methods were used: logistic regression, random forests, gradient boosting over decision trees with frequency-domain representation of categorical features, and neural networks. Results. A set of indicators characterizing a train driver’s operation was identified and is to be used as part of the system under development. The term “train driver’s reliability” was defined as the ability not to violate train traffic safety over a certain number of trips. Algorithms were designed and examined for predicting violations in a train driver’s operation; they are used in defining reliability groups and lists of preventive measures recommended for reducing the number of safety violations in a train driver’s operation. Major violations with proven guilt of the driver that may be committed within the following 3, 7, 10, 20, 30 or 60 days were chosen as attributes for the purpose of safety violation prediction. Analysis of the results on the test sample revealed that the model based on gradient boosting over decision trees with frequency-domain representation of categorical features shows the best results for binary classification on the prediction horizons of 30 and 60 days. The developed algorithm made a correct prediction in 76% of cases with a threshold value of 0.7 and a horizon of 30 days, and in 82% of cases with a threshold value of 0.9 and a horizon of 60 days. The solution of the problem can be found in the integration of different approaches to predicting safety violations in a train driver’s operation. Additionally, the 10 most significant indicators of a train driver’s operation were identified with the best of the considered models, i.e., gradient boosting over decision trees with frequency-domain representation of categorical features. Conclusion. The paper presents an overview of methods and systems for assessing human reliability and the effect of the human factor on the safety of transportation systems. This overview allowed choosing the most promising directions and methods of predictive analysis of a train driver’s operation, including machine learning methods. The resulting set of indicators of a train driver’s operation, which take into consideration changes in the quality of such operation, provided the initial data for training the models implemented as part of the system under development. The implemented models enabled the aggregation of information on train drivers and the adoption of targeted and temporary preventive measures recommended for improving driver reliability. The resulting approach to the definition of preventive measures has been implemented in three depots of JSC RZD in trial operation mode.
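
The abstract does not spell out what "frequency-domain representation of categorical features" means. One plausible reading, assumed here, is frequency encoding, where each category is replaced by its relative frequency in the sample. The sketch below shows that encoding together with the thresholding step that turns a predicted probability into a binary flag. The depot codes, scores and function names are hypothetical and are not taken from the paper.

```python
from collections import Counter


def frequency_encode(values):
    """Replace each category with its relative frequency in `values`."""
    counts = Counter(values)
    total = len(values)
    return [counts[v] / total for v in values]


def flag_at_threshold(probabilities, threshold):
    """Binary decision: True when the predicted probability reaches the threshold."""
    return [p >= threshold for p in probabilities]


# Hypothetical depot codes for five drivers (illustrative only).
depots = ["A", "B", "A", "C", "A"]
encoded = frequency_encode(depots)

# Hypothetical model scores for a 30-day horizon, cut at the 0.7 threshold
# mentioned in the abstract.
scores = [0.91, 0.42, 0.70, 0.15, 0.68]
flags = flag_at_threshold(scores, 0.7)
```

A higher threshold such as 0.9 flags fewer drivers, so the choice of threshold is tied to the prediction horizon it is used with.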

Building more accurate decision trees with the additive tree

Proceedings of the National Academy of Sciences ◽

10.1073/pnas.1816748116 ◽

2019 ◽

Vol 116 (40) ◽

pp. 19887-19893 ◽

Cited By ~ 15

Author(s):

José Marcio Luna ◽

Efstathios D. Gennatas ◽

Lyle H. Ungar ◽

Eric Eaton ◽

Eric S. Diffenderfer ◽

...

Keyword(s):

Machine Learning ◽

Decision Trees ◽

Ensemble Methods ◽

Predictive Performance ◽

Additive Models ◽

Gradient Boosting ◽

Clear Understanding ◽

High Stakes ◽

Additive Tree ◽

Full Interaction

The expansion of machine learning to high-stakes application domains such as medicine, finance, and criminal justice, where making informed decisions requires clear understanding of the model, has increased the interest in interpretable machine learning. The widely used Classification and Regression Trees (CART) have played a major role in health sciences, due to their simple and intuitive explanation of predictions. Ensemble methods like gradient boosting can improve the accuracy of decision trees, but at the expense of the interpretability of the generated model. Additive models, such as those produced by gradient boosting, and full interaction models, such as CART, have been investigated largely in isolation. We show that these models exist along a spectrum, revealing previously unseen connections between these approaches. This paper introduces a rigorous formalization for the additive tree, an empirically validated learning technique for creating a single decision tree, and shows that this method can produce models equivalent to CART or gradient boosted stumps at the extremes by varying a single parameter. Although the additive tree is designed primarily to provide both the model interpretability and predictive performance needed for high-stakes applications like medicine, it also can produce decision trees represented by hybrid models between CART and boosted stumps that can outperform either of these approaches.
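
The two extremes named in the abstract can be illustrated with a toy example. This is not the additive tree algorithm itself, only a sketch of the difference between an additive model built from stumps and a tree whose second split depends on the first. All functions and values are hypothetical.

```python
def stump(feature, threshold, left, right):
    """A depth-1 tree: one split on one feature."""
    return lambda x: left if x[feature] <= threshold else right


def additive_model(stumps):
    """Sum of stumps, the form produced by boosting depth-1 trees."""
    return lambda x: sum(s(x) for s in stumps)


def interaction_tree(x):
    """A depth-2 tree: the second split depends on the outcome of the first."""
    if x[0] <= 0.5:
        return 1.0 if x[1] > 0.5 else 0.0
    return 0.0 if x[1] > 0.5 else 1.0


additive = additive_model([stump(0, 0.5, 0.0, 1.0), stump(1, 0.5, 0.0, 1.0)])
points = [(0, 0), (0, 1), (1, 0), (1, 1)]

tree_out = [interaction_tree(p) for p in points]  # XOR pattern: full interaction
additive_out = [additive(p) for p in points]      # sum of per-feature effects
```

The tree reproduces an XOR pattern, which no sum of single-feature stumps can match exactly, while the additive model keeps each feature's contribution separate and easy to read.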

Application of machine learning methods for automated classification and routing in ITIL

Journal of Physics Conference Series ◽

10.1088/1742-6596/2091/1/012041 ◽

2021 ◽

Vol 2091 (1) ◽

pp. 012041

Author(s):

VV Nikulin ◽

S D Shibaikin ◽

A N Vishnyakov

Keyword(s):

Machine Learning ◽

Human Factor ◽

Gradient Boosting ◽

Automated Classification ◽

It Services ◽

Learning Methods ◽

Machine Learning Methods ◽

Text Information ◽

Comparison Of The Results

The article analyzes the application of machine learning methods for automated classification and routing in the ITIL library. ITSM technology and ITIL are considered, and definitions of an incident and of IT services are given. Keywords are then extracted from information written in natural language and vectorized using lemmatization and the TF-IDF measure. A comparative analysis of machine learning methods is given, together with a comparison of the results of automatic classification of text information using gradient boosting and a convolutional neural network. Various parameters of these methods are considered and the most effective machine learning method is determined. Using machine learning methods for automated classification of incidents allows high-precision routing of requests for restoring the operability of IT services, reducing response time and errors associated with the human factor.
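
As a rough illustration of the TF-IDF weighting mentioned above, the sketch below uses the plain textbook form (term frequency times the logarithm of inverse document frequency). The article does not state which variant it uses, and libraries often apply smoothing, so this form is an assumption. The incident texts are hypothetical and assumed to be already lemmatized.

```python
import math
from collections import Counter


def tf_idf(documents):
    """Return one {term: weight} dict per tokenized document.

    Uses tf = count / document length and idf = log(N / document frequency).
    """
    n_docs = len(documents)
    # Number of documents in which each term appears at least once.
    doc_freq = Counter(term for doc in documents for term in set(doc))
    weights = []
    for doc in documents:
        counts = Counter(doc)
        weights.append({
            term: (count / len(doc)) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return weights


# Hypothetical, already lemmatized incident texts.
tickets = [
    ["printer", "not", "print"],
    ["email", "not", "send"],
    ["printer", "paper", "jam"],
]
vectors = tf_idf(tickets)
```

Terms that appear in fewer tickets receive a larger weight, so "print" outweighs "printer" and "not" in the first ticket.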

Prediction of Mean Wave Overtopping Discharge Using Gradient Boosting Decision Trees

Water ◽

10.3390/w12061703 ◽

2020 ◽

Vol 12 (6) ◽

pp. 1703 ◽

Cited By ~ 3

Author(s):

Joost P. den Bieman ◽

Josefine M. Wilms ◽

Henk F. P. van den Boogaard ◽

Marcel R. A. van Gent

Keyword(s):

Machine Learning ◽

Decision Trees ◽

Numerical Models ◽

Input Parameter ◽

Design Criterion ◽

Machine Learning Techniques ◽

Gradient Boosting ◽

Wave Overtopping ◽

Learning Techniques ◽

Machine Learning Model

Wave overtopping is an important design criterion for coastal structures such as dikes, breakwaters and promenades. Hence, the prediction of the expected wave overtopping discharge is an important research topic. Existing prediction tools consist of empirical overtopping formulae, machine learning techniques like neural networks, and numerical models. In this paper, an innovative machine learning method—gradient boosting decision trees—is applied to the prediction of mean wave overtopping discharges. This new machine learning model is trained using the CLASH wave overtopping database. Optimizations to its performance are realized by using feature engineering and hyperparameter tuning. The model is shown to outperform an existing neural network model by reducing the error on the prediction of the CLASH database by a factor of 2.8. The model predictions follow physically realistic trends for variations of important features, and behave regularly in regions of the input parameter space with little or no data coverage.
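
The core idea of gradient boosting decision trees for regression can be sketched in a few lines: each new tree is fitted to the residuals of the current ensemble and added with a small learning rate. The example below is a minimal illustration with depth-1 trees on a single feature and squared-error loss. It is not the model from the paper, and the data and parameter values are hypothetical.

```python
def fit_stump(xs, residuals):
    """Best single-threshold split of a 1-D feature under squared error."""
    best = None
    for threshold in sorted(set(xs))[:-1]:
        left = [r for x, r in zip(xs, residuals) if x <= threshold]
        right = [r for x, r in zip(xs, residuals) if x > threshold]
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        error = (sum((r - left_mean) ** 2 for r in left)
                 + sum((r - right_mean) ** 2 for r in right))
        if best is None or error < best[0]:
            best = (error, threshold, left_mean, right_mean)
    return best[1:]


def boost(xs, ys, n_rounds=50, learning_rate=0.1):
    """Fit stumps to successive residuals and return a prediction function."""
    base = sum(ys) / len(ys)
    stumps = []
    predictions = [base] * len(ys)
    for _ in range(n_rounds):
        # Residuals are the negative gradient of the squared-error loss.
        residuals = [y - p for y, p in zip(ys, predictions)]
        threshold, left, right = fit_stump(xs, residuals)
        stumps.append((threshold, left, right))
        predictions = [p + learning_rate * (left if x <= threshold else right)
                       for x, p in zip(xs, predictions)]

    def predict(x):
        return base + learning_rate * sum(
            left if x <= threshold else right
            for threshold, left, right in stumps)

    return predict


def mean_squared_error(model, xs, ys):
    return sum((y - model(x)) ** 2 for x, y in zip(xs, ys)) / len(xs)


# Hypothetical one-feature data with a step in the response.
xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
ys = [1.0, 1.2, 0.9, 1.1, 3.0, 3.2, 2.9, 3.1]
model = boost(xs, ys)
```

The number of rounds and the learning rate are the kind of hyperparameters the abstract refers to when it mentions tuning.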

Factors Identification and Prediction for Mind Wandering Driving Using Machine Learning

Journal of Advanced Transportation ◽

10.1155/2021/4216215 ◽

2021 ◽

Vol 2021 ◽

pp. 1-13

Author(s):

Ciyun Lin ◽

Hongli Zhang ◽

Bowen Gong ◽

Dayong Wu

Keyword(s):

Machine Learning ◽

Decision Tree ◽

Traffic Safety ◽

Confusion Matrix ◽

Real Life ◽

Forecast Accuracy ◽

Mind Wandering ◽

Driving Safety ◽

Gradient Boosting ◽

Data Set

Traffic safety is affected by many complex factors. Mind wandering (MW) is a fatal cause affecting driving safety and is hard to detect and prevent because of its uncertain and complex occurrence mechanism. The aim of this study was to propose a framework for analyzing and predicting MW based on readily available driving status data. The data used in this study are single-trip information collected by questionnaire, which includes drivers’ personal characteristics, contextual information in which MW occurs, and in-vehicle environmental factors. After investigating the extent to which factors influence MW, the chosen factors are used to forecast MW. Based on these results, we select factors that can be reliably obtained in real life to forecast MW. To verify that the newly explored factors improve forecast accuracy, a comparative analysis is conducted between the results of our approach and those of existing approaches. We compare results obtained by four machine-learning-enabled forecasting approaches on a real-life data set. The results show that the factors found in this paper can significantly improve forecast accuracy. The confusion matrix, ROC curves, and AUC are computed, and the gradient boosting decision tree algorithm performs better than the other forecasting approaches. The importance rankings of most factors obtained by the gradient boosting decision tree and by the questionnaire are the same.
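
The evaluation measures named in the abstract can be computed directly from labels and scores. The sketch below assumes binary labels (1 = MW occurred), a 0.5 decision cut-off, and the pairwise-ranking definition of AUC. The labels and scores are hypothetical.

```python
def confusion_matrix(y_true, y_pred):
    """Return (tp, fp, fn, tn) for binary labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    return tp, fp, fn, tn


def auc(y_true, scores):
    """Probability that a random positive is scored above a random negative.

    Ties count as half a win. This equals the area under the ROC curve.
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical labels and model scores.
y_true = [1, 0, 1, 1, 0, 0]
scores = [0.9, 0.4, 0.6, 0.3, 0.2, 0.7]
y_pred = [1 if s >= 0.5 else 0 for s in scores]

matrix = confusion_matrix(y_true, y_pred)
area = auc(y_true, scores)
```

The confusion matrix depends on the chosen cut-off, whereas AUC summarizes ranking quality over all cut-offs.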

Introducing Machine Learning Models to Response Surface Methodologies

10.5772/intechopen.98191 ◽

2021 ◽

Author(s):

Yang Zhang ◽

Yue Wu

Keyword(s):

Machine Learning ◽

Decision Trees ◽

Response Surface ◽

Linear Models ◽

Influence Factors ◽

Neural Nets ◽

Gradient Boosting ◽

Support Vector ◽

Selection Operator

Traditional response surface methodology (RSM) has utilized the ordinary least squares (OLS) technique to numerically estimate the coefficients of multiple influence factors and obtain the values of the response factor, while considering the interaction and quadratic terms of the influencers, if any. With the emergence and popularization of machine learning (ML), more competitive methods have been developed which can be adopted to complement or replace the traditional RSM method, i.e. OLS with or without the polynomial terms. In this chapter, several commonly used ML regression models are introduced, including improved linear models (the least absolute shrinkage and selection operator model and the generalized linear model), the decision tree family (decision trees, random forests and gradient boosting trees), neural nets (multi-layer perceptrons) and the support vector machine. These ML models provide a more flexible way to estimate a response surface function that is difficult to represent by a polynomial as deployed in traditional RSM. The advantage of the ML models in predicting precise response factor values is then demonstrated on an engineering case study. The case study shows that various choices of ML models can reach a more satisfactory estimate of the response surface function than the RSM. The gradient boosting trees model (GBDT) is shown to outperform the RSM, with an accuracy improvement of 50% on unseen experimental data.
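
The polynomial terms that traditional RSM feeds into OLS can be made concrete with a small feature expansion. The sketch below builds the columns of a second-order model (intercept, linear, two-way interaction and squared terms) for one experimental run. The function name and factor values are hypothetical.

```python
from itertools import combinations


def quadratic_terms(factors):
    """Expand factors into the columns of a second-order response surface model."""
    linear = list(factors)
    interactions = [a * b for a, b in combinations(factors, 2)]
    squares = [a * a for a in factors]
    return [1.0] + linear + interactions + squares


# Two hypothetical influence factors for a single run.
row = quadratic_terms([2.0, 3.0])
```

OLS then estimates one coefficient per column, whereas tree-based or neural models learn the surface without a fixed polynomial form.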

Estimating Mangrove Above-Ground Biomass Using Extreme Gradient Boosting Decision Trees Algorithm with Fused Sentinel-2 and ALOS-2 PALSAR-2 Data in Can Gio Biosphere Reserve, Vietnam

Remote Sensing ◽

10.3390/rs12050777 ◽

2020 ◽

Vol 12 (5) ◽

pp. 777 ◽

Cited By ~ 9

Author(s):

Tien Dat Pham ◽

Nga Nhu Le ◽

Nam Thang Ha ◽

Luong Viet Nguyen ◽

Junshi Xia ◽

...

Keyword(s):

Machine Learning ◽

Decision Trees ◽

Biosphere Reserve ◽

Coefficient Of Determination ◽

Gradient Boosting ◽

Support Vector ◽

Above Ground Biomass ◽

Ground Biomass ◽

Extreme Gradient Boosting ◽

Sentinel 2

This study investigates the effectiveness of gradient boosting decision tree techniques in estimating mangrove above-ground biomass (AGB) at the Can Gio biosphere reserve (Vietnam). For this purpose, we employed a novel gradient-boosting regression technique called the extreme gradient boosting regression (XGBR) algorithm, and implemented and verified a mangrove AGB model using data from a field survey of 121 sampling plots conducted during the dry season. The dataset fuses the data of the Sentinel-2 multispectral instrument (MSI) and the dual polarimetric (HH, HV) data of ALOS-2 PALSAR-2. The performance metrics of the proposed model (root-mean-square error (RMSE) and coefficient of determination (R²)) were compared with those of other machine learning techniques, namely gradient boosting regression (GBR), support vector regression (SVR), Gaussian process regression (GPR), and random forests regression (RFR). The XGBR model obtained a promising result with R² = 0.805 and RMSE = 28.13 Mg ha⁻¹, and yielded the highest predictive performance among the five machine learning models. In the XGBR model, the estimated mangrove AGB ranged from 11 to 293 Mg ha⁻¹ (average = 106.93 Mg ha⁻¹). This work demonstrates that XGBR with the combined Sentinel-2 and ALOS-2 PALSAR-2 data can accurately estimate the mangrove AGB in the Can Gio biosphere reserve. The general applicability of the XGBR model combined with multiple sourced optical and SAR data should be further tested and compared in a large-scale study of forest AGBs in different geographical and climatic ecosystems.
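
The two performance measures used in the abstract follow from their standard definitions. The sketch below computes them for a handful of hypothetical biomass values in Mg per hectare, which are not data from the study.

```python
import math


def rmse(y_true, y_pred):
    """Root-mean-square error."""
    squared = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    return math.sqrt(squared / len(y_true))


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Hypothetical observed and predicted AGB values.
observed = [50.0, 100.0, 150.0, 200.0]
predicted = [60.0, 90.0, 160.0, 190.0]

error = rmse(observed, predicted)
fit = r_squared(observed, predicted)
```

RMSE is expressed in the units of the target, while R² is unitless, which is why the two are usually reported together.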

XGBoost Optimized by Adaptive Particle Swarm Optimization for Credit Scoring

Mathematical Problems in Engineering ◽

10.1155/2021/6655510 ◽

2021 ◽

Vol 2021 ◽

pp. 1-18

Author(s):

Chao Qin ◽

Yunfeng Zhang ◽

Fangxun Bao ◽

Caiming Zhang ◽

Peide Liu ◽

...

Keyword(s):

Machine Learning ◽

Particle Swarm Optimization ◽

Decision Trees ◽

Credit Scoring ◽

Particle Swarm ◽

Gradient Boosting ◽

Hyperparameter Optimization ◽

Swarm Optimization ◽

Adaptive Particle Swarm Optimization ◽

Extreme Gradient Boosting

Personal credit scoring is a challenging issue. In recent years, research has shown that machine learning has satisfactory performance in credit scoring. Because of the advantages of feature combination and feature selection, decision trees can match credit data which have high dimension and a complex correlation. However, decision trees tend to overfit. eXtreme Gradient Boosting is an advanced gradient enhanced tree that overcomes this shortcoming by integrating tree models. The structure of the model is determined by hyperparameters; because manual tuning is time-consuming and laborious, an optimization method is employed for tuning. As particle swarm optimization describes the particle state and its motion law as continuous real numbers, the hyperparameters applicable to eXtreme Gradient Boosting can find their optimal values in the continuous search space. However, classical particle swarm optimization tends to fall into local optima. To solve this problem, this paper proposes an eXtreme Gradient Boosting credit scoring model that is based on adaptive particle swarm optimization. The swarm split, which is based on the clustering idea and two kinds of learning strategies, is employed to guide the particles and improve the diversity of the subswarms, in order to prevent the algorithm from falling into a local optimum. In the experiment, several traditional machine learning algorithms and popular ensemble learning classifiers, as well as four hyperparameter optimization methods (grid search, random search, tree-structured Parzen estimator, and particle swarm optimization), are considered for comparison. Experiments were performed with four credit datasets and seven KEEL benchmark datasets over five popular evaluation measures: accuracy, error rate (type I error and type II error), Brier score, and F1 score. Results demonstrate that the proposed model outperforms other models on average. Moreover, adaptive particle swarm optimization performs better than the other hyperparameter optimization strategies.
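
For orientation, the sketch below implements classical particle swarm optimization over a continuous search space. It is not the adaptive variant with swarm splitting proposed in the paper. The objective is a hypothetical stand-in for a validation loss over two hyperparameters, and the inertia and acceleration coefficients are common textbook values rather than the paper's settings.

```python
import random


def pso_minimize(objective, bounds, n_particles=20, n_iters=50,
                 inertia=0.7, c1=1.5, c2=1.5, seed=0):
    """Classical PSO: each particle is pulled toward its own and the swarm's best."""
    rng = random.Random(seed)
    dim = len(bounds)
    positions = [[rng.uniform(lo, hi) for lo, hi in bounds]
                 for _ in range(n_particles)]
    velocities = [[0.0] * dim for _ in range(n_particles)]
    best_pos = [p[:] for p in positions]
    best_val = [objective(p) for p in positions]
    g = min(range(n_particles), key=lambda i: best_val[i])
    global_pos, global_val = best_pos[g][:], best_val[g]

    for _ in range(n_iters):
        for i in range(n_particles):
            for d, (lo, hi) in enumerate(bounds):
                r1, r2 = rng.random(), rng.random()
                velocities[i][d] = (
                    inertia * velocities[i][d]
                    + c1 * r1 * (best_pos[i][d] - positions[i][d])
                    + c2 * r2 * (global_pos[d] - positions[i][d]))
                # Move the particle and clamp it to the search bounds.
                moved = positions[i][d] + velocities[i][d]
                positions[i][d] = min(hi, max(lo, moved))
            value = objective(positions[i])
            if value < best_val[i]:
                best_pos[i], best_val[i] = positions[i][:], value
                if value < global_val:
                    global_pos, global_val = positions[i][:], value
    return global_pos, global_val


def validation_loss(params):
    """Hypothetical smooth loss with its minimum at (0.1, 6.0)."""
    learning_rate, max_depth = params
    return (learning_rate - 0.1) ** 2 + ((max_depth - 6.0) / 10.0) ** 2


# Hypothetical search ranges for a learning rate and a (continuous) tree depth.
search_bounds = [(0.01, 0.5), (2.0, 10.0)]
best_params, best_value = pso_minimize(validation_loss, search_bounds)
```

In practice the objective would train and validate a model for each candidate, and integer hyperparameters such as tree depth would be rounded before use.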

Forecasting US movies box office performances in Turkey using machine learning algorithms

Journal of Intelligent & Fuzzy Systems ◽

10.3233/jifs-189120 ◽

2020 ◽

Vol 39 (5) ◽

pp. 6579-6590

Author(s):

Sandy Çağlıyor ◽

Başar Öztayşi ◽

Selime Sezgin

Keyword(s):

Machine Learning ◽

Global Economy ◽

Learning Algorithms ◽

Forecast Model ◽

Machine Learning Algorithms ◽

Gradient Boosting ◽

High Stakes ◽

Box Office ◽

Industry Forecast ◽

The Impact

The motion picture industry is one of the largest industries worldwide and has significant importance in the global economy. Considering the high stakes and high risks in the industry, forecast models and decision support systems are gaining importance. Several attempts have been made to estimate the theatrical performance of a movie before or at the early stages of its release. Nevertheless, these models are mostly used for predicting domestic performances and the industry still struggles to predict box office performances in overseas markets. In this study, the aim is to design a forecast model using different machine learning algorithms to estimate the theatrical success of US movies in Turkey. From various sources, a dataset of 1559 movies is constructed. Firstly, independent variables are grouped as pre-release, distributor type, and international distribution based on their characteristics. The number of attendances is discretized into three classes. Four popular machine learning algorithms (artificial neural networks, decision tree regression, gradient boosting trees, and random forests) are employed, and the impact of each group is observed by comparing the performance of the models. Then the number of target classes is increased to five and eight, and the results are compared with previously developed models in the literature.

Development of innovative hygienic technologies for preservation of the workers’ reproductive health

Russian Journal of Occupational Health and Industrial Ecology ◽

10.31089/1026-9428-2019-59-9-792-793 ◽

2020 ◽

pp. 792-792

Author(s):

M. A. Fesenko ◽

G. V. Golovaneva ◽

A. V. Miskevich

Keyword(s):

Artificial Intelligence ◽

Machine Learning ◽

Reproductive Health ◽

Reproductive System ◽

Preventive Measures ◽

Learning Algorithms ◽

Reproductive Function ◽

Machine Learning Algorithms ◽

New Model

A new model, «Prognosis of men’s reproductive function disorders», was developed. Machine learning algorithms (artificial intelligence) were used for this purpose, and the model has high prognostic accuracy. The aim of applying the model is to prioritize diagnostic and preventive measures in order to minimize complications of reproductive system diseases and preserve workers’ health and efficiency.

An Introduction to Machine Learning for Panel Data: Decision Trees, Random Forests, and Other Dendrological Methods

SSRN Electronic Journal ◽

10.2139/ssrn.3717879 ◽

2020 ◽

Author(s):

James Ming Chen

Keyword(s):

Machine Learning ◽

Panel Data ◽

Decision Trees ◽

Random Forests
