Driver Behavior Classification System Analysis Using Machine Learning Methods

Distraction while driving occurs when a driver is engaged in non-driving activities. These activities reduce the driver’s attention and focus on the road, therefore increasing the risk of accidents. As a consequence, the number of accidents increases and infrastructure is damaged. Cars are now equipped with different safety precautions that ensure driver awareness and attention at all times. The first step for such systems is to define whether the driver is distracted or not. Different methods are proposed to detect such distractions, but they lack efficiency when tested in real-life situations. In this paper, four machine learning classification methods are implemented and compared to identify drivers’ behavior and distraction situations based on real data corresponding to different behaviors such as aggressive, drowsy and normal. The data were randomized for a better application of the methods. We demonstrate that the gradient boosting method outperforms the other used classifiers.

Download Full-text

A Machine Learning Method for Predicting Vegetation Indices in China

Remote Sensing ◽

10.3390/rs13061147 ◽

2021 ◽

Vol 13 (6) ◽

pp. 1147

Author(s):

Xiangqian Li ◽

Wenping Yuan ◽

Wenjie Dong

Keyword(s):

Machine Learning ◽

Growing Season ◽

Crop Growth ◽

Spatiotemporal Distribution ◽

Coefficient Of Determination ◽

Gradient Boosting ◽

Severe Drought ◽

Vegetation Growth ◽

Extreme Gradient Boosting ◽

Boosting Method

To forecast the terrestrial carbon cycle and monitor food security, vegetation growth must be accurately predicted; however, current process-based ecosystem and crop-growth models are limited in their effectiveness. This study developed a machine learning model using the extreme gradient boosting method to predict vegetation growth throughout the growing season in China from 2001 to 2018. The model used satellite-derived vegetation data for the first month of each growing season, CO2 concentration, and several meteorological factors as data sources for the explanatory variables. Results showed that the model could reproduce the spatiotemporal distribution of vegetation growth as represented by the satellite-derived normalized difference vegetation index (NDVI). The predictive error for the growing season NDVI was less than 5% for more than 98% of vegetated areas in China; the model represented seasonal variations in NDVI well. The coefficient of determination (R2) between the monthly observed and predicted NDVI was 0.83, and more than 69% of vegetated areas had an R2 > 0.8. The effectiveness of the model was examined for a severe drought year (2009), and results showed that the model could reproduce the spatiotemporal distribution of NDVI even under extreme conditions. This model provides an alternative method for predicting vegetation growth and has great potential for monitoring vegetation dynamics and crop growth.

Download Full-text

Proposing a machine-learning based method to predict stillbirth before and during delivery and ranking the features: nationwide retrospective cross-sectional study

BMC Pregnancy and Childbirth ◽

10.1186/s12884-021-03658-z ◽

2021 ◽

Vol 21 (1) ◽

Author(s):

Toktam Khatibi ◽

Elham Hanifi ◽

Mohammad Mehdi Sepehri ◽

Leila Allahqoli

Keyword(s):

Machine Learning ◽

External Validation ◽

Fetal Loss ◽

Null Distribution ◽

Training Dataset ◽

Gradient Boosting ◽

Support Vector ◽

Cross Sectional ◽

Boosting Method ◽

Demographic Features

Abstract Background Stillbirth is defined as fetal loss in pregnancy beyond 28 weeks by WHO. In this study, a machine-learning based method is proposed to predict stillbirth from livebirth and discriminate stillbirth before and during delivery and rank the features. Method A two-step stack ensemble classifier is proposed for classifying the instances into stillbirth and livebirth at the first step and then, classifying stillbirth before delivery from stillbirth during the labor at the second step. The proposed SE has two consecutive layers including the same classifiers. The base classifiers in each layer are decision tree, Gradient boosting classifier, logistics regression, random forest and support vector machines which are trained independently and aggregated based on Vote boosting method. Moreover, a new feature ranking method is proposed in this study based on mean decrease accuracy, Gini Index and model coefficients to find high-ranked features. Results IMAN registry dataset is used in this study considering all births at or beyond 28th gestational week from 2016/04/01 to 2017/01/01 including 1,415,623 live birth and 5502 stillbirth cases. A combination of maternal demographic features, clinical history, fetal properties, delivery descriptors, environmental features, healthcare service provider descriptors and socio-demographic features are considered. The experimental results show that our proposed SE outperforms the compared classifiers with the average accuracy of 90%, sensitivity of 91%, specificity of 88%. The discrimination of the proposed SE is assessed and the average AUC of ±95%, CI of 90.51% ±1.08 and 90% ±1.12 is obtained on training dataset for model development and test dataset for external validation, respectively. The proposed SE is calibrated using isotopic nonparametric calibration method with the score of 0.07. The process is repeated 10,000 times and AUC of SE classifiers using random different training datasets as null distribution. The obtained p-value to assess the specificity of the proposed SE is 0.0126 which shows the significance of the proposed SE. Conclusions Gestational age and fetal height are two most important features for discriminating livebirth from stillbirth. Moreover, hospital, province, delivery main cause, perinatal abnormality, miscarriage number and maternal age are the most important features for classifying stillbirth before and during delivery.

Download Full-text

Improving the performance of a radio-frequency localization system in adverse outdoor applications

EURASIP Journal on Wireless Communications and Networking ◽

10.1186/s13638-021-02001-6 ◽

2021 ◽

Vol 2021 (1) ◽

Author(s):

Marcelo N. de Sousa ◽

Ricardo Sant’Ana ◽

Rigel P. Fernandes ◽

Julio Cesar Duarte ◽

José A. Apolinário ◽

...

Keyword(s):

Random Forest ◽

Ray Tracing ◽

Real World ◽

Practical Implication ◽

Real Life ◽

Simulated Data ◽

Real Data ◽

Gradient Boosting ◽

Real World Data ◽

Localization Accuracy

AbstractIn outdoor RF localization systems, particularly where line of sight can not be guaranteed or where multipath effects are severe, information about the terrain may improve the position estimate’s performance. Given the difficulties in obtaining real data, a ray-tracing fingerprint is a viable option. Nevertheless, although presenting good simulation results, the performance of systems trained with simulated features only suffer degradation when employed to process real-life data. This work intends to improve the localization accuracy when using ray-tracing fingerprints and a few field data obtained from an adverse environment where a large number of measurements is not an option. We employ a machine learning (ML) algorithm to explore the multipath information. We selected algorithms random forest and gradient boosting; both considered efficient tools in the literature. In a strict simulation scenario (simulated data for training, validating, and testing), we obtained the same good results found in the literature (error around 2 m). In a real-world system (simulated data for training, real data for validating and testing), both ML algorithms resulted in a mean positioning error around 100 ,m. We have also obtained experimental results for noisy (artificially added Gaussian noise) and mismatched (with a null subset of) features. From the simulations carried out in this work, our study revealed that enhancing the ML model with a few real-world data improves localization’s overall performance. From the machine ML algorithms employed herein, we also observed that, under noisy conditions, the random forest algorithm achieved a slightly better result than the gradient boosting algorithm. However, they achieved similar results in a mismatch experiment. This work’s practical implication is that multipath information, once rejected in old localization techniques, now represents a significant source of information whenever we have prior knowledge to train the ML algorithm.

Download Full-text

Comparison of Ensemble Machine Learning Methods for Soil Erosion Pin Measurements

ISPRS International Journal of Geo-Information ◽

10.3390/ijgi10010042 ◽

2021 ◽

Vol 10 (1) ◽

pp. 42

Author(s):

Kieu Anh Nguyen ◽

Walter Chen ◽

Bor-Shiun Lin ◽

Uma Seeboonruang

Keyword(s):

Machine Learning ◽

Soil Erosion ◽

Ensemble Methods ◽

Machine Learning Algorithms ◽

Multivariate Adaptive Regression Splines ◽

Gradient Boosting ◽

Support Vector ◽

Ensemble Machine Learning ◽

Boosting Method ◽

Bagging Method

Although machine learning has been extensively used in various fields, it has only recently been applied to soil erosion pin modeling. To improve upon previous methods of quantifying soil erosion based on erosion pin measurements, this study explored the possible application of ensemble machine learning algorithms to the Shihmen Reservoir watershed in northern Taiwan. Three categories of ensemble methods were considered in this study: (a) Bagging, (b) boosting, and (c) stacking. The bagging method in this study refers to bagged multivariate adaptive regression splines (bagged MARS) and random forest (RF), and the boosting method includes Cubist and gradient boosting machine (GBM). Finally, the stacking method is an ensemble method that uses a meta-model to combine the predictions of base models. This study used RF and GBM as the meta-models, decision tree, linear regression, artificial neural network, and support vector machine as the base models. The dataset used in this study was sampled using stratified random sampling to achieve a 70/30 split for the training and test data, and the process was repeated three times. The performance of six ensemble methods in three categories was analyzed based on the average of three attempts. It was found that GBM performed the best among the ensemble models with the lowest root-mean-square error (RMSE = 1.72 mm/year), the highest Nash-Sutcliffe efficiency (NSE = 0.54), and the highest index of agreement (d = 0.81). This result was confirmed by the spatial comparison of the absolute differences (errors) between model predictions and observations using GBM and RF in the study area. In summary, the results show that as a group, the bagging method and the boosting method performed equally well, and the stacking method was third for the erosion pin dataset considered in this study.

Download Full-text

Prediction of sports attendance: A comparative analysis

Proceedings of the Institution of Mechanical Engineers Part P Journal of Sports Engineering and Technology ◽

10.1177/1754337120983135 ◽

2020 ◽

pp. 175433712098313

Author(s):

Mehmet Şahin ◽

Murat Uçar

Keyword(s):

Neural Network ◽

Machine Learning ◽

Comparative Analysis ◽

National Football League ◽

Relevant Literature ◽

Performance Evaluations ◽

Influential Factor ◽

Gradient Boosting ◽

Home Team ◽

Machine Learning Methods

In this study, a comparative analysis for predicting sports attendance demand is presented based on econometric, artificial intelligence, and machine learning methodologies. Data from more than 20,000 games from three major leagues, namely the National Basketball Association (NBA), National Football League (NFL), and Major League Baseball (MLB), were used for training and testing the approaches. The relevant literature was examined to determine the most useful variables as potential regressors in forecasting. To reveal the most effective approach, three scenarios containing seven cases were constructed. In the first scenario, each league was evaluated separately. In the second scenario, the three possible combinations of league pairings were evaluated, while in the third scenario, all three leagues were evaluated together. The performance evaluations of the results suggest that one of the machine learning methods, Gradient Boosting, outperformed the other methods used. However, the Artificial Neural Network, deep Convolutional Neural Network, and Decision Trees also provided productive and competitive predictions for sports games. Based on the results, the predictions for the NBA and NFL leagues are more satisfactory than the predictions of the MLB, which may be caused by the structure of the MLB. The results of the sensitivity analysis indicate that the performance of the home team is the most influential factor for all three leagues.

Download Full-text

Prediction of Hanwoo Cattle Phenotypes from Genotypes Using Machine Learning Methods

Animals ◽

10.3390/ani11072066 ◽

2021 ◽

Vol 11 (7) ◽

pp. 2066

Author(s):

Swati Srivastava ◽

Bryan Irvine Lopez ◽

Himansu Kumar ◽

Myoungjin Jang ◽

Han-Ha Chai ◽

...

Keyword(s):

Machine Learning ◽

Support Vector ◽

Learning Methods ◽

Eye Muscle ◽

Important Species ◽

Machine Learning Methods ◽

Extreme Gradient Boosting ◽

Boosting Method ◽

Predictive Correlation ◽

Hanwoo Cattle

Hanwoo was originally raised for draft purposes, but the increase in local demand for red meat turned that purpose into full-scale meat-type cattle rearing; it is now considered one of the most economically important species and a vital food source for Koreans. The application of genomic selection in Hanwoo breeding programs in recent years was expected to lead to higher genetic progress. However, better statistical methods that can improve the genomic prediction accuracy are required. Hence, this study aimed to compare the predictive performance of three machine learning methods, namely, random forest (RF), extreme gradient boosting method (XGB), and support vector machine (SVM), when predicting the carcass weight (CWT), marbling score (MS), backfat thickness (BFT) and eye muscle area (EMA). Phenotypic and genotypic data (53,866 SNPs) from 7324 commercial Hanwoo cattle that were slaughtered at the age of around 30 months were used. The results showed that the boosting method XGB showed the highest predictive correlation for CWT and MS, followed by GBLUP, SVM, and RF. Meanwhile, the best predictive correlation for BFT and EMA was delivered by GBLUP, followed by SVM, RF, and XGB. Although XGB presented the highest predictive correlations for some traits, we did not find an advantage of XGB or any machine learning methods over GBLUP according to the mean squared error of prediction. Thus, we still recommend the use of GBLUP in the prediction of genomic breeding values for carcass traits in Hanwoo cattle.

Download Full-text

MODIS-FIRMS and ground-truthing based wildfire likelihood mapping of Sikkim Himalaya using machine learning algorithms.

10.21203/rs.3.rs-750123/v1 ◽

2021 ◽

Author(s):

Polash Banerjee

Keyword(s):

Machine Learning ◽

Machine Learning Algorithms ◽

Tree Cover ◽

Anthropogenic Factors ◽

Gradient Boosting ◽

Support Vector ◽

Learning Methods ◽

Sikkim Himalaya ◽

Environmental Features ◽

Machine Learning Methods

Abstract Wildfires in limited extent and intensity can be a boon for the forest ecosystem. However, recent episodes of wildfires of 2019 in Australia and Brazil are sad reminders of their heavy ecological and economical costs. Understanding the role of environmental factors in the likelihood of wildfires in a spatial context would be instrumental in mitigating it. In this study, 14 environmental features encompassing meteorological, topographical, ecological, in situ and anthropogenic factors have been considered for preparing the wildfire likelihood map of Sikkim Himalaya. A comparative study on the efficiency of machine learning methods like Generalized Linear Model (GLM), Support Vector Machine (SVM), Random Forest (RF) and Gradient Boosting Model (GBM) has been performed to identify the best performing algorithm in wildfire prediction. The study indicates that all the machine learning methods are good at predicting wildfires. However, RF has outperformed, followed by GBM in the prediction. Also, environmental features like average temperature, average wind speed, proximity to roadways and tree cover percentage are the most important determinants of wildfires in Sikkim Himalaya. This study can be considered as a decision support tool for preparedness, efficient resource allocation and sensitization of people towards mitigation of wildfires in Sikkim.

Download Full-text

Predictive Credit Risk Analytics Using Borrowers' Digital Footprint and Methods of Statistical Machine Learning

PROGRAMMNAYA INGENERIA ◽

10.17587/prin.12.358-372 ◽

2021 ◽

Vol 12 (7) ◽

pp. 358-372

Author(s):

E. V. Orlova ◽

Keyword(s):

Machine Learning ◽

Credit Risk ◽

Risk Profile ◽

Classification Model ◽

Gradient Boosting ◽

Suggested Approach ◽

Digital Footprint ◽

Credit Risks ◽

Stochastic Gradient Boosting ◽

Boosting Method

The article considers the problem of reducing the banks credit risks associated with the insolvency of borrowers — individuals using financial, socio-economic factors and additional data about borrowers digital footprint. A critical analysis of existing approaches, methods and models in this area has been carried out and a number of significant shortcomings identified that limit their application. There is no comprehensive approach to identifying a borrowers creditworthiness based on information, including data from social networks and search engines. The new methodological approach for assessing the borrowers risk profile based on the phased processing of quantitative and qualitative data and modeling using methods of statistical analysis and machine learning is proposed. Machine learning methods are supposed to solve clustering and classification problems. They allow to automatically determine the data structure and make decisions through flexible and local training on the data. The method of hierarchical clustering and the k-means method are used to identify similar social, anthropometric and financial indicators, as well as indicators characterizing the digital footprint of borrowers, and to determine the borrowers risk profile over group. The obtained homogeneous groups of borrowers with a unique risk profile are further used for detailed data analysis in the predictive classification model. The classification model is based on the stochastic gradient boosting method to predict the risk profile of a potencial borrower. The suggested approach for individuals creditworthiness assessing will reduce the banks credit risks, increase its stability and profitability. The implementation results are of practical importance. Comparative analysis of the effectiveness of the existing and the proposed methodology for assessing credit risk showed that the new methodology provides predictive analytics of heterogeneous information about a potential borrower and the accuracy of analytics is higher. The proposed techniques are the core for the decision support system for justification of individuals credit conditions, minimizing the aggregate credit risks.

Download Full-text

Real-Time V2V Communication With a Machine Learning-Based System for Detecting Drowsiness of Drivers

International Journal of Interdisciplinary Telecommunications and Networking ◽

10.4018/ijitn.2021100104 ◽

2021 ◽

Vol 13 (4) ◽

pp. 35-50

Author(s):

Ahmed Y. Awad ◽

Seshadri Mohan

Keyword(s):

Machine Learning ◽

Traffic Accidents ◽

Ad Hoc ◽

Economic Losses ◽

Drowsy Driving ◽

The Road ◽

V2v Communication ◽

Physical Injuries ◽

On The Road ◽

The U.S

This article applies machine learning to detect whether a driver is drowsy and alert the driver. The drowsiness of a driver can lead to accidents resulting in severe physical injuries, including deaths, and significant economic losses. Driver fatigue resulting from sleep deprivation causes major accidents on today's roads. In 2010, nearly 24 million vehicles were involved in traffic accidents in the U.S., which resulted in more than 33,000 deaths and over 3.9 million injuries, according to the U.S. NHTSA. A significant percentage of traffic accidents can be attributed to drowsy driving. It is therefore imperative that an efficient technique is designed and implemented to detect drowsiness as soon as the driver feels drowsy and to alert and wake up the driver and thereby preventing accidents. The authors apply machine learning to detect eye closures along with yawning of a driver to optimize the system. This paper also implements DSRC to connect vehicles and create an ad hoc vehicular network on the road. When the system detects that a driver is drowsy, drivers of other nearby vehicles are alerted.

Download Full-text

Short- and Medium-range Prediction of Relativistic Electron Flux in the Earth’s Outer Radiation Belt by Machine Learning Methods

Meteorologiya i Gidrologiya ◽

10.52002/0130-2906-2021-3-47-57 ◽

2021 ◽

Vol 3 ◽

pp. 47-57

Author(s):

I. N. Myagkova ◽

◽

V. R. Shirokii ◽

Yu. S. Shugai ◽

O. G. Barinov ◽

...

Keyword(s):

Machine Learning ◽

Radiation Belt ◽

Gradient Boosting ◽

Relativistic Electrons ◽

Learning Methods ◽

Outer Radiation Belt ◽

Machine Learning Methods ◽

The Earth ◽

Skill Scores ◽

Medium Range

The ways are studied to improve the quality of prediction of the time series of hourly mean fluxes and daily total fluxes (fluences) of relativistic electrons in the outer radiation belt of the Earth 1 to 24 hours ahead and 1 to 4 days ahead, respectively. The prediction uses an approximation approach based on various machine learning methods, namely, artificial neural networks (ANNs), decision tree (random forest), and gradient boosting. A comparison of the skill scores of short-range forecasts with the lead time of 1 to 24 hours showed that the best results were demonstrated by ANNs. For medium-range forecasting, the accuracy of prediction of the fluences of relativistic electrons in the Earth’s outer radiation belt three to four days ahead increases significantly when the predicted values of the solar wind velocity near the Earth obtained from the UV images of the Sun of the AIA (Atmospheric Imaging Assembly) instrument of the SDO (Solar Dynamics Observatory) are included to the list of the input parameters.

Download Full-text