Spatio-Temporal Segmented Traffic Flow Prediction with ANPRS Data Based on Improved XGBoost

Traffic prediction is highly significant for intelligent traffic systems and traffic management. eXtreme Gradient Boosting (XGBoost), a scalable tree boosting algorithm, is adopted and improved to predict traffic states at higher resolution by exploiting the origin-destination (OD) relationship of segment flow data between upstream and downstream sections of the highway. To achieve fine-grained prediction, a generalized extended-segment data acquisition mode is added: it incorporates information from the Automatic Number Plate Recognition System (ANPRS) at the exits and entrances of toll stations, and segment flows are obtained indirectly by mathematical OD calculation, without cameras. Abnormal data preprocessing and spatio-temporal relationship matching are conducted to ensure the effectiveness of prediction. Pearson analysis of spatial correlation is performed to find the relevance between adjacent roads, and the relative importance of input modes is verified by comparing spatial lag input with ordinary input. Two improved models, independent XGBoost (XGBoost-I) with parameters adjusted individually for each section and static XGBoost (XGBoost-S) with parameters adjusted overall, are built and combined with temporally relevant intervals and spatially staggered sectional lag. The early_stopping_rounds adjustment mechanism (EAM) is introduced to improve the effect of the XGBoost model. The prediction accuracy of XGBoost-I-lag is generally higher than that of XGBoost-I, XGBoost-S-lag, XGBoost-S, and other baseline methods for short-term and long-term multistep-ahead prediction. Additionally, XGBoost-I-lag performs well in nonrecurrent conditions and missing-data cases, with acceptable running time. The experiment results indicate that the proposed framework is convincing, satisfactory, and computationally reasonable.
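The Pearson analysis and the spatial lag input described in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the toy flow counts, and the choice of a one-interval lag are all assumptions made for illustration, and the resulting rows would still need to be passed to a regressor such as XGBoost.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    std_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    std_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (std_x * std_y)

def spatial_lag_rows(upstream, target, lag, window):
    """Build (features, label) rows for one target segment.

    Features: the `window` most recent flows of the target segment, plus the
    upstream flow observed `lag` intervals earlier (the spatial lag input).
    """
    rows = []
    for t in range(max(window, lag), len(target)):
        features = target[t - window:t] + [upstream[t - lag]]
        rows.append((features, target[t]))
    return rows

# Hypothetical flow counts: the downstream series repeats the upstream one
# with a delay of one interval.
upstream = [10, 12, 15, 20, 18, 14, 11, 9]
downstream = [8, 10, 12, 15, 20, 18, 14, 11]

same_time = pearson(upstream, downstream)
lagged = pearson(upstream[:-1], downstream[1:])  # upstream shifted by one interval
rows = spatial_lag_rows(upstream, downstream, lag=1, window=2)
```

In this toy data the lagged correlation is higher than the same-interval correlation, which is the kind of evidence that would motivate a staggered lag between sections.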


An amalgamation of YOLOv4 and XGBoost for next-gen smart traffic management system

PeerJ Computer Science ◽

10.7717/peerj-cs.586 ◽

2021 ◽

Vol 7 ◽

pp. e586

Author(s):

Pritul Dave ◽

Arjun Chandarana ◽

Parth Goel ◽

Amit Ganatra

Keyword(s):

Waiting Time ◽

Traffic Management ◽

Green Light ◽

Gradient Boosting ◽

Optimal Time ◽

Traffic Light ◽

Ensemble Technique ◽

Extreme Gradient Boosting ◽

On The Road ◽

Smart Traffic

Traffic congestion and the rise in the number of vehicles have become a serious issue that is receiving attention worldwide. One of the issues with traffic management is that the traffic light's timer is not dynamic. As a result, one has to wait longer even if there are few or no vehicles on a roadway, causing unnecessary waiting time, fuel consumption, and pollution. Prior work on smart traffic management systems has used the Internet of Things, Time Series Forecasting, and Digital Image Processing. Computer Vision-based smart traffic management is an emerging area of research. Therefore, a real-time traffic light optimization algorithm is presented that uses Machine Learning and Deep Learning techniques to predict the optimal time required by the vehicles to clear the lane. This article concentrates on a two-step approach. The first step is to obtain the count of vehicles in each vehicle class. For this, the You Only Look Once version 4 (YOLOv4) object detection technique is employed. In the second step, an ensemble technique named eXtreme Gradient Boosting (XGBoost) is implemented to predict the optimal duration of the green light window. Furthermore, different versions of YOLO and different prediction algorithms are compared with the proposed approach. The experimental analysis indicates that YOLOv4 with the XGBoost algorithm produces the most precise outcomes with a balance of accuracy and inference time. The proposed approach reduces waiting time by an average of 32.3% under usual traffic on the road.


Machine learning as a successful approach for predicting complex spatio–temporal patterns in animal species abundance

Animal Biodiversity and Conservation ◽

10.32800/abc.2021.44.0289 ◽

2021 ◽

pp. 289-301

Author(s):

B. Martín ◽

J. González–Arias ◽

J. A. Vicente–Vírseda

Keyword(s):

Machine Learning ◽

Random Forest ◽

Animal Species ◽

Temporal Patterns ◽

Additive Models ◽

Gradient Boosting ◽

Support Vector ◽

Stochastic Gradient Boosting ◽

Extreme Gradient Boosting ◽

Spatio Temporal

Our aim was to identify an optimal analytical approach for accurately predicting complex spatio–temporal patterns in animal species distribution. We compared the performance of eight modelling techniques (generalized additive models, regression trees, bagged CART, k–nearest neighbors, stochastic gradient boosting, support vector machines, neural network, and random forest, an enhanced form of bootstrap). We also performed extreme gradient boosting, an enhanced form of gradient boosting, to predict spatial patterns in abundance of migrating Balearic shearwaters based on data gathered within eBird. Derived from open–source datasets, proxies of frontal systems and ocean productivity domains that have been previously used to characterize the oceanographic habitats of seabirds were quantified, and then used as predictors in the models. The random forest model showed the best performance according to the parameters assessed (RMSE value and R2). The correlation between observed and predicted abundance with this model was also considerably high. This study shows that the combination of machine learning techniques and massive data provided by open data sources is a useful approach for identifying the long–term spatial–temporal distribution of species at regional spatial scales.
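The two assessment parameters named in this abstract, RMSE and R2, can be computed as in the sketch below. It assumes that R2 denotes the coefficient of determination (the abstract does not say whether it is that or a squared correlation), and the observed and predicted values are invented for illustration.

```python
from math import sqrt

def rmse(observed, predicted):
    """Root mean squared error between observed and predicted values."""
    squared_errors = [(o - p) ** 2 for o, p in zip(observed, predicted)]
    return sqrt(sum(squared_errors) / len(observed))

def r_squared(observed, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot (an assumed definition)."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1 - ss_res / ss_tot

# Hypothetical abundance counts, not data from the study.
observed = [3, 5, 7, 9]
predicted = [2, 5, 8, 9]

print(round(rmse(observed, predicted), 4))       # 0.7071
print(round(r_squared(observed, predicted), 4))  # 0.9
```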


A Spatio-temporal analysis of Baboon Damage using Sentinel-2 imagery and Extreme Gradient Boosting

Geocarto International ◽

10.1080/10106049.2020.1837259 ◽

2020 ◽

pp. 1-14

Author(s):

Regardt Ferreira ◽

Kabir Peerbhay ◽

Josua Louw ◽

Ilaria Germizhuizen ◽

Andrew Morris ◽

...

Keyword(s):

Temporal Analysis ◽

Gradient Boosting ◽

Extreme Gradient Boosting ◽

Spatio Temporal ◽

Sentinel 2


Higher Order Statistics Based Blind Steganalysis using Deep Learning

Journal of Ravishankar University (Part-B) ◽

10.52228/jrub.2021-34-1-3 ◽

2021 ◽

Vol 34 (1) ◽

pp. 19-28

Author(s):

S. Bera ◽

K. Thakur ◽

P. Vyas ◽

M. Thakur ◽

A. Shrivastava

Keyword(s):

Transition Probability ◽

Principal Component ◽

Transition Probability Matrix ◽

Higher Order ◽

Gradient Boosting ◽

Detection Accuracy ◽

Stego Image ◽

Markov Transition ◽

Extreme Gradient Boosting ◽

Relationship Of

Universal steganalysis of grey level JPEG images is addressed by modelling the neighbourhood relationship of the image coefficients using a higher order statistical method based on a two-step Markov Transition Probability Matrix (TPM). The implementation of the TPM together with the neighbouring pixel relationship provides better and comparable detection results. The detection accuracy is evaluated on the stego image database using eXtreme Gradient Boosting (XGBoost) with Principal Component Analysis (PCA) on the nsF5 and J-UNIWARD hiding techniques. Execution time is also compared for all the classifiers. The images are taken from the Green spun library and the Google website.
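As a rough illustration of a Markov transition probability matrix over neighbouring coefficients, the sketch below counts transitions between horizontally adjacent values and normalises each row. Several choices here are assumptions rather than details from the paper: the clipping threshold, the horizontal direction, and the reading of "two-step" as a neighbour offset of 2 (set through the hypothetical `step` parameter).

```python
def clip(value, threshold):
    """Limit a coefficient to the range [-threshold, threshold]."""
    return max(-threshold, min(threshold, value))

def transition_matrix(coeffs, threshold=2, step=1):
    """Estimate P(next = b | current = a) for horizontal neighbours.

    `coeffs` is a 2D list of integers. Values are clipped to
    [-threshold, threshold], so the matrix has (2*threshold + 1)^2 entries.
    `step` is the horizontal offset between the two coefficients of a pair.
    """
    size = 2 * threshold + 1
    counts = [[0] * size for _ in range(size)]
    for row in coeffs:
        for j in range(len(row) - step):
            a = clip(row[j], threshold) + threshold
            b = clip(row[j + step], threshold) + threshold
            counts[a][b] += 1
    matrix = []
    for count_row in counts:
        total = sum(count_row)
        # Rows with no observations stay all-zero instead of dividing by zero.
        matrix.append([c / total if total else 0.0 for c in count_row])
    return matrix

# Toy coefficient block, not real JPEG data.
block = [[0, 1, 0, 1],
         [1, 0, 1, 0]]
one_step = transition_matrix(block, threshold=1, step=1)
two_step = transition_matrix(block, threshold=1, step=2)
```

The flattened matrix entries would then serve as the feature vector, which is where a dimensionality reduction such as PCA and a classifier such as XGBoost would come in.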


Handwritten Gurmukhi Digit Recognition System for Small Datasets

Traitement du signal ◽

10.18280/ts.370416 ◽

2020 ◽

Vol 37 (4) ◽

pp. 661-669

Author(s):

Gurpartap Singh ◽

Sunil Agrawal ◽

Balwinder Singh Sohi

Keyword(s):

Recognition Accuracy ◽

Recognition System ◽

Gradient Boosting ◽

Support Vector ◽

Discrete Wavelet ◽

Testing Time ◽

Training Time ◽

Digit Recognition ◽

Extreme Gradient Boosting ◽

The Impact

In the present study, a method to increase the recognition accuracy of Gurmukhi (Indian Regional Script) Handwritten Digits has been proposed. The proposed methodology uses a DCNN (Deep Convolutional Neural Network) with a cascaded XGBoost (Extreme Gradient Boosting) algorithm. Also, a comprehensive analysis has been done to understand the impact of the kernel size of the DCNN on recognition accuracy. The reason for using a DCNN is its impressive performance in terms of recognition accuracy of handwritten digits, but in order to achieve good recognition accuracy, a DCNN requires a huge amount of data and also significant training/testing time. In order to increase the accuracy of the DCNN for a small dataset, more images have been generated by applying a shear transformation (a transformation that preserves parallelism but not lengths and angles) to the original images. To address the issue of large training time, only two hidden layers have been used, along with selective cascading of XGBoost among the misclassified digits. Also, the issue of overfitting is discussed in detail and has been reduced to a great extent. Finally, the results are compared with the performance of some recent techniques like SVM (Support Vector Machine), Random Forest, and XGBoost classifiers on DCT (Discrete Cosine Transform) and DWT (Discrete Wavelet Transform) features obtained on the same dataset. It is found that the proposed methodology can outperform other techniques in terms of overall rate of recognition.
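The shear-based augmentation mentioned in this abstract can be sketched on a tiny binary image. This is only an illustration under stated assumptions: the helper name `shear_image`, the horizontal shear direction, nearest-pixel rounding, and the widened output canvas are choices made here, not details taken from the paper.

```python
def shear_image(image, k):
    """Apply a horizontal shear x' = x + k*y to a 2D list of pixel values.

    Each row y is shifted sideways by round(k * y) pixels. The output canvas
    is widened so that no pixel falls outside it; empty cells are set to 0.
    """
    height, width = len(image), len(image[0])
    max_shift = int(round(abs(k) * (height - 1)))
    output = [[0] * (width + max_shift) for _ in range(height)]
    for y in range(height):
        shift = int(round(k * y))
        if k < 0:
            shift += max_shift  # keep column indices non-negative
        for x in range(width):
            output[y][x + shift] = image[y][x]
    return output

# A 3x2 block of "ink" pixels becomes a slanted stroke.
stroke = [[1, 1],
          [1, 1],
          [1, 1]]
for row in shear_image(stroke, 1):
    print(row)
# [1, 1, 0, 0]
# [0, 1, 1, 0]
# [0, 0, 1, 1]
```

The vertical edges of the block remain parallel to each other after the shear, while their length and their angle to the horizontal change, which matches the property stated in the abstract.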


Application of spatio-temporal data in site-specific maize yield prediction with machine learning methods

Precision Agriculture ◽

10.1007/s11119-021-09833-8 ◽

2021 ◽

Author(s):

A. Nyéki ◽

C. Kerepesi ◽

B. Daróczy ◽

A. Benczúr ◽

G. Milics ◽

...

Keyword(s):

Neural Networks ◽

Maize Yield ◽

New Method ◽

Gradient Boosting ◽

Yield Prediction ◽

Temporal Data ◽

Site Specific ◽

Extreme Gradient Boosting ◽

Maize Yields ◽

Spatio Temporal

In order to meet the requirements of sustainability and to determine yield drivers and limiting factors, it is now more likely that traditional yield modelling will be carried out using artificial intelligence (AI). The aim of this study was to predict maize yields using AI that uses spatio-temporal training data. The paper has advanced a new method of maize yield prediction, which is based on spatio-temporal data mining. To find the best solution, various models were used: counter-propagation artificial neural networks (CP-ANNs), XY-fused networks (XY-Fs), supervised Kohonen networks (SKNs), neural networks with rectified linear unit (ReLU) activations, extreme gradient boosting (XGBoost), and support-vector machines (SVM), applied to different subsets of the independent variables in five vegetation periods. Input variables for modelling included soil parameters (pH, P2O5, K2O, Zn, clay content, ECa, draught force, cone index), micro-relief averages, and meteorological parameters for the 63 treatment units in a 15.3 ha research field. The best performing method (XGBoost) reached 92.1% and 95.3% accuracy on the training and the test sets. Additionally, a novel method was introduced to treat individual units in a lattice system. The lattice-based smoothing yielded a further increase in area under the curve (AUC) to 97.5% over the individual predictions of the XGBoost model. The models were developed using 48 different subsets of variables to determine which variables consistently contributed to prediction accuracy. By comparing the resulting models, it was shown that the best regression model was Extreme Gradient Boosting Trees, with 92.1% accuracy (on the training set). In addition, the method calculates the influence of the spatial distribution of site-specific soil fertility on maize grain yields. This paper provides a new method of spatio-temporal data analysis, taking the most important influencing factors on maize yields into account.


Older Pedestrian Traffic Crashes Severity Analysis Based on an Emerging Machine Learning XGBoost

Sustainability ◽

10.3390/su13020926 ◽

2021 ◽

Vol 13 (2) ◽

pp. 926

Author(s):

Manze Guo ◽

Zhenzhou Yuan ◽

Bruce Janson ◽

Yongxin Peng ◽

Yang Yang ◽

...

Keyword(s):

Traffic Management ◽

Significant Risk ◽

Logistic Models ◽

Classification Problem ◽

Gradient Boosting ◽

Traffic Crashes ◽

Crash Data ◽

Extreme Gradient Boosting ◽

Pedestrian Traffic ◽

Pedestrian Crashes

Older pedestrians are vulnerable on the streets and at significant risk of injury or death when involved in crashes. Pedestrians' safety is critical for roadway agencies to consider and improve, especially for older pedestrians aged over 65. To better protect the older pedestrian group, the factors that contribute to crashes involving older pedestrians need to be analyzed in depth. Traditional modeling approaches such as Logistic models for data analysis may lead to modeling distortions due to the independence assumptions. In this study, Extreme Gradient Boosting (XGBoost) is used to model the classification problem of three different levels of severity of older pedestrian traffic crashes from crash data in Colorado, US. Further, Shapley Additive explanations (SHAP) are implemented to interpret the XGBoost model result and analyze each feature's importance related to the levels of older pedestrian crashes. The interpretation results show that driver characteristics, older pedestrian characteristics, and vehicle movement are the most important factors influencing the probability of the three different severity levels. These results identify the factors correlated with each severity level, which can inform the department of traffic management and the department of road infrastructure on how to protect older pedestrians by controlling or managing some of those significant features.


Predicting Undesired Treatment Outcome in Mental Healthcare: Machine Learning Study (Preprint)

10.2196/preprints.17235 ◽

2019 ◽

Author(s):

Kasper Van Mens ◽

Joran Lokkerbol ◽

Richard Janssen ◽

Robert de Lange ◽

Bea Tiemens

Keyword(s):

Machine Learning ◽

Treatment Outcome ◽

Mental Health Treatment ◽

Mental Healthcare ◽

Machine Learning Algorithms ◽

Gradient Boosting ◽

Trade Off ◽

Trade Offs ◽

Outcome Monitoring ◽

Extreme Gradient Boosting

BACKGROUND: It remains a challenge to predict which treatment will work for which patient in mental healthcare. OBJECTIVE: In this study we compare machine learning algorithms to predict during treatment which patients will not benefit from brief mental health treatment and present trade-offs that must be considered before an algorithm can be used in clinical practice. METHODS: Using an anonymized dataset containing routine outcome monitoring data from a mental healthcare organization in the Netherlands (n = 2,655), we applied three machine learning algorithms to predict treatment outcome. The algorithms were internally validated with cross-validation on a training sample (n = 1,860) and externally validated on an unseen test sample (n = 795). RESULTS: The performance of the three algorithms did not significantly differ on the test set. With a default classification cut-off at 0.5 predicted probability, the extreme gradient boosting algorithm showed the highest positive predictive value (ppv) of 0.71 (0.61 – 0.77) with a sensitivity of 0.35 (0.29 – 0.41) and area under the curve of 0.78. A trade-off can be made between ppv and sensitivity by choosing different cut-off probabilities. With a cut-off at 0.63, the ppv increased to 0.87 and the sensitivity dropped to 0.17. With a cut-off at 0.38, the ppv decreased to 0.61 and the sensitivity increased to 0.57. CONCLUSIONS: Machine learning can be used to predict treatment outcomes based on routine monitoring data. This allows practitioners to choose their own trade-off between being selective and more certain versus inclusive and less certain.
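The trade-off between ppv and sensitivity at different cut-off probabilities can be made concrete with a small sketch. The predicted probabilities and labels below are invented for illustration and do not come from the study; the function name is likewise hypothetical.

```python
def ppv_and_sensitivity(probabilities, labels, cutoff):
    """Return (ppv, sensitivity) when predicting positive for p >= cutoff.

    ppv         = TP / (TP + FP): how often a positive prediction is right.
    sensitivity = TP / (TP + FN): how many true positives are found.
    """
    pairs = list(zip(probabilities, labels))
    tp = sum(1 for p, y in pairs if p >= cutoff and y == 1)
    fp = sum(1 for p, y in pairs if p >= cutoff and y == 0)
    fn = sum(1 for p, y in pairs if p < cutoff and y == 1)
    ppv = tp / (tp + fp) if tp + fp else float("nan")
    sensitivity = tp / (tp + fn) if tp + fn else float("nan")
    return ppv, sensitivity

# Hypothetical model outputs; label 1 marks an undesired treatment outcome.
probabilities = [0.9, 0.8, 0.7, 0.6, 0.55, 0.4, 0.3, 0.2]
labels = [1, 1, 0, 1, 0, 1, 0, 0]

for cutoff in (0.35, 0.5, 0.75):
    ppv, sensitivity = ppv_and_sensitivity(probabilities, labels, cutoff)
    print(cutoff, round(ppv, 2), round(sensitivity, 2))
# 0.35 0.67 1.0
# 0.5 0.6 0.75
# 0.75 1.0 0.5
```

On this toy data the highest cut-off gives the most certain positive predictions but finds only half of the true positives, which mirrors the selective-versus-inclusive choice described in the conclusions.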


XGBoost and Network Analysis for Prediction of Proteins Affecting Insulin based on Protein Protein Interactions

Kinetik Game Technology Information System Computer Network Computing Electronics and Control ◽

10.22219/kinetik.v5i4.1076 ◽

2020 ◽

pp. 253-262

Author(s):

Mohammad Hamim Zajuli Al Faroby ◽

Mohammad Isa Irawan ◽

Ni Nyoman Tri Puspaningsih

Keyword(s):

Protein Interactions ◽

Interaction Analysis ◽

Synthesis Process ◽

Gradient Boosting ◽

Protein Protein Interactions ◽

Central Function ◽

Extreme Gradient Boosting ◽

Main Protein ◽

The Right ◽

Roc Score

Protein-protein interaction (PPI) analysis can be used to identify proteins that have a supporting function on the main protein, especially in the synthesis process. Insulin is synthesized by proteins that have the same molecular function covering different but mutually supportive roles. To identify this function, the translation of Gene Ontology (GO) gives certain characteristics to each protein. This study aims to predict proteins that interact with insulin using the centrality method as a feature extractor and extreme gradient boosting as a classification algorithm. Characterization using the centrality method produces features that describe the central function of each protein. Classification results are measured using precision, recall and ROC scores. Optimizing the model by finding the right parameters produces an accuracy of and a ROC score of . The prediction model produced by XGBoost has capabilities above the average of other machine learning methods.
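Using centrality as a feature extractor can be illustrated with a minimal sketch. The abstract does not say which centrality measures were used, so degree centrality is an assumption here, and the node names and edges are placeholders rather than real proteins or interactions.

```python
def degree_centrality(edges):
    """Degree centrality for each node of an undirected graph.

    A node's score is its number of distinct neighbours divided by (n - 1),
    where n is the number of nodes that appear in `edges`.
    """
    neighbours = {}
    for a, b in edges:
        neighbours.setdefault(a, set()).add(b)
        neighbours.setdefault(b, set()).add(a)
    n = len(neighbours)
    if n < 2:
        return {node: 0.0 for node in neighbours}
    return {node: len(adjacent) / (n - 1) for node, adjacent in neighbours.items()}

# Placeholder interaction network: "A" stands in for a hub protein.
edges = [("A", "B"), ("A", "C"), ("A", "D"), ("B", "C")]
scores = degree_centrality(edges)
# scores["A"] is 1.0 because A interacts with every other node.
```

Each protein's score, possibly alongside other centrality measures, would form one row of the feature table passed to the classifier.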


Evaluation of Three Different Machine Learning Methods for Object-Based Artificial Terrace Mapping—A Case Study of the Loess Plateau, China

Remote Sensing ◽

10.3390/rs13051021 ◽

2021 ◽

Vol 13 (5) ◽

pp. 1021

Author(s):

Hu Ding ◽

Jiaming Na ◽

Shangjing Jiang ◽

Jie Zhu ◽

Kai Liu ◽

...

Keyword(s):

Machine Learning ◽

Random Forest ◽

Loess Plateau ◽

Water Conservation ◽

Nearest Neighbor ◽

Gradient Boosting ◽

K Nearest Neighbor ◽

The Loess Plateau ◽

Object Based ◽

Extreme Gradient Boosting

Artificial terraces are of great importance for agricultural production and soil and water conservation. Automatic high-accuracy mapping of artificial terraces is the basis of monitoring and related studies. Previous research achieved artificial terrace mapping based on high-resolution digital elevation models (DEMs) or imagery. Because contextual information is important for terrace mapping, object-based image analysis (OBIA) combined with machine learning (ML) technologies is widely used. However, the selection of an appropriate classifier is of great importance for the terrace mapping task. In this study, the performance of an integrated framework using OBIA and ML for terrace mapping was tested. A catchment, Zhifanggou, in the Loess Plateau, China, was used as the study area. First, optimized image segmentation was conducted. Then, features from the DEMs and imagery were extracted, and the correlations between the features were analyzed and ranked for classification. Finally, three different commonly used ML classifiers, namely, extreme gradient boosting (XGBoost), random forest (RF), and k-nearest neighbor (KNN), were used for terrace mapping. The comparison with the ground truth, as delineated by field survey, indicated that random forest performed best, with a 95.60% overall accuracy (followed by 94.16% and 92.33% for XGBoost and KNN, respectively). The influence of class imbalance and feature selection is discussed. This work provides a credible framework for mapping artificial terraces.
