Intersection Traffic Prediction Using Decision Tree Models

Traffic prediction is a critical task for intelligent transportation systems (ITS). Prediction at intersections is challenging because it involves various participants, such as vehicles, cyclists, and pedestrians. In this paper, we propose a novel approach for accurate intersection traffic prediction that introduces data sources beyond road traffic volume data into the prediction model. In particular, we take advantage of data collected from reports of road accidents and roadworks near the intersections. In addition, we investigate two types of learning schemes, namely batch learning and online learning. Three popular ensemble decision tree models are used in the batch learning scheme: Gradient Boosting Regression Trees (GBRT), Random Forest (RF), and Extreme Gradient Boosting Trees (XGBoost). The Fast Incremental Model Trees with Drift Detection (FIMT-DD) model is adopted for the online learning scheme. The proposed approach is evaluated using public data sets released by the Victorian Government of Australia. The results indicate that the accuracy of intersection traffic prediction can be improved by incorporating information on nearby accidents and roadworks.

Contrasting determinants for the introduction and establishment success of exotic birds in Taiwan using decision trees models

PeerJ ◽

10.7717/peerj.3092 ◽

2017 ◽

Vol 5 ◽

pp. e3092 ◽

Cited By ~ 1

Author(s):

Shih-Hsiung Liang ◽

Bruno Andreas Walther ◽

Bao-Sen Shieh

Keyword(s):

Decision Tree ◽

Species Traits ◽

Gradient Boosting ◽

Small Data ◽

Invasion Process ◽

Data Set ◽

Establishment Success ◽

Exotic Birds ◽

Tree Models ◽

Nominal Variables

Background. Biological invasions have become a major threat to biodiversity, and identifying determinants underlying success at different stages of the invasion process is essential for both prevention management and testing ecological theories. When investigating variables associated with different stages of the invasion process in a local region such as Taiwan, traditional parametric analyses face potential problems: too many variables of different data types (nominal, ordinal, and interval) and a relatively small data set with too many missing values. Methods. We therefore used five decision tree models instead and compared their performance. Our dataset contains 283 exotic bird species that were transported to Taiwan; of these 283 species, 95 escaped to the field successfully (introduction success); of these 95 introduced species, 36 reproduced successfully in the field in Taiwan (establishment success). For each species, we collected 22 variables associated with human selectivity and species traits that may determine success during the introduction stage and the establishment stage. For each decision tree model, we performed three variable treatments: (I) including all 22 variables, (II) excluding nominal variables, and (III) excluding nominal variables and replacing ordinal values with binary ones. Five performance measures were used to compare models: area under the receiver operating characteristic curve (AUROC), specificity, precision, recall, and accuracy. Results. The gradient boosting models performed best overall among the five decision tree models for both introduction and establishment success and across variable treatments. The most important variables for predicting introduction success were the bird family, the number of invaded countries, and variables associated with environmental adaptation, whereas the most important variables for predicting establishment success were the number of invaded countries and variables associated with reproduction. Discussion. Our final optimal models achieved relatively high performance values, and we discuss differences in performance with regard to sample size and variable treatments. Our results showed that, for the establishment model and the introduction model, the number of invaded countries was the most important or second most important determinant, respectively. Therefore, we suggest that future success for introduction and establishment of exotic birds may be gauged by simply looking at previous success in invading other countries. Finally, we found that species traits related to reproduction were more important in establishment models than in introduction models; importantly, these determinants were not averaged values but either minimum or maximum values of species traits. Therefore, we suggest that, in addition to averaged values, reproductive potential represented by minimum and maximum values of species traits should be considered in invasion studies.
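
The five performance measures named in this abstract can be illustrated with a minimal pure-Python sketch. The labels, scores, and the 0.5 threshold below are hypothetical toy values, not data from the study, and the function names are chosen here for illustration only.

```python
def confusion_counts(y_true, y_pred):
    """Count true/false positives and negatives for binary labels (1 = success)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, tn, fp, fn


def threshold_metrics(y_true, y_pred):
    """Specificity, precision, recall, and accuracy from hard predictions.

    Assumes every denominator is non-zero, which holds for the toy data below.
    """
    tp, tn, fp, fn = confusion_counts(y_true, y_pred)
    return {
        "specificity": tn / (tn + fp),
        "precision": tp / (tp + fp),
        "recall": tp / (tp + fn),
        "accuracy": (tp + tn) / len(y_true),
    }


def auroc(y_true, scores):
    """AUROC as the probability that a positive outscores a negative (ties count 0.5)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical toy example: 3 successful and 5 unsuccessful species.
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.7, 0.3, 0.2, 0.1, 0.05]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # assumed 0.5 cut-off

metrics = threshold_metrics(y_true, y_pred)
metrics["auroc"] = auroc(y_true, scores)
print(metrics)
```

AUROC is computed from the raw scores, while the other four measures depend on the chosen threshold, which is one reason studies on small, imbalanced data sets often report several measures side by side.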

Multi-Class Taxonomy of Well Integrity Anomalies Applying Inductive Learning Algorithms: Analytical Approach for Artificial-Lift Wells

10.2118/206129-ms ◽

2021 ◽

Author(s):

Mostafa Sa'eed Yakoot ◽

Adel Mohamed Salem Ragab ◽

Omar Mahmoud

Keyword(s):

Decision Tree ◽

Confusion Matrix ◽

Learning Algorithms ◽

Oil And Gas Industry ◽

Classification Model ◽

Gradient Boosting ◽

Support Vector ◽

Risk Category ◽

Well Integrity ◽

Extreme Gradient Boosting

Abstract. Well integrity has become a crucial field that receives increased focus and is published on intensively in industry research. It is important to maintain the integrity of the individual well to ensure that wells operate as expected for their designated life (or longer) with all risks kept as low as reasonably practicable, or as specified. Machine learning (ML) and artificial intelligence (AI) models are now used intensively in the oil and gas industry. The ML concept is based on powerful algorithms and a robust database. Developing an efficient classification model for well integrity (WI) anomalies is now feasible because the database holds an enormous number of well failures, well barrier integrity tests, and analyses. Circa 9000 data points were collected from WI tests performed on 800 wells in the Gulf of Suez, Egypt, over almost 10 years. Moreover, those data have been quality-controlled and quality-assured by experienced engineers. The data contain different forms of WI failures. The contributing parameter set includes a total of 23 barrier elements. Data were structured and fed into 11 different ML algorithms to build an automated, systematic tool for calculating the imposed risk category of any well. A comparison of the deployed models was performed to identify the best predictive model that can be relied on. The 11 models include both supervised and ensemble learning algorithms such as random forest, support vector machine (SVM), decision tree, and scalable boosting techniques. Out of the 11 models, the results showed that extreme gradient boosting (XGB), categorical boosting (CatBoost), and decision tree are the most reliable algorithms. Moreover, novel evaluation metrics for the confusion matrix of each model have been introduced to overcome the problem of existing metrics, which do not consider domain knowledge during model evaluation. The resulting model will help to utilize company resources efficiently and dedicate personnel efforts to high-risk wells. The result is progressive improvement in business, safety, environmental, and operational performance. This paper would be a milestone in the design and creation of the Well Integrity Database Management Program through the combination of integrity and ML.

Predicting Parkinson's disease using gradient boosting decision tree models with electroencephalography signals

Parkinsonism & Related Disorders ◽

10.1016/j.parkreldis.2022.01.011 ◽

2022 ◽

Author(s):

Seung-Bo Lee ◽

Yong-Jeong Kim ◽

Sungeun Hwang ◽

Hyoshin Son ◽

Sang Kun Lee ◽

...

Keyword(s):

Parkinson's Disease ◽

Decision Tree ◽

Gradient Boosting ◽

Tree Models

Identification of Potential Type II Diabetes in a Large-Scale Chinese Population Using a Systematic Machine Learning Framework

Journal of Diabetes Research ◽

10.1155/2020/6873891 ◽

2020 ◽

Vol 2020 ◽

pp. 1-12

Author(s):

Mingyue Xue ◽

Yinxia Su ◽

Chen Li ◽

Shuxia Wang ◽

Hua Yao

Keyword(s):

Machine Learning ◽

Risk Factors ◽

Decision Tree ◽

Type II Diabetes ◽

Large Scale ◽

Systolic Pressure ◽

Gradient Boosting ◽

Significant Feature ◽

Type II ◽

Extreme Gradient Boosting

Background. An estimated 425 million people globally have diabetes, accounting for 12% of the world’s health expenditures, and the number continues to grow, placing a huge burden on the healthcare system, especially in remote, underserved areas. Methods. A total of 584,168 adult subjects who participated in the national physical examination were enrolled in this study. The risk factors for type II diabetes mellitus (T2DM) were identified by p values and odds ratios, using logistic regression (LR) based on variables from physical measurements and a questionnaire. Using the risk factors selected by LR, we applied a decision tree, a random forest, AdaBoost with a decision tree (AdaBoost), and an extreme gradient boosting decision tree (XGBoost) to identify individuals with T2DM, compared the performance of the four machine learning classifiers, and used the best-performing classifier to output variable importance scores for T2DM. Results. The results indicated that XGBoost had the best performance (accuracy=0.906, precision=0.910, recall=0.902, F1=0.906, and AUC=0.968). The variable importance scores in XGBoost showed that BMI was the most significant feature, followed by age, waist circumference, systolic pressure, ethnicity, smoking amount, fatty liver, hypertension, physical activity, drinking status, dietary ratio (meat to vegetables), drink amount, smoking status, and diet habit (oil loving). Conclusions. We proposed a classifier based on LR-XGBoost that uses fourteen easily obtained, noninvasive patient variables as predictors to identify potential cases of T2DM. The classifier can accurately screen for the risk of diabetes in the early phase, and the variable importance scores offer clues for preventing the occurrence of diabetes.
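
As a quick consistency check on the reported figures, the F1 score is the harmonic mean of precision and recall, so it can be recomputed from the two values given in the Results. The short sketch below does only that arithmetic; the function name is chosen here for illustration.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


# Precision and recall as reported for XGBoost in the abstract.
reported_precision = 0.910
reported_recall = 0.902

f1 = f1_score(reported_precision, reported_recall)
print(round(f1, 3))  # 0.906, matching the reported F1
```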

Predicting Injury Severity of Road Traffic Accidents Using a Hybrid Extreme Gradient Boosting and Deep Neural Network Approach

Laser Scanning Systems in Highway and Safety Assessment - Advances in Science, Technology & Innovation ◽

10.1007/978-3-030-10374-3_10 ◽

2019 ◽

pp. 119-127 ◽

Cited By ~ 1

Author(s):

Biswajeet Pradhan ◽

Maher Ibrahim Sameen

Keyword(s):

Neural Network ◽

Traffic Accidents ◽

Deep Neural Network ◽

Injury Severity ◽

Road Traffic ◽

Gradient Boosting ◽

Road Traffic Accidents ◽

Network Approach ◽

Neural Network Approach ◽

Extreme Gradient Boosting

MRI-Based Machine Learning in Differentiation Between Benign and Malignant Breast Lesions

Frontiers in Oncology ◽

10.3389/fonc.2021.552634 ◽

2021 ◽

Vol 11 ◽

Author(s):

Yanjie Zhao ◽

Rong Chen ◽

Ting Zhang ◽

Chaoyue Chen ◽

Muhetaer Muhelisa ◽

...

Keyword(s):

Machine Learning ◽

Random Forest ◽

Decision Tree ◽

Texture Analysis ◽

Training Group ◽

Gradient Boosting ◽

Breast Lesions ◽

Extreme Gradient Boosting ◽

Malignant Breast ◽

Benign Breast Lesions

Background. Differential diagnosis between benign and malignant breast lesions is of crucial importance for follow-up treatment. Recent developments in texture analysis and machine learning may lead to a new solution to this problem. Method. This study enrolled a total of 265 patients (benign breast lesions:malignant breast lesions = 71:194) who were diagnosed in our hospital and received magnetic resonance imaging between January 2014 and August 2017. Patients were randomly divided into a training group and a validation group (4:1), and two radiologists extracted texture features from the contrast-enhanced T1-weighted images. We applied five feature selection methods, namely distance correlation, gradient boosting decision tree (GBDT), least absolute shrinkage and selection operator (LASSO), random forest (RF), and eXtreme gradient boosting (XGBoost), and built five independent classification models based on the linear discriminant analysis (LDA) algorithm. Results. All five models showed promising results in discriminating malignant from benign breast lesions, and the areas under the curve (AUCs) of the receiver operating characteristic (ROC) were all above 0.830 in both the training and validation groups. The model with the best discriminating ability was the combination of LDA + GBDT. Its sensitivity, specificity, AUC, and accuracy in the training group were 0.814, 0.883, 0.922, and 0.868, respectively; LDA + RF also showed promising results, with an AUC of 0.906 in the training group. Conclusion. The evidence of this study, while preliminary, suggests that a combination of MRI texture analysis and the LDA algorithm could discriminate benign from malignant breast lesions. Further multicenter research in this field would be of great help in validating the result.

Modelos de machine learning para predição do sucesso de startups [Machine learning models for predicting startup success]

Revista de Gestão e Projetos ◽

10.5585/gep.v12i2.18942 ◽

2021 ◽

Vol 12 (2) ◽

pp. 28-55

Author(s):

Fabiano Rodrigues ◽

Francisco Aparecido Rodrigues ◽

Thelma Valéria Rocha Rodrigues

Keyword(s):

Machine Learning ◽

Support Vector Machine ◽

Random Forest ◽

Decision Tree ◽

Initial Public Offering ◽

Gradient Boosting ◽

Support Vector ◽

Trade Offs ◽

Extreme Gradient Boosting ◽

Public Offering

This study analyzes results obtained with machine learning models for predicting startup success. As a proxy for success, the investor's perspective is adopted, in which acquisition of the startup or an IPO (Initial Public Offering) are ways of recovering the investment. The literature review covers startups and funding vehicles, previous studies on predicting startup success with machine learning models, and trade-offs among machine learning techniques. In the empirical part, quantitative research was conducted on secondary data from the American platform Crunchbase, covering startups from 171 countries. The research design filtered for startups founded between June 2010 and June 2015, with a prediction window from June 2015 to June 2020 for predicting startup success. The sample used, after the data preprocessing stage, comprised 18,571 startups. Six binary classification models were used for prediction: Logistic Regression, Decision Tree, Random Forest, Extreme Gradient Boosting, Support Vector Machine, and Neural Network. In the end, the Random Forest and Extreme Gradient Boosting models showed the best performance on the classification task. This article, involving machine learning and startups, contributes to hybrid research areas by blending the fields of Management and Data Science. In addition, it offers investors a tool for the initial screening of startups in the search for targets with a higher probability of success.

Application of Bayesian Hyperparameter Optimized Random Forest and XGBoost Model for Landslide Susceptibility Mapping

Frontiers in Earth Science ◽

10.3389/feart.2021.712240 ◽

2021 ◽

Vol 9 ◽

Author(s):

Shibao Wang ◽

Jianqi Zhuang ◽

Jia Zheng ◽

Hongyu Fan ◽

Jiaxu Kong ◽

...

Keyword(s):

Random Forest ◽

Decision Tree ◽

Landslide Susceptibility ◽

Susceptibility Mapping ◽

Landslide Susceptibility Mapping ◽

Gradient Boosting ◽

The Loess Plateau ◽

Tree Model ◽

Validation Data ◽

Extreme Gradient Boosting

Landslides are widely distributed worldwide and often result in tremendous casualties and economic losses, especially in the Loess Plateau of China. Taking Wuqi County in the hinterland of the Loess Plateau as the research area, this study uses Bayesian hyperparameter optimization to tune random forest and extreme gradient boosting decision tree models for landslide susceptibility mapping, and compares the two optimized models. In addition, 14 landslide influencing factors were selected, and 734 landslides were identified from field investigation and reports in the literature. The landslides were randomly divided into training data (70%) and validation data (30%). The hyperparameters of the random forest and extreme gradient boosting decision tree models were optimized using a Bayesian algorithm, and the optimal hyperparameters were then selected for landslide susceptibility mapping. Both models were evaluated and compared using the receiver operating characteristic curve and the confusion matrix. The results show that the validation-data AUC values of the Bayesian-optimized random forest and extreme gradient boosting decision tree models are 0.88 and 0.86, respectively, improvements of 4% and 3%, indicating that the prediction performance of both models has improved. However, the random forest model has a higher predictive ability than the extreme gradient boosting decision tree model. Thus, hyperparameter optimization is of great significance for improving the prediction accuracy of the model, and the optimized model can generate a high-quality landslide susceptibility map.
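
The random 70%/30% division described in this abstract can be sketched with the Python standard library alone. This is only an illustration of a seeded random split over 734 sample indices; the seed, the rounding rule, and the function name are assumptions, since the abstract does not state how the split was implemented.

```python
import random


def split_indices(n_samples, train_fraction, seed):
    """Shuffle sample indices reproducibly and cut them into two disjoint parts."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)  # seeded so the split is repeatable
    n_train = round(n_samples * train_fraction)  # assumed rounding rule
    return indices[:n_train], indices[n_train:]


# 734 landslides, 70% for training and 30% for validation.
train_idx, valid_idx = split_indices(734, 0.7, seed=0)
print(len(train_idx), len(valid_idx))
```

Splitting on indices rather than on the records themselves keeps the sketch independent of how the landslide inventory is stored.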

CD-NuSS: A Web Server for the Automated Secondary Structural Characterization of the Nucleic Acids from Circular Dichroism Spectra Using Extreme Gradient Boosting Decision-Tree, Neural Network and Kohonen Algorithms

Journal of Molecular Biology ◽

10.1016/j.jmb.2020.08.014 ◽

2020 ◽

pp. 166629

Author(s):

Chakkarai Sathyaseelan ◽

Vinothini Vijayakumar ◽

Thenmalarchelvi Rathinavelan

Keyword(s):

Neural Network ◽

Decision Tree ◽

Nucleic Acids ◽

Structural Characterization ◽

Web Server ◽

Gradient Boosting ◽

Circular Dichroism Spectra ◽

Extreme Gradient Boosting ◽

Circular Dichroism

Detection of Ionospheric Scintillation Based on XGBoost Model Improved by SMOTE-ENN Technique

Remote Sensing ◽

10.3390/rs13132577 ◽

2021 ◽

Vol 13 (13) ◽

pp. 2577

Author(s):

Mengying Lin ◽

Xuefen Zhu ◽

Teng Hua ◽

Xinhua Tang ◽

Gangyi Tu ◽

...

Keyword(s):

Random Forest ◽

Decision Tree ◽

Natural Phenomenon ◽

Gradient Boosting ◽

Ionospheric Scintillation ◽

Detection Accuracy ◽

Polar Regions ◽

Validation Data ◽

Extreme Gradient Boosting ◽

Testing Accuracy

Ionospheric scintillation frequently occurs in equatorial, auroral, and polar regions, posing a threat to the performance of the global navigation satellite system (GNSS). Thus, the detection of ionospheric scintillation is of great significance for improving GNSS performance, especially when severe ionospheric scintillation occurs. Conventional algorithms are insensitive to strong scintillation because the phenomenon occurs only occasionally, so such samples account for a small proportion of the data in datasets relative to those for weak/moderate scintillation events. To improve detection accuracy, we proposed a strategy that improves the eXtreme Gradient Boosting (XGBoost) algorithm with the synthetic minority oversampling technique and edited nearest neighbor (SMOTE-ENN) resampling technique for detecting events that are imbalanced with respect to weak, medium, and strong ionospheric scintillation. It outperformed the decision tree and random forest by 12% when using imbalanced training and validation data, for tree depths ranging from 1 to 30. For different degrees of imbalance in the training datasets, the testing accuracy of the improved XGBoost was about 4% to 5% higher than that of the decision tree and random forest. Meanwhile, the testing results for the improved method showed significant increases in evaluation indicators, while the recall value for strong scintillation events was relatively stable, above 90%, and the corresponding F1 scores were over 92%. When testing on datasets with different degrees of imbalance, there was a distinct increase of about 10% to 20% in the recall value and 6% to 11% in the F1 score for strong scintillation events, with the testing accuracy ranging from 90.42% to 96.04%.
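
To illustrate the oversampling half of SMOTE-ENN, the sketch below shows only the SMOTE interpolation step in pure Python: each synthetic minority sample is placed at a random point on the segment between a minority sample and one of its nearest minority neighbours. It is a simplified sketch, not the authors' pipeline; the ENN cleaning step and the XGBoost model are omitted, and the toy points, `k`, and the seed are hypothetical.

```python
import math
import random


def k_nearest(point, others, k):
    """Return the k points in `others` closest to `point` (Euclidean distance)."""
    return sorted(others, key=lambda o: math.dist(point, o))[:k]


def smote(minority, n_new, k=2, seed=0):
    """Create n_new synthetic minority samples by linear interpolation."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # Neighbours are searched among the other minority samples only.
        neighbours = k_nearest(base, [m for m in minority if m is not base], k)
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # position along the segment, in [0, 1)
        synthetic.append(
            tuple(b + gap * (n - b) for b, n in zip(base, neighbour))
        )
    return synthetic


# Hypothetical two-feature samples standing in for rare strong-scintillation events.
strong_events = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
new_samples = smote(strong_events, n_new=6)
print(len(new_samples))
```

Because every synthetic point lies between two existing minority samples, the new samples stay inside the region already occupied by the minority class. In the full SMOTE-ENN procedure, an edited-nearest-neighbour pass would then remove samples whose neighbours mostly belong to another class.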
