Exploring Implicit Relationships between Pavement Surface Friction and Vehicle Crash Severity Using Interpretable Extreme Gradient Boosting Method

Author(s):  
Guangyuan Zhao ◽  
Yi Jiang ◽  
Shuo Li ◽  
Susan Tighe

Pavement friction has been identified as a crucial factor in traffic safety. Because the Highway Safety Manual prediction algorithm is often based on crash frequency, the crash severity distribution may be assumed to be unchanged before and after a countermeasure. However, pavement surface treatments can raise friction to different levels, and crash severity outcomes may vary greatly as a result. To explore the implicit effects of pavement friction on vehicle crash severity, this paper first validates the performance of the extreme gradient boosting (XGBoost) model and then employs Shapley additive explanations (SHAP) interaction values to interpret individual features and the nonlinear interactions among predictors. Under various scenarios, the XGBoost output probabilities are converted into dynamic crash severity distributions. Results indicate that friction becomes more significant when the friction number is less than 38, and that immediate corrective action is needed when the friction number is below 20.
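
The abstract above relies on SHAP values to explain XGBoost predictions. As a minimal sketch of the underlying idea, the function below computes exact Shapley values for a single prediction by enumerating every feature coalition. It is not the authors' code and not the Tree SHAP algorithm that the `shap` library uses for tree ensembles. Replacing features outside a coalition with a fixed baseline value is a simplifying assumption, and the toy model and the feature names `friction_gap` and `speed` are hypothetical.

```python
from itertools import combinations
from math import factorial
from typing import Callable, Dict


def shapley_values(
    predict: Callable[[Dict[str, float]], float],
    instance: Dict[str, float],
    baseline: Dict[str, float],
) -> Dict[str, float]:
    """Exact Shapley values for one prediction by enumerating all coalitions.

    Features outside a coalition are set to their baseline value (a simplifying
    assumption; SHAP implementations average over background data instead).
    """
    names = list(instance)
    n = len(names)

    def value(coalition: set) -> float:
        # Coalition members take the instance's values, the rest the baseline's
        x = {k: (instance[k] if k in coalition else baseline[k]) for k in names}
        return predict(x)

    phi: Dict[str, float] = {}
    for i in names:
        others = [k for k in names if k != i]
        total = 0.0
        for size in range(n):
            # Shapley weight |S|! (n - |S| - 1)! / n!
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                total += weight * (value(s | {i}) - value(s))
        phi[i] = total
    return phi


def toy_model(x: Dict[str, float]) -> float:
    # Hypothetical severity score with a friction-by-speed interaction term
    return 2 * x["friction_gap"] + 3 * x["speed"] + x["friction_gap"] * x["speed"]


phi = shapley_values(
    toy_model, {"friction_gap": 1.0, "speed": 2.0}, {"friction_gap": 0.0, "speed": 0.0}
)
print(phi)  # {'friction_gap': 3.0, 'speed': 7.0}
```

The values sum to the prediction minus the baseline prediction (10 − 0), and the interaction term's contribution of 2 is split equally between the two features. SHAP interaction values go one step further and report that shared part separately.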

2021 ◽  
Vol 13 (6) ◽  
pp. 1147
Author(s):  
Xiangqian Li ◽  
Wenping Yuan ◽  
Wenjie Dong

To forecast the terrestrial carbon cycle and monitor food security, vegetation growth must be accurately predicted; however, current process-based ecosystem and crop-growth models are limited in their effectiveness. This study developed a machine learning model using the extreme gradient boosting method to predict vegetation growth throughout the growing season in China from 2001 to 2018. The model used satellite-derived vegetation data for the first month of each growing season, CO₂ concentration, and several meteorological factors as explanatory variables. Results showed that the model could reproduce the spatiotemporal distribution of vegetation growth as represented by the satellite-derived normalized difference vegetation index (NDVI). The predictive error for the growing-season NDVI was less than 5% for more than 98% of vegetated areas in China, and the model represented seasonal variations in NDVI well. The coefficient of determination (R²) between the monthly observed and predicted NDVI was 0.83, and more than 69% of vegetated areas had an R² > 0.8. The model's effectiveness was examined for a severe drought year (2009), and the results showed that it could reproduce the spatiotemporal distribution of NDVI even under extreme conditions. This model provides an alternative method for predicting vegetation growth and has great potential for monitoring vegetation dynamics and crop growth.
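
For readers unfamiliar with the quantities in this abstract, here is a small sketch of them. NDVI is the standard (NIR − Red)/(NIR + Red) ratio. R² is computed here as 1 − SS_res/SS_tot, and "error" as relative absolute error. Both of these definitions are assumptions, since the abstract does not state which ones the authors used. The NDVI values are hypothetical.

```python
from typing import Sequence


def ndvi(nir: float, red: float) -> float:
    """NDVI = (NIR - Red) / (NIR + Red), from surface reflectances."""
    return (nir - red) / (nir + red)


def r_squared(observed: Sequence[float], predicted: Sequence[float]) -> float:
    """Coefficient of determination as 1 - SS_res / SS_tot."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


def relative_error(observed: float, predicted: float) -> float:
    """Absolute prediction error as a fraction of the observed value."""
    return abs(predicted - observed) / abs(observed)


# Hypothetical monthly NDVI values for one grid cell
obs = [0.30, 0.45, 0.62, 0.55]
pred = [0.32, 0.43, 0.60, 0.57]
print(r_squared(obs, pred))
print(relative_error(0.62, 0.60) < 0.05)  # True
```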


2021 ◽  
Vol 21 (2) ◽  
pp. 5-17
Author(s):  
Anna Markella Antoniadi ◽  
Miriam Galvin ◽  
Mark Heverin ◽  
Orla Hardiman ◽  
Catherine Mooney

Amyotrophic Lateral Sclerosis (ALS) is a rare neurodegenerative disease that causes a rapid decline in motor function and has a fatal trajectory. ALS is currently incurable, so treatment aims mostly to alleviate symptoms and improve patients' quality of life (QoL). The goal of this study is to develop a Clinical Decision Support System (CDSS) to alert clinicians when a patient is at risk of experiencing low QoL. The data came from the Irish ALS Registry and from interviews with 90 patients and their primary informal caregivers at three time points. The dataset contained two different scores measuring a person's overall QoL, both based on the McGill QoL (MQoL) Questionnaire, and we worked towards predicting both. We used Extreme Gradient Boosting (XGBoost) to develop the predictive models, which were compared with a logistic regression baseline model. Additionally, we used the Synthetic Minority Over-sampling Technique (SMOTE) to examine whether it would increase model performance, and SHAP (SHapley Additive exPlanations) to provide local and global explanations of the outputs and to select the most important features. The total calculated MQoL score was predicted accurately using three features (age at disease onset, ALSFRS-R score for orthopnoea, and the caregiver's status pre-caregiving), with an F1-score on the test set of 0.81, recall of 0.78, and precision of 0.84. Adding two features (the caregiver's age and the ALSFRS-R score for speech) produced similar outcomes (F1-score 0.79, recall 0.70, and precision 0.90).
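
Two pieces of this abstract can be checked with a few lines of code.

- The reported F1-scores follow from precision and recall as their harmonic mean. For 0.84 and 0.78 this gives about 0.81, and for 0.90 and 0.70 about 0.79.
- The SMOTE step creates synthetic minority-class samples by interpolating between neighbouring minority points.

The SMOTE function below is a simplified, hypothetical illustration of that interpolation idea, not the implementation the authors used. In practice a library class such as imbalanced-learn's `SMOTE` (via `fit_resample`) would typically be used.

```python
import random
from typing import List, Sequence


def f1_score(precision: float, recall: float) -> float:
    """F1 is the harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


def smote_like_samples(
    minority: List[Sequence[float]], n_new: int, k: int = 2, seed: int = 0
) -> List[List[float]]:
    """Create synthetic minority-class points in the spirit of SMOTE:
    pick a minority point, pick one of its k nearest minority neighbours,
    and place a new point at a random position on the segment between them.
    Requires at least two minority points."""
    rng = random.Random(seed)

    def sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
        return sum((x - y) ** 2 for x, y in zip(a, b))

    synthetic: List[List[float]] = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # k nearest neighbours of `base` among the other minority points
        neighbours = sorted(
            (p for j, p in enumerate(minority) if j != i),
            key=lambda p: sq_dist(base, p),
        )[:k]
        other = rng.choice(neighbours)
        gap = rng.random()  # uniform in [0, 1)
        synthetic.append([b + gap * (o - b) for b, o in zip(base, other)])
    return synthetic


# Check the reported scores against the harmonic mean of precision and recall
print(round(f1_score(0.84, 0.78), 3))
print(round(f1_score(0.90, 0.70), 3))
```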


2021 ◽  
Vol 2021 ◽  
pp. 1-10
Author(s):  
Hengrui Chen ◽  
Hong Chen ◽  
Ruiyu Zhou ◽  
Zhizhen Liu ◽  
Xiaoke Sun

Safety has become a critical obstacle that cannot be ignored in the commercialization of autonomous vehicles (AVs). The objective of this study is to explore the mechanism of AV-involved crashes and analyze the impact of each feature on crash severity. We use the Apriori algorithm to mine association rules among multiple factors and thereby explore the mechanism of crashes. We use various machine learning models, including support vector machine (SVM), classification and regression tree (CART), and eXtreme Gradient Boosting (XGBoost), to analyze crash severity. In addition, we apply SHapley Additive exPlanations (SHAP) to interpret the importance of each factor. The results indicate that XGBoost achieves the best performance (recall = 75%; G-mean = 67.82%). Both XGBoost and the Apriori algorithm provided meaningful insights into AV-involved crash characteristics and their relationships. Among all the features, vehicle damage, weather conditions, accident location, and driving mode are the most critical. We found that most rear-end crashes involve conventional vehicles striking the rear of AVs. Drivers should be extremely cautious when driving in fog, snow, or low light. Drivers should also be careful when driving near intersections, especially in the autonomous driving mode.
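
Apriori finds itemsets that occur together frequently, from which association rules are derived. It works level by level: an itemset can only be frequent if all of its subsets are frequent. The sketch below is a minimal, assumed illustration of that frequent-itemset step and does not generate rules or confidence values. It also shows the G-mean metric, taken here as the common definition √(sensitivity × specificity). The crash attributes and the specificity value are hypothetical.

```python
from itertools import combinations
from math import sqrt
from typing import Dict, FrozenSet, List, Set


def apriori(transactions: List[Set[str]], min_support: float) -> Dict[FrozenSet[str], float]:
    """Return every itemset whose support (fraction of transactions that
    contain it) is at least min_support, found level by level."""
    n = len(transactions)

    def support(itemset: FrozenSet[str]) -> float:
        return sum(1 for t in transactions if itemset <= t) / n

    items = {item for t in transactions for item in t}
    current = {frozenset([i]) for i in items}
    frequent: Dict[FrozenSet[str], float] = {}
    k = 1
    while current:
        level = {c: support(c) for c in current}
        level = {c: s for c, s in level.items() if s >= min_support}
        frequent.update(level)
        k += 1
        # Join frequent (k-1)-itemsets into k-item candidates ...
        candidates = {a | b for a in level for b in level if len(a | b) == k}
        # ... and prune candidates with an infrequent (k-1)-subset (Apriori property)
        current = {
            c for c in candidates
            if all(frozenset(sub) in level for sub in combinations(c, k - 1))
        }
    return frequent


def g_mean(sensitivity: float, specificity: float) -> float:
    """Geometric mean of sensitivity (recall) and specificity."""
    return sqrt(sensitivity * specificity)


# Hypothetical crash records, each a set of categorical attributes
crashes = [
    {"rear_end", "av_mode"},
    {"rear_end", "fog"},
    {"rear_end", "av_mode", "intersection"},
    {"fog"},
]
for itemset, s in sorted(apriori(crashes, 0.5).items(), key=lambda kv: -kv[1]):
    print(sorted(itemset), s)
print(g_mean(0.75, 0.6))
```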


2020 ◽  
Vol 15 (11) ◽  
pp. 3135-3150 ◽  
Author(s):  
Lin Wang ◽  
Chongzhi Wu ◽  
Libin Tang ◽  
Wengang Zhang ◽  
Suzanne Lacasse ◽  
...  

2020 ◽  
Vol 35 (Supplement_3) ◽  
Author(s):  
Tae-Hyun Yoo ◽  
Hae-Ryong Yun ◽  
Jae Hyun Chang

Abstract
Background and Aims: Optimizing anemia management in patients with end-stage kidney disease (ESKD) is challenging because of the complexity of their underlying diseases and their heterogeneous responses to erythropoiesis-stimulating agents (ESA). Recent studies have shown that machine learning (ML) algorithms can be effective tools to predict hemoglobin (Hb) levels and determine ESA doses in these patients. However, most of the proposed ML approaches are not designed to handle multivariate longitudinal patient data. We therefore developed an Hb prediction and ESA dose recommendation algorithm (HPERA) using recurrent neural networks (RNN).
Method: A total of 466 participants who underwent hemodialysis in 7 hospitals in the Republic of Korea were included in the present study. We selected 15 variables using an extreme gradient boosting (XGBoost) algorithm. The outcome of the prediction algorithm was the Hb level in the next month; the outcome of the recommendation algorithm was the ESA dose for the target Hb in the next month. Among the various RNN families, gated recurrent units (GRU) were used to build both the prediction and recommendation algorithms. In addition to holding out a separate validation dataset, we used a Gaussian noise layer following each input layer to avoid overfitting. We also performed linear regression, multilayer perceptrons, and extreme gradient boosting with an extensive hyperparameter search to validate our GRU-based prediction algorithm. The performance of each model was evaluated in terms of the mean absolute error (MAE).
Results: The mean age of the study population was 57.8 years, 248 (53.2%) participants were male, and the mean observation period was 30.0 months. The best MAE of our prediction algorithm was 0.59 g/dL, obtained by two stacked GRU layers followed by a feedforward network with a single hidden layer, using 6-month follow-up patient data. The best recommendation algorithm had an MAE of 43.2 μg, obtained by one GRU layer followed by a two-layer feedforward network. The HPERA had a lower overall ESA dose (μg/month) [155 (80–240) vs. 140 (70–210), P<0.001], a smaller Hb difference (g/dL) [0.8 (0.4–1.4) vs. 0.6 (0.3–1.0), P<0.001], and higher success rates and lower failure rates of reaching the target Hb compared with real practice.
Conclusion: The GRU-based prediction model outperformed previous ML methodologies, even though its hyperparameter tuning was much simpler. Using the HPERA showed the potential to reduce the amount of ESA, decrease the Hb difference, and increase the rate of reaching target Hb levels. Our study points to a promising direction for anemia management using ML in ESKD patients.
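
The Gaussian noise layer mentioned in the Method section is a regularizer. During training it adds zero-mean random noise to the inputs, and at inference it passes them through unchanged. Keras provides this as `tf.keras.layers.GaussianNoise(stddev)`. The dependency-free sketch below only illustrates the behaviour, together with the MAE metric used for evaluation. The noise level and all numbers are hypothetical, not values from the study.

```python
import random
from typing import List, Sequence


def gaussian_noise(
    x: Sequence[float], stddev: float, training: bool, rng: random.Random
) -> List[float]:
    """Regularizing noise layer: add N(0, stddev^2) noise to every input
    during training; pass inputs through unchanged at inference."""
    if not training:
        return list(x)
    return [v + rng.gauss(0.0, stddev) for v in x]


def mean_absolute_error(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """MAE in the units of the target (e.g. g/dL for Hb, μg for ESA dose)."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


rng = random.Random(42)
features = [11.2, 0.8, 57.0]  # hypothetical input values for one patient-month
print(gaussian_noise(features, stddev=0.1, training=True, rng=rng))
print(gaussian_noise(features, stddev=0.1, training=False, rng=rng))
print(mean_absolute_error([10.5, 11.0], [11.0, 10.5]))  # 0.5 (hypothetical Hb values, g/dL)
```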


2017 ◽  
Vol 25 (3) ◽  
pp. 321-330 ◽  
Author(s):  
Shang Gao ◽  
Michael T Young ◽  
John X Qiu ◽  
Hong-Jun Yoon ◽  
James B Christian ◽  
...  

Abstract
Objective: We explored how a deep learning (DL) approach based on hierarchical attention networks (HANs) can improve model performance for multiple information extraction tasks on unstructured cancer pathology reports, compared with conventional methods that do not sufficiently capture syntactic and semantic context in free-text documents.
Materials and Methods: Data for our analyses were obtained from 942 deidentified pathology reports collected by the National Cancer Institute Surveillance, Epidemiology, and End Results (SEER) program. The HAN was implemented for 2 information extraction tasks: (1) primary site, matched to 12 International Classification of Diseases for Oncology topography codes (7 breast and 5 lung primary sites), and (2) histological grade classification, matched to G1–G4. Model performance metrics were compared with conventional machine learning (ML) approaches, including naive Bayes, logistic regression, support vector machine, random forest, and extreme gradient boosting, and with other DL models, including a recurrent neural network (RNN), a recurrent neural network with attention (RNN w/A), and a convolutional neural network.
Results: For both information extraction tasks, the HAN performed significantly better than the conventional ML and DL techniques. In particular, across the 2 tasks, the mean micro- and macro-F-scores for the HAN with pretraining were (0.852, 0.708), compared with naive Bayes (0.518, 0.213), logistic regression (0.682, 0.453), support vector machine (0.634, 0.434), random forest (0.698, 0.508), extreme gradient boosting (0.696, 0.522), RNN (0.505, 0.301), RNN w/A (0.637, 0.471), and convolutional neural network (0.714, 0.460).
Conclusions: HAN-based DL models show promise for information abstraction tasks within unstructured clinical pathology reports.
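
Each model above is reported as a (micro-F, macro-F) pair, and the macro scores are consistently lower. The sketch below shows how the two averages differ. Micro-F1 pools true positives, false positives, and false negatives over all classes. Macro-F1 averages the per-class F1 scores, so a rare class that is always missed pulls it down sharply. The grade labels and predictions are hypothetical, and this is a generic implementation rather than the evaluation code of the study.

```python
from collections import Counter
from typing import Sequence, Tuple


def _f1(tp: int, fp: int, fn: int) -> float:
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def micro_macro_f1(y_true: Sequence[str], y_pred: Sequence[str]) -> Tuple[float, float]:
    """Micro-F1 pools TP/FP/FN over all classes; macro-F1 averages per-class F1,
    so every class (including rare ones) counts equally."""
    labels = sorted(set(y_true) | set(y_pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1
        else:
            fp[p] += 1  # predicted class gets a false positive
            fn[t] += 1  # true class gets a false negative
    micro = _f1(sum(tp.values()), sum(fp.values()), sum(fn.values()))
    macro = sum(_f1(tp[c], fp[c], fn[c]) for c in labels) / len(labels)
    return micro, macro


# Hypothetical grades: a classifier that always predicts the majority grade
y_true = ["G1", "G1", "G1", "G2"]
y_pred = ["G1", "G1", "G1", "G1"]
print(micro_macro_f1(y_true, y_pred))
```

For single-label multiclass tasks, micro-F1 equals accuracy (0.75 here). Macro-F1 drops to about 0.43 because the G2 class scores zero.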


Atmosphere ◽  
2019 ◽  
Vol 10 (7) ◽  
pp. 373 ◽  
Author(s):  
Mehdi Zamani Joharestani ◽  
Chunxiang Cao ◽  
Xiliang Ni ◽  
Barjeece Bashir ◽  
Somayeh Talebiesfandarani

In recent years, air pollution has become an important public health concern. High concentrations of fine particulate matter with a diameter of less than 2.5 µm (PM2.5) are known to be associated with lung cancer, cardiovascular disease, respiratory disease, and metabolic disease. Predicting PM2.5 concentrations can help governments warn people at high risk, thus mitigating these complications. Although attempts have been made to predict PM2.5 concentrations, the factors influencing PM2.5 prediction have not been investigated. In this work, we study feature importance for PM2.5 prediction in Tehran’s urban area using three machine learning (ML) approaches: random forest, extreme gradient boosting, and deep learning. We use 23 features in the modeling, including satellite and meteorological data, ground-measured PM2.5, and geographical data. The best model performance, R² = 0.81 (R = 0.9), MAE = 9.93 µg/m³, and RMSE = 13.58 µg/m³, was obtained with the XGBoost approach after eliminating unimportant features. However, all three ML methods performed similarly: R² ranged from 0.63 to 0.67 when Aerosol Optical Depth (AOD) at 3 km resolution was included, and from 0.77 to 0.81 when it was excluded. Unlike the lagged PM2.5 data, satellite-derived AOD did not improve model performance.
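
The reported pair R² = 0.81 and R = 0.9 is consistent with R being the Pearson correlation between observed and predicted PM2.5, since √0.81 = 0.9. That reading is an assumption: the identity R² = r² holds for a least-squares linear fit with an intercept, but not in general for 1 − SS_res/SS_tot applied to arbitrary predictions. The sketch below computes the three error measures named in the abstract on hypothetical data.

```python
from math import sqrt
from typing import Sequence


def mae(obs: Sequence[float], pred: Sequence[float]) -> float:
    return sum(abs(o - p) for o, p in zip(obs, pred)) / len(obs)


def rmse(obs: Sequence[float], pred: Sequence[float]) -> float:
    # Squaring penalizes large errors more heavily, so RMSE >= MAE
    return sqrt(sum((o - p) ** 2 for o, p in zip(obs, pred)) / len(obs))


def pearson_r(obs: Sequence[float], pred: Sequence[float]) -> float:
    n = len(obs)
    mo, mp = sum(obs) / n, sum(pred) / n
    cov = sum((o - mo) * (p - mp) for o, p in zip(obs, pred))
    so = sqrt(sum((o - mo) ** 2 for o in obs))
    sp = sqrt(sum((p - mp) ** 2 for p in pred))
    return cov / (so * sp)


# Hypothetical daily PM2.5 concentrations (µg/m³)
observed = [35.0, 48.0, 60.0, 52.0, 41.0]
predicted = [38.0, 45.0, 55.0, 58.0, 40.0]
r = pearson_r(observed, predicted)
print(mae(observed, predicted), rmse(observed, predicted), r, r ** 2)
```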


2019 ◽  
Vol 46 (8) ◽  
pp. 712-721 ◽  
Author(s):  
Saleh R. Mousa ◽  
Peter R. Bakhit ◽  
Sherif Ishak

Despite research efforts to reduce traffic accidents, the number of vehicle accidents worldwide each year is still rising. This continues to motivate researchers to examine the factors contributing to crash and near-crash (CNC) events. Recently, many studies have attempted to identify the associated crash factors using data from the Second Strategic Highway Research Program Naturalistic Driving Study (SHRP2-NDS). Despite the many classifiers developed in the literature, the high dimensionality and multicollinearity of the SHRP2-NDS data limit the accuracy and reliability of the developed models. This study develops an extreme gradient boosting (XGB) classifier, robust to multicollinearity, that uses the SHRP2-NDS dataset to identify the factors contributing to CNC events. The performance of the XGB classifier is evaluated against three other advanced machine learning algorithms. Results indicate that the XGB model outperformed the other models, with a detection accuracy of 85%, and identified “driver behavior” and “intersection influence” as the factors contributing most to CNC detection.
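
None of these abstracts writes out what an extreme gradient boosting model optimizes, so the general formulation is given here for reference. It is the regularized objective described in the XGBoost documentation, not something specific to this study. When the t-th tree f_t is added, XGBoost minimizes the loss plus a complexity penalty on the tree, where T is the number of leaves and w_j are the leaf weights. A second-order approximation then yields closed forms for the optimal leaf weights and for the gain of a candidate split.

```latex
\mathcal{L}^{(t)} = \sum_{i=1}^{n} l\!\left(y_i,\ \hat{y}_i^{(t-1)} + f_t(\mathbf{x}_i)\right) + \Omega(f_t),
\qquad
\Omega(f) = \gamma T + \frac{1}{2}\lambda \sum_{j=1}^{T} w_j^2

% g_i, h_i: first and second derivatives of l with respect to \hat{y}_i^{(t-1)};
% G_j = \sum_{i \in I_j} g_i and H_j = \sum_{i \in I_j} h_i over the instances I_j in leaf j
w_j^{*} = -\frac{G_j}{H_j + \lambda},
\qquad
\text{Gain} = \frac{1}{2}\left[\frac{G_L^2}{H_L + \lambda} + \frac{G_R^2}{H_R + \lambda} - \frac{(G_L + G_R)^2}{H_L + H_R + \lambda}\right] - \gamma
```

In the library, γ and λ correspond to the `gamma` and `lambda` (alias `reg_lambda`) parameters. A split is kept only if its gain is positive after subtracting γ.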


2021 ◽  
Vol 13 (13) ◽  
pp. 7454
Author(s):  
Bo Qiu ◽  
Wei (David) Fan

Due to increasing traffic volume in metropolitan areas, short-term travel time prediction (TTP) can be an important and useful tool for both travelers and traffic management. Accurate and reliable short-term travel time prediction can greatly help vehicle routing and congestion mitigation. One of the most challenging tasks in TTP is developing and selecting the most appropriate prediction algorithm for the available data. In this study, travel time data were collected from the Regional Integrated Transportation Information System (RITIS). Travel times were then predicted over short horizons (15 to 60 min) on the selected freeway corridors using four machine learning algorithms: Decision Trees (DT), Random Forest (RF), Extreme Gradient Boosting (XGBoost), and a Long Short-Term Memory neural network (LSTM). Many spatial and temporal characteristics that may affect travel time were used to develop the models, and their prediction accuracy and reliability were compared. Numerical results suggest that RF achieves better prediction performance than the other methods in both accuracy and stability.
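
To train tree-based models such as DT, RF, or XGBoost for a fixed horizon, a travel-time series is commonly reframed as a supervised dataset. Each row uses recent observations as features and the value a given horizon ahead as the target. The sketch below shows one such "direct" framing. The 5-minute sampling interval, the number of lags, and the travel times are all hypothetical assumptions; the abstract does not describe the authors' exact feature construction.

```python
from typing import List, Sequence, Tuple


def horizon_to_steps(horizon_min: int, interval_min: int) -> int:
    """Convert a prediction horizon in minutes into a number of time steps."""
    if horizon_min % interval_min:
        raise ValueError("horizon must be a multiple of the sampling interval")
    return horizon_min // interval_min


def make_supervised(
    series: Sequence[float], n_lags: int, horizon_steps: int
) -> Tuple[List[List[float]], List[float]]:
    """Build (features, target) pairs: the last `n_lags` observations are the
    features, and the value `horizon_steps` after the last observation is the target."""
    X: List[List[float]] = []
    y: List[float] = []
    for t in range(n_lags, len(series) - horizon_steps + 1):
        X.append(list(series[t - n_lags:t]))
        y.append(series[t + horizon_steps - 1])
    return X, y


# Hypothetical travel times (minutes) sampled every 5 minutes (assumed interval)
travel_times = [10.0, 11.0, 12.0, 13.0, 14.0, 15.0]
steps = horizon_to_steps(15, 5)  # 15-min horizon -> 3 steps
X, y = make_supervised(travel_times, n_lags=2, horizon_steps=steps)
print(X, y)  # [[10.0, 11.0], [11.0, 12.0]] [14.0, 15.0]
```

A separate dataset (and model) per horizon, from 15 up to 60 minutes, keeps each model focused on a single lead time. Spatial and calendar features could be appended to each row in the same way.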

