Passenger Flow Prediction Using Smart Card Data from Connected Bus System Based on Interpretable XGBoost

2022 ◽  
Vol 2022 ◽  
pp. 1-13
Author(s):  
Liang Zou ◽  
Sisi Shu ◽  
Xiang Lin ◽  
Kaisheng Lin ◽  
Jiasong Zhu ◽  
...  

Bus passenger flow prediction is a critical component of advanced transportation information systems for public traffic management, control, and dispatch. With the development of artificial intelligence, many previous studies have applied machine learning models to extract comprehensive correlations from transit networks and improve passenger flow prediction accuracy, since traffic data of great variety and volume are now easy to obtain. The passenger flow at a station is strongly affected by factors such as the flow at the previous time step and whether the period is peak or nonpeak, so extracting the key features from the data is essential for a passenger flow prediction model. Although neural networks, k-nearest neighbor, and some deep learning models have been adopted to mine the temporal correlations of passenger flow data, the lack of interpretability regarding the influencing variables remains a major problem. Classical tree-based models can mine the correlations between variables and rank the importance of each variable. In this study, we present a method to extract the passenger flow of different routes at a station and implement an XGBoost model to find the contributions of variables to the prediction of passenger flow. Compared with benchmark models, the proposed model reaches state-of-the-art prediction accuracy and computational efficiency on a real-world dataset. Moreover, the XGBoost model can interpret the predicted results. Period is found to be the most important variable for passenger flow prediction, so the management of buses during peak hours should be improved.
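
The feature-extraction step described in this abstract can be illustrated with a minimal sketch. It assumes each smart card record is a (timestamp, route) pair, that flow is counted in 15-minute intervals, and that peak hours are 7-8 and 17-18; none of these details are given in the abstract, and the tap records are invented toy values. The resulting rows could be passed to any tree-based regressor, such as XGBoost, to rank variable importance.

```python
from collections import Counter
from datetime import datetime

# Hypothetical peak hours; the abstract does not define them.
PEAK_HOURS = {7, 8, 17, 18}

def interval_counts(taps, route, minutes=15):
    """Count taps on one route per fixed-length interval of the day."""
    counts = Counter()
    for timestamp, tap_route in taps:
        if tap_route == route:
            slot = (timestamp.hour * 60 + timestamp.minute) // minutes
            counts[slot] += 1
    return counts

def build_features(counts, minutes=15):
    """Return (features, target) rows: previous-step flow and a peak flag."""
    rows = []
    for slot in sorted(counts):
        hour = slot * minutes // 60
        features = {
            "prev_flow": counts.get(slot - 1, 0),  # flow one interval earlier
            "is_peak": int(hour in PEAK_HOURS),    # peak or nonpeak period
        }
        rows.append((features, counts[slot]))
    return rows

# Toy tap records (illustrative only, not from the paper's dataset).
taps = [
    (datetime(2022, 1, 3, 7, 50), "R1"),
    (datetime(2022, 1, 3, 8, 2), "R1"),
    (datetime(2022, 1, 3, 8, 5), "R1"),
    (datetime(2022, 1, 3, 8, 7), "R2"),
    (datetime(2022, 1, 3, 8, 20), "R1"),
]
rows = build_features(interval_counts(taps, "R1"))
```

Filtering by route before counting is what separates the flows of different routes that share one station.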

Sensors ◽  
2020 ◽  
Vol 20 (16) ◽  
pp. 4574
Author(s):  
Hongwei Jia ◽  
Haiyong Luo ◽  
Hao Wang ◽  
Fang Zhao ◽  
Qixue Ke ◽  
...  

Passenger flow prediction has drawn increasing attention in the deep learning research field due to its great importance in traffic management and public safety. The major challenge of this task lies in its multiple spatiotemporal dependencies, which exhibit complex non-linear correlations. Although both the spatial and temporal perspectives have been considered in modeling, most existing works have ignored complex temporal correlations or underlying spatial similarity. In this paper, we identify the unique spatiotemporal correlation of urban metro flow, and propose an attention-based deep spatiotemporal network with multi-task learning (ADST-Net) at a citywide level to predict the future flow from historical observations. ADST-Net uses three independent channels with the same structure to model the recent, daily-periodic and weekly-periodic spatiotemporal correlations, respectively. Specifically, each channel uses the framework of residual networks, the rectified block and the multi-scale convolutions to mine spatiotemporal correlations. The residual networks can effectively overcome the gradient vanishing problem. The rectified block adopts an attention mechanism to automatically reweigh measurements at different time intervals, and the multi-scale convolutions are used to extract explicit spatial relationships. ADST-Net also introduces an external embedding mechanism to extract the influence of external factors, such as weather conditions, on flow prediction. Furthermore, we use multi-task learning, with transition passenger flow volume prediction as an auxiliary task during training, to improve generalization. Through this model, we can capture not only the steady trend but also the sudden changes of passenger flow. Extensive experimental results on two real-world traffic flow datasets demonstrate a clear improvement of our proposed algorithm over state-of-the-art baselines.
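
The split into recent, daily-periodic and weekly-periodic channels can be sketched as an input-selection step. This is only an illustration of how the three histories might be sliced from one flow series; the function name, the `length` parameter and the toy series are assumptions, and the network itself (residual blocks, attention, multi-scale convolutions) is not reproduced here.

```python
def channel_inputs(series, t, steps_per_day, length=3):
    """Pick recent, daily-periodic and weekly-periodic histories for index t.

    Each returned list is ordered from oldest to newest observation.
    """
    steps_per_week = 7 * steps_per_day
    if t < length * steps_per_week:
        raise ValueError("not enough history for the weekly channel")
    offsets = range(length, 0, -1)
    recent = [series[t - k] for k in offsets]                   # last few steps
    daily = [series[t - k * steps_per_day] for k in offsets]    # same slot, previous days
    weekly = [series[t - k * steps_per_week] for k in offsets]  # same slot, previous weeks
    return recent, daily, weekly

# Toy series whose value equals its index, so the selected positions are visible.
series = list(range(100))
recent, daily, weekly = channel_inputs(series, t=50, steps_per_day=2, length=3)
```

With two steps per day, the daily channel reads every second earlier index and the weekly channel every fourteenth.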


2021 ◽  
Vol 2021 ◽  
pp. 1-15
Author(s):  
Chuanxiang Ren ◽  
Chunxu Chai ◽  
Changchang Yin ◽  
Haowei Ji ◽  
Xuezhen Cheng ◽  
...  

Short-term traffic flow prediction can provide a basis for traffic management and help travelers make decisions. Accurate short-term traffic flow prediction also provides necessary conditions for the sustainable development of the traffic environment. Although deep learning methods have achieved good accuracy in traffic flow prediction, how to combine multiple deep learning methods to improve on the accuracy of a single method still leaves room for in-depth research. In this article, a combined deep learning prediction (CDLP) model comprising two parallel single deep learning models, a CNN-LSTM-attention model and a CNN-GRU-attention model, is established. In the model, a one-dimensional convolutional neural network (1DCNN) is used to extract local trend features of traffic flow, and RNN variants (LSTM and GRU) with an attention mechanism are used to extract long-term temporal dependency features. Moreover, a dynamic optimal weighted coefficient algorithm (DOWCA) is proposed to calculate the dynamic weights of CNN-LSTM-attention and CNN-GRU-attention with the goal of minimizing the sum of squared errors of the CDLP model. Then, the neuron number, loss function, optimization algorithm, and other parameters of the CDLP model are discussed and set through experiments. Finally, the training set and test set for the CDLP model are built from traffic flow data collected in the field. The CDLP model is trained and tested, and the prediction results of traffic flow are obtained and analyzed. The results indicate that the CDLP model fits the trend of traffic flow very well. Furthermore, on the same dataset, the CDLP model achieves higher prediction accuracy than the baseline models.
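
The idea of weighting two predictors to minimize the sum of squared errors can be sketched with a closed-form least-squares weight. This is not the paper's DOWCA, whose dynamic update rule is not given in the abstract; it is a static, single-weight simplification, and the function names and toy numbers are assumptions. Writing the combined forecast as w·a + (1 − w)·b and setting the derivative of the squared error to zero gives w = Σ(y − b)(a − b) / Σ(a − b)².

```python
def optimal_weight(y_true, pred_a, pred_b):
    """Weight w on pred_a (1 - w on pred_b) that minimizes the sum of squared errors."""
    diffs = [a - b for a, b in zip(pred_a, pred_b)]
    denom = sum(d * d for d in diffs)
    if denom == 0:
        return 0.5  # identical predictions: every weight gives the same forecast
    numer = sum((y - b) * d for y, b, d in zip(y_true, pred_b, diffs))
    return numer / denom

def combine(pred_a, pred_b, w):
    """Weighted combination of two forecast series."""
    return [w * a + (1 - w) * b for a, b in zip(pred_a, pred_b)]

def sse(y_true, pred):
    """Sum of squared errors."""
    return sum((y - p) ** 2 for y, p in zip(y_true, pred))

# Toy values: model A overshoots by 2, model B undershoots by 2.
y_true = [10, 20, 30]
pred_a = [12, 22, 32]
pred_b = [8, 18, 28]
w = optimal_weight(y_true, pred_a, pred_b)
combined = combine(pred_a, pred_b, w)
```

A dynamic variant could recompute the weight over a sliding window of recent errors, but that choice is likewise an assumption rather than the published algorithm.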


2019 ◽  
Vol 11 (19) ◽  
pp. 5281 ◽  
Author(s):  
Peikun Li ◽  
Chaoqun Ma ◽  
Jing Ning ◽  
Yun Wang ◽  
Caihua Zhu

Improving the accuracy of short-term passenger flow prediction plays a key role in the efficient and sustainable development of metro operation. The primary objective of this study is to explore how time granularity and station class influence prediction accuracy. A further aim is to propose a change in the forecasting method. Passenger flow data from 87 metro stations in Xi’an were collected and analyzed. A short-term passenger flow prediction framework based on Empirical Mode Decomposition-Support Vector Regression (EMD-SVR) was proposed to predict passenger flow for different types of stations. The relationship between passenger flow prediction error and the passenger flow data was also investigated. First, the stations of the metro network were classified into four categories using eight clustering factors based on the characteristics of inbound passenger flow. Second, the Pearson correlation coefficient was used to explore the time interval and time granularity for short-term passenger flow prediction. Third, EMD-SVR was used to predict the passenger flow in the optimal time interval for each station. Results showed that the proposed approach significantly improves on the traditional passenger flow forecast approach. Lookback Volatility (LVB) was applied to reflect the fluctuation difference of passenger flow data, and a linear fit of the prediction error was conducted. The goodness-of-fit (R²) was found to be 0.768, indicating a good fit to the data. Furthermore, the prediction error differs markedly among the four kinds of stations.
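
The use of the Pearson correlation coefficient to choose a time interval can be sketched as a lag-correlation check: correlate a flow series with a copy of itself shifted by a candidate lag and prefer lags with high correlation. The helper names and the toy series are assumptions; the abstract does not describe the exact procedure.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)

def lag_correlation(series, lag):
    """Correlation between the series and itself shifted back by `lag` steps."""
    return pearson(series[lag:], series[:-lag])

# Toy flow series that repeats every 4 steps (illustrative values only).
series = [1, 5, 9, 5] * 3
full_period = lag_correlation(series, 4)
half_period = lag_correlation(series, 2)
```

On this toy series the correlation is strongly positive at the full period and strongly negative at half the period, which is the kind of contrast a lag search would look for.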


Author(s):  
Yangyang Zhao ◽  
Lu Ren ◽  
Zhenliang Ma ◽  
Xinguo Jiang

Short-term metro passenger flow prediction is vital for the operation and management of metro systems. Most studies focus on achieving higher prediction accuracy with statistical and machine learning methods, but little attention has been paid to the prioritization and selection of feature variables, especially for different metro station types. This study aims to analyze the effect of feature variables on the prediction results, and then select appropriate predictor variables accordingly. A novel three-stage framework is proposed to prioritize feature variables for short-term metro passenger flow prediction, comprising station clustering, feature extraction, and variable prioritization. A hierarchical clustering algorithm (AHC) is developed for station clustering, and its results are verified against K-means and the Davies-Bouldin (DB) statistical index. We then extract the temporal, spatial, and external features. Finally, the association between the variables and the prediction results is explored using tree-based models. The proposed framework is demonstrated and validated with data collected from the Shanghai Metro Automatic Fare Collection (AFC) system. The results highlight that the importance of feature variables varies between stations, while only a few variables explain most of the variation in the testing dataset; different feature variables lead to distinct differences in prediction accuracy, and simply adding more predictor variables does not necessarily lead to higher prediction accuracy. In addition, the station type and prediction type (i.e., tap-in and tap-out) have little influence on the selection of feature variables.
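
The Davies-Bouldin index mentioned above scores a clustering by comparing within-cluster scatter to between-centroid distance, with lower values indicating better-separated clusters. The sketch below uses the standard definition with Euclidean distance; the station coordinates are toy values and the function names are assumptions, not the authors' implementation.

```python
import math

def euclidean(p, q):
    """Euclidean distance between two points of equal dimension."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

def centroid(points):
    """Coordinate-wise mean of a list of points."""
    dim = len(points[0])
    return tuple(sum(p[i] for p in points) / len(points) for i in range(dim))

def davies_bouldin(clusters):
    """Mean over clusters of the worst (scatter_i + scatter_j) / centroid distance."""
    centers = [centroid(c) for c in clusters]
    scatter = [
        sum(euclidean(p, ctr) for p in c) / len(c)
        for c, ctr in zip(clusters, centers)
    ]
    k = len(clusters)
    total = 0.0
    for i in range(k):
        total += max(
            (scatter[i] + scatter[j]) / euclidean(centers[i], centers[j])
            for j in range(k)
            if j != i
        )
    return total / k

# Toy feature vectors for four stations, grouped two different ways.
well_separated = [[(0, 0), (0, 2)], [(10, 0), (10, 2)]]
poorly_separated = [[(0, 0), (10, 0)], [(0, 2), (10, 2)]]
db_good = davies_bouldin(well_separated)
db_bad = davies_bouldin(poorly_separated)
```

Comparing the index across candidate groupings of the same stations is one way to check a clustering result, as the abstract describes.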


Author(s):  
Trinh Dinh Toan ◽  
Viet-Hung Truong

Short-term prediction of traffic flow is essential for the deployment of intelligent transportation systems. In this paper we present an efficient method for short-term traffic flow prediction using a Support Vector Machine (SVM) and compare it with baseline methods, including the Historical Average, the Current Time Based, and the Double Exponential Smoothing predictors. To demonstrate the efficiency and accuracy of the SVM method, we used one month of time-series traffic flow data from a segment of the Pan Island Expressway in Singapore to train and test the model. The results show that the SVM method significantly outperforms the baseline methods for most prediction intervals, and under various traffic conditions, for the rolling horizon of 30 min. In investigating the effect of the input-data dimension on prediction accuracy, we found that the rolling horizon has a clear effect on the SVM’s prediction accuracy: for rolling horizons of 30–60 min, the longer the rolling horizon, the more accurate the SVM prediction. To improve the SVM’s training performance, we investigate the application of the k-Nearest Neighbor method for SVM training using both actual data and simulated incident data. The results show that the k-Nearest Neighbor method substantially reduces the SVM training size, which accelerates training without compromising predictive performance.
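
Two of the baseline predictors named above can be sketched briefly. The double exponential smoothing shown here is Holt's linear-trend form with fixed smoothing constants, which is an assumption: the abstract does not say which variant or parameters were used. The function names and input values are illustrative only.

```python
def holt_forecast(series, alpha=0.5, beta=0.5, horizon=1):
    """Forecast `horizon` steps ahead with Holt's linear-trend smoothing."""
    if len(series) < 2:
        raise ValueError("need at least two observations")
    level = series[0]
    trend = series[1] - series[0]  # initial trend from the first two points
    for value in series[1:]:
        previous_level = level
        level = alpha * value + (1 - alpha) * (level + trend)
        trend = beta * (level - previous_level) + (1 - beta) * trend
    return level + horizon * trend

def historical_average(history_by_day, slot):
    """Baseline: mean flow observed in the same time slot on previous days."""
    values = [day[slot] for day in history_by_day]
    return sum(values) / len(values)

# Toy flow counts that rise by 2 each step.
flow = [10, 12, 14, 16]
next_step = holt_forecast(flow)
two_steps = holt_forecast(flow, horizon=2)
same_slot_mean = historical_average([[5, 7], [9, 11]], slot=1)
```

On a perfectly linear series this smoother reproduces the trend exactly, which makes it a convenient sanity check before applying it to real flow data.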

