Short-Term Forecasting of Railway Passenger Flow Based on Clustering of Booking Curves

2014 ◽  
Vol 2014 ◽  
pp. 1-8 ◽  
Author(s):  
Minshu Ma ◽  
Jun Liu ◽  
Jingjia Cao

For railway companies, the benefits from revenue management activities, like inventory control, dynamic pricing, and so forth, rely heavily on the accuracy of the short-term forecasting of the passenger flow. In this paper, based on the analysis of the relevance between final booking amounts and shapes of the booking curves, a novel short-term forecasting approach, which employs a specifically designed clustering algorithm and the data of both historical booking records and the bookings on hand, is proposed. The empirical study with real data sets from Chinese railway shows that the proposed approach outperforms the advanced pickup model (one of the most popular models in practice) during the early and middle stages of booking horizon when bookings are not concentrated in the final days before departure.
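For context on the baseline named above, the following is a minimal sketch of the classical additive pickup model, which adds the average historical "pickup" (bookings gained between the current reading day and departure) to the bookings on hand. It is not the advanced pickup variant or the paper's clustering approach; the function name `pickup_forecast` and the sample booking curves are hypothetical.

```python
from statistics import mean


def pickup_forecast(history, on_hand, day_index):
    """Classical additive pickup forecast of final bookings.

    history   -- complete historical booking curves; each curve is a list of
                 cumulative bookings whose last entry is the final booking amount
    on_hand   -- cumulative bookings observed so far for the departure to forecast
    day_index -- position in the booking curve of the current reading day
    """
    # Bookings each historical departure still gained after day_index.
    pickups = [curve[-1] - curve[day_index] for curve in history]
    # Forecast = bookings on hand + average historical pickup.
    return on_hand + mean(pickups)


# Hypothetical cumulative booking curves (illustrative numbers only).
history = [
    [10, 30, 60, 100],
    [20, 40, 70, 120],
    [15, 35, 65, 110],
]
forecast = pickup_forecast(history, on_hand=38, day_index=1)
print(forecast)
```

A clustering-based approach, as the abstract describes it, would presumably restrict `history` to curves whose shape resembles the current one before forecasting; that selection step is not shown here.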

Complexity ◽  
2018 ◽  
Vol 2018 ◽  
pp. 1-16 ◽  
Author(s):  
Yiwen Zhang ◽  
Yuanyuan Zhou ◽  
Xing Guo ◽  
Jintao Wu ◽  
Qiang He ◽  
...  

The K-means algorithm is one of the ten classic algorithms in the area of data mining and has been studied by researchers in numerous fields for a long time. However, the number of clusters k in the K-means algorithm is not always easy to determine, and the selection of the initial centers is vulnerable to outliers. This paper proposes an improved K-means clustering algorithm called the covering K-means algorithm (C-K-means). The C-K-means algorithm not only produces efficient and accurate clustering results but also self-adaptively provides a reasonable number of clusters based on the data features. It includes two phases: the initialization by the covering algorithm (CA) and the Lloyd iteration of K-means. The first phase executes the CA, which self-organizes and recognizes the number of clusters k based on the similarities in the data; it requires neither the number of clusters to be prespecified nor the initial centers to be manually selected. It therefore has a "blind" feature, that is, k is not preselected. The second phase performs the Lloyd iteration based on the results of the first phase. The C-K-means algorithm thus combines the advantages of CA and K-means. Experiments carried out on the Spark platform verify the good scalability of the C-K-means algorithm, which can effectively handle large-scale data clustering. Extensive experiments on real data sets show that the C-K-means algorithm outperforms existing algorithms in accuracy and efficiency under both sequential and parallel conditions.
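The second phase described above, the Lloyd iteration, can be sketched as follows. This is a minimal plain-Python illustration that assumes the initial centers are already available (in C-K-means they would come from the covering algorithm, which the abstract does not specify and which is not reproduced here). The function name `lloyd` and the sample points are hypothetical.

```python
import math


def lloyd(points, centers, max_iter=100):
    """Run Lloyd iterations from the given initial centers (max_iter >= 1)."""
    centers = [tuple(c) for c in centers]
    for _ in range(max_iter):
        # Assignment step: attach each point to its nearest center.
        clusters = [[] for _ in centers]
        for p in points:
            nearest = min(range(len(centers)),
                          key=lambda i: math.dist(p, centers[i]))
            clusters[nearest].append(p)
        # Update step: move each center to the mean of its cluster;
        # an empty cluster keeps its previous center.
        new_centers = [
            tuple(sum(coord) / len(cluster) for coord in zip(*cluster))
            if cluster else center
            for cluster, center in zip(clusters, centers)
        ]
        if new_centers == centers:  # no center moved: converged
            break
        centers = new_centers
    return centers, clusters


# Hypothetical data: two well-separated groups and two initial centers.
points = [(0, 0), (0, 1), (10, 10), (10, 11)]
centers, clusters = lloyd(points, centers=[(0, 0), (10, 10)])
print(centers)
```

Because Lloyd iterations only refine the centers they are given, the quality of the result depends on the initialization, which is the part the abstract assigns to the covering phase.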


2021 ◽  
pp. 1-18
Author(s):  
Angeliki Koutsimpela ◽  
Konstantinos D. Koutroumbas

Several well-known clustering algorithms have online counterparts that deal effectively with big data, as well as with the case where the data become available in a streaming fashion. However, very few of them follow the stochastic gradient descent philosophy, even though it enjoys certain practical advantages (such as the possibility of (a) running faster than batch processing counterparts and (b) escaping from local minima of the associated cost function) and strong theoretical convergence results have been established for it. In this paper, a novel stochastic gradient descent possibilistic clustering algorithm, called O-PCM2, is introduced. The algorithm is presented in detail, and it is rigorously proved that the gradient of the associated cost function tends to zero in the L2 sense, based on general convergence results established for the family of stochastic gradient descent algorithms. Furthermore, an additional discussion is provided on the nature of the points to which the algorithm may converge. Finally, the performance of the proposed algorithm is tested against other related algorithms on both synthetic and real data sets.
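The abstract gives no update equations, so the following is not the paper's possibilistic algorithm. As a hedged illustration of the general stochastic gradient descent philosophy in clustering, it shows hard-assignment online k-means, where each arriving sample moves only its nearest center by one gradient step on the cost 0.5 * ||x - c||^2. The function name `online_kmeans_step`, the 1/count learning rate, and the sample stream are assumptions made for the example.

```python
import math


def online_kmeans_step(centers, counts, x):
    """Update the center nearest to sample x in place; return its index."""
    j = min(range(len(centers)), key=lambda i: math.dist(x, centers[i]))
    counts[j] += 1
    rate = 1.0 / counts[j]  # decreasing learning rate (an assumed schedule)
    # The gradient of 0.5 * ||x - c||^2 with respect to c is (c - x).
    centers[j] = [c - rate * (c - xi) for c, xi in zip(centers[j], x)]
    return j


# Hypothetical stream; each initial center counts as one already-seen sample.
centers = [[0.0, 0.0], [10.0, 10.0]]
counts = [1, 1]
for sample in [(1.0, 1.0), (9.0, 9.0), (2.0, 2.0)]:
    online_kmeans_step(centers, counts, sample)
print(centers)
```

With the 1/count rate, each center equals the running mean of the samples assigned to it, so data can be processed one point at a time without storing the stream.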


2019 ◽  
Vol 20 (10) ◽  
pp. 3613-3622 ◽  
Author(s):  
Liyang Tang ◽  
Yang Zhao ◽  
Javier Cabrera ◽  
Jian Ma ◽  
Kwok Leung Tsui

2011 ◽  
Vol 34 (7) ◽  
pp. 850-861 ◽  
Author(s):  
Guan Yuan ◽  
Shixiong Xia ◽  
Lei Zhang ◽  
Yong Zhou ◽  
Cheng Ji

With the development of location-based services, such as the Global Positioning System and Radio Frequency Identification, a great deal of trajectory data can be collected. How to mine knowledge from these data has therefore become an attractive topic. In this paper, we propose an efficient trajectory-clustering algorithm based on an index tree. Firstly, an index tree is proposed to store trajectories and their similarity matrix, with which trajectories can be retrieved efficiently; secondly, a new concept of trajectory structure is introduced to analyse both the internal and external features of trajectories; then, trajectories are partitioned into trajectory segments according to their corners; furthermore, the similarity between every pair of trajectory segments is computed with a structural similarity function; finally, trajectory segments are grouped into different clusters according to their location in the different levels of the index tree. Experimental results on real data sets demonstrate not only the efficiency and effectiveness of our algorithm but also its flexibility: feature sensitivity can be adjusted through different parameters, and the clustering results are of greater practical significance.


2017 ◽  
Vol 2017 ◽  
pp. 1-9 ◽  
Author(s):  
Qingying Lai ◽  
Jun Liu ◽  
Yongji Luo ◽  
Minshu Ma

Short-term forecasting of OD (origin to destination) passenger flow on high-speed rail (HSR) is one of the critical tasks in rail traffic management. This paper proposes a hybrid model to explore the impact of the train service frequency (TSF) of the HSR on the passenger flow. The model is composed of two parts. One is the Holt-Winters model, which takes advantage of the time series characteristics of passenger flow. The other considers the changes in TSF for the OD at different times of the day. The two models are integrated by the minimum absolute value method to generate the final hybrid model. Operational data of the Beijing-Shanghai high-speed railway from 2012 to 2016 are used to verify the effectiveness of the model. In addition to its forecasting ability, the proposed model, with its explicit form, can be further used to forecast the effects of the TSF.
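The Holt-Winters component mentioned above can be sketched as follows. This is a minimal additive Holt-Winters implementation under stated assumptions: the initialization from the first two seasons, the smoothing parameters, and the sample series are all illustrative choices, and the TSF component and the integration step of the hybrid model are not reproduced. The function name `holt_winters_additive` is hypothetical.

```python
def holt_winters_additive(series, period, alpha, beta, gamma, horizon):
    """Additive Holt-Winters forecast; needs at least two full seasons."""
    first, second = series[:period], series[period:2 * period]
    # Simple initialization (one of several common choices).
    level = sum(first) / period
    trend = (sum(second) - sum(first)) / period ** 2
    seasonal = [y - level for y in first]

    for t in range(period, len(series)):
        y, s = series[t], seasonal[t % period]
        prev_level = level
        # Level: deseasonalized observation blended with the previous projection.
        level = alpha * (y - s) + (1 - alpha) * (prev_level + trend)
        # Trend: change in level blended with the previous trend.
        trend = beta * (level - prev_level) + (1 - beta) * trend
        # Seasonal: detrended observation blended with last season's value.
        seasonal[t % period] = gamma * (y - level) + (1 - gamma) * s

    n = len(series)
    return [level + h * trend + seasonal[(n + h - 1) % period]
            for h in range(1, horizon + 1)]


# Hypothetical flow with a period of 3 and no trend (illustrative numbers).
flow = [10, 20, 30, 10, 20, 30, 10, 20, 30]
forecasts = holt_winters_additive(flow, period=3, alpha=0.5, beta=0.1,
                                  gamma=0.1, horizon=3)
print(forecasts)
```

On a purely periodic series such as this one, the forecast repeats the seasonal pattern; real passenger flow data would require the smoothing parameters to be fitted.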


2013 ◽  
Vol 409-410 ◽  
pp. 1071-1074
Author(s):  
Xiu Shan Jiang ◽  
Rui Feng Zhang ◽  
Liang Pan

Taking the Wuhan-Guangzhou high-speed railway as an example, this paper applies empirical mode decomposition (EMD) to analyze the modes of the high-speed railway passenger flow fluctuation signal from the perspective of volatility. It constructs an ensemble empirical mode decomposition-gray support vector machine (EEMD-GSVM) short-term forecasting model, which fuses gray generation and the support vector machine with ensemble empirical mode decomposition (EEMD). Finally, the accuracy of the predicted results shows that the EEMD-GSVM model has better adaptability.


2021 ◽  
Vol 24 (1) ◽  
pp. 42-47
Author(s):  
N. P. Koryshev ◽  
I. A. Hodashinsky ◽  

The article describes an algorithm for generating fuzzy rules for a fuzzy classifier using data clustering, a metaheuristic, and a clustering quality index, and reports the results of performance testing on real data sets.


2021 ◽  
Vol 3 (1) ◽  
pp. 1-7
Author(s):  
Yadgar Sirwan Abdulrahman

Clustering is one of the essential strategies in data analysis. Classical solutions assume that all features contribute equally to the clustering. In real data sets, however, some features are more important than others, so the essential features should have a greater impact on identifying optimal clusters. In this article, a fuzzy clustering algorithm with local automatic weighting is presented. The proposed algorithm has several advantages: 1) the feature weights act locally, meaning that each cluster's weights differ from those of the other clusters; 2) the distance between samples is calculated with a non-Euclidean similarity criterion to reduce the effect of noise; 3) the feature weights are obtained comparatively during the learning process. Mathematical analyses were carried out to obtain the cluster centers and the feature weights. Experiments on a range of data sets demonstrate the efficiency of the proposed algorithm compared with other algorithms that use global and local feature weighting.


2021 ◽  
Vol 37 (1) ◽  
pp. 71-89
Author(s):  
Vu-Tuan Dang ◽  
Viet-Vu Vu ◽  
Hong-Quan Do ◽  
Thi Kieu Oanh Le

During the past few years, semi-supervised clustering has emerged as an interesting new direction in machine learning research. In a semi-supervised clustering algorithm, the clustering results can be significantly improved by using side information, which is either already available or collected from users. There are two main kinds of side information that can be used in semi-supervised clustering algorithms: class labels (called seeds) and pairwise constraints. The first semi-supervised clustering algorithm was introduced in 2000, and since then many algorithms have been presented in the literature. However, it is not easy to use both types of side information in the same algorithm. To address this problem, this paper proposes a semi-supervised graph-based clustering algorithm, called MCSSGC, that uses both seeds and constraints in the clustering process. Moreover, we introduce a simple but efficient active learning method, named KMMFFQS, to collect constraints that can boost the performance of MCSSGC. To verify the effectiveness of the proposed algorithm, we conducted a series of experiments not only on real data sets from UCI but also on a document data set used in information extraction from Vietnamese documents. The results show that the proposed algorithm can significantly improve the clustering process compared with some recent algorithms.


2020 ◽  
Vol 2020 ◽  
pp. 1-15
Author(s):  
Zhe Zhang ◽  
Cheng Wang ◽  
Yueer Gao ◽  
Jianwei Chen ◽  
Yiwen Zhang

To solve the problems of current short-term forecasting methods for metro passenger flow, such as unclear influencing factors, low accuracy, and high time-space complexity, a forecasting method for metro passenger flow based on ST-LightGBM that takes transfer passenger flow into account is proposed. Firstly, historical data are used as the training set to transform the problem into a data-driven multi-input single-output regression prediction problem; the short-term prediction of metro passenger flow is thereby formalized and its difficulties are identified. Secondly, we extract the candidate temporal and spatial features that may affect passenger flow at a metro station from passenger travel data, based on the spatial transfer and spatial similarity of passenger flow. Thirdly, we use a maximal information coefficient (MIC) feature selection algorithm to select the features with significant impact as the input. Finally, a short-term forecasting model for metro passenger flow based on the light gradient boosting machine (LightGBM) model is established. Taking transfer passenger flow into account, this method has a low space-time cost and high accuracy. The experimental results on the dataset of Lianban metro station in Xiamen city show that the proposed method obtains higher prediction accuracy than SARIMA, SVR, and BP network.

