Short-Term Forecasting of Railway Passenger Flow Based on Clustering of Booking Curves

For railway companies, the benefits of revenue management activities such as inventory control and dynamic pricing rely heavily on the accuracy of short-term passenger flow forecasts. Based on an analysis of the relationship between final booking amounts and the shapes of booking curves, this paper proposes a novel short-term forecasting approach that employs a specifically designed clustering algorithm and uses both historical booking records and the bookings on hand. An empirical study with real data sets from the Chinese railway shows that the proposed approach outperforms the advanced pickup model (one of the most popular models in practice) during the early and middle stages of the booking horizon when bookings are not concentrated in the final days before departure.
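
For orientation, the sketch below shows the classical additive pickup forecast, which is a simpler relative of the advanced pickup baseline named in the abstract and not the paper's clustering approach. The function name pickup_forecast, the checkpoint layout, and the sample numbers are assumptions made for illustration only.

```python
from statistics import mean


def pickup_forecast(history, on_hand, day_index):
    """Classical additive pickup forecast (illustrative baseline only).

    history   : complete historical booking curves; each curve lists cumulative
                bookings per checkpoint, and its last value is the final count.
    on_hand   : cumulative bookings of the target departure at checkpoint day_index.
    day_index : position of the current checkpoint within a booking curve.
    """
    # Bookings each historical departure still "picked up" after this checkpoint.
    pickups = [curve[-1] - curve[day_index] for curve in history]
    # Forecast = bookings on hand + average remaining pickup.
    return on_hand + mean(pickups)


# Hypothetical data: three past departures observed at four checkpoints.
history = [
    [10, 40, 80, 100],
    [20, 50, 90, 120],
    [0, 30, 70, 110],
]
forecast = pickup_forecast(history, on_hand=45, day_index=1)
print(forecast)
```

A clustering-based variant would presumably restrict the averaging to historical curves whose shape resembles the curve observed so far; how the paper does this is not described in the abstract.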

Self-Adaptive K-Means Based on a Covering Algorithm

Complexity ◽

10.1155/2018/7698274 ◽

2018 ◽

Vol 2018 ◽

pp. 1-16 ◽

Cited By ~ 1

Author(s):

Yiwen Zhang ◽

Yuanyuan Zhou ◽

Xing Guo ◽

Jintao Wu ◽

Qiang He ◽

...

Keyword(s):

Large Scale ◽

Clustering Algorithm ◽

Real Data ◽

Second Phase ◽

Data Sets ◽

Number Of Clusters ◽

Large Scale Data ◽

Long Time ◽

Two Phases ◽

Selection Of

The K-means algorithm is one of the ten classic algorithms in data mining and has long been studied by researchers in numerous fields. However, the number of clusters k in the K-means algorithm is not always easy to determine, and the selection of the initial centers is vulnerable to outliers. This paper proposes an improved K-means clustering algorithm called the covering K-means algorithm (C-K-means). The C-K-means algorithm not only produces efficient and accurate clustering results but also self-adaptively provides a reasonable number of clusters based on the data features. It includes two phases: the initialization by the covering algorithm (CA) and the Lloyd iteration of K-means. The first phase executes the CA. The CA self-organizes and recognizes the number of clusters k based on the similarities in the data, and it requires neither the number of clusters to be prespecified nor the initial centers to be manually selected. Therefore, it has a “blind” feature, that is, k is not preselected. The second phase performs the Lloyd iteration based on the results of the first phase. The C-K-means algorithm combines the advantages of CA and K-means. Experiments are carried out on the Spark platform, and the results verify the good scalability of the C-K-means algorithm, which can effectively handle large-scale data clustering. Extensive experiments on real data sets show that the C-K-means algorithm outperforms existing algorithms in accuracy and efficiency under both sequential and parallel conditions.
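
The two-phase structure can be sketched as follows. The first phase here is a simplified stand-in (a point becomes a new center when no existing center lies within a given radius), not the paper's covering algorithm, whose details the abstract does not give. The names covering_init and lloyd and the radius parameter are assumptions.

```python
import math


def covering_init(points, radius):
    """Stand-in for phase 1: derive centers (and thus k) from the data.

    A point becomes a new center if it is farther than `radius` from every
    center found so far. This is NOT the paper's covering algorithm.
    """
    centers = []
    for p in points:
        if all(math.dist(p, c) > radius for c in centers):
            centers.append(p)
    return centers


def lloyd(points, centers, iterations=10):
    """Phase 2: standard Lloyd iterations starting from the given centers."""
    for _ in range(iterations):
        # Assignment step: attach every point to its nearest center.
        clusters = [[] for _ in centers]
        for p in points:
            nearest = min(range(len(centers)), key=lambda i: math.dist(p, centers[i]))
            clusters[nearest].append(p)
        # Update step: move each center to the mean of its members
        # (an empty cluster keeps its previous center).
        centers = [
            tuple(sum(coord) / len(members) for coord in zip(*members)) if members else c
            for members, c in zip(clusters, centers)
        ]
    return centers


points = [(0.0, 0.0), (0.2, 0.1), (0.1, 0.3), (5.0, 5.0), (5.2, 4.9), (4.9, 5.3)]
initial = covering_init(points, radius=2.0)  # k is found from the data, not preset
final = lloyd(points, initial)
print(len(final), final)
```

Note that the stand-in still needs a radius, so it only illustrates how an initialization phase can hand both k and the starting centers to the Lloyd iteration.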

A new stochastic gradient descent possibilistic clustering algorithm

AI Communications ◽

10.3233/aic-210125 ◽

2021 ◽

pp. 1-18

Author(s):

Angeliki Koutsimpela ◽

Konstantinos D. Koutroumbas

Keyword(s):

Cost Function ◽

Gradient Descent ◽

Clustering Algorithm ◽

Clustering Algorithms ◽

Real Data ◽

Stochastic Gradient ◽

Stochastic Gradient Descent ◽

Data Sets ◽

Convergence Results ◽

Possibilistic Clustering

Several well-known clustering algorithms have online counterparts that deal effectively with big data and with data that become available in a streaming fashion. However, very few of them follow the stochastic gradient descent philosophy, even though it enjoys certain practical advantages (such as the possibility of (a) running faster than batch processing counterparts and (b) escaping from local minima of the associated cost function) and strong theoretical convergence results have been established for it. In this paper, a novel stochastic gradient descent possibilistic clustering algorithm, called O-PCM2, is introduced. The algorithm is presented in detail, and it is rigorously proved that the gradient of the associated cost function tends to zero in the L2 sense, based on general convergence results established for the family of stochastic gradient descent algorithms. Furthermore, the nature of the points to which the algorithm may converge is discussed. Finally, the performance of the proposed algorithm is tested against other related algorithms on both synthetic and real data sets.
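
The following is a minimal sketch of a stochastic gradient descent update for possibilistic clustering, not the algorithm proposed in the paper. It assumes one-dimensional data, a possibilistic cost whose optimal membership is u = exp(-d^2 / eta), a fixed eta, and a constant learning rate. The names sgd_possibilistic, eta, and lr are illustrative.

```python
import math
import random


def sgd_possibilistic(stream, centers, eta, lr=0.1):
    """Process samples one at a time and nudge every center toward each sample.

    Assumed cost per sample and cluster: -eta * exp(-(x - theta)^2 / eta).
    Its gradient with respect to theta is -2 * u * (x - theta), where
    u = exp(-(x - theta)^2 / eta); the constant 2 is absorbed into lr.
    """
    centers = list(centers)
    for x in stream:
        for j, theta in enumerate(centers):
            u = math.exp(-((x - theta) ** 2) / eta)    # membership of x in cluster j
            centers[j] = theta + lr * u * (x - theta)  # descent step on the cost
    return centers


# Synthetic stream: two well-separated groups around 0 and 5 (fixed seed).
rng = random.Random(0)
stream = [rng.gauss(0.0, 0.3) for _ in range(500)] + [rng.gauss(5.0, 0.3) for _ in range(500)]
rng.shuffle(stream)

centers = sgd_possibilistic(stream, centers=[1.0, 4.0], eta=1.0)
print(centers)
```

Because distant samples receive a membership close to zero, each center is moved almost exclusively by the samples near it, which is what makes the update usable on a stream.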

Forecasting Short-Term Passenger Flow: An Empirical Study on Shenzhen Metro

IEEE Transactions on Intelligent Transportation Systems ◽

10.1109/tits.2018.2879497 ◽

2019 ◽

Vol 20 (10) ◽

pp. 3613-3622 ◽

Cited By ~ 13

Author(s):

Liyang Tang ◽

Yang Zhao ◽

Javier Cabrera ◽

Jian Ma ◽

Kwok Leung Tsui

Keyword(s):

Empirical Study ◽

Short Term ◽

Passenger Flow

An efficient trajectory-clustering algorithm based on an index tree

Transactions of the Institute of Measurement and Control ◽

10.1177/0142331211423284 ◽

2011 ◽

Vol 34 (7) ◽

pp. 850-861 ◽

Cited By ~ 15

Author(s):

Guan Yuan ◽

Shixiong Xia ◽

Lei Zhang ◽

Yong Zhou ◽

Cheng Ji

Keyword(s):

Radio Frequency Identification ◽

Clustering Algorithm ◽

Real Data ◽

Structural Similarity ◽

Location Based Services ◽

Similarity Function ◽

Data Sets ◽

Trajectory Clustering ◽

Trajectory Data ◽

Index Tree

With the development of location-based services, such as the Global Positioning System and Radio Frequency Identification, a great deal of trajectory data can be collected. Therefore, how to mine knowledge from these data has become an attractive topic. In this paper, we propose an efficient trajectory-clustering algorithm based on an index tree. Firstly, an index tree is proposed to store trajectories and their similarity matrix, with which trajectories can be retrieved efficiently; secondly, a new concept of trajectory structure is introduced to analyse both the internal and external features of trajectories; then, trajectories are partitioned into trajectory segments according to their corners; furthermore, the similarity between every pair of trajectory segments is computed with a structural similarity function; finally, trajectory segments are grouped into different clusters according to their location in the different levels of the index tree. Experimental results on real data sets demonstrate not only the efficiency and effectiveness of our algorithm, but also its flexibility: feature sensitivity can be adjusted through different parameters, and the clustering results are of greater practical significance.
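
The partitioning step can be illustrated with a small sketch that splits a trajectory wherever the heading changes sharply. The abstract does not define a corner, so the turning-angle rule, the function name partition_at_corners, and the 30 degree default threshold are assumptions.

```python
import math


def partition_at_corners(trajectory, angle_threshold_deg=30.0):
    """Split a 2D trajectory (at least two points) into segments at sharp turns.

    A point counts as a corner when the heading changes by more than
    angle_threshold_deg between the incoming and the outgoing step.
    The corner point closes one segment and opens the next.
    """
    segments = []
    current = [trajectory[0]]
    for prev, point, nxt in zip(trajectory, trajectory[1:], trajectory[2:]):
        current.append(point)
        heading_in = math.atan2(point[1] - prev[1], point[0] - prev[0])
        heading_out = math.atan2(nxt[1] - point[1], nxt[0] - point[0])
        turn = abs(math.degrees(heading_out - heading_in))
        turn = min(turn, 360.0 - turn)  # fold the turn into the range [0, 180]
        if turn > angle_threshold_deg:
            segments.append(current)
            current = [point]
    current.append(trajectory[-1])
    segments.append(current)
    return segments


# An L-shaped path: east for two steps, then north for two steps.
path = [(0, 0), (1, 0), (2, 0), (2, 1), (2, 2)]
segments = partition_at_corners(path)
print(segments)
```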

A Hybrid Short-Term Forecasting Model of Passenger Flow on High-Speed Rail considering the Impact of Train Service Frequency

Mathematical Problems in Engineering ◽

10.1155/2017/1828102 ◽

2017 ◽

Vol 2017 ◽

pp. 1-9 ◽

Cited By ~ 5

Author(s):

Qingying Lai ◽

Jun Liu ◽

Yongji Luo ◽

Minshu Ma

Keyword(s):

Hybrid Model ◽

Traffic Management ◽

High Speed ◽

Short Term ◽

High Speed Rail ◽

Passenger Flow ◽

Proposed Model ◽

Service Frequency ◽

Short Term Forecasting ◽

The Impact

Short-term forecasting of OD (origin to destination) passenger flow on high-speed rail (HSR) is one of the critical tasks in rail traffic management. This paper proposes a hybrid model to explore the impact of the train service frequency (TSF) of the HSR on the passenger flow. The model is composed of two parts. One is the Holt-Winters model, which takes advantage of the time series characteristics of passenger flow. The other part considers the changes in TSF for the OD at different times of day. The two models are integrated by the minimum absolute value method to generate the final hybrid model. The operational data of the Beijing-Shanghai high-speed railway from 2012 to 2016 are used to verify the effectiveness of the model. Beyond its forecasting ability, the proposed model has an explicit form and can therefore also be used to forecast the effects of the TSF.
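
As an illustration of the Holt-Winters component only, the sketch below implements additive Holt-Winters smoothing. The abstract does not say which variant or initialization the paper uses, and the TSF part and the combination step are not covered. The initialization shown is one common choice, and the smoothing parameters and sample series are arbitrary.

```python
def holt_winters_additive(series, season_length, alpha, beta, gamma, horizon):
    """Additive Holt-Winters forecast; needs at least two full seasons of data.

    alpha, beta, gamma smooth the level, the trend, and the seasonal terms.
    """
    m = season_length
    # Initial state taken from the first two seasons (one common choice).
    level = sum(series[:m]) / m
    trend = (sum(series[m:2 * m]) - sum(series[:m])) / (m * m)
    seasonals = [series[i] - level for i in range(m)]

    for t in range(m, len(series)):
        y = series[t]
        prev_level = level
        level = alpha * (y - seasonals[t % m]) + (1 - alpha) * (prev_level + trend)
        trend = beta * (level - prev_level) + (1 - beta) * trend
        seasonals[t % m] = gamma * (y - level) + (1 - gamma) * seasonals[t % m]

    n = len(series)
    # h-step-ahead forecast: level + h * trend + matching seasonal term.
    return [level + h * trend + seasonals[(n + h - 1) % m] for h in range(1, horizon + 1)]


# Hypothetical flow with a repeating pattern of length 4 and no trend.
pattern = [12.0, 9.0, 8.0, 11.0]
series = pattern * 3
forecast = holt_winters_additive(series, season_length=4, alpha=0.3, beta=0.1, gamma=0.2, horizon=4)
print(forecast)
```

On a purely periodic series the forecast reproduces the pattern, which makes the example easy to check by hand.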

Short-Time Fluctuation Characteristic and Combined Forecasting of High-Speed Railway Passenger Flow Based on EEMD

Applied Mechanics and Materials ◽

10.4028/www.scientific.net/amm.409-410.1071 ◽

2013 ◽

Vol 409-410 ◽

pp. 1071-1074

Author(s):

Xiu Shan Jiang ◽

Rui Feng Zhang ◽

Liang Pan

Keyword(s):

Support Vector Machine ◽

Empirical Mode Decomposition ◽

High Speed ◽

Ensemble Empirical Mode Decomposition ◽

Support Vector ◽

High Speed Railway ◽

Passenger Flow ◽

Mode Decomposition ◽

Railway Passenger ◽

Short Term Forecasting

Taking the Wuhan-Guangzhou high-speed railway as an example, this paper uses empirical mode decomposition (EMD) to analyze the modes of the high-speed railway passenger flow fluctuation signal from the perspective of volatility. It then constructs an ensemble empirical mode decomposition-gray support vector machine (EEMD-GSVM) short-term forecasting model, which fuses gray generation and the support vector machine with ensemble empirical mode decomposition (EEMD). Finally, the accuracy of the predicted results indicates that the EEMD-GSVM model has better adaptability.

Algorithm to forming a rule base for a fuzzy classifier designed on the basis of the K-means clustering algorithm and the whale optimization algorithm

10.21293/1818-0442-2021-24-1-42-47 ◽

2021 ◽

Vol 24 (1) ◽

pp. 42-47

Author(s):

N. P. Koryshev ◽

I. A. Hodashinsky

Keyword(s):

Data Clustering ◽

Clustering Algorithm ◽

Performance Testing ◽

Real Data ◽

Rule Base ◽

Data Sets ◽

Fuzzy Classifier ◽

Whale Optimization ◽

Clustering Quality ◽

Using Data

The article describes an algorithm for generating fuzzy rules for a fuzzy classifier using data clustering, a metaheuristic, and a clustering quality index, and reports the results of performance testing on real data sets.

A new approach to the fuzzy c-means clustering algorithm by automatic weights and local clustering

10.24271/psr.18 ◽

2021 ◽

Vol 3 (1) ◽

pp. 1-7

Author(s):

Yadgar Sirwan Abdulrahman

Keyword(s):

Clustering Algorithm ◽

Similarity Criterion ◽

Real Data ◽

Well Being ◽

Classical Solutions ◽

Data Sets ◽

Data Set ◽

New Approach ◽

Fuzzy C Means Clustering ◽

Global And Local

Clustering is one of the essential strategies in data analysis. In classical solutions, all features are assumed to contribute equally to the data clustering. In real data sets, however, some features are more important than others, so essential features have a more significant impact on identifying optimal clusters. In this article, a fuzzy clustering algorithm with local automatic weighting is presented. The proposed algorithm has several advantages: 1) the weights act on features locally, meaning that each cluster's weights differ from those of the other clusters; 2) the distance between samples is calculated using a non-Euclidean similarity criterion to reduce the effect of noise; 3) the feature weights are obtained adaptively during the learning process. In this study, mathematical analyses were carried out to derive the cluster centers and the feature weights. Experiments were conducted on a range of data sets to demonstrate the efficiency of the proposed algorithm compared with other algorithms that use global and local feature weights.
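
To make the idea of cluster-local feature weights concrete, the sketch below computes standard fuzzy c-means memberships with a separate weight vector per cluster. It uses a weighted squared Euclidean distance as a stand-in, since the paper's non-Euclidean criterion and its weight update are not given in the abstract. The function name memberships and all sample values are assumptions.

```python
def memberships(points, centers, weights, m=2.0):
    """Fuzzy c-means membership degrees with one feature-weight vector per cluster.

    Distance of sample x to cluster j: sum_k weights[j][k] * (x[k] - centers[j][k])^2.
    Membership: u_j = d2_j^(-1/(m-1)) / sum_l d2_l^(-1/(m-1)), with fuzzifier m > 1.
    """
    result = []
    for x in points:
        d2 = [
            sum(w * (a - b) ** 2 for w, a, b in zip(wj, x, vj))
            for vj, wj in zip(centers, weights)
        ]
        if any(d == 0.0 for d in d2):
            # The sample coincides with one or more centers under their weights:
            # share full membership among those clusters.
            hits = [1.0 if d == 0.0 else 0.0 for d in d2]
            result.append([h / sum(hits) for h in hits])
            continue
        inv = [d ** (-1.0 / (m - 1.0)) for d in d2]
        total = sum(inv)
        result.append([v / total for v in inv])
    return result


centers = [(0.0, 0.0), (4.0, 4.0)]
# Cluster 0 ignores the second feature; cluster 1 weights both features equally.
weights = [(1.0, 0.0), (0.5, 0.5)]
u = memberships([(1.0, 2.0), (0.0, 3.0)], centers, weights)
print(u)
```

With these weights the sample (0.0, 3.0) belongs fully to cluster 0, because the only feature that cluster 0 considers matches its center exactly.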

GRAPH BASED CLUSTERING WITH CONSTRAINTS AND ACTIVE LEARNING

Journal of Computer Science and Cybernetics ◽

10.15625/1813-9663/37/1/15773 ◽

2021 ◽

Vol 37 (1) ◽

pp. 71-89

Author(s):

Vu-Tuan Dang ◽

Viet-Vu Vu ◽

Hong-Quan Do ◽

Thi Kieu Oanh Le

Keyword(s):

Active Learning ◽

Clustering Algorithm ◽

Side Information ◽

Clustering Algorithms ◽

Real Data ◽

Data Sets ◽

Data Set ◽

Supervised Clustering ◽

Class Labels ◽

Graph Based Clustering

During the past few years, semi-supervised clustering has emerged as an interesting new direction in machine learning research. In a semi-supervised clustering algorithm, the clustering results can be significantly improved by using side information, which is available or collected from users. There are two main kinds of side information that can be used in semi-supervised clustering algorithms: class labels (called seeds) and pairwise constraints. The first semi-supervised clustering algorithm was introduced in 2000, and since then many algorithms have been presented in the literature. However, it is not easy to use both types of side information in the same algorithm. To address this problem, this paper proposes a semi-supervised graph-based clustering algorithm, called MCSSGC, that uses both seeds and constraints in the clustering process. Moreover, we introduce a simple but efficient active learning method, named KMMFFQS, to collect constraints that can boost the performance of MCSSGC. To verify the effectiveness of the proposed algorithm, we conducted a series of experiments not only on real data sets from UCI but also on a document data set used for information extraction from Vietnamese documents. The results show that the proposed algorithm can significantly improve the clustering process compared to some recent algorithms.

Short-Term Passenger Flow Forecast of Rail Transit Station Based on MIC Feature Selection and ST-LightGBM considering Transfer Passenger Flow

Scientific Programming ◽

10.1155/2020/3180628 ◽

2020 ◽

Vol 2020 ◽

pp. 1-15

Author(s):

Zhe Zhang ◽

Cheng Wang ◽

Yueer Gao ◽

Jianwei Chen ◽

Yiwen Zhang

Keyword(s):

Feature Selection ◽

Gradient Boosting ◽

Short Term ◽

Bp Network ◽

Metro Station ◽

Passenger Flow ◽

Light Gradient ◽

Time Space ◽

Short Term Forecasting ◽

Maximal Information Coefficient

To address the problems of current short-term forecasting methods for metro passenger flow, such as unclear influencing factors, low accuracy, and high time-space complexity, a forecasting method based on ST-LightGBM that takes transfer passenger flow into account is proposed. Firstly, the short-term prediction of metro passenger flow is formalized as a data-driven multi-input single-output regression problem with historical data as the training set, and the difficulties of the problem are identified. Secondly, we extract the candidate temporal and spatial features that may affect passenger flow at a metro station from passenger travel data based on the spatial transfer and spatial similarity of passenger flow. Thirdly, we use a maximal information coefficient (MIC) feature selection algorithm to select the features with significant impact as the input. Finally, a short-term forecasting model for metro passenger flow based on the light gradient boosting machine (LightGBM) model is established. By taking transfer passenger flow into account, this method achieves a low space-time cost and high accuracy. The experimental results on the dataset of Lianban metro station in Xiamen show that the proposed method obtains higher prediction accuracy than SARIMA, SVR, and the BP network.
