An Ensemble Learning Model for Short-Term Passenger Flow Prediction

In recent years, the continuous improvement of urban public transportation capacity has made citizens’ travel increasingly convenient, but problems remain: morning and evening peak congestion, imbalance between vehicle supply and passenger demand, emergencies, and surges in local passenger flow caused by special circumstances such as public events and inclement weather. Guiding local passenger flow properly and deploying operating buses reasonably requires an understanding of how short-term public transportation passenger flow changes. This paper builds a short-term passenger flow prediction model for urban public transportation based on the idea of ensemble learning. The goal is to predict the short-term passenger flow of urban public transportation accurately with the ensemble model, using Multivariable Linear Regression (MLR), K-Nearest Neighbor (KNN), eXtreme Gradient Boosting (XGBoost), and Gated Recurrent Unit (GRU) as the four base models and a regression algorithm to combine their outputs. The model is applied to the passenger flow, station boarding and alighting, and cross-sectional passenger flow data of a typical representative line, Line 428 in the “Huitian Area” of Beijing, from January 1, 2020, to May 31, 2020. Finally, the prediction results of the base models are compared with those of the ensemble model to verify the superiority of the ensemble model. The research results of this paper can enrich the short-term passenger flow forecasting system of urban public transportation and provide effective data support and a scientific basis for the passenger flow management, vehicle management, and dispatch of urban public transportation.
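
The combination scheme described in this abstract, in which several models predict and a regression step merges their predictions, can be illustrated with a small stacking sketch. This is not the authors' implementation: the two toy models (a KNN over lag windows and a window mean) merely stand in for MLR, KNN, XGBoost, and GRU, the passenger counts are invented, and every function name is hypothetical.

```python
# Minimal stacking sketch in pure Python. Toy data and helper names are
# assumptions for illustration; they do not come from the paper.

def make_windows(series, lag):
    """Turn a series into (lag-window, next-value) pairs."""
    X = [series[i:i + lag] for i in range(len(series) - lag)]
    y = [series[i + lag] for i in range(len(series) - lag)]
    return X, y

def knn_predict(train_X, train_y, x, k=3):
    """Average the targets of the k training windows closest to x."""
    dists = sorted(
        (sum((a - b) ** 2 for a, b in zip(row, x)), target)
        for row, target in zip(train_X, train_y)
    )
    nearest = dists[:k]
    return sum(t for _, t in nearest) / len(nearest)

def mean_predict(x):
    """Naive model: the mean of the lag window."""
    return sum(x) / len(x)

def fit_meta(p1, p2, y):
    """Least-squares weights (w1, w2) for y ~ w1*p1 + w2*p2, no intercept."""
    a11 = sum(a * a for a in p1)
    a12 = sum(a * b for a, b in zip(p1, p2))
    a22 = sum(b * b for b in p2)
    b1 = sum(a * t for a, t in zip(p1, y))
    b2 = sum(b * t for b, t in zip(p2, y))
    det = a11 * a22 - a12 * a12
    if abs(det) < 1e-12:  # collinear predictions: fall back to equal weights
        return 0.5, 0.5
    return (b1 * a22 - b2 * a12) / det, (a11 * b2 - a12 * b1) / det

def sse(preds, targets):
    """Sum of squared errors."""
    return sum((p - t) ** 2 for p, t in zip(preds, targets))

# Invented passenger counts per time slot.
series = [20, 35, 80, 60, 30, 25, 40, 85, 65, 28, 22, 38, 90,
          62, 33, 24, 36, 82, 58, 31, 21, 37, 88, 64, 29]
X, y = make_windows(series, lag=3)

# First part trains the lower-level models, second part trains the combiner.
base_X, base_y = X[:12], y[:12]
meta_X, meta_y = X[12:18], y[12:18]
test_X, test_y = X[18:], y[18:]

meta_p1 = [knn_predict(base_X, base_y, x) for x in meta_X]
meta_p2 = [mean_predict(x) for x in meta_X]
w1, w2 = fit_meta(meta_p1, meta_p2, meta_y)

def stacked_predict(x):
    """Combine the two model outputs with the learned weights."""
    return w1 * knn_predict(base_X, base_y, x) + w2 * mean_predict(x)

meta_stacked = [stacked_predict(x) for x in meta_X]
test_stacked = [stacked_predict(x) for x in test_X]
```

The combiner is fitted on a slice the lower-level models did not see, so its weights reflect out-of-sample behaviour rather than memorized targets. On that slice the combined squared error cannot exceed that of either model alone, because using a single model is one of the weightings the least-squares fit can choose; no such guarantee holds on the test slice.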

Short-Term Passenger Flow Prediction in Urban Public Transport: Kalman Filtering Combined K-Nearest Neighbor Approach

IEEE Access ◽

10.1109/access.2019.2937114 ◽

2019 ◽

Vol 7 ◽

pp. 120937-120949 ◽

Cited By ~ 7

Author(s):

Shidong Liang ◽

Minghui Ma ◽

Shengxue He ◽

Hu Zhang

Keyword(s):

Kalman Filtering ◽

Public Transport ◽

Nearest Neighbor ◽

K Nearest Neighbor ◽

Short Term ◽

Passenger Flow ◽

Flow Prediction ◽

Urban Public Transport

LSTM Based Architecture for Short-Term Metro Passenger Flow Prediction

CICTP 2020 ◽

10.1061/9780784483053.082 ◽

2020 ◽

Author(s):

Yunshi Long ◽

Liang Zou

Keyword(s):

Short Term ◽

Passenger Flow ◽

Flow Prediction

Evaluation of Three Different Machine Learning Methods for Object-Based Artificial Terrace Mapping—A Case Study of the Loess Plateau, China

Remote Sensing ◽

10.3390/rs13051021 ◽

2021 ◽

Vol 13 (5) ◽

pp. 1021

Author(s):

Hu Ding ◽

Jiaming Na ◽

Shangjing Jiang ◽

Jie Zhu ◽

Kai Liu ◽

...

Keyword(s):

Machine Learning ◽

Random Forest ◽

Loess Plateau ◽

Water Conservation ◽

Nearest Neighbor ◽

Gradient Boosting ◽

K Nearest Neighbor ◽

The Loess Plateau ◽

Object Based ◽

Extreme Gradient Boosting

Artificial terraces are of great importance for agricultural production and soil and water conservation. Automatic high-accuracy mapping of artificial terraces is the basis of monitoring and related studies. Previous research achieved artificial terrace mapping based on high-resolution digital elevation models (DEMs) or imagery. Because contextual information is important for terrace mapping, object-based image analysis (OBIA) combined with machine learning (ML) is widely used. However, the selection of an appropriate classifier is of great importance for the terrace mapping task. In this study, the performance of an integrated framework using OBIA and ML for terrace mapping was tested. A catchment, Zhifanggou, in the Loess Plateau, China, was used as the study area. First, optimized image segmentation was conducted. Then, features from the DEMs and imagery were extracted, and the correlations between the features were analyzed and ranked for classification. Finally, three commonly used ML classifiers, namely, extreme gradient boosting (XGBoost), random forest (RF), and k-nearest neighbor (KNN), were used for terrace mapping. The comparison with the ground truth, as delineated by field survey, indicated that random forest performed best, with a 95.60% overall accuracy (followed by 94.16% and 92.33% for XGBoost and KNN, respectively). The influence of class imbalance and feature selection is discussed. This work provides a credible framework for mapping artificial terraces.

Computational Intelligence-Based Model for Mortality Rate Prediction in COVID-19 Patients

International Journal of Environmental Research and Public Health ◽

10.3390/ijerph18126429 ◽

2021 ◽

Vol 18 (12) ◽

pp. 6429

Author(s):

Irfan Ullah Khan ◽

Nida Aslam ◽

Malak Aljabri ◽

Sumayh S. Aljameel ◽

Mariam Moataz Aly Kamaleldin ◽

...

Keyword(s):

Mortality Rate ◽

Computational Intelligence ◽

Nearest Neighbor ◽

Gradient Boosting ◽

K Nearest Neighbor ◽

Detection And Identification ◽

Proposed Model ◽

Extreme Gradient Boosting ◽

The World ◽

Detection And Diagnosis

The COVID-19 outbreak is currently one of the biggest challenges facing countries around the world. Millions of people have lost their lives due to COVID-19. Therefore, the accurate early detection and identification of severe COVID-19 cases can reduce the mortality rate and the likelihood of further complications. Machine Learning (ML) and Deep Learning (DL) models have been shown to be effective in the detection and diagnosis of several diseases, including COVID-19. This study used ML algorithms, namely Decision Tree (DT), Logistic Regression (LR), Random Forest (RF), Extreme Gradient Boosting (XGBoost), and K-Nearest Neighbor (KNN), together with a DL model (six layers with ReLU activation and an output layer with sigmoid activation), to predict the mortality rate in COVID-19 cases. Models were trained using data on confirmed COVID-19 patients from 146 countries. Comparative analysis was performed among the ML and DL models using a reduced feature set. The best results were achieved using the proposed DL model, with an accuracy of 0.97. Experimental results reveal the significance of the proposed model over the baseline study in the literature with the reduced feature set.

Estimating Express Train Preference of Urban Railway Passengers Based on Extreme Gradient Boosting (XGBoost) using Smart Card Data

Transportation Research Record Journal of the Transportation Research Board ◽

10.1177/03611981211013349 ◽

2021 ◽

pp. 036119812110133

Author(s):

Eun Hak Lee ◽

Kyoungtae Kim ◽

Seung-Young Kho ◽

Dong-Kyu Kim ◽

Shin-Hyung Cho

Keyword(s):

Travel Time ◽

Smart Card ◽

Public Transportation ◽

Gradient Boosting ◽

Log Data ◽

Smart Card Data ◽

Extreme Gradient Boosting ◽

Express Train ◽

Total Travel Time ◽

The Individual

As the share of public transport increases, the express strategy of the urban railway is regarded as one of the solutions that allow the public transportation system to operate efficiently. It is crucial for the urban railway’s express strategy to balance the passenger load between the two types of trains, that is, local and express trains. This research aims to estimate passengers’ preference between local and express trains based on a machine learning technique. Extreme gradient boosting (XGBoost) is trained to model express train preference using smart card and train log data. The passengers are categorized into four types according to their preference for the local and express trains. The smart card data and train log data of Metro Line 9 in Seoul are combined to generate the individual trip chain alternatives for each passenger. With the dataset, the train preference is estimated by XGBoost, and Shapley additive explanations (SHAP) is used to interpret and analyze the importance of individual features. The overall F1 score of the model is estimated to be 0.982. The results of feature analysis show that the feature for total travel time on the local train substantially affects the probability of express train preference, with a SHAP value of 1.871. As a result, the probability of the express train preference increases with longer total travel time, shorter in-vehicle time, shorter waiting time, and fewer transfers on the passenger’s route. The model shows notable performance in accuracy and provides an understanding of the estimation results.
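
For readers unfamiliar with the F1 score reported above, the sketch below computes it for a single class from counts of true positives, false positives, and false negatives. The abstract does not say how the "overall" F1 was aggregated across the four passenger types, so the single-class form and the example counts here are assumptions for illustration only.

```python
def f1_score(tp, fp, fn):
    """F1 for one class: harmonic mean of precision and recall."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# Invented counts: 8 correct positives, 2 false alarms, 2 misses.
example_f1 = f1_score(tp=8, fp=2, fn=2)
```

With these counts precision and recall are both 0.8, so the F1 score is also 0.8.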

Short-term Passenger Flow Prediction on Bus Stop Based on Hybrid Model

Proceedings of the 2017 2nd International Conference on Electrical, Control and Automation Engineering (ECAE 2017) ◽

10.2991/ecae-17.2018.74 ◽

2018 ◽

Author(s):

Zhijian Wang ◽

Chunlei Yang ◽

Chao Zang

Keyword(s):

Hybrid Model ◽

Short Term ◽

Bus Stop ◽

Passenger Flow ◽

Flow Prediction

Learning Geo-Contextual Embeddings for Commuting Flow Prediction

Proceedings of the AAAI Conference on Artificial Intelligence ◽

10.1609/aaai.v34i01.5425 ◽

2020 ◽

Vol 34 (01) ◽

pp. 808-816

Author(s):

Zhicheng Liu ◽

Fabio Miranda ◽

Weiting Xiong ◽

Junyan Yang ◽

Qiao Wang ◽

...

Keyword(s):

New York ◽

Real World ◽

Policy Development ◽

Contextual Information ◽

Supply And Demand ◽

Gradient Boosting ◽

Spatial Correlations ◽

Commuting Flows ◽

Flow Prediction ◽

Conventional Models

Predicting commuting flows based on infrastructure and land-use information is critical for urban planning and public policy development. However, it is a challenging task given the complex patterns of commuting flows. Conventional models, such as the gravity model, are mainly derived from physics principles and are limited in their predictive power in real-world scenarios where many factors need to be considered. Meanwhile, most existing machine learning-based methods ignore the spatial correlations and fail to model the influence of nearby regions. To address these issues, we propose the Geo-contextual Multitask Embedding Learner (GMEL), a model that captures the spatial correlations from geographic contextual information for commuting flow prediction. Specifically, we first construct a geo-adjacency network containing the geographic contextual information. Then, an attention mechanism is proposed based on the framework of the graph attention network (GAT) to capture the spatial correlations and encode geographic contextual information into an embedding space. Two separate GATs are used to model supply and demand characteristics. To enhance the effectiveness of the embedding representation, a multitask learning framework is used to introduce stronger restrictions, forcing the embeddings to encapsulate an effective representation for flow prediction. Finally, a gradient boosting machine is trained on the learned embeddings to predict commuting flows. We evaluate our model using a real-world dataset from New York City, and the experimental results demonstrate the effectiveness of our proposed method against the state of the art.

Bacterial Immunogenicity Prediction by Machine Learning Methods

Vaccines ◽

10.3390/vaccines8040709 ◽

2020 ◽

Vol 8 (4) ◽

pp. 709

Author(s):

Ivan Dimitrov ◽

Nevena Zaharieva ◽

Irini Doytchinova

Keyword(s):

Machine Learning ◽

Nearest Neighbor ◽

Predictive Ability ◽

Initial Step ◽

Majority Voting ◽

Gradient Boosting ◽

Support Vector ◽

K Nearest Neighbor ◽

Test Set ◽

Extreme Gradient Boosting

The identification of protective immunogens is the most important and vigorous initial step in the long-lasting and expensive process of vaccine design and development. Machine learning (ML) methods are very effective in data mining and in the analysis of big data such as microbial proteomes. They are able to significantly reduce the experimental work for discovering novel vaccine candidates. Here, we applied six supervised ML methods (partial least squares-based discriminant analysis, k nearest neighbor (kNN), random forest (RF), support vector machine (SVM), random subspace method (RSM), and extreme gradient boosting (XGBoost)) to a set of 317 known bacterial immunogens and 317 bacterial non-immunogens and derived models for immunogenicity prediction. The models were validated by internal cross-validation in 10 groups from the training set and by the external test set. All of them showed good predictive ability, but the XGBoost model displayed the most prominent ability to identify immunogens, recognizing 84% of the known immunogens in the test set. The combined RSM-kNN model was the best in the recognition of non-immunogens, identifying 92% of them in the test set. The three best performing ML models (XGBoost, RSM-kNN, and RF) were implemented in the new version of the server VaxiJen, and the prediction of bacterial immunogens is now based on majority voting.
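
Majority voting over three classifiers, as mentioned at the end of this abstract, can be sketched in a few lines. The abstract does not describe how the server combines the votes in detail, so the function below, its name, and the example labels are assumptions for illustration rather than the VaxiJen implementation.

```python
from collections import Counter

def majority_vote(labels):
    """Return the label that most models predicted.

    With an odd number of binary classifiers there is always a strict
    majority; otherwise ties go to the label that was counted first.
    """
    return Counter(labels).most_common(1)[0][0]

# Hypothetical outputs of three models for one protein
# (1 = immunogen, 0 = non-immunogen).
votes = {"xgboost": 1, "rsm_knn": 0, "rf": 1}
decision = majority_vote(list(votes.values()))
```

Here two of the three hypothetical models predict an immunogen, so the combined decision is 1.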

Short-Term Abnormal Passenger Flow Prediction Based on the Fusion of SVR and LSTM

IEEE Access ◽

10.1109/access.2019.2907739 ◽

2019 ◽

Vol 7 ◽

pp. 42946-42955 ◽

Cited By ~ 21

Author(s):

Jianyuan Guo ◽

Zhen Xie ◽

Yong Qin ◽

Limin Jia ◽

Yaguan Wang

Keyword(s):

Short Term ◽

Passenger Flow ◽

Flow Prediction

A Comprehensive Comparative Analysis of the Basic Theory of the Short Term Bus Passenger Flow Prediction

Symmetry ◽

10.3390/sym10090369 ◽

2018 ◽

Vol 10 (9) ◽

pp. 369 ◽

Cited By ~ 4

Author(s):

Huawei Zhai ◽

Licheng Cui ◽

Yu Nie ◽

Xiaowei Xu ◽

Weishi Zhang

Keyword(s):

Real Time ◽

Public Transportation ◽

Transportation Systems ◽

Training Data ◽

Time Data ◽

Short Term ◽

Passenger Flow ◽

Modeling Process ◽

Automatic Passenger Counters ◽

Public Transportation Systems

To meet real-time public travel demand, bus operators need to adjust timetables promptly. Therefore, it is necessary to predict the variations of short-term passenger flow. With the help of advanced public transportation systems, a large amount of real-time passenger flow data is collected from automatic passenger counters, automatic fare collection systems, etc. Using these data, different kinds of methods have been proposed to predict future variations of short-term bus passenger flow. Based on their properties and background knowledge, these methods are classified into three categories: linear, nonlinear, and combined methods. Their performance is evaluated in detail with respect to prediction accuracy, the complexity of the training data structure, and the modeling process. For comparison, some long-term prediction methods are also analyzed briefly. Finally, the paper points out that, with the help of automatic technology, a large amount of passenger flow data will be collected, and that using big data technology to speed up data preprocessing and the modeling process may be a direction worthy of future study.
