Mode Choice Prediction using Machine Learning Technique for A Door-to-Door Journey in Kuantan City

A door-to-door journey in a public transportation system is a concept that is being promoted to encourage users to consider public transport as an important alternative. The door-to-door journey integrates the travel segments from home to destination, including all visible amenities. Users’ preferences regarding the travel time of these key segments need to be understood. Machine learning has been seen as a robust computational approach for forecasting travel mode choice. However, which model is the best predictor remains an open question. To address this issue, we employed several prominent machine learning models, specifically Random Forest (RF), Naïve Bayes (NB), Logistic Regression (LR), k-Nearest Neighbor (kNN) and Support Vector Machine (SVM), and compared their performance in predicting the travel mode choice of users in the city of Kuantan. The data were collected in Kuantan City via a Revealed/Stated Preferences (RP/SP) Survey between 8:00 AM and 5:00 PM on weekdays. The data were split in a ratio of 80:20 for training and testing before the models were evaluated. The results showed that Random Forest achieved classification accuracies of up to 68.3% and 61.3% on the training and testing data, respectively, outperforming the other evaluated machine learning models. In summary, Random Forest is considered the best predictor in this research for forecasting users’ mode choice in the city of Kuantan.
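
The evaluation procedure described in this abstract (an 80:20 train/test split followed by a comparison of classification accuracy) can be sketched in a few lines. This is only an illustration under stated assumptions: the toy data, the feature names (walk_minutes, wait_minutes), the labelling rule, and the choice of a hand-written kNN classifier are all hypothetical and are not taken from the study.

```python
import random
from collections import Counter

def split_80_20(rows, seed=0):
    """Shuffle a copy of rows and split it 80:20 into (train, test)."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    cut = (len(shuffled) * 4) // 5
    return shuffled[:cut], shuffled[cut:]

def knn_predict(train, features, k=3):
    """Majority label among the k training rows closest to features."""
    nearest = sorted(
        train,
        key=lambda row: sum((a - b) ** 2 for a, b in zip(row[0], features)),
    )[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]

def accuracy(train, rows, k=3):
    """Share of rows whose predicted label matches the recorded label."""
    hits = sum(knn_predict(train, x, k) == y for x, y in rows)
    return hits / len(rows)

# Hypothetical toy data: (walk_minutes, wait_minutes) -> chosen mode.
rng = random.Random(42)
rows = []
for _ in range(100):
    walk, wait = rng.uniform(0, 30), rng.uniform(0, 30)
    rows.append(((walk, wait), "private" if walk + wait > 30 else "public"))

train, test = split_80_20(rows)
print(f"train accuracy: {accuracy(train, train):.3f}")
print(f"test accuracy:  {accuracy(train, test):.3f}")
```

Reporting both figures, as the abstract does, makes any gap between training and testing accuracy visible; a large gap would suggest overfitting.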

Travel Mode Choice Modeling: Predictive Efficacy between Machine Learning Models and Discrete Choice Model

The Open Transportation Journal ◽

10.2174/1874447802115010241 ◽

2021 ◽

Vol 15 (1) ◽

pp. 241-255

Author(s):

Nur Fahriza Mohd. Ali ◽

Ahmad Farhan Mohd. Sadullah ◽

Anwar PP Abdul Majeed ◽

Mohd Azraai Mohd. Razman ◽

Muhammad Aizzat Zakaria ◽

...

Keyword(s):

Machine Learning ◽

Discrete Choice ◽

Choice Model ◽

Mode Choice ◽

Binary Logistic Regression ◽

Discrete Choice Model ◽

Learning Models ◽

Travel Mode ◽

Travel Mode Choice ◽

Machine Learning Models

Background: Travel behaviour among users is complex and intertwined with many factors. Traditionally, travel mode choice modeling has been dominated by the Discrete Choice Model; nonetheless, owing to advances in computational techniques, machine learning has gained traction in understanding travel behaviour. Aim: This study aims to predict users’ travel mode choice by means of machine learning models and to compare them against a conventional Discrete Choice Model, i.e., Binary Logistic Regression. Objective: To compare machine learning models, namely Neural Network, Random Forest, Decision Tree, and Support Vector Machine, against the Discrete Choice Model (Binary Logistic Regression) in predicting travel mode choice among users in Kuantan City. Methodology: The dataset was collected in Kuantan City, Malaysia, through a Revealed/Stated Preferences (RP/SP) Survey. The data were split in a ratio of 80:20 for training and testing before the models were evaluated. The hyperparameters of the models were left at their default values. Model performance was evaluated based on classification accuracy. Results: The Neural Network model attained a higher prediction accuracy than Binary Logistic Regression (Discrete Choice Model) in classifying whether Kuantan users choose public transport or private vehicles for daily travel. The feature importance technique is crucial for identifying the significant features in modelling travel mode choice. Using the features identified via the feature importance technique, the Neural Network model achieved classification accuracies of up to 73.4% and 72.4% on the training and testing data, respectively, suggesting the viability of the proposed technique in supporting informed decisions.
Conclusion: The findings highlight the strengths and limitations of machine learning techniques as well as of the Discrete Choice Model in modeling travel mode choice. Machine learning models were shown to provide better predictions, which could assist policymakers in urban transportation planning. Meanwhile, the Discrete Choice Model (Binary Logistic Regression) was shown to be helpful for understanding and expressing the inferential relationships between variables, which is useful for improving the future transportation system.
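
For reference, the Binary Logistic Regression model named above has the following standard textbook form. The notation is an assumption introduced here, not taken from the paper: y = 1 is assumed to denote choosing public transport, x is the vector of explanatory variables, and β0 and β are the fitted coefficients.

```latex
P(y = 1 \mid \mathbf{x})
  = \frac{1}{1 + \exp\!\bigl(-(\beta_0 + \boldsymbol{\beta}^{\top}\mathbf{x})\bigr)},
\qquad
\ln\frac{P(y = 1 \mid \mathbf{x})}{1 - P(y = 1 \mid \mathbf{x})}
  = \beta_0 + \boldsymbol{\beta}^{\top}\mathbf{x}
```

The second expression shows why the model lends itself to inference: each coefficient is the change in the log-odds of the outcome per unit change in the corresponding variable, with the others held fixed.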

Forecasting Daily Travel Mode Choice of Kuantan Travellers by Means of Machine Learning Models

Lecture Notes in Electrical Engineering - Recent Trends in Mechatronics Towards Industry 4.0 ◽

10.1007/978-981-33-4597-3_89 ◽

2021 ◽

pp. 979-987

Author(s):

Nur Fahriza Mohd Ali ◽

Ahmad Farhan Mohd Sadullah ◽

Anwar P. P. Abdul Majeed ◽

Mohd Azraai Mohd Razman ◽

Chun Sern Choong ◽

...

Keyword(s):

Machine Learning ◽

Mode Choice ◽

Learning Models ◽

Travel Mode ◽

Travel Mode Choice ◽

Machine Learning Models ◽

Daily Travel

Verifying Robustness of Gradient Boosted Models

Proceedings of the AAAI Conference on Artificial Intelligence ◽

10.1609/aaai.v33i01.33012446 ◽

2019 ◽

Vol 33 ◽

pp. 2446-2453 ◽

Cited By ~ 4

Author(s):

Gil Einziger ◽

Maayan Goldstein ◽

Yaniv Sa’ar ◽

Itai Segall

Keyword(s):

Machine Learning ◽

State Of The Art ◽

Learning Models ◽

Machine Learning Technique ◽

Small Perturbations ◽

Robustness Property ◽

Learning Technique ◽

Verification Tools ◽

Important Quality ◽

Machine Learning Models

Gradient boosted models are a fundamental machine learning technique. Robustness to small perturbations of the input is an important quality measure for machine learning models, but the literature lacks a method to prove the robustness of gradient boosted models. This work introduces VERIGB, a tool for quantifying the robustness of gradient boosted models. VERIGB encodes the model and the robustness property as an SMT formula, which enables state-of-the-art verification tools to prove the model’s robustness. We extensively evaluate VERIGB on publicly available datasets and demonstrate a capability for verifying large models. Finally, we show that some model configurations tend to be inherently more robust than others.
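
To make the robustness property concrete, the sketch below checks it exactly for a deliberately restricted case: a binary classifier built from depth-1 trees (stumps). It does not reproduce VERIGB or its SMT encoding; it relies on the fact that a stump ensemble is additive per feature, so the minimum and maximum score over an L-infinity ball can be found by enumerating the constant pieces of each feature. The ensemble, the decision rule (class 1 if the score is positive), and all names are hypothetical.

```python
# Hypothetical toy ensemble of depth-1 trees (stumps):
# (feature index, threshold, value if x[f] <= threshold, value otherwise)
STUMPS = [
    (0, 0.5, -1.0, 1.0),
    (1, 0.3, -0.5, 0.5),
    (0, 0.8, -0.2, 0.4),
]

def score(stumps, x):
    """Sum of the stump outputs for input x."""
    return sum(left if x[f] <= t else right for f, t, left, right in stumps)

def predict(stumps, x):
    """Class 1 if the ensemble score is positive, else class 0."""
    return 1 if score(stumps, x) > 0 else 0

def feature_values(on_f, lo, hi):
    """All values the stumps on one feature can sum to when it lies in [lo, hi]."""
    # Candidate points: the lower end of the interval, plus a point "just above"
    # each threshold inside [lo, hi); together they hit every constant piece.
    candidates = [(lo, False)] + [(t, True) for _, t, _, _ in on_f if lo <= t < hi]
    values = []
    for c, just_above in candidates:
        total = 0.0
        for _, t, left, right in on_f:
            goes_left = c < t if just_above else c <= t
            total += left if goes_left else right
        values.append(total)
    return values

def score_bounds(stumps, x, eps):
    """Exact (min, max) score over the L-infinity ball of radius eps around x."""
    lo_total, hi_total = 0.0, 0.0
    for f in sorted({s[0] for s in stumps}):
        on_f = [s for s in stumps if s[0] == f]
        values = feature_values(on_f, x[f] - eps, x[f] + eps)
        lo_total += min(values)
        hi_total += max(values)
    return lo_total, hi_total

def is_robust(stumps, x, eps):
    """True if no perturbation of size at most eps changes the predicted class."""
    lo, hi = score_bounds(stumps, x, eps)
    return lo > 0 if predict(stumps, x) == 1 else hi <= 0

print(is_robust(STUMPS, [0.9, 0.9], 0.05))
print(is_robust(STUMPS, [0.52, 0.9], 0.05))
```

The per-feature decomposition does not hold for deeper trees, where splits on different features interact along a path; that is the kind of case for which a solver-based encoding such as the one described above is needed.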

Document Preprocessing with TF-IDF to Improve the Polarity Classification Performance of Unstructured Sentiment Analysis

Kinetik Game Technology Information System Computer Network Computing Electronics and Control ◽

10.22219/kinetik.v5i3.1066 ◽

2020 ◽

pp. 235-242

Author(s):

Farrikh Alzami ◽

Erika Devi Udayanti ◽

Dwi Puji Prabowo ◽

Rama Aria Megantara

Keyword(s):

Machine Learning ◽

Feature Extraction ◽

Random Forest ◽

Sentiment Analysis ◽

Classification Performance ◽

Document Preparation ◽

Learning Models ◽

Polarity Classification ◽

Negative Sentiment ◽

Machine Learning Models

Sentiment analysis in terms of polarity classification is very important in everyday life: knowing the polarity, many people can find out whether a given document has positive or negative sentiment, which can help them in choosing and making decisions. Sentiment analysis is usually done manually; therefore, an automatic sentiment classification process is needed. However, it is rare to find studies that discuss which feature extraction methods and learning models are suitable for unstructured sentiment analysis, using the Amazon food review case. This research explores several feature extraction methods, namely Bag of Words, TF-IDF, Word2Vector, and a combination of TF-IDF and Word2Vector, together with several machine learning models, namely Random Forest, SVM, KNN and Naïve Bayes, to find a combination of feature extraction and learning model that can help add variety to polarity sentiment analysis. With document preparation such as handling HTML tags, punctuation and special characters, and applying Snowball stemming, TF-IDF combined with SVM proved suitable for polarity classification in unstructured sentiment analysis for the Amazon food review case, with a performance of 87.3 percent.
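
As a rough illustration of the TF-IDF weighting mentioned above, the sketch below computes the classic unsmoothed variant, term frequency multiplied by ln(N / document frequency). The study's exact variant and preprocessing are not stated here, so the formula, the lowercase-and-split tokenisation, and the miniature reviews are all assumptions; library implementations commonly use smoothed or normalised variants instead.

```python
import math
from collections import Counter

def tf_idf(docs):
    """Return one {term: weight} dict per document, using tf * ln(N / df)."""
    tokenised = [doc.lower().split() for doc in docs]
    n_docs = len(tokenised)
    # Document frequency: number of documents containing each term.
    df = Counter(term for tokens in tokenised for term in set(tokens))
    weights = []
    for tokens in tokenised:
        counts = Counter(tokens)
        weights.append({
            term: (count / len(tokens)) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return weights

# Hypothetical miniature reviews, not taken from the Amazon dataset.
reviews = [
    "great taste great price",
    "bad taste",
    "great coffee",
]
vectors = tf_idf(reviews)
for review, vector in zip(reviews, vectors):
    print(review, "->", vector)
```

Terms that appear in few documents receive higher weights than terms spread across many, and a term present in every document receives a weight of zero under this variant. The resulting vectors are what a classifier such as an SVM would take as input.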

Random forest and long short-term memory based machine learning models for classification of ion mobility spectrometry spectra

Chemical, Biological, Radiological, Nuclear, and Explosives (CBRNE) Sensing XXII ◽

10.1117/12.2585829 ◽

2021 ◽

Author(s):

Patrick C. Riley ◽

Samir V. Deshpande ◽

Brian S. Ince ◽

Brian C. Hauck ◽

Kyle P. O'Donnell ◽

...

Keyword(s):

Machine Learning ◽

Random Forest ◽

Ion Mobility ◽

Short Term Memory ◽

Learning Models ◽

Short Term ◽

Term Memory ◽

Long Short Term Memory ◽

Machine Learning Models

Comparison of Resampling Algorithms to Address Class Imbalance when Developing Machine Learning Models to Predict Foodborne Pathogen Presence in Agricultural Water

Frontiers in Environmental Science ◽

10.3389/fenvs.2021.701288 ◽

2021 ◽

Vol 9 ◽

Author(s):

Daniel Lowell Weller ◽

Tanzy M. T. Love ◽

Martin Wiedmann

Keyword(s):

Machine Learning ◽

Random Forest ◽

Predictive Models ◽

Training Data ◽

Agricultural Water ◽

Learning Models ◽

Safety Hazards ◽

E Coli ◽

Resampling Method ◽

Machine Learning Models

Recent studies have shown that predictive models can supplement or provide alternatives to E. coli-testing for assessing the potential presence of food safety hazards in water used for produce production. However, these studies used balanced training data and focused on enteric pathogens. As such, research is needed to determine 1) if predictive models can be used to assess Listeria contamination of agricultural water, and 2) how resampling (to deal with imbalanced data) affects performance of these models. To address these knowledge gaps, this study developed models that predict nonpathogenic Listeria spp. (excluding L. monocytogenes) and L. monocytogenes presence in agricultural water using various combinations of learner (e.g., random forest, regression), feature type, and resampling method (none, oversampling, SMOTE). Four feature types were used in model training: microbial, physicochemical, spatial, and weather. “Full models” were trained using all four feature types, while “nested models” used between one and three types. In total, 45 full (15 learners*3 resampling approaches) and 108 nested (5 learners*9 feature sets*3 resampling approaches) models were trained per outcome. Model performance was compared against baseline models where E. coli concentration was the sole predictor. Overall, the machine learning models outperformed the baseline E. coli models, with random forests outperforming models built using other learners (e.g., rule-based learners). Resampling produced more accurate models than not resampling, with SMOTE models outperforming, on average, oversampling models. Regardless of resampling method, spatial and physicochemical water quality features drove accurate predictions for the nonpathogenic Listeria spp. and L. monocytogenes models, respectively. Overall, these findings 1) illustrate the need for alternatives to existing E. coli-based monitoring programs for assessing agricultural water for the presence of potential food safety hazards, and 2) suggest that predictive models may be one such alternative. Moreover, these findings provide a conceptual framework for how such models can be developed in the future with the ultimate aim of developing models that can be integrated into on-farm risk management programs. For example, future studies should consider using random forest learners, SMOTE resampling, and spatial features to develop models to predict the presence of foodborne pathogens, such as L. monocytogenes, in agricultural water when the training data is imbalanced.
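
The two resampling approaches compared in this abstract differ in how they create extra minority-class samples: random oversampling duplicates existing samples, while SMOTE interpolates between a sample and one of its nearest minority-class neighbours. The sketch below is a minimal standard-library illustration of that difference, not the study's implementation; the function names, the value of k, and the four toy points are hypothetical.

```python
import random

def sq_dist(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def random_oversample(minority, n_new, rng):
    """Create n_new samples by copying randomly chosen minority samples."""
    return [list(rng.choice(minority)) for _ in range(n_new)]

def smote(minority, n_new, k, rng):
    """Create n_new synthetic samples by interpolating towards near neighbours."""
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # The k minority samples closest to the chosen base sample.
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: sq_dist(base, p),
        )[:k]
        other = rng.choice(neighbours)
        gap = rng.random()  # position along the segment from base to other
        synthetic.append([b + gap * (o - b) for b, o in zip(base, other)])
    return synthetic

# Hypothetical minority-class samples with two features each.
minority = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
rng = random.Random(0)
copies = random_oversample(minority, 6, rng)
new_points = smote(minority, 6, 2, rng)
print(copies)
print(new_points)
```

Resampling of either kind is normally applied to the training data only, so that the test data keeps the original class balance.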
