Between comments and repeat visit: capturing repeat visitors with a hybrid approach

Purpose Understanding customers' revisiting behavior is an important topic in the service industry, and the emergence of online communities has enabled customers to share their prior experiences. The purpose of this study is therefore to investigate customers' reviews on an online hotel reservation platform and to explore their post-behaviors from those reviews. Design/methodology/approach The authors employ two different approaches and compare their accuracy in predicting customers' post-behavior: (1) several machine learning classifiers based on the sentiment dimensions of customers' reviews and (2) an experiment consisting of two subsections. In the experiment, the first subsection asks participants to predict whether customers who wrote reviews would visit the hotel again (referred to as Prediction), while the second subsection examines whether participants want to visit a particular hotel after reading other customers' reviews (referred to as Decision). Findings The accuracy of the machine learning approaches (73.23%) is higher than that of the experimental approach (Prediction: 58.96% and Decision: 64.79%). The key reasons behind users' predictions and decisions are identified through qualitative analyses. Originality/value The findings reveal that machine learning approaches achieve higher accuracy in predicting customers' repeat visits using only the employed sentiment features. With the novel approach of integrating customers' decision processes and machine learning classifiers, the authors provide valuable insights for researchers and providers of hospitality services.

A Hadoop Based Framework Integrating Machine Learning Classifiers for Anomaly Detection in the Internet of Things

Electronics ◽

10.3390/electronics10161955 ◽

2021 ◽

Vol 10 (16) ◽

pp. 1955

Author(s):

Ikram Sumaiya Thaseen ◽

Vanitha Mohanraj ◽

Sakthivel Ramachandran ◽

Kishore Sanapala ◽

Sang-Soo Yeo

Keyword(s):

Machine Learning ◽

Internet Of Things ◽

Experimental Analysis ◽

Parameter Tuning ◽

Computational Time ◽

Support Vector ◽

Learning Approaches ◽

K Nearest Neighbor ◽

Machine Learning Classifiers ◽

Learning Classifiers

In recent years, different botnet variants have been targeting government and private organizations, and there is a crucial need to develop a robust framework for securing the IoT (Internet of Things) network. In this paper, a Hadoop-based framework is proposed to identify malicious IoT traffic using a modified Tomek-link under-sampling integrated with automated hyper-parameter tuning of machine learning classifiers. The novelty of this paper is the use of a big data platform for benchmark IoT datasets to minimize computational time. The IoT benchmark datasets are loaded into the Hadoop Distributed File System (HDFS) environment. Three machine learning approaches, namely naive Bayes (NB), K-nearest neighbor (KNN), and support vector machine (SVM), are used for categorizing IoT traffic. Artificial immune network optimization is deployed during cross-validation to obtain the best classifier parameters. Experimental analysis is performed on the Hadoop platform. Average accuracies of 99% and 90% are obtained for the BoT_IoT and ToN_IoT datasets. The accuracy difference for the ToN_IoT dataset is due to the huge number of data samples captured at the edge layer and fog layer. In the BoT_IoT dataset, however, only 5% of the training and test samples from the complete dataset are considered for experimental analysis, as released by the dataset developers. The overall accuracy is improved by 19% in comparison with state-of-the-art techniques. The computational times for the huge datasets are reduced by 3–4 hours through MapReduce in HDFS.
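
The under-sampling step mentioned in this abstract can be illustrated with a minimal sketch. It shows the standard Tomek-link rule (two samples of different classes that are each other's nearest neighbor), not the modified variant the authors propose, whose details are not given here. The function names, the toy points and the "normal"/"attack" labels are assumptions made for illustration only.

```python
from math import dist


def tomek_links(points, labels):
    """Return index pairs (i, j), i < j, that are mutual nearest
    neighbours and carry different class labels."""
    def nearest(i):
        return min((j for j in range(len(points)) if j != i),
                   key=lambda j: dist(points[i], points[j]))

    nn = [nearest(i) for i in range(len(points))]
    return [(i, j) for i, j in enumerate(nn)
            if i < j and nn[j] == i and labels[i] != labels[j]]


def undersample(points, labels, majority):
    """Drop the majority-class member of every Tomek link."""
    drop = {k for pair in tomek_links(points, labels)
            for k in pair if labels[k] == majority}
    keep = [k for k in range(len(points)) if k not in drop]
    return [points[k] for k in keep], [labels[k] for k in keep]


# Hypothetical toy data: four "normal" (majority) and two "attack" samples.
points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (2.0, 2.0), (2.2, 2.0), (5.0, 5.0)]
labels = ["normal", "normal", "normal", "normal", "attack", "attack"]

links = tomek_links(points, labels)
cleaned_points, cleaned_labels = undersample(points, labels, "normal")
```

In this toy set only the pair at indices 3 and 4 forms a link, so one borderline "normal" sample is removed and the minority class is left untouched.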

Mo1903 Machine Learning Classifiers: A Novel Approach to Predicting Bleeding Risk in Hospitalized Cirrhotic Patients

Gastroenterology ◽

10.1016/s0016-5085(15)33685-4 ◽

2015 ◽

Vol 148 (4) ◽

pp. S-1079

Author(s):

Spencer L. James ◽

Emily E. Henderson ◽

Joseph J. Shatzel ◽

Rolland Dickson

Keyword(s):

Machine Learning ◽

Bleeding Risk ◽

Machine Learning Classifiers ◽

Learning Classifiers ◽

Novel Approach ◽

Cirrhotic Patients

Machine Learning Approaches Applied to GC-FID Fatty Acid Profiles to Discriminate Wild from Farmed Salmon

Foods ◽

10.3390/foods9111622 ◽

2020 ◽

Vol 9 (11) ◽

pp. 1622

Author(s):

Liliana Grazina ◽

P. J. Rodrigues ◽

Getúlio Igrejas ◽

Maria A. Nunes ◽

Isabel Mafra ◽

...

Keyword(s):

Machine Learning ◽

Fatty Acid ◽

Random Forest ◽

Naive Bayes ◽

Support Vector ◽

Learning Approaches ◽

Machine Learning Classifiers ◽

Farmed Salmon ◽

Learning Classifiers

In the last decade, there has been an increasing demand for wild-captured fish, which attains higher prices than farmed species and is thus prone to mislabeling practices. In this work, fatty acid composition coupled with advanced chemometrics was used to discriminate wild from farmed salmon. The lipids extracted from salmon muscles of different production methods and origins (26 wild from Canada, 25 farmed from Canada, 24 farmed from Chile and 25 farmed from Norway) were analyzed by gas chromatography with flame ionization detector (GC-FID). All the tested chemometric approaches, namely principal components analysis (PCA), t-distributed stochastic neighbor embedding (t-SNE) and seven machine learning classifiers, namely k-nearest neighbors (kNN), decision tree, support vector machine (SVM), random forest, artificial neural networks (ANN), naïve Bayes and AdaBoost, allowed for differentiation between farmed and wild salmon using the 17 features obtained from chemical analysis. PCA did not clearly distinguish salmon by geographical origin, since farmed samples from Canada and Chile overlapped. Nevertheless, using the 17 features in the models, six out of the seven tested machine learning classifiers achieved a classification accuracy of ≥99%, with ANN, naïve Bayes, random forest, SVM and kNN presenting 100% accuracy on the test dataset. The classification models were also assayed using only the best features selected by a reduction algorithm and the best input features mapped by t-SNE. The kNN classifier provided the best discrimination results because it correctly classified all samples according to production method and origin, ultimately using only the three most important features (16:0, 18:2n6c and 20:3n3 + 20:4n6). In general, the classifiers presented good generalization, and the proposed approach is simple and has the advantage of requiring only common equipment available in most labs.
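
A k-nearest-neighbors classifier working on three features, as described in this abstract, can be sketched as follows. This is a minimal illustration with majority voting over Euclidean distance, not the authors' pipeline. The numeric values are invented placeholders standing in for the three fatty acid features and are not measurements from the study; the function name and the choice of k = 3 are likewise assumptions.

```python
from collections import Counter
from math import dist


def knn_predict(train_x, train_y, sample, k=3):
    """Label a sample by majority vote among its k nearest training points."""
    nearest = sorted(range(len(train_x)),
                     key=lambda i: dist(train_x[i], sample))[:k]
    return Counter(train_y[i] for i in nearest).most_common(1)[0][0]


# Hypothetical feature vectors (three made-up fatty acid percentages each).
train_x = [(20.0, 2.0, 0.5), (21.0, 2.5, 0.6), (19.5, 1.8, 0.4),
           (14.0, 9.0, 1.5), (13.5, 10.0, 1.8), (15.0, 8.5, 1.4)]
train_y = ["wild", "wild", "wild", "farmed", "farmed", "farmed"]

first = knn_predict(train_x, train_y, (20.5, 2.2, 0.5))
second = knn_predict(train_x, train_y, (14.2, 9.5, 1.6))
```

With real data, features on different scales would usually be standardized before computing distances, since kNN is sensitive to feature magnitude.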

Analysing user sentiment of Indian movie reviews

The Electronic Library ◽

10.1108/el-08-2017-0182 ◽

2018 ◽

Vol 36 (4) ◽

pp. 590-606 ◽

Cited By ~ 2

Author(s):

Shrawan Kumar Trivedi ◽

Shubhamoy Dey

Keyword(s):

Machine Learning ◽

Roc Curve ◽

False Positive ◽

Naive Bayes ◽

Business Environment ◽

Content Type ◽

Training Time ◽

Machine Learning Classifiers ◽

Learning Classifiers

Purpose To be sustainable and competitive in the current business environment, it is useful to understand users’ sentiment towards products and services. This critical task can be achieved via natural language processing and machine learning classifiers. This paper aims to propose a novel probabilistic committee selection classifier (PCC) to analyse and classify the sentiment polarities of movie reviews. Design/methodology/approach An Indian movie review corpus is assembled for this study. Another publicly available movie review polarity corpus is also used to validate the results. The greedy stepwise search method is used to extract the features/words of the reviews. The performance of the proposed classifier is measured using different metrics, such as F-measure, false positive rate, receiver operating characteristic (ROC) curve and training time. Further, the proposed classifier is compared with other popular machine-learning classifiers, such as Bayesian, Naïve Bayes, Decision Tree (J48), Support Vector Machine and Random Forest. Findings The results of this study show that the proposed classifier is good at predicting the positive or negative polarity of movie reviews. The accuracy and the ROC curve value of the PCC are found to be the most suitable among all the classifiers tested in this study. This classifier is also found to be efficient at identifying positive sentiments of reviews, where it gives low false positive rates for both the Indian Movie Review and Review Polarity corpora used in this study. The training time of the proposed classifier is found to be slightly higher than that of Bayesian, Naïve Bayes and J48. Research limitations/implications Only movie review sentiments written in English are considered. In addition, the proposed committee selection classifier is built using only a committee of probabilistic classifiers; however, other classifier committees can also be built, tested and compared with the present experiment scenario. Practical implications In this paper, a novel probabilistic approach is proposed and used for classifying movie reviews, and is found to be highly effective in comparison with other state-of-the-art classifiers. This classifier may be tested for different applications and may provide new insights for developers and researchers. Social implications The proposed PCC may be used to classify different product reviews, and hence may be beneficial to organizations to justify users’ reviews about specific products or services. By using authentic positive and negative sentiments of users, the credibility of the specific product, service or event may be enhanced. PCC may also be applied to other applications, such as spam detection, blog mining, news mining and various other data-mining applications. Originality/value The constructed PCC is novel and was tested on Indian movie review data.
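
Two of the evaluation metrics named in this abstract, the F-measure and the false positive rate, can be computed from a confusion matrix as in the sketch below. The function names, the "pos"/"neg" labels and the example predictions are assumptions for illustration and do not reproduce any result from the paper.

```python
def confusion(y_true, y_pred, positive="pos"):
    """Count true positives, false positives, false negatives, true negatives."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    return tp, fp, fn, tn


def f_measure(tp, fp, fn):
    """Harmonic mean of precision and recall (F1)."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    total = precision + recall
    return 2 * precision * recall / total if total else 0.0


def false_positive_rate(fp, tn):
    """Share of actual negatives that were wrongly labelled positive."""
    return fp / (fp + tn) if fp + tn else 0.0


# Hypothetical gold labels and classifier outputs for eight reviews.
y_true = ["pos", "pos", "pos", "neg", "neg", "neg", "neg", "pos"]
y_pred = ["pos", "pos", "neg", "neg", "pos", "neg", "neg", "pos"]

tp, fp, fn, tn = confusion(y_true, y_pred)
f1 = f_measure(tp, fp, fn)
fpr = false_positive_rate(fp, tn)
```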

Capturing user sentiments for online Indian movie reviews

The Electronic Library ◽

10.1108/el-04-2017-0075 ◽

2018 ◽

Vol 36 (4) ◽

pp. 677-695 ◽

Cited By ~ 3

Author(s):

Shrawan Kumar Trivedi ◽

Shubhamoy Dey ◽

Anil Kumar

Keyword(s):

Machine Learning ◽

Feature Selection ◽

Comparative Study ◽

Sentiment Analysis ◽

Language Processing ◽

Classification Model ◽

Support Vector ◽

Content Type ◽

Machine Learning Classifiers ◽

Learning Classifiers

Purpose Sentiment analysis and opinion mining are emerging areas of research for analyzing Web data and capturing users’ sentiments. This research aims to present sentiment analysis of an Indian movie review corpus using natural language processing and various machine learning classifiers. Design/methodology/approach In this paper, a comparative study between three machine learning classifiers (Bayesian, naïve Bayesian and support vector machine [SVM]) was performed. All the classifiers were trained on the words/features extracted from the corpus using five different feature selection algorithms (Chi-square, info-gain, gain ratio, one-R and relief-F [RF] attributes), and a comparative study was performed between them. The classifiers and feature selection approaches were evaluated using different metrics (F-value, false-positive [FP] rate and training time). Findings The results of this study show that, for the maximum number of features, the RF feature selection approach was found to be the best, with better F-values, a low FP rate and less time needed to train the classifiers, whereas for the least number of features, one-R was better than RF. When the evaluation was performed for machine learning classifiers, SVM was found to be superior, although the Bayesian classifier was comparable with SVM. Originality/value This is novel research in which Indian review data were collected and a classification model for sentiment polarity (positive/negative) was then constructed.
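
Of the five feature selection algorithms listed in this abstract, chi-square scoring is the simplest to sketch. For a single word and a binary class it reduces to the usual statistic on a 2x2 contingency table, chi2 = N(ad - bc)^2 / ((a+b)(c+d)(a+c)(b+d)). The function names, the example words and all counts below are hypothetical and only illustrate the ranking idea.

```python
def chi_square(a, b, c, d):
    """Chi-square score of a term for a two-class problem.

    a: positive reviews containing the term
    b: negative reviews containing the term
    c: positive reviews without the term
    d: negative reviews without the term
    """
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    return n * (a * d - b * c) ** 2 / denom if denom else 0.0


def rank_terms(tables):
    """Order terms from most to least class-dependent."""
    return sorted(tables, key=lambda term: chi_square(*tables[term]), reverse=True)


# Hypothetical counts over 50 positive and 50 negative reviews.
counts = {
    "superb": (30, 10, 20, 40),
    "movie": (25, 25, 25, 25),
    "boring": (5, 35, 45, 15),
}

ranking = rank_terms(counts)
```

A word such as "movie" that appears equally often in both classes scores zero and would be discarded first, while strongly polarized words rank highest.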
