Topic modeling for analyzing online reviews in hotel sector

Author(s):  
Nguyen Van Ho ◽  
Ho Trung Thanh

Recently, with the growth of technology and the Internet, customers can easily post opinions and feedback about hotel products and services on websites or social media. This information is stored in textual form and is a huge source of data to explore. To keep developing and meet customers' needs, businesses need insight into what customers discuss and care about. In this study, we first collected a corpus of 26,482 customer comments and reviews written in English from several e-commerce websites in the hospitality industry. After preprocessing the collected data, we ran experiments on this corpus and chose the best number of topics (K) using Coherence Score measurements as the input parameter for the model. Finally, we applied the Latent Dirichlet Allocation (LDA) model with the chosen K to the corpus to explore its topics. The model found hidden topics, each with a corresponding list of keywords, reflecting the issues that customers are interested in. Applying the empirical results from the model will support decision making to improve products and services, as well as the management and development of businesses in the hotel sector.
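The abstract does not say how the LDA model was fitted, so the following is only a minimal sketch of one standard inference method, collapsed Gibbs sampling, written in plain Python. The function names (`fit_lda`, `top_words`), the hyperparameter values `alpha` and `beta`, and the toy reviews are all assumptions made for illustration; they are not taken from the study.

```python
import random
from collections import Counter


def fit_lda(docs, num_topics, iterations=100, alpha=0.1, beta=0.01, seed=0):
    """Fit LDA by collapsed Gibbs sampling. docs is a list of token lists."""
    rng = random.Random(seed)
    vocab_size = len({w for doc in docs for w in doc})
    doc_topic = [[0] * num_topics for _ in docs]         # tokens of doc d in topic k
    topic_word = [Counter() for _ in range(num_topics)]  # count of word w in topic k
    topic_total = [0] * num_topics                       # total tokens in topic k
    assignments = []

    # Start from a random topic assignment for every token.
    for d, doc in enumerate(docs):
        row = []
        for w in doc:
            k = rng.randrange(num_topics)
            row.append(k)
            doc_topic[d][k] += 1
            topic_word[k][w] += 1
            topic_total[k] += 1
        assignments.append(row)

    for _ in range(iterations):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                # Remove the token's current assignment from the counts.
                old = assignments[d][i]
                doc_topic[d][old] -= 1
                topic_word[old][w] -= 1
                topic_total[old] -= 1
                # Resample its topic from the conditional distribution.
                weights = [
                    (doc_topic[d][t] + alpha)
                    * (topic_word[t][w] + beta)
                    / (topic_total[t] + vocab_size * beta)
                    for t in range(num_topics)
                ]
                new = rng.choices(range(num_topics), weights=weights)[0]
                assignments[d][i] = new
                doc_topic[d][new] += 1
                topic_word[new][w] += 1
                topic_total[new] += 1
    return doc_topic, topic_word


def top_words(topic_word, n=3):
    """Return the n most frequent words of each topic (the topic's keywords)."""
    return [[w for w, c in counts.most_common(n) if c > 0] for counts in topic_word]


# Toy, already-preprocessed reviews (hypothetical data).
reviews = [
    "room clean bed comfortable room".split(),
    "bed comfortable room quiet".split(),
    "staff friendly breakfast staff".split(),
    "breakfast buffet staff helpful".split(),
]
doc_topic, topic_word = fit_lda(reviews, num_topics=2)
print(top_words(topic_word))
```

On a real corpus a tested library implementation would normally be preferable; the sketch only makes visible what "topics with a corresponding list of keywords" means in terms of counts.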

Author(s):  
Ho Van Nguyen ◽  
Ho Trung Thanh

Recently, with the growth of technology and the Internet, customers can easily give their opinions and feedback about products and services on websites or social media. This information is stored in text form and is a huge source of data to explore. To keep developing and meet customers' needs, businesses need insight into what customers discuss and care about. In this study, we first collected a corpus of 99,322 customer comments and reviews written in English from several e-commerce websites in the hospitality industry. After pre-processing the collected data, we ran experiments on this corpus and chose the best number of topics (K) using Perplexity and Coherence Score measurements as the input parameter for the model. Finally, we applied the Latent Dirichlet Allocation (LDA) model with the chosen K to the corpus to explore its topics. The model found hidden topics, each with a corresponding list of keywords, reflecting the issues that customers are interested in. Applying the empirical results from the model will support decision making to improve products and services, as well as the management and development of businesses in the hospitality sector.
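Choosing K by coherence can be sketched as follows. The abstract does not state which coherence measure was used, so the UMass measure here is an assumption, as are the function names and the hand-written candidate topics, which stand in for the keyword lists that models trained with different K would produce.

```python
import math


def umass_coherence(topic_words, docs, eps=1.0):
    """UMass coherence of one topic's keyword list.

    Assumes every keyword occurs in at least one document,
    otherwise the document frequency in the denominator is zero.
    """
    doc_sets = [set(doc) for doc in docs]

    def doc_freq(*words):
        # Number of documents that contain all the given words.
        return sum(all(w in s for w in words) for s in doc_sets)

    score = 0.0
    for i in range(1, len(topic_words)):
        for j in range(i):
            joint = doc_freq(topic_words[i], topic_words[j])
            score += math.log((joint + eps) / doc_freq(topic_words[j]))
    return score


def mean_coherence(topics, docs):
    """Average coherence over all topics of one model."""
    return sum(umass_coherence(t, docs) for t in topics) / len(topics)


# Toy corpus and hypothetical keyword lists for two candidate values of K.
docs = [
    ["room", "clean", "bed"],
    ["room", "bed", "quiet"],
    ["staff", "friendly", "breakfast"],
    ["staff", "breakfast", "buffet"],
]
candidates = {
    2: [["room", "bed"], ["staff", "breakfast"]],
    3: [["room", "staff"], ["bed", "breakfast"], ["clean", "buffet"]],
}

# Pick the K whose topics are, on average, the most coherent.
best_k = max(candidates, key=lambda k: mean_coherence(candidates[k], docs))
print(best_k)
```

A higher (less negative) score means a topic's keywords tend to occur in the same reviews. In the toy data the K = 2 topics pair words that co-occur, so they score higher than the mixed K = 3 topics.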


2020 ◽  
pp. 1-10
Author(s):  
Junegak Joung ◽  
Harrison M. Kim

Abstract Identifying product attributes from the perspective of a customer is essential to measure the satisfaction, importance, and Kano category of each product attribute for product design. This paper proposes automated keyword filtering to identify product attributes from online customer reviews based on latent Dirichlet allocation. The preprocessing for latent Dirichlet allocation is important because it affects the results of topic modeling; however, previous research performed latent Dirichlet allocation either without removing noise keywords or by manually eliminating them. The proposed method improves the preprocessing for latent Dirichlet allocation by conducting automated filtering to remove the noise keywords that are not related to the product. A case study of Android smartphones is performed to validate the proposed method. The performance of the latent Dirichlet allocation by the proposed method is compared to that of a previous method, and according to the latent Dirichlet allocation results, the former exhibits a higher performance than the latter.


2019 ◽  
Vol 74 (1) ◽  
pp. 20-29 ◽  
Author(s):  
Kun Kim ◽  
Ounjoung Park ◽  
Jacob Barr ◽  
Haejung Yun

Purpose The purpose of this research is to analyze the shifting perceptions of international tourists to Jeju Island and provide practical lessons to the tourism industry. Specifically, in regard to three United Nations Educational, Scientific and Cultural Organization (UNESCO) natural World Heritage sites in Jeju, this research measures the most salient topics mentioned by tourists to inform a more accurate perception of the island’s most valuable natural assets as reported by tourism experiences. Design/methodology/approach This study used a Web crawler to gather over 1,500 English language reviews from international tourists from a famous travel information website. The collected data were then preprocessed with stemming and lemmatization. After this, the processed text data were analyzed through a latent Dirichlet allocation (LDA)-based topic modeling approach to identify the most prominent clusters of ideas mentioned and represent them visually through graphs, tables and charts. Findings The findings from this research suggest that there are ten identifiable topics. Topics focusing on “adventure,” “summits” and “winter” showed noticeable increases, whereas topics focusing on “sunrise peak” and “UNESCO” have decreased over time. There is a trend for international tourists to be ever more conscious of the adventurous and rugged aspects of Jeju, and the novelty of mentioning UNESCO status seems to have worn off. Furthermore, there is a proclivity for tourists to mention “worth” and “enjoy” more as time goes on. Originality/value This study applies LDA-based topic modeling and LDAvis using user-generated online reviews with time-series analyses. Consequently, it provides unique insights into the changing perceptions of ecotourism on Jeju today, as well as a contribution to the smart tourism field.
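The time-series step can be illustrated with a short sketch. The abstract does not describe how topic trends were computed, so averaging each review's topic proportions per year is an assumption. The function name, the two topic labels chosen from the abstract, and every number below are toy values, not results from the study.

```python
from collections import defaultdict


def yearly_topic_share(records, num_topics):
    """Average the per-review topic proportions within each year.

    records is a list of (year, topic_distribution) pairs.
    """
    sums = defaultdict(lambda: [0.0] * num_topics)
    counts = defaultdict(int)
    for year, dist in records:
        counts[year] += 1
        for k, p in enumerate(dist):
            sums[year][k] += p
    return {y: [s / counts[y] for s in sums[y]] for y in sorted(sums)}


# Toy topic proportions per review: index 0 = "adventure", index 1 = "UNESCO".
records = [
    (2015, [0.2, 0.8]),
    (2015, [0.4, 0.6]),
    (2018, [0.7, 0.3]),
    (2018, [0.9, 0.1]),
]
shares = yearly_topic_share(records, num_topics=2)

# Change in each topic's average share between the first and last year.
years = list(shares)
trend = [last - first for first, last in zip(shares[years[0]], shares[years[-1]])]
print(shares, trend)
```

A positive entry in `trend` marks a topic that reviewers mention more over time, and a negative entry marks one that is fading.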


2019 ◽  
Vol 36 (5) ◽  
pp. 655-665 ◽  
Author(s):  
Jurui Zhang

Purpose This paper aims to investigate customers’ experiences with Airbnb by text-mining customer reviews posted on the platform and comparing the extracted topics from online reviews between Airbnb and the traditional hotel industry using topic modeling. Design/methodology/approach This research uses text-mining approaches, including content analysis and topic modeling (latent Dirichlet allocation method), to examine 1,026,988 Airbnb guest reviews of 50,933 listings in seven cities in the USA. Findings The content analysis shows that negative reviews are more authentic and credible than positive reviews on Airbnb and that the occurrence of social words is positively related to positive emotion in reviews, but negatively related to negative emotion in reviews. A comparison of reviews on Airbnb and hotel reviews shows unique topics on Airbnb, namely, “late check-in”, “patio and deck view”, “food in kitchen”, “help from host”, “door lock/key”, “sleep/bed condition” and “host response”. Research limitations/implications The topic modeling result suggests that Airbnb guests want to get to know and connect with the local community; thus, help from hosts on ways they can authentically experience the local community would be beneficial. In addition, the results suggest that customers emphasize their interaction with hosts; thus, to improve customer satisfaction, Airbnb hosts should interact with guests and respond to guests’ inquiries quickly. Practical implications Hotel managers should design marketing programs that fulfill customers’ desire for authentic and local experiences. The results also suggest that peer-to-peer accommodation platforms should improve online review systems to facilitate authentic reviews and help guests have a smooth check-in process. Originality/value This study is one of the first to examine consumer reviews in detail in the sharing economy and compare topics from consumer reviews between Airbnb and hotels.


Author(s):  
Nur Annisa Tresnasari ◽  
Teguh Bharata Adji ◽  
Adhistya Erna Permanasari

Children are the future of the nation. All the treatment and learning they receive will affect their future. Nowadays, there are various kinds of social problems related to children. To find the right solution to a child's problem, social workers usually refer to the social-child-case (SCC) documents to find similar past cases and adapt their solutions. Nevertheless, reading a large number of documents to find similar cases is tedious and time-consuming. Hence, this work aims to categorize those documents into several groups according to case type. We use topic modeling with the Latent Dirichlet Allocation (LDA) approach to extract topics from the documents and classify them based on their similarities. The Coherence Score and Perplexity graphs are used to determine the best model. The result is a model with 5 topics that match the targeted case types. This supports the reuse of knowledge about SCC handling by making documents with similar cases easier to find.


2020 ◽  
Vol ahead-of-print (ahead-of-print) ◽  
Author(s):  
Krzysztof Celuch

Purpose In search of creating an extraordinary experience for customers, services have gone beyond the means of a transaction between buyers and sellers. In the event industry, where purchasing tickets online is a common procedure, it remains unclear as to how to enhance the multifaceted experience. This study aims at offering a snapshot into the most valued aspects for consumers and to uncover consumers' feelings toward their experience of purchasing event tickets on third-party ticketing platforms. Design/methodology/approach This is a cross-disciplinary study that applies knowledge from both data science and services marketing. Under the guise of natural language processing, latent Dirichlet allocation topic modeling and sentiment analysis were used to interpret the embedded meanings based on online reviews. Findings The findings conceptualized ten dimensions valued by eventgoers, including technical issues, value of core product and service, word-of-mouth, trustworthiness, professionalism and knowledgeability, customer support, information transparency, additional fee, prior experience and after-sales service. Among these aspects, consumers rated the value of the core product and service to be the most positive experience, whereas the additional fee was considered the least positive one. Originality/value Drawing from the intersection of natural language processing and the status quo of the event industry, this study offers a better understanding of eventgoers' experiences in the case of purchasing online event tickets. It also provides a hands-on guide for marketers to stage memorable experiences in the era of digitalization.


2020 ◽  
Vol 9 (2) ◽  
pp. 14-35
Author(s):  
Debabrata Sarddar ◽  
Raktim Kumar Dey ◽  
Rajesh Bose ◽  
Sandip Roy

As ubiquitous as it is, the Internet has spawned a slew of products that have forever changed the way one thinks of society and politics. This article proposes a model to predict a political party's chances of winning based on data collected from the Twitter microblogging website, because it is the most popular microblogging platform in the world. Using unsupervised topic modeling and the NRC Emotion Lexicon, the authors demonstrate how it is possible to predict results by analyzing eight types of emotions expressed by users on Twitter. To support the results with empirical analysis, the authors examine the Twitter messages posted during the 14th Gujarat Legislative Assembly election, 2017. Implementing two unsupervised clustering methods, K-means and Latent Dirichlet Allocation, this research shows how the proposed model is able to examine and summarize observations based on the underlying semantic structures of messages posted on Twitter. These two well-known unsupervised clustering methods provide a firm base for the proposed model to streamline decision-making processes objectively.
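The emotion-counting idea can be sketched as below. The eight emotion categories are those of the NRC Emotion Lexicon, but `TOY_LEXICON` is a hypothetical handful of entries invented for illustration and not taken from the real lexicon. The tweets and the function name `emotion_profile` are likewise assumptions, and the abstract does not describe how the authors aggregated emotions.

```python
from collections import Counter

# The eight emotion categories of the NRC Emotion Lexicon.
EMOTIONS = (
    "anger", "anticipation", "disgust", "fear",
    "joy", "sadness", "surprise", "trust",
)

# Hypothetical word-to-emotion entries; NOT the real NRC lexicon.
TOY_LEXICON = {
    "win": {"joy", "anticipation"},
    "hope": {"anticipation", "trust"},
    "corrupt": {"anger", "disgust"},
    "afraid": {"fear"},
}


def emotion_profile(texts, lexicon):
    """Count how often each emotion is triggered by words in the texts."""
    counts = Counter({emotion: 0 for emotion in EMOTIONS})
    for text in texts:
        for token in text.lower().split():
            for emotion in lexicon.get(token, ()):
                counts[emotion] += 1
    return counts


# Toy messages about one party (hypothetical data).
tweets = ["Hope we win", "corrupt leaders", "afraid of change"]
profile = emotion_profile(tweets, TOY_LEXICON)
print(profile)
```

Profiles like this, computed per party, give one emotion vector per party that a downstream model could compare. Real tweets would also need tokenization that handles punctuation, hashtags and mentions.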


2019 ◽  
Vol 9 (2) ◽  
Author(s):  
Klaus Solberg Söilen

For the upcoming conference on Intelligence Studies at ICI 2020 in Bad Nauheim, Germany the focus of this issue of JISIB is on collective intelligence and foresight. The first two papers by Søilen and Almeida and Lesca deal with collective intelligence from an intelligence studies perspective. It may be said that the Internet itself is a gigantic collective intelligence effort, the largest in human history. Open source is a prerequisite for this system to work for everyone. The article by Černý et al. is on open source. All other contributions are on the connection between the Internet, software and intelligence. This issue consists of seven articles to compensate for two articles that were taken out by editors in the last issue. The first article by Søilen entitled “Making sense of the collective intelligence field: a review” is a historical review of the field of collective intelligence. The paper shows how collective intelligence is an interdisciplinary field and argues there is a flaw in the notion of “wisdom of crowds”. Collective intelligence can be understood in terms of social systems theory and as such this approach has been fruitful for the social sciences, although so far not very popular. It also bears relevance for the study of business and economics. The second article by Almeida and Lesca is entitled “Collective intelligence process to interpret weak signals and early warnings”. Early warning and the detection of weak signals is a vital topic for any intelligence organization. Two aspects are discussed in the paper: the importance of new technology and collective sense making or interpretation. The third article by Shaikh and Singhal entitled “Study on the various intellectual property management strategies used and implemented by ICT firms for business intelligence” deals with intellectual property rights and patenting strategies. The authors identify a number of defensive and offensive IP strategies applied to ICT companies. The results have a bearing on patent acquisitions. The fourth article by Lamrhari et al. is entitled “Web intelligence for understanding customer satisfaction: application of Latent Dirichlet Allocation (LDA) and the Kano model”. Customer satisfaction today is mostly measured with data from the internet, using different business intelligence techniques. The Kano model is still valuable, but the way we gather information to assess the different levels in the model has changed. The authors use Latent Dirichlet Allocation to analyze the voice of customer (VOC) in online reviews. They suggest that BI techniques and a fuzzy-Kano model can enable companies to better understand their customers’ online reviews. The fifth article by Nahili et al. is entitled “A new corpus-based convolutional neural network for big data text analysis”. Companies need efficient ways to analyze everything that is said about them on the internet (reviews, comments). The paper suggests a convolutional neural network (CNN) as it has been successfully used for text classification. IMDB movie reviews and Reuters datasets were used for the experiment. The sixth article by Černý et al. is entitled “Using open data and google search data for competitive intelligence analysis”. Taking the Czech antidepressant market as an example, the authors show how competitive intelligence can be obtained using Google Search data, Google Trends and other OSINT sources. The seventh article by Dadkhah et al. is entitled “The potential of business intelligence tools for expert findings”. The paper suggests a way for researchers to find experts using business intelligence tools. The same method may also be used by any business or person looking for experts on a specific topic. As always, we would above all like to thank the authors for their contributions to this issue of JISIB. Thanks to Dr. Allison Perrigo for reviewing English grammar and helping with layout design for all articles and to the Swedish Research Council for continuous financial support. We hope to see you all at the ICI 2020 on the 16-17 March, 2020. The deadline for the two-page abstract submission is March 1st, 2020.


Author(s):  
Radha Guha

Background: In the era of information overload, it is very difficult for a human reader to quickly make sense of the vast information available on the internet. Even for a specific domain such as a college or university website, it may be difficult for a user to browse through all the links to get relevant answers quickly. Objective: In this scenario, the design of a chat-bot that can answer questions related to college information and compare colleges will be very useful and novel. Methods: In this paper, a novel conversational interface chat-bot application with information retrieval and text summarization skills is designed and implemented. First, this chat-bot has a simple dialog skill: when it can understand the intent of the user query, it responds from a stored collection of answers. Second, for unknown queries, this chat-bot can search the internet and then perform text summarization using advanced techniques of natural language processing (NLP) and text mining (TM). Results: Advances in the NLP capabilities of information retrieval and text summarization using the machine learning techniques of Latent Semantic Analysis (LSI), Latent Dirichlet Allocation (LDA), Word2Vec, Global Vector (GloVe) and TextRank are first reviewed and compared in this paper before they are implemented in the chat-bot design. This chat-bot improves user experience tremendously by answering specific queries concisely, which takes less time than reading the entire document. Students, parents and faculty can more efficiently get answers on a variety of information such as admission criteria, fees, course offerings, notice board, attendance, grades, placements, faculty profiles, research papers and patents. Conclusion: The purpose of this paper was to follow the advancement in NLP technologies and implement them in a novel application.


2018 ◽  
Vol 110 (1) ◽  
pp. 85-101 ◽  
Author(s):  
Ronald Cardenas ◽  
Kevin Bello ◽  
Alberto Coronado ◽  
Elizabeth Villota

Abstract Managing large collections of documents is an important problem for many areas of science, industry, and culture. Probabilistic topic modeling offers a promising solution. Topic modeling is an unsupervised machine learning method, and the evaluation of this model is an interesting problem on its own. Topic interpretability measures have been developed in recent years as a more natural option for topic quality evaluation, emulating human perception of coherence with word sets correlation scores. In this paper, we show experimental evidence that the topic coherence score improves when the training corpus is restricted to the relevant information in each document, as obtained by Entity Recognition. We experiment with job advertisement data and find that with this approach topic models improve interpretability by about 40 percentage points on average. Our analysis also reveals that, using the extracted text chunks, some redundant topics are joined while others are split into more skill-specific topics. Fine-grained topics observed in models using the whole text are preserved.

