A Comparative Automated Text Analysis of Airbnb Reviews in Hong Kong and Singapore Using Latent Dirichlet Allocation

Airbnb has emerged as a platform where unique accommodation options can be found. Due to the uniqueness of each accommodation unit and host combination, each listing offers a one-of-a-kind experience. As consumers increasingly rely on text reviews of other customers, managers are also increasingly gaining insight from customer reviews. Thus, this present study aimed to extract those insights from reviews using latent Dirichlet allocation, an unsupervised type of topic modeling that extracts latent discussion topics from text data. Findings of Hong Kong’s 185,695 and Singapore’s 93,571 Airbnb reviews, two long-term rival destinations, were compared. Hong Kong produced 12 total topics that can be categorized into four distinct groups whereas Singapore’s optimal number of topics was only five. Topics produced from both destinations covered the same range of attributes, but Hong Kong’s 12 topics provide a greater degree of precision to formulate managerial recommendations. While many topics are similar to established hotel attributes, topics related to the host and listing management are unique to the Airbnb experience. The findings also revealed keywords used when evaluating the experience that provide more insight beyond typical numeric ratings.

Download Full-text

Automated Keyword Filtering in LDA for Identifying Product Attributes from Online Reviews

Journal of Mechanical Design ◽

10.1115/1.4048960 ◽

2020 ◽

pp. 1-10

Author(s):

Junegak Joung ◽

Harrison M. Kim

Keyword(s):

Product Design ◽

Topic Modeling ◽

Latent Dirichlet Allocation ◽

Online Reviews ◽

Previous Method ◽

Product Attributes ◽

Customer Reviews ◽

Online Customer Reviews ◽

Dirichlet Allocation

Abstract Identifying product attributes from the perspective of a customer is essential to measure the satisfaction, importance, and Kano category of each product attribute for product design. This paper proposes automated keyword filtering to identify product attributes from online customer reviews based on latent Dirichlet allocation. The preprocessing for latent Dirichlet allocation is important because it affects the results of topic modeling; however, previous research performed latent Dirichlet allocation either without removing noise keywords or by manually eliminating them. The proposed method improves the preprocessing for latent Dirichlet allocation by conducting automated filtering to remove the noise keywords that are not related to the product. A case study of Android smartphones is performed to validate the proposed method. The performance of the latent Dirichlet allocation by the proposed method is compared to that of a previous method, and according to the latent Dirichlet allocation results, the former exhibits a higher performance than the latter.

Download Full-text

Ldagibbs: A Command for Topic Modeling in Stata Using Latent Dirichlet Allocation

The Stata Journal Promoting communications on statistics and Stata ◽

10.1177/1536867x1801800107 ◽

2018 ◽

Vol 18 (1) ◽

pp. 101-117 ◽

Cited By ~ 10

Author(s):

Carlo Schwarz

Keyword(s):

Machine Learning ◽

Probability Distribution ◽

Topic Modeling ◽

Latent Dirichlet Allocation ◽

Topic Model ◽

Topic Models ◽

Text Documents ◽

Text Data ◽

Dirichlet Allocation

In this article, I introduce the ldagibbs command, which implements latent Dirichlet allocation in Stata. Latent Dirichlet allocation is the most popular machine-learning topic model. Topic models automatically cluster text documents into a user-chosen number of topics. Latent Dirichlet allocation represents each document as a probability distribution over topics and represents each topic as a probability distribution over words. Therefore, latent Dirichlet allocation provides a way to analyze the content of large unclassified text data and an alternative to predefined document classifications.

Download Full-text

Determination of Motivating Factors of Urban Forest Visitors through Latent Dirichlet Allocation Topic Modeling

International Journal of Environmental Research and Public Health ◽

10.3390/ijerph18189649 ◽

2021 ◽

Vol 18 (18) ◽

pp. 9649

Author(s):

Doo-San Kim ◽

Byeong-Cheol Lee ◽

Kwang-Hi Park

Keyword(s):

Topic Modeling ◽

Latent Dirichlet Allocation ◽

Urban Forest ◽

Urban Forests ◽

Motivating Factors ◽

Text Data ◽

Stop Word ◽

Daily Exercise ◽

Dirichlet Allocation

Despite the unique characteristics of urban forests, the motivating factors of urban forest visitors have not been clearly differentiated from other types of the forest resource. This study aims to identify the motivating factors of urban forest visitors, using latent Dirichlet allocation (LDA) topic modeling based on social big data. A total of 57,449 cases of social text data from social blogs containing the keyword “urban forest” were collected from Naver and Daum, the major search engines in South Korea. Then, 17,229 cases were excluded using morpheme analysis and stop word elimination; 40,110 cases were analyzed to identify the motivating factors of urban forest visitors through LDA topic modeling. Seven motivating factors—“Cafe-related Walk”, “Healing Trip”, “Daily Leisure”, “Family Trip”, “Wonderful View”, “Clean Space”, and “Exhibition and Photography”—were extracted; each contained five keywords. This study elucidates the role of forests as a place for healing, leisure, and daily exercise. The results suggest that efforts should be made toward developing various programs regarding the basic functionality of urban forests as a natural resource and a unique place to support a diversity of leisure and cultural activities.

Download Full-text

Analyzing U.S. Army Officer Evaluation Reports with Natural Language Processing: A Log-Odds and Latent Dirichlet Allocation Exploration

Industrial and Systems Engineering Review ◽

10.37266/iser.2019v7i1.pp44-55 ◽

2019 ◽

Vol 7 (1) ◽

pp. 44-55

Author(s):

Heidy Shi ◽

John Caddell ◽

Julia Lensing

Keyword(s):

Natural Language Processing ◽

Text Mining ◽

Language Processing ◽

Text Analysis ◽

Topic Modeling ◽

Latent Dirichlet Allocation ◽

Data Set ◽

Army Officer ◽

Log Odds ◽

Dirichlet Allocation

Each job field (branch) in the Army requires a unique set of skills and talents of the officers assigned. Officers who demonstrate the required skills are often more successful in their assigned branch. To better understand how success is described across branches, research was conducted using text mining and text analysis of a data set of Officer Evaluation Reports (OERs). This research looked for common trends and discrepancies across varying branches and like groups of branches by analyzing the narrative portion of OERs. Text analysis methods examined words and bigrams commonly used to describe varying degrees of performance by officers. Topic modeling using Latent Dirichlet Allocation (LDA) was also conducted on top rated narratives to investigate trends and discrepancies in clustering narratives. Findings show that qualitative narratives for the top two performance designations fail to differentiate between officers’ varying levels of performance regardless of branch.

Download Full-text

Innovation in an Emerging Market: A Bibliometric and Latent Dirichlet Allocation Based Topic Modeling Study

2020 International Conference on Decision Aid Sciences and Application (DASA) ◽

10.1109/dasa51403.2020.9317278 ◽

2020 ◽

Author(s):

Mohd Faiz Hilmi ◽

Yanti Mustapha ◽

Mohammad Tasyriq Che Omar

Keyword(s):

Topic Modeling ◽

Latent Dirichlet Allocation ◽

Emerging Market ◽

Modeling Study ◽

Dirichlet Allocation

Download Full-text

A Latent Dirichlet Allocation and Fuzzy Clustering Based Machine Learning Model for Text Thesaurus

International Journal of Computers Communications & Control ◽

10.15837/ijccc.2020.2.3811 ◽

2020 ◽

Vol 15 (2) ◽

Author(s):

Jia Luo ◽

Dongwen Yu ◽

Zong Dai

Keyword(s):

Machine Learning ◽

Fuzzy Clustering ◽

Latent Dirichlet Allocation ◽

Learning Model ◽

Machine Learning Algorithms ◽

Text Data ◽

Huge Data ◽

Machine Learning Model ◽

N Gram ◽

Dirichlet Allocation

It is not quite possible to use manual methods to process the huge amount of structured and semi-structured data. This study aims to solve the problem of processing huge data through machine learning algorithms. We collected the text data of the company’s public opinion through crawlers, and use Latent Dirichlet Allocation (LDA) algorithm to extract the keywords of the text, and uses fuzzy clustering to cluster the keywords to form different topics. The topic keywords will be used as a seed dictionary for new word discovery. In order to verify the efficiency of machine learning in new word discovery, algorithms based on association rules, N-Gram, PMI, andWord2vec were used for comparative testing of new word discovery. The experimental results show that the Word2vec algorithm based on machine learning model has the highest accuracy, recall and F-value indicators.

Download Full-text

Spam Diffusion in Social Networking Media using Latent Dirichlet Allocation

International Journal of Innovative Technology and Exploring Engineering - Special Issue ◽

10.35940/ijitee.i7898.1081219 ◽

2019 ◽

Vol 8 (12) ◽

pp. 881-885

Keyword(s):

Online Social Networks ◽

Topic Modeling ◽

Information Diffusion ◽

Latent Dirichlet Allocation ◽

Good Accuracy ◽

Ground Truth ◽

Online Social Media ◽

Diffusion Dynamics ◽

Dirichlet Allocation

Like web spam has been a major threat to almost every aspect of the current World Wide Web, similarly social spam especially in information diffusion has led a serious threat to the utilities of online social media. To combat this challenge the significance and impact of such entities and content should be analyzed critically. In order to address this issue, this work usedTwitter as a case study and modeled the contents of information through topic modeling and coupled it with the user oriented feature to deal it with a good accuracy. Latent Dirichlet Allocation (LDA) a widely used topic modeling technique is applied to capture the latent topics from the tweets’ documents. The major contribution of this work is twofold: constructing the dataset which serves as the ground-truth for analyzing the diffusion dynamics of spam/non-spam information and analyzing the effects of topics over the diffusibility. Exhaustive experiments clearly reveal the variation in topics shared by the spam and nonspam tweets. The rise in popularity of online social networks, not only attracts legitimate users but also the spammers. Legitimate users use the services of OSNs for a good purpose i.e., maintaining the relations with friends/colleagues, sharing the information of interest, increasing the reach of their business through advertisings

Download Full-text

Urban Crisis Detection Technique: A Spatial and Data Driven Approach Based on Latent Dirichlet Allocation (LDA) Topic Modeling

Construction Research Congress 2018 ◽

10.1061/9780784481271.025 ◽

2018 ◽

Cited By ~ 6

Author(s):

Yan Wang ◽

John E. Taylor

Keyword(s):

Topic Modeling ◽

Latent Dirichlet Allocation ◽

Detection Technique ◽

Data Driven ◽

Urban Crisis ◽

Data Driven Approach ◽

Dirichlet Allocation

Download Full-text

Topic modeling for expert finding using latent Dirichlet allocation

Wiley Interdisciplinary Reviews Data Mining and Knowledge Discovery ◽

10.1002/widm.1102 ◽

2013 ◽

Vol 3 (5) ◽

pp. 346-353 ◽

Cited By ~ 11

Author(s):

Saeedeh Momtazi ◽

Felix Naumann

Keyword(s):

Topic Modeling ◽

Latent Dirichlet Allocation ◽

Expert Finding ◽

Dirichlet Allocation

Download Full-text

Ensemble Methods for Improving Classification of Data Produced by Latent Dirichlet Allocation

Computer Science and Mathematical Modelling ◽

10.5604/01.3001.0013.1458 ◽

2019 ◽

Vol 0 (8/2018) ◽

pp. 17-28

Author(s):

Maciej Jankowski

Keyword(s):

Text Analysis ◽

Large Scale ◽

Latent Dirichlet Allocation ◽

Previous Analysis ◽

Ensemble Methods ◽

Topic Modelling ◽

New Methods ◽

Data Scientist ◽

Dirichlet Allocation

Topic models are very popular methods of text analysis. The most popular algorithm for topic modelling is LDA (Latent Dirichlet Allocation). Recently, many new methods were proposed, that enable the usage of this model in large scale processing. One of the problem is, that a data scientist has to choose the number of topics manually. This step, requires some previous analysis. A few methods were proposed to automatize this step, but none of them works very well if LDA is used as a preprocessing for further classification. In this paper, we propose an ensemble approach which allows us to use more than one model at prediction phase, at the same time, reducing the need of finding a single best number of topics. We have also analyzed a few methods of estimating topic number.

Download Full-text