Automated Keyword Filtering in LDA for Identifying Product Attributes from Online Reviews

2020 ◽  
pp. 1-10
Author(s):  
Junegak Joung ◽  
Harrison M. Kim

Abstract Identifying product attributes from the perspective of a customer is essential to measure the satisfaction, importance, and Kano category of each product attribute for product design. This paper proposes automated keyword filtering to identify product attributes from online customer reviews based on latent Dirichlet allocation. The preprocessing for latent Dirichlet allocation is important because it affects the results of topic modeling; however, previous research performed latent Dirichlet allocation either without removing noise keywords or by manually eliminating them. The proposed method improves the preprocessing for latent Dirichlet allocation by conducting automated filtering to remove the noise keywords that are not related to the product. A case study of Android smartphones is performed to validate the proposed method. The performance of the latent Dirichlet allocation by the proposed method is compared to that of a previous method, and according to the latent Dirichlet allocation results, the former exhibits a higher performance than the latter.
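The abstract does not say how the automated filter decides which keywords are noise, so the following is only a minimal sketch of one plausible approach, not the authors' method. It assumes a hypothetical background corpus of general (non-product) text and keeps a token only when it is relatively more frequent in the reviews than in that background. The function name `filter_noise_keywords`, the `min_ratio` threshold, and the toy data are all illustrative assumptions.

```python
from collections import Counter


def filter_noise_keywords(reviews, background, min_ratio=1.5):
    """Drop tokens that are not distinctive of the review corpus.

    reviews, background: lists of tokenized documents (lists of str).
    min_ratio: hypothetical threshold on the smoothed frequency ratio.
    """
    review_counts = Counter(tok for doc in reviews for tok in doc)
    background_counts = Counter(tok for doc in background for tok in doc)
    review_total = sum(review_counts.values())
    background_total = sum(background_counts.values())
    vocab_size = len(set(review_counts) | set(background_counts))

    keep = set()
    for tok, count in review_counts.items():
        # Add-one smoothing so tokens unseen in the background do not divide by zero.
        p_review = (count + 1) / (review_total + vocab_size)
        p_background = (background_counts[tok] + 1) / (background_total + vocab_size)
        if p_review / p_background >= min_ratio:
            keep.add(tok)

    # Preserve document boundaries and token order for the later LDA step.
    return [[tok for tok in doc if tok in keep] for doc in reviews]


# Toy data (assumed): "really" and "great" are common in general text too.
reviews = [["battery", "great", "really"], ["screen", "battery", "really"]]
background = [["really", "great", "weather"], ["really", "great", "day"]]
filtered = filter_noise_keywords(reviews, background)
print(filtered)
```

The filtered token lists would then be passed to LDA in place of the raw ones; a real filter would need a much larger background corpus and a tuned threshold.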

2021 ◽  
Vol ahead-of-print (ahead-of-print) ◽  
Author(s):  
Ziang Wang ◽  
Feng Yang

Purpose Obtaining consumers’ product evaluations from massive volumes of online reviews has long been a topic of interest for online retailers. In online shopping, there is no face-to-face interaction between online retailers and customers. After collecting the online reviews left by customers, online retailers are eager to answer questions such as which product attributes attract consumers, or which step of the shopping process brings consumers a better experience. This paper aims to associate the latent Dirichlet allocation (LDA) model with consumers’ attitudes and provides a method to calculate a numerical measure of the product evaluation expressed in each word. Design/methodology/approach First, all possible pairs of reviews are each organized as a document to build the corpus. After that, the latent topics of the traditional LDA model, referred to as the standard LDA model, are separated into shared and differential topics. Then, the authors associate the model with consumers’ attitudes toward each review, which is classified as either a positive or a non-positive review. The product evaluation reflected in consumers’ binary attitudes is extended to each word that appears in the corpus. Finally, variational optimization is introduced to estimate the parameters of the expanded LDA model. Findings The experimental results illustrate that the LDA model in this research, referred to as the expanded LDA model, can successfully assign sufficient probability to words related to product attributes or consumers’ product evaluations. Compared with the standard LDA model, the expanded model tends to assign higher probability to words that rank higher within each topic. In addition, the expanded model has higher precision on the prediction set, which shows that breaking the topics down into two categories fits the data set better than the standard LDA model. The product evaluation of each word is calculated by the expanded model and depicted at the end of the experiment. Originality/value This research provides a new method to calculate consumers’ product evaluations from reviews at the word level. Words may be used to describe product attributes or consumers’ experiences in reviews. Assigning numerical measures to words allows consumers’ product evaluations to be analyzed quantitatively. Moreover, since the words themselves act as labels, they can also be ranked once a numerical measure is given. Online retailers can benefit from the results for label selection, advertising or product recommendation.
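The first step of the methodology, organizing every pair of reviews as one document, can be illustrated with a short sketch. This is an assumption-laden illustration rather than the authors' code: the function name `build_pair_corpus` is hypothetical, and simple token concatenation is assumed as the way two reviews are merged into one document.

```python
from itertools import combinations


def build_pair_corpus(reviews):
    """Return one document per unordered pair of reviews.

    reviews: list of tokenized reviews (lists of str).
    Each result is ((i, j), tokens), where i < j index the paired reviews.
    """
    corpus = []
    for (i, first), (j, second) in combinations(enumerate(reviews), 2):
        # Assumed merge rule: concatenate the tokens of both reviews.
        corpus.append(((i, j), first + second))
    return corpus


reviews = [["good", "battery"], ["bad", "screen"], ["fast", "delivery"]]
pair_corpus = build_pair_corpus(reviews)
for pair, tokens in pair_corpus:
    print(pair, tokens)
```

Note that n reviews yield n(n-1)/2 pair documents, so the corpus grows quadratically with the number of reviews.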


Just as web spam has been a major threat to almost every aspect of the current World Wide Web, social spam, especially in information diffusion, has become a serious threat to the utility of online social media. To combat this challenge, the significance and impact of such entities and content should be analyzed critically. To address this issue, this work used Twitter as a case study, modeled the information content through topic modeling, and coupled it with user-oriented features to handle it with good accuracy. Latent Dirichlet Allocation (LDA), a widely used topic modeling technique, is applied to capture the latent topics from the tweet documents. The major contribution of this work is twofold: constructing a dataset that serves as the ground truth for analyzing the diffusion dynamics of spam/non-spam information, and analyzing the effects of topics on diffusibility. Exhaustive experiments clearly reveal the variation in topics shared by spam and non-spam tweets. The rise in popularity of online social networks (OSNs) attracts not only legitimate users but also spammers. Legitimate users use the services of OSNs for legitimate purposes, i.e., maintaining relations with friends/colleagues, sharing information of interest, increasing the reach of their business through advertising


2021 ◽  
Vol 143 (8) ◽  
Author(s):  
Junegak Joung ◽  
Harrison M. Kim

Abstract The importance–performance analysis (IPA) is a widely used technique to guide strategic planning for the improvement of customer satisfaction. Compared with surveys, numerous online reviews can be easily collected at a lower cost. Online reviews provide a promising source for the IPA. This paper proposes an approach for conducting the IPA from online reviews for product design. Product attributes from online reviews are first identified by latent Dirichlet allocation. The performance of the identified attributes is subsequently estimated by the aspect-based sentiment analysis of IBM Watson. Finally, the importance of the identified attributes is estimated by evaluating the effect of sentiments of each product attribute on the overall rating using an explainable deep neural network. A Shapley additive explanation-based method is proposed to estimate the importance values of product attributes with a low variance by combining the effect of the input features from multiple optimal neural networks with a high performance. A case study of smartphones is presented to demonstrate the proposed approach. The performance and importance estimates of the proposed approach are compared with those of previous sentiment analysis and neural network-based method, and the results exhibit that the former can perform IPA more reliably. The proposed approach uses minimal manual operation and can support companies to take decisions rapidly and effectively, compared with survey-based methods.
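Once importance and performance values exist for each attribute, the final IPA step is usually a quadrant assignment. The sketch below shows the classic four-quadrant grid under stated assumptions: the cut points are the grand means of the two scores (one common convention, not confirmed by the abstract), and the attribute scores are made-up toy values rather than results from the paper.

```python
from statistics import mean


def ipa_quadrants(attributes):
    """Assign each attribute to an importance-performance quadrant.

    attributes: dict mapping attribute name -> (importance, performance).
    Cut points are the mean importance and mean performance (assumed convention).
    """
    importance_cut = mean(imp for imp, _ in attributes.values())
    performance_cut = mean(perf for _, perf in attributes.values())

    labels = {}
    for name, (imp, perf) in attributes.items():
        if imp >= importance_cut and perf >= performance_cut:
            labels[name] = "keep up the good work"
        elif imp >= importance_cut:
            labels[name] = "concentrate here"
        elif perf >= performance_cut:
            labels[name] = "possible overkill"
        else:
            labels[name] = "low priority"
    return labels


# Toy scores (assumed), e.g. importance from a model and performance from sentiment.
scores = {
    "battery": (0.9, 0.2),
    "screen": (0.8, 0.9),
    "camera": (0.2, 0.8),
    "speaker": (0.1, 0.1),
}
quadrants = ipa_quadrants(scores)
print(quadrants)
```

In the approach described above, the importance values would come from the Shapley-based estimates and the performance values from the aspect-based sentiment scores.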


2020 ◽  
Vol 12 (16) ◽  
pp. 6673 ◽  
Author(s):  
Kiattipoom Kiatkawsin ◽  
Ian Sutherland ◽  
Jin-Young Kim

Airbnb has emerged as a platform where unique accommodation options can be found. Due to the uniqueness of each accommodation unit and host combination, each listing offers a one-of-a-kind experience. As consumers increasingly rely on text reviews of other customers, managers are also increasingly gaining insight from customer reviews. Thus, this present study aimed to extract those insights from reviews using latent Dirichlet allocation, an unsupervised type of topic modeling that extracts latent discussion topics from text data. Findings of Hong Kong’s 185,695 and Singapore’s 93,571 Airbnb reviews, two long-term rival destinations, were compared. Hong Kong produced 12 total topics that can be categorized into four distinct groups whereas Singapore’s optimal number of topics was only five. Topics produced from both destinations covered the same range of attributes, but Hong Kong’s 12 topics provide a greater degree of precision to formulate managerial recommendations. While many topics are similar to established hotel attributes, topics related to the host and listing management are unique to the Airbnb experience. The findings also revealed keywords used when evaluating the experience that provide more insight beyond typical numeric ratings.
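To make concrete what extracting latent topics from review text involves, here is a minimal collapsed Gibbs sampler for LDA in plain Python. It is a generic textbook-style sketch, not the implementation used in the study, which the abstract does not specify. The hyperparameters `alpha` and `beta`, the iteration count, and the toy reviews are assumptions chosen for illustration.

```python
import random


def lda_gibbs(docs, num_topics, iterations=200, alpha=0.1, beta=0.01, seed=0, top_n=3):
    """Fit LDA by collapsed Gibbs sampling.

    docs: list of tokenized documents. Returns (top_words, doc_topic), where
    top_words[k] lists the highest-count words of topic k and doc_topic[d][k]
    counts the tokens of document d assigned to topic k.
    """
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    vocab_size = len(vocab)
    doc_topic = [[0] * num_topics for _ in docs]
    topic_word = [dict.fromkeys(vocab, 0) for _ in range(num_topics)]
    topic_total = [0] * num_topics

    # Random initial topic assignment for every token.
    assignments = []
    for d, doc in enumerate(docs):
        zs = []
        for w in doc:
            z = rng.randrange(num_topics)
            zs.append(z)
            doc_topic[d][z] += 1
            topic_word[z][w] += 1
            topic_total[z] += 1
        assignments.append(zs)

    for _ in range(iterations):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                # Remove the token's current assignment from the counts.
                z = assignments[d][i]
                doc_topic[d][z] -= 1
                topic_word[z][w] -= 1
                topic_total[z] -= 1
                # Conditional probability of each topic given all other assignments.
                weights = [
                    (doc_topic[d][k] + alpha)
                    * (topic_word[k][w] + beta)
                    / (topic_total[k] + vocab_size * beta)
                    for k in range(num_topics)
                ]
                z = rng.choices(range(num_topics), weights=weights)[0]
                assignments[d][i] = z
                doc_topic[d][z] += 1
                topic_word[z][w] += 1
                topic_total[z] += 1

    top_words = [
        sorted(vocab, key=lambda w: -topic_word[k][w])[:top_n]
        for k in range(num_topics)
    ]
    return top_words, doc_topic


# Toy reviews (assumed) mixing host-related and location-related words.
docs = [
    ["host", "clean", "host"],
    ["location", "metro", "location"],
    ["host", "friendly"],
    ["metro", "location"],
]
top_words, doc_topic = lda_gibbs(docs, num_topics=2, iterations=50)
print(top_words)
```

Choosing the number of topics, as the study does for each destination, would mean fitting such a model for several values of `num_topics` and comparing them with a quality measure such as coherence or perplexity.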


2021 ◽  
Vol 1 ◽  
pp. 417-426
Author(s):  
Kangcheng Lin ◽  
Harrison Kim

Abstract With the growth of online marketplaces and social media, product designers have been seeing an exponential growth of data available, which can serve as an extremely valuable source of information communicated from customers without geographical limitations. The data will reveal customers’ preferences, which can be expensive and slow to obtain via traditional methods such as surveys and questionnaires. While existing methods in the literature have been proposed to extract product information and make inferences from online data, they have limitations, especially in providing reliable results and in dealing with data sparsity. Therefore, this paper proposes a method to conduct an importance-performance analysis from online reviews. The major steps of this method involve using latent Dirichlet allocation (LDA) to identify product attributes, using the IBM Watson Natural Language Understanding tool to perform aspect-based sentiment analysis, and using an XGBoost model to infer product attribute importance from the collected dataset. In our case study, we have collected over 150,000 text reviews of more than 3,000 laptops from Amazon.


2021 ◽  
Vol 16 (4) ◽  
pp. 1042-1065
Author(s):  
Anne Gottfried ◽  
Caroline Hartmann ◽  
Donald Yates

The business intelligence (BI) market has grown at a tremendous rate in the past decade due to technological advancements, big data and the availability of open source content. Despite this growth, the use of open government data (OGD) as a source of information is very limited among the private sector due to a lack of knowledge as to its benefits. Scant evidence on the use of OGD by private organizations suggests that it can lead to the creation of innovative ideas as well as assist in making better informed decisions. Given the benefits but lack of use of OGD to generate business intelligence, we extend research in this area by exploring how OGD can be used to generate business intelligence for the identification of market opportunities and strategy formulation, an area of research that is still in its infancy. Using a two-industry case study approach (footwear and lumber), we use latent Dirichlet allocation (LDA) topic modeling to extract emerging topics in these two industries from OGD, and a data visualization tool (pyLDAvis) to visualize the topics in order to interpret and transform the data into business intelligence. Additionally, we perform environmental scanning for the two industries to validate the usability of the information obtained. The results provide evidence that OGD can be a valuable source of information for generating business intelligence and demonstrate how topic modeling and visualization tools can assist organizations in extracting and analyzing information for the identification of market opportunities.


Author(s):  
Rahul Rai

Identifying customer needs and preferences is one of the most important tasks in the design process. Typically, a variation of interview-based approaches is used to conduct need and preference analysis. In this paper, a new approach based on text mining of online (internet-based) customer reviews to supplement traditional methods of need and preference analysis is considered. The key idea underlying the proposed approach is to partition online customer-generated product reviews into segments that evaluate the individual attributes of a product (e.g., zoom capability and support of different image formats in a camcorder). Additionally, the proposed method identifies the importance (ranking) that customers place on each product attribute. The method is demonstrated on 100 customer reviews submitted for camcorders on epinions.com over a two-year period.


2020 ◽  
Vol 44 (5) ◽  
pp. 1027-1055
Author(s):  
Thanh-Tho Quan ◽  
Duc-Trung Mai ◽  
Thanh-Duy Tran

Purpose This paper proposes an approach to identify categorical influencers (i.e., influencers who are active in the targeted categories) in social media channels. Categorical influencers are important for media marketing, but detecting them automatically remains a challenge. Design/methodology/approach We deployed emerging deep learning approaches. Precisely, we used word embedding to encode semantic information of words occurring in the common microtext of social media and used a variational autoencoder (VAE) to approximate the topic modeling process, through which the active categories of influencers are automatically detected. We developed a system known as Categorical Influencer Detection (CID) to realize those ideas. Findings The approach of using a VAE to simulate the Latent Dirichlet Allocation (LDA) process can effectively handle the task of topic modeling on the vast dataset of microtext on social media channels. Research limitations/implications This work has two major contributions. The first is the detection of topics in microtexts using a deep learning approach. The second is the identification of categorical influencers in social media. Practical implications This work can help brands to do digital marketing on social media effectively by approaching appropriate influencers. A real case study is given to illustrate it. Originality/value In this paper, we discuss an approach to automatically identify the active categories of influencers by performing topic detection from the microtext related to the influencers in social media channels. To do so, we use deep learning to approximate the topic modeling process of the conventional approaches (such as LDA).

