Implementation and comparison of topic modeling techniques based on user reviews in e-commerce recommendations

Author(s):  
Dimple Chehal ◽  
Parul Gupta ◽  
Payal Gulati
Author(s):  
Ligaj Pradhan ◽  
Chengcui Zhang ◽  
Steven Bethard

Intricate user behaviors can be understood by discovering user interests from their reviews. Topic modeling techniques have been extensively explored to discover latent user interests from user reviews. However, a topic extracted by topic modeling techniques can be a mixture of several quite different concepts and thus less interpretable. In this paper, the authors present a method that uses topic modeling techniques to discover a large number of topics and applies hierarchical clustering to generate a much smaller number of interpretable User-Concerns. These User-Concerns are further compared with topics generated by Latent Dirichlet Allocation (LDA) and Pachinko Allocation Model (PAM) and shown to be more coherent and interpretable. The authors cut the linkage tree formed while performing the hierarchical clustering of the User-Concerns at different levels to generate a hierarchy of User-Concerns. They also discuss how collaborative-filtering-based recommendation systems can be enriched by infusing additional user-behavioral knowledge from such a hierarchy.
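The idea of merging many fine-grained topics into a few broader groups can be illustrated with a small sketch. It is not the authors' implementation: the toy topic-word distributions, the cosine distance, the average-linkage rule, and the function names (`cosine_distance`, `average_linkage`, `cluster_topics`) are all assumptions chosen for illustration. Stopping the merging at different cluster counts corresponds to cutting the same linkage tree at different levels.

```python
import math
from itertools import combinations

def cosine_distance(p, q):
    """1 - cosine similarity between two topic-word probability vectors."""
    dot = sum(a * b for a, b in zip(p, q))
    norm = math.sqrt(sum(a * a for a in p)) * math.sqrt(sum(b * b for b in q))
    return 1.0 - dot / norm

def average_linkage(c1, c2, topics):
    """Mean pairwise distance between the topics of two clusters."""
    total = sum(cosine_distance(topics[i], topics[j]) for i in c1 for j in c2)
    return total / (len(c1) * len(c2))

def cluster_topics(topics, n_clusters):
    """Agglomerative clustering: merge the closest pair of clusters
    until only n_clusters remain (one 'cut' of the linkage tree)."""
    clusters = [[i] for i in range(len(topics))]
    while len(clusters) > n_clusters:
        a, b = min(
            combinations(range(len(clusters)), 2),
            key=lambda ab: average_linkage(clusters[ab[0]], clusters[ab[1]], topics),
        )
        clusters[a] = clusters[a] + clusters[b]  # a < b, so deleting b is safe
        del clusters[b]
    return [sorted(c) for c in clusters]

# Hypothetical topic-word distributions over the toy vocabulary
# ["battery", "charge", "screen", "display", "delivery", "shipping"].
# In practice these rows would come from a fitted topic model.
topics = [
    [0.60, 0.30, 0.05, 0.05, 0.00, 0.00],
    [0.40, 0.50, 0.05, 0.05, 0.00, 0.00],
    [0.05, 0.05, 0.50, 0.40, 0.00, 0.00],
    [0.00, 0.05, 0.40, 0.50, 0.05, 0.00],
    [0.00, 0.00, 0.05, 0.05, 0.50, 0.40],
    [0.00, 0.00, 0.00, 0.05, 0.40, 0.55],
]

fine = cluster_topics(topics, 3)    # lower cut: three groups of topics
coarse = cluster_topics(topics, 2)  # higher cut: two broader groups
print(fine)
print(coarse)
```

Because agglomerative merging is deterministic, the coarser grouping is always a union of the finer groups, which is what makes the result a hierarchy rather than two unrelated clusterings.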


Author(s):  
Zarmeen Nasim

This research is an endeavor to combine deep-learning-based language modeling with classical topic modeling techniques to produce interpretable topics for a given set of documents in Urdu, a low resource language. The existing topic modeling techniques produce a collection of words, often un-interpretable, as suggested topics without integrating them into a semantically correct phrase/sentence. The proposed approach would first build an accurate Part of Speech (POS) tagger for the Urdu Language using a publicly available corpus of many million sentences. Using semantically rich feature extraction approaches including Word2Vec and BERT, the proposed approach, in the next step, would experiment with different clustering and topic modeling techniques to produce a list of potential topics for a given set of documents. Finally, this list of topics would be sent to a labeler module to produce syntactically correct phrases that will represent interpretable topics.


Author(s):  
Ponmalar R ◽  
Ponnarasi D ◽  
Sangeetha A ◽  
Kingsy Grace R

Text mining is a process of converting unstructured data into meaningful data. It may be loosely characterized as the process of analyzing text to extract information that is useful for particular purposes. Topic modeling is a form of text mining, a way of identifying patterns in a corpus. The topics produced by topic modeling techniques are clusters of similar words that frequently occur together. Topic modeling is also a frequently used text-mining tool for the discovery of hidden semantic structures in a text body. Intuitively, given that a document is about a particular topic, one would expect particular words to appear in the document more or less frequently. This paper presents a survey on topic modeling in clinical documents.


2021 ◽  
Vol 26 (6) ◽  
Author(s):  
Camila Costa Silva ◽  
Matthias Galster ◽  
Fabian Gilson

Topic modeling using models such as Latent Dirichlet Allocation (LDA) is a text mining technique to extract human-readable semantic “topics” (i.e., word clusters) from a corpus of textual documents. In software engineering, topic modeling has been used to analyze textual data in empirical studies (e.g., to find out what developers talk about online), but also to build new techniques to support software engineering tasks (e.g., to support source code comprehension). Topic modeling needs to be applied carefully (e.g., depending on the type of textual data analyzed and modeling parameters). Our study aims at describing how topic modeling has been applied in software engineering research with a focus on four aspects: (1) which topic models and modeling techniques have been applied, (2) which textual inputs have been used for topic modeling, (3) how textual data was “prepared” (i.e., pre-processed) for topic modeling, and (4) how generated topics (i.e., word clusters) were named to give them a human-understandable meaning. We analyzed topic modeling as applied in 111 papers from ten highly-ranked software engineering venues (five journals and five conferences) published between 2009 and 2020. We found that (1) LDA and LDA-based techniques are the most frequent topic modeling techniques, (2) developer communication and bug reports have been modeled most, (3) data pre-processing and modeling parameters vary quite a bit and are often vaguely reported, and (4) manual topic naming (such as deducing names based on frequent words in a topic) is common.

