Toward a Language Modeling Approach for Consumer Review Spam Detection

Author(s): C.L. Lai, K.Q. Xu, Raymond Y.K. Lau, Y. Li, L. Jing
2011, Vol 2 (4), pp. 1-30
Author(s): Raymond Y. K. Lau, S. Y. Liao, Ron Chi-Wai Kwok, Kaiquan Xu, Yunqing Xia, ...
2017, Vol 51 (2), pp. 202-208
Author(s): Jay M. Ponte, W. Bruce Croft

Author(s): Zarmeen Nasim

This research combines deep-learning-based language modeling with classical topic modeling techniques to produce interpretable topics for a given set of documents in Urdu, a low-resource language. Existing topic modeling techniques suggest topics as collections of words, often uninterpretable, without integrating them into a semantically correct phrase or sentence. The proposed approach would first build an accurate part-of-speech (POS) tagger for the Urdu language using a publicly available corpus of many million sentences. In the next step, using semantically rich feature extraction approaches including Word2Vec and BERT, it would experiment with different clustering and topic modeling techniques to produce a list of potential topics for a given set of documents. Finally, this list of topics would be sent to a labeler module to produce syntactically correct phrases that represent interpretable topics.

