Weakly Supervised Short Text Categorization Using World Knowledge

Author(s):  
Rima Türker ◽  
Lei Zhang ◽  
Mehwish Alam ◽  
Harald Sack
2019 ◽  
Vol 3 (3) ◽  
pp. 165-186 ◽  
Author(s):  
Chenliang Li ◽  
Shiqian Chen ◽  
Yan Qi

Abstract: Filtering out irrelevant documents and classifying the relevant ones into topical categories is a de facto task in many applications. However, supervised learning solutions require extensive human effort for document labeling. In this paper, we propose a novel seed-guided topic model for dataless short text classification and filtering, named SSCF. Without using any labeled documents, SSCF takes a few "seed words" for each category of interest and conducts short text filtering and classification in a weakly supervised manner. To overcome the issues of data sparsity and imbalance, the short text collection is mapped to a collection of pseudo-documents, one for each word. SSCF infers two kinds of topics on pseudo-documents: category-topics and general-topics. Each category-topic is associated with one category of interest and covers the meaning of that category. In SSCF, we devise a novel word relevance estimation process, based on the seed words, for hidden topic inference. The dominating topic of a short text is identified through post inference and then used for filtering and classification. On two real-world datasets in two languages, experimental results show that the proposed SSCF consistently achieves better classification accuracy than state-of-the-art baselines. We also observe that SSCF can even outperform the supervised classifiers supervised latent Dirichlet allocation (sLDA) and support vector machine (SVM) on some testing tasks.
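
The abstract above states only that the short text collection is mapped to one pseudo-document per word; it does not say how each pseudo-document is built. The following minimal sketch assumes a simple co-occurrence construction, where a word's pseudo-document is the bag of words that appear with it in the same short text. The function name `build_pseudo_documents` and the toy texts are illustrative and are not taken from the paper.

```python
from collections import Counter, defaultdict


def build_pseudo_documents(short_texts):
    """Map tokenized short texts to one pseudo-document per word.

    Assumption (not stated in the abstract): a word's pseudo-document is
    the bag of words that co-occur with it in the same short text.
    """
    pseudo_docs = defaultdict(Counter)
    for tokens in short_texts:
        for i, word in enumerate(tokens):
            # Every other token position in the same text counts as context.
            for j, other in enumerate(tokens):
                if i != j:
                    pseudo_docs[word][other] += 1
    return dict(pseudo_docs)


# Toy, already-tokenized short texts (illustrative only).
texts = [
    ["cheap", "flight", "deal"],
    ["flight", "delay", "airport"],
    ["football", "match", "score"],
]
pseudo_docs = build_pseudo_documents(texts)
```

Because "flight" occurs in two texts, its pseudo-document pools context from both, which is one way such a mapping can ease the sparsity of individual short texts. Seed-guided topic inference would then run on these pseudo-documents; that step is not sketched here.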


Author(s):  
Sung-Hee Na ◽  
Jung-In Kim ◽  
Eun-Ji Lee ◽  
Pan-Koo Kim

2014 ◽  
Vol 573 ◽  
pp. 560-564
Author(s):  
P. Kumari Bala ◽  
D. Jemi Florinabel ◽  
S. Sivasakthi

The aim of this project is to automatically filter dirty words posted by other users so that they are not displayed to the profile owner. In an online social network, users may post dirty messages, so these need to be filtered before they reach the owner. This is achieved with a rule-based filtering system, which lets users customize the filtering of noisy or dirty words by applying filtering criteria. The system exploits machine learning (ML) text categorization techniques to assign short texts containing dirty words to categories based on their content. Content-based filtering of messages posted on user spaces poses additional challenges because of the short length of these messages. Online social networks not only make it easier for users to share their opinions with each other, but also serve as a platform for developing filtering algorithms.
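
As a rough illustration of the rule-based part of such a system, the sketch below hides wall messages that match an owner-defined word list. It is not the authors' implementation: `FilterRule`, `blocked_words`, `max_blocked_hits`, and `should_display` are hypothetical names, and the machine-learning categorization step mentioned in the abstract is left out.

```python
import re
from dataclasses import dataclass, field


@dataclass
class FilterRule:
    """Hypothetical per-owner filtering criteria."""
    blocked_words: set = field(default_factory=set)
    # Messages with more blocked-word hits than this are hidden.
    max_blocked_hits: int = 0


def count_blocked(message, rule):
    """Count tokens of the message that appear in the owner's blocked list."""
    tokens = re.findall(r"[a-z']+", message.lower())
    return sum(1 for token in tokens if token in rule.blocked_words)


def should_display(message, rule):
    """Return True if the message may be shown on the owner's wall."""
    return count_blocked(message, rule) <= rule.max_blocked_hits


# Toy example (illustrative only).
rule = FilterRule(blocked_words={"idiot", "stupid"})
wall = ["Nice photo!", "You are an IDIOT", "stupid, stupid post"]
visible = [m for m in wall if should_display(m, rule)]
```

Keeping the criteria in a per-owner object reflects the abstract's point that users customize their own filtering; a learned classifier could replace `count_blocked` without changing the display decision.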


2010 ◽  
Vol 30 (4) ◽  
pp. 1015-1018 ◽  
Author(s):  
Yue-hong CAI ◽  
Qian ZHU ◽  
Ping SUN ◽  
Xian-yi CHENG
