Weakly Supervised Short Text Categorization Using World Knowledge

Abstract: Filtering out irrelevant documents and classifying the relevant ones into topical categories is a standard task in many applications. However, supervised learning solutions require substantial human effort for document labeling. In this paper, we propose a novel seed-guided topic model for dataless short text classification and filtering, named SSCF. Without using any labeled documents, SSCF takes a few “seed words” for each category of interest and conducts short text filtering and classification in a weakly supervised manner. To overcome the issues of data sparsity and imbalance, the short text collection is mapped to a collection of pseudo-documents, one for each word. SSCF infers two kinds of topics on the pseudo-documents: category-topics and general-topics. Each category-topic is associated with one category of interest and covers the meaning of that category. In SSCF, we devise a novel word relevance estimation process, based on the seed words, for hidden topic inference. The dominating topic of a short text is identified through post-inference and then used for filtering and classification. On two real-world datasets in two languages, experimental results show that the proposed SSCF consistently achieves better classification accuracy than state-of-the-art baselines. We also observe that SSCF can even outperform the supervised classifiers supervised Latent Dirichlet Allocation (sLDA) and support vector machine (SVM) on some testing tasks.
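The mapping from short texts to per-word pseudo-documents can be illustrated with a minimal sketch. The abstract does not say how a pseudo-document is assembled, so the code below assumes that the pseudo-document of a word collects every other token that co-occurs with it in the same short text. The function name `build_pseudo_documents` and the sample texts are hypothetical and are not taken from the paper.

```python
from collections import defaultdict


def build_pseudo_documents(short_texts):
    """Map tokenized short texts to one pseudo-document per distinct word.

    Assumption (not stated in the abstract): the pseudo-document of a word
    is the concatenation of its co-occurring tokens across all short texts.
    """
    pseudo_docs = defaultdict(list)
    for tokens in short_texts:
        for i, word in enumerate(tokens):
            # Context = every token in this text except the current occurrence.
            context = tokens[:i] + tokens[i + 1:]
            pseudo_docs[word].extend(context)
    return dict(pseudo_docs)


# Hypothetical toy collection of two tokenized short texts.
texts = [
    ["cheap", "flight", "deal"],
    ["flight", "delay", "airport"],
]
pseudo = build_pseudo_documents(texts)
```

Under this assumption, a word that appears in many short texts gets a longer pseudo-document than any single short text, which is one way such a mapping could ease the sparsity the abstract mentions.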

Improving Accuracy of Short Text Categorization Using Contextual Information

Advances in Intelligent Systems and Computing - Progress in Advanced Computing and Intelligent Engineering ◽

10.1007/978-981-13-1708-8_26 ◽

2018 ◽

pp. 281-292

Author(s):

V. Vasantha Kumar ◽

S. Sendhilkumar ◽

G. S. Mahalakshmi

Keyword(s):

Text Categorization ◽

Contextual Information ◽

Short Text ◽

Improving Accuracy

A framework for measuring similarity between Terms in Short Text Categorization

2016 Online International Conference on Green Engineering and Technologies (IC-GET) ◽

10.1109/get.2016.7916642 ◽

2016 ◽

Author(s):

Nandini V. ◽

Janani Chitra R. ◽

P. Uma Maheswari

Keyword(s):

Text Categorization ◽

Short Text

A Study on the Short Text Categorization using SNS Feature Informations

The Journal of Korean Institute of Information Technology ◽

10.14801/jkiit.2016.14.6.159 ◽

2016 ◽

Vol 14 (6) ◽

pp. 159 ◽

Cited By ~ 1

Author(s):

Sung-Hee Na ◽

Jung-In Kim ◽

Eun-Ji Lee ◽

Pan-Koo Kim

Keyword(s):

Text Categorization ◽

Short Text

Generic framework for multilingual short text categorization using convolutional neural network

Multimedia Tools and Applications ◽

10.1007/s11042-020-10314-9 ◽

2021 ◽

Author(s):

Liriam Enamoto ◽

Li Weigang ◽

Geraldo P. Rocha Filho

Keyword(s):

Neural Network ◽

Convolutional Neural Network ◽

Text Categorization ◽

Short Text ◽

Generic Framework

Feature extended short text categorization based on theme ontology

2012 9th International Conference on Fuzzy Systems and Knowledge Discovery ◽

10.1109/fskd.2012.6234220 ◽

2012 ◽

Author(s):

Yan Zhan ◽

Hao Chen

Keyword(s):

Text Categorization ◽

Short Text

Filtering Dirty Words in Online Social Network by Applying Automated Filtering System

Applied Mechanics and Materials ◽

10.4028/www.scientific.net/amm.573.560 ◽

2014 ◽

Vol 573 ◽

pp. 560-564

Author(s):

P. Kumari Bala ◽

D. Jemi Florinabel ◽

S. Sivasakthi

Keyword(s):

Machine Learning ◽

Social Network ◽

Online Social Networks ◽

Text Categorization ◽

Online Social Network ◽

Project Work ◽

Rule Based ◽

Short Text ◽

Filter Algorithms ◽

Content Based Filtering

The aim of this project is to automatically filter dirty words posted by other users so that they are not displayed to the profile owner. Users of an online social network may post offensive messages, which need to be filtered out before they reach the owner. This is achieved with a rule-based filtering system that lets users customize which noisy or dirty words are filtered by specifying filtering criteria. The system exploits machine learning (ML) text categorization techniques to assign short texts containing dirty words to categories based on their content. Content-based filtering of messages posted on a user's space poses additional challenges because of the short length of these messages. Online social networks not only make it easier for users to share their opinions with each other, but also serve as a platform for developing filter algorithms.
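A user-customizable rule of the kind described above can be sketched as a per-owner blocklist applied to incoming messages. This is only an illustrative assumption about what such a filtering criterion might look like. The function names `make_filter` and `visible_messages` are hypothetical, and the paper's machine learning categorization step is not reproduced here.

```python
import re


def make_filter(blocked_words):
    """Build a predicate that rejects messages containing any blocked word."""
    blocked = {w.lower() for w in blocked_words}

    def is_allowed(message):
        # Lowercase and split into alphabetic tokens so matching ignores
        # case and surrounding punctuation.
        tokens = re.findall(r"[a-z']+", message.lower())
        return not any(token in blocked for token in tokens)

    return is_allowed


def visible_messages(messages, blocked_words):
    """Return only the messages the profile owner should see."""
    is_allowed = make_filter(blocked_words)
    return [m for m in messages if is_allowed(m)]


# Hypothetical wall posts and an owner-defined blocklist.
posts = ["Nice photo!", "You are a JERK.", "See you tomorrow"]
shown = visible_messages(posts, ["jerk"])
```

A plain blocklist only catches exact word matches; the content-based categorization mentioned in the abstract would be needed for messages that are offensive without using a listed word.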
