Multiple weak supervision for short text classification

Abstract: For short text classification, insufficient labeled data, data sparsity, and class imbalance have become three major challenges. To address them, we propose multiple weak supervision, which labels unlabeled data automatically. Unlike prior work, the proposed method generates probabilistic labels through a conditionally independent model. Experiments were conducted to verify the effectiveness of multiple weak supervision. According to experimental results on public datasets, real datasets, and synthetic datasets, the unlabeled imbalanced short text classification problem can be solved effectively by multiple weak supervision. Notably, recall and F1-score can be improved without reducing precision by adding distant supervision clustering, which can be used to meet different application needs.
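The abstract does not spell out the label model, so the following is only a minimal sketch of how several weak labeling sources could be combined into a probabilistic label when they are assumed conditionally independent given the true class. The function name, the `ABSTAIN` marker, the binary label set, and the per-source accuracies are all illustrative assumptions, not the paper's implementation.

```python
import math

ABSTAIN = -1  # assumed marker for "this source gives no label"

def probabilistic_label(votes, accuracies, prior_pos=0.5):
    """Return P(y = 1 | votes) for binary labels in {0, 1}.

    Assumes the sources are conditionally independent given y and that
    each source has a known accuracy strictly between 0 and 1.
    """
    log_pos = math.log(prior_pos)
    log_neg = math.log(1.0 - prior_pos)
    for vote, acc in zip(votes, accuracies):
        if vote == ABSTAIN:
            continue  # an abstaining source carries no evidence
        # P(vote | y) is acc when the vote agrees with y, else 1 - acc
        log_pos += math.log(acc if vote == 1 else 1.0 - acc)
        log_neg += math.log(acc if vote == 0 else 1.0 - acc)
    # Normalize in a numerically stable way
    top = max(log_pos, log_neg)
    p_pos = math.exp(log_pos - top)
    p_neg = math.exp(log_neg - top)
    return p_pos / (p_pos + p_neg)

# Two sources vote positive, one abstains
label = probabilistic_label([1, 1, ABSTAIN], [0.8, 0.7, 0.9])
```

A soft label of this kind can be passed to a downstream classifier as a training target instead of a hard 0/1 label; in practice the accuracies would have to be estimated rather than assumed.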

Using Semantic Correlation of HowNet for Short Text Classification

Applied Mechanics and Materials ◽

10.4028/www.scientific.net/amm.513-517.1931 ◽

2014 ◽

Vol 513-517 ◽

pp. 1931-1934 ◽

Cited By ~ 1

Author(s):

Ya Hui Ning ◽

Li Zhang ◽

Ya Rong Ju ◽

Wei Jia Wang ◽

Shun Qin Li

Keyword(s):

Text Classification ◽

High Frequency ◽

Experimental Results ◽

Short Text ◽

Semantic Correlation ◽

Classification Efficiency ◽

High Frequency Words

A method using the HowNet ontology for short text classification is proposed. First, the high-frequency words of the domain are selected as feature words. Then the feature words are extended to concepts through HowNet, which enriches the features semantically and mitigates feature scarcity. Last, word semantic correlation values are obtained by calculating the distance between concepts in the node tree. Experimental results show that both classification efficiency and precision are improved.
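As a rough illustration of the last step, the sketch below measures the path distance between two concepts in a small tree and converts it into a correlation score. The toy hierarchy is hypothetical and is not HowNet data, and the mapping `alpha / (alpha + distance)` with a tunable `alpha` is an assumption, since the abstract does not state the formula used.

```python
# Hypothetical concept tree: child -> parent (None marks the root)
PARENT = {
    "entity": None,
    "animal": "entity",
    "artifact": "entity",
    "dog": "animal",
    "cat": "animal",
    "phone": "artifact",
}

def ancestors(node):
    """Return the path from node up to the root, node first."""
    path = []
    while node is not None:
        path.append(node)
        node = PARENT[node]
    return path

def tree_distance(a, b):
    """Number of edges on the path between a and b through their lowest common ancestor."""
    steps_from_a = {n: i for i, n in enumerate(ancestors(a))}
    for steps_from_b, n in enumerate(ancestors(b)):
        if n in steps_from_a:
            return steps_from_a[n] + steps_from_b
    raise ValueError("concepts share no ancestor")

def correlation(a, b, alpha=1.6):
    """Map distance to (0, 1]; alpha is an assumed tunable constant."""
    return alpha / (alpha + tree_distance(a, b))
```

With this mapping, concepts that sit closer together in the tree receive a higher correlation, and identical concepts receive 1.0.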

Combining Knowledge with Deep Convolutional Neural Networks for Short Text Classification

Proceedings of the Twenty-Sixth International Joint Conference on Artificial Intelligence ◽

10.24963/ijcai.2017/406 ◽

2017 ◽

Cited By ~ 69

Author(s):

Jin Wang ◽

Zhongyuan Wang ◽

Dawei Zhang ◽

Jun Yan

Keyword(s):

Neural Networks ◽

Knowledge Base ◽

Convolutional Neural Networks ◽

Text Classification ◽

State Of The Art ◽

Experimental Results ◽

Text Representation ◽

Deep Convolutional Neural Networks ◽

Short Text ◽

Fine Grained

Text classification is a fundamental task in NLP applications. Most existing work relies on either explicit or implicit text representation to address this problem. While these techniques work well for sentences, they cannot easily be applied to short text because of its brevity and sparsity. In this paper, we propose a framework based on convolutional neural networks that combines explicit and implicit representations of short text for classification. We first conceptualize a short text as a set of relevant concepts using a large taxonomy knowledge base. We then obtain the embedding of the short text by coalescing the words and relevant concepts on top of pre-trained word vectors. We further incorporate character-level features into our model to capture fine-grained subword information. Experimental results on five commonly used datasets show that our proposed method significantly outperforms state-of-the-art methods.
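The pipeline described above can be sketched, very loosely, as: look up concepts for each word, embed words and concepts together, then apply convolution filters with max-over-time pooling. In the sketch below the `CONCEPTS` table is a hypothetical stand-in for the taxonomy knowledge base, random vectors stand in for pre-trained embeddings and trained filters, and the character-level branch is omitted. It illustrates the data flow only and is not the authors' architecture.

```python
import random

# Hypothetical concept lookup standing in for a taxonomy knowledge base
CONCEPTS = {"apple": ["fruit", "company"], "iphone": ["phone", "product"]}

def embed(tokens, table, dim, rng):
    """Look up a vector per token, creating a random one on first sight."""
    vectors = []
    for token in tokens:
        if token not in table:
            table[token] = [rng.uniform(-1.0, 1.0) for _ in range(dim)]
        vectors.append(table[token])
    return vectors

def conv_max_pool(sequence, filt):
    """Slide one filter over the sequence, apply ReLU, take the max over time."""
    window = len(filt)
    best = 0.0  # ReLU floor; also the result when the text is shorter than the window
    for start in range(len(sequence) - window + 1):
        patch = sequence[start:start + window]
        score = sum(w * x
                    for f_row, x_row in zip(filt, patch)
                    for w, x in zip(f_row, x_row))
        best = max(best, score)
    return best

def text_features(text, n_filters=4, dim=8, window=2, seed=0):
    """Return one pooled feature per filter for words plus their concepts."""
    rng = random.Random(seed)
    table = {}
    words = text.lower().split()
    concepts = [c for w in words for c in CONCEPTS.get(w, [])]
    sequence = embed(words + concepts, table, dim, rng)
    filters = [[[rng.uniform(-1.0, 1.0) for _ in range(dim)]
                for _ in range(window)]
               for _ in range(n_filters)]
    return [conv_max_pool(sequence, f) for f in filters]

features = text_features("apple iphone price")
```

In a real model the pooled features would feed a trained classification layer, and the embeddings and filters would be learned rather than sampled.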

Solving the Message Classification Problem in Voice Interaction Systems

Vestnik MEI ◽

10.24160/1993-6982-2020-5-132-139 ◽

2020 ◽

Vol 5 (5) ◽

pp. 132-139

Author(s):

Ivan E. Kurilenko ◽

Igor E. Nikonov

Keyword(s):

Artificial Intelligence ◽

Classification Problem ◽

Subject Area ◽

Text Messages ◽

Case Based Reasoning ◽

Short Text ◽

Proposed Modification ◽

Voice Interaction ◽

Text Content ◽

Case Based

A method is considered for classifying short text messages in the form of sentences spoken by customers over an organization's telephone line. To solve this problem, a classifier was developed that combines two methods: a description of the subject area as a hierarchy of entities, and plausible reasoning based on the case-based reasoning approach, which is actively used in artificial intelligence systems. In various problems of artificial intelligence-based data analysis, these methods have shown a high degree of efficiency, scalability, and independence from data structure. As part of the case-based reasoning approach in the classifier, it is proposed to modify the TF-IDF (Term Frequency - Inverse Document Frequency) measure of text content to take into account known information about the distribution of documents by topic. The proposed modification improves classification quality in comparison with classical measures, since it takes into account the distribution of words not only in a separate document or topic, but in the entire database of cases. Experimental results are presented that confirm the effectiveness of the proposed metric and of the developed classifier when applied to classifying customer sentences and providing customers with the necessary information depending on the classification result. The developed text classification service prototype is used as part of the voice interaction module, with the aim of automating the telephone call routing system and moving from button-based interaction between user and system to voice interaction.
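The abstract does not give the modified measure, so the sketch below shows classical TF-IDF next to one hypothetical topic-aware variant, purely to illustrate what "taking the distribution of documents by topic into account" could look like. The `CASES` data, the function names, and the concentration factor are assumptions and should not be read as the authors' formula.

```python
import math

# Hypothetical case base: (topic, utterance)
CASES = [
    ("billing", "pay my bill"),
    ("billing", "bill is wrong"),
    ("support", "internet is down"),
    ("support", "reset my router"),
]

def tf_idf(term, doc_tokens, all_docs):
    """Classical TF-IDF of a term in one tokenized document."""
    tf = doc_tokens.count(term) / len(doc_tokens)
    df = sum(1 for doc in all_docs if term in doc)
    return tf * math.log(len(all_docs) / df) if df else 0.0

def topic_aware_tf_idf(term, doc_tokens, topic, cases):
    """Assumed variant: scale TF-IDF by the share of the term's documents that fall in the topic."""
    docs = [text.split() for _, text in cases]
    topics_with_term = [t for t, text in cases if term in text.split()]
    if not topics_with_term:
        return 0.0
    concentration = topics_with_term.count(topic) / len(topics_with_term)
    return tf_idf(term, doc_tokens, docs) * concentration

query = "pay my bill".split()
score_billing = topic_aware_tf_idf("bill", query, "billing", CASES)
score_support = topic_aware_tf_idf("bill", query, "support", CASES)
```

Here a word that occurs only in one topic keeps its full weight for that topic and contributes nothing to the others, while a word spread evenly across topics is down-weighted everywhere.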

Attention-based Joint Representation Learning Network for Short text Classification

Proceedings of the 2020 6th International Conference on Computing and Artificial Intelligence ◽

10.1145/3404555.3404578 ◽

2020 ◽

Author(s):

Xinyue Liu ◽

Yexuan Tang

Keyword(s):

Text Classification ◽

Representation Learning ◽

Short Text ◽

Learning Network ◽

Joint Representation

Mobile App Third-Party Library traffic discovery method based on short text classification and clustering

2020 International Conference on Information Science, Parallel and Distributed Systems (ISPDS) ◽

10.1109/ispds51347.2020.00024 ◽

2020 ◽

Author(s):

Yuanhao Li ◽

Shuhui Chen ◽

Shuang Zhao ◽

Lian Liu

Keyword(s):

Text Classification ◽

Mobile App ◽

Third Party ◽

Short Text ◽

Discovery Method ◽

Classification And Clustering

Review of short-text classification

International Journal of Web Information Systems ◽

10.1108/ijwis-12-2017-0083 ◽

2019 ◽

Vol 15 (2) ◽

pp. 155-182 ◽

Cited By ~ 5

Author(s):

Issa Alsmadi ◽

Keng Hoon Gan

Keyword(s):

Social Networks ◽

Text Classification ◽

Development Trend ◽

Practical Reasons ◽

Significant Implication ◽

Classification Problems ◽

Content Type ◽

Short Text ◽

Promising Area ◽

Low Performance

Purpose: Rapid developments in social networks and their usage in everyday life have caused an explosion in the number of short electronic documents. Classifying such documents into relevant classes according to their text content therefore has significant implications for many applications and matters for many practical reasons. Short-text classification is an essential step in applications such as spam filtering, sentiment analysis, Twitter personalization, customer reviews and many others related to social networks. Reviews of short text and its applications are limited. This paper therefore aims to discuss the characteristics of short text and the challenges and difficulties in classifying it. The paper attempts to introduce all the main stages of classification, the techniques used in each stage and the possible development trend in each stage.

Design/methodology/approach: The paper is a review of the main aspects of short-text classification. It is structured around the stages of the classification task.

Findings: This paper discusses related issues and approaches to these problems. Further research could be conducted to address the challenges in short texts and avoid poor accuracy in classification. Low performance can be addressed with optimization-based solutions, such as genetic algorithms, which are powerful in enhancing the quality of selected features. Soft computing solutions such as fuzzy logic make short-text problems a promising area of research.

Originality/value: Using a powerful short-text classification method significantly improves the efficiency of many applications. Current solutions still have low performance, implying the need for improvement. This paper discusses related issues and approaches to these problems.

A New SVM Method for Short Text Classification Based on Semi-Supervised Learning

2015 4th International Conference on Advanced Information Technology and Sensor Application (AITS) ◽

10.1109/aits.2015.34 ◽

2015 ◽

Cited By ~ 12

Author(s):

Chunyong Yin ◽

Jun Xiang ◽

Hui Zhang ◽

Jin Wang ◽

Zhichao Yin ◽

...

Keyword(s):

Supervised Learning ◽

Text Classification ◽

Short Text

COMBINATORIAL RECONSTRUCTION OF HALF-SIBLING GROUPS FROM MICROSATELLITE DATA

Journal of Bioinformatics and Computational Biology ◽

10.1142/s0219720010004793 ◽

2010 ◽

Vol 08 (02) ◽

pp. 337-356 ◽

Cited By ~ 9

Author(s):

SAAD I. SHEIKH ◽

TANYA Y. BERGER-WOLF ◽

ASHFAQ A. KHOKHAR ◽

ISABEL C. CABALLERO ◽

MARY V. ASHLEY ◽

...

Keyword(s):

Exact Solutions ◽

Experimental Results ◽

Microsatellite Data ◽

Reconstruction Problem ◽

Full Sibling ◽

Synthetic Datasets ◽

Sibling Group ◽

Half Sibling ◽

Sibling Groups

While full-sibling group reconstruction from microsatellite data is a well-studied problem, reconstruction of half-sibling groups is much less studied, theoretically challenging, and computationally demanding. In this paper, we present a formulation of the half-sibling reconstruction problem and prove its APX-hardness. We also present exact solutions for this formulation and develop heuristics. Using biological and synthetic datasets we present experimental results and compare them with the leading alternative software COLONY. We show that our results are competitive and allow half-sibling group reconstruction in the presence of polygamy, which is prevalent in nature.

Incorporate Syntactic Information for Short Text Classification

Advanced Materials Research ◽

10.4028/www.scientific.net/amr.268-270.697 ◽

2011 ◽

Vol 268-270 ◽

pp. 697-700

Author(s):

Rui Xue Duan ◽

Xiao Jie Wang ◽

Wen Feng Li

Keyword(s):

Machine Learning ◽

Feature Selection ◽

Learning Environment ◽

Text Classification ◽

The Internet ◽

Selection Methods ◽

Text Documents ◽

Short Text ◽

Syntactic Information ◽

Dependency Relations

As the volume of online short text documents grows tremendously on the Internet, organizing short texts well has become a much more urgent task. However, traditional feature selection methods are not suitable for short text. In this paper, we propose a method that incorporates syntactic information for short text. It emphasizes features that have more dependency relations with other words. The SVM classifier and the Weka machine learning environment are used in our experiments. The experimental results show that by incorporating syntactic information into short text, we obtain more powerful features than with traditional feature selection methods such as DF and CHI. The precision of short text classification improved from 86.2% to 90.8%.
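One simple way to emphasize features with more dependency relations is to scale each word's frequency by the number of dependency edges it takes part in. The sketch below assumes the dependency edges have already been produced by some parser and are given as (head, dependent) word pairs; the weighting `tf * (1 + degree)` is an illustrative assumption, since the abstract does not state the exact scheme.

```python
from collections import Counter

def dependency_degree(edges):
    """Count how many dependency relations each word takes part in."""
    degree = Counter()
    for head, dependent in edges:
        degree[head] += 1
        degree[dependent] += 1
    return degree

def weighted_features(tokens, edges):
    """Term frequency scaled by (1 + dependency degree); an assumed weighting."""
    degree = dependency_degree(edges)
    tf = Counter(tokens)
    return {token: tf[token] * (1 + degree[token]) for token in tf}

# Hypothetical pre-parsed short text: "cheap flights to paris"
tokens = ["cheap", "flights", "to", "paris"]
edges = [("flights", "cheap"), ("flights", "paris"), ("paris", "to")]
weights = weighted_features(tokens, edges)
```

For simplicity the edges are keyed by word rather than by token position, so repeated words share one degree count. The resulting weights could replace raw term frequencies in the feature vectors given to a classifier such as an SVM.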
