An Efficient Method for Text Classification Task

Author(s): Qiancheng Liang, Ping Wu, Chaoyi Huang

2021
Author(s): Wilson Wongso, Henry Lucky, Derwin Suhartono

Abstract The Sundanese language has over 32 million speakers worldwide, but the language has reaped little to no benefit from the recent advances in natural language understanding. Like other low-resource languages, the only alternative is to fine-tune existing multilingual models. In this paper, we pre-trained three monolingual Transformer-based language models on Sundanese data. When evaluated on a downstream text classification task, most of our monolingual models outperformed larger multilingual models despite the smaller overall pre-training data. In subsequent analyses, our models benefited strongly from the size of the Sundanese pre-training corpus and did not exhibit socially biased behavior. We release our models for other researchers and practitioners to use.
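
A minimal sketch of how such a pre-trained monolingual model could be fine-tuned and applied to classification with the Hugging Face transformers library; the checkpoint path is a placeholder, not the authors' released model, and the sample sentence is arbitrary:

    import torch
    from transformers import AutoModelForSequenceClassification, AutoTokenizer

    checkpoint = "path/to/sundanese-checkpoint"  # placeholder, not the released model
    tokenizer = AutoTokenizer.from_pretrained(checkpoint)
    model = AutoModelForSequenceClassification.from_pretrained(
        checkpoint, num_labels=2)  # classification head is newly initialized

    inputs = tokenizer("conto kalimah dina basa Sunda",  # arbitrary sample text
                       return_tensors="pt", truncation=True)
    with torch.no_grad():
        logits = model(**inputs).logits
    print(logits.argmax(dim=-1).item())  # predicted class index

In practice the newly initialized head would first be trained on labeled downstream data (e.g., with the Trainer API) before predictions are meaningful.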


Author(s): Cunxiao Du, Zhaozheng Chen, Fuli Feng, Lei Zhu, Tian Gan, ...

Text classification is one of the fundamental tasks in natural language processing. Recently, deep neural networks have achieved promising performance in the text classification task compared to shallow models. Despite the significance of deep models, they ignore fine-grained classification clues (matching signals between words and classes), since their classifications rely mainly on text-level representations. To address this problem, we introduce an interaction mechanism to incorporate word-level matching signals into the text classification task. In particular, we design a novel framework, the EXplicit interAction Model (dubbed EXAM), equipped with the interaction mechanism. We evaluated the proposed approach on several benchmark datasets covering both multi-label and multi-class text classification tasks. Extensive experimental results demonstrate the superiority of the proposed method. As a byproduct, we have released the code and parameter settings to facilitate further research.
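
A PyTorch sketch of the core idea, scoring word-level interactions between word and class representations; this illustrates the mechanism rather than reproducing the authors' released EXAM code, and all dimensions are arbitrary:

    import torch
    import torch.nn as nn

    class InteractionClassifier(nn.Module):
        def __init__(self, vocab_size, embed_dim, num_classes, seq_len):
            super().__init__()
            self.embed = nn.Embedding(vocab_size, embed_dim)
            # One trainable representation per class.
            self.class_embed = nn.Parameter(torch.randn(num_classes, embed_dim))
            # Aggregates each class's per-word matching signals into a logit.
            self.aggregate = nn.Linear(seq_len, 1)

        def forward(self, token_ids):                 # (batch, seq_len)
            words = self.embed(token_ids)             # (batch, seq_len, dim)
            # Word-level matching signals: one score per (class, word) pair.
            interaction = torch.einsum("bsd,cd->bcs", words, self.class_embed)
            return self.aggregate(interaction).squeeze(-1)  # (batch, num_classes)

    model = InteractionClassifier(vocab_size=10000, embed_dim=128,
                                  num_classes=5, seq_len=32)
    logits = model(torch.randint(0, 10000, (4, 32)))
    print(logits.shape)  # torch.Size([4, 5])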


Author(s): Jonathan Radot Fernando, Raymond Budiraharjo, Emeraldi Haganusa

Text classification is used in many technologies, such as spam filtering, news categorization, and auto-correct texting. One of the most popular algorithms for text classification today is Multinomial Naïve-Bayes. This paper explains how the Naïve-Bayes method classifies 2019 Indonesian Election YouTube comments. The algorithm's output prediction is spam or not spam, where spam messages are defined as racist comments, advertising comments, and unsolicited comments. Text is represented with the bag-of-words method, which treats a text as the multiset of its words. The algorithm then calculates the probability of a word given the class spam or not spam. The main difference between the standard Naïve-Bayes algorithm and Multinomial Naïve-Bayes is how the data is treated: Multinomial Naïve-Bayes models word frequencies, which makes it well suited to the text classification task.
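
A minimal spam-vs-not-spam sketch of this pipeline with scikit-learn; the toy comments below are invented placeholders, not the paper's dataset:

    from sklearn.feature_extraction.text import CountVectorizer
    from sklearn.naive_bayes import MultinomialNB
    from sklearn.pipeline import make_pipeline

    comments = ["buy cheap followers now", "great analysis of the debate"]
    labels = ["spam", "not_spam"]

    # CountVectorizer builds the bag-of-words frequency matrix that
    # Multinomial Naive Bayes expects as input.
    clf = make_pipeline(CountVectorizer(), MultinomialNB())
    clf.fit(comments, labels)
    print(clf.predict(["cheap cheap followers"]))  # ['spam']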


Author(s): Akiko Eriguchi, Ichiro Kobayashi

The objective of this paper is to raise the accuracy of multiclass text classification through Graph-Based Semi-Supervised Learning (GBSSL). In GBSSL, it is essential to construct a proper graph that expresses the relations among nodes. We propose a method that constructs a similarity graph by employing both surface information and latent information to express the similarity between nodes. Experimenting on the Reuters-21578 corpus, we confirmed that our proposal works well in raising the accuracy of GBSSL in a multiclass text classification task.
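
A sketch of the graph-construction idea, mixing a surface (tf-idf) similarity with a latent (LSA) similarity before propagating labels; the mixing weight alpha and the use of scikit-learn's LabelSpreading are illustrative assumptions, not the authors' exact construction:

    import numpy as np
    from sklearn.decomposition import TruncatedSVD
    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.metrics.pairwise import cosine_similarity
    from sklearn.semi_supervised import LabelSpreading

    docs = ["oil prices rise", "crude oil exports fall",
            "wheat harvest grows", "corn and wheat yields up"]
    y = np.array([0, -1, 1, -1])  # -1 marks unlabeled documents

    surface = TfidfVectorizer().fit_transform(docs)               # surface information
    latent = TruncatedSVD(n_components=2).fit_transform(surface)  # latent information

    alpha = 0.5  # illustrative weight between the two similarities
    W = np.clip(alpha * cosine_similarity(surface)
                + (1 - alpha) * cosine_similarity(latent), 0.0, None)

    # A callable kernel lets LabelSpreading run on the precomputed graph.
    model = LabelSpreading(kernel=lambda a, b: W)
    model.fit(np.arange(len(docs)).reshape(-1, 1), y)
    print(model.transduction_)  # labels propagated to the unlabeled docs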


2020, pp. 1-35
Author(s): N. Pittaras, G. Giannakopoulos, G. Papadakis, V. Karkaletsis

Abstract The recent breakthroughs in deep neural architectures across multiple machine learning fields have led to the widespread use of deep neural models. These learners are often applied as black-box models that ignore or insufficiently utilize a wealth of preexisting semantic information. In this study, we focus on the text classification task, investigating methods for augmenting the input to deep neural networks (DNNs) with semantic information. We extract semantics for the words in the preprocessed text from the WordNet semantic graph, in the form of weighted concept terms that form a semantic frequency vector. Concepts are selected via a variety of semantic disambiguation techniques, including a basic, a part-of-speech-based, and a semantic embedding projection method. Additionally, we consider a weight propagation mechanism that exploits semantic relationships in the concept graph and conveys a spreading activation component. We enrich word2vec embeddings with the resulting semantic vector through concatenation or replacement and apply the semantically augmented word embeddings to the classification task via a DNN. Experimental results over established datasets demonstrate that our approach of semantic augmentation in the input space boosts classification performance significantly, with concatenation performing best. We also note additional interesting findings regarding the behavior of term frequency-inverse document frequency (tf-idf) normalization on semantic vectors, along with the potential for radical dimensionality reduction at negligible performance loss.
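
A toy sketch of the augmentation pipeline using NLTK's WordNet interface: map each word to a concept with a basic first-sense disambiguation, accumulate a semantic frequency vector, and concatenate it onto a word2vec embedding. The fixed concept dimension and the single shared semantic vector are simplifications for illustration:

    import numpy as np
    from nltk.corpus import wordnet as wn  # requires the NLTK wordnet data

    concept_index = {}  # grows as new concepts are encountered

    def semantic_vector(words, dim=50):
        vec = np.zeros(dim)
        for w in words:
            synsets = wn.synsets(w)
            if not synsets:
                continue
            concept = synsets[0].name()  # basic disambiguation: first sense
            idx = concept_index.setdefault(concept, len(concept_index))
            if idx < dim:
                vec[idx] += 1.0
        return vec

    def augment(word2vec_embedding, context_words):
        # Concatenation variant: [embedding ; semantic frequency vector]
        return np.concatenate([word2vec_embedding,
                               semantic_vector(context_words)])

    enriched = augment(np.random.randn(300), ["bank", "river", "water"])
    print(enriched.shape)  # (350,)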


Author(s): Seungwhan Moon, Jaime Carbonell

We study a transfer learning framework in which the source and target datasets are heterogeneous in both feature and label spaces. Specifically, we do not assume explicit relations between source and target tasks a priori, so it is crucial to determine what to transfer from the source knowledge and what not to. Towards this goal, we define a new heterogeneous transfer learning approach that (1) selects and attends to an optimized subset of source samples to transfer knowledge from, and (2) builds a unified transfer network that learns from both source and target knowledge. This method, termed "Attentional Heterogeneous Transfer", along with a newly proposed unsupervised transfer loss, improves upon previous state-of-the-art approaches on extensive simulations as well as a challenging hetero-lingual text classification task.
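
A PyTorch sketch of attending over source samples to form a transfer representation; this is a simplification in the spirit of "Attentional Heterogeneous Transfer", not the authors' exact model, and all layer sizes are arbitrary:

    import torch
    import torch.nn as nn

    class SourceAttention(nn.Module):
        def __init__(self, src_dim, tgt_dim, shared_dim):
            super().__init__()
            # Project the heterogeneous feature spaces into a shared space.
            self.src_proj = nn.Linear(src_dim, shared_dim)
            self.tgt_proj = nn.Linear(tgt_dim, shared_dim)

        def forward(self, target_x, source_bank):
            q = self.tgt_proj(target_x)     # (batch, shared)
            k = self.src_proj(source_bank)  # (n_src, shared)
            # Attention weights decide which source samples to transfer from.
            attn = torch.softmax(q @ k.t(), dim=-1)  # (batch, n_src)
            transferred = attn @ k                   # (batch, shared)
            return torch.cat([q, transferred], dim=-1)

    layer = SourceAttention(src_dim=300, tgt_dim=100, shared_dim=64)
    out = layer(torch.randn(8, 100), torch.randn(500, 300))
    print(out.shape)  # torch.Size([8, 128])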


2016, Vol 4, pp. 537-549
Author(s): Dan Goldwasser, Xiao Zhang

Automatic satire detection is a subtle text classification task, for machines and, at times, even for humans. In this paper we argue that satire detection should be approached using common-sense inferences rather than traditional text classification methods. We present a highly structured latent variable model capturing the required inferences. The model abstracts over the specific entities appearing in the articles, grouping them into generalized categories, thus allowing it to adapt to previously unseen situations.
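
A toy illustration of the entity-abstraction step: replacing specific entities with generalized category tokens before classification. The hand-built mapping below is invented for illustration; the paper treats such groupings as latent variables rather than fixing them in advance:

    # Hypothetical entity-to-category mapping, for illustration only.
    ENTITY_CATEGORY = {
        "Barack Obama": "<POLITICIAN>",
        "White House": "<GOV_INSTITUTION>",
        "NASA": "<AGENCY>",
    }

    def abstract_entities(text):
        # Replace known entities with their generalized category tokens.
        for entity, category in ENTITY_CATEGORY.items():
            text = text.replace(entity, category)
        return text

    print(abstract_entities("Barack Obama visited NASA"))
    # <POLITICIAN> visited <AGENCY>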

