Ensemble Pruning for Text Categorization Based on Data Partitioning

Text mining is an important research direction that spans several fields, such as information retrieval, information extraction, and text categorization. In this paper, we propose an efficient multiple classifier approach to text categorization based on swarm-optimized topic modelling. Latent Dirichlet allocation (LDA) can overcome the high dimensionality problem of the vector space model, but identifying appropriate parameter values is critical to its performance. The swarm-optimized approach estimates the parameters of LDA, including the number of topics and all the other parameters involved. The hybrid ensemble pruning approach, based on combined diversity measures and clustering, aims to obtain a multiple classifier system with high predictive performance and better diversity. In this scheme, four diversity measures among the classifiers of the ensemble (the disagreement measure, the Q-statistic, the correlation coefficient, and the double fault measure) are combined. Based on the combined diversity matrix, a swarm intelligence based clustering algorithm partitions the classifiers into a number of disjoint groups, and one classifier (the one with the highest predictive performance) is selected from each cluster to build the final multiple classifier system. Experiments were conducted on five biomedical text benchmarks. In the swarm-optimized LDA, different metaheuristic algorithms (such as genetic algorithms, particle swarm optimization, the firefly algorithm, the cuckoo search algorithm, and the bat algorithm) are considered. In the ensemble pruning, five metaheuristic clustering algorithms are evaluated. The experimental results on the biomedical text benchmarks indicate that swarm-optimized LDA yields better predictive performance than conventional LDA. In addition, the proposed multiple classifier system outperforms the conventional classification algorithms, ensemble learning, and ensemble pruning methods.
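The four pairwise diversity measures and the "best classifier per cluster" selection step described in this abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes each classifier is summarized by a list of flags marking which validation instances it classified correctly, it uses the standard textbook definitions of the four measures, and it takes the cluster labels as given rather than running a swarm-based clustering algorithm. The function names `diversity_measures` and `select_representatives` are hypothetical, and the rule used to combine the four measures into one matrix is not stated in the abstract, so it is left out.

```python
from math import sqrt


def diversity_measures(correct_a, correct_b):
    """Pairwise diversity of two classifiers from their correctness flags.

    correct_a[k] is truthy when classifier A labelled instance k correctly
    (likewise for correct_b). Both sequences must have the same length.
    """
    if len(correct_a) != len(correct_b) or not correct_a:
        raise ValueError("need two non-empty sequences of equal length")

    # Contingency counts: n11 both right, n00 both wrong,
    # n10 only A right, n01 only B right.
    n11 = n00 = n10 = n01 = 0
    for a, b in zip(correct_a, correct_b):
        if a and b:
            n11 += 1
        elif not a and not b:
            n00 += 1
        elif a:
            n10 += 1
        else:
            n01 += 1

    n = n11 + n00 + n10 + n01
    cross = n11 * n00 - n01 * n10
    q_den = n11 * n00 + n01 * n10
    rho_den = sqrt((n11 + n10) * (n01 + n00) * (n11 + n01) * (n10 + n00))

    return {
        "disagreement": (n01 + n10) / n,
        "double_fault": n00 / n,
        # Assumption: undefined ratios (zero denominator) are reported as 0.0.
        "q_statistic": cross / q_den if q_den else 0.0,
        "correlation": cross / rho_den if rho_den else 0.0,
    }


def select_representatives(cluster_labels, accuracies):
    """Keep, for each cluster, the index of its most accurate classifier."""
    best = {}
    for idx, (label, acc) in enumerate(zip(cluster_labels, accuracies)):
        if label not in best or acc > accuracies[best[label]]:
            best[label] = idx
    return sorted(best.values())
```

For instance, with cluster labels `[0, 0, 1, 1, 2]` and accuracies `[0.7, 0.8, 0.6, 0.9, 0.5]`, `select_representatives` keeps classifiers 1, 3 and 4, one per cluster.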

Text categorization approach based on probability standard deviation with evaluation of distribution information

Journal of Computer Applications ◽

10.3724/sp.j.1087.2009.03303 ◽

2010 ◽

Vol 29 (12) ◽

pp. 3303-3306

Author(s):

Qing-zheng JIAO ◽

Cheng-jian WEI

Keyword(s):

Standard Deviation ◽

Text Categorization

Influence of feature weight on text categorization performance of Bayesian classifier

Journal of Computer Applications ◽

10.3724/sp.j.1087.2008.03080 ◽

2009 ◽

Vol 28 (12) ◽

pp. 3080-3083 ◽

Cited By ~ 1

Author(s):

Xiu-mei GAO ◽

Fang CHEN ◽

Feng-xi SONG ◽

Zhong JIN

Keyword(s):

Text Categorization ◽

Bayesian Classifier ◽

Feature Weight

CONSTRUCTION OF AUTOMATED SYSTEM FOR INFORMATION EXTRACTION AND TEXT CATEGORIZATION

Journal of Al-Nahrain University-Science ◽

10.22401/jnus.11.3.20 ◽

2008 ◽

Vol 11 (3) ◽

pp. 156-174

Author(s):

Abdul Kareem M. Radhi

Keyword(s):

Information Extraction ◽

Text Categorization ◽

Automated System

Graph-based dynamic ensemble pruning for facial expression recognition

Applied Intelligence ◽

10.1007/s10489-019-01435-2 ◽

2019 ◽

Vol 49 (9) ◽

pp. 3188-3206 ◽

Cited By ~ 1

Author(s):

Danyang Li ◽

Guihua Wen ◽

Xu Li ◽

Xianfa Cai

Keyword(s):

Facial Expression ◽

Facial Expression Recognition ◽

Expression Recognition ◽

Ensemble Pruning

A lazy feature selection method for multi-label classification

Intelligent Data Analysis ◽

10.3233/ida-194878 ◽

2021 ◽

Vol 25 (1) ◽

pp. 21-34

Author(s):

Rafael B. Pereira ◽

Alexandre Plastino ◽

Bianca Zadrozny ◽

Luiz H.C. Merschmann

Keyword(s):

Feature Selection ◽

Text Categorization ◽

Feature Selection Method ◽

Selection Method ◽

Video Classification ◽

Classification Problems ◽

Class Label ◽

New Feature ◽

Feature Selection Techniques ◽

Biomolecular Analysis

In many important application domains, such as text categorization, biomolecular analysis, scene or video classification, and medical diagnosis, instances are naturally associated with more than one class label, giving rise to multi-label classification problems. This has led, in recent years, to a substantial amount of research in multi-label classification. More specifically, feature selection methods have been developed to identify relevant and informative features for multi-label classification. This work presents a new feature selection method based on the lazy feature selection paradigm and specific to the multi-label context. Experimental results show that the proposed technique is competitive with the multi-label feature selection techniques currently used in the literature, and is clearly more scalable in a scenario where the amount of data is increasing.

Attention-based hierarchical recurrent neural networks for MOOC forum posts analysis

Journal of Ambient Intelligence and Humanized Computing ◽

10.1007/s12652-020-02747-9 ◽

2020 ◽

Author(s):

Nicola Capuano ◽

Santi Caballé ◽

Jordi Conesa ◽

Antonio Greco

Keyword(s):

Neural Networks ◽

Recurrent Neural Networks ◽

Text Categorization ◽

Online Courses ◽

Recurrent Network ◽

Subject Area ◽

Conversational Agents ◽

Collaborative Tools ◽

Experimental Findings ◽

Productive Talk

Massive open online courses (MOOCs) allow students and instructors to hold discussions through messages posted on a forum. However, instructors must limit their interaction to the most critical tasks during MOOC delivery, so teacher-led scaffolding activities, such as forum-based support, can be very limited or even impossible in such environments. In addition, students who try to clarify concepts through such collaborative tools may not receive useful answers, and the lack of interactivity may cause them to abandon the course permanently. The purpose of this paper is to report the experimental findings obtained by evaluating the performance of a text categorization tool capable of detecting the intent, the subject area, the domain topics, the sentiment polarity, and the level of confusion and urgency of a forum post, so that the result may be exploited by instructors to carefully plan their interventions. The proposed approach is based on attention-based hierarchical recurrent neural networks, in which both a recurrent network for word encoding and an attention mechanism for word aggregation at sentence and document levels are used before classification. The integration of the developed classifier into an existing tool for conversational agents, based on the academically productive talk framework, is also presented, as well as the accuracy of the proposed method in the classification of forum posts.
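The attention-based aggregation mentioned in this abstract can be illustrated with a small sketch. It is a simplified stand-in, not the paper's model: it assumes the encoder has already produced one vector per word, scores each vector by a plain dot product with a context vector (in hierarchical attention networks this context is learned and the vectors usually pass through an extra projection first, both of which are omitted), and returns the softmax-weighted sum. The name `attention_pool` and its arguments are hypothetical.

```python
from math import exp


def attention_pool(vectors, context):
    """Aggregate encoded vectors into one vector with softmax attention.

    vectors: list of equal-length lists of floats (e.g. one per word).
    context: list of floats of the same length, used to score each vector.
    Returns (pooled_vector, attention_weights).
    """
    if not vectors:
        raise ValueError("need at least one vector")

    # Relevance score of each vector: dot product with the context vector.
    scores = [sum(v_k * c_k for v_k, c_k in zip(v, context)) for v in vectors]

    # Numerically stable softmax over the scores.
    top = max(scores)
    exps = [exp(s - top) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]

    # Weighted sum of the input vectors, one output value per dimension.
    dim = len(vectors[0])
    pooled = [sum(w * v[d] for w, v in zip(weights, vectors)) for d in range(dim)]
    return pooled, weights
```

Applying the same function first to the word vectors of each sentence and then to the resulting sentence vectors gives the two-level (sentence, then document) aggregation the abstract describes.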

Pre‐filtering based summarization for data partitioning in distributed stream processing

Concurrency and Computation Practice and Experience ◽

10.1002/cpe.6338 ◽

2021 ◽

Author(s):

Adeel Aslam ◽

Hanhua Chen ◽

Hai Jin

Keyword(s):

Stream Processing ◽

Data Partitioning ◽

Distributed Stream Processing

Text categorization Performance examination Using Machine Learning Algorithms

IOP Conference Series Materials Science and Engineering ◽

10.1088/1757-899x/981/2/022044 ◽

2020 ◽

Vol 981 ◽

pp. 022044

Author(s):

Bonthala Prabhanjan Yadav ◽

Sukhaveerji Ghate ◽

A Harshavardhan ◽

G Jhansi ◽

Komuravelly Sudheer Kumar ◽

...

Keyword(s):

Machine Learning ◽

Text Categorization ◽

Learning Algorithms ◽

Machine Learning Algorithms ◽

Performance Examination

Effective Collaborative Representation Learning for Multilabel Text Categorization

IEEE Transactions on Neural Networks and Learning Systems ◽

10.1109/tnnls.2021.3069647 ◽

2021 ◽

pp. 1-15

Author(s):

Hao Wu ◽

Shaowei Qin ◽

Rencan Nie ◽

Jinde Cao ◽

Sergey Gorbachev

Keyword(s):

Text Categorization ◽

Representation Learning ◽

Collaborative Representation
