Multi-label text classification with an ensemble feature space

Journal of Intelligent & Fuzzy Systems ◽

10.3233/jifs-219232 ◽

2021 ◽

pp. 1-12

Author(s):

Kushagri Tandon ◽

Niladri Chatterjee

Keyword(s):

Machine Learning ◽

Text Classification ◽

Topic Modeling ◽

Classification Scheme ◽

Feature Space ◽

Linguistic Features ◽

Fuzzy C Means ◽

Text Document ◽

Nonparametric Hypothesis ◽

Fuzzy C Means Clustering

Multi-label text classification aims at assigning more than one class to a given text document, which makes the task both more ambiguous and more challenging. The ambiguity arises because several labels in the prescribed label set are often semantically close to each other, making clear demarcation between them difficult. Consequently, any machine-learning approach to multi-label classification needs to define its feature space with features that go beyond linguistic or semi-linguistic ones, so that the semantic closeness between labels is also taken into account. The present work describes a feature-extraction scheme in which the training document set and the prescribed label set are intertwined in a novel way to capture this ambiguity. In particular, experiments were conducted using Topic Modeling and Fuzzy C-means clustering, which measure the underlying uncertainty using probability-based and membership-based measures, respectively. Several nonparametric hypothesis tests establish the effectiveness of the features obtained through Fuzzy C-means clustering in multi-label classification. A new algorithm is proposed for training the system for multi-label classification using this set of features.
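To make the idea of membership-based features concrete, the following is a minimal sketch of textbook Fuzzy C-means in NumPy. It is not the authors' feature-extraction scheme: the function name `fuzzy_c_means`, its parameters, and the idea of using the rows of the membership matrix directly as document features are assumptions made for illustration.

```python
import numpy as np

def fuzzy_c_means(X, n_clusters, m=2.0, n_iter=100, tol=1e-6, seed=0):
    """Textbook FCM. Returns (centers, U), where U[i, k] is the
    membership of sample i in cluster k and each row of U sums to 1."""
    X = np.asarray(X, dtype=float)
    rng = np.random.default_rng(seed)
    # Random initial memberships, normalised so each row sums to 1.
    U = rng.random((X.shape[0], n_clusters))
    U /= U.sum(axis=1, keepdims=True)
    for _ in range(n_iter):
        W = U ** m
        # Cluster centers are membership-weighted means of the samples.
        centers = (W.T @ X) / W.sum(axis=0)[:, None]
        # Euclidean distance from every sample to every center.
        d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        d = np.fmax(d, 1e-12)  # avoid division by zero
        inv = d ** (-2.0 / (m - 1.0))
        U_new = inv / inv.sum(axis=1, keepdims=True)
        converged = np.abs(U_new - U).max() < tol
        U = U_new
        if converged:
            break
    return centers, U

# Hypothetical usage: each row of U could serve as a soft feature vector.
X_demo = np.array([[0.0, 0.1], [0.2, 0.0], [0.1, 0.2],
                   [5.0, 5.1], [5.2, 5.0], [5.1, 5.2]])
centers_demo, U_demo = fuzzy_c_means(X_demo, n_clusters=2)
```

Unlike hard clustering, every document receives a graded membership in every cluster, which is what lets such features express overlap between semantically close labels.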

An Anomaly Detection and Scenario Classification Scheme Based on Fuzzy C-means Clustering

2020 Chinese Automation Congress (CAC) ◽

10.1109/cac51589.2020.9326773 ◽

2020 ◽

Author(s):

Shuyu Fan ◽

Yangzhao Li ◽

Mengfan Zhang ◽

Dongqin Feng ◽

Qingyun Chen ◽

...

Keyword(s):

Anomaly Detection ◽

Classification Scheme ◽

Fuzzy C Means ◽

Fuzzy C Means Clustering

A Study on Topic Modeling for Feature Space Reduction in Text Classification

Flexible Query Answering Systems - Lecture Notes in Computer Science ◽

10.1007/978-3-030-27629-4_37 ◽

2019 ◽

pp. 403-412

Author(s):

Daniel Pfeifer ◽

Jochen L. Leidner

Keyword(s):

Text Classification ◽

Topic Modeling ◽

Feature Space ◽

Space Reduction

A Novel Fuzzy Entropy-Based Method to Improve the Performance of the Fuzzy C-Means Algorithm

Electronics ◽

10.3390/electronics9040554 ◽

2020 ◽

Vol 9 (4) ◽

pp. 554 ◽

Cited By ~ 1

Author(s):

Barbara Cardone ◽

Ferdinando Di Martino

Keyword(s):

Machine Learning ◽

Clustering Algorithm ◽

Optimal Solution ◽

Fuzzy Entropy ◽

Entropy Function ◽

Fuzzy C Means ◽

Fuzzy C Means Clustering ◽

Random Initialization ◽

Fuzzy C Means Algorithm ◽

Optimal Cluster

One of the main drawbacks of the well-known Fuzzy C-means clustering algorithm (FCM) is the random initialization of the cluster centers, which can significantly affect the performance of the algorithm: an optimal solution is not guaranteed and execution times increase. In this paper we propose a variation of FCM in which the initial cluster centers are obtained by running a weighted FCM algorithm whose weights are assigned by calculating a Shannon Fuzzy Entropy function. Comparison tests on various classification datasets from the UCI Machine Learning Repository show that our algorithm improved on the performance of FCM in all cases.
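The abstract does not state which entropy formula is used or how entropy values are turned into weights. As a hedged illustration only, the sketch below computes the De Luca and Termini form of Shannon-type fuzzy entropy for a membership degree, which is one common definition; the function name `shannon_fuzzy_entropy` is hypothetical, and the mapping from entropy to FCM weights is left out because it is not given here.

```python
import numpy as np

def shannon_fuzzy_entropy(u, eps=1e-12):
    """De Luca-Termini fuzzy entropy of membership degree(s) u in [0, 1]:
    h(u) = -u*ln(u) - (1-u)*ln(1-u).
    It is largest at u = 0.5 (maximal fuzziness) and near 0 at u = 0 or 1."""
    # Clip to keep the logarithms finite at the endpoints.
    u = np.clip(np.asarray(u, dtype=float), eps, 1.0 - eps)
    return -(u * np.log(u) + (1.0 - u) * np.log(1.0 - u))

# Hypothetical usage: entropy of each entry of a small membership matrix.
memberships = np.array([[0.9, 0.1], [0.5, 0.5]])
entropies = shannon_fuzzy_entropy(memberships)
```

A sample whose memberships sit near 0.5 has high entropy and is therefore ambiguous, which is the kind of signal an entropy-based weighting could exploit.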

A Novel Semi-Supervised Fuzzy C-Means Clustering Algorithm Using Multiple Fuzzification Coefficients

Algorithms ◽

10.3390/a14090258 ◽

2021 ◽

Vol 14 (9) ◽

pp. 258

Author(s):

Tran Dinh Khang ◽

Manh-Kien Tran ◽

Michael Fowler

Keyword(s):

Machine Learning ◽

Clustering Algorithm ◽

Machine Learning Techniques ◽

Unsupervised Machine Learning ◽

Practical Applications ◽

Fuzzy C Means ◽

Learning Techniques ◽

Fuzzy C Means Clustering ◽

Data Points ◽

Data Elements

Clustering is an unsupervised machine learning method with many practical applications that has attracted extensive research interest. It divides data elements into clusters such that elements in the same cluster are similar. Because clustering is unsupervised, no information about the labels of the elements is available. However, when some information about the data points is known in advance, it is beneficial to use a semi-supervised algorithm. Among the many clustering techniques available, fuzzy C-means clustering (FCM) is a common one. To make the FCM algorithm semi-supervised, the literature has proposed using an auxiliary matrix to adjust the membership grades of the elements and force them into certain clusters during the computation. In this study, instead of using the auxiliary matrix, we propose using multiple fuzzification coefficients to implement the semi-supervision component. After deriving the proposed semi-supervised fuzzy C-means clustering algorithm with multiple fuzzification coefficients (sSMC-FCM), we demonstrate the convergence of the algorithm and validate the efficiency of the method through a numerical example.
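For orientation, the standard FCM formulation with a single fuzzification coefficient m > 1 is reproduced below. This is the textbook algorithm, not the sSMC-FCM derivation; the symbols N (number of data points), K (number of clusters), x_i, c_k and u_{ik} are notation chosen here, and how the paper assigns multiple coefficients is not stated in the abstract.

```latex
% Objective minimised by standard FCM
J_m(U, C) = \sum_{i=1}^{N} \sum_{k=1}^{K} u_{ik}^{\,m} \, \lVert x_i - c_k \rVert^2,
\qquad \text{subject to } \sum_{k=1}^{K} u_{ik} = 1 \text{ for every } i.

% Alternating update rules
u_{ik} = \left[ \sum_{j=1}^{K}
  \left( \frac{\lVert x_i - c_k \rVert}{\lVert x_i - c_j \rVert} \right)^{\frac{2}{m-1}}
\right]^{-1},
\qquad
c_k = \frac{\sum_{i=1}^{N} u_{ik}^{\,m} \, x_i}{\sum_{i=1}^{N} u_{ik}^{\,m}}.
```

As m approaches 1 the memberships become nearly crisp, while larger m makes them more uniform, so the coefficient controls how strongly a point is committed to its nearest cluster.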

An effective image retrieval system using machine learning and fuzzy c-means clustering approach

Multimedia Tools and Applications ◽

10.1007/s11042-019-08090-2 ◽

2019 ◽

Vol 79 (15-16) ◽

pp. 10123-10140 ◽

Cited By ~ 1

Author(s):

Lakshmi R. Nair ◽

Kamalraj Subramaniam ◽

G. K. D. Prasanna Venkatesan

Keyword(s):

Machine Learning ◽

Image Retrieval ◽

Retrieval System ◽

Fuzzy C Means ◽

Image Retrieval System ◽

Fuzzy C Means Clustering ◽

Clustering Approach

Examination of fake news from a viral perspective: an interplay of emotions, resonance, and sentiments

Journal of Systems and Information Technology ◽

10.1108/jsit-11-2020-0257 ◽

2022 ◽

Vol ahead-of-print (ahead-of-print) ◽

Author(s):

Krishnadas Nanath ◽

Supriya Kaitheri ◽

Sonia Malik ◽

Shahid Mustafa

Keyword(s):

Machine Learning ◽

Natural Language Processing ◽

Natural Language ◽

Language Processing ◽

Topic Modeling ◽

Machine Learning Algorithms ◽

Fake News ◽

Linguistic Features ◽

Data Set ◽

Content Type

Purpose: The purpose of this paper is to examine the factors that significantly affect the prediction of fake news from the perspective of virality theory. The paper looks at a mix of emotion-driven content, sentimental resonance, topic modeling and linguistic features of news articles to predict the probability of fake news.

Design/methodology/approach: A data set of over 12,000 articles was chosen to develop a model for fake news detection. Machine learning algorithms and natural language processing techniques were used to handle big data efficiently. Lexicon-based emotion analysis provided eight kinds of emotions used in the article text. The cluster of topics was extracted using topic modeling (five topics), while sentiment analysis provided the resonance between the title and the text. Linguistic features were added to the coding outcomes to develop a logistic regression predictive model for testing the significant variables. Other machine learning algorithms were also executed and compared.

Findings: The results revealed that positive emotions in a text lower the probability of news being fake. It was also found that sensational content, such as illegal activities and crime-related content, was associated with fake news. News whose title and text exhibit similar sentiments was found to have a lower chance of being fake. News titles with more words and content with fewer words were found to impact fake news detection significantly.

Practical implications: Several systems and social media platforms today are trying to implement fake news detection methods to filter content. This research provides parameters from a viral theory perspective that could help develop automated fake news detectors.

Originality/value: While several studies have explored fake news detection, this study uses a new perspective based on viral theory. It also introduces new parameters, such as sentimental resonance, that could help predict fake news. This study deals with an extensive data set and uses advanced natural language processing to automate the coding techniques in developing the prediction model.

Enhancing an Evolving Tree-based text document visualization model with Fuzzy c-Means clustering

2013 IEEE International Conference on Fuzzy Systems (FUZZ-IEEE) ◽

10.1109/fuzz-ieee.2013.6622363 ◽

2013 ◽

Cited By ~ 2

Author(s):

Wui Lee Chang ◽

Kai Meng Tay ◽

Chee Peng Lim

Keyword(s):

Fuzzy C Means ◽

Text Document ◽

Fuzzy C Means Clustering ◽

Evolving Tree ◽

Document Visualization

Robust Semisupervised Kernelized Fuzzy Local Information C-Means Clustering for Image Segmentation

Mathematical Problems in Engineering ◽

10.1155/2020/5648206 ◽

2020 ◽

Vol 2020 ◽

pp. 1-22

Author(s):

Yao Yang ◽

Chengmao Wu ◽

Yawen Li ◽

Shaoyu Zhang

Keyword(s):

Image Segmentation ◽

Fuzzy Clustering ◽

Clustering Algorithms ◽

Feature Space ◽

Iterative Solution ◽

Segmentation Algorithm ◽

Gaussian Kernel ◽

Fuzzy C Means ◽

Different Types ◽

Fuzzy C Means Clustering

To improve the effectiveness and robustness of existing semisupervised fuzzy clustering for segmenting images corrupted by noise, this paper proposes a kernel-space semisupervised fuzzy C-means clustering segmentation algorithm that combines neighborhood spatial gray-level information with fuzzy membership information. The mean intensity of the neighborhood window is embedded into the objective function of the existing semisupervised fuzzy C-means clustering, and the Lagrange multiplier method is used to obtain the iterative expressions that solve the optimization problem. Meanwhile, a local Gaussian kernel function is used to map the pixel samples from the Euclidean space to a high-dimensional feature space, which enhances the adaptability of the clustering to different types of image segmentation. Experimental results on different types of noisy images indicate that the proposed segmentation algorithm achieves better segmentation performance than existing typical robust fuzzy clustering algorithms and significantly enhances antinoise performance.
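The kernel mapping mentioned above is usually never computed explicitly. As a hedged sketch of the general kernel trick, not of the paper's local kernel, the squared distance in feature space for a Gaussian kernel reduces to 2(1 - K(x, v)) because K(x, x) = 1. The function names and the bandwidth parameter `sigma` are assumptions made for illustration.

```python
import numpy as np

def gaussian_kernel(x, v, sigma=1.0):
    """K(x, v) = exp(-||x - v||^2 / (2 * sigma^2))."""
    diff = np.asarray(x, dtype=float) - np.asarray(v, dtype=float)
    return np.exp(-np.dot(diff, diff) / (2.0 * sigma ** 2))

def kernel_distance_sq(x, v, sigma=1.0):
    """Squared feature-space distance ||phi(x) - phi(v)||^2.
    Expanding gives K(x,x) - 2K(x,v) + K(v,v) = 2 * (1 - K(x,v))
    for the Gaussian kernel, since K(x,x) = K(v,v) = 1."""
    return 2.0 * (1.0 - gaussian_kernel(x, v, sigma))

# Hypothetical usage: distance between a pixel feature vector and a center.
d2 = kernel_distance_sq([0.2, 0.4], [0.25, 0.35], sigma=0.5)
```

Because this distance is bounded above by 2, a single outlying pixel cannot dominate the objective, which is one common argument for the noise robustness of kernelized variants.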

Author profiling of Vietnamese forum posts based on syllables and rhymes (original title: Xác định đặc điểm tác giả bài viết diễn đàn tiếng Việt dựa trên âm tiết và vần)

Research and Development on Information and Communication Technology ◽

10.32913/rd-ict.vol1.no37.355 ◽

2017 ◽

pp. 41

Author(s):

Duong Tran Duc ◽

Pham Bao Son ◽

Tan Hanh

Keyword(s):

Machine Learning ◽

Learning Approach ◽

Linguistic Features ◽

Specific Approach ◽

Domain Specific ◽

Text Document ◽

Machine Learning Approach ◽

Author Profiling

Author profiling is the task of identifying characteristics of an author based solely on a text document. Previous works have exploited a number of linguistic features: character-based, word-based and grammar-based features (often grouped as style-based), and content-based features (content words). Previous results showed that content-based features often achieve better results than style-based features. However, using content-based features is considered a domain-specific approach, because the chosen content words often have meanings related to the studied domain. In this work, we investigate the use of syllables and rhymes as features for author profiling of Vietnamese text. They are parts of words but carry much less meaning than words, especially the rhymes. Therefore, these features can be considered much less domain-dependent than content words. We experimented on forum post datasets using a machine learning approach. With an improvement of up to 8% over baseline results on style-based features, our method offers a promising new approach to author profiling.
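As a rough illustration of what syllable and rhyme features might look like, the sketch below relies on two simplifying assumptions that are not taken from the paper: Vietnamese syllables are treated as whitespace-separated tokens, and the rhyme is approximated as the syllable with its initial consonant removed. The consonant list and all function names are hypothetical, and edge cases such as "gi" and "qu" are handled only crudely.

```python
import unicodedata
from collections import Counter

# Initial consonants, multi-letter ones first so "ngh" is tried before "ng".
INITIALS = ["ngh", "ch", "gh", "gi", "kh", "ng", "nh", "ph", "qu", "th", "tr",
            "b", "c", "d", "đ", "g", "h", "k", "l", "m", "n", "p", "q", "r",
            "s", "t", "v", "x"]
PUNCTUATION = ".,!?;:\"'()"

def split_syllables(text):
    """Lower-case the text and split it into punctuation-free syllables."""
    text = unicodedata.normalize("NFC", text).lower()
    tokens = (tok.strip(PUNCTUATION) for tok in text.split())
    return [tok for tok in tokens if tok]

def rhyme(syllable):
    """Approximate the rhyme by dropping the initial consonant, if any."""
    syllable = unicodedata.normalize("NFC", syllable).lower()
    for initial in INITIALS:
        if syllable.startswith(initial) and len(syllable) > len(initial):
            return syllable[len(initial):]
    return syllable  # the syllable starts with a vowel

def syllable_rhyme_counts(text):
    """Return (syllable counts, rhyme counts) as bag-of-features vectors."""
    syllables = split_syllables(text)
    return Counter(syllables), Counter(rhyme(s) for s in syllables)

syllable_counts, rhyme_counts = syllable_rhyme_counts("Xin chào, Việt Nam!")
```

Counts like these could be fed to any standard classifier; different syllables that share a rhyme collapse into one feature, which is consistent with the claim that rhymes are less domain-dependent than content words.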
