Centroid Estimation Based on Symmetric KL Divergence for Multinomial Text Classification Problem

2014 ◽

Vol 513-517 ◽

pp. 2394-2397

Author(s):

Hong Biao Xie ◽

Hong Jun Qiu

Keyword(s):

Public Opinion ◽

Social Network ◽

Text Classification ◽

Social Groups ◽

Classification Problem ◽

Support Vector ◽

Social Stability ◽

Social Phenomena ◽

SVM Algorithm ◽

Network Link

Public opinion refers to the subjective reflection, by certain social groups, of certain social phenomena and realities within a period of time. Promptly tracking the dynamics of public opinion and actively guiding it are important measures for maintaining social stability and the ruling party's governing security. In this paper, the authors build a model of hotspot issues in social network public opinion. The SVM algorithm is adopted to improve information processing and analysis testing, effectively resolving the text classification problem. The results verify that this method plays an important role in analyzing hot issues of the network link.

Multiple weak supervision for short text classification

Applied Intelligence ◽

10.1007/s10489-021-02958-3 ◽

2022 ◽

Author(s):

Li-Ming Chen ◽

Bao-Xin Xiu ◽

Zhao-Yun Ding

Keyword(s):

Text Classification ◽

Classification Problem ◽

Experimental Results ◽

Prior Work ◽

Weak Supervision ◽

Short Text ◽

Imbalanced Classification ◽

Distant Supervision ◽

Synthetic Datasets ◽

Independent Model

For short text classification, insufficient labeled data, data sparsity, and imbalanced classification have become three major challenges. To address them, we propose multiple weak supervision, which can label unlabeled data automatically. Unlike prior work, the proposed method generates probabilistic labels through a conditional independent model. Experiments were conducted to verify the effectiveness of multiple weak supervision. According to experimental results on public datasets, real datasets, and synthetic datasets, the unlabeled imbalanced short text classification problem can be solved effectively by multiple weak supervision. Notably, recall and F1-score can be improved without reducing precision by adding distant supervision clustering, which can be used to meet different application needs.
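
The idea of turning several weak labelers into a probabilistic label can be illustrated with a minimal sketch. It assumes a simple naive-Bayes-style combination in which the weak sources vote independently given the true class; the function name `probabilistic_label`, the per-source accuracies, and the uniform spread of errors over the other classes are all hypothetical choices for illustration and are not taken from the paper, whose model may differ.

```python
def probabilistic_label(votes, accuracies, classes, prior=None):
    """Combine weak votes into a probability distribution over classes.

    Assumptions (illustrative only, not the paper's model):
      - votes are conditionally independent given the true class;
      - source i is correct with probability accuracies[i], strictly
        between 0 and 1, and spreads its errors uniformly over the
        remaining classes;
      - a vote of None means the source abstains and is skipped.
    """
    if prior is None:
        prior = {c: 1.0 / len(classes) for c in classes}
    scores = {}
    for c in classes:
        p = prior[c]
        for vote, acc in zip(votes, accuracies):
            if vote is None:
                continue
            p *= acc if vote == c else (1.0 - acc) / (len(classes) - 1)
        scores[c] = p
    total = sum(scores.values())
    return {c: s / total for c, s in scores.items()}


if __name__ == "__main__":
    # Three hypothetical weak sources voting on one short text.
    label = probabilistic_label(
        votes=["pos", "pos", "neg"],
        accuracies=[0.8, 0.7, 0.6],
        classes=["pos", "neg"],
    )
    print(label)
```

The output is a soft label rather than a hard class, so a downstream classifier can weight uncertain examples less heavily.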

Detection of Economy-Related Turkish Tweets Based on Machine Learning Approaches

10.4018/978-1-7998-8413-2.ch008 ◽

2022 ◽

pp. 171-195

Author(s):

Jale Bektaş

Keyword(s):

Machine Learning ◽

Text Mining ◽

Text Classification ◽

Integration Method ◽

Classification Problem ◽

Feature Representation ◽

Learning Approaches ◽

Machine Learning Methods ◽

Linguistic Approach ◽

Turkish Language

Conducting NLP for Turkish is a lot harder than for other Latin-based languages such as English. In this study, text mining techniques are used to build a pre-processing framework in which TF-IDF values are calculated in accordance with a linguistic approach on 7,731 tweets shared by 13 famous economists in Turkey, retrieved from Twitter. The classification results of four common machine learning methods (SVM, Naive Bayes, LR, and an integration of LR with SVM) are then compared. The features represented by TF-IDF are tested with different N-grams. The findings show that the success of a text classification task depends on the feature representation method, and that SVM outperforms the other individual ML methods with unigram feature representation. The best results are obtained with the integration of SVM and LR, with an accuracy of 82.9%. These results show that these methodologies are satisfactory for the Turkish language.
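
To make the feature representation concrete, here is a minimal sketch of TF-IDF weights over word N-grams. It uses one common textbook definition (term frequency times the logarithm of inverse document frequency) and whitespace tokenization; the study's exact weighting, its linguistic pre-processing for Turkish, and the toy English documents below are not from the source and are assumptions for illustration.

```python
import math
from collections import Counter


def word_ngrams(text, n):
    """Return the word n-grams of a lowercased, whitespace-tokenized text."""
    tokens = text.lower().split()
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def tfidf(docs, n=1):
    """Return one {ngram: weight} dict per document.

    tf  = occurrences of the n-gram / number of n-grams in the document
    idf = log(N / df), where N is the number of documents and df is the
          number of documents containing the n-gram
    """
    grams_per_doc = [word_ngrams(d, n) for d in docs]
    df = Counter()
    for grams in grams_per_doc:
        df.update(set(grams))
    num_docs = len(docs)
    weights = []
    for grams in grams_per_doc:
        counts = Counter(grams)
        total = len(grams)
        weights.append(
            {g: (c / total) * math.log(num_docs / df[g]) for g, c in counts.items()}
        )
    return weights


if __name__ == "__main__":
    # Hypothetical toy documents standing in for tweets.
    docs = ["interest rates rise", "interest rates fall", "football match tonight"]
    print(tfidf(docs, n=1)[0])  # unigram features of the first document
    print(tfidf(docs, n=2)[0])  # bigram features of the first document
```

Switching `n` between 1, 2, and higher values is what changes the feature representation fed to a classifier such as SVM or logistic regression.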

TEXT CLASSIFICATION USING MODIFIED MULTI CLASS ASSOCIATION RULE

Jurnal Teknologi ◽

10.11113/jt.v78.9553 ◽

2016 ◽

Vol 78 (8-2) ◽

Author(s):

Siti Sakira Kamaruddin ◽

Yuhanis Yusof ◽

Husniza Husni ◽

Mohammad Hayel Al Refai

Keyword(s):

Text Classification ◽

Association Rule ◽

Classification Accuracy ◽

Classification Problem ◽

Frequent Pattern ◽

Associative Classification ◽

Vertical Data ◽

Rule Method ◽

Class Association Rule ◽

Two Stages

This paper presents text classification using a modified Multi-Class Association Rule Method. The method is based on Associative Classification, which combines classification with association rule discovery. Although previous work showed that Associative Classification produces better classification accuracy than typical classifiers, studies applying it to text classification are limited because of the high dimensionality of text data, which results in an exponential number of generated classification rules. To overcome this problem, the modified Multi-Class Association Rule Method was enhanced in two stages. In stage one, the frequent patterns are represented using a proposed vertical data format to reduce the text dimensionality problem. In stage two, the generated rules are pruned using a proposed Partial Rule Match to reduce the number of generated rules. The proposed method was tested on a text classification problem, and the results show that it performed better than the existing method in terms of classification accuracy and number of generated rules.
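
As a rough illustration of what a vertical data format looks like, the sketch below stores, for each term, the set of document ids containing it, so the support of a term set is the size of an intersection. This is the generic vertical layout known from frequent-pattern mining, not the specific format or the Partial Rule Match pruning proposed in the paper; the function names and toy documents are hypothetical.

```python
from collections import defaultdict


def to_vertical(docs):
    """Map each term to the set of document ids that contain it."""
    tidsets = defaultdict(set)
    for doc_id, text in enumerate(docs):
        for term in set(text.lower().split()):
            tidsets[term].add(doc_id)
    return dict(tidsets)


def support(itemset, tidsets):
    """Count the documents that contain every term in itemset."""
    sets = [tidsets.get(term, set()) for term in itemset]
    return len(set.intersection(*sets)) if sets else 0


if __name__ == "__main__":
    # Hypothetical toy corpus.
    docs = ["cheap loan offer", "loan rates rise", "cheap flight offer"]
    tidsets = to_vertical(docs)
    print(support(["cheap", "offer"], tidsets))
```

Because support is computed by set intersection, candidate patterns can be evaluated without rescanning every document.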

A NOVEL FEATURE SELECTION ALGORITHM FOR TEXT CLASSIFICATION BASED ON TFIDF-WEIGHT AND KL-DIVERGENCE

Proceedings of the 11th Joint International Computer Conference ◽

10.1142/9789812701534_0099 ◽

2005 ◽

Cited By ~ 3

Author(s):

Baoyi WANG ◽

Shaomin ZHANG

Keyword(s):

Feature Selection ◽

Text Classification ◽

Selection Algorithm ◽

Feature Selection Algorithm ◽

KL Divergence

Ant colony algorithm for text classification in multicore-multithread environment

Indonesian Journal of Electrical Engineering and Computer Science ◽

10.11591/ijeecs.v18.i3.pp1359-1366 ◽

2020 ◽

Vol 18 (3) ◽

pp. 1359-1366

Author(s):

Ahmad Nazmi Fadzal ◽

Mazidah Puteh ◽

Nurazzah Abd Rahman

Keyword(s):

Text Classification ◽

Execution Time ◽

Ant Colony Algorithm ◽

Classification Problem ◽

Ant Colony ◽

Main Concept ◽

Processing Power ◽

Artificial Ants ◽

Positive Time ◽

Time Reduction

This paper presents an Ant Colony Algorithm (ACO) for text classification in a multicore-multithread environment, within the artificial intelligence domain. We developed software that applies the concept of concurrency to multiple artificial ants. Pheromone is the main concept ACO uses to solve the text classification problem: the pheromone value changes according to the solutions discovered during pseudo-random heuristic attempts at selecting a path through the words of a text. However, ACO can take longer to process larger training documents. Based on the cooperative behavior of ants living in a colony, the ACO part is examined in a multicore-multithread environment to gain an execution-time benefit. In this environment, the modification aims to make artificial ants actively communicate across multiple physical processor cores. Execution time is expected to improve without compromising the original classification accuracy, at the cost of more processing power. The single-threaded and multicore-multithreaded versions of ACO were compared statistically by conducting relevant tests. The results show a positive reduction in execution time.

A comparative study on feature selection in Chinese text classification problem

2012 First National Conference for Engineering Sciences (FNCES 2012) ◽

10.1109/nces.2012.6544065 ◽

2012 ◽

Author(s):

Hu Li ◽

Peng Zou ◽

Weihong Han

Keyword(s):

Feature Selection ◽

Comparative Study ◽

Chinese Text ◽

Text Classification ◽

Classification Problem ◽

Chinese Text Classification

Research on the internal influence factors of the text multi-classification problem

MATEC Web of Conferences ◽

10.1051/matecconf/201817303072 ◽

2018 ◽

Vol 173 ◽

pp. 03072

Author(s):

Wu Mingqiang ◽

Furong Chang ◽

Kui Zhang

Keyword(s):

Text Classification ◽

Optical Network ◽

Influence Factors ◽

Lower Class ◽

Classification Problem ◽

Classification Method ◽

Internal Factors ◽

Text Type ◽

Class Definition

This paper mainly deals with the classification of text data. Statistics show that more than 8000 articles of all kinds were retrieved through the optical network. However, few papers address the factors that affect text classification. The classification method used is important, but internal factors sometimes play a great role and can even determine the success or failure of the whole classification. To fill this gap, this paper selects the Rocchio algorithm as the classification method and tests experimentally the influence of five internal factors: category clustering density, class complexity, category definition, stop words, and document length. The experiments show that the higher the clustering density, the lower the class complexity, and the clearer the class definition, the higher the accuracy of text classification; stop-word handling also improves the result. Text length does not directly affect classification, but a document length suited to the classification algorithm should be chosen.
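
For readers unfamiliar with the Rocchio algorithm, the following minimal sketch shows its basic form: each class is represented by the centroid of its training vectors, and a new document is assigned to the class whose centroid is most similar. It assumes raw term counts and cosine similarity; the paper's term weighting, its stop-word handling, and the toy training data below are not from the source and are assumptions for illustration.

```python
import math
from collections import Counter, defaultdict


def vectorize(text):
    """Bag-of-words term counts for a lowercased, whitespace-tokenized text."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(v * b.get(t, 0.0) for t, v in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def train_centroids(labeled_docs):
    """Average the term-count vectors of each class into one centroid."""
    sums = defaultdict(Counter)
    counts = Counter()
    for text, label in labeled_docs:
        sums[label].update(vectorize(text))
        counts[label] += 1
    return {
        label: {t: v / counts[label] for t, v in vec.items()}
        for label, vec in sums.items()
    }


def classify(text, centroids):
    """Return the label whose centroid is most similar to the text."""
    vec = vectorize(text)
    return max(centroids, key=lambda label: cosine(vec, centroids[label]))


if __name__ == "__main__":
    # Hypothetical toy training set.
    train = [
        ("stock market prices rise", "economy"),
        ("bank interest rates", "economy"),
        ("team wins football match", "sports"),
        ("football player scores goal", "sports"),
    ]
    centroids = train_centroids(train)
    print(classify("interest rates and market", centroids))
```

In this form, tightly clustered classes yield centroids that represent their members well, which is consistent with the abstract's finding that higher clustering density goes with higher accuracy.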

Association rules of fuzzy soft set based classification for text classification problem

Journal of King Saud University - Computer and Information Sciences ◽

10.1016/j.jksuci.2020.03.014 ◽

2020 ◽

Author(s):

Dede Rohidin ◽

Noor A. Samsudin ◽

Mustafa Mat Deris

Keyword(s):

Association Rules ◽

Text Classification ◽

Classification Problem ◽

Soft Set ◽

Fuzzy Soft Set

Automated ICD Coding Using Deep Learning

10.4018/978-1-7998-7709-7.ch007 ◽

2022 ◽

pp. 115-127

Author(s):

Sagar Sudhir Dhobale ◽

Sharda Bapat

Keyword(s):

Deep Learning ◽

Text Classification ◽

Medical Information ◽

Patient Discharge ◽

Classification Problem ◽

International Classification Of Diseases ◽

Medical Coding ◽

Huge Data ◽

Classification Of Diseases ◽

Discharge Summaries

ICD (International Classification of Diseases) is a system developed by the WHO in which every unique diagnosis and procedure has a unique code. It provides a standardized way to represent medical information and makes it sharable and comparable across different hospitals and countries. Currently, the task of assigning ICD codes to patient discharge summaries is performed manually by medical coders. Manual coding is costly, time-consuming, and inefficient for huge data, so the healthcare industry requires automated solutions to make medical coding more efficient, accurate, and consistent. In this study, automated ICD-9 coding is approached as a multi-label text classification problem. A deep learning system is presented to assign ICD-9 codes automatically to patient discharge summaries. Convolutional neural networks and a word2vec model are combined to automatically extract features from the input text. The best model achieved 83.28% accuracy. The results of this research demonstrate the usability of deep learning for multi-label text classification and medical coding.

The Algorithm Study of Support Vector Machine Based on Social Network of Public Opinions
