Centroid Estimation Based on Symmetric KL Divergence for Multinomial Text Classification Problem

Author(s):  
Jiangning Chen ◽  
Heinrich Matzinger ◽  
Haoyan Zhai ◽  
Mi Zhou
2014 ◽  
Vol 513-517 ◽  
pp. 2394-2397
Author(s):  
Hong Biao Xie ◽  
Hong Jun Qiu

Public opinion refers to the subjective reflection by certain social groups of certain social phenomena and realities within a period of time. Promptly tracking the dynamics of public opinion and actively guiding it are important measures for maintaining social stability and the ruling party's security in power. In this paper, the authors establish a model of hotspot issues in social network public opinion. The SVM algorithm is adopted to improve information processing, analysis, and testing, effectively resolving the text classification problem. The results verify that this method plays an important role in analyzing hot issues on the network.


Author(s):  
Li-Ming Chen ◽  
Bao-Xin Xiu ◽  
Zhao-Yun Ding

Abstract: For short text classification, insufficient labeled data, data sparsity, and imbalanced classes have become three major challenges. To address them, we propose multiple weak supervision, which can label unlabeled data automatically. Unlike prior work, the proposed method generates probabilistic labels through a conditionally independent model. Experiments were conducted to verify the effectiveness of multiple weak supervision. According to experimental results on public datasets, real datasets, and synthetic datasets, the unlabeled imbalanced short text classification problem can be solved effectively by multiple weak supervision. Notably, recall and F1-score can be improved without reducing precision by adding distant supervision clustering, which can be used to meet different application needs.
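As an illustration of how probabilistic labels can follow from a conditional independence assumption, here is a minimal sketch. It is not the authors' model: the function name `probabilistic_label`, the binary label space, and the assumption that each weak source's accuracy is already known are all hypothetical choices made for this example.

```python
def probabilistic_label(votes, accuracies, prior_pos=0.5):
    """Return P(y = 1 | votes), assuming sources err independently given y.

    votes:      list of 1, 0, or None (None means the source abstains)
    accuracies: assumed-known P(vote == y) for each source (hypothetical input)
    """
    like_pos, like_neg = prior_pos, 1.0 - prior_pos
    for vote, acc in zip(votes, accuracies):
        if vote is None:
            continue  # an abstaining source contributes no evidence in this sketch
        like_pos *= acc if vote == 1 else 1.0 - acc
        like_neg *= acc if vote == 0 else 1.0 - acc
    return like_pos / (like_pos + like_neg)


# Toy example: two sources vote positive, one votes negative.
p = probabilistic_label([1, 1, 0], [0.8, 0.7, 0.6])
```

The returned value can be used as a soft training label instead of a hard 0/1 label. In practice the source accuracies would have to be estimated rather than supplied.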


2022 ◽  
pp. 171-195
Author(s):  
Jale Bektaş

Conducting NLP for Turkish is considerably harder than for other Latin-based languages such as English. In this study, text mining techniques are used to build a pre-processing framework in which TF-IDF values are calculated, following a linguistic approach, on 7,731 tweets shared by 13 famous economists in Turkey and retrieved from Twitter. The classification results are then compared across four common machine learning methods (SVM, Naive Bayes, LR, and an integration of LR with SVM). The TF-IDF features are tested with different N-grams. The findings show that the success of a text classification problem depends on the feature representation method, and that SVM outperforms the other ML methods with unigram feature representation. The best results are obtained by integrating SVM with LR, with an accuracy of 82.9%. These results show that these methodologies are satisfactory for the Turkish language.
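To make the idea of TF-IDF features over different N-grams concrete, the following is a minimal sketch in plain Python. It is not the study's pipeline: the whitespace tokenizer, the weighting variant (length-normalized term frequency times log(N / df)), the helper names `ngrams` and `tfidf`, and the toy documents are all assumptions for illustration.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Return the n-grams of a token list, each joined by a single space."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def tfidf(docs, n=1):
    """Compute one {feature: weight} dict per document over n-gram features."""
    grams = [ngrams(doc.lower().split(), n) for doc in docs]
    df = Counter()
    for g in grams:
        df.update(set(g))  # document frequency: count each feature once per doc
    n_docs = len(docs)
    weights = []
    for g in grams:
        counts = Counter(g)
        total = len(g) or 1  # avoid division by zero for very short documents
        weights.append(
            {t: (c / total) * math.log(n_docs / df[t]) for t, c in counts.items()}
        )
    return weights


# Toy documents (made up for this example).
docs = ["rates rose dollar rose", "dollar fell", "rates stayed flat"]
unigram = tfidf(docs, n=1)
bigram = tfidf(docs, n=2)
```

Each resulting dictionary can serve as a sparse feature vector for a classifier such as an SVM. Switching `n` changes only the feature representation, which is the variable the study compares.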


2016 ◽  
Vol 78 (8-2) ◽  
Author(s):  
Siti Sakira Kamaruddin ◽  
Yuhanis Yusof ◽  
Husniza Husni ◽  
Mohammad Hayel Al Refai

This paper presents text classification using a modified Multi-Class Association Rule Method. The method is based on Associative Classification, which combines classification with association rule discovery. Although previous work showed that Associative Classification produces better classification accuracy than typical classifiers, studies applying Associative Classification to the text classification problem are limited because of the high dimensionality of text data, which results in an exponential number of generated classification rules. To overcome this problem, the modified Multi-Class Association Rule Method was enhanced in two stages. In stage one, the frequent patterns are represented using a proposed vertical data format to reduce the text dimensionality problem; in stage two, the generated rules are pruned using a proposed Partial Rule Match to reduce the number of generated rules. The proposed method was tested on a text classification problem, and the results show that it performed better than the existing method in terms of classification accuracy and number of generated rules.
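The abstract does not describe the proposed vertical data format, so the sketch below shows only the generic idea of a vertical layout: each term maps to the set of document ids containing it, and the support of a pattern is obtained by intersecting those sets. The function names, the confidence computation, and the toy data are hypothetical and are not taken from the paper.

```python
from collections import defaultdict


def to_vertical(docs):
    """Map each term to the set of document ids that contain it."""
    tidsets = defaultdict(set)
    for doc_id, terms in enumerate(docs):
        for term in set(terms):
            tidsets[term].add(doc_id)
    return tidsets


def support(itemset, tidsets):
    """Return the ids of documents containing every term in the itemset."""
    sets = [tidsets.get(term, set()) for term in itemset]
    return set.intersection(*sets) if sets else set()


def rule_confidence(itemset, label, tidsets, labels):
    """Confidence of the class association rule: itemset -> label."""
    covered = support(itemset, tidsets)
    if not covered:
        return 0.0
    return sum(1 for d in covered if labels[d] == label) / len(covered)


# Toy corpus (made up for this example).
docs = [
    ["market", "price", "rise"],
    ["market", "price", "fall"],
    ["team", "match", "win"],
    ["market", "team", "deal"],
]
labels = ["finance", "finance", "sports", "finance"]
tidsets = to_vertical(docs)
conf_finance = rule_confidence({"market", "price"}, "finance", tidsets, labels)
```

Because support is computed by set intersection, the corpus does not need to be rescanned for every candidate pattern, which is the usual motivation for a vertical layout.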


Author(s):  
Ahmad Nazmi Fadzal ◽  
Mazidah Puteh ◽  
Nurazzah Abd Rahman

This paper presents an Ant Colony Algorithm (ACO) for text classification in a multicore-multithread environment within the artificial intelligence domain. We developed software that applies the concept of concurrency to multiple artificial ants. Pheromone is the main ACO concept used to solve the text classification problem. In keeping with its role, the pheromone value changes depending on the solutions discovered during the pseudo-random heuristic attempts at selecting a path through the words of a text. However, ACO can take longer to process larger training documents. Based on the cooperative behavior of ants living in a colony, the ACO component is examined in a multicore-multithread environment to gain an execution time benefit. In this environment, the modification aims to make artificial ants communicate actively across multiple physical processor cores. The execution time is expected to improve without compromising the original classification accuracy, at the cost of more processing power. The single and multicore-multithreaded versions of ACO were compared statistically using a relevant test. The results show a positive reduction in execution time.


2018 ◽  
Vol 173 ◽  
pp. 03072
Author(s):  
Wu Mingqiang ◽  
Furong Chang ◽  
Kui Zhang

This paper mainly deals with the classification of text data. Statistics show that more than 8000 articles of all kinds have been retrieved through the optical network. However, few papers address the factors that affect text classification. The classification method used is important, but internal factors sometimes play a great role and can even determine the success or failure of the whole classification. To make up for this deficiency, this paper selects the Rocchio algorithm as the classification method and tests experimentally the influence of five internal factors: category clustering density, class complexity, category definition, stop words, and document length. The experiments show that the higher the clustering density, the lower the class complexity, and the higher the category definition, the higher the accuracy of text classification; handling stop words in the text also improves the result. Document length does not directly affect classification, but a suitable document length should be chosen according to the classification algorithm.
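For readers unfamiliar with the Rocchio algorithm named above, here is a minimal sketch of its centroid-based form: each class is represented by the average of its training vectors, and a new document is assigned to the class whose centroid is most similar. The bag-of-words counts, cosine similarity, helper names, and toy training set are assumptions for illustration and do not reproduce the paper's experimental setup.

```python
import math
from collections import Counter, defaultdict


def vectorize(text):
    """Bag-of-words term counts for one whitespace-tokenized document."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def train_centroids(labeled_docs):
    """Average the term vectors of each class into one centroid per class."""
    sums, counts = defaultdict(Counter), Counter()
    for text, label in labeled_docs:
        sums[label].update(vectorize(text))
        counts[label] += 1
    return {
        label: {t: w / counts[label] for t, w in vec.items()}
        for label, vec in sums.items()
    }


def classify(text, centroids):
    """Return the label whose centroid is most similar to the document."""
    vec = vectorize(text)
    return max(centroids, key=lambda label: cosine(vec, centroids[label]))


# Toy training set (made up for this example).
train = [
    ("goal match team score", "sports"),
    ("team wins match", "sports"),
    ("stock market price falls", "finance"),
    ("market price rises", "finance"),
]
centroids = train_centroids(train)
prediction = classify("team score in the match", centroids)
```

In this formulation, tightly clustered classes give centroids that represent their members well, which fits the abstract's finding that higher clustering density goes with higher accuracy.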


2022 ◽  
pp. 115-127
Author(s):  
Sagar Sudhir Dhobale ◽  
Sharda Bapat

ICD (International Classification of Diseases) is a system developed by the WHO in which every unique diagnosis and procedure has a unique code. It provides a standardized way to represent medical information and makes it sharable and comparable across different hospitals and countries. Currently, the task of assigning ICD codes to patient discharge summaries is performed manually by medical coders. Manual coding is costly, time consuming, and inefficient for large volumes of data. The healthcare industry therefore requires automated solutions to make medical coding more efficient, accurate, and consistent. In this study, automated ICD-9 coding is approached as a multi-label text classification problem. A deep learning system is presented to assign ICD-9 codes automatically to patient discharge summaries. Convolutional neural networks and a word2vec model are combined to automatically extract features from the input text. The best model achieved 83.28% accuracy. The results of this research demonstrate the usability of deep learning for multi-label text classification and medical coding.

