Approach for Text Classification Based on the Similarity Measurement between Normal Cloud Models

Measuring the similarity between objects is a core research area of data mining. To reduce the interference caused by the uncertainty of natural language, a similarity measure between normal cloud models is applied to text classification. On this basis, a novel text classifier based on cloud concept jumping up (CCJU-TC) is proposed. It can efficiently convert between qualitative concepts and quantitative data. The text set is first converted into a text information table based on the vector space model (VSM), and the qualitative text concepts extracted from the same category are then jumped up into a single category-level concept. According to the cloud similarity between the test text and each category concept, the test text is assigned to the most similar category. A comparison of different text classifiers on different feature selection sets shows that CCJU-TC adapts well to different text features and that its classification performance is better than that of the traditional classifiers.
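
The conversion between a qualitative concept and quantitative data mentioned above is usually done in the cloud model literature with a pair of generators. The following is a minimal sketch of that general idea, not the CCJU-TC method itself: the abstract does not give the paper's similarity measure or its concept jumping up step, so neither is shown. The function names `forward_cloud` and `backward_cloud` are hypothetical, and the backward step assumes one commonly cited estimator that uses the mean absolute deviation.

```python
import math
import random


def forward_cloud(ex, en, he, n, rng):
    """Generate n cloud drops (x, membership) from a normal cloud (Ex, En, He).

    Assumption: the standard forward normal cloud generator, where the
    entropy is itself perturbed by the hyper-entropy He for every drop.
    """
    drops = []
    for _ in range(n):
        en_prime = rng.gauss(en, he)          # perturbed entropy for this drop
        x = rng.gauss(ex, abs(en_prime))      # quantitative value of the drop
        if en_prime == 0.0:
            mu = 1.0                          # degenerate case: x equals ex
        else:
            mu = math.exp(-((x - ex) ** 2) / (2.0 * en_prime ** 2))
        drops.append((x, mu))
    return drops


def backward_cloud(samples):
    """Estimate (Ex, En, He) from quantitative samples.

    Assumption: the estimator based on the mean absolute deviation;
    other backward generators exist and may give different values.
    """
    n = len(samples)
    ex = sum(samples) / n
    mean_abs_dev = sum(abs(x - ex) for x in samples) / n
    en = math.sqrt(math.pi / 2.0) * mean_abs_dev
    variance = sum((x - ex) ** 2 for x in samples) / (n - 1)
    # Clamp at zero: on small samples the variance can fall below En squared.
    he = math.sqrt(max(variance - en ** 2, 0.0))
    return ex, en, he
```

Under these assumptions, `backward_cloud` would turn the feature values of one category into a concept (Ex, En, He), and `forward_cloud` would turn a concept back into sample values.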

Research on Automatic Text Classification Algorithm Based on ITF-IDF and KNN

Applied Mechanics and Materials ◽

10.4028/www.scientific.net/amm.713-715.1830 ◽

2015 ◽

Vol 713-715 ◽

pp. 1830-1834

Author(s):

Rong Chen ◽

Feng Chen ◽

Yi Sun

Keyword(s):

Text Classification ◽

Information Filtering ◽

Classification Model ◽

Related Information ◽

Text Feature ◽

Extraction Algorithm ◽

Automatic Text Classification ◽

Text Features ◽

Automatic Text ◽

Better Than

We consider how to efficiently perform text classification on all pairs of documents. The results can be used in information retrieval, digital libraries, information filtering, and search engines, among others. This paper describes a text classification model based on the KNN algorithm. Because the text feature extraction algorithm TF-IDF can lose related information between text features, an improved ITF-IDF algorithm is presented to overcome this problem. Our experiments show that our algorithm is better than others.
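
To make the pipeline concrete, here is a minimal sketch of KNN text classification over TF-IDF vectors. It uses plain TF-IDF with a smoothed IDF, not the paper's improved ITF-IDF, whose definition the abstract does not give. The function names, the cosine similarity choice, and the toy data are all assumptions made for illustration.

```python
import math
from collections import Counter


def tf_idf_vectors(docs):
    """Build one sparse TF-IDF vector (term -> weight) per tokenized document."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    # Add 1 so that terms occurring in every document keep a non-zero weight.
    idf = {term: math.log(n / count) + 1.0 for term, count in df.items()}
    vectors = []
    for doc in docs:
        counts = Counter(doc)
        vectors.append({t: (c / len(doc)) * idf[t] for t, c in counts.items()})
    return vectors, idf


def vectorize(tokens, idf):
    """Weight a new document with the IDF learned from the training set."""
    counts = Counter(t for t in tokens if t in idf)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {t: (c / total) * idf[t] for t, c in counts.items()}


def cosine(a, b):
    """Cosine similarity between two sparse vectors."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def knn_classify(query_tokens, train_docs, train_labels, k=3):
    """Return the majority label among the k most similar training documents."""
    vectors, idf = tf_idf_vectors(train_docs)
    query = vectorize(query_tokens, idf)
    ranked = sorted(zip(vectors, train_labels),
                    key=lambda pair: cosine(query, pair[0]),
                    reverse=True)
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


# Toy training set (hypothetical data, for illustration only).
TRAIN_DOCS = [
    ["stock", "market", "price"],
    ["market", "trade", "shares"],
    ["football", "match", "goal"],
    ["goal", "team", "league"],
]
TRAIN_LABELS = ["finance", "finance", "sport", "sport"]
```

Calling `knn_classify(["market", "price", "shares"], TRAIN_DOCS, TRAIN_LABELS)` votes among the three nearest training documents. An ITF-IDF variant would only change how the weights in `tf_idf_vectors` and `vectorize` are computed.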

Using clustering to aid text classification of single-labelled datasets

10.12681/eadd/30839 ◽

2009 ◽

Author(s):

Αντωνία Κυριακοπούλου

Keyword(s):

Unsupervised Learning ◽

Text Classification ◽

Data Representation ◽

Classification Performance ◽

Critical Research ◽

Social Bookmarking ◽

Supervised And Unsupervised Learning ◽

Text Classifiers ◽

Concise Representation

Supervised and unsupervised learning have been the focus of critical research in the areas of machine learning and artificial intelligence. In the literature, these two streams flow independently of each other, despite their close conceptual and practical connections. This dissertation demonstrates that unsupervised learning algorithms, i.e. clustering, can provide us with valuable information about the data and help in the creation of high-accuracy text classifiers. In the case of clustering, the aim is to extract a kind of "structure" from a given sample of objects. The reasoning behind this is that if some structure exists in the objects, it is possible to take advantage of this information and find a short description of the data, exploiting the dependence or association between index terms and documents. This concise representation of the whole dataset can be properly incorporated in the existing data representation. The use of prior knowledge about the nature of the dataset helps in building a more efficient classifier for this set. This approach does not capture all the intricacies of text; however, on some domains this technique substantially improves text classification accuracy. In this vein, a study of the interaction between supervised and unsupervised learning has been carried out. We have studied and implemented models that apply clustering in multiple ways and in conjunction with classification to construct robust text classifiers. The extensive experimentation has shown the effectiveness of using clustering to boost text classification performance. Additionally, preliminary experiments on some of the most important applications of text classification, such as Spam Mail Filtering, Spam Detection in Social Bookmarking Systems, and Sentence Boundary Disambiguation, have shown promising enhancements by exploiting the proposed models.

Research on Multi-label Text Classification Method Based on tALBERT-CNN

International Journal of Computational Intelligence Systems ◽

10.1007/s44196-021-00055-4 ◽

2021 ◽

Vol 14 (1) ◽

Author(s):

Wenfu Liu ◽

Jianmin Pang ◽

Nan Li ◽

Xin Zhou ◽

Feng Yue

Keyword(s):

Language Processing ◽

Text Classification ◽

Topic Model ◽

Classification Method ◽

Semantic Features ◽

Context Vector ◽

Text Information ◽

Massive Information ◽

Different Levels ◽

Better Than

Single-label classification technology has difficulty meeting the needs of text classification, and multi-label text classification has become an important research issue in natural language processing (NLP). Extracting semantic features from different levels and granularities of text is a basic and key task in multi-label text classification research. A topic model is an effective method for the automatic organization and induction of text information. It can reveal the latent semantics of documents and analyze the topics contained in massive information. Therefore, this paper proposes a multi-label text classification method based on tALBERT-CNN: an LDA topic model and ALBERT model are used to obtain the topic vector and semantic context vector of each word (document), a certain fusion mechanism is adopted to obtain in-depth topic and semantic representations of the document, and the multi-label features of the text are extracted through the TextCNN model to train a multi-label classifier. The experimental results obtained on standard datasets show that the proposed method can extract multi-label features from documents, and its performance is better than that of the existing state-of-the-art multi-label text classification algorithms.

Emotionally charged text classification with deep learning and sentiment semantic

Neural Computing and Applications ◽

10.1007/s00521-021-06542-1 ◽

2021 ◽

Author(s):

Jeow Li Huan ◽

Arif Ahmed Sekh ◽

Chai Quek ◽

Dilip K. Prasad

Keyword(s):

Language Processing ◽

Text Classification ◽

Classification Accuracy ◽

State Of The Art ◽

Document Representation ◽

Classical Technique ◽

Text Classifiers ◽

Vector Sequences ◽

Fully Connected ◽

Better Than

Text classification is one of the most widely used tasks in natural language processing. State-of-the-art text classifiers use the vector space model for extracting features. Recent progress in deep models shows that recurrent neural networks, which preserve the positional relationship among words, achieve higher accuracy. To push text classification accuracy even higher, multi-dimensional document representations, such as vector sequences or matrices combined with document sentiment, should be explored. In this paper, we show that documents can be represented as a sequence of vectors carrying semantic meaning and classified using a recurrent neural network that recognizes long-range relationships. We show that in this representation, additional sentiment vectors can be easily attached as a fully connected layer to the word vectors to further improve classification accuracy. On the UCI sentiment labelled dataset, using the sequence of vectors alone achieved an accuracy of 85.6%, which is better than the 80.7% of the ridge regression classifier, the best among the classical techniques we tested. Additional sentiment information further increases accuracy to 86.3%. On our suicide notes dataset, the best classical technique, the Naïve Bayes Bernoulli classifier, achieves an accuracy of 71.3%, while our classifier, incorporating semantic and sentiment information, exceeds that at 75% accuracy.

Correlation-Guided Representation for Multi-Label Text Classification

Proceedings of the Thirtieth International Joint Conference on Artificial Intelligence ◽

10.24963/ijcai.2021/463 ◽

2021 ◽

Author(s):

Qian-Wen Zhang ◽

Ximing Zhang ◽

Zhao Yan ◽

Ruifang Liu ◽

Yunbo Cao ◽

...

Keyword(s):

Language Processing ◽

Text Classification ◽

Low Frequency ◽

Classification Performance ◽

Categorical Variables ◽

Text Representation ◽

Label Semantics ◽

Higher Weights ◽

Label Correlations ◽

Text Information

Multi-label text classification is an essential task in natural language processing. Existing multi-label classification models generally consider labels as categorical variables and ignore the exploitation of label semantics. In this paper, we view the task as a correlation-guided text representation problem: an attention-based two-step framework is proposed to integrate text information and label semantics by jointly learning words and labels in the same space. In this way, we aim to capture high-order label-label correlations as well as context-label correlations. Specifically, the proposed approach works by learning token-level representations of words and labels globally through a multi-layer Transformer and constructing an attention vector through a word-label correlation matrix to generate the text representation. It ensures that relevant words receive higher weights than irrelevant words and thus directly optimizes the classification performance. Extensive experiments over benchmark multi-label datasets clearly validate the effectiveness of the proposed approach, and further analysis demonstrates that it is competitive both in predicting low-frequency labels and in convergence speed.

Multiple Naïve Bayes Classifiers Ensemble for Traffic Incident Detection

Mathematical Problems in Engineering ◽

10.1155/2014/383671 ◽

2014 ◽

Vol 2014 ◽

pp. 1-16 ◽

Cited By ~ 7

Author(s):

Qingchao Liu ◽

Jian Lu ◽

Shuyan Chen ◽

Kangjia Zhao

Keyword(s):

Decision Tree ◽

Naive Bayes ◽

Classification Performance ◽

Naïve Bayes ◽

Classifier Ensemble ◽

Optimal Threshold ◽

Incident Detection ◽

Bayes Classifier ◽

Traffic Incident ◽

Better Than

This study presents the applicability of the Naïve Bayes classifier ensemble for traffic incident detection. The standard Naive Bayes (NB) has been applied to traffic incident detection and has achieved good results. However, the detection result of the practically implemented NB depends on the choice of the optimal threshold, which is determined mathematically by using Bayesian concepts in the incident-detection process. To avoid the burden of choosing the optimal threshold and tuning the parameters and, furthermore, to improve the limited classification performance of the NB and to enhance the detection performance, we propose an NB classifier ensemble for incident detection. In addition, we also propose to combine the Naïve Bayes and decision tree (NBTree) to detect incidents. In this paper, we discuss extensive experiments that were performed to evaluate the performances of three algorithms: standard NB, NB ensemble, and NBTree. The experimental results indicate that the performances of five rules of the NB classifier ensemble are significantly better than those of standard NB and slightly better than those of NBTree in terms of some indicators. More importantly, the performances of the NB classifier ensemble are very stable.

Beyond response output: More logical than we think

Behavioral and Brain Sciences ◽

10.1017/s0140525x09000326 ◽

2009 ◽

Vol 32 (1) ◽

pp. 87-88 ◽

Cited By ~ 4

Author(s):

Wim De Neys

Keyword(s):

Bayesian Model ◽

Response Selection ◽

Data Fitting ◽

Output Data ◽

The Core ◽

Response Output ◽

Exclusive Focus ◽

Better Than

Oaksford & Chater (O&C) rely on a data fitting approach to show that a Bayesian model captures the core reasoning data better than its logicist rivals. The problem is that O&C's modeling has focused exclusively on response output data. I argue that this exclusive focus is biasing their conclusions. Recent studies that focused on the processes that resulted in the response selection are more positive for the role of logic.

DCNN-based Ship Classification using Enhanced Edge Information and Inception Module

Journal of Imaging Science and Technology ◽

10.2352/j.imagingsci.technol.2022.66.3.030501 ◽

2021 ◽

Author(s):

Bo Wang ◽

Xiaoting Yu ◽

Chengeng Huang ◽

Qinghong Sheng ◽

Yuanyuan Wang ◽

...

Keyword(s):

Neural Networks ◽

Classification Performance ◽

Image Features ◽

Deep Convolutional Neural Networks ◽

Edge Information ◽

Average Accuracy ◽

Ship Classification ◽

Edge Features ◽

High Level ◽

Better Than

The excellent feature extraction ability of deep convolutional neural networks (DCNNs) has been demonstrated in many image processing tasks, by which image classification can achieve high accuracy with only raw input images. However, the specific image features that influence the classification results are not readily determinable and what lies behind the predictions is unclear. This study proposes a method combining the Sobel and Canny operators and an Inception module for ship classification. The Sobel and Canny operators obtain enhanced edge features from the input images. A convolutional layer is replaced with the Inception module, which can automatically select the proper convolution kernel for ship objects in different image regions. The principle is that the high-level features abstracted by the DCNN, and the features obtained by multi-convolution concatenation of the Inception module must ultimately derive from the edge information of the preprocessing input images. This indicates that the classification results are based on the input edge features, which indirectly interpret the classification results to some extent. Experimental results show that the combination of the edge features and the Inception module improves DCNN ship classification performance. The original model with the raw dataset has an average accuracy of 88.72%, while when using enhanced edge features as input, it achieves the best performance of 90.54% among all models. The model that replaces the fifth convolutional layer with the Inception module has the best performance of 89.50%. It performs close to VGG-16 on the raw dataset and is significantly better than other deep neural networks. The results validate the functionality and feasibility of the idea posited.
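
As a rough illustration of the edge enhancement step, the sketch below computes the Sobel gradient magnitude of a small grayscale image in plain Python. It covers only the Sobel operator: the Canny operator, the Inception module, and the way the paper combines them are not shown, because the abstract does not specify them. The function name `sobel_magnitude` and the toy image are assumptions.

```python
import math

# Standard 3x3 Sobel kernels for horizontal and vertical intensity changes.
SOBEL_X = [[-1, 0, 1],
           [-2, 0, 2],
           [-1, 0, 1]]
SOBEL_Y = [[-1, -2, -1],
           [0, 0, 0],
           [1, 2, 1]]


def sobel_magnitude(image):
    """Return the gradient magnitude for every interior pixel.

    `image` is a list of equally long rows of numbers. Border pixels are
    skipped, so the result is two rows and two columns smaller than the input.
    The kernels are applied as a correlation (no flip); the magnitude is the
    same as with a true convolution because flipping only changes the sign.
    """
    height, width = len(image), len(image[0])
    result = []
    for r in range(1, height - 1):
        row = []
        for c in range(1, width - 1):
            gx = gy = 0
            for dr in range(3):
                for dc in range(3):
                    pixel = image[r + dr - 1][c + dc - 1]
                    gx += SOBEL_X[dr][dc] * pixel
                    gy += SOBEL_Y[dr][dc] * pixel
            row.append(math.hypot(gx, gy))
        result.append(row)
    return result


# Toy image with a vertical step edge between columns 1 and 2.
STEP_IMAGE = [[0, 0, 1, 1, 1] for _ in range(5)]
```

On `STEP_IMAGE` the response is large next to the step and zero in the flat region on the right. An edge map like this could be fed to a network alongside or instead of the raw image.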

Enhancing Big Data Auditing

Computer and Information Science ◽

10.5539/cis.v11n1p90 ◽

2018 ◽

Vol 11 (1) ◽

pp. 90

Author(s):

Sara Alomari ◽

Mona Alghamdi ◽

Fahd S. Alotaibi

Keyword(s):

Big Data ◽

Research Area ◽

Provable Data Possession ◽

Integrity Verification ◽

The Core ◽

Outsourced Data ◽

Active Research ◽

Data Auditing ◽

Active Research Area ◽

Proof Of Retrievability

Auditing services for outsourced data, especially big data, have recently been an active research area. Many remote data auditing (RDA) schemes have been proposed. The two categories of RDA, Provable Data Possession (PDP) and Proof of Retrievability (PoR), are the core schemes from which most researchers derive new schemes that support additional capabilities such as batch and dynamic auditing. In this paper, we investigate the most popular PDP schemes, since many PDP techniques exist that have been further improved to achieve efficient integrity verification. We first review the literature to build the required knowledge about the auditing services and related schemes. Second, we specify a methodology to follow in order to attain the research goals. Then, we define each selected PDP scheme and the auditing properties used to compare the chosen schemes. Finally, we decide, if possible, which scheme is optimal for handling big data auditing.

Text Classification of Public Feedbacks using Convolutional Neural Network Based on Differential Evolution Algorithm

International Journal of Computers Communications & Control ◽

10.15837/ijccc.2019.1.3420 ◽

2019 ◽

Vol 14 (1) ◽

pp. 124-134 ◽

Cited By ~ 2

Author(s):

Shuai Zhang ◽

Yong Chen ◽

Xiaoling Huang ◽

Yishuai Cai

Keyword(s):

Neural Network ◽

Differential Evolution ◽

Convolutional Neural Network ◽

Text Classification ◽

Differential Evolution Algorithm ◽

Classification Performance ◽

Classification Model ◽

Evolution Algorithm ◽

Classification Prediction

Online feedback is an effective way of communication between government departments and citizens. However, the high daily volume of public feedback has increased the burden on government administrators. Deep learning methods are good at automatically analyzing and extracting deep features of data, which improves the accuracy of classification prediction. In this study, we aim to use a text classification model to automatically classify public feedback and so reduce the work pressure on administrators. In particular, a convolutional neural network model combined with word embedding and optimized by a differential evolution algorithm is adopted. We compared it with seven common text classification models, and the results show that the model we explored has good classification performance under different evaluation metrics, including accuracy, precision, recall, and F1-score.
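
The four evaluation metrics named above follow directly from the confusion counts. The sketch below computes them for the binary case, which is a simplifying assumption: the abstract does not say how many classes the feedback data has or how per-class scores are averaged. The function name `binary_metrics` and the sample labels are hypothetical.

```python
def binary_metrics(y_true, y_pred, positive=1):
    """Return accuracy, precision, recall, and F1 for one positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = len(pairs) - tp - fp - fn

    accuracy = (tp + tn) / len(pairs)
    # Guard against empty denominators when a class is never predicted or seen.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Hypothetical labels: 3 true positives, 1 false negative,
# 1 false positive, and 3 true negatives.
Y_TRUE = [1, 1, 1, 0, 0, 0, 1, 0]
Y_PRED = [1, 1, 0, 0, 0, 1, 1, 0]
```

For a multi-class problem, one common approach is to compute these scores per class and then average them, for example with macro averaging.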
