Is shallow semantic analysis really that shallow? A study on improving text classification performance

Online feedback is an effective way of communication between government departments and citizens. However, the daily high number of public feedbacks has increased the burden on government administrators. The deep learning method is good at automatically analyzing and extracting deep features of data, and then improving the accuracy of classification prediction. In this study, we aim to use the text classification model to achieve the automatic classification of public feedbacks to reduce the work pressure of administrator. In particular, a convolutional neural network model combined with word embedding and optimized by differential evolution algorithm is adopted. At the same time, we compared it with seven common text classification models, and the results show that the model we explored has good classification performance under different evaluation metrics, including accuracy, precision, recall, and F1-score.

Download Full-text

GILE: A Generalized Input-Label Embedding for Text Classification

Transactions of the Association for Computational Linguistics ◽

10.1162/tacl_a_00259 ◽

2019 ◽

Vol 7 ◽

pp. 139-155 ◽

Cited By ~ 1

Author(s):

Nikolaos Pappas ◽

James Henderson

Keyword(s):

Text Classification ◽

Joint Space ◽

Classification Performance ◽

Cross Entropy ◽

Categorical Variables ◽

Classification Models ◽

Set Size ◽

Nonlinear Input ◽

Label Semantics

Neural text classification models typically treat output labels as categorical variables that lack description and semantics. This forces their parametrization to be dependent on the label set size, and, hence, they are unable to scale to large label sets and generalize to unseen ones. Existing joint input-label text models overcome these issues by exploiting label descriptions, but they are unable to capture complex label relationships, have rigid parametrization, and their gains on unseen labels happen often at the expense of weak performance on the labels seen during training. In this paper, we propose a new input-label model that generalizes over previous such models, addresses their limitations, and does not compromise performance on seen labels. The model consists of a joint nonlinear input-label embedding with controllable capacity and a joint-space-dependent classification unit that is trained with cross-entropy loss to optimize classification performance. We evaluate models on full-resource and low- or zero-resource text classification of multilingual news and biomedical text with a large label set. Our model outperforms monolingual and multilingual models that do not leverage label semantics and previous joint input-label space models in both scenarios.

Download Full-text

MULTI-LEVEL TEXT CLASSIFICATION METHOD BASED ON LATENT SEMANTIC ANALYSIS

Proceedings of the Ninth International Conference on Enterprise Information Systems ◽

10.5220/0002401703200324 ◽

2007 ◽

Keyword(s):

Text Classification ◽

Latent Semantic Analysis ◽

Semantic Analysis ◽

Classification Method ◽

Multi Level

Download Full-text

Research of Text Classification Model Based on Latent Semantic Analysis and Improved HS-SVM

2010 2nd International Workshop on Intelligent Systems and Applications ◽

10.1109/iwisa.2010.5473702 ◽

2010 ◽

Author(s):

Yu-feng Zhang ◽

Chao He

Keyword(s):

Text Classification ◽

Latent Semantic Analysis ◽

Semantic Analysis ◽

Classification Model ◽

Model Based

Download Full-text

An effective concept extraction method for improving text classification performance

Geo-spatial Information Science ◽

10.1007/bf02826953 ◽

2003 ◽

Vol 6 (4) ◽

pp. 66-72 ◽

Cited By ~ 6

Author(s):

Zhang Yuntao ◽

Gong Ling ◽

Wang Yongcheng ◽

Yin Zhonghang

Keyword(s):

Text Classification ◽

Extraction Method ◽

Classification Performance ◽

Concept Extraction

Download Full-text

Adapting Hidden Naive Bayes for Text Classification

Mathematics ◽

10.3390/math9192378 ◽

2021 ◽

Vol 9 (19) ◽

pp. 2378

Author(s):

Shengfeng Gan ◽

Shiqi Shao ◽

Long Chen ◽

Liangjun Yu ◽

Liangxiao Jiang

Keyword(s):

Text Classification ◽

Conditional Independence ◽

Structure Learning ◽

Naive Bayes ◽

Learning Algorithm ◽

Classification Performance ◽

Naïve Bayes ◽

Efficiency And Effectiveness ◽

The One ◽

Structure Extension

Due to its simplicity, efficiency, and effectiveness, multinomial naive Bayes (MNB) has been widely used for text classification. As in naive Bayes (NB), its assumption of the conditional independence of features is often violated and, therefore, reduces its classification performance. Of the numerous approaches to alleviating its assumption of the conditional independence of features, structure extension has attracted less attention from researchers. To the best of our knowledge, only structure-extended MNB (SEMNB) has been proposed so far. SEMNB averages all weighted super-parent one-dependence multinomial estimators; therefore, it is an ensemble learning model. In this paper, we propose a single model called hidden MNB (HMNB) by adapting the well-known hidden NB (HNB). HMNB creates a hidden parent for each feature, which synthesizes all the other qualified features’ influences. For HMNB to learn, we propose a simple but effective learning algorithm without incurring a high-computational-complexity structure-learning process. Our improved idea can also be used to improve complement NB (CNB) and the one-versus-all-but-one model (OVA), and the resulting models are simply denoted as HCNB and HOVA, respectively. The extensive experiments on eleven benchmark text classification datasets validate the effectiveness of HMNB, HCNB, and HOVA.

Download Full-text

Efficient processing of GRU based on word embedding for text classification

JOIV International Journal on Informatics Visualization ◽

10.30630/joiv.3.4.289 ◽

2019 ◽

Vol 3 (4) ◽

Cited By ~ 2

Author(s):

Muhammad Zulqarnain ◽

Rozaida Ghazali ◽

Muhammad Ghulam Ghouse ◽

Muhammad Faheem Mushtaq

Keyword(s):

Language Processing ◽

Text Classification ◽

Classification Performance ◽

Word Embedding ◽

Training Data ◽

Superior Performance ◽

Sequential Data ◽

Online Data ◽

Benchmark Datasets ◽

Recurrent Architecture

Text classification has become very serious problem for big organization to manage the large amount of online data and has been extensively applied in the tasks of Natural Language Processing (NLP). Text classification can support users to excellently manage and exploit meaningful information require to be classified into various categories for further use. In order to best classify texts, our research efforts to develop a deep learning approach which obtains superior performance in text classification than other RNNs approaches. However, the main problem in text classification is how to enhance the classification accuracy and the sparsity of the data semantics sensitivity to context often hinders the classification performance of texts. In order to overcome the weakness, in this paper we proposed unified structure to investigate the effects of word embedding and Gated Recurrent Unit (GRU) for text classification on two benchmark datasets included (Google snippets and TREC). GRU is a well-known type of recurrent neural network (RNN), which is ability of computing sequential data over its recurrent architecture. Experimentally, the semantically connected words are commonly near to each other in embedding spaces. First, words in posts are changed into vectors via word embedding technique. Then, the words sequential in sentences are fed to GRU to extract the contextual semantics between words. The experimental results showed that proposed GRU model can effectively learn the word usage in context of texts provided training data. The quantity and quality of training data significantly affected the performance. We evaluated the performance of proposed approach with traditional recurrent approaches, RNN, MV-RNN and LSTM, the proposed approach is obtained better results on two benchmark datasets in the term of accuracy and error rate.

Download Full-text

A New Vector Representation of Short Texts for Classification

The International Arab Journal of Information Technology ◽

10.34028/iajit/17/2/12 ◽

2019 ◽

Vol 17 (2) ◽

pp. 241-249

Author(s):

Yangyang Li ◽

Bo Liu

Keyword(s):

Text Classification ◽

Web Search ◽

Latent Dirichlet Allocation ◽

Topic Model ◽

Classification Performance ◽

New Method ◽

Data Sets ◽

Text Data ◽

Short Text ◽

Space Model

Short and sparse characteristics and synonyms and homonyms are main obstacles for short-text classification. In recent years, research on short-text classification has focused on expanding short texts but has barely guaranteed the validity of expanded words. This study proposes a new method to weaken these effects without external knowledge. The proposed method analyses short texts by using the topic model based on Latent Dirichlet Allocation (LDA), represents each short text by using a vector space model and presents a new method to adjust the vector of short texts. In the experiments, two open short-text data sets composed of google news and web search snippets are utilised to evaluate the classification performance and prove the effectiveness of our method.

Download Full-text

Exploring Binary Relations for Ontology Extension and Improved Adaptation to Clinical Text

10.1101/2020.12.04.411751 ◽

2020 ◽

Author(s):

Luke T Slater ◽

Robert Hoehndorf ◽

Andreas Karwath ◽

Georgios V Gkoutos

Keyword(s):

Text Mining ◽

Semantic Analysis ◽

Classification Performance ◽

Human Interaction ◽

Semantic Features ◽

Ontology Learning ◽

Disease Ontology ◽

Text Corpora ◽

Clinical Narrative ◽

Improved Performance

AbstractBackgroundThe controlled domain vocabularies provided by ontologies make them an indispensable tool for text mining. Ontologies also include semantic features in the form of taxonomy and axioms, which make annotated entities in text corpora useful for semantic analysis. Extending those semantic features may improve performance for characterisation and analytic tasks. Ontology learning techniques have previously been explored for novel ontology construction from text, though most recent approaches have focused on literature, with applications in information retrieval or human interaction tasks. We hypothesise that extension of existing ontologies using information mined from clinical narrative text may help to adapt those ontologies such that they better characterise those texts, and lead to improved classification performance.ResultsWe develop and present a framework for identifying new classes in text corpora, which can be integrated into existing ontology hierarchies. To do this, we employ the Stanford Open Information Extraction algorithm and integrate its implementation into the Komenti semantic text mining framework. To identify whether our approach leads to better characterisation of text, we present a case study, using the method to learn an adaptation to the Disease Ontology using text associated with a sample of 1,000 patient visits from the MIMIC-III critical care database. We use the adapted ontology to annotate and classify shared first diagnosis on patient visits with semantic similarity, revealing an improved performance over use of the base Disease Ontology on the set of visits the ontology was constructed from. Moreover, we show that the adapted ontology also improved performance for the same task over two additional unseen samples of 1,000 and 2,500 patient visits.ConclusionsWe report a promising new method for ontology learning and extension from text. We demonstrate that we can successfully use the method to adapt an existing ontology to a textual dataset, improving its ability to characterise the dataset, and leading to improved analytic performance, even on unseen portions of the dataset.

Download Full-text

Enhancing text classification performance by preprocessing misspelled words in Indonesian language

TELKOMNIKA (Telecommunication Computing Electronics and Control) ◽

10.12928/telkomnika.v19i4.20369 ◽

2021 ◽

Vol 19 (4) ◽

pp. 1234

Author(s):

Reza Setiabudi ◽

Ni Made Satvika Iswari ◽

Andre Rusli

Keyword(s):

Text Classification ◽

Classification Performance

Download Full-text