Text Classification Model Based on BERT-Capsule with Integrated Deep Learning

Detecting cybersecurity intelligence (CSI) on social media such as Twitter is crucial because it allows security experts to respond to cyber threats in advance. In this paper, we devise a new deep learning-based text classification model that separates CSI-positive from CSI-negative tweets in a collection of tweets. To this end, we propose a novel word embedding model, called contrastive word embedding, that maximizes the difference between base embedding models. First, we define CSI-positive and CSI-negative corpora, which are used to construct the embedding models. To compensate for the imbalance of the tweet data sets, we additionally employ background knowledge for each tweet corpus: (1) the CVE data set for the CSI-positive corpus and (2) the Wikitext data set for the CSI-negative corpus. Second, we adopt deep learning models such as CNN or LSTM to extract feature vectors from the embedding models and integrate the feature vectors into one classifier. To validate the effectiveness of the proposed model, we compare it with two baseline classification models: (1) a model based on a single embedding model constructed with the CSI-positive corpus only and (2) another model constructed with the CSI-negative corpus only. The results show that the proposed model achieves high accuracy, with an F1-score of 0.934 and an area under the curve (AUC) of 0.935, improving on the baseline models by 1.76 to 6.74% in F1-score and by 1.64 to 6.98% in AUC.
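The integration step described above can be illustrated with a minimal sketch. It assumes that "integrating the feature vectors" means concatenating one feature vector per base embedding model, and it uses mean pooling as a stand-in for the CNN or LSTM feature extractor. The toy embedding tables, the names POS_EMB, NEG_EMB, mean_pool, and integrated_features, and all numbers are hypothetical and not taken from the paper.

```python
from typing import Dict, List

# Hypothetical toy embedding tables standing in for the two base models:
# one built from the CSI-positive corpus, one from the CSI-negative corpus.
POS_EMB: Dict[str, List[float]] = {
    "exploit": [0.9, 0.1], "patch": [0.7, 0.3], "coffee": [0.1, 0.1],
}
NEG_EMB: Dict[str, List[float]] = {
    "exploit": [0.1, 0.2], "patch": [0.2, 0.2], "coffee": [0.8, 0.6],
}
DIM = 2  # dimensionality of both toy tables


def mean_pool(tokens: List[str], table: Dict[str, List[float]]) -> List[float]:
    """Average the token vectors; a simple stand-in for a CNN/LSTM extractor."""
    vecs = [table.get(tok, [0.0] * DIM) for tok in tokens]  # unknown -> zeros
    if not vecs:
        return [0.0] * DIM
    return [sum(v[i] for v in vecs) / len(vecs) for i in range(DIM)]


def integrated_features(tokens: List[str]) -> List[float]:
    """Concatenate the features from both embedding models (an assumption)."""
    return mean_pool(tokens, POS_EMB) + mean_pool(tokens, NEG_EMB)


features = integrated_features(["exploit", "patch"])
print(features)
```

A single downstream classifier would then take the concatenated vector as input, so it sees how the same tweet looks under both embedding models.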


Hybrid Chinese text classification model based on pretraining model

Journal of Physics Conference Series ◽ 10.1088/1742-6596/1961/1/012002 ◽ 2021 ◽ Vol 1961 (1) ◽ pp. 012002

Author(s): Xing Zhaoye ◽ Liu Xiaoqun ◽ Sun Peijie

Keyword(s): Chinese Text ◽ Text Classification ◽ Classification Model ◽ Chinese Text Classification ◽ Model Based


Chinese Text Classification Model Based on Deep Learning

Future Internet ◽ 10.3390/fi10110113 ◽ 2018 ◽ Vol 10 (11) ◽ pp. 113 ◽ Cited By ~ 17

Author(s): Yue Li ◽ Xutao Wang ◽ Pengjian Xu

Keyword(s): Neural Network ◽ Deep Learning ◽ Language Processing ◽ Chinese Text ◽ Text Classification ◽ Short Term Memory ◽ Classification Model ◽ Short Term ◽ Term Memory ◽ Long Short Term Memory

Text classification is important in natural language processing because massive amounts of valuable text need to be sorted into categories for further use. To classify text better, this paper aims to build a deep learning model that achieves better classification results on Chinese text than other researchers’ models. After comparing different methods, long short-term memory (LSTM) and convolutional neural network (CNN) methods were selected as the deep learning methods for classifying Chinese text. LSTM is a special kind of recurrent neural network (RNN) that can process sequential information through its recurrent structure. CNN, by contrast, has shown its ability to extract features from visual imagery. Therefore, two layers of LSTM and one layer of CNN were integrated into our new model, the BLSTM-C model (BLSTM stands for bi-directional long short-term memory, and C stands for CNN). The BLSTM produces a sequence output based on past and future contexts, which is then fed to the convolutional layer for feature extraction. In our experiments, the proposed BLSTM-C model was evaluated in several ways. The model exhibited remarkable performance in text classification, especially on Chinese texts.
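The architecture described above can be sketched in PyTorch as follows. This is an illustrative sketch, not the authors' implementation. It assumes the "two layers of LSTM" are the forward and backward LSTMs of one bidirectional layer, and it adds an embedding layer, max-over-time pooling, and a linear output layer that the abstract does not specify. The class name BLSTMC and every hyperparameter value are hypothetical.

```python
import torch
import torch.nn as nn


class BLSTMC(nn.Module):
    """Bidirectional LSTM followed by one convolutional layer (sketch)."""

    def __init__(self, vocab_size: int, embed_dim: int = 64,
                 hidden_dim: int = 32, num_filters: int = 16,
                 kernel_size: int = 3, num_classes: int = 4) -> None:
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, embed_dim)
        # Forward and backward LSTMs: the output at each position reflects
        # both past and future context.
        self.blstm = nn.LSTM(embed_dim, hidden_dim,
                             batch_first=True, bidirectional=True)
        # The convolution slides over the BLSTM output sequence.
        self.conv = nn.Conv1d(2 * hidden_dim, num_filters, kernel_size)
        self.fc = nn.Linear(num_filters, num_classes)

    def forward(self, token_ids: torch.Tensor) -> torch.Tensor:
        x = self.embedding(token_ids)          # (batch, seq, embed_dim)
        seq_out, _ = self.blstm(x)             # (batch, seq, 2 * hidden_dim)
        # Conv1d expects (batch, channels, seq), so swap the last two axes.
        feats = torch.relu(self.conv(seq_out.transpose(1, 2)))
        pooled = feats.max(dim=2).values       # max-over-time pooling
        return self.fc(pooled)                 # (batch, num_classes)


model = BLSTMC(vocab_size=1000)
token_ids = torch.randint(0, 1000, (2, 20))    # 2 toy sequences of 20 tokens
logits = model(token_ids)
print(logits.shape)
```

For Chinese text, the token ids would come from a character-level or word-level vocabulary. The abstract does not say which, so that choice is left open here.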


A New Classification Model Based on Stacknet and Deep Learning for Fast Detection of COVID 19 Through X Rays Images

2020 Fourth International Conference On Intelligent Computing in Data Sciences (ICDS) ◽ 10.1109/icds50568.2020.9268777 ◽ 2020

Author(s): Jalal RABBAH ◽ Mohammed RIDOUANI ◽ Larbi HASSOUNI

Keyword(s): Deep Learning ◽ Classification Model ◽ X Rays ◽ New Classification ◽ Fast Detection ◽ Model Based


A Complaint Text Classification Model Based on Character-Level Convolutional Network

2018 IEEE 9th International Conference on Software Engineering and Service Science (ICSESS) ◽ 10.1109/icsess.2018.8663873 ◽ 2018 ◽ Cited By ~ 1

Author(s): Xuesong Tong ◽ Bin Wu ◽ Shuyang Wang ◽ Jinna Lv

Keyword(s): Text Classification ◽ Classification Model ◽ Convolutional Network ◽ Model Based


Deep Learning-based Text Classification Model for Poisonous Clauses Classification

Journal of KIISE ◽ 10.5626/jok.2020.47.11.1054 ◽ 2020 ◽ Vol 47 (11) ◽ pp. 1054-1060

Author(s): Gihyeon Choi ◽ Youngjin Jang ◽ Harksoo Kim ◽ Kwanwoo Kim

Keyword(s): Deep Learning ◽ Text Classification ◽ Classification Model


Research of Text Classification Model Based on Latent Semantic Analysis and Improved HS-SVM

2010 2nd International Workshop on Intelligent Systems and Applications ◽ 10.1109/iwisa.2010.5473702 ◽ 2010

Author(s): Yu-feng Zhang ◽ Chao He

Keyword(s): Text Classification ◽ Latent Semantic Analysis ◽ Semantic Analysis ◽ Classification Model ◽ Model Based


An ensemble information retrieval method for the biomedical domain (Preprint)

10.2196/preprints.28272 ◽ 2021

Author(s): Zhiqiang Liu ◽ Jingkun Feng ◽ Zhihao Yang ◽ Lei Wang

Keyword(s): Information Retrieval ◽ Deep Learning ◽ Text Classification ◽ Query Expansion ◽ Ensemble Method ◽ Classification Model ◽ Retrieval Performance ◽ Matching Model ◽ Ranking List ◽ Initial Retrieval

BACKGROUND: With the development of biomedicine, the number of biomedical documents has increased rapidly, which makes it a great challenge for researchers to retrieve the information they need. Information retrieval aims to meet this challenge by searching a large collection for documents relevant to a given query. However, in some specific retrieval tasks the relevance of search results needs to be evaluated from multiple aspects, which increases the difficulty of biomedical information retrieval. OBJECTIVE: This study aims to find a more systematic method to retrieve relevant scientific literature for a given patient. METHODS: In the initial retrieval stage, we supplement query terms through query expansion strategies and apply query boosting to obtain an initial ranking list of relevant documents. In the re-ranking phase, we employ a text classification model and a relevance matching model to evaluate documents along different dimensions, then combine their outputs through logistic regression to re-rank all the documents in the initial ranking list. RESULTS: The proposed ensemble method improves biomedical retrieval performance. Compared with existing deep learning-based methods, experimental results show that our method achieves state-of-the-art performance on the data collection provided by the TREC 2019 Precision Medicine Track. CONCLUSIONS: In this paper, we propose a novel ensemble method based on deep learning. As shown in the experiments, the strategies we used in the initial retrieval phase, such as query expansion and query boosting, are effective. Applying the text classification model and the relevance matching model captures semantic context information better and improves retrieval performance.
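The re-ranking step in METHODS can be illustrated with a small sketch. It assumes each document carries two scores, one from the text classification model and one from the relevance matching model, and that a logistic regression fitted on labeled examples combines them into one relevance probability. The training data, document ids, score values, and the function names fit_logistic and rerank are hypothetical. Plain gradient descent is used here only to keep the example dependency-free.

```python
import math
from typing import List, Sequence, Tuple


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(X: Sequence[Sequence[float]], y: Sequence[int],
                 lr: float = 0.5, epochs: int = 2000) -> Tuple[List[float], float]:
    """Fit logistic regression weights with batch gradient descent."""
    w, b, n = [0.0] * len(X[0]), 0.0, len(X)
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * len(w), 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j, xj in enumerate(xi):
                grad_w[j] += err * xj
            grad_b += err
        w = [wj - lr * g / n for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def rerank(docs: Sequence[Tuple[str, Sequence[float]]],
           w: Sequence[float], b: float) -> List[str]:
    """Sort document ids by combined relevance probability, highest first."""
    scored = [(sigmoid(sum(wj * xj for wj, xj in zip(w, s)) + b), doc_id)
              for doc_id, s in docs]
    return [doc_id for _, doc_id in sorted(scored, reverse=True)]


# Hypothetical training pairs: (classifier score, matching score) -> relevant?
X_train = [(0.9, 0.8), (0.8, 0.9), (0.7, 0.6), (0.2, 0.1), (0.1, 0.3), (0.3, 0.2)]
y_train = [1, 1, 1, 0, 0, 0]
w, b = fit_logistic(X_train, y_train)

# Hypothetical initial ranking list from the first retrieval stage.
initial_list = [("doc_a", (0.1, 0.1)), ("doc_b", (0.5, 0.5)), ("doc_c", (0.9, 0.9))]
reranked = rerank(initial_list, w, b)
print(reranked)
```

Because both learned weights come out positive on this toy data, a document that scores higher under both models always moves ahead of one that scores lower under both.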
