Text classification based on gated recurrent unit combines with support vector machine

As the amount of unstructured text data that humanity produce largely and a lot of texts are grows on the Internet, so the one of the intelligent technique is require processing it and extracting different types of knowledge from it. Gated recurrent unit (GRU) and support vector machine (SVM) have been successfully used to Natural Language Processing (NLP) systems with comparative, remarkable results. GRU networks perform well in sequential learning tasks and overcome the issues of “vanishing and explosion of gradients in standard recurrent neural networks (RNNs) when captureing long-term dependencies. In this paper, we proposed a text classification model based on improved approaches to this norm by presenting a linear support vector machine (SVM) as the replacement of Softmax in the final output layer of a GRU model. Furthermore, the cross-entropy function shall be replaced with a margin-based function. Empirical results present that the proposed GRU-SVM model achieved comparatively better results than the baseline approaches BLSTM-C, DABN.

Download Full-text

Opinion mining on newspaper headlines using SVM and NLP

International Journal of Electrical and Computer Engineering (IJECE) ◽

10.11591/ijece.v9i3.pp2152-2163 ◽

2019 ◽

Vol 9 (3) ◽

pp. 2152 ◽

Cited By ~ 1

Author(s):

Chaudhary Jashubhai Rameshbhai ◽

Joy Paulose

Keyword(s):

Support Vector Machine ◽

Natural Language Processing ◽

Language Processing ◽

Opinion Mining ◽

Confusion Matrix ◽

Support Vector ◽

Text Data ◽

Mining Technique ◽

Svm Model ◽

Linear Svm

Opinion Mining also known as Sentiment Analysis, is a technique or procedure which uses Natural Language processing (NLP) to classify the outcome from text. There are various NLP tools available which are used for processing text data. Multiple research have been done in opinion mining for online blogs, Twitter, Facebook etc. This paper proposes a new opinion mining technique using Support Vector Machine (SVM) and NLP tools on newspaper headlines. Relative words are generated using Stanford CoreNLP, which is passed to SVM using count vectorizer. On comparing three models using confusion matrix, results indicate that Tf-idf and Linear SVM provides better accuracy for smaller dataset. While for larger dataset, SGD and linear SVM model outperform other models.

Download Full-text

Improving the Accuracy of Text Classification using Stemming Method, A Case of Non-formal Indonesian Conversation

10.21203/rs.3.rs-41431/v2 ◽

2020 ◽

Author(s):

Rianto Rianto ◽

Achmad Benny Mutiara ◽

Eri Prasetyo Wibowo ◽

Paulus Insap Santosa

Keyword(s):

Support Vector Machine ◽

Information Retrieval ◽

Text Classification ◽

Experimental Evaluation ◽

Hate Speech ◽

Text Processing ◽

High Accuracy ◽

Support Vector ◽

Support Vector Machine Algorithm ◽

Text Data

Abstract Stemming has long been used in data pre-processing in information retrieval, which aims to make affix words into root words. However, there are not many stemming methods for non-formal Indonesian text processing. The existing stemming method has high accuracy for formal Indonesian, but low for non-formal Indonesian. Thus, the stemming method which has high accuracy for non-formal Indonesian classifier model is still an open-ended challenge. This study introduces a new stemming method to solve problems in the non-formal Indonesian text data pre-processing. Furthermore, this study aims to provide comprehensive research on improving the accuracy of text classifier models by strengthening on stemming method. Using the Support Vector Machine algorithm, a text classifier model is developed, and its accuracy is checked. The experimental evaluation was done by testing 550 datasets in Indonesian using two different stemming methods. The results show that using the proposed stemming method, the text classifier model has higher accuracy than the existing methods with a score of 0.85 and 0.73, respectively. In the future, the proposed stemming method can be used to develop the Indonesian text classifier model which can be used for various purposes including text clustering, summarization, detecting hate speech, and other text processing applications.

Download Full-text

Drug Target Group Prediction with Multiple Drug Networks

Combinatorial Chemistry & High Throughput Screening ◽

10.2174/1386207322666190702103927 ◽

2020 ◽

Vol 23 (4) ◽

pp. 274-284 ◽

Cited By ~ 12

Author(s):

Jingang Che ◽

Lei Chen ◽

Zi-Han Guo ◽

Shuaiqun Wang ◽

Aorigele

Keyword(s):

Drug Target ◽

Low Cost ◽

Machine Learning Algorithms ◽

Classification Model ◽

Support Vector ◽

Multiple Drug ◽

Property A ◽

Multiple Networks ◽

Proposed Model ◽

The One

Background: Identification of drug-target interaction is essential in drug discovery. It is beneficial to predict unexpected therapeutic or adverse side effects of drugs. To date, several computational methods have been proposed to predict drug-target interactions because they are prompt and low-cost compared with traditional wet experiments. Methods: In this study, we investigated this problem in a different way. According to KEGG, drugs were classified into several groups based on their target proteins. A multi-label classification model was presented to assign drugs into correct target groups. To make full use of the known drug properties, five networks were constructed, each of which represented drug associations in one property. A powerful network embedding method, Mashup, was adopted to extract drug features from above-mentioned networks, based on which several machine learning algorithms, including RAndom k-labELsets (RAKEL) algorithm, Label Powerset (LP) algorithm and Support Vector Machine (SVM), were used to build the classification model. Results and Conclusion: Tenfold cross-validation yielded the accuracy of 0.839, exact match of 0.816 and hamming loss of 0.037, indicating good performance of the model. The contribution of each network was also analyzed. Furthermore, the network model with multiple networks was found to be superior to the one with a single network and classic model, indicating the superiority of the proposed model.

Download Full-text

Predicting Hub Genes of Glioblastomas Based on Support Vector Machine Combined with CFS algorithms

Current Bioinformatics ◽

10.2174/1574893615999200819162140 ◽

2020 ◽

Vol 15 ◽

Author(s):

Chun Qiu ◽

Sai Li ◽

Shenghui Yang ◽

Lin Wang ◽

Aihui Zeng ◽

...

Keyword(s):

Support Vector Machine ◽

Expression Profiles ◽

Independent Set ◽

Classification Model ◽

Support Vector ◽

Feature Subset ◽

Hub Genes ◽

Effective Prevention ◽

Key Genes ◽

Control Samples

Aim: To search the genes related to the mechanisms of the occurrence of glioma and to try to build a prediction model for glioblastomas. Background: The morbidity and mortality of glioblastomas are very high, which seriously endangers human health. At present, the goals of many investigations on gliomas are mainly to understand the cause and mechanism of these tumors at the molecular level and to explore clinical diagnosis and treatment methods. However, there is no effective early diagnosis method for this disease, and there are no effective prevention, diagnosis or treatment measures. Methods: First, the gene expression profiles derived from GEO were downloaded. Then, differentially expressed genes (DEGs) in the disease samples and the control samples were identified. After that, GO and KEGG enrichment analyses of DEGs were performed by DAVID. Furthermore, the correlation-based feature subset (CFS) method was applied to the selection of key DEGs. In addition, the classification model between the glioblastoma samples and the controls was built by an Support Vector Machine (SVM) based on selected key genes. Results and Discussion: Thirty-six DEGs, including 17 upregulated and 19 downregulated genes, were selected as the feature genes to build the classification model between the glioma samples and the control samples by the CFS method. The accuracy of the classification model by using a 10-fold cross-validation test and independent set test was 76.25% and 70.3%, respectively. In addition, PPP2R2B and CYBB can also be found in the top 5 hub genes screened by the protein– protein interaction (PPI) network. Conclusions: This study indicated that the CFS method is a useful tool to identify key genes in glioblastomas. In addition, we also predicted that genes such as PPP2R2B and CYBB might be potential biomarkers for the diagnosis of glioblastomas.

Download Full-text

Text classification on the Instagram caption using support vector machine

Journal of Physics Conference Series ◽

10.1088/1742-6596/1722/1/012023 ◽

2021 ◽

Vol 1722 ◽

pp. 012023

Author(s):

P P Ramadhani ◽

S Hadi

Keyword(s):

Support Vector Machine ◽

Text Classification ◽

Support Vector

Download Full-text

A Support Vector Machine Classification Model for Benzo[c]phenathridine Analogues with Topoisomerase-I Inhibitory Activity

Molecules ◽

10.3390/molecules17044560 ◽

2012 ◽

Vol 17 (4) ◽

pp. 4560-4582 ◽

Cited By ~ 2

Author(s):

Khac-Minh Thai ◽

Thuy-Quyen Nguyen ◽

Trieu-Du Ngo ◽

Thanh-Dao Tran ◽

Thi-Ngoc-Phuong Huynh

Keyword(s):

Support Vector Machine ◽

Inhibitory Activity ◽

Topoisomerase I ◽

Classification Model ◽

Support Vector ◽

Support Vector Machine Classification

Download Full-text

KLASIFIKASI TEKS SOSIAL MEDIA TWITTER MENGGUNAKAN SUPPORT VECTOR MACHINE (Studi Kasus Penusukan Wiranto)

Jurnal Informatika dan Rekayasa Elektronik ◽

10.36595/jire.v2i2.117 ◽

2019 ◽

Vol 2 (2) ◽

pp. 43

Author(s):

Lalu Mutawalli ◽

Mohammad Taufan Asri Zaen ◽

Wire Bagye

Keyword(s):

Social Media ◽

Support Vector Machine ◽

Big Data ◽

Mass Communication ◽

Confusion Matrix ◽

Classification Model ◽

Support Vector ◽

Large Set ◽

Appearance Time ◽

Data Production

In the era of technological disruption of mass communication, social media became a reference in absorbing public opinion. The digitalization of data is very rapidly produced by social media users because it is an attempt to represent the feelings of the audience. Data production in question is the user posts the status and comments on social media. Data production by the public in social media raises a very large set of data or can be referred to as big data. Big data is a collection of data sets in very large numbers, complex, has a relatively fast appearance time, so that makes it difficult to handle. Analysis of big data with data mining methods to get knowledge patterns in it. This study analyzes the sentiments of netizens on Twitter social media on Mr. Wiranto stabbing case. The results of the sentiment analysis showed 41% gave positive comments, 29% commented neutrally, and 29% commented negatively on events. Besides, modeling of the data is carried out using a support vector machine algorithm to create a system capable of classifying positive, neutral, and negative connotations. The classification model that has been made is then tested using the confusion matrix technique with each result is a precision value of 83%, a recall value of 80%, and finally, as much as 80% obtained in testing the accuracy.

Download Full-text

Analisis Sentimen Twitter untuk Teks Berbahasa Indonesia dengan Maximum Entropy dan Support Vector Machine

IJCCS (Indonesian Journal of Computing and Cybernetics Systems) ◽

10.22146/ijccs.3499 ◽

2014 ◽

Vol 8 (1) ◽

pp. 91 ◽

Cited By ~ 5

Author(s):

Noviah Dwi Putranti ◽

Edi Winarko

Keyword(s):

Support Vector Machine ◽

Maximum Entropy ◽

Social Networking Site ◽

Training Data ◽

Classification Model ◽

Support Vector ◽

Public Sentiment ◽

Pos Tagger ◽

Negative Sentiment ◽

Bahasa Indonesia

AbstrakAnalisis sentimen dalam penelitian ini merupakan proses klasifikasi dokumen tekstual ke dalam dua kelas, yaitu kelas sentimen positif dan negatif. Data opini diperoleh dari jejaring sosial Twitter berdasarkan query dalam Bahasa Indonesia. Penelitian ini bertujuan untuk menentukan sentimen publik terhadap objek tertentu yang disampaikan di Twitter dalam bahasa Indonesia, sehingga membantu usaha untuk melakukan riset pasar atas opini publik. Data yang sudah terkumpul dilakukan proses preprocessing dan POS tagger untuk menghasilkan model klasifikasi melalui proses pelatihan. Teknik pengumpulan kata yang memiliki sentimen dilakukan dengan pendekatan berdasarkan kamus, yang dihasilkan dalam penelitian ini berjumlah 18.069 kata. Algoritma Maximum Entropy digunakan untuk POS tagger dan algoritma yang digunakan untuk membangun model klasifikasi atas data pelatihan dalam penelitian ini adalah Support Vector Machine. Fitur yang digunakan adalah unigram dengan fitur pembobotan TFIDF. Implementasi klasifikasi diperoleh akurasi 86,81 % pada pengujian 7 fold cross validation untuk tipe kernel Sigmoid. Pelabelan kelas secara manual dengan POS tagger menghasilkan akurasi 81,67%. Kata kunci—analisis sentimen, klasifikasi, maximum entropy POS tagger, support vector machine, twitter. AbstractSentiment analysis in this research classified textual documents into two classes, positive and negative sentiment. Opinion data obtained a query from social networking site Twitter of Indonesian tweet. This research uses Indonesian tweets. This study aims to determine public sentiment toward a particular object presented in Twitter businesses conduct market. Collected data then prepocessed to help POS tagged to generate classification models through the training process. Sentiment word collection has done the dictionary based approach, which is generated in this study consists 18.069 words. Maximum Entropy algorithm is used for POS tagger and the algorithms used to build the classification model on the training data is Support Vector Machine. The unigram features used are the features of TFIDF weighting.Classification implementation 86,81 % accuration at examination of 7 validation cross fold for the type of kernel of Sigmoid. Class labeling manually with POS tagger yield accuration 81,67 %. Keywords—sentiment analysis, classification, maximum entropy POS tagger, support vector machine, twitter.

Download Full-text

Enhancing the performance of cancer text classification model based on cancer hallmarks

IAES International Journal of Artificial Intelligence (IJ-AI) ◽

10.11591/ijai.v10.i2.pp316-323 ◽

2021 ◽

Vol 10 (2) ◽

pp. 316

Author(s):

Noha Ali ◽

Ahmed H. AbuEl-Atta ◽

Hala H. Zayed

Keyword(s):

Neural Network ◽

Language Processing ◽

Text Classification ◽

State Of The Art ◽

Classification Model ◽

Biomedical Text ◽

Cancer Hallmarks ◽

Embedding Technique ◽

Proposed Model ◽

Biomedical Text Classification

Deep learning (DL) algorithms achieved state-of-the-art performance in computer vision, speech recognition, and natural language processing (NLP). In this paper, we enhance the convolutional neural network (CNN) algorithm to classify cancer articles according to cancer hallmarks. The model implements a recent word embedding technique in the embedding layer. This technique uses the concept of distributed phrase representation and multi-word phrases embedding. The proposed model enhances the performance of the existing model used for biomedical text classification. The result of the proposed model overcomes the previous model by achieving an F-score equal to 83.87% using an unsupervised technique that trained on PubMed abstracts called PMC vectors (PMCVec) embedding. Also, we made another experiment on the same dataset using the recurrent neural network (RNN) algorithm with two different word embeddings Google news and PMCVec which achieving F-score equal to 74.9% and 76.26%, respectively.

Download Full-text

Text Classification Using Support Vector Machine with Mixture of Kernel

Journal of Software Engineering and Applications ◽

10.4236/jsea.2012.512b012 ◽

2012 ◽

Vol 05 (12) ◽

pp. 55-58 ◽

Cited By ~ 12

Author(s):

Liwei Wei ◽

Bo Wei ◽

Bin Wang

Keyword(s):

Support Vector Machine ◽

Text Classification ◽

Support Vector

Download Full-text