An Enhanced Hybrid Feature Selection Technique using Term Frequency-Inverse Document Frequency and Support Vector Machine-Recursive Feature Elimination for Sentiment Classification

Text mining techniques can be used to process text data, and processing large volumes of text requires automated methods to extract opinions, whether positive or negative. Sentiment analysis applies text mining methods to determine whether the content of a text dataset is positive or negative. The support vector machine (SVM) is one classification algorithm that can be used for sentiment analysis; however, it performs less well on large datasets. Text mining is further constrained by the number of attributes used: a large number of attributes degrades classifier performance and lowers accuracy. The purpose of this research is to increase SVM accuracy by applying feature selection and feature weighting. Feature selection removes a large number of irrelevant attributes; in this study, the top K = 500 features are selected. Each selected attribute is then assigned a weight (feature weighting). The feature selection method is the chi-square statistic, and feature weighting uses Term Frequency-Inverse Document Frequency (TF-IDF). Experiments in Matlab R2017b with 10-fold cross-validation show that integrating the SVM with the chi-square statistic and TF-IDF increases accuracy by 11.5 percentage points: the SVM without chi-square and TF-IDF achieved an accuracy of 68.7%, whereas the SVM with chi-square and TF-IDF achieved 80.2%.
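
The chi-square selection and TF-IDF weighting steps described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' Matlab implementation: it assumes tokenised documents, binary labels (1 = positive, 0 = negative), document-presence counts for the chi-square contingency table, and the raw tf × log(N/df) variant of TF-IDF. The function names and toy data are hypothetical, and a small K stands in for the study's K = 500.

```python
import math
from collections import Counter


def chi_square_scores(docs, labels):
    """Chi-square statistic of each term against a binary label (1 = positive).

    Uses document presence/absence, i.e. the 2x2 table
    A = positive docs containing t, B = negative docs containing t,
    C = positive docs without t,    D = negative docs without t.
    """
    n = len(docs)
    n_pos = sum(labels)
    n_neg = n - n_pos
    df_pos, df_neg = Counter(), Counter()
    for tokens, y in zip(docs, labels):
        for term in set(tokens):
            (df_pos if y == 1 else df_neg)[term] += 1
    scores = {}
    for term in set(df_pos) | set(df_neg):
        a, b = df_pos[term], df_neg[term]
        c, d = n_pos - a, n_neg - b
        denom = (a + c) * (b + d) * (a + b) * (c + d)
        # Terms present in every (or no) document get a zero denominator: score 0.
        scores[term] = n * (a * d - c * b) ** 2 / denom if denom else 0.0
    return scores


def select_top_k(scores, k):
    """Keep the k highest-scoring terms (ties broken alphabetically)."""
    return sorted(scores, key=lambda t: (-scores[t], t))[:k]


def tfidf(docs, vocabulary):
    """Weight each selected term with raw tf * log(N / df); other terms are dropped."""
    n = len(docs)
    vocab_set = set(vocabulary)
    df = Counter(t for tokens in docs for t in set(tokens) if t in vocab_set)
    vectors = []
    for tokens in docs:
        tf = Counter(t for t in tokens if t in vocab_set)
        vectors.append([tf[t] * math.log(n / df[t]) if df[t] else 0.0
                        for t in vocabulary])
    return vectors
```

The weighted vectors would then be passed to an SVM trained and evaluated with 10-fold cross-validation, which this sketch does not show.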

Hoax News Detection on Twitter using Term Frequency Inverse Document Frequency and Support Vector Machine Method

Journal of Physics: Conference Series ◽

10.1088/1742-6596/1192/1/012025 ◽

2019 ◽

Vol 1192 ◽

pp. 012025

Author(s):

A Fauzi ◽

E B Setiawan ◽

Z K A Baizal

Keyword(s):

Support Vector Machine ◽

Support Vector ◽

Machine Method ◽

Inverse Document Frequency ◽

Support Vector Machine Method ◽

Term Frequency ◽

Document Frequency

Optimization of Least Squares Support Vector Machine Technique Using Genetic Algorithm for Electroencephalogram Multi-Dimensional Signals

Jurnal Teknologi ◽

10.11113/jt.v78.8842 ◽

2016 ◽

Vol 78 (5-10) ◽

Cited By ~ 3

Author(s):

Farzana Kabir Ahmad ◽

Abdullah Yousef Awwad Al-Qammaz ◽

Yuhanis Yusof

Keyword(s):

Genetic Algorithm ◽

Support Vector Machine ◽

Feature Selection ◽

Computational Model ◽

Least Squares ◽

Support Vector ◽

Feature Selection Technique ◽

Selection Technique ◽

Human Emotion ◽

Valence And Arousal

Human-computer intelligent interaction (HCII) is an emerging field of science that aims to refine and enhance the interaction between computers and humans. Since emotion plays a vital role in daily human life, the ability of computers to interpret and respond to human emotion is a crucial element of future intelligent systems. Accordingly, several studies have attempted to recognise human emotion using different techniques such as facial expression, speech, galvanic skin response (GSR), or heart rate (HR). However, these techniques have problems of credibility and reliability, because people can fake their feelings and responses. The electroencephalogram (EEG), on the other hand, has been shown to be a very effective way of recognising human emotion, since it records human brain activity, which can hardly be manipulated by voluntary control. Despite the popularity of EEG for recognising human emotion, this field is relatively challenging because EEG signals are nonlinear, involve myriad factors, and are chaotic in nature. These issues lead to a high-dimensional problem and poor classification results. To address them, this study proposes a novel computational model consisting of three main stages: a) feature extraction, b) feature selection, and c) classification. The discrete wavelet packet transform (DWPT) has been used to extract EEG signal features, ultimately yielding 204,800 features from 32 subjects in a subject-independent setting. A Genetic Algorithm (GA) and a least squares support vector machine (LS-SVM) have been used as the feature selection technique and the classifier, respectively. The computational model is tested on the common pre-processed DEAP EEG dataset to classify three levels of valence and arousal.
The empirical results show that the proposed GA-LSSVM improved classification results to 49.22% and 54.83% for valence and arousal respectively, compared with 46.33% for valence and 48.30% for arousal when no feature selection technique is applied with the identical classifier.
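
The LS-SVM classifier named above differs from a standard SVM in that training reduces to solving a single linear system instead of a quadratic program. The sketch below is a minimal, hypothetical binary LS-SVM in plain Python, assuming an RBF kernel and illustrative values for the regularisation parameter gamma and the kernel width sigma. It omits the DWPT feature extraction and the GA wrapper, and the study's three-level valence/arousal task would need a multi-class scheme such as one-vs-rest (also an assumption).

```python
import math


def rbf_kernel(x, z, sigma=1.0):
    """Gaussian (RBF) kernel exp(-||x - z||^2 / (2 sigma^2))."""
    sq = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-sq / (2 * sigma ** 2))


def solve(matrix, rhs):
    """Solve a dense linear system by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    m = [row[:] + [rhs[i]] for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def lssvm_train(xs, ys, gamma=10.0, sigma=1.0):
    """Fit a binary LS-SVM (labels +1/-1) by solving one linear system:

    [ 0   y^T             ] [b    ]   [0]
    [ y   Omega + I/gamma ] [alpha] = [1]
    with Omega[k][l] = y_k * y_l * K(x_k, x_l).
    """
    n = len(xs)
    a = [[0.0] * (n + 1) for _ in range(n + 1)]
    for k in range(n):
        a[0][k + 1] = a[k + 1][0] = ys[k]
        for l in range(n):
            a[k + 1][l + 1] = ys[k] * ys[l] * rbf_kernel(xs[k], xs[l], sigma)
        a[k + 1][k + 1] += 1.0 / gamma
    sol = solve(a, [0.0] + [1.0] * n)
    return sol[0], sol[1:]  # bias b, multipliers alpha


def lssvm_predict(x, xs, ys, b, alpha, sigma=1.0):
    """Sign of sum_k alpha_k * y_k * K(x, x_k) + b."""
    score = sum(al * y * rbf_kernel(x, xk, sigma)
                for al, y, xk in zip(alpha, ys, xs)) + b
    return 1 if score >= 0 else -1
```

In a GA wrapper, each chromosome could be a bit mask over the features, with fitness taken as the validation accuracy of an LS-SVM trained on the masked features; that loop is not shown here.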

Aspect Category Classification with a Machine Learning Approach Using an Indonesian-Language Dataset

Jurnal Nasional Teknik Elektro dan Teknologi Informasi (JNTETI) ◽

10.22146/jnteti.v10i3.1819 ◽

2021 ◽

Vol 10 (3) ◽

pp. 229-235

Author(s):

Syaifulloh Amien Pandega Perdana ◽

Teguh Bharata Aji ◽

Ridi Ferdiana

Keyword(s):

Machine Learning ◽

Support Vector Machine ◽

Random Forest ◽

Sentiment Analysis ◽

Support Vector ◽

Term Weighting ◽

Inverse Document Frequency ◽

Term Frequency ◽

Document Frequency ◽

Bahasa Indonesia

Customer reviews are opinions about the quality of goods or services as perceived by consumers. They contain information that is useful to both consumers and providers of goods or services. The large number of customer reviews available on websites calls for a framework that extracts sentiment automatically. A single customer review often covers several aspects, so Aspect Based Sentiment Analysis (ABSA) is needed to determine the polarity of each aspect. One important task in ABSA is Aspect Category Detection. Machine learning methods for Aspect Category Detection have been widely applied in English-language domains, but only rarely in Indonesian-language domains. This paper compares the performance of three machine learning algorithms, namely Naïve Bayes (NB), Support Vector Machine (SVM), and Random Forest (RF), on Indonesian-language customer reviews, using Term Frequency-Inverse Document Frequency (TF-IDF) for term weighting. The results show that RF outperforms NB and SVM in three different domains, namely restaurants, hotels, and e-commerce, with f1-scores of 84.3%, 85.7%, and 89.3% respectively.
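
Of the three classifiers compared, Naïve Bayes is the simplest to write out. The following is a hypothetical multinomial Naïve Bayes sketch with Laplace smoothing. For brevity it uses raw term counts rather than the TF-IDF weights used in the paper, treats each review as having a single aspect category, and uses made-up category labels and tokens.

```python
import math
from collections import Counter, defaultdict


def train_multinomial_nb(docs, labels, alpha=1.0):
    """Estimate log priors and Laplace-smoothed log likelihoods per class."""
    vocab = {t for tokens in docs for t in tokens}
    class_docs = Counter(labels)
    term_counts = defaultdict(Counter)
    for tokens, y in zip(docs, labels):
        term_counts[y].update(tokens)
    model = {}
    for y, n_docs in class_docs.items():
        total = sum(term_counts[y].values())
        log_prior = math.log(n_docs / len(docs))
        log_lik = {t: math.log((term_counts[y][t] + alpha) / (total + alpha * len(vocab)))
                   for t in vocab}
        model[y] = (log_prior, log_lik)
    return model


def predict_nb(model, tokens):
    """Return the class with the highest log posterior; unseen terms are skipped."""
    def score(y):
        log_prior, log_lik = model[y]
        return log_prior + sum(log_lik[t] for t in tokens if t in log_lik)
    return max(model, key=score)
```

Reviews that mention several aspects could instead be handled with one binary classifier per aspect category; that extension is not shown.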

Term Frequency-Inverse Document Frequency Answer Categorization with Support Vector Machine on Automatic Short Essay Grading System with Latent Semantic Analysis for Japanese Language

2019 International Conference on Electrical Engineering and Computer Science (ICECOS) ◽

10.1109/icecos47637.2019.8984530 ◽

2019 ◽

Author(s):

Anak Agung Putri Ratna ◽

Aaliyah Kaltsum ◽

Lea Santiar ◽

Hanifah Khairunissa ◽

Ihsan Ibrahim ◽

...

Keyword(s):

Support Vector Machine ◽

Latent Semantic Analysis ◽

Semantic Analysis ◽

Support Vector ◽

Grading System ◽

Inverse Document Frequency ◽

Term Frequency ◽

Document Frequency ◽

Essay Grading ◽

Short Essay

Sentiment Analysis of Opinions on the Capital City Relocation on Twitter Using the Support Vector Machine Method

Jurnal Ilmu Komputer ◽

10.24843/jik.2021.v14.i01.p06 ◽

2021 ◽

Vol 14 (1) ◽

pp. 49

Author(s):

Tezza Fazar Tri Hidayat ◽

Garno Garno ◽

Azhari Ali Ridha

Keyword(s):

Support Vector Machine ◽

Text Mining ◽

Support Vector ◽

Inverse Document Frequency ◽

Term Frequency ◽

Document Frequency

The relocation of Indonesia's capital city to Kalimantan was made official by President Joko Widodo on 26 August 2019. This is a new chapter in Indonesian history, since it had never happened before, and it has prompted many opinions and responses from the public. Sentiment analysis is used to analyze a person's views or opinions about a topic. Twitter is a social media platform that users employ to express their opinions and that gathers those opinions around a topic. The Support Vector Machine is a classification method used in text mining, and Term Frequency-Inverse Document Frequency is a term weighting method. SVM and TF-IDF can be used to analyze the sentiment of public opinion on relocating Indonesia's capital. The aim of this study is to classify public opinion on this topic from thousands of collected and filtered tweets. Tweets from 22 to 29 March 2020 were processed into 992 tweets, consisting of 221 positively labeled and 771 negatively labeled items. The SVM method achieved an accuracy of 77.72%, and combining it with TF-IDF increased the accuracy to 78.33%.
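
The TF-IDF plus SVM pipeline described above can be illustrated with a linear SVM trained by Pegasos, a stochastic sub-gradient solver for the L2-regularised hinge loss. This is a swapped-in solver chosen for a short, dependency-free sketch; the abstract does not say which SVM implementation or kernel was used. The vectors, labels (+1 positive, -1 negative), and parameter values are hypothetical.

```python
import random


def train_linear_svm(vectors, labels, lam=0.01, epochs=200, seed=0):
    """Pegasos: stochastic sub-gradient descent on the L2-regularised hinge loss.

    vectors are dense lists of feature weights (e.g. TF-IDF), labels are +1/-1.
    A constant 1.0 feature is appended so the bias is learned as a weight
    (and is therefore regularised too, a common simplification).
    """
    rng = random.Random(seed)
    data = [(v + [1.0], y) for v, y in zip(vectors, labels)]
    w = [0.0] * len(data[0][0])
    t = 0
    for _ in range(epochs):
        rng.shuffle(data)
        for x, y in data:
            t += 1
            eta = 1.0 / (lam * t)  # Pegasos step size
            margin = y * sum(wi * xi for wi, xi in zip(w, x))
            # Shrink w (regulariser), then step along y*x if the margin is violated.
            w = [(1 - eta * lam) * wi for wi in w]
            if margin < 1:
                w = [wi + eta * y * xi for wi, xi in zip(w, x)]
    return w


def predict(w, vector):
    """Return +1 or -1 from the sign of the linear score (bias feature appended)."""
    score = sum(wi * xi for wi, xi in zip(w, vector + [1.0]))
    return 1 if score >= 0 else -1
```

Because 771 of the 992 tweets are negative, a classifier that always predicts "negative" would already reach 771/992 ≈ 77.72% accuracy, which is a useful reference point when reading accuracy figures on this label distribution.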

Fake News Detection on Reddit Utilising CountVectorizer and Term Frequency-Inverse Document Frequency with Logistic Regression, MultinominalNB and Support Vector Machine

2021 32nd Irish Signals and Systems Conference (ISSC) ◽

10.1109/issc52156.2021.9467842 ◽

2021 ◽

Author(s):

Ankitkumar Patel ◽

Kevin Meehan

Keyword(s):

Support Vector Machine ◽

Logistic Regression ◽

Support Vector ◽

Fake News ◽

Inverse Document Frequency ◽

Term Frequency ◽

Document Frequency

Predicting Tourists’ Behavior Of Virtual Museum Using Support Vector Machine With Feature Selection Technique

2018 International Conference on Machine Learning and Cybernetics (ICMLC) ◽

10.1109/icmlc.2018.8526959 ◽

2018 ◽

Author(s):

Krit Sriporn ◽

Cheng-Fa Tsai

Keyword(s):

Support Vector Machine ◽

Feature Selection ◽

Support Vector ◽

Virtual Museum ◽

Feature Selection Technique ◽

Selection Technique

Classification of Customer Complaint Topics from Tweets Using Combined Extracted Features with the Support Vector Machine (SVM) Method

Jurnal Edukasi dan Penelitian Informatika (JEPIN) ◽

10.26418/jp.v1i2.11023 ◽

2015 ◽

Vol 1 (2) ◽

Cited By ~ 1

Author(s):

Enda Esyudha Pratama ◽

Bambang Riyanto Trilaksono

Keyword(s):

Support Vector Machine ◽

Feature Selection ◽

Customer Service ◽

Information Gain ◽

Support Vector ◽

Chi Square ◽

Term Frequency ◽

Document Frequency

Companies increasingly use Twitter as a customer service channel, and Speedy is no exception. The current mechanism for classifying the form and type of complaints, and for tracking the number of complaints received via Twitter, is still manual. Moreover, Twitter data is unstructured, which makes analysis and information extraction difficult. To address these problems, this study processes the text of tweets sent by Twitter users to the @TelkomSpeedy account into information that is then used to classify the form and type of complaints. According to several related studies, one of the best classification methods for this purpose is the Support Vector Machine (SVM). The concept of SVM can be explained simply as searching for a hyperplane that separates the dataset according to its classes. The classes used in this study are based on customer complaint topics, namely billing, installation, disconnection, and slow connection. Another important factor in classification is choosing the features, i.e. the word attributes to be used. The feature selection methods used in this study are term frequency (TF), document frequency (DF), information gain, and chi-square. This study also combines the features produced by these feature selection methods. The results show that SVM classifies complaints well, with an accuracy of 82.50% for complaint form and 86.67% for complaint type. Combining features increases the accuracy to 83.33% for complaint form and 89.17% for complaint type. Keywords: customer service, complaint topic classification, feature combination, support vector machine
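
Two of the four feature selection scores listed above, document frequency and information gain, are sketched below together with one possible way of combining their outputs. Taking the union of each method's top-k terms is an assumption for illustration, since the abstract does not state the combination rule. Tokens, labels, and function names are hypothetical, and TF and chi-square scoring are omitted for brevity.

```python
import math
from collections import Counter


def document_frequency(docs):
    """Number of documents containing each term."""
    return Counter(t for tokens in docs for t in set(tokens))


def information_gain(docs, labels):
    """IG(t) = H(C) - [P(t) H(C|t) + P(not t) H(C|not t)] for every term."""
    def entropy(counts):
        total = sum(counts)
        return -sum(c / total * math.log2(c / total) for c in counts if c) if total else 0.0

    n = len(docs)
    classes = sorted(set(labels))
    h_c = entropy([labels.count(c) for c in classes])
    scores = {}
    for term in document_frequency(docs):
        with_t = [y for tokens, y in zip(docs, labels) if term in tokens]
        without_t = [y for tokens, y in zip(docs, labels) if term not in tokens]
        cond = (len(with_t) / n * entropy([with_t.count(c) for c in classes])
                + len(without_t) / n * entropy([without_t.count(c) for c in classes]))
        scores[term] = h_c - cond
    return scores


def top_k(scores, k):
    """The k highest-scoring terms (ties broken alphabetically), as a set."""
    return set(sorted(scores, key=lambda t: (-scores[t], t))[:k])


def combined_features(docs, labels, k):
    """Union of the top-k terms chosen by DF and by IG (hypothetical combination rule)."""
    return top_k(document_frequency(docs), k) | top_k(information_gain(docs, labels), k)


if __name__ == "__main__":
    docs = [["please", "bill", "wrong"], ["please", "bill", "late"],
            ["please", "slow", "internet"], ["please", "internet", "down"]]
    labels = ["billing", "billing", "slow", "disconnect"]
    print(sorted(combined_features(docs, labels, k=2)))  # ['bill', 'internet', 'please']
```

On this toy data, DF favours the uninformative term "please" (present in every tweet, so its information gain is zero), while IG picks the discriminative terms "bill" and "internet"; the union keeps all three.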

An Improved Intelligent Approach to Enhance the Sentiment Classifier for Knowledge Discovery Using Machine Learning

International Journal of Sensors Wireless Communications and Control ◽

10.2174/2210327910999200528114552 ◽

2020 ◽

Vol 10 (4) ◽

pp. 582-593

Author(s):

Midde Venkateswarlu Naik ◽

D. Vasumathi ◽

A.P. Siva Kumar

Keyword(s):

Support Vector Machine ◽

Feature Selection ◽

Global Warming ◽

Particle Swarm Optimization ◽

Sentiment Analysis ◽

Optimization Technique ◽

Particle Swarm ◽

Sentiment Classification ◽

Support Vector ◽

Swarm Optimization

Aims: This research proposes an evolutionary enhanced method for sentiment or emotion classification of unstructured review text in the big data field. Sentiment analysis plays a vital role in helping people extract valid decision points about any subject, such as movie ratings, educational institutions, or political ratings. The proposed hybrid approach combines optimal feature selection using Particle Swarm Optimization (PSO) with sentiment classification using a Support Vector Machine (SVM). Its performance is evaluated with statistical measures such as precision, recall, sensitivity, and specificity, and compared with existing approaches. Earlier authors have reported sentiment classification accuracy on English text of up to 94%. The proposed scheme achieves an average accuracy of 99% across different datasets by tuning SVM parameters, such as the constant C and the kernel gamma value, in conjunction with PSO. The method uses three publicly available datasets: airline sentiment, weather, and global warming. Models are trained and tested with 10-fold cross-validation (FCV), and a confusion matrix is used to assess classifier accuracy.
Background: Sentiment analysis (SA), or opinion mining, has become a scientifically compelling research domain. A key research area is sentiment classification of semi-structured or unstructured data in different languages. User-generated content (UGC) from diverse sources has grown significantly with the rapid growth of the web. The huge volume of user-generated data on social media provides substantial value for discovering hidden knowledge, correlations, patterns, and trends, or for extracting sentiment about a specific entity. SA is the computational analysis of the opinion about an entity expressed in text; it is also described as computing the emotional polarity of natural-language text posted on social media in various languages. An effective automatic sentiment classifier usually depends on its feature selection and classification algorithms.
Methods: The proposed work uses an SVM as the classification technique and PSO for feature selection. Various combinations of parameters are tuned to obtain the desired results, with and without a kernel, for sentiment classification on three datasets (airline, global warming, and weather sentiment) that are freely available for research.
Results: The proposed method achieved an average accuracy of 99.2% in classifying sentiment across the different datasets, outperforming the other machine learning techniques compared. This high accuracy in classifying sentiment or opinion in review text indicates superior effectiveness over existing sentiment classifiers. Results were obtained with 10-fold cross-validation and a confusion matrix.
Conclusion: Sentiment classifier accuracy was increased with a kernel-based SVM using parameter optimization, and the optimal feature subset for classifying sentiment or opinion in review documents was determined with PSO. The method was evaluated on three freely available datasets: airline sentiment, weather sentiment, and global warming data.
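
The 10-fold cross-validation and confusion-matrix measures mentioned in this abstract can be made concrete with the helper functions below. They are a hypothetical sketch: the fold split is contiguous, so the data would need to be shuffled (or stratified) beforehand, and only the binary case is handled. Recall and sensitivity are the same quantity, TP / (TP + FN). The PSO feature selection and SVM parameter search are not shown.

```python
def k_fold_indices(n, k=10):
    """Split indices 0..n-1 into k contiguous folds whose sizes differ by at most one."""
    sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    folds, start = [], 0
    for size in sizes:
        folds.append(list(range(start, start + size)))
        start += size
    return folds


def binary_metrics(y_true, y_pred, positive=1):
    """Accuracy, precision, recall (= sensitivity) and specificity from a 2x2 confusion matrix."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)

    def ratio(a, b):
        return a / b if b else 0.0  # guard against empty denominators

    return {
        "accuracy": ratio(tp + tn, tp + tn + fp + fn),
        "precision": ratio(tp, tp + fp),
        "recall": ratio(tp, tp + fn),
        "specificity": ratio(tn, tn + fp),
    }
```

Each fold serves once as the test set while the remaining nine train the model; averaging `binary_metrics` over the ten folds gives the cross-validated figures.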
