Aspect Category Classification with a Machine Learning Approach Using an Indonesian-Language Dataset

A customer review is an opinion about the quality of a product or service as experienced by the consumer. Customer reviews contain information that is useful to both consumers and providers of products or services. The large volume of customer reviews available on websites calls for a framework that extracts sentiment automatically. A single customer review often covers several aspects, so Aspect Based Sentiment Analysis (ABSA) is needed to determine the polarity of each aspect. One important task in ABSA is Aspect Category Detection. Machine learning methods for Aspect Category Detection have been widely applied to English-language data, but few studies address Indonesian. This paper compares the performance of three machine learning algorithms, Naïve Bayes (NB), Support Vector Machine (SVM), and Random Forest (RF), on Indonesian-language customer reviews, using Term Frequency-Inverse Document Frequency (TF-IDF) for term weighting. The results show that RF outperforms NB and SVM in three different domains, namely restaurants, hotels, and e-commerce, with f1-scores of 84.3%, 85.7%, and 89.3% respectively.
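
As a rough illustration of the TF-IDF term weighting mentioned above, the sketch below computes weights for already-tokenized documents in plain Python. The abstract does not state which TF-IDF variant was used, so this assumes length-normalized term frequency and idf(t) = log(N / df(t)); the function name `tf_idf` and the toy Indonesian review tokens are hypothetical.

```python
import math
from collections import Counter


def tf_idf(docs: list[list[str]]) -> list[dict[str, float]]:
    """Return one {term: weight} dict per tokenized document.

    Assumed variant: tf = count / document length, idf = log(N / df).
    Other TF-IDF variants (smoothed idf, sublinear tf) also exist.
    """
    n_docs = len(docs)
    # Document frequency: the number of documents each term appears in.
    df = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        counts = Counter(doc)
        length = len(doc)
        weights.append({
            term: (count / length) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return weights


# Hypothetical toy reviews (restaurant, hotel, e-commerce), already tokenized.
reviews = [
    ["makanan", "enak", "pelayanan", "lambat"],
    ["kamar", "bersih", "pelayanan", "ramah"],
    ["pengiriman", "cepat", "barang", "bagus"],
]
for vector in tf_idf(reviews):
    print(vector)
```

A term that appears in every document (df = N) gets weight 0, while terms specific to one review, such as an aspect word like "makanan", get higher weights than shared ones like "pelayanan". Vectors of this kind would then be the input to NB, SVM, or RF.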

Improve the Accuracy of Support Vector Machine Using Chi Square Statistic and Term Frequency Inverse Document Frequency on Movie Review Sentiment Analysis

Scientific Journal of Informatics ◽

10.15294/sji.v6i1.14244 ◽

2019 ◽

Vol 6 (1) ◽

pp. 138-149

Author(s):

Ukhti Ikhsani Larasati ◽

Much Aziz Muslim ◽

Riza Arifudin ◽

Alamsyah Alamsyah

Keyword(s):

Support Vector Machine ◽

Feature Selection ◽

Text Mining ◽

Sentiment Analysis ◽

Feature Weighting ◽

Support Vector ◽

Chi Square ◽

Inverse Document Frequency ◽

Term Frequency ◽

Document Frequency

Data can be processed with text mining techniques. Processing large volumes of text requires automated methods to mine opinions, including positive and negative ones. Sentiment analysis is a process that applies text mining methods to determine whether the content of a text dataset is positive or negative. Support vector machine is one of the classification algorithms that can be used for sentiment analysis. However, support vector machine performs less well on large datasets. In addition, one constraint in text mining is the number of attributes used: a large number of attributes degrades classifier performance and leads to low accuracy. The purpose of this research is to increase support vector machine accuracy by applying feature selection and feature weighting. Feature selection removes a large number of irrelevant attributes; in this study, features are selected by keeping the top K = 500. The selected attributes are then weighted, computing a weight for each selected attribute. The feature selection method used is the chi square statistic, and the feature weighting method is Term Frequency Inverse Document Frequency (TFIDF). In experiments using Matlab R2017b with 10-fold cross validation, integrating support vector machine with chi square statistic and TFIDF increases accuracy by 11.5 percentage points: support vector machine without chi square statistic and TFIDF achieved an accuracy of 68.7%, while support vector machine with chi square statistic and TFIDF achieved 80.2%.
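
The chi-square feature selection step described above can be sketched as follows for a binary sentiment task. This is an illustration under stated assumptions, not the authors' Matlab code: it scores each term with the 2x2 document-presence contingency table, chi2 = N(AD - CB)^2 / ((A+C)(B+D)(A+B)(C+D)), and `chi_square_scores` and `select_top_k` are hypothetical helper names. Only the top-K idea (K = 500 in the abstract) comes from the source.

```python
from collections import Counter


def chi_square_scores(docs: list[list[str]], labels: list[str],
                      positive: str) -> dict[str, float]:
    """Chi-square score of each term against the class `positive`.

    Assumed 2x2 document-presence contingency table per term:
      a = positive docs containing the term, b = other docs containing it,
      c = positive docs without it,          d = other docs without it.
    """
    n = len(docs)
    n_pos = sum(1 for y in labels if y == positive)
    n_neg = n - n_pos
    contains_pos: Counter = Counter()
    contains_neg: Counter = Counter()
    for doc, y in zip(docs, labels):
        target = contains_pos if y == positive else contains_neg
        target.update(set(doc))  # count each term at most once per document
    scores = {}
    for term in set(contains_pos) | set(contains_neg):
        a, b = contains_pos[term], contains_neg[term]
        c, d = n_pos - a, n_neg - b
        denom = (a + c) * (b + d) * (a + b) * (c + d)
        # A zero denominator (e.g. a term present in every document) scores 0.
        scores[term] = n * (a * d - c * b) ** 2 / denom if denom else 0.0
    return scores


def select_top_k(scores: dict[str, float], k: int) -> list[str]:
    """Keep the k highest-scoring terms; ties are broken alphabetically."""
    return sorted(scores, key=lambda t: (-scores[t], t))[:k]


docs = [["great", "plot"], ["great", "acting"],
        ["boring", "plot"], ["boring", "acting"]]
labels = ["pos", "pos", "neg", "neg"]
scores = chi_square_scores(docs, labels, positive="pos")
print(select_top_k(scores, 2))  # prints ['boring', 'great']
```

In the study's pipeline, the surviving terms would then be weighted with TF-IDF before training the SVM.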

Hoax News Detection on Twitter using Term Frequency Inverse Document Frequency and Support Vector Machine Method

Journal of Physics Conference Series ◽

10.1088/1742-6596/1192/1/012025 ◽

2019 ◽

Vol 1192 ◽

pp. 012025

Author(s):

A Fauzi ◽

E B Setiawan ◽

Z K A Baizal

Keyword(s):

Support Vector Machine ◽

Support Vector ◽

Machine Method ◽

Inverse Document Frequency ◽

Support Vector Machine Method ◽

Term Frequency ◽

Document Frequency

Term Frequency-Inverse Document Frequency Answer Categorization with Support Vector Machine on Automatic Short Essay Grading System with Latent Semantic Analysis for Japanese Language

2019 International Conference on Electrical Engineering and Computer Science (ICECOS) ◽

10.1109/icecos47637.2019.8984530 ◽

2019 ◽

Author(s):

Anak Agung Putri Ratna ◽

Aaliyah Kaltsum ◽

Lea Santiar ◽

Hanifah Khairunissa ◽

Ihsan Ibrahim ◽

...

Keyword(s):

Support Vector Machine ◽

Latent Semantic Analysis ◽

Semantic Analysis ◽

Support Vector ◽

Grading System ◽

Inverse Document Frequency ◽

Term Frequency ◽

Document Frequency ◽

Essay Grading ◽

Short Essay

An Enhanced Hybrid Feature Selection Technique using Term Frequency-Inverse Document Frequency and Support Vector Machine-Recursive Feature Elimination for Sentiment Classification

IEEE Access ◽

10.1109/access.2021.3069001 ◽

2021 ◽

pp. 1-1

Author(s):

Nur Syafiqah Mohd Nafis ◽

Suryanti Awang

Keyword(s):

Support Vector Machine ◽

Feature Selection ◽

Sentiment Classification ◽

Recursive Feature Elimination ◽

Support Vector ◽

Feature Selection Technique ◽

Inverse Document Frequency ◽

Selection Technique ◽

Term Frequency ◽

Document Frequency

Sentiment Analysis of Opinions on Relocating the Capital City on Twitter Using the Support Vector Machine Method

Jurnal Ilmu Komputer ◽

10.24843/jik.2021.v14.i01.p06 ◽

2021 ◽

Vol 14 (1) ◽

pp. 49

Author(s):

Tezza Fazar Tri Hidayat ◽

Garno Garno ◽

Azhari Ali Ridha

Keyword(s):

Support Vector Machine ◽

Text Mining ◽

Support Vector ◽

Inverse Document Frequency ◽

Term Frequency ◽

Document Frequency

The relocation of Indonesia's capital to Kalimantan was officially announced by President Joko Widodo on 26 August 2019. This marks a new chapter in Indonesian history, since such a move has never happened before, and it has prompted many opinions and responses from the public. Sentiment analysis is used to analyze a person's opinions about a topic. Twitter is a social media platform where users express their opinions and gather them around a topic. Support Vector Machine is a classification method used in text mining, and Term Frequency - Inverse Document Frequency is a term-weighting method. SVM and TF-IDF can be used to analyze the sentiment of public opinion on the relocation of Indonesia's capital. The aim of this research is to classify public opinion on relocating Indonesia's capital from thousands of collected and filtered tweets. Tweets from 22-29 March 2020 were processed into 992 tweets, consisting of 221 tweets labeled positive and 771 labeled negative. SVM alone achieved an accuracy of 77.72%, and combining it with TF-IDF increased the accuracy to 78.33%.
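
Because the 992 tweets are imbalanced (771 negative vs. 221 positive), a useful reference point when reading the reported accuracies is the score of a trivial classifier that always predicts the majority class. The short calculation below derives it from the counts given in the abstract; `majority_baseline_accuracy` is a hypothetical helper, and the calculation is illustrative rather than part of the original study.

```python
def majority_baseline_accuracy(class_counts: dict[str, int]) -> float:
    """Accuracy of a classifier that always predicts the most frequent class."""
    total = sum(class_counts.values())
    return max(class_counts.values()) / total


counts = {"positive": 221, "negative": 771}  # class split reported in the abstract
print(f"{majority_baseline_accuracy(counts):.2%}")  # prints 77.72%
```

The baseline is 771 / 992, or 77.72%, which coincides with the accuracy reported for SVM without TF-IDF; per-class precision and recall give a fuller picture on data this imbalanced.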

Fake News Detection on Reddit Utilising CountVectorizer and Term Frequency-Inverse Document Frequency with Logistic Regression, MultinominalNB and Support Vector Machine

2021 32nd Irish Signals and Systems Conference (ISSC) ◽

10.1109/issc52156.2021.9467842 ◽

2021 ◽

Author(s):

Ankitkumar Patel ◽

Kevin Meehan

Keyword(s):

Support Vector Machine ◽

Logistic Regression ◽

Support Vector ◽

Fake News ◽

Inverse Document Frequency ◽

Term Frequency ◽

Document Frequency

Phisher Fighter: Website Phishing Detection System Based on URL and Term Frequency-Inverse Document Frequency Values

Journal of Cyber Security and Mobility ◽

10.13052/jcsm2245-1439.1114 ◽

2021 ◽

Author(s):

E. Sri Vishva ◽

D. Aju

Keyword(s):

Machine Learning ◽

Detection System ◽

Stochastic Gradient Descent ◽

Support Vector ◽

Sensitive Information ◽

Inverse Document Frequency ◽

Term Frequency ◽

Document Frequency ◽

Single Piece ◽

Phishing Detection

Phishing is a common cybercrime in which intruders or hackers trick unsuspecting, trusting individuals into revealing unique and sensitive information through fake websites. The primary intention of this kind of cybercrime is to gain access to the victims' personal or classified information. The data obtained includes information that can be used to identify an individual. Stolen personal or sensitive information is commonly sold on online dark markets, where it is bought by identity thieves. Depending on the sensitivity and importance of the stolen information, the price of a single piece of it can range from a few dollars to thousands of dollars. Machine learning (ML) and Deep Learning (DL) are powerful methods for analyzing and countering these phishing attacks. A machine-learning-based phishing detection system is proposed to protect websites and users from such attacks. To improve the results, the system uses the TF-IDF (Term Frequency-Inverse Document Frequency) values of webpages. ML methods such as LR (Logistic Regression), RF (Random Forest), SVM (Support Vector Machine), NB (Naive Bayes) and SGD (Stochastic Gradient Descent) are applied to train and test on the collected dataset. As a result, a robust phishing website detection system with 90.68% accuracy is developed.
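
The abstract does not list which URL properties the system inspects. Purely as a hypothetical example, lexical URL features of the kind often used in phishing detection can be extracted with Python's standard library as below. The function `url_features` and every feature name are assumptions; in a full system such a vector would be combined with the page's TF-IDF values before training LR, RF, SVM, NB, or SGD.

```python
import ipaddress
from urllib.parse import urlparse


def url_features(url: str) -> dict[str, int]:
    """Hypothetical lexical features of a URL for a phishing classifier."""
    parsed = urlparse(url)
    host = parsed.hostname or ""
    try:
        ipaddress.ip_address(host)  # raises ValueError for ordinary domain names
        host_is_ip = 1
    except ValueError:
        host_is_ip = 0
    return {
        "url_length": len(url),
        "host_dots": host.count("."),
        "host_hyphens": host.count("-"),
        "has_at_sign": int("@" in url),
        "host_is_ip": host_is_ip,
        "uses_https": int(parsed.scheme == "https"),
        "path_depth": len([part for part in parsed.path.split("/") if part]),
    }


print(url_features("http://192.168.0.1/login/verify"))
print(url_features("https://secure-login.example.com/"))
```

Hand-crafted features like these are cheap to compute before a page is even fetched, which is one reason URL-based checks are often paired with content-based ones such as TF-IDF.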

A Two Layer Machine Learning System for Intrusion Detection Based on Random Forest and Support Vector Machine

2020 IEEE International Women in Engineering (WIE) Conference on Electrical and Computer Engineering (WIECON-ECE) ◽

10.1109/wiecon-ece52138.2020.9397945 ◽

2020 ◽

Author(s):

Sabrina Afroz ◽

S.M Ariful Islam ◽

Samin Nawer Rafa ◽

Maheen Islam

Keyword(s):

Machine Learning ◽

Support Vector Machine ◽

Random Forest ◽

Intrusion Detection ◽

Learning System ◽

Support Vector

Evaluating Annotated Dataset of Customer Reviews for Aspect Based Sentiment Analysis

Journal of Web Engineering ◽

10.13052/jwe1540-9589.2122 ◽

2021 ◽

Author(s):

Dimple Chehal ◽

Parul Gupta ◽

Payal Gulati

Keyword(s):

Machine Learning ◽

Support Vector Machine ◽

Sentiment Analysis ◽

Nearest Neighbor ◽

Supervised Machine Learning ◽

Support Vector ◽

Product Reviews ◽

K Nearest Neighbor ◽

Customer Reviews ◽

Percent Accuracy

Sentiment analysis of product reviews on e-commerce platforms helps determine customers' preferences. Aspect-based sentiment analysis (ABSA) identifies the contributing aspects and their corresponding polarity, allowing a more detailed analysis of the customer's inclination toward product aspects. This analysis supports the transition from the traditional rating-based recommendation process to an improved aspect-based process. Automating ABSA requires a labelled dataset to train a supervised machine learning model. Because such datasets are scarce due to the human effort involved, an annotated dataset is provided here for performing ABSA on customer reviews of mobile phones. The dataset, comprising product reviews of the Apple iPhone 11, has been manually annotated with predefined aspect categories and aspect sentiments. The dataset's accuracy has been validated using state-of-the-art machine learning techniques such as Naïve Bayes, Support Vector Machine, Logistic Regression, Random Forest, K-Nearest Neighbor, and Multilayer Perceptron (MLP), a sequential model built with the Keras API. For classifying review text into aspect categories, the MLP model built with the Keras Sequential API produced the most accurate result, with 67.45 percent accuracy, while K-nearest neighbor performed worst, with only 49.92 percent accuracy. For classifying review text into aspect sentiments, the Support Vector Machine had the highest accuracy, 79.46 percent, and the Keras model had the lowest, 76.30 percent. The dataset can serve as a benchmark for ABSA of mobile phone reviews.

Implementation of n-gram Methodology for Rotten Tomatoes Review Dataset Sentiment Analysis

International Journal of Knowledge Discovery in Bioinformatics ◽

10.4018/ijkdb.2017010103 ◽

2017 ◽

Vol 7 (1) ◽

pp. 30-41 ◽

Cited By ~ 12

Author(s):

Prayag Tiwari ◽

Brojo Kishore Mishra ◽

Sachin Kumar ◽

Vivek Kumar

Keyword(s):

Machine Learning ◽

Support Vector Machine ◽

Sentiment Analysis ◽

Maximum Entropy ◽

Learning Strategies ◽

Supervised Machine Learning ◽

Support Vector ◽

N Gram ◽

F Measure ◽

Blog Posts

Sentiment analysis aims to extract the underlying opinion of content, which may be anything that holds a subjective view, for example, an online review, comments on blog posts, a film rating, and so forth. These reviews and posts can be classified into polarity groups, for example, negative, positive, and neutral, in order to extract information from the input dataset. Supervised machine learning techniques classify these reviews. In this paper, three different machine learning algorithms, namely Support Vector Machine (SVM), Maximum Entropy (ME) and Naive Bayes (NB), are considered for classifying human opinions. The performance of each technique is examined on the basis of parameters such as accuracy, recall, F-measure, and precision.
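
Two ideas from this abstract can be sketched briefly: extracting n-grams from a tokenized review, and computing precision, recall, and F-measure (the harmonic mean of precision and recall) for one class. Both functions are illustrative helpers with hypothetical names, not code from the paper. Bigrams matter for sentiment because a pair such as ("not", "good") keeps a negation that separate unigrams lose.

```python
def ngrams(tokens: list[str], n: int) -> list[tuple[str, ...]]:
    """Contiguous n-grams of a token list (n=1 unigrams, n=2 bigrams, ...)."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def precision_recall_f1(y_true: list[str], y_pred: list[str],
                        positive: str) -> tuple[float, float, float]:
    """Precision, recall and F-measure for the class `positive`."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1


print(ngrams(["not", "a", "good", "movie"], 2))
# prints [('not', 'a'), ('a', 'good'), ('good', 'movie')]
```

The n-gram tuples can be counted and used as features for SVM, ME, or NB, and the metric function can then score each classifier's predictions per polarity class.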