Detection Of Spam Comments On Instagram Using Complementary Naïve Bayes

Nur Azizul Haqimi; Nur Rokhman; Sigit Priyanta

doi:10.22146/ijccs.47046

IJCCS (Indonesian Journal of Computing and Cybernetics Systems) ◽

10.22146/ijccs.47046 ◽

2019 ◽

Vol 13 (3) ◽

pp. 263

Author(s):

Nur Azizul Haqimi ◽

Nur Rokhman ◽

Sigit Priyanta

Keyword(s):

Social Media ◽

Support Vector Machine ◽

Test Data ◽

Training Data ◽

Classification Method ◽

Support Vector ◽

Test Results ◽

Imbalanced Dataset ◽

Web Based ◽

F Measure

Instagram (IG) is a web-based and mobile social media application where users can share photos or videos using the available features. Uploaded photos or videos carry captions that explain the content, and these posts can attract spam comments. Spam comments are comments that are not relevant to the caption or the photo. A problem that arises when identifying spam is that non-spam comments far outnumber spam comments, which leads to an imbalanced dataset. An imbalanced dataset can affect the performance of a classification method. This research focuses on applying the Complementary Naïve Bayes (CNB) method to handle imbalanced datasets in the detection of Instagram spam comments. The study used TF-IDF weighting, with Support Vector Machine (SVM) as the comparison classifier. In tests with 2500 training data and 100 test data on the imbalanced dataset (25% spam and 75% non-spam), CNB achieved 92% accuracy, 86% precision and 93% f-measure, whereas SVM achieved 87% accuracy, 79% precision and 88% f-measure. In conclusion, the CNB method is more suitable for detecting spam comments on imbalanced datasets.
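
The idea behind a complement-style Naive Bayes classifier can be sketched in a few lines. This is a minimal illustration, not the authors' implementation. It assumes raw term counts instead of the TF-IDF weighting used in the study, and the toy comments, the label names and the smoothing value `alpha` are hypothetical. Each class is scored with word statistics gathered from all the other classes, and the class whose complement fits the comment worst is chosen, which makes the estimate less sensitive to a dominant class.

```python
import math
from collections import Counter

# Hypothetical toy comments; the study's Instagram data is not reproduced here.
DOCS = [
    "nice photo", "love this view", "great shot friend", "beautiful picture",
    "follow me free followers", "buy cheap followers now",
]
LABELS = ["non-spam", "non-spam", "non-spam", "non-spam", "spam", "spam"]

def train_cnb(docs, labels, alpha=1.0):
    """Estimate, for each class, log word probabilities from its complement."""
    vocab = sorted({tok for doc in docs for tok in doc.split()})
    weights = {}
    for c in set(labels):
        complement = Counter()
        for doc, y in zip(docs, labels):
            if y != c:  # count words only in documents of the other classes
                complement.update(doc.split())
        total = sum(complement.values()) + alpha * len(vocab)
        weights[c] = {t: math.log((complement[t] + alpha) / total) for t in vocab}
    return weights

def predict_cnb(weights, doc):
    """Pick the class whose complement explains the document worst."""
    counts = Counter(doc.split())
    scores = {
        c: sum(n * w.get(t, 0.0) for t, n in counts.items())  # unseen words are ignored
        for c, w in weights.items()
    }
    return min(scores, key=scores.get)

model = train_cnb(DOCS, LABELS)
print(predict_cnb(model, "free followers buy now"))
print(predict_cnb(model, "nice view"))
```

Swapping the raw counts for TF-IDF weights would bring the sketch closer to the setup the abstract describes.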

Sentiment analysis of criticism and suggestions for the Pelatihan Aplikasi Teknologi Informasi (PATI) training using the support vector machine (SVM) algorithm

Repositor ◽

10.22219/repositor.v1i1.11 ◽

2019 ◽

Vol 1 (1) ◽

pp. 39

Author(s):

Alimuddin Hasan Al Kabir ◽

Setio Basuki ◽

Galih Wasis Wicaksono

Keyword(s):

Support Vector Machine ◽

Quality Improvement ◽

Training Data ◽

Support Vector ◽

Test Results ◽

Training Process ◽

School Year ◽

Analysis Process ◽

Data Arrangement ◽

F Measure

Public opinion is one of many instruments that can be used to evaluate an event. This research is motivated by two problems: (1) the Pelatihan Aplikasi Teknologi Informasi (PATI) training needs quality improvement, and (2) the large volume of data means that the participants' opinions collected so far have not been fully utilized. The criticism and suggestion data, 1050 items in total, are taken from the 2016/2017 school year. Support Vector Machine is used as the sentiment analysis method. The training process produces the best hyperplane, which is used as a reference to determine which sentiment class is most appropriate for a sentence. Testing is done by dividing the dataset into 20% test data and 80% training data, so that the analysis can be repeated for up to 5 iterations with different data arrangements. The test results show that the Accuracy, Precision, Recall, and F-Measure produced by the system are 82.08%, 83.42%, 81.16%, and 81.82%, respectively.
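
The 80%/20% split repeated five times with different data arrangements reads like 5-fold cross-validation, although the abstract does not name it as such. The sketch below shows one way such splits could be generated for 1050 items. The function name, the shuffling and the seed are assumptions made for illustration.

```python
import random

def five_fold_splits(n_items, n_folds=5, seed=0):
    """Yield (train_indices, test_indices) pairs, one per fold."""
    indices = list(range(n_items))
    random.Random(seed).shuffle(indices)  # hypothetical shuffling step
    fold_size = n_items // n_folds        # leftover items (if any) are never tested
    for k in range(n_folds):
        test = indices[k * fold_size:(k + 1) * fold_size]
        held_out = set(test)
        train = [i for i in indices if i not in held_out]
        yield train, test

for fold, (train, test) in enumerate(five_fold_splits(1050), start=1):
    print(fold, len(train), len(test))
```

With 1050 items each fold holds out 210 items for testing and trains on the remaining 840, and every item is used for testing exactly once.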

Sentiment Analysis of Twitter Data on the Presidential and Vice-Presidential Candidate Pairs in the 2019 Election Using Lexicon-Based and Support Vector Machine Methods

Jurnal Ilmiah FIFO ◽

10.22441/fifo.2019.v11i2.004 ◽

2019 ◽

Vol 11 (2) ◽

pp. 144

Author(s):

Danar Wido Seno ◽

Arief Wibowo

Keyword(s):

Social Media ◽

Support Vector Machine ◽

Sentiment Analysis ◽

Vice President ◽

Training Data ◽

Support Vector ◽

New Words ◽

Textual Data ◽

Data Content ◽

Combination Of Methods

The growth of written content on social media produces many new words and abbreviations on Twitter, which makes it increasingly difficult for sentiment analysis to reach high accuracy on Twitter textual data. In this study, the authors conducted sentiment analysis of the pairs of candidates for President and Vice President of Indonesia in the 2019 Elections. To obtain higher accuracy and accommodate the evolution of textual data on Twitter, the authors combined an unsupervised method and a supervised method, namely Lexicon Based and Support Vector Machine. The study used Twitter data from October 2018, collected with search keywords containing the names of each pair of candidates for President and Vice President in the 2019 Elections, totaling 800 data items. On these 800 items, the best accuracy obtained was 92.5%, with 80% training data and 20% testing data, a Precision in each class between 85.7% and 97.2%, and a Recall in each class between 78.2% and 93.5%. Because the Lexicon Based method is used for labeling, the dataset for the Support Vector Machine no longer has to be labeled manually, and the lexicon dictionary can be extended as data content on Twitter develops.
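
Lexicon-based labeling of the kind described above can be illustrated with a small scoring function. This is only a sketch. The English word lists, the three output labels and the function name are hypothetical, since the abstract specifies neither the lexicon nor the exact class set.

```python
# Hypothetical sentiment lexicon; the study's dictionary is not given in the abstract.
POSITIVE = {"good", "great", "support"}
NEGATIVE = {"bad", "corrupt", "fail"}

def lexicon_label(text, positive=POSITIVE, negative=NEGATIVE):
    """Label a text by counting positive and negative lexicon hits."""
    tokens = text.lower().split()
    score = sum(t in positive for t in tokens) - sum(t in negative for t in tokens)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"  # assumption: ties and texts without hits are left neutral

print(lexicon_label("great debate and good program"))
print(lexicon_label("bad policy"))

# New words can be added to the dictionary as Twitter vocabulary evolves.
extended = POSITIVE | {"mantap"}
print(lexicon_label("mantap", positive=extended))
```

Labels produced this way could then serve as training targets for a supervised classifier such as an SVM, which is the combination the abstract describes.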

SENTIMENT ANALYSIS ON TWITTER BY USING MAXIMUM ENTROPY AND SUPPORT VECTOR MACHINE METHOD

SINERGI ◽

10.22441/sinergi.2020.2.002 ◽

2020 ◽

Vol 24 (2) ◽

pp. 87

Author(s):

Mona Cindo ◽

Dian Palupi Rini ◽

Ermatita Ermatita

Keyword(s):

Social Media ◽

Support Vector Machine ◽

Feature Extraction ◽

Product Quality ◽

Sentiment Analysis ◽

Maximum Entropy ◽

The Other ◽

Classification Method ◽

Support Vector ◽

General Opinion

With the advancement and growth of social media, a large amount of data is available for research in social mining. Twitter is one microblogging platform that can be used for this purpose. Many companies use Twitter data to analyze their customers' satisfaction with product quality. At the same time, many users use social media to express their daily emotions. This can be developed into research that serves both to improve product quality and to analyze opinion on certain events. Such research is often called sentiment analysis or opinion mining. Previous research has proposed features that are particularly useful for sentiment analysis, but performance is still lacking. Furthermore, that research used Support Vector Machine as the classification method, whereas most researchers have found other classification methods, such as Maximum Entropy, that are considered more efficient. This research therefore used two types of dataset: general opinion data and airline opinion data. For feature extraction, we employ four feature types: pragmatic, lexical-grams, pos-grams, and sentiment lexical. For classification, we use both Support Vector Machine and Maximum Entropy to find the best result. In the end, the best result is achieved by Maximum Entropy, with 85.8% accuracy on the general opinion data and 92.6% accuracy on the airline opinion data.

Comparison of Machine Learning Algorithms for Predicting Students Taking Courses

Jurnal Teknologi Informasi dan Ilmu Komputer ◽

10.25126/jtiik.2019651755 ◽

2019 ◽

Vol 6 (5) ◽

pp. 543 ◽

Cited By ~ 1

Author(s):

Fitra A. Bachtiar ◽

Indra K. Syahputra ◽

Satrio A. Wicaksono

Keyword(s):

Machine Learning ◽

Support Vector Machine ◽

Nearest Neighbor ◽

Classification Method ◽

Support Vector ◽

Test Results ◽

K Nearest Neighbor ◽

Student Data ◽

Prediction Problems ◽

Imbalance Dataset

At the beginning of each semester, the academic section schedules and determines the courses to be offered for the next semester. However, the process has problems, such as too many classes being offered compared with the number of interested students, or vice versa. In addition, in prediction problems the collected data tend to be imbalanced across classes (imbalanced class). As a result, scheduling can be ineffective. Thus, a system is needed that can predict which students will take a course. However, there are many algorithms that can be used for the prediction. This study compares the performance of algorithms for classifying students taking courses. In this study, predictions are modeled on attributes of the student data, namely Grades, GPA, Cumulative GPA, Semester Credits, Cumulative Semester Credits and Semester. For each observation of these attributes, the classification predicts whether or not the student takes a particular course. Classification results are divided into 2 classes, namely 'Yes' for students who are predicted to take the course and 'No' for students who are predicted not to take it. The Synthetic Minority Oversampling Technique (SMOTE) is used to handle the imbalanced dataset. The classification methods compared in this study are the k-Nearest Neighbor (k-NN) and Support Vector Machine (SVM) algorithms. The tests used 3 courses as a sample. On average, the k-NN predictions perform better than those of SVM. In addition, the use of SMOTE can influence the classification results in the form of an increase in AUC, CA, F1, precision and recall values.
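
The core step of SMOTE is to create a synthetic minority sample on the line segment between a minority point and one of its nearest minority neighbours. The sketch below illustrates that step only. The feature values (GPA, semester credits), `k` and the seed are hypothetical, and a maintained library implementation would normally be used in practice.

```python
import math
import random

def smote(minority, n_new, k=2, seed=0):
    """Generate n_new synthetic points by interpolating between minority neighbours."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        x = rng.choice(minority)
        # k nearest minority neighbours of x (Euclidean distance), excluding x itself
        neighbours = sorted(
            (p for p in minority if p is not x),
            key=lambda p: math.dist(x, p),
        )[:k]
        nb = rng.choice(neighbours)
        u = rng.random()  # position along the segment from x to nb
        synthetic.append(tuple(a + u * (b - a) for a, b in zip(x, nb)))
    return synthetic

# Hypothetical minority-class students described by (GPA, semester credits).
minority = [(3.0, 18.0), (3.2, 20.0), (3.5, 21.0), (2.9, 19.0)]
for point in smote(minority, n_new=4):
    print(point)
```

Every synthetic point lies between two existing minority samples, so the minority class grows without simply duplicating records.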

Hoax News Classification using Machine Learning Algorithms

International Journal of Engineering and Advanced Technology - Regular Issue ◽

10.35940/ijeat.b3753.129219 ◽

2019 ◽

Vol 9 (2) ◽

pp. 3938-3944

Keyword(s):

Machine Learning ◽

Social Media ◽

Learning Algorithm ◽

Detection System ◽

Machine Learning Algorithms ◽

Training Data ◽

Stochastic Gradient Descent ◽

Support Vector ◽

The Impact ◽

F Measure

Hoax news on social media has had a dramatic effect on our society in recent years. Its impact is felt by many people in the form of anxiety, financial loss, and loss of reputation. Therefore, we need a detection system that can help reduce hoax news on social media. Hoax news classification is one of the stages in the construction of a hoax news detection system: an unsupervised learning algorithm serves as the method for creating the hoax news dataset, machine learning tools are used for data processing, and text processing is used for detecting data. The system then classifies the input text as hoax or not hoax. Hoax news classification in this study uses six algorithms, namely Support Vector Machine, Naïve Bayes, Decision Tree, Logistic Regression, Stochastic Gradient Descent, and Neural Network (MLP). These algorithms are compared to find the one best suited to detecting hoax news, judged by the highest accuracy, F-measure, precision, and recall. The test results show that the NN-MLP algorithm averages 93% for accuracy, F-measure, and precision, the highest compared to the five other algorithms, while the highest recall, 94%, is produced by the SVM algorithm. The results of this experiment show different effects for different classifiers, and suggest that the more hoax data is used as training data, the more accurate the system becomes.
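
The four scores used to compare the classifiers all derive from the counts of a binary confusion matrix. A minimal sketch follows. The counts in the example call are hypothetical and are not taken from the paper.

```python
def classification_metrics(tp, fp, fn, tn):
    """Return (accuracy, precision, recall, f_measure) from confusion-matrix counts.

    Assumes at least one predicted positive and one actual positive;
    otherwise precision or recall would divide by zero.
    """
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    precision = tp / (tp + fp)   # share of predicted hoaxes that are hoaxes
    recall = tp / (tp + fn)      # share of actual hoaxes that are found
    f_measure = 2 * precision * recall / (precision + recall)
    return accuracy, precision, recall, f_measure

# Hypothetical counts for illustration only.
acc, prec, rec, f1 = classification_metrics(tp=45, fp=5, fn=3, tn=47)
print(f"accuracy={acc:.3f} precision={prec:.3f} recall={rec:.3f} f-measure={f1:.3f}")
```

Because precision and recall penalize different errors, one classifier can lead on precision while another leads on recall, which matches the split between NN-MLP and SVM reported above.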

Public Sentiment Analysis on Twitter Social Media Regarding the Implementation of Pilkada Serentak Using the Support Vector Machine Algorithm

CCIT Journal ◽

10.33050/ccit.v10i2.539 ◽

2017 ◽

Vol 10 (2) ◽

pp. 197-206

Author(s):

Atika Rahmawati ◽

Aris Marjuni ◽

Junta Zeniarja

Keyword(s):

Social Media ◽

Support Vector Machine ◽

Training Data ◽

Support Vector ◽

Public Response ◽

The People ◽

Word Detection ◽

Twitter Users ◽

Media Sentiment

Pilkada Serentak (simultaneous regional elections) is a very important event for the future viability of regions and the country. Through this election, people can cast their votes and elect representatives of their choice. Public response can be expressed through the Twitter social media platform, and sentiment analysis of Twitter can therefore be used to gauge the public response to the implementation of the simultaneous elections. Responses can be classified from the text tweeted by Twitter users. In this study, responses are classified from text because text is easy to obtain and apply. This study classifies responses in Indonesian-language text and aims to increase accuracy by using SVM. Tweets are classified with a categorical approach into two basic classes: positive and negative. The data consist of 3000 Indonesian tweets. Labeling is not done manually but with a clustering method that divides the 3000 items into two groups: Cluster 1 as the positive tweets and Cluster 2 as the negative tweets. Of these, 2700 are used as training data and 300 as test data. Pre-processing includes tokenization, case normalization, stop word detection, and stemming. Classification is performed with Support Vector Machine (SVM). SVM achieved the highest accuracy, 91%, compared with 82% for k-means clustering.
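
The first three pre-processing stages named above can be sketched as follows. The stop word list and the example tweet are hypothetical, and stemming is left out because it needs a language-specific stemmer that the abstract does not identify.

```python
import re

# Hypothetical, deliberately tiny Indonesian stop word list.
STOPWORDS = {"yang", "dan", "di", "ke", "ini", "itu"}

def preprocess(tweet):
    """Case-normalize, tokenize and remove stop words from one tweet."""
    lowered = tweet.lower()                  # case normalization
    tokens = re.findall(r"[a-z]+", lowered)  # tokenization: keep alphabetic runs only
    return [t for t in tokens if t not in STOPWORDS]  # stop word removal

print(preprocess("Pilkada ini BERJALAN lancar dan aman!"))
```

The simple regular expression also discards digits and punctuation, so URLs, mentions and hashtags would need dedicated handling in a real pipeline.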

Analysis of rainfall classification over Tanah Laut district based on global climate indicators using support vector machine method

Journal of Physics Conference Series ◽

10.1088/1742-6596/2106/1/012009 ◽

2021 ◽

Vol 2106 (1) ◽

pp. 012009

Author(s):

N Hayah ◽

O Soesanto ◽

M A Rahman

Keyword(s):

Support Vector Machine ◽

Global Climate ◽

Climatic Conditions ◽

Training Data ◽

Classification Method ◽

Support Vector ◽

Rainfall Forecasting ◽

Machine Method ◽

Svm Classification ◽

The Relationship

The Support Vector Machine (SVM) classification method can be applied in various fields, including rainfall forecasting in meteorology and climatology. This study classifies rainfall in order to identify the relationship between global phenomena and rainfall, and to evaluate the results of applying SVM classification to rainfall in the Tanah Laut Regency. The analysis uses the multiclass SVM concept with 4 rainfall categories: low, medium, high, and extreme. The kernel used in SVM is the RBF kernel, with the optimization parameters Cost (C) set to 1, 5, 10, 15 and Gamma (γ) set to 1, 5, 10, 15. The datasets are formed based on the annual period, climatic conditions, and seasonality. The Spearman Rank correlation test describes the relationship between global phenomena and rainfall, with correlations ranging from −0.1456 to 0.43144 for the entire dataset. The SVM classification with Cost (C) 10 and Gamma (γ) ≥ 5 obtained the highest accuracy, 100%, on the training data. On the test data, by contrast, the accuracy was good, namely 78.00% for La Nina and 81.38% for seasonal periods.
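
The parameter search described above covers every pairing of the four Cost values with the four Gamma values. The sketch below enumerates that grid and evaluates the standard RBF kernel, k(x, y) = exp(−γ‖x − y‖²), on two hypothetical points. It does not train an SVM.

```python
import math
from itertools import product

COSTS = [1, 5, 10, 15]    # values of C listed in the abstract
GAMMAS = [1, 5, 10, 15]   # values of gamma listed in the abstract

def rbf_kernel(x, y, gamma):
    """RBF kernel: exp(-gamma * squared Euclidean distance)."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * sq_dist)

grid = list(product(COSTS, GAMMAS))
print(len(grid), "parameter combinations")

# Hypothetical points one unit apart: similarity shrinks quickly as gamma grows.
x, y = (0.0, 0.0), (1.0, 0.0)
for gamma in GAMMAS:
    print(gamma, rbf_kernel(x, y, gamma))
```

As gamma grows, each training point resembles only itself, so the model can fit the training set almost perfectly. This is one plausible reading of the gap between the 100% training accuracy and the lower test accuracy reported above.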

Development of an Oil Palm Fresh Fruit Bunch Maturity Grading System Using 680 and 750 nm Images

Jurnal Teknologi Informasi dan Ilmu Komputer ◽

10.25126/jtiik.2020722603 ◽

2020 ◽

Vol 7 (2) ◽

pp. 379

Author(s):

Agung Wahyu Setiawan ◽

Alfie R. Ananda

Keyword(s):

Support Vector Machine ◽

Test Data ◽

Palm Oil ◽

Image Contrast ◽

Human Vision ◽

Training Data ◽

Training Dataset ◽

Support Vector ◽

Light Sources ◽

Maturity Level

One of the main problems in the palm oil industry is the grading of Fresh Fruit Bunches (FFB) in palm oil mills. The parameter used for grading is the number of fruitlets detached from the bunch. At present, FFB grading is conducted by human graders, which is subjective and often inconsistent because of the limits of human vision and of the ability to process information on the number of detached fruitlets per FFB in a very limited time. Therefore, this study developed a grading system to assess and estimate FFB maturity based on spectroscopy and image contrast value. From the literature review, visible light and NIR spectrum in 680 and 780 nm can be used as light sources to detect the maturity level of FFB. The light sources used in this study are Light-emitting Diode (LED) lamps with wavelengths of 680 and 750 nm. A modified DSLR camera is used to acquire the FFB images, so that two FFB images, at 680 and 750 nm, are obtained. The contrast value of both images is then calculated. In this research, 24 FFB are used as training data, consisting of 10 ripe and 14 unripe bunches. A total of 77 FFB are used as test data, consisting of 38 ripe and 39 unripe bunches. Support Vector Machine (SVM) is used to classify the maturity level of FFB. The accuracy on the training data is 66.67%, while the accuracy on the test data is 57.14%. These results still need to be improved; future work will focus on enhancing the accuracy of the system by increasing the amount of training and testing data and by using machine learning.

Analysis of Sentiment of Moving a National Capital with Feature Selection Naive Bayes Algorithm and Support Vector Machine

Jurnal RESTI (Rekayasa Sistem dan Teknologi Informasi) ◽

10.29207/resti.v4i3.1942 ◽

2020 ◽

Vol 4 (3) ◽

pp. 504-512

Author(s):

Faried Zamachsari ◽

Gabriel Vangeran Saragih ◽

Susafa'ati ◽

Windu Gata

Keyword(s):

Social Media ◽

Support Vector Machine ◽

Feature Selection ◽

Public Opinion ◽

Naive Bayes ◽

Naïve Bayes ◽

Capital City ◽

Support Vector ◽

National Capital ◽

Bayes Algorithm

The decision to move Indonesia's capital city to East Kalimantan received mixed responses on social media. The still-high poverty rate and the country's difficult finances are factors in disapproval of the relocation of the national capital. Twitter, as one of the popular social media platforms, is used by the public to express these opinions. The goals of this study are to determine the tendency of community responses to the move of the national capital, and to perform public opinion sentiment analysis on the move using the Naive Bayes algorithm and Support Vector Machine with Feature Selection to obtain the highest accuracy. The sentiment analysis data are Indonesian-language public opinions taken from Twitter tweets by crawling. The search terms used are #IbuKotaBaru and #PindahIbuKota. The research stages consist of collecting data from Twitter, polarity labeling, and preprocessing, which consists of case transformation, cleansing, tokenizing, filtering and stemming. Feature selection is applied to increase accuracy, and the data are then divided into testing and training data according to predetermined ratios. The next step is a comparison between the Support Vector Machine and Naive Bayes methods to determine which is more accurate. In the data period studied, 24.26% positive sentiment and 75.74% negative sentiment were found regarding the move to a new capital city. Using RapidMiner software, the best accuracy of Naive Bayes with Feature Selection is 88.24% at a ratio of 9:1, while the best accuracy of Support Vector Machine with Feature Selection is 78.77% at a ratio of 5:5.

Mie scattering and microparticle-based characterization of heavy metal ions and classification by statistical inference methods

Royal Society Open Science ◽

10.1098/rsos.190001 ◽

2019 ◽

Vol 6 (5) ◽

pp. 190001 ◽

Cited By ~ 1

Author(s):

Katherine E. Klug ◽

Christian M. Jennings ◽

Nicholas Lytal ◽

Lingling An ◽

Jeong-Yeol Yoon

Keyword(s):

Heavy Metal ◽

Support Vector Machine ◽

Metal Ions ◽

Heavy Metal Ions ◽

Mie Scattering ◽

Training Data ◽

Scattering Data ◽

Support Vector ◽

Linear Discriminant ◽

Machine Analysis

A straightforward method for classifying heavy metal ions in water is proposed using statistical classification and clustering techniques from non-specific microparticle scattering data. A set of carboxylated polystyrene microparticles of sizes 0.91, 0.75 and 0.40 µm was mixed with the solutions of nine heavy metal ions and two control cations, and scattering measurements were collected at two angles optimized for scattering from non-aggregated and aggregated particles. Classification of these observations was conducted and compared among several machine learning techniques, including linear discriminant analysis, support vector machine analysis, K-means clustering and K-medians clustering. This study found the highest classification accuracy using the linear discriminant and support vector machine analysis, each reporting high classification rates for heavy metal ions with respect to the model. This may be attributed to moderate correlation between detection angle and particle size. These classification models provide reasonable discrimination between most ion species, with the highest distinction seen for Pb(II), Cd(II), Ni(II) and Co(II), followed by Fe(II) and Fe(III), potentially due to its known sorption with carboxyl groups. The support vector machine analysis was also applied to three different mixture solutions representing leaching from pipes and mine tailings, and showed good correlation with single-species data, specifically with Pb(II) and Ni(II). With more expansive training data and further processing, this method shows promise for low-cost and portable heavy metal identification and sensing.
