Detection of cyber harassment (cyberbullying) on Instagram using naïve bayes classifier with bag of words and lexicon based features

Author(s):  
Putra Pandu Adikara ◽  
Sigit Adinugroho ◽  
Salsabila Insani
Repositor ◽  
2020 ◽  
Vol 2 (7) ◽  
pp. 933
Author(s):  
Siti Maghfiroh ◽  
Setio Basuki ◽  
Yufis Azhar

Conventional crimes such as assault, kidnapping, and theft are still rarely used as research objects. The crimes usually studied are limited to cybercrime, such as software piracy, carding, and online fraud. This study therefore takes conventional crime as its research object. The authors attempt to obtain crime information from the social media platform Twitter. From Twitter, data were collected in the form of user tweets containing elements of crime. Classification is then performed to determine which of these data genuinely contain crime information rather than opinion. The classification method is the Naive Bayes Classifier algorithm, applied to two types of dataset. The first dataset contains lexical (bag-of-words) features and the second contains syntactic features. Two datasets are used in order to compare the performance of the two feature types in classifying tweets. The average accuracy of the classification model is 88.1398% with syntactic features and 79.25% with lexical (bag-of-words) features. From the classification results, the authors then extract the location where each crime occurred using Named Entity Recognition (NER). The NER process achieves an accuracy of 65%.
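The bag-of-words variant of the classifier described above can be illustrated with a minimal sketch. This is not the authors' implementation: the whitespace tokenizer, the Laplace smoothing, the class name `BagOfWordsNaiveBayes`, the label names `report` and `opinion`, and the six toy sentences are all assumptions made for illustration.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase and split on whitespace (a deliberately simple tokenizer)."""
    return text.lower().split()


class BagOfWordsNaiveBayes:
    """Multinomial Naive Bayes over word counts with Laplace smoothing."""

    def fit(self, texts, labels):
        self.class_counts = Counter(labels)       # documents per class
        self.word_counts = defaultdict(Counter)   # word frequencies per class
        self.vocab = set()
        for text, label in zip(texts, labels):
            tokens = tokenize(text)
            self.word_counts[label].update(tokens)
            self.vocab.update(tokens)
        self.total_docs = len(labels)
        return self

    def log_scores(self, text):
        """Return the unnormalized log posterior of each class."""
        tokens = [t for t in tokenize(text) if t in self.vocab]  # skip unseen words
        vocab_size = len(self.vocab)
        scores = {}
        for label, n_docs in self.class_counts.items():
            total_words = sum(self.word_counts[label].values())
            score = math.log(n_docs / self.total_docs)  # log prior
            for tok in tokens:
                count = self.word_counts[label][tok]
                score += math.log((count + 1) / (total_words + vocab_size))
            scores[label] = score
        return scores

    def predict(self, text):
        scores = self.log_scores(text)
        return max(scores, key=scores.get)


# Hypothetical toy data: crime reports versus opinions.
train_texts = [
    "robbery reported at the market this morning",
    "police arrested a thief near the station",
    "kidnapping suspect caught in the city",
    "i think crime news is exaggerated",
    "in my opinion the police are too slow",
    "i feel the city is not safe anymore",
]
train_labels = ["report", "report", "report", "opinion", "opinion", "opinion"]

model = BagOfWordsNaiveBayes().fit(train_texts, train_labels)
print(model.predict("thief arrested at the market"))
print(model.predict("in my opinion crime is exaggerated"))
```

A syntactic-feature variant would replace `tokenize` with a function that emits syntactic features (for example part-of-speech tags) while leaving the classifier unchanged, which is one way the two datasets could be compared on equal terms.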


2021 ◽  
Author(s):  
Deniz Ertuncay ◽  
Giovanni Costa

Abstract Near-fault ground motions may contain impulsive behavior in velocity records. To calculate the probability of occurrence of impulsive signals, a large dataset is collected from various national data providers and strong motion databases. The dataset has a large number of parameters that carry information on the earthquake physics, the ruptured faults, the ground motion parameters, and the distance between the station and several parts of the ruptured fault. The relation between these parameters and impulsive signals is calculated. It is found that fault type, moment magnitude, and the distance and azimuth between a site of interest and the surface projection of the ruptured fault are correlated with the impulsiveness of the signals. Separate models are created for strike-slip faults and non-strike-slip faults by using the multivariate naïve Bayes classifier method. The naïve Bayes classifier yields the probability of observing impulsive signals. The models have comparable accuracy rates, and they are more consistent across different fault types than previous studies.
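The way a naïve Bayes classifier turns such parameters into a probability can be written out in general form. The notation is an assumption for illustration: $I = 1$ denotes an impulsive record, $I = 0$ a non-impulsive one, and $x_1, \dots, x_n$ are the predictors (for instance moment magnitude, distance, and azimuth). The abstract does not state which likelihood models $p(x_j \mid I)$ the authors use.

```latex
P(I = 1 \mid x_1, \dots, x_n)
  = \frac{P(I = 1) \prod_{j=1}^{n} p(x_j \mid I = 1)}
         {\sum_{c \in \{0, 1\}} P(I = c) \prod_{j=1}^{n} p(x_j \mid I = c)}
```

The product over $j$ reflects the naïve assumption that the predictors are conditionally independent given the class.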


2021 ◽  
Vol 30 (1) ◽  
pp. 774-792
Author(s):  
Mazin Abed Mohammed ◽  
Dheyaa Ahmed Ibrahim ◽  
Akbal Omran Salman

Abstract Spam emails are harmful, unwanted commercial messages sent to organizations or individuals. Although such emails are often used to advertise services and products, they sometimes contain links to malware or phishing websites through which private information can be stolen. This study shows how an adaptive intelligent learning approach, based on a visual anti-spam model for multiple natural languages, can be used to detect abnormal situations effectively. The approach is applied to spam filtering. With adaptive intelligent learning, high performance is achieved alongside a low false detection rate. The approach works in three main phases to determine whether an email is legitimate, based on knowledge gathered during training. The proposed approach includes two models to identify phishing emails. The first model identifies the language type; it is a new trainable model based on the Naive Bayes classifier. This model is trained on three languages (Arabic, English and Chinese), and the trained model is used to identify the language type and pass the label to the next model. The second model is built using two classes (phishing and normal email for each language) as training data. The second trained model (a Naive Bayes classifier) is applied to identify phishing emails as the final decision of the proposed approach. The proposed strategy is implemented using the Java environment and the JADE agent platform. The performance of the AIA learning model was tested on a dataset of 2,000 emails, and the results demonstrated the efficiency of the model in accurately detecting and filtering a wide range of spam emails.
The results of our study suggest that the Naive Bayes classifier performed best when tested on the largest database (overall accuracy of 98.4%, false positive rate of 0.08%, and false negative rate of 2.90%). This indicates that our Naive Bayes classifier algorithm should remain viable when applied to a real-world database, which is more typical but not the largest.
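The two-model design described in this abstract, a language identifier feeding a per-language phishing classifier, can be sketched as follows. This is an illustrative sketch in Python, not the authors' Java/JADE implementation. It covers only two of the three languages for brevity, and the class `TinyNaiveBayes`, the tokenizers, the function `classify_email`, and all training sentences are hypothetical.

```python
import math
from collections import Counter, defaultdict


class TinyNaiveBayes:
    """Multinomial Naive Bayes with Laplace smoothing over arbitrary tokens."""

    def __init__(self, tokenizer):
        self.tokenizer = tokenizer

    def fit(self, texts, labels):
        self.priors = Counter(labels)
        self.counts = defaultdict(Counter)
        for text, label in zip(texts, labels):
            self.counts[label].update(self.tokenizer(text))
        self.vocab = {tok for c in self.counts.values() for tok in c}
        return self

    def predict(self, text):
        tokens = [t for t in self.tokenizer(text) if t in self.vocab]
        total_docs = sum(self.priors.values())
        best_label, best_score = None, -math.inf
        for label, n_docs in self.priors.items():
            denom = sum(self.counts[label].values()) + len(self.vocab)
            score = math.log(n_docs / total_docs)
            score += sum(math.log((self.counts[label][t] + 1) / denom) for t in tokens)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


def chars(text):
    """Character unigrams; used for language ID and for Chinese text."""
    return list(text.lower().replace(" ", ""))


def words(text):
    """Whitespace-separated words; used for English text."""
    return text.lower().split()


# Hypothetical training data: language -> (tokenizer, [(email, class), ...]).
TRAIN = {
    "english": (words, [
        ("verify your account now click this link", "phishing"),
        ("your password expired click here to reset", "phishing"),
        ("meeting moved to friday at noon", "normal"),
        ("please find the report attached", "normal"),
    ]),
    "chinese": (chars, [
        ("点击链接验证您的账户", "phishing"),
        ("您的密码已过期请点击这里重置", "phishing"),
        ("会议改到周五中午", "normal"),
        ("报告已在附件中请查收", "normal"),
    ]),
}

# Model 1: identify the language from character statistics.
lang_texts = [t for lang, (_, rows) in TRAIN.items() for t, _ in rows]
lang_labels = [lang for lang, (_, rows) in TRAIN.items() for _ in rows]
lang_model = TinyNaiveBayes(chars).fit(lang_texts, lang_labels)

# Model 2: one phishing-versus-normal classifier per language.
spam_models = {
    lang: TinyNaiveBayes(tok).fit([t for t, _ in rows], [c for _, c in rows])
    for lang, (tok, rows) in TRAIN.items()
}


def classify_email(text):
    """Route the email to the classifier of its predicted language."""
    lang = lang_model.predict(text)
    return lang, spam_models[lang].predict(text)


print(classify_email("click here to verify your password"))
print(classify_email("请点击链接重置密码"))
```

Passing the language label from the first model to the second keeps each phishing classifier's vocabulary specific to one language, at the cost of propagating any language-identification error to the final decision.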

