Analysis of Cyber Bullying on Facebook Using Text Mining

Cyberbullying is a type of cybercrime that involves the use of the internet and other information technology resources to deliberately insult, embarrass, harass, bully, and threaten people online. The ubiquity of internet connectivity has increased the volume and pace of cyberbullying because perpetrators no longer need to be physically present when committing the crime. This work aims to analyze and predict cyberbullying on Facebook using the Naïve Bayes algorithm. The accuracy score, classification report, and confusion matrix are used to assess the performance of the classifier. The accuracy of the classifier is 0.95 (95%), which means the model predicts 95 of every 100 instances correctly. The experimental results also show that Naïve Bayes is effective at classifying a word as a bully or non-bully word and can identify the category of the bully word being sent online.
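
The abstract does not describe the implementation, so the following is only a minimal sketch of how a multinomial Naïve Bayes text classifier with Laplace smoothing could label comments as bully or non-bully. The function names (`train_nb`, `predict_nb`, `accuracy`) and the toy comments are assumptions made for illustration and are not taken from the paper.

```python
import math
from collections import Counter, defaultdict

def train_nb(docs):
    """Count documents per class and word occurrences per class.

    docs is a list of (text, label) pairs.
    """
    class_counts = Counter()
    word_counts = defaultdict(Counter)
    vocab = set()
    for text, label in docs:
        class_counts[label] += 1
        for tok in text.lower().split():
            word_counts[label][tok] += 1
            vocab.add(tok)
    return class_counts, word_counts, vocab

def predict_nb(model, text):
    """Return the label with the highest log posterior (Laplace smoothing)."""
    class_counts, word_counts, vocab = model
    total_docs = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label in class_counts:
        score = math.log(class_counts[label] / total_docs)  # log prior
        total_words = sum(word_counts[label].values())
        for tok in text.lower().split():
            if tok not in vocab:
                continue  # words never seen in training are ignored
            score += math.log(
                (word_counts[label][tok] + 1) / (total_words + len(vocab))
            )
        if score > best_score:
            best_label, best_score = label, score
    return best_label

def accuracy(model, labeled_docs):
    """Fraction of documents whose predicted label matches the true label."""
    correct = sum(predict_nb(model, t) == y for t, y in labeled_docs)
    return correct / len(labeled_docs)

# Hypothetical toy data, not the Facebook data used in the study.
TRAIN = [
    ("you are stupid and ugly", "bully"),
    ("nobody likes you loser", "bully"),
    ("shut up you idiot", "bully"),
    ("have a great day friend", "non-bully"),
    ("thanks for your help today", "non-bully"),
    ("see you at the game", "non-bully"),
]
TEST = [
    ("you are an idiot", "bully"),
    ("thanks for a great game", "non-bully"),
]

model = train_nb(TRAIN)
test_accuracy = accuracy(model, TEST)
```

An accuracy of 0.95 in this setting would mean that `accuracy` returns 0.95 on the held-out comments, that is, 95 correct predictions out of every 100.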

Determining Bullying Text Classification Using Naive Bayes Classification on Social Media

Jurnal Varian ◽

10.30812/varian.v4i2.1086 ◽

2021 ◽

Vol 4 (2) ◽

pp. 133-140

Author(s):

Ade Clinton Sitepu ◽

Wanayumini Wanayumini ◽

Zakarias Situmorang

Keyword(s):

Social Media ◽

Naive Bayes ◽

Rapid Development ◽

Confusion Matrix ◽

Area Under The Curve ◽

Naïve Bayes ◽

Cyber Bullying ◽

Training Data ◽

The Media ◽

Bayes Algorithm

Cyber-bullying consists of repeated acts intended to scare, anger, or embarrass those who are targeted. It has spread along with the rapid development of technology and social media in society. The media and users need to filter out bullying comments because they can affect the mental health of those who read them, especially when the comments are aimed directly at the reader. By utilizing information mining, a system is expected to be able to classify information circulating in the community. One classification technique that can be applied to text is Naïve Bayes, an algorithm that performs the classification process well. In this research, the precision of the algorithm was tested on a dataset of 1,000 comments. The data were first labeled manually as "bully" or "not bully" and then divided into training data and test data. To test the system's ability, the classified data were analyzed using a confusion matrix. The results showed that the Naïve Bayes algorithm achieved a precision of 87% and an area under the curve (AUC) of 88%. In terms of speed, the Naïve Bayes algorithm performed very well, with a completion time of 0.033 seconds.
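
As an illustration of the two reported metrics, the sketch below computes precision from confusion-matrix counts and AUC as the probability that a randomly chosen "bully" comment receives a higher score than a randomly chosen "not bully" comment. The scores, the 0.5 threshold, and the function names are hypothetical; the paper's own evaluation procedure may differ.

```python
def confusion_counts(y_true, y_pred, positive="bully"):
    """Return (tp, fp, fn, tn) for a binary problem."""
    tp = fp = fn = tn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == positive and truth == positive:
            tp += 1
        elif pred == positive:
            fp += 1
        elif truth == positive:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn

def precision(tp, fp):
    """Share of predicted positives that are truly positive."""
    return tp / (tp + fp) if (tp + fp) else 0.0

def auc(y_true, scores, positive="bully"):
    """Pairwise AUC: positives ranked above negatives, ties count half."""
    pos = [s for t, s in zip(y_true, scores) if t == positive]
    neg = [s for t, s in zip(y_true, scores) if t != positive]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Hypothetical classifier scores for six comments.
y_true = ["bully", "bully", "bully", "not bully", "not bully", "not bully"]
scores = [0.9, 0.8, 0.4, 0.6, 0.2, 0.1]
y_pred = ["bully" if s >= 0.5 else "not bully" for s in scores]

tp, fp, fn, tn = confusion_counts(y_true, y_pred)
prec = precision(tp, fp)
area = auc(y_true, scores)
```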

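Sentiment Analysis of the Odd-Even Policy on the Bekasi Toll Road Using the Naive Bayes Algorithm with Information Gain Optimization (original title: SENTIMEN ANALISIS KEBIJAKAN GANJIL GENAP DI TOL BEKASI MENGGUNAKAN ALGORITMA NAIVE BAYES DENGAN OPTIMALISASI INFORMATION GAIN)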

Jurnal Pilar Nusa Mandiri ◽

10.33480/pilar.v15i2.705 ◽

2019 ◽

Vol 15 (2) ◽

pp. 247-254

Author(s):

Heru Sukma Utama ◽

Didi Rosiyadi ◽

Dedi Aridarma ◽

Bobby Suryo Prakoso

Keyword(s):

Social Media ◽

Opinion Mining ◽

Naive Bayes ◽

Information Gain ◽

Confusion Matrix ◽

Naïve Bayes ◽

Support Vector ◽

Toll Road ◽

Textual Data ◽

Bayes Algorithm

Sentiment analysis of the odd-even policy on the Bekasi toll road using the Naïve Bayes algorithm is a process of automatically understanding, extracting, and processing textual data from social media. The purpose of this study was to determine the accuracy, recall, and precision of opinion mining with the Naïve Bayes algorithm, in order to provide information on community sentiment on social media towards the effectiveness of the odd-even system on the Bekasi toll road. The research method was text mining of comments on posts about the odd-even policy on the Bekasi toll road on Twitter, Instagram, YouTube, and Facebook. The steps were preprocessing, transformation, data mining, and evaluation, followed by Information Gain feature selection, selection by weight, and application of the NB model. The confusion matrix results obtained with the NB model were an accuracy of 79.55%, a precision of 80.51%, and a sensitivity (recall) of 80.91%. The study thus concludes that the Support Vector Machine algorithm can analyze odd-even sentiment on the Bekasi toll road.
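
The Information Gain step can be sketched as follows: for each term, compare the entropy of the sentiment labels before and after splitting the comments on whether they contain that term, then keep the highest-weighted terms. This is a generic sketch, not the study's tool configuration; the tokens, labels, and function names are assumptions.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def information_gain(docs, term):
    """Entropy reduction from splitting docs on presence of `term`.

    docs is a list of (set_of_tokens, label) pairs.
    """
    labels = [label for _, label in docs]
    with_term = [label for toks, label in docs if term in toks]
    without_term = [label for toks, label in docs if term not in toks]
    conditional = 0.0
    for part in (with_term, without_term):
        if part:
            conditional += len(part) / len(docs) * entropy(part)
    return entropy(labels) - conditional

def top_terms(docs, k):
    """Rank every term by information gain and keep the best k."""
    vocab = set().union(*(toks for toks, _ in docs))
    ranked = sorted(vocab, key=lambda t: (-information_gain(docs, t), t))
    return ranked[:k]

# Hypothetical tokenized comments (English stand-ins for the real data).
docs = [
    ({"traffic", "jam"}, "negative"),
    ({"jam", "toll"}, "negative"),
    ({"smooth", "toll"}, "positive"),
    ({"smooth", "traffic"}, "positive"),
]

ig_jam = information_gain(docs, "jam")
ig_toll = information_gain(docs, "toll")
selected = top_terms(docs, 2)
```

In this toy data "jam" and "smooth" separate the classes perfectly, while "toll" and "traffic" appear equally in both classes and carry no information.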

Prediction of Solid Garbage Waste Generation in Smart Cities using Naive Bayes Algorithm

International Journal of Innovative Technology and Exploring Engineering - Special Issue ◽

10.35940/ijitee.b1031.1292s19 ◽

2019 ◽

Vol 9 (2S) ◽

pp. 53-56

Keyword(s):

Naive Bayes ◽

Learning Algorithm ◽

Smart Cities ◽

Confusion Matrix ◽

Daily Basis ◽

Naïve Bayes ◽

Human Beings ◽

Waste Generation ◽

Future Prediction ◽

Bayes Algorithm

Smart cities are becoming overcrowded, which makes life harder for their residents and exposes them to more challenges on a daily basis. Overcrowding leads to the generation of vast amounts of waste, which contributes to air pollution and in turn affects health, causing various diseases. Even though various measures are taken to recycle waste, the rate at which it is produced keeps rising. This paper deals with the prediction of waste generation using the Naïve Bayes machine learning algorithm (classifier), based on statistics from previous waste datasets. The datasets used for the prediction are obtained from reliable sources. The algorithm is implemented in PySpark using Anaconda Jupyter. The performance of the classifier on the datasets is analyzed with a confusion matrix, and the accuracy metric is used to rate the efficiency of the classifier. The accuracy obtained indicates that the algorithm can be used effectively for real-time prediction and that, under the independence assumption, it gives more accurate results for very large input datasets.

Attribute Selection in Naive Bayes Algorithm Using Genetic Algorithms and Bagging for Prediction of Liver Disease

JOURNAL OF INFORMATICS AND TELECOMMUNICATION ENGINEERING ◽

10.31289/jite.v4i1.3793 ◽

2020 ◽

Vol 4 (1) ◽

pp. 76-85

Author(s):

Dwi Yuni Utami ◽

Elah Nurlelah ◽

Noer Hikmah

Keyword(s):

Genetic Algorithms ◽

Liver Disease ◽

Naive Bayes ◽

Confusion Matrix ◽

Naïve Bayes ◽

Attribute Selection ◽

World Health ◽

The Difference ◽

Bayes Algorithm ◽

Health Organization

Liver disease is an inflammatory disease of the liver that can prevent the liver from functioning normally and can even cause death. According to WHO (World Health Organization) data, almost 1.2 million people per year, especially in Southeast Asia and Africa, die from liver disease. A common problem is the difficulty of recognizing liver disease early, or even once the disease has spread. This study compares and evaluates the plain Naive Bayes algorithm against Naive Bayes with Genetic Algorithm (GA) attribute selection and Bagging, to find out which has higher accuracy in predicting liver disease, using a dataset taken from the UCI (University of California, Irvine) Machine Learning Repository. Evaluation with both the confusion matrix and the ROC curve showed that Naive Bayes optimized with Genetic Algorithms and Bagging has a higher accuracy than Naive Bayes alone. The accuracy of the Naive Bayes model is 66.66%, and the accuracy of the Naive Bayes model with attribute selection using Genetic Algorithms and Bagging is 72.02%, a difference of 5.36 percentage points. Keywords: Liver Disease, Naïve Bayes, Genetic Algorithms, Bagging.

The Use of Naive Bayes for Broiler Digestive Tract Disease Detection

Journal on Information Technology and Computer Engineering ◽

10.25077/jitce.3.01.1-7.2019 ◽

2019 ◽

Vol 3 (01) ◽

pp. 1-7

Author(s):

Hindriyanto Dwi Purnomo

Keyword(s):

Evaluation Method ◽

Naive Bayes ◽

Confusion Matrix ◽

Gastrointestinal Diseases ◽

Naïve Bayes ◽

Common Disease ◽

High Productivity ◽

Bayes Algorithm ◽

Tract Disease

The broiler is a type of chicken with high productivity. To obtain good-quality chickens, the breeding factors must be managed well so that the chickens are not easily infected by diseases. Gastrointestinal diseases are common among chickens, and the mortality they cause is considered high. This study addresses the problem by developing a detection system using the Naive Bayes algorithm. Data from 60 chickens were used, and the results show that Naive Bayes may be used to detect gastrointestinal diseases in chickens with an accuracy of 93.3%. The figure was confirmed using the confusion matrix evaluation method and matched the level of accuracy of expert judgments.

Sentiment Analysis Using Naive Bayes Algorithm with Feature Selection Particle Swarm Optimization (PSO) and Genetic Algorithm

International Journal of Advances in Data and Information Systems ◽

10.25008/ijadis.v2i2.1224 ◽

2021 ◽

Vol 2 (2) ◽

Author(s):

Abi Rafdi ◽

Herman Mawengkang Herman ◽

Syahril Efendi

Keyword(s):

Genetic Algorithm ◽

Feature Selection ◽

Particle Swarm Optimization ◽

Sentiment Analysis ◽

Naive Bayes ◽

Confusion Matrix ◽

Particle Swarm ◽

Naïve Bayes ◽

Swarm Optimization ◽

Bayes Algorithm

This study analyzes sentiment to capture the opinions, points of view, judgments, attitudes, and emotions towards entities and their aspects that are expressed in text. Social media such as Twitter are among the most widely used means of communication and are a popular research subject. The main problem in sentiment analysis is selecting and using the best features for maximum results. One of the most widely known classification methods is Naive Bayes. However, Naive Bayes is very sensitive to the features it is given. In this test, therefore, feature selection using Particle Swarm Optimization is compared with feature selection using a Genetic Algorithm to improve the accuracy of the Naive Bayes algorithm. The analysis compares results before and after feature selection. Validation uses cross-validation, while the confusion matrix is applied to measure accuracy. The results showed the largest increase in Naïve Bayes accuracy with Particle Swarm Optimization feature selection, from 60.26% to 77.50%, while the Genetic Algorithm raised accuracy from 60.26% to 70.71%. The best feature selection method is therefore Particle Swarm Optimization, with an increase in accuracy of 17.24 percentage points.

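Analysis of LAN Monitoring Using the QoS Method and Classification of Internet Network Status Using the Naive Bayes Algorithm (original title: ANALISIS PEMANTAUAN LAN MENGGUNAKAN METODE QoS DAN PENGKLASIFIKASIAN STATUS JARINGAN INTERNET MENGGUNAKAN ALGORITMA NAIVE BAYES)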

Jurnal Ilmiah Teknologi Infomasi Terapan ◽

10.33197/jitter.vol4.iss2.2018.159 ◽

2018 ◽

Vol 4 (2) ◽

Author(s):

Sachin Sabloak ◽

Jasuandi Wijaya ◽

Abdul Rahman ◽

Molavi Arman

Keyword(s):

Quality Of Service ◽

Naive Bayes ◽

Naïve Bayes ◽

The Internet ◽

Data Set ◽

Computer Laboratory ◽

The Status ◽

Bayes Algorithm ◽

Internet Network

Because computer networks are so important today, the network in use needs to be stable. Monitoring the quality of the internet connection in a LAN is carried out by a network administrator to obtain values from the collected data. This research applied the Naive Bayes algorithm to a TIPHON data set with the QoS parameters delay, packet loss, and jitter to monitor the quality of the internet network. The QoS method produces a value for each parameter needed for network monitoring, and the Naive Bayes algorithm is used to draw a conclusion about the status of the internet network. Quality of Service (QoS) is a method for defining the capability of a network in order to measure its quality. The Naive Bayes algorithm is used because it classifies using probability and statistics and can make decisions from the dataset provided. The aims of this research are to determine the status of the internet network in the STMIK Global Informatika MDP computer laboratory and to determine the accuracy of the Naive Bayes algorithm in classifying the status of the network. Testing was conducted in the STMIK Global Informatika MDP computer laboratory. The results showed that the accuracy of Naive Bayes was 87.78% and that the status of the internet network in the laboratory fell into the "satisfactory" category, the dominant class at 47.78%. Keywords: Naive Bayes, network administrator, Quality of Service (QoS), internet network status.
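
For numeric QoS measurements such as delay, packet loss, and jitter, one common variant is Gaussian Naive Bayes, which models each feature per class with a mean and variance. The abstract does not say which variant the authors used, so the sketch below is an assumption, as are the sample measurements, the "poor" label, and the function names.

```python
import math
from collections import defaultdict

def fit_gaussian_nb(rows, labels):
    """Estimate a class prior and per-feature (mean, variance) for each class."""
    grouped = defaultdict(list)
    for row, label in zip(rows, labels):
        grouped[label].append(row)
    model = {}
    for label, items in grouped.items():
        stats = []
        for column in zip(*items):
            mean = sum(column) / len(column)
            # Small constant avoids a zero variance for constant features.
            var = sum((x - mean) ** 2 for x in column) / len(column) + 1e-9
            stats.append((mean, var))
        model[label] = (len(items) / len(rows), stats)
    return model

def predict_gaussian_nb(model, row):
    """Return the class with the highest log posterior for one measurement."""
    def log_pdf(x, mean, var):
        return -0.5 * math.log(2 * math.pi * var) - (x - mean) ** 2 / (2 * var)

    scores = {}
    for label, (prior, stats) in model.items():
        scores[label] = math.log(prior) + sum(
            log_pdf(x, mean, var) for x, (mean, var) in zip(row, stats)
        )
    return max(scores, key=scores.get)

# Hypothetical measurements: (delay in ms, packet loss in %, jitter in ms).
rows = [
    (20, 0.5, 5), (30, 1.0, 8), (25, 0.0, 6),
    (400, 20.0, 120), (350, 15.0, 100), (450, 25.0, 150),
]
labels = ["satisfactory"] * 3 + ["poor"] * 3

qos_model = fit_gaussian_nb(rows, labels)
status_a = predict_gaussian_nb(qos_model, (28, 0.8, 7))
status_b = predict_gaussian_nb(qos_model, (380, 18.0, 110))
```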

A Machine Learning Framework for Improving Classification Performance on Credit Approval

IJID (International Journal on Informatics for Development) ◽

10.14421/ijid.2021.2384 ◽

2021 ◽

Vol 10 (1) ◽

pp. 47-52

Author(s):

Pulung Hendro Prastyo ◽

Septian Eko Prasetyo ◽

Shindy Arti

Keyword(s):

Machine Learning ◽

Naive Bayes ◽

Information Gain ◽

Learning Algorithm ◽

Confusion Matrix ◽

Credit Scoring ◽

Research Work ◽

Classification Performance ◽

Naïve Bayes ◽

Bayes Algorithm

Credit scoring is a model commonly used in the decision-making process to refuse or accept loan requests. The credit score model depends on the type of loan or credit and is complemented by various credit factors. At present, there is no accurate model for determining which creditors are eligible for loans. Therefore, an accurate and automatic model is needed to make it easier for banks to determine appropriate creditors. To address the problem, we propose a new approach that combines a machine learning algorithm (Naïve Bayes), Information Gain (IG), and discretization to classify creditors. This research work employed an experimental method using the Weka application. The Australian Credit Approval data, which contains 690 instances, was used as the dataset. In this study, Information Gain is employed for feature selection, to select relevant features so that the Naïve Bayes algorithm can work optimally. The confusion matrix is used as an evaluator and 10-fold cross-validation as a validator. Based on the experimental results, the proposed method improved classification performance, reaching its highest average accuracy, precision, recall, and F-measure at 86.29%, 86.33%, 86.29%, and 86.30%, respectively. The proposed method also obtains an ROC area of 91.52%, which indicates that it can be classified as an excellent classifier.
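
The 10-fold cross-validation used as the validator can be sketched as an index split: the 690 instances are shuffled once and divided into ten folds, and each fold serves as the test set exactly once. The study ran this inside Weka, so the function below, its name, and the fixed seed are illustrative assumptions rather than a description of Weka's internals.

```python
import random

def k_fold_indices(n, k=10, seed=0):
    """Yield (train_indices, test_indices) for each of k folds."""
    indices = list(range(n))
    random.Random(seed).shuffle(indices)         # shuffle once, reproducibly
    folds = [indices[i::k] for i in range(k)]    # near-equal fold sizes
    for i in range(k):
        test = folds[i]
        train = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train, test

# 690 matches the number of instances reported for the credit dataset.
splits = list(k_fold_indices(690, k=10))
fold_sizes = [len(test) for _, test in splits]
```

The reported averages would then be obtained by training on each `train` part, evaluating on the matching `test` part, and averaging the ten results.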

Comparison Analysis of K-Nearest Neighbor and Naïve Bayes in Determining Talent of Adolescence

International Journal of Artificial Intelligence Research ◽

10.29099/ijair.v4i1.118 ◽

2020 ◽

Vol 4 (1) ◽

Author(s):

Yessi Jusman ◽

Widdya Rahmalina ◽

Juni Zarman

Keyword(s):

Nearest Neighbor ◽

Naive Bayes ◽

Confusion Matrix ◽

Naïve Bayes ◽

Training Data ◽

K Nearest Neighbor ◽

Combined Training ◽

Testing Data ◽

Bayes Algorithm ◽

Children's Interests

Adolescents are always searching for an identity to shape their personality and character. This paper aims to use artificial intelligence analysis to determine the talents of adolescents. The study uses a sample of children aged 10-18 years, with testing data consisting of 100 respondents. The algorithms used for the analysis are K-Nearest Neighbor and Naive Bayes. The analysis reports the classification accuracy of both algorithms. To determine which algorithm is more accurate in identifying children's interests and talents, accuracy is computed from the confusion matrix using the RapidMiner software for the training data, the testing data, and the combined training and testing data. This study concludes that the K-Nearest Neighbor algorithm is better than Naive Bayes in terms of classification accuracy.
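
For comparison with Naive Bayes, K-Nearest Neighbor classifies a respondent by majority vote among the k closest training examples. The study used RapidMiner, so the sketch below only illustrates the idea; the two questionnaire scores per respondent, the talent labels, and the choice of k = 3 are hypothetical.

```python
import math
from collections import Counter

def knn_predict(train_rows, train_labels, query, k=3):
    """Label `query` by majority vote among its k nearest neighbours."""
    distances = sorted(
        (math.dist(row, query), label)
        for row, label in zip(train_rows, train_labels)
    )
    votes = Counter(label for _, label in distances[:k])
    return votes.most_common(1)[0][0]

# Hypothetical (music score, sport score) questionnaire results.
train_rows = [(9, 2), (8, 3), (7, 1), (2, 9), (3, 8), (1, 7)]
train_labels = ["music", "music", "music", "sport", "sport", "sport"]

talent_a = knn_predict(train_rows, train_labels, (8, 2))
talent_b = knn_predict(train_rows, train_labels, (2, 8))
```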

Classification of Public Opinion on the JNE Courier Service with Naïve Bayes (original title: Klasifikasi Opini Masyarakat Terhadap Jasa Ekspedisi JNE dengan Naïve Bayes)

JURNAL SISTEM INFORMASI BISNIS ◽

10.21456/vol8iss1pp92-98 ◽

2018 ◽

Vol 8 (1) ◽

pp. 92-98

Author(s):

Fithri Selva Jumeilah

Keyword(s):

Naive Bayes ◽

Probability Model ◽

Confusion Matrix ◽

Naïve Bayes ◽

Service Users ◽

Average Percentage ◽

Online Sales ◽

Bayes Algorithm ◽

Using Data

The large number of online sales transactions has increased the number of delivery service users. One of the companies engaged in delivery services in Indonesia is Tiki Nugraha Ekakurir, better known as JNE. Currently, JNE serves 14,000,000 users per month. JNE uses many media to communicate with its customers, one of which is Twitter. The JNECare account has 108,000 followers and 375,000 tweets. The many comments from the public can be used to see what people think of JNE, by determining whether each comment is negative, positive, or neutral. To simplify the grouping of comments, the data are classified using the Naïve Bayes method available in RStudio. The data collected from the internet amount to 1,725 tweets, divided into 70% training data (1,208 tweets) and 30% testing data (517 tweets). Before classification, the data go through preprocessing: converting all letters to lowercase and removing characters other than letters and spaces (case folding), tokenizing words, and removing common words (stopword removal). After the data are cleaned, they are labeled manually one by one, and the labeled data are used in training to obtain a probability model for each category. The probabilities are obtained using the Naïve Bayes algorithm. The model obtained from training is then applied to the testing data. The categories predicted in the test are tabulated in a confusion matrix, from which accuracy, precision, and recall are calculated. The classification results show that Naïve Bayes was able to classify the JNE comments well, as evidenced by an average accuracy of 85%, precision of 78%, and recall of 67%.
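
The preprocessing steps and the 70/30 split described above can be sketched as follows. The study worked in RStudio on Indonesian tweets; this Python version, the English example tweet, the tiny stopword list, and the choice to round the training share up (which reproduces the reported 1,208/517 split) are all assumptions made for illustration.

```python
import re

# Hypothetical stopword list; the study's actual list is not given.
STOPWORDS = {"the", "a", "is", "to", "and", "my"}

def preprocess(text):
    """Case folding, tokenizing, and stopword removal for one tweet."""
    text = text.lower()                        # case folding: lowercase
    text = re.sub(r"[^a-z\s]", " ", text)      # drop non-letter characters
    tokens = text.split()                      # tokenizing on whitespace
    return [t for t in tokens if t not in STOPWORDS]

def split_70_30(items):
    """Split into training and testing parts, rounding the 70% share up."""
    n_train = (len(items) * 7 + 9) // 10       # integer ceil of 0.7 * n
    return items[:n_train], items[n_train:]

tokens = preprocess("My package is LATE again!!! @JNECare 2 days")
train_part, test_part = split_70_30(list(range(1725)))
```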
