scholarly journals Determining Bullying Text Classification Using Naive Bayes Classification on Social Media

Jurnal Varian ◽  
2021 ◽  
Vol 4 (2) ◽  
pp. 133-140
Author(s):  
Ade Clinton Sitepu ◽  
Wanayumini Wanayumini ◽  
Zakarias Situmorang

Cyber-bullying includes repeated acts with the aim of scaring, angering, or embarrassing those who are targeted Cyber-bullying is happening along with the rapid development of technology and social media in society. The media and users need to filter out bully comments because they can indirectly affect the mental psychology that reads them especially directly aimed at that person. By utilizing information mining, the system is expected to be able to classify information circulating in the community. One of the classification techniques that can be applied to text-based classification is Naïve Bayes. The algorithm is good at performing the classification process. In this research, the precision of the algorithm's has been carried out on 1000 comment datasets. The data is grouped manually first into the labels "bully" and "not bully" then the data is divided into training data and test data. To test the system's ability, the classified data is analyzed using the confusion matrix method. The results showed that the Naïve Bayes Algorithm got the level of precision at 87%. and the level of  area under the curve (AUC) at 88%. In terms of speed of completing the system, the Naïve Bayes Algorithm has a very good rate of speed with completion time of 0.033 seconds.

2019 ◽  
Vol 15 (2) ◽  
pp. 247-254
Author(s):  
Heru Sukma Utama ◽  
Didi Rosiyadi ◽  
Dedi Aridarma ◽  
Bobby Suryo Prakoso

Analysis of the odd even-numbered sentiment systems in Bekasi toll using the Naïve Bayes Algorithm, is a process of understanding, extracting, and processing textual data automatically from social media. The purpose of this study was to determine the level of accuracy, recall and precision of opinion mining generated using the Naïve Bayes algorithm to provide information community sentiment towards the effectiveness of the odd system of Bekasi tiolls on social media. The research method used in this study was to do text mining in comments-comments regarding posts regarding even odd oddities on Bekasi toll on Twitter, Instagram, Youtube and Facebook. The steps taken are starting from preprocessing, transformation, datamining and evaluation, followed by information gaon feature selection, select by weight and applying NB Algorithm model. The results obtained from the study using the NB model are obtained Confusion Matrix result, namely accuracy of 79,55%, Precision of 80,51%, and Sensitivity or Recall of 80,91%. Thus this study concludes that the use of Support Vector Machine Algorithms can analyze even odd sentiments on the Bekasi toll road.


SinkrOn ◽  
2020 ◽  
Vol 5 (1) ◽  
pp. 9-20
Author(s):  
Antonius Yadi Kuntoro

Abstract — The current Governor of DKI Jakarta, even though he has been elected since 2017 is always interesting to talk about or even comment on. Comments that appear come from the media directly or through social media. Twitter has become one of the social media that is often used as a media to comment on elected governors and can even become a trending topic on Twitter social media. Netizens who comment are also varied, some are always Tweeting criticism, some are commenting Positively, and some are only re-Tweeting. In this research, a prediction of whether active Netizens will tend to always lead to Positive or Negative comments will be carried out in this study. Model algorithms used are Decision Tree, Naïve Bayes, Random Forest, and also Ensemble. Twitter data that is processed must go through preprocessing first before proceeding using Rapidminer. In trials using Rapidminer conducted in four trials by dividing into two parts, namely testing data and training data. Comparisons made are 10% testing data: 90% Training data, then 20% testing data: 80% training data, then 30% testing data: 70% training data, and the last is 35% testing data: 65% training data. The average Accuracy for the Decision Tree algorithm is 93.15%, while for the Naïve Bayes algorithm the Accuracy is 91.55%, then for the Random Forest algorithm is 93.41, and the last is the Ensemble algorithm with an Accuracy of 93, 42%. here. Keywords — Decision Tree, Naïve Bayes, Random Forest, Set, Twitter.  


Author(s):  
Yessi Jusman ◽  
Widdya Rahmalina ◽  
Juni Zarman

Adolescence always searches for the identity to shape the personality character. This paper aims to use the artificial intelligent analysis to determine the talent of the adolescence. This study uses a sample of children aged 10-18 years with testing data consisting of 100 respondents. The algorithm used for analysis is the K-Nearest Neigbor and Naive Bayes algorithm. The analysis results are performance of accuracy results of both algorithms of classification. In knowing the accurate algorithm in determining children's interests and talents, it can be seen from the accuracy of the data with the confusion matrix using the RapidMiner software for training data, testing data, and combined training and testing data. This study concludes that the K-Nearest Neighbor algorithm is better than Naive Bayes in terms of classification accuracy.


2020 ◽  
Vol 1 (1) ◽  
pp. 17
Author(s):  
Astia Weni Syaputri ◽  
Erno Irwandi ◽  
Mustakim Mustakim

Majors are important in determining student specialization. If there is an error in the direction of the student, it will certainly affect the education of subsequent students. In SMA Negeri 1 Kampar Timur, there are two majors, namely Natural Sciences and Social Sciences. To determine these majors, it is necessary to reference the average value of student grades from semester 3 to semester 5 which includes the average value of Islamic religious education, Indonesian, Citizenship Education, English, Natural Sciences, Social Sciences, and Mathematics. Naive Beyes algorithm is an algorithm that can be used in classifying majors found in SMA Negeri 1 Kampar Timur. To determine the classification of majors in SMA Negeri 1 Kampar Timur, training data and test data are used, respectively at 70% and 30%. This data will be tested for accuracy using a confusion matrix and produces a fairly high accuracy of 96.19%. With this high accuracy, the Naive Bayes algorithm is very suitable to be used in determining the direction of students in SMA Negeri 1 Kampar Timur.


2021 ◽  
Vol 5 (4) ◽  
pp. 820-826
Author(s):  
Yuyun ◽  
Nurul Hidayah ◽  
Supriadi Sahibu

Currently, the spread of information Covid-19 is spreading rapidly. Not only through electronic media, but this information is also disseminated by user posts on social media. Due to the user text posted is varies greatly, it’s needs a special approach to classify these types of posts. This research aims to classify the public sentiment towards the handling of COVID-19. The data from this study were obtained from the social media application i.e., Twitter. This study uses a derivative of the Naïve Bayes algorithm, namely Multinomial Nave Bayes to optimize the classification results.  Three class labels are used to classify public sentiment namely positive, negative, and neutral sentiments. The stage starts with text preprocessing; cleaning, case folding, tokenization, filtering and stemming. Then proceed with weighting using the TF-IDF approach. To evaluate the classification results, data is tested using confusion matrix by testing accuracy, precision, and recall. From the test results, it is found that the weighted average for precision, recall and accuracy is 74%. Research shows that the accuracy of the proposed method has fair classification levels.


2020 ◽  
Vol 1 (1) ◽  
pp. 1-12
Author(s):  
NASIRU ISHOLA ALIYU ◽  
Abdulrahaman Musbau Dogo ◽  
Fatimah Olajumoke Ajibade ◽  
Tosho Abdurauf

Cyberbullying is a type of cybercrime that involves the use of the internet and other information technology resources to deliberately insult, embarrass, harass, bully, and threaten people online. The ubiquity of internet connectivity has enabled an increase in the volume and pace of cyberbullying activities because the criminals no longer need to be physically present when committing the crime. This work aims to analyze and predict cyberbullying on Facebook using Naïve Bayes algorithm. The score accuracy, classification report, and confusion matrix are also employed to assess the performance of the classifier. The accuracy of the classifier is 0.95(95%) which means the model can predict 95 of every 100 instances correctly. Also, the result of the experimental analysis shows that Naïve Bayes is effective in classifying a word into a bully or non-bully word and can identify the category of the bully word that is being sent online.


2020 ◽  
Vol 4 (3) ◽  
pp. 504-512
Author(s):  
Faried Zamachsari ◽  
Gabriel Vangeran Saragih ◽  
Susafa'ati ◽  
Windu Gata

The decision to move Indonesia's capital city to East Kalimantan received mixed responses on social media. When the poverty rate is still high and the country's finances are difficult to be a factor in disapproval of the relocation of the national capital. Twitter as one of the popular social media, is used by the public to express these opinions. How is the tendency of community responses related to the move of the National Capital and how to do public opinion sentiment analysis related to the move of the National Capital with Feature Selection Naive Bayes Algorithm and Support Vector Machine to get the highest accuracy value is the goal in this study. Sentiment analysis data will take from public opinion using Indonesian from Twitter social media tweets in a crawling manner. Search words used are #IbuKotaBaru and #PindahIbuKota. The stages of the research consisted of collecting data through social media Twitter, polarity, preprocessing consisting of the process of transform case, cleansing, tokenizing, filtering and stemming. The use of feature selection to increase the accuracy value will then enter the ratio that has been determined to be used by data testing and training. The next step is the comparison between the Support Vector Machine and Naive Bayes methods to determine which method is more accurate. In the data period above it was found 24.26% positive sentiment 75.74% negative sentiment related to the move of a new capital city. Accuracy results using Rapid Miner software, the best accuracy value of Naive Bayes with Feature Selection is at a ratio of 9:1 with an accuracy of 88.24% while the best accuracy results Support Vector Machine with Feature Selection is at a ratio of 5:5 with an accuracy of 78.77%.


2020 ◽  
Vol 1 (2) ◽  
pp. 61-66
Author(s):  
Febri Astiko ◽  
Achmad Khodar

This study aims to design a machine learning model of sentiment analysis on Indosat Ooredoo service reviews on social media twitter using the Naive Bayes algorithm as a classifier of positive and negative labels. This sentiment analysis uses machine learning to get patterns an model that can be used again to predict new data.


Smart cities which are becoming overcrowded today are making human beings life miserable and prone to more challenges on daily basis. Overcrowded is leading to vast generation of wastes contributing to air pollution and in turn is affecting health causing various diseases. Even though various measures are taken to recycle wastes, the rate at which it is being produced is becoming higher and higher. This paper deals with prediction of waste generation using Naïve Bayes machine learning algorithm(Classifier) based on the statistics of previous waste datasets. The datasets used for the future prediction are obtained from reliable sources. The implementation of the algorithm is done in Pyspark using Anaconda Jupyter. The performance of the classifier on the datasets is analyzed with confusion matrix and accuracy metric is used to rate the efficiency of the classifier. The accuracy obtained indicates that algorithm can be effectively used for real time prediction and it gives more accurate results for huge input datasets based on independence assumption.


Author(s):  
Muskan Patidar

Abstract: Social networking platforms have given us incalculable opportunities than ever before, and its benefits are undeniable. Despite benefits, people may be humiliated, insulted, bullied, and harassed by anonymous users, strangers, or peers. Cyberbullying refers to the use of technology to humiliate and slander other people. It takes form of hate messages sent through social media and emails. With the exponential increase of social media users, cyberbullying has been emerged as a form of bullying through electronic messages. We have tried to propose a possible solution for the above problem, our project aims to detect cyberbullying in tweets using ML Classification algorithms like Naïve Bayes, KNN, Decision Tree, Random Forest, Support Vector etc. and also we will apply the NLTK (Natural language toolkit) which consist of bigram, trigram, n-gram and unigram on Naïve Bayes to check its accuracy. Finally, we will compare the results of proposed and baseline features with other machine learning algorithms. Findings of the comparison indicate the significance of the proposed features in cyberbullying detection. Keywords: Cyber bullying, Machine Learning Algorithms, Twitter, Natural Language Toolkit


Sign in / Sign up

Export Citation Format

Share Document