Efficient Implementation using Multinomial Naive Bayes for Prediction of Fake Job Profile

Author(s):  
Prof. R. S. Shishupal ◽  
Varsha ◽  
Supriya Mane ◽  
Vinita Singh ◽  
Damini Wasekar

The growth of social media has increased the chances of fake job postings. To help users avoid fraudulent job posts, an Android application is designed that classifies postings using machine learning. This paper describes the implementation and working of this machine-learning-based Android application. Several single classifiers are applied, and their results are compared for the prediction of fake job profiles. Based on the experimental results, Multinomial Naive Bayes is the best of the compared classifiers for detecting fake jobs.
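As a rough illustration of the classifier named in this abstract, the following is a minimal standard-library sketch of Multinomial Naive Bayes over word counts with add-one smoothing. The function names, the whitespace tokenization, and the four toy postings with their labels are assumptions made for illustration; they are not the paper's dataset, features, or Android implementation.

```python
import math
from collections import Counter, defaultdict


def train_multinomial_nb(docs, labels, alpha=1.0):
    """Collect per-class word counts; alpha is the add-alpha smoothing constant."""
    class_counts = Counter(labels)
    word_counts = defaultdict(Counter)
    vocab = set()
    for text, label in zip(docs, labels):
        tokens = text.lower().split()
        word_counts[label].update(tokens)
        vocab.update(tokens)
    return {
        "class_counts": class_counts,
        "word_counts": word_counts,
        "vocab": vocab,
        "alpha": alpha,
    }


def predict_multinomial_nb(model, text):
    """Return the class with the highest log posterior for the given text."""
    total_docs = sum(model["class_counts"].values())
    vocab_size = len(model["vocab"])
    alpha = model["alpha"]
    best_label, best_score = None, float("-inf")
    for label, n_docs in model["class_counts"].items():
        score = math.log(n_docs / total_docs)  # log prior
        total_words = sum(model["word_counts"][label].values())
        for token in text.lower().split():
            if token not in model["vocab"]:
                continue  # words never seen in training are ignored
            count = model["word_counts"][label][token]
            score += math.log((count + alpha) / (total_words + alpha * vocab_size))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Hypothetical toy postings; the labels are invented for this sketch.
toy_docs = [
    "earn money fast work from home no experience",
    "send bank details to receive your offer",
    "software engineer position with health benefits",
    "hiring accountant full time office role",
]
toy_labels = ["fake", "fake", "real", "real"]
model = train_multinomial_nb(toy_docs, toy_labels)
print(predict_multinomial_nb(model, "earn money from home"))
```

Working in log space avoids numerical underflow when many word probabilities are multiplied together.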

2020 ◽  
Vol 1 (2) ◽  
pp. 61-66
Author(s):  
Febri Astiko ◽  
Achmad Khodar

This study aims to design a machine learning model for sentiment analysis of Indosat Ooredoo service reviews on the social media platform Twitter, using the Naive Bayes algorithm to classify reviews as positive or negative. The sentiment analysis uses machine learning to derive patterns and a model that can be reused to predict new data.


Author(s):  
Muskan Patidar

Abstract: Social networking platforms have given us more opportunities than ever before, and their benefits are undeniable. Despite these benefits, people may be humiliated, insulted, bullied, and harassed by anonymous users, strangers, or peers. Cyberbullying refers to the use of technology to humiliate and slander other people. It takes the form of hate messages sent through social media and email. With the exponential increase in social media users, cyberbullying has emerged as a form of bullying through electronic messages. As a possible solution to this problem, our project aims to detect cyberbullying in tweets using ML classification algorithms such as Naïve Bayes, KNN, Decision Tree, Random Forest, and Support Vector Machine. We also apply NLTK (Natural Language Toolkit) unigram, bigram, trigram, and n-gram features to Naïve Bayes to check its accuracy. Finally, we compare the results of the proposed and baseline features with other machine learning algorithms. The comparison indicates the significance of the proposed features in cyberbullying detection. Keywords: Cyberbullying, Machine Learning Algorithms, Twitter, Natural Language Toolkit


In this social media era, it is estimated that over 5 billion people use smartphones. Out of these, there are over 1.5 billion active users in the world. We are all among them, and before opening a message we are all curious about what we have received and hope that it is a good one. Sentiment analysis of social media data is therefore seen by many as an effective tool for monitoring user preferences and inclinations. We propose a scalable machine learning model that analyzes the polarity of a communicative text using the Bernoulli Naive Bayes classifier. This paper considers only two polarities: whether a sentence is positive or negative. The Bernoulli classifier is used because it is well suited to binary inputs, which in turn raises accuracy to as much as 97%.
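The Bernoulli variant mentioned above treats each vocabulary word as a binary present/absent feature, so absent words also contribute to the score. The sketch below shows that idea using only the standard library; the function names and the four toy sentences with their labels are invented for illustration and are not the paper's data or code.

```python
import math


def train_bernoulli_nb(docs, labels, alpha=1.0):
    """Estimate P(word present | class) from document frequencies with smoothing."""
    vocab = sorted({tok for d in docs for tok in d.lower().split()})
    classes = sorted(set(labels))
    doc_count = {c: 0 for c in classes}
    doc_freq = {c: {w: 0 for w in vocab} for c in classes}
    for text, label in zip(docs, labels):
        doc_count[label] += 1
        for w in set(text.lower().split()):  # presence only, not counts
            doc_freq[label][w] += 1
    prior = {c: doc_count[c] / len(docs) for c in classes}
    prob = {
        c: {w: (doc_freq[c][w] + alpha) / (doc_count[c] + 2 * alpha) for w in vocab}
        for c in classes
    }
    return vocab, prior, prob


def predict_bernoulli_nb(model, text):
    """Score every vocabulary word: log p if present, log (1 - p) if absent."""
    vocab, prior, prob = model
    present = set(text.lower().split())
    scores = {}
    for c in prior:
        s = math.log(prior[c])
        for w in vocab:
            p = prob[c][w]
            s += math.log(p) if w in present else math.log(1.0 - p)
        scores[c] = s
    return max(scores, key=scores.get)


# Hypothetical toy messages; the labels are invented for this sketch.
toy_docs = [
    "great service fast network",
    "love the new plan",
    "terrible signal slow network",
    "bad service never again",
]
toy_labels = ["positive", "positive", "negative", "negative"]
bnb_model = train_bernoulli_nb(toy_docs, toy_labels)
print(predict_bernoulli_nb(bnb_model, "great plan"))
```

The `2 * alpha` in the denominator reflects the two possible outcomes (present or absent) of each binary feature.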


2021 ◽  
Vol 22 (1) ◽  
pp. 78-92
Author(s):  
GA Buntoro ◽  
R Arifin ◽  
GN Syaifuddiin ◽  
A Selamat ◽  
O Krejcar ◽  
...  

In 2019, citizens of Indonesia participated in the democratic process of electing a new president, vice president, and various legislative candidates for the country. The 2019 Indonesian presidential election was very tense in terms of the candidates' campaigns in cyberspace, especially on social media sites such as Facebook, Twitter, Instagram, Google+, Tumblr, LinkedIn, etc. The Indonesian people used social media platforms to express their positive, neutral, and negative opinions on the respective presidential candidates. Social media users campaigned for their choice of candidates, from regents, governors, and legislative positions up to presidential candidates, via the Internet and online media. Therefore, the aim of this paper is to conduct sentiment analysis on the candidates in the 2019 Indonesian presidential election based on Twitter datasets. The study used datasets of opinions expressed by the Indonesian people on Twitter with hashtags (#) containing "Jokowi and Prabowo." We pre-processed the data through comment selection, data cleansing, text parsing, sentence normalization, and tokenization of the Indonesian-language text, determined the class attributes, and finally classified the hashtagged Twitter posts using a Naïve Bayes Classifier (NBC) and a Support Vector Machine (SVM) to achieve the best possible accuracy. The study helps the community research opinions on Twitter that contain positive, neutral, or negative sentiments. Sentiment analysis of the candidates in the 2019 Indonesian presidential election on Twitter using non-conventional processes resulted in cost, time, and effort savings. This research showed that combining the SVM machine learning algorithm with alphabetic tokenization produced the highest accuracy, 79.02%, while the lowest accuracy, 44.94%, was obtained by combining the NBC machine learning algorithm with N-gram tokenization.
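The abstract contrasts alphabetic tokenization with N-gram tokenization but does not define either. The sketch below shows one common reading, under the assumption that "alphabetic" means keeping only runs of letters and "N-gram" means joining consecutive word tokens; both function names and the sample tweet are hypothetical and may not match the tool the authors used.

```python
import re


def alphabetic_tokens(text):
    """Keep only runs of ASCII letters; digits, '#', and punctuation act as separators."""
    return re.findall(r"[a-z]+", text.lower())


def word_ngrams(tokens, n):
    """Join each window of n consecutive tokens into a single feature string."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


sample = "Jokowi 2019 #Prabowo debat malam ini!"  # invented example tweet
tokens = alphabetic_tokens(sample)
print(tokens)
print(word_ngrams(tokens, 2))
```

N-gram features multiply the vocabulary size, so on a small dataset each feature is seen less often, which is one plausible reason a count-based classifier could score lower with them.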


Author(s):  
Akshma Chadha ◽  
Baijnath Kaushik

Abstract: Suicide is a major health issue and has become one of the leading causes of death. Many negative emotions, such as anxiety, depression, and stress, can lead to suicide. By identifying individuals with suicidal ideation beforehand, the risk of them completing suicide can be reduced. Social media is increasingly becoming a powerful platform where people around the world share emotions and thoughts. Moreover, this platform in some ways works as a catalyst for invoking and inciting suicidal ideation. The objective of this proposal is to use social media as a tool that can aid in preventing it. Data is collected from Twitter, a social networking site, using features related to suicidal ideation. The tweets are preprocessed according to the semantics of the identified features and then converted into probabilistic values suitable for machine learning and ensemble learning algorithms. Machine learning algorithms including Bernoulli Naïve Bayes, Multinomial Naïve Bayes, Decision Tree, Logistic Regression, and Support Vector Machine were applied to the data to predict and identify trends of suicidal ideation. The proposed work is further evaluated with ensemble approaches such as Random Forest, AdaBoost, and Voting Ensemble to measure the improvement.
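Of the ensemble approaches listed, a voting ensemble is the simplest to sketch: each base classifier predicts a label and the most common label wins. The following minimal example assumes hard (majority) voting and uses invented prediction lists; the abstract does not say whether the authors used hard or soft voting.

```python
from collections import Counter


def majority_vote(predictions):
    """Combine per-classifier label lists (all the same length) by hard voting."""
    voted = []
    for sample_votes in zip(*predictions):  # one tuple of votes per sample
        voted.append(Counter(sample_votes).most_common(1)[0][0])
    return voted


# Hypothetical outputs of three base classifiers on three tweets (1 = at risk).
clf_a = [1, 0, 1]
clf_b = [1, 1, 0]
clf_c = [0, 0, 1]
print(majority_vote([clf_a, clf_b, clf_c]))
```

Using an odd number of base classifiers on a binary task avoids ties.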


2020 ◽  
Vol 11 (1) ◽  
Author(s):  
Paisal Paisal

Abstract: Social media today is used not only to communicate between friends but also, frequently, as a means of conveying public aspirations, particularly those of the Indonesian people, on legal issues, government-related issues, and other matters. One form these aspirations take is a hashtag viewed by many people every day, such as one concerning Sjakhyakirti University. This use of social media gives rise to many sentiments in the community, some positive and some negative, in response to the hashtag, and these can have a good or bad impact on daily life in the community. The reasons behind these positive and negative sentiments are drawn from social media itself. From this problem the researcher derives a solution for determining whether the hashtag has a good impact on the general public or the reverse. In this analysis, the researcher uses the Naïve Bayes Classifier, a machine learning method based on probability calculations. Automated hashtag classification can be tuned to minimize personal misclassification by obtaining positive or negative sentiment information through data mining carried out with the Weka tool, which executes data mining operations defined according to an analysis model of hidden data in a large dataset, thereby outlining the discovery of knowledge about Sjakhyakirti University.

Keywords: Social Media, Sjakhyakirti, Naïve Bayes Classifier


2020 ◽  
Vol 8 (5) ◽  
pp. 2488-2493

Technological advancement can help every application field predict damage and forecast the future target of an object. The wealth of the world is in the health of its people, so technology must support technologists in predicting disease in advance. Machine learning is an emerging field that is used to forecast the existence of heart disease from the values of clinical parameters. With this view, we focus on predicting customer churn for a banking application. This paper uses the customer churn bank modeling dataset extracted from the UCI Machine Learning Repository. The Anaconda Navigator IDE with Spyder is used to implement the Python code. Our contribution is fivefold. First, the data is processed to find the relationships between the elements of the dataset. Second, AdaBoost regressors are applied to the dataset and the important elements are identified. Third, feature scaling is applied to the dataset, which is then fitted to a kernel support vector machine, logistic regression classifier, Naive Bayes classifier, random forest classifier, decision tree classifier, and KNN classifier. Fourth, the dataset is reduced in dimensionality using principal component analysis with five components and then applied to the previously mentioned classifiers. Fifth, the performance of the classifiers is analyzed with metrics such as precision, accuracy, recall, and F-score. Experimental results show that the Naïve Bayes classifier is the most effective, with a precision of 0.90, a recall of 0.91, and an F-score of 0.92 for the dataset with random boost, feature scaling, and PCA, and with an accuracy of 91% without random boost, 93% with random boosting, and 92% with principal component analysis.
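The four metrics reported above all derive from the counts in a binary confusion matrix. As a minimal sketch, the function below computes them from lists of true and predicted labels; the function name and the toy label lists are hypothetical and unrelated to the churn dataset.

```python
def binary_metrics(y_true, y_pred, positive=1):
    """Compute precision, recall, F-score, and accuracy for one positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    fscore = 2 * precision * recall / denom if denom else 0.0
    accuracy = (tp + tn) / len(pairs) if pairs else 0.0
    return {
        "precision": precision,
        "recall": recall,
        "fscore": fscore,
        "accuracy": accuracy,
    }


# Invented labels: 1 = customer churned, 0 = customer stayed.
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 1, 0, 0, 0, 1, 0, 1]
metrics = binary_metrics(y_true, y_pred)
print(metrics)
```

On imbalanced churn data, accuracy alone can look high even when few churners are found, which is why precision and recall are usually reported alongside it.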


2020 ◽  
Vol 4 (3) ◽  
pp. 504-512
Author(s):  
Faried Zamachsari ◽  
Gabriel Vangeran Saragih ◽  
Susafa'ati ◽  
Windu Gata

The decision to move Indonesia's capital city to East Kalimantan received mixed responses on social media. The still-high poverty rate and the country's difficult finances are factors in disapproval of the relocation. Twitter, as one of the popular social media platforms, is used by the public to express these opinions. The goals of this study are to determine the tendency of community responses to the relocation of the national capital and to perform sentiment analysis of public opinion on the relocation using Naive Bayes and Support Vector Machine algorithms with feature selection to obtain the highest accuracy. The sentiment analysis data is taken from Indonesian-language public opinion collected by crawling Twitter tweets. The search terms used are #IbuKotaBaru and #PindahIbuKota. The research stages consist of collecting data from Twitter, polarity labeling, and preprocessing, which comprises case transformation, cleansing, tokenizing, filtering, and stemming. Feature selection is applied to increase accuracy, and the data is then divided into training and testing sets at predetermined ratios. The next step is a comparison between the Support Vector Machine and Naive Bayes methods to determine which is more accurate. In the data period studied, 24.26% of sentiment about the move to a new capital city was positive and 75.74% was negative. Using RapidMiner software, the best accuracy for Naive Bayes with feature selection is 88.24% at a ratio of 9:1, while the best accuracy for Support Vector Machine with feature selection is 78.77% at a ratio of 5:5.
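The study itself was carried out in RapidMiner; the Python sketch below only illustrates what some of the named stages do. It covers case transformation, cleansing, tokenizing, and filtering, plus a train/test split at a given ratio, and it omits stemming. The stopword list, function names, and sample tweet are assumptions made for illustration.

```python
import random
import re

# Tiny illustrative stopword list; this is not the list used in the study.
STOPWORDS = {"yang", "dan", "di", "ke"}


def preprocess(tweet):
    """Lowercase, strip URLs/mentions/non-letters, tokenize, drop stopwords."""
    text = tweet.lower()                             # transform case
    text = re.sub(r"https?://\S+|@\w+", " ", text)   # cleansing: URLs and mentions
    text = re.sub(r"[^a-z\s]", " ", text)            # cleansing: digits, punctuation, '#'
    tokens = text.split()                            # tokenizing
    return [t for t in tokens if t not in STOPWORDS]  # filtering


def split_by_ratio(items, train_parts, test_parts, seed=0):
    """Shuffle reproducibly, then cut at train_parts : test_parts."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    cut = len(shuffled) * train_parts // (train_parts + test_parts)
    return shuffled[:cut], shuffled[cut:]


sample = "Pindah ke ibu kota baru! #PindahIbuKota https://t.co/x @user"
print(preprocess(sample))

train, test = split_by_ratio(range(100), 9, 1)  # the 9:1 ratio from the abstract
print(len(train), len(test))
```

A 9:1 split leaves only a tenth of the data for testing, so accuracy estimated on it can vary noticeably between random splits of a small tweet collection.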


2021 ◽  
Vol 40 (5) ◽  
pp. 9361-9382 ◽  
Author(s):  
Naeem Iqbal ◽  
Rashid Ahmad ◽  
Faisal Jamil ◽  
Do-Hyeun Kim

Quality prediction plays an essential role in the business outcome of a product. Because of its business interest, the concept has been studied extensively in the last few years. With advances in machine learning (ML) techniques and the advent of robust and sophisticated ML algorithms, it is necessary to analyze the factors influencing the success of movies. This paper presents a hybrid features prediction model based on pre-release and social media data features, using multiple ML techniques to predict the quality of pre-release movies for effective business resource planning. This study aims to integrate pre-release and social media data features to form a hybrid features-based movie quality prediction (MQP) model. The proposed model comprises two different experimental models: (i) predicting movie quality using the original set of features and (ii) developing a subset of features based on the principal component analysis technique to predict the movie success class. This work employs and implements different ML-based classification models, such as Decision Tree (DT), Support Vector Machines with linear and quadratic kernels (L-SVM and Q-SVM), Logistic Regression (LR), Bagged Tree (BT), and Boosted Tree (BOT), to predict the quality of the movies. Different performance measures are used to evaluate the proposed ML-based classification models, such as Accuracy (AC), Precision (PR), Recall (RE), and F-Measure (FM). The experimental results reveal that the BT and BOT classifiers produced higher accuracy than the other classifiers (DT, LR, L-SVM, and Q-SVM). The BT and BOT classifiers achieved accuracies of 90.1% and 89.7%, which shows the efficiency of the proposed MQP model compared to other state-of-the-art techniques. The proposed work is also compared with existing prediction models, and experimental results indicate that the proposed MQP model performed slightly better than other models. The experimental results will help the movie industry plan business resources effectively, such as investment, number of screens, and release dates.

