Financial fraud detection using naive bayes algorithm in highly imbalance data set

2021 ◽  
Vol 24 (5) ◽  
pp. 1559-1572
Author(s):  
Amit Gupta ◽  
M. C. Lohani ◽  
Mahesh Manchanda
2018 ◽  
Vol 246 ◽  
pp. 03027
Author(s):  
Manfu Ma ◽  
Wei Deng ◽  
Hongtong Liu ◽  
Xinmiao Yun

Because a single classification algorithm cannot meet the performance requirements of intrusion detection, this paper proposes KNN-NB, an intrusion detection model based on a hybrid of the KNN and Naive Bayes classification algorithms. It combines KNN's strength with numerical values and Naive Bayes's strength in exploiting the structure of the data. The model first preprocesses the NSL-KDD intrusion detection data set. Then, exploiting KNN's advantage with numerical data, it computes the distances between samples from their feature items and selects the K samples with the smallest distance. Finally, Naive Bayes produces the final result. Experimental results on the NSL-KDD data set show that KNN-NB achieves more balanced performance than traditional KNN and Naive Bayes in terms of accuracy, sensitivity, false detection rate, specificity, and missed detection rate.
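One possible reading of the pipeline described above is a "local" Naive Bayes: KNN picks the K nearest training samples, and a Naive Bayes model fitted only on those neighbours makes the final decision. The sketch below follows that reading. The Euclidean distance, the Gaussian likelihood, the variance floor, and the toy two-feature data are all assumptions made for illustration; the abstract does not specify them, and no NSL-KDD preprocessing is shown.

```python
import math
from collections import defaultdict


def euclidean(a, b):
    """Euclidean distance between two numeric feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def k_nearest(train_X, train_y, query, k):
    """KNN step: return the k (features, label) pairs closest to `query`."""
    ranked = sorted(zip(train_X, train_y),
                    key=lambda pair: euclidean(pair[0], query))
    return ranked[:k]


def gaussian_nb_predict(samples, query, var_floor=1e-3):
    """Naive Bayes step: fit a Gaussian NB on `samples` and classify `query`.

    `var_floor` (an assumption) keeps the variance positive when a class
    has only one neighbour.
    """
    by_class = defaultdict(list)
    for features, label in samples:
        by_class[label].append(features)

    best_label, best_score = None, -math.inf
    for label, rows in by_class.items():
        # Log prior from class frequency among the neighbours.
        score = math.log(len(rows) / len(samples))
        for j, value in enumerate(query):
            column = [row[j] for row in rows]
            mean = sum(column) / len(column)
            var = sum((v - mean) ** 2 for v in column) / len(column) + var_floor
            # Log of the Gaussian density for this feature.
            score += -0.5 * math.log(2 * math.pi * var) - (value - mean) ** 2 / (2 * var)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


def knn_nb_predict(train_X, train_y, query, k=5):
    """Hybrid: select k neighbours, then let Naive Bayes decide among them."""
    return gaussian_nb_predict(k_nearest(train_X, train_y, query, k), query)


# Toy data (hypothetical): two numeric features, two traffic classes.
train_X = [[0.1, 0.2], [0.2, 0.1], [0.0, 0.3], [5.0, 5.1], [5.2, 4.9], [4.8, 5.3]]
train_y = ["normal", "normal", "normal", "attack", "attack", "attack"]
prediction = knn_nb_predict(train_X, train_y, [4.9, 5.0], k=4)
```

Fitting Naive Bayes on the neighbours only, rather than on the whole training set, is a design choice of this sketch; the published model may combine the two classifiers differently.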


Tech-E ◽  
2021 ◽  
Vol 4 (2) ◽  
pp. 44
Author(s):  
Rino Rino

Heart disease is a condition in which fatty deposits in the coronary arteries change the function and shape of the arteries so that blood flow to the heart is obstructed. Data mining methods can predict this disease; the C4.5 algorithm and Naive Bayes are among the methods often used in research. The data set in this study was obtained from the UCI Machine Learning Repository and has 3546 records and 13 attributes. The Naïve Bayes algorithm achieved an accuracy of 81.40%, higher than the 79.07% of the C4.5 algorithm. Based on the calculation results, the Naïve Bayes algorithm can be considered a very good classifier because its value falls between 0.709 and 1.00. Because Naïve Bayes has a higher accuracy than C4.5, the researchers decided to use the Naïve Bayes algorithm to predict heart disease.


Author(s):  
Sachin Sabloak ◽  
Jasuandi Wijaya ◽  
Abdul Rahman ◽  
Molavi Arman

Because computer networks are so important today, the networks in use need to be stable. A network administrator monitors the quality of the internet connection within a LAN in order to evaluate the data obtained. This research applied the Naive Bayes algorithm to a TIPHON data set, using the parameters of the QoS method (delay, packet loss, and jitter) to monitor the quality of the internet network. The Quality of Service (QoS) method defines a network's capability and is used to measure its quality; it yields a value for each parameter needed for network monitoring. The Naive Bayes algorithm is then used to reach a conclusion about the status of the internet network, because it classifies using probability and statistics and can make decisions from the data set provided. The aim of the research was to determine the status of the internet network in the STMIK Global Informatika MDP computer laboratory, where the tests were conducted, and to measure the accuracy of the Naive Bayes algorithm in classifying that status. The results showed that Naive Bayes achieved an accuracy of 87.78% and that the status of the internet network in the STMIK Global Informatika MDP computer laboratory fell into the satisfactory category, the dominant category at 47.78%.

Keywords: Naive Bayes, network administrator, Quality of Service (QoS), internet network status.


2019 ◽  
Vol 8 (2S11) ◽  
pp. 2684-2687

The Web is one of the richest sources of consumer reviews and opinions. Many websites contain customer opinions in the form of reviews, blogs, discussion groups, and forums. This project focuses on customer reviews of restaurants. It predicts whether a given comment is positive or negative using supervised machine learning techniques. The project uses a data set from the Kaggle website, consisting of comments and the type of each comment (positive or negative). It studies classification algorithms and text mining approaches for identifying the type of comment. First, duplicates are removed from the data set. Text pre-processing follows: punctuation marks and stop words are removed, and the whole text is converted into vector format. The conversion from text to vectors is essential because English text cannot be used directly in the analysis, which relies on linear algebra; CountVectorizer is used to perform the conversion. Finally comes classification, for which the Naive Bayes algorithm is used; it sorts the comments into the two classes mentioned above. Seventy percent of the data is used as the training set and 30 percent as the test set.
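The steps described above (deduplication, punctuation and stop-word removal, word counting, Naive Bayes classification, 70/30 split) can be sketched in plain Python. This is a minimal illustration, not the project's code: the word counting is a hand-rolled stand-in for scikit-learn's CountVectorizer, the classifier is a multinomial Naive Bayes with add-one smoothing (an assumption), and the stop-word list and the ten reviews are made up.

```python
import math
import random
import string
from collections import Counter, defaultdict

# Tiny illustrative stop-word list (an assumption, not a standard list).
STOP_WORDS = {"the", "a", "an", "is", "was", "and", "it", "this", "to", "of"}


def preprocess(text):
    """Lowercase, strip punctuation, and drop stop words; return tokens."""
    cleaned = text.lower().translate(str.maketrans("", "", string.punctuation))
    return [tok for tok in cleaned.split() if tok not in STOP_WORDS]


def train_nb(docs, labels):
    """Count words per class (the bag-of-words 'vector' step) and class sizes."""
    word_counts = defaultdict(Counter)
    class_counts = Counter(labels)
    vocab = set()
    for doc, label in zip(docs, labels):
        tokens = preprocess(doc)
        word_counts[label].update(tokens)
        vocab.update(tokens)
    return word_counts, class_counts, vocab


def predict_nb(model, text):
    """Multinomial Naive Bayes with add-one (Laplace) smoothing."""
    word_counts, class_counts, vocab = model
    total_docs = sum(class_counts.values())
    scores = {}
    for label, n_docs in class_counts.items():
        total_words = sum(word_counts[label].values())
        score = math.log(n_docs / total_docs)
        for tok in preprocess(text):
            if tok in vocab:  # words never seen in training are ignored
                score += math.log((word_counts[label][tok] + 1)
                                  / (total_words + len(vocab)))
        scores[label] = score
    return max(scores, key=scores.get)


# Hypothetical reviews; the first one appears twice on purpose.
reviews = [
    ("The food was great!", "positive"),
    ("Great service and tasty food.", "positive"),
    ("Loved the tasty dessert", "positive"),
    ("Friendly staff, great place", "positive"),
    ("Delicious pasta", "positive"),
    ("The food was great!", "positive"),
    ("Terrible service.", "negative"),
    ("The soup was cold and bland", "negative"),
    ("Bland food, rude staff", "negative"),
    ("Awful, slow service", "negative"),
    ("Cold pizza, terrible place", "negative"),
]

unique = list(dict.fromkeys(reviews))      # remove duplicates, keep order
shuffled = unique[:]
random.Random(0).shuffle(shuffled)         # fixed seed for repeatability
cut = len(shuffled) * 70 // 100            # 70 percent train, 30 percent test
train, test = shuffled[:cut], shuffled[cut:]

model = train_nb([d for d, _ in train], [l for _, l in train])
correct = sum(predict_nb(model, d) == l for d, l in test)
print(f"test accuracy: {correct}/{len(test)}")
```

With so few reviews the printed accuracy is not meaningful; the point is only the order of the steps.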


Author(s):  
Delisman Laia ◽  
Efori Buulolo ◽  
Matias Julyus Fika Sirait

PT. Go-Jek Indonesia is a service company. Go-Jek is a technology-based online motorcycle taxi service that leads the revolution in the transportation industry. Predicting orders for Go-Jek drivers with data mining algorithms addresses a problem faced by PT. Go-Jek Indonesia: predicting the level of orders for online Go-Jek drivers in order to determine busy and quiet times. The proposed method is Naive Bayes, an algorithm that classifies data into given classes. The purpose of this study is to examine the prediction patterns of each attribute in the data set using the Naive Bayes algorithm, and to test the training data against the testing data to see whether the data pattern is good. The prediction is based on data collected on previous driver orders, by day and time, over one month. The Naive Bayes algorithm is used to predict the daily orders for online Go-Jek drivers by looking at orders in each period, such as morning, afternoon, and evening. The results of this study make it easier for the company to analyze the data on each Go-Jek driver booking when setting policies concerning both drivers and customers.

Keywords: Go-jek Driver, Data Mining, Naive Bayes
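To make the idea of predicting busy and quiet periods from day and time of day concrete, here is a minimal categorical Naive Bayes sketch. The eight records, the attribute values, and the class names "busy" and "quiet" are invented for illustration, and the add-one smoothing is an assumption; none of it comes from the study's data.

```python
from collections import Counter, defaultdict


def fit(records):
    """records: list of ((day, period), label). Returns simple count tables."""
    class_counts = Counter(label for _, label in records)
    feature_counts = defaultdict(Counter)  # (label, feature index) -> value counts
    values = defaultdict(set)              # feature index -> distinct values seen
    for features, label in records:
        for i, v in enumerate(features):
            feature_counts[(label, i)][v] += 1
            values[i].add(v)
    return class_counts, feature_counts, values


def posterior(model, features):
    """Return P(label | features) for every label, with add-one smoothing."""
    class_counts, feature_counts, values = model
    total = sum(class_counts.values())
    scores = {}
    for label, n in class_counts.items():
        p = n / total  # prior
        for i, v in enumerate(features):
            p *= (feature_counts[(label, i)][v] + 1) / (n + len(values[i]))
        scores[label] = p
    norm = sum(scores.values())
    return {label: p / norm for label, p in scores.items()}


# Hypothetical order history: (day, period of day) -> order level.
history = [
    (("Mon", "morning"), "busy"),
    (("Mon", "afternoon"), "quiet"),
    (("Mon", "evening"), "busy"),
    (("Sat", "morning"), "quiet"),
    (("Sat", "afternoon"), "quiet"),
    (("Sat", "evening"), "busy"),
    (("Sun", "morning"), "quiet"),
    (("Sun", "evening"), "busy"),
]

model = fit(history)
probs = posterior(model, ("Mon", "evening"))
forecast = max(probs, key=probs.get)
```

Returning the full posterior rather than only the winning label lets a planner see how confident the forecast is for each period.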


2020 ◽  
Vol 7 (1) ◽  
pp. 15
Author(s):  
Fakhriza Firdaus ◽  
Ali Mukhlis

A number of studies on bankruptcy prediction have applied data mining techniques to find useful knowledge automatically, based on management's assessment of the risks that exist in a company. In risk assessment, capturing the actual knowledge of experts is still considered an important task, because experts' predictions depend on how effective that knowledge is. This study aims to extract information from qualitative bankruptcy data sets so that it can serve as a useful learning resource for improving the management of a company. The technique used in this study is classification with the Naive Bayes algorithm, which uses probabilistic predictions to classify data.
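The probabilistic prediction mentioned above can be written as the standard Naive Bayes decision rule. The notation is generic and not taken from the study: c ranges over the classes (for example, bankrupt and non-bankrupt) and x_1, ..., x_n are the attribute values of one company.

```latex
\hat{y}
  = \arg\max_{c} \; P(c \mid x_1, \dots, x_n)
  = \arg\max_{c} \; P(c) \prod_{i=1}^{n} P(x_i \mid c)
```

The second equality uses Bayes' theorem, drops the denominator P(x_1, ..., x_n) because it is the same for every class, and applies the "naive" assumption that the attributes are conditionally independent given the class.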


2020 ◽  
Vol 8 (5) ◽  
pp. 1119-1124

Around 50.9 million people in India suffer from diabetes, and Tamil Nadu ranks second among Indian states. The main objective of this paper is to develop predictive models from medical data of patients with and without diabetes. We aim to create hybrid models that doctors can easily use in treating patients with diabetes. Naïve Bayes and Random Forest algorithms are used to predict whether a person has diabetes, taking his or her health condition into account. This enables doctors to group, classify, and categorize the disease type so that treatment can be given accordingly. The Pima Indian data set was used for the study, together with data mining techniques. It was obtained from the National Institute of Diabetes and Digestive and Kidney Diseases and contains n medical predictor variables and one target variable. First, we replace the null values in the data set with the mean of the respective column. We then split the data set into training and testing sets in several ratios for analysis: 85/15, 80/20, 70/30, and 60/40. Next, we apply the Naïve Bayes and Random Forest algorithms. The Naïve Bayes algorithm is used to compute probabilities from the features (columns), which it treats as independent; the data set is given as input and prediction follows the NB model. The Random Forest algorithm is used for feature selection. It takes n inputs from the data set and builds numerous uncorrelated decision trees during training, then outputs the class that is the mode of the classes predicted by the individual trees.
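Three of the mechanical steps described above (mean imputation of null values, splitting at several train/test ratios, and taking the mode of the per-tree outputs) can be sketched as follows. This is an illustration under assumptions: missing values are represented as `None`, the rows are synthetic rather than the Pima Indian data, and the helper names are hypothetical. The tree-building part of Random Forest is not shown.

```python
import random
from collections import Counter


def impute_column_means(rows):
    """Replace None entries with the mean of the non-missing values in that column."""
    n_cols = len(rows[0])
    means = []
    for j in range(n_cols):
        present = [row[j] for row in rows if row[j] is not None]
        means.append(sum(present) / len(present))
    return [[means[j] if v is None else v for j, v in enumerate(row)]
            for row in rows]


def split_rows(rows, train_percent, seed=0):
    """Shuffle a copy of `rows` and cut it at `train_percent` percent."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    cut = len(shuffled) * train_percent // 100
    return shuffled[:cut], shuffled[cut:]


def majority_vote(tree_predictions):
    """Random-Forest-style aggregation: the mode of the individual tree outputs."""
    return Counter(tree_predictions).most_common(1)[0][0]


# Synthetic rows with gaps (hypothetical values).
raw = [[1.0, None], [3.0, 4.0], [None, 8.0], [5.0, 6.0]]
filled = impute_column_means(raw)

# The four ratios listed in the abstract, applied to 20 synthetic rows.
data = [[float(i), float(i % 2)] for i in range(20)]
split_sizes = {
    pct: tuple(len(part) for part in split_rows(data, pct))
    for pct in (85, 80, 70, 60)
}
```

Integer arithmetic is used for the cut point so the split sizes do not depend on floating-point rounding.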


Author(s):  
Nosiel Nosiel ◽  
Sigit Andriyanto ◽  
Muhammad Said Hasibuan

Mobile phones have become a necessity for everyone. SMS is a communication service used to send and receive short text messages on mobile phones. Alongside the advantages of SMS there is a very annoying activity called spam (unsolicited commercial advertisements). Spam is the repeated use of electronic devices to send messages; those who send it are called spammers. Spam messages are sent by advertisers at very low operating cost, so there are many spammers and the volume of messages is huge, which harms and disturbs many parties. This study aims to classify SMS messages as spam or ham when they arrive on the user's mobile device. The classification uses the Naive Bayes method: by looking at the contents of an SMS, Naive Bayes as applied in data mining can distinguish unwanted SMS from non-spam. The resulting classification accuracy rate is 0.999%. Based on this research, the Naive Bayes method can correctly classify the 1000 SMS spam records contained in the SMS spam data set file.


2020 ◽  
Vol 17 (1) ◽  
pp. 9-16
Author(s):  
Yoga Aditama Ika Nanda ◽  
Bety Wulan Sari

We live in a society that still treats problems of the mind and personality as taboo, even though mental health is as important as physical health. A personality disorder is a disorder that shows in behavior, mindset, and attitude, and that brings difficulties to life. To address this problem, this study applies the Naive Bayes classifier to the early detection of personality disorders. It uses a data set of 130 respondents from AMIKOM university, aged 18 to 25, and the disorder identified is the borderline type. Of the records obtained, 94 fall in one class and 36 in the other (diagnosed versus undiagnosed), and the research variables are 13 questionnaire questions. Testing was done with 10-fold and 5-fold cross-validation and a confusion matrix. In accuracy, 10-fold is superior at 88.8%, compared with 88.2% for 5-fold; 10-fold is also superior in precision at 88.7%, whereas 5-fold is superior in recall at 88.3%. Both yield the same F1-score, 86.1%.
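The evaluation described above rests on two pieces: splitting the records into k folds, and deriving accuracy, precision, recall, and F1 from a confusion matrix. A minimal sketch of both follows. The contiguous (unshuffled) folds, the single-positive-class metrics, and the eight example labels are assumptions made for illustration; the study may have shuffled, stratified, or averaged differently.

```python
def k_fold_indices(n_samples, k):
    """Yield (train_indices, test_indices) for k contiguous folds."""
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0)
                  for i in range(k)]
    start = 0
    for size in fold_sizes:
        test_idx = list(range(start, start + size))
        train_idx = list(range(0, start)) + list(range(start + size, n_samples))
        yield train_idx, test_idx
        start += size


def binary_metrics(y_true, y_pred, positive):
    """Confusion-matrix counts and the four metrics for one positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {
        "tp": tp, "fp": fp, "fn": fn, "tn": tn,
        "accuracy": (tp + tn) / len(pairs),
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


# Fold sizes for a data set of 130 records, as in the abstract.
sizes_10 = [len(test) for _, test in k_fold_indices(130, 10)]
sizes_5 = [len(test) for _, test in k_fold_indices(130, 5)]

# Hypothetical labels for one test fold (1 = diagnosed, 0 = undiagnosed).
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 0, 0, 0, 1, 0]
metrics = binary_metrics(y_true, y_pred, positive=1)
```

In a full run, a classifier would be trained on each `train_idx` set and scored on the matching `test_idx` set, and the per-fold metrics averaged.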

