Comparative Study of Classification Method on Customer Candidate Data to Predict its Potential Risk

A vehicle leasing company operates in the field of vehicle loans. Purchasing on credit is its mainstay because it attracts potential customers and generates more profit. However, if a customer candidate is approved by mistake, the risk of stalled credit payments arises. To minimize this risk, data mining techniques can be applied to predict customers' future behavior. This study explores two such techniques, C4.5 and Naive Bayes. The customer attributes used are salary, age, marital status, other installments and worthiness. The experiments are performed with the Weka software. On the evaluation criterion of accuracy, the C4.5 algorithm outperforms Naive Bayes. The percentage-split scenarios give a precision of 89.16% and an accuracy of 83.33%, whereas in the cross-validation scenarios C4.5 gives the higher accuracy for every k used in k-fold cross-validation. The C4.5 results also confirm that the most influential attribute in this study is salary.
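
C4.5 ranks candidate split attributes by gain ratio, the information gain of a split divided by its split information. A minimal sketch of that computation follows, assuming purely categorical attributes; the eight applicant records, their value bands (high/medium/low salary and so on) and the good/bad labels are invented for illustration and are not the study's data.

```python
from collections import Counter
from math import log2

# Hypothetical toy records: (salary, marital_status, other_installments, worthiness label).
# Invented for illustration only; not the study's dataset.
RECORDS = [
    ("high", "married", "no", "good"),
    ("high", "single", "no", "good"),
    ("high", "married", "yes", "good"),
    ("low", "single", "yes", "bad"),
    ("low", "married", "yes", "bad"),
    ("low", "single", "no", "bad"),
    ("medium", "married", "no", "good"),
    ("medium", "single", "yes", "bad"),
]
ATTRIBUTES = ["salary", "marital_status", "other_installments"]


def entropy(values):
    """Shannon entropy (in bits) of a list of discrete values."""
    total = len(values)
    return -sum((n / total) * log2(n / total) for n in Counter(values).values())


def gain_ratio(records, attr_index):
    """C4.5-style gain ratio of one categorical attribute (class label is the last field)."""
    labels = [r[-1] for r in records]
    total = len(records)
    # Group the class labels by the attribute's value.
    partitions = {}
    for r in records:
        partitions.setdefault(r[attr_index], []).append(r[-1])
    # Expected entropy after splitting, weighted by partition size.
    remainder = sum(len(p) / total * entropy(p) for p in partitions.values())
    info_gain = entropy(labels) - remainder
    # Split information penalises attributes with many distinct values.
    split_info = entropy([r[attr_index] for r in records])
    return info_gain / split_info if split_info > 0 else 0.0


if __name__ == "__main__":
    scores = {name: gain_ratio(RECORDS, i) for i, name in enumerate(ATTRIBUTES)}
    for name, score in sorted(scores.items(), key=lambda kv: -kv[1]):
        print(f"{name}: {score:.3f}")
```

On this toy data salary yields the highest gain ratio, so a C4.5 tree would place it at the root; the point is the mechanics, not a reproduction of the study's result. Weka's J48 implementation of C4.5 also handles numeric thresholds (relevant for salary and age) and pruning, both of which this sketch omits.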

Algoritma Naïve Bayes Untuk Memprediksi Kredit Macet Pada Koperasi Simpan Pinjam (Naïve Bayes Algorithm for Predicting Bad Credit in a Savings and Loan Cooperative)

Jurnal Informatika Upgris, Vol 4 (2), 2019. DOI: 10.26877/jiu.v4i2.2919

Author(s): Diah Puspitasari, Syifa Sintia Al Khautsar, Wida Prima Mustika

Keyword(s): Data Mining, Predictive Value, Naive Bayes, False Negative, False Negative Rate, True Positive Rate, Data Mining Technique, Application Form, Using Data

Cooperatives are a forum that can help people, especially small and medium-sized communities. They play an important role in the community's economic growth, for example by selling basic commodities at relatively low prices, and some cooperatives also offer lending and savings services. The constraint this cooperative faces is that borrowers find it difficult to repay loan installments, causing bad credit. This happens because the cooperative carries out credit analysis by hand, case by case: applicants fill out the loan application form together with the requirements, and a field survey is conducted. Lending to borrowers therefore needs to be evaluated. To minimize these problems, data mining is used to identify the customer criteria that predict bad loans and to determine whether applicants are eligible for credit. The data mining technique used is classification with the Naive Bayes method. Testing the resulting model gives an accuracy of 59%, a sensitivity (true positive rate, or recall) of 46.80%, a specificity (true negative rate) of 69.81%, a positive predictive value (PPV, or precision) of 57.89%, and a negative predictive value (NPV) of 59.67%.
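
All five reported figures derive from a single 2x2 confusion matrix; in particular, specificity is the true negative rate TN / (TN + FP). The abstract does not give the counts, so the ones below (TP = 22, FN = 25, FP = 16, TN = 37 over an assumed 100 test records, with "bad credit" assumed to be the positive class) are an illustrative back-calculation that reproduces the reported rates to within rounding, not the paper's actual table.

```python
from dataclasses import dataclass


@dataclass
class ConfusionMatrix:
    # Positive class assumed to be "bad credit" (an assumption, not stated in the abstract).
    tp: int  # bad-credit cases predicted as bad credit
    fn: int  # bad-credit cases predicted as good
    fp: int  # good cases predicted as bad credit
    tn: int  # good cases predicted as good

    def accuracy(self) -> float:
        return (self.tp + self.tn) / (self.tp + self.fn + self.fp + self.tn)

    def sensitivity(self) -> float:  # true positive rate, recall
        return self.tp / (self.tp + self.fn)

    def specificity(self) -> float:  # true negative rate
        return self.tn / (self.tn + self.fp)

    def ppv(self) -> float:  # positive predictive value, precision
        return self.tp / (self.tp + self.fp)

    def npv(self) -> float:  # negative predictive value
        return self.tn / (self.tn + self.fn)


if __name__ == "__main__":
    # Hypothetical counts back-solved from the reported rates; not taken from the paper.
    cm = ConfusionMatrix(tp=22, fn=25, fp=16, tn=37)
    for name in ("accuracy", "sensitivity", "specificity", "ppv", "npv"):
        print(f"{name}: {getattr(cm, name)():.2%}")
```

With these assumed counts and two-decimal rounding, sensitivity and NPV print as 46.81% and 59.68%, one hundredth above the abstract's 46.80% and 59.67%, which suggests the abstract truncates rather than rounds.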

Performance of Naïve Bayes, C4.5 and KNN using Breast Cancer, Iris and Hypothyroid Datasets

International Journal of Innovative Technology and Exploring Engineering - Special Issue, Vol 9 (3), 2020, pp. 2193-2197. DOI: 10.35940/ijitee.c8795.019320

Keyword(s): Breast Cancer, Data Mining, Nearest Neighbor, Naive Bayes, Specific Pattern, K Nearest Neighbor, Data Mining Technique, Digital Format, Tree Classifier

Data mining usually refers to discovering specific patterns in, or analyzing, large datasets. Classification is an efficient data mining technique in which the classes that data are assigned to are predefined from existing datasets. Classifying medical records by their symptoms with computerized methods, and storing the predicted information in digital format, is of great importance for diagnosing various diseases. This paper concentrates on finding the algorithm with the highest accuracy so that a cost-effective algorithm can be identified. The classification algorithms are compared on their accuracy in matching the diagnosis report and on their execution rate, that is, how fast the records are classified. The algorithms studied are the Naive Bayes classifier, the C4.5 tree classifier and K-Nearest Neighbor (KNN), with the aim of determining which is best suited to classifying medical datasets. The Breast Cancer, Iris and Hypothyroid datasets are used to determine which of the three algorithms classifies the records of patients with particular health problems with the highest accuracy. The experimental results, presented as tables and graphs, show the performance of Naïve Bayes, C4.5 and K-Nearest Neighbor. Of the three algorithms, C4.5 performs considerably better than Naïve Bayes and K-Nearest Neighbor.
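
Of the three classifiers compared, k-nearest neighbor is the simplest to state: a record gets the majority label among the k training records closest to it. A minimal sketch using Euclidean distance follows; the two-dimensional points and class names "A" and "B" are hypothetical and do not come from the Breast Cancer, Iris or Hypothyroid datasets.

```python
from collections import Counter
from math import dist  # Python 3.8+


def knn_predict(train, query, k=3):
    """Predict the label of `query` by majority vote of its k nearest training points.

    `train` is a list of (feature_tuple, label) pairs; distance is Euclidean.
    """
    neighbours = sorted(train, key=lambda item: dist(item[0], query))[:k]
    votes = Counter(label for _, label in neighbours)
    return votes.most_common(1)[0][0]


if __name__ == "__main__":
    # Hypothetical 2-D training points for two classes; not from any real dataset.
    train = [
        ((1.0, 1.1), "A"), ((1.2, 0.9), "A"), ((0.8, 1.0), "A"),
        ((3.0, 3.2), "B"), ((3.1, 2.9), "B"), ((2.9, 3.0), "B"),
    ]
    print(knn_predict(train, (1.1, 1.0)))  # nearest three are all class A
    print(knn_predict(train, (2.8, 3.1)))  # nearest three are all class B
```

KNN does no work at training time and defers every distance computation to prediction, which is one reason execution speed is worth comparing alongside accuracy.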

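OPTIMASI DATA MINING MENGGUNAKAN ALGORITMA NAÏVE BAYES DAN C4.5 UNTUK KLASIFIKASI KELULUSAN MAHASISWA (Optimizing Data Mining Using the Naïve Bayes and C4.5 Algorithms for Student Graduation Classification)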

Jurnal Teknologi Informasi dan Komputer, Vol 5 (1), 2019. DOI: 10.36002/jutik.v5i1.634

Author(s): Ni Luh Ratniasih

Keyword(s): Data Mining, Naive Bayes, Method Comparison, Bayes Method, C4.5 Algorithm, Long Time, Student Graduation, Small Capacity, Information Values

Data presented to convey information is often displayed as tabulations. When the data are small, extracting the information may not be difficult. When the data are very large, however, absorbing the information accurately and quickly becomes hard, because reading the displayed data in detail to the end takes a long time. This study discusses data on STMIK STIKOM Bali students. The historical data are converted into a decision tree, which makes the information easier to absorb. The research applies data mining by comparing the Naïve Bayes method with the C4.5 algorithm, both classification techniques, implemented with the RapidMiner tool. Keywords: C4.5, KNN, Student Graduation
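
For categorical attributes of the kind found in graduation records, Naive Bayes scores each class by its prior times the product of per-attribute conditional probabilities, assuming the attributes are independent given the class. The sketch below adds Laplace (add-one) smoothing so that an attribute value never seen with a class does not zero out that class; the attribute names (gpa_band, credits_band), the class labels and all records are hypothetical, not the STMIK STIKOM Bali data.

```python
from collections import Counter, defaultdict
from math import log


class CategoricalNaiveBayes:
    """Naive Bayes for categorical features with Laplace (add-one) smoothing."""

    def fit(self, rows, labels):
        self.class_counts = Counter(labels)
        self.n = len(labels)
        n_features = len(rows[0])
        # value_counts[j][c][v] = how often feature j takes value v within class c.
        self.value_counts = [defaultdict(Counter) for _ in range(n_features)]
        # Distinct values seen per feature; used in the smoothing denominator.
        self.values = [set() for _ in range(n_features)]
        for row, c in zip(rows, labels):
            for j, v in enumerate(row):
                self.value_counts[j][c][v] += 1
                self.values[j].add(v)
        return self

    def log_score(self, row, c):
        """log P(c) + sum_j log P(x_j | c); logs avoid underflow from long products."""
        score = log(self.class_counts[c] / self.n)
        for j, v in enumerate(row):
            count = self.value_counts[j][c][v]
            score += log((count + 1) / (self.class_counts[c] + len(self.values[j])))
        return score

    def predict(self, row):
        """Return the class with the highest (log) posterior score."""
        return max(self.class_counts, key=lambda c: self.log_score(row, c))


if __name__ == "__main__":
    # Hypothetical records: (gpa_band, credits_band) -> graduation status.
    rows = [
        ("high", "full"), ("high", "full"), ("medium", "full"),
        ("low", "partial"), ("medium", "partial"), ("low", "partial"),
    ]
    labels = ["on_time", "on_time", "on_time", "late", "late", "late"]
    model = CategoricalNaiveBayes().fit(rows, labels)
    print(model.predict(("high", "full")))
    print(model.predict(("medium", "full")))
```

The study itself worked in RapidMiner rather than hand-written code, so treat this only as an explanation of the method, not as the configuration the authors used.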

The Comparison of Data Mining Methods Using C4.5 Algorithm and Naive Bayes in Predicting Heart Disease

Tech-E, Vol 4 (2), 2021, pp. 44. DOI: 10.31253/te.v4i2.543

Author(s): Rino Rino

Keyword(s): Data Mining, Heart Disease, Naive Bayes, Data Set, A Value, C4.5 Algorithm, Calculation Results, Mining Methods, Bayes Algorithm

Heart disease is a condition in which fatty deposits in the coronary arteries change the function and shape of the arteries, obstructing blood flow to the heart. Data mining methods can predict this disease; C4.5 and Naive Bayes are among those often used in research. The dataset in this research was obtained from the UCI Machine Learning Repository and has 3546 records and 13 attributes. The Naïve Bayes algorithm reaches an accuracy of 81.40%, compared with 79.07% for the C4.5 algorithm. Based on the calculation results, the study concludes that Naïve Bayes is a very good classifier because its value falls between 0.709 and 1.00. Since Naïve Bayes achieves higher accuracy than C4.5, the researchers chose it for predicting heart disease.
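
For numeric attributes such as those in heart-disease records, one common Naive Bayes variant models each attribute within each class as a normal distribution with its own mean and variance. The abstract does not say which variant the study used, so this is an assumption; the two features (loosely modelled on age and cholesterol), the class labels and all values below are invented for illustration.

```python
from math import log, pi
from statistics import fmean, pvariance  # fmean needs Python 3.8+


def fit_gaussian_nb(rows, labels):
    """Per class: prior plus per-feature mean and variance."""
    model = {}
    for c in set(labels):
        members = [r for r, y in zip(rows, labels) if y == c]
        columns = list(zip(*members))
        model[c] = {
            "prior": len(members) / len(rows),
            "means": [fmean(col) for col in columns],
            # Small floor keeps the density defined if a feature is constant within a class.
            "vars": [max(pvariance(col), 1e-9) for col in columns],
        }
    return model


def log_gaussian(x, mean, var):
    """Log of the normal density N(x; mean, var)."""
    return -0.5 * log(2 * pi * var) - (x - mean) ** 2 / (2 * var)


def predict(model, row):
    """Class maximising log prior + sum of per-feature log densities."""
    def score(c):
        params = model[c]
        return log(params["prior"]) + sum(
            log_gaussian(x, m, v) for x, m, v in zip(row, params["means"], params["vars"])
        )
    return max(model, key=score)


if __name__ == "__main__":
    # Hypothetical (age, cholesterol) records; values invented for illustration.
    rows = [(45.0, 200.0), (50.0, 210.0), (48.0, 190.0),
            (62.0, 280.0), (65.0, 300.0), (60.0, 270.0)]
    labels = ["absent", "absent", "absent", "present", "present", "present"]
    model = fit_gaussian_nb(rows, labels)
    print(predict(model, (47.0, 205.0)))
    print(predict(model, (63.0, 290.0)))
```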

Performance Analysis Based on Data Mining Technique in Predicting the Diabetic Disease - Decision tree and Naïve Bayes

2019 1st International Conference on Advances in Information Technology (ICAIT), 2019. DOI: 10.1109/icait47043.2019.8987382

Author(s): Karthikeyan S. M, Gopinath C B, Chethan P. J, Manikanta J

Keyword(s): Data Mining, Performance Analysis, Decision Tree, Naive Bayes, Data Mining Technique, Mining Technique

An Ingenious Methodology for the Collation of Existing Algorithms for the Prognosis of Student Performance

International Journal of Innovative Technology and Exploring Engineering - Special Issue, Vol 9 (5), 2020, pp. 1749-1752. DOI: 10.35940/ijitee.e2874.039520

Keyword(s): Data Mining, Academic Performance, Random Forest, Student Performance, Naive Bayes, Research Work, Large Data, Impact Factors, Data Mining Technique

This research uses data mining, an automated procedure for discovering interesting patterns in large datasets by grouping the data through comprehensible predictive models. Predicting students' academic performance is crucial, especially for universities. Educational Data Mining (EDM) is an approach for extracting useful information that can affect an institution. Student performance today is influenced by many factors, which may include aspects of the student's own academic record. This study evaluates numerous factors suspected of affecting a student's academic performance and seeks a model that classifies and forecasts the student's learning outcomes. The aim is to conduct a case study of the factors that influence students' academic achievement and to determine which have the greatest impact. The paper evaluates academic achievement on the basis of correctly and incorrectly classified instances using the Naive Bayes and Random Forest algorithms, making a comparative assessment of the two classifiers on student data to determine the better one.

Adatbányászati technikák alkalmazása magyar vállalkozások adatait tartalmazó adatbázison Microsoft Excel 2007-ben (Applying Data Mining Techniques to a Database of Hungarian Enterprises in Microsoft Excel 2007)

Jelenkori Társadalmi és Gazdasági Folyamatok, Vol 5 (1-2), 2010, pp. 229-233. DOI: 10.14232/jtgf.2010.1-2.229-233

Author(s): György Hampel, Zoltán Fabulya, Elemérné Nagy

Keyword(s): Data Mining, Naive Bayes, The Other, Annual Income, Data Mining Technique, Microsoft Excel, Mining Technique, Bayes Algorithm, Main Activity

Using a simple data mining technique, Analyze Key Influencers in the Excel 2007 Data Mining Add-ins, we searched for relationships among the seat (county and town), the form of business, the main activity, the number of employees and the annual income of Hungarian companies. This technique uses the Naive Bayes algorithm. According to this method, the seat has no influencers. Most main activities have no influencers, but some (82 of 495) are related to the other criteria, mainly to the form of business. The form of business (all 30 categories), the number of employees (17 of 18 categories) and the annual income (all 9 categories) are key influencers of each other. Cramér's association measure was used to check the data mining results. The Cramér contingency coefficient gave results similar to the data mining, but it also indicated that the association was at most moderate in all cases. The strongest associations were between annual income and number of employees (0.46, moderate association), main activity and form of business (0.36, moderate association), and annual income and form of business (0.27, low association).
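
Assuming the Cramér contingency coefficient here is the usual Cramér's V, it rescales the chi-square statistic of a contingency table into the range 0 to 1: V = sqrt(chi2 / (n * (k - 1))), where n is the total count and k is the smaller of the number of rows and columns. A minimal sketch with invented counts (the study's tables are not reproduced here); it applies no continuity correction.

```python
from math import sqrt


def cramers_v(table):
    """Cramér's V for an r x c contingency table given as a list of rows of counts.

    Assumes no row or column sums to zero (otherwise an expected count would be 0).
    """
    n = sum(sum(row) for row in table)
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    # Pearson chi-square: sum of (observed - expected)^2 / expected over all cells.
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed - expected) ** 2 / expected
    k = min(len(table), len(table[0]))
    return sqrt(chi2 / (n * (k - 1)))


if __name__ == "__main__":
    # Hypothetical 2x2 counts, e.g. income band vs. employee-count band.
    print(round(cramers_v([[30, 10], [10, 30]]), 3))
```

For this invented table chi2 = 20 and n = 80, so V = sqrt(20 / 80) = 0.5, which the study's scale would call a moderate association.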

Analysis of the effect of the lecturer satisfaction with the Naive Bayes Data Mining technique on institutional performance

Journal of Physics: Conference Series, Vol 1933 (1), 2021, pp. 012034. DOI: 10.1088/1742-6596/1933/1/012034

Author(s): Siti Aisyah, Preddy Marpaung, Wiwin Aprinai, Komda Saharja, I Made Yuda Suryawan, ...

Keyword(s): Data Mining, Naive Bayes, Institutional Performance, Data Mining Technique, Mining Technique

Use of Data Mining for Prediction of Customer Loyalty

CommIT (Communication and Information Technology) Journal, Vol 10 (1), 2015, pp. 41. DOI: 10.21512/commit.v10i1.1660

Author(s): Andri Wijaya, Abba Suganda Girsang

Keyword(s): Data Mining, Customer Loyalty, Classification Accuracy, Nearest Neighbor, Naive Bayes, Training Data, Training Set, Use Of Data, C4.5 Algorithm

This article analyzes customer loyalty using three data mining methods, C4.5, Naive Bayes and Nearest Neighbor, on real-world empirical data. The data contain ten attributes related to customer loyalty and were obtained from a national multimedia company in Indonesia; the dataset contains 2269 records. The study also evaluates the effect of training-set size on classification accuracy. The results suggest that the C4.5 algorithm produces the highest classification accuracy, about 81%, followed by Naive Bayes (76%) and Nearest Neighbor (55%). The numerical evaluation also suggests that a training-set proportion of 80% is optimal.
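
The training-set proportion is the parameter varied in a percentage-split evaluation: shuffle the records, train on the first fraction and test on the rest. A minimal sketch of the split alone follows; the integer range stands in for the 2269 customer records, and the function name, the fixed seed of 42 and the list of fractions tried are hypothetical choices for illustration.

```python
import random


def percentage_split(records, train_fraction=0.8, seed=42):
    """Shuffle a copy of `records` and split it into (train, test) by `train_fraction`."""
    if not 0.0 < train_fraction < 1.0:
        raise ValueError("train_fraction must be strictly between 0 and 1")
    shuffled = list(records)
    # A seeded generator makes the split reproducible across runs.
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


if __name__ == "__main__":
    records = list(range(2269))  # stand-ins for the 2269 customer records
    for fraction in (0.5, 0.6, 0.7, 0.8, 0.9):
        train, test = percentage_split(records, fraction)
        print(f"{fraction:.0%} split: {len(train)} train / {len(test)} test")
```

At 80% this yields 1815 training and 454 test records; each classifier would then be trained on the first part and its accuracy measured on the second, repeated for every proportion being compared.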

Penerapan Metode Naive Bayes Untuk Klasifikasi Pelanggan (Application of the Naive Bayes Method for Customer Classification)

Jurnal Teknologi Informasi dan Komunikasi (TIKomSiN), Vol 8 (2), 2020. DOI: 10.30646/tikomsin.v8i2.500

Author(s): Hakam Febtadianrano Putro, Retno Tri Vulandari, Wawan Laksito Yuly Saptomo

Keyword(s): Data Mining, Naive Bayes, Confusion Matrix, Time Interval, Bayes Method, Business Location, Customer Classification, Potential Customers

Business location plays an important role in sales: a business located in a city can distribute goods to people more easily, and distribution activities are closely related to sales activities. When sales transactions occur, customers need to be classified as potential or non-potential. Data mining is one approach to such classification, and Naive Bayes is one of the most frequently used data mining methods for it. The attributes used in the customer classification are purchase amount, time interval and location. The classification system produced 23 correct and 2 incorrect classifications. Evaluation with a confusion matrix shows an accuracy of 92%, a precision of 100% and a recall of 91%. Keywords: Trading Business, Customer Classification, Naive Bayes, Confusion Matrix
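
Reading the 23 correct and 2 incorrect classifications as TP + TN = 23 and FP + FN = 2 (an interpretation of the abstract's wording, not something it states), the reported figures constrain the confusion matrix as follows:

```latex
\begin{aligned}
\text{Accuracy}  &= \frac{TP + TN}{TP + TN + FP + FN} = \frac{23}{25} = 0.92 \\
\text{Precision} &= \frac{TP}{TP + FP} = 1 \;\Rightarrow\; FP = 0,\; FN = 2 \\
\text{Recall}    &= \frac{TP}{TP + FN} = \frac{TP}{TP + 2} \approx 0.91 \;\Rightarrow\; TP \in \{20, 21\}
\end{aligned}
```

Both TP = 20 (20/22 = 0.909) and TP = 21 (21/23 = 0.913) round to the reported 91%, so the abstract alone does not pin down TP and TN individually; all that is certain under this reading is that both errors are false negatives.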
