Comparative Study of Classification Method on Customer Candidate Data to Predict its Potential Risk

Author(s):  
Mujiono Sadikin ◽  
Fahri Alfiandi

A vehicle leasing company is a company engaged in vehicle loans. Purchase on credit has become a mainstay because it attracts potential customers and generates more profit. However, if a customer candidate is approved by mistake, there is a risk of stalled credit payments. To minimize this risk, data mining techniques can be applied to predict the future behavior of customers. This study explores data mining techniques such as C4.5 and Naive Bayes for this purpose. The customer attributes used are salary, age, marital status, other installments, and worthiness. The experiments are performed using the Weka software. Based on the evaluation criterion, i.e. accuracy, the C4.5 algorithm outperforms Naive Bayes. The percentage split scenarios yield a precision of 89.16% and an accuracy of 83.33%, whereas the cross-validation scenarios give higher accuracy for every k-fold setting used. The C4.5 results also confirm that the most influential attribute in this research is salary.
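C4.5 ranks attributes by gain ratio, which is one way to see why a single attribute such as salary can come out as the most influential. The following is a minimal sketch of that calculation, not the study's Weka setup: the toy records, the attribute values, and the choice of `worthiness` as the class label are all assumptions made for illustration.

```python
from collections import Counter
from math import log2

def entropy(labels):
    """Shannon entropy (in bits) of a list of labels."""
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())

def gain_ratio(rows, attribute, target):
    """Gain ratio of `attribute` with respect to `target`, as used by C4.5."""
    total = len(rows)
    # Group the target labels by the value of the candidate attribute.
    groups = {}
    for row in rows:
        groups.setdefault(row[attribute], []).append(row[target])
    # Information gain: entropy before the split minus weighted entropy after.
    remainder = sum(len(g) / total * entropy(g) for g in groups.values())
    gain = entropy([row[target] for row in rows]) - remainder
    # Split information penalises attributes with many distinct values.
    split_info = entropy([row[attribute] for row in rows])
    return gain / split_info if split_info > 0 else 0.0

# Hypothetical toy records; NOT the study's data.
rows = [
    {"salary": "high", "marital_status": "married", "worthiness": "good"},
    {"salary": "high", "marital_status": "single", "worthiness": "good"},
    {"salary": "low", "marital_status": "married", "worthiness": "bad"},
    {"salary": "low", "marital_status": "single", "worthiness": "bad"},
    {"salary": "high", "marital_status": "married", "worthiness": "good"},
    {"salary": "low", "marital_status": "single", "worthiness": "good"},
]

ratios = {a: gain_ratio(rows, a, "worthiness") for a in ("salary", "marital_status")}
best = max(ratios, key=ratios.get)
print(best, ratios)
```

In this toy data the salary split separates the classes better than marital status, so it would be chosen as the root of the tree.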

2019 ◽  
Vol 4 (2) ◽  
Author(s):  
Diah Puspitasari ◽  
Syifa Sintia Al Khautsar ◽  
Wida Prima Mustika

Cooperatives are organizations that can help people, especially small and medium-sized communities. They play an important role in the community's economic growth, for example by offering basic commodities at relatively low prices, and some cooperatives also offer savings and loan services. A constraint faced by the cooperative studied here is that borrowers find it difficult to repay loan installments, which leads to bad credit. This is because the cooperative conducts its credit analysis in a personal manner, namely by having applicants fill out the loan application form along with the requirements and by conducting a field survey. Lending to borrowers therefore needs to be evaluated. To minimize these problems, it is necessary to identify the customer criteria that predict bad loans and to determine, using data mining, whether or not applicants are eligible to take credit. The data mining technique used is classification with the Naive Bayes method. Testing of the resulting model gives an accuracy of 59%, a sensitivity (true positive rate, or recall) of 46.80%, a specificity (true negative rate) of 69.81%, a positive predictive value (PPV) of 57.89%, and a negative predictive value (NPV) of 59.67%.
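The five figures reported above all derive from the four cells of a confusion matrix. The sketch below shows the standard definitions. The abstract does not give the underlying counts, so the counts used here (22, 25, 37, 16) are hypothetical values chosen only because they reproduce the reported percentages to within rounding.

```python
def confusion_metrics(tp, fn, tn, fp):
    """Standard binary-classification metrics from confusion-matrix counts."""
    total = tp + fn + tn + fp
    return {
        "accuracy": (tp + tn) / total,      # share of all cases classified correctly
        "sensitivity": tp / (tp + fn),      # true positive rate (recall)
        "specificity": tn / (tn + fp),      # true negative rate
        "ppv": tp / (tp + fp),              # positive predictive value (precision)
        "npv": tn / (tn + fn),              # negative predictive value
    }

# Hypothetical counts (an assumption, not stated in the abstract).
metrics = confusion_metrics(tp=22, fn=25, tn=37, fp=16)
for name, value in metrics.items():
    print(f"{name}: {value:.2%}")
```

Note that precision is the same quantity as PPV, and specificity is the true negative rate, so the two should not be conflated when reading such results.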


Data mining usually refers to the discovery of specific patterns in, or the analysis of, data from a large dataset. Classification is an efficient data mining technique in which the classes the data are assigned to are predefined from existing datasets. Classifying medical records by their symptoms with a computerized method and storing the predicted information in digital format is of great importance in diagnosing various diseases. This paper concentrates on finding the algorithm with the highest accuracy so that a cost-effective algorithm can be identified. The data mining classification algorithms are compared on their accuracy in finding the correct records according to the diagnosis report and on their execution rate, which shows how fast the records are classified. The classification algorithms used in this study are the Naive Bayes classifier, the C4.5 tree classifier, and K-Nearest Neighbor (KNN), compared to determine which is best suited for classifying any kind of medical dataset. The Breast Cancer, Iris, and Hypothyroid datasets are used to determine which of the three algorithms classifies the datasets with the highest accuracy in finding the records of patients with particular health problems. The experimental results, presented as tables and graphs, show the performance and importance of the Naïve Bayes, C4.5, and K-Nearest Neighbor algorithms. Judging by the performance of the three algorithms, C4.5 performs considerably better than Naïve Bayes and K-Nearest Neighbor.
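Of the three classifiers compared above, K-Nearest Neighbor is the simplest to state: a record receives the majority label of its k closest training records. The sketch below assumes numeric features and Euclidean distance. The data points, labels, and the helper name `knn_predict` are hypothetical and do not come from the paper's datasets.

```python
from collections import Counter
from math import dist

def knn_predict(train, query, k=3):
    """Classify `query` by majority vote among its k nearest training points."""
    # Sort training points by Euclidean distance to the query and keep k of them.
    neighbours = sorted(train, key=lambda item: dist(item[0], query))[:k]
    votes = Counter(label for _, label in neighbours)
    return votes.most_common(1)[0][0]

# Hypothetical two-feature measurements; NOT taken from the paper's datasets.
train = [
    ((1.0, 1.1), "A"), ((1.2, 0.9), "A"), ((0.8, 1.0), "A"),
    ((3.0, 3.2), "B"), ((3.1, 2.9), "B"), ((2.9, 3.0), "B"),
]

print(knn_predict(train, (1.1, 1.0)))
print(knn_predict(train, (3.0, 3.0)))
```

Because every prediction scans the whole training set, KNN tends to classify more slowly than a trained tree or Naive Bayes model as the dataset grows, which is relevant when execution rate is one of the comparison criteria.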


2019 ◽  
Vol 5 (1) ◽  
Author(s):  
Ni Luh Ratniasih

Data presented to convey information is often displayed in tabular form. If the volume of data displayed is small, the information may not be difficult to process. But if the volume is very large, it may be hard to absorb the information accurately and quickly, because reading the data in detail to the end takes a long time. The data discussed in this study are the data of STMIK STIKOM Bali students. The historical data are converted into a decision tree, which makes the information easier to absorb. This research applies data mining by comparing the naïve Bayes method with the C4.5 algorithm, a classification method, using the Rapid Miner tool.
Keywords: C4.5, KNN, Student Graduation


Tech-E ◽  
2021 ◽  
Vol 4 (2) ◽  
pp. 44
Author(s):  
Rino Rino

Heart disease is a condition in which fatty deposits in the coronary arteries change the function and shape of the arteries so that blood flow to the heart is obstructed. Data mining methods can predict this disease; the C4.5 algorithm and Naive Bayes are among the methods often used in research. The dataset in this research was obtained from the UCI Machine Learning Repository and has 3546 records and 13 attributes. The Naïve Bayes algorithm achieves an accuracy of 81.40%, higher than the C4.5 algorithm, which achieves only 79.07%. Based on the calculation results, it can be concluded that the Naïve Bayes algorithm gives very good classification because its value lies between 0.709 and 1.00. Since the Naïve Bayes algorithm has a higher accuracy than the C4.5 algorithm, the researchers decided to use Naïve Bayes to predict heart disease.


In this proposed research work we use data mining, an automated procedure for discovering interesting patterns in large data sets by means of comprehensible predictive models and grouping. Predicting a student's academic performance is crucial, especially for universities. Educational Data Mining (EDM) is an approach for extracting useful data that could affect an institution. Nowadays a student's performance is influenced by many factors, which may bear on the student's academic results. This study evaluates numerous factors suspected of affecting a student's academic performance and seeks a model that classifies and forecasts the student's learning outcomes. The intention of this research is to conduct a case study on the factors that influence students' academic achievement and to determine which factors have the greatest impact. In this paper we focus on evaluating academic achievement on the basis of correctly and incorrectly classified instances using the Naive Bayes and Random Forest algorithms. The paper makes a comparative assessment of the Naive Bayes and Random Forest classifiers on student data and determines the better algorithm.


2010 ◽  
Vol 5 (1-2) ◽  
pp. 229-233
Author(s):  
György Hampel ◽  
Zoltán Fabulya ◽  
Elemérné Nagy

Using a simple data mining technique, Analyze Key Influencers in the Excel 2007 Data Mining Add-ins, we searched for relationships among the seat (county and town), the form of business, the main activity, the number of employees, and the annual income of Hungarian companies. This technique uses the Naive Bayes algorithm. According to this method, the seat has no influencers. Most of the main activities have no influencers, but some activities (82 out of 495) are related to the other criteria, mainly to the form of business. The form of business (all 30 categories), the number of employees (17 of 18 categories), and the annual income (all 9 categories) are each other's key influencers. Cramer's association was used to check the results of the data mining. The Cramer contingency coefficient showed results similar to the data mining, but also indicated that the strength of the association was no stronger than moderate in all cases. The strongest associations were between the annual income and the number of employees (0.46, moderate association), the main activity and the form of business (0.36, moderate association), and the annual income and the form of business (0.27, low association).
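The association check described above can be illustrated with the usual definition of Cramér's V, computed from the chi-square statistic of a contingency table as V = sqrt(chi² / (n · min(r − 1, c − 1))). This is a sketch under the assumption that the paper's "Cramer contingency coefficient" is this standard statistic. The table below is hypothetical and is not the Hungarian company data.

```python
from math import sqrt

def cramers_v(table):
    """Cramér's V for an r x c contingency table given as a list of rows."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    # Pearson chi-square: squared deviation from the independence expectation.
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed - expected) ** 2 / expected
    # Normalise by sample size and the smaller table dimension minus one.
    k = min(len(table), len(table[0])) - 1
    return sqrt(chi2 / (n * k))

# Hypothetical counts: rows = two forms of business, columns = two income bands.
example = [[30, 10], [10, 30]]
print(cramers_v(example))
```

V ranges from 0 (no association) to 1 (perfect association), which is the scale on which values such as 0.46 or 0.27 are read.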


2021 ◽  
Vol 1933 (1) ◽  
pp. 012034
Author(s):  
Siti Aisyah ◽  
Preddy Marpaung ◽  
Wiwin Aprinai ◽  
Komda Saharja ◽  
I Made Yuda Suryawan ◽  
...  

Author(s):  
Andri Wijaya ◽  
Abba Suganda Girsang

This article discusses the analysis of customer loyalty using three data mining methods, the C4.5, Naive Bayes, and Nearest Neighbor algorithms, on real-world empirical data. The data contain ten attributes related to customer loyalty and were obtained from a national multimedia company in Indonesia. The dataset contains 2269 records. The study also evaluates the effect of the size of the training data on classification accuracy. The results suggest that the C4.5 algorithm produces the highest classification accuracy, about 81%, followed by Naive Bayes at 76% and Nearest Neighbor at 55%. In addition, the numerical evaluation suggests that a proportion of 80% is optimal for the training set.
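The training-proportion experiment above rests on a holdout split: shuffle the records, then cut them at a chosen fraction. The following is a minimal sketch of such a split, not the authors' procedure. The function name `holdout_split`, the fixed seed, and the stand-in integer records are assumptions, and only the record count (2269) and the 80% proportion come from the abstract.

```python
import random

def holdout_split(records, train_fraction, seed=0):
    """Shuffle a copy of `records` and split it into (train, test) lists."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)  # seeded so the split is repeatable
    cut = int(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]

# Stand-in records: integers in place of the 2269 customer rows.
records = range(2269)
train, test = holdout_split(records, train_fraction=0.8)
print(len(train), len(test))
```

Repeating the split for several values of `train_fraction` and recording test accuracy each time is one way to compare training proportions.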


Author(s):  
Hakam Febtadianrano Putro ◽  
Retno Tri Vulandari ◽  
Wawan Laksito Yuly Saptomo

Business location plays an important role in sales. A business location in a city makes it easier for the seller to carry out distribution activities, which are closely related to sales activities. When sales transactions take place, customers need to be classified as potential or non-potential. One approach that can be used for classification is data mining, and one of the most frequently used data mining methods for classification is Naive Bayes. The attributes used in the customer classification process are purchase amount, time interval, and location. The classification system produced 23 correct and 2 incorrect classifications. Based on the confusion matrix, the accuracy reaches 92%, the precision reaches 100%, and the recall reaches 91%.
Keywords: Trading Business, Customer Classification, Naive Bayes, Confusion Matrix
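A Naive Bayes classifier over categorical attributes like those named above multiplies a class prior by one conditional probability per attribute and picks the class with the largest product. The sketch below illustrates this with add-one (Laplace) smoothing, which is an assumption, since the abstract does not say how zero counts are handled. The records, attribute values, and the `potential` label are hypothetical.

```python
from collections import Counter, defaultdict

def train_naive_bayes(rows, target):
    """Count class frequencies and per-class attribute-value frequencies."""
    class_counts = Counter(row[target] for row in rows)
    value_counts = defaultdict(Counter)   # (attribute, class) -> value counts
    values_seen = defaultdict(set)        # attribute -> distinct values
    for row in rows:
        for attribute, value in row.items():
            if attribute == target:
                continue
            value_counts[(attribute, row[target])][value] += 1
            values_seen[attribute].add(value)
    return class_counts, value_counts, values_seen

def predict(model, record):
    """Return the class with the highest (unnormalised) posterior score."""
    class_counts, value_counts, values_seen = model
    total = sum(class_counts.values())
    scores = {}
    for label, count in class_counts.items():
        score = count / total  # prior P(class)
        for attribute, value in record.items():
            # Add-one smoothing avoids zero probabilities for unseen values.
            numerator = value_counts[(attribute, label)][value] + 1
            denominator = count + len(values_seen[attribute])
            score *= numerator / denominator
        scores[label] = score
    return max(scores, key=scores.get)

# Hypothetical customer records; NOT the study's data.
rows = [
    {"purchase_amount": "high", "time_interval": "short", "location": "city", "potential": "yes"},
    {"purchase_amount": "high", "time_interval": "short", "location": "suburb", "potential": "yes"},
    {"purchase_amount": "high", "time_interval": "long", "location": "city", "potential": "yes"},
    {"purchase_amount": "low", "time_interval": "long", "location": "suburb", "potential": "no"},
    {"purchase_amount": "low", "time_interval": "long", "location": "city", "potential": "no"},
    {"purchase_amount": "low", "time_interval": "short", "location": "suburb", "potential": "no"},
]

model = train_naive_bayes(rows, "potential")
print(predict(model, {"purchase_amount": "high", "time_interval": "short", "location": "city"}))
```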

