Analysis for Clients Churn of Credit Cards in Model Construction in Banking Industry

Data mining technology has become increasingly important in economics and financial markets. To help banks predict customer behavior, namely whether existing customers will continue to use their credit cards, we use data mining technology to construct a convenient and effective model, a Decision Tree. Because the Decision Tree model classifies customers step by step according to different features, banks can use it to predict customer behavior well. The main steps of our experiment include collecting statistics from the bank, applying Min-Max normalization to preprocess the data set, using the training data set to construct the model, evaluating the model on the testing data set, and analyzing the results.
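
The Min-Max normalization step mentioned in the abstract can be illustrated with a minimal sketch. The function name `min_max_normalize`, the target range of 0 to 1, and the sample spending figures are assumptions made for illustration; the paper's actual features and range are not given here.

```python
def min_max_normalize(values, new_min=0.0, new_max=1.0):
    """Rescale values linearly so the smallest maps to new_min and the largest to new_max."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # A constant feature has no spread; map every value to new_min.
        return [new_min for _ in values]
    return [new_min + (v - lo) / (hi - lo) * (new_max - new_min) for v in values]


# Hypothetical feature column: monthly card spend for five customers.
monthly_spend = [200.0, 450.0, 1200.0, 700.0, 200.0]
normalized = min_max_normalize(monthly_spend)
print(normalized)
```

Rescaling each feature to a common range keeps features with large raw magnitudes from dominating the preprocessing output; the same minimum and maximum found on the training data would normally be reused for the testing data.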


Applying Data Mining Techniques on Continuous Sensed Data for Daily Living Activity Recognition

Applied Mechanics and Materials ◽

10.4028/www.scientific.net/amm.738-739.191 ◽

2015 ◽

Vol 738-739 ◽

pp. 191-196

Author(s):

Yun Jie Li ◽

Hui Song

Keyword(s):

Neural Network ◽

Data Mining ◽

Decision Tree ◽

Daily Living ◽

Tree Model ◽

Data Set ◽

Data Mining Techniques ◽

Daily Living Activity ◽

The Neural Network ◽

Better Than

In this paper, several data mining techniques are discussed and analyzed with the objective of recognizing human daily activities from a continuous sensing data set. Decision tree, Naïve Bayes, and Neural Network techniques were successfully applied to the data set. The paper also proposes combining the Neural Network with the Decision Tree; the results show that the combination works much better than either the typical Neural Network or the typical Decision Tree model.


AI Testing: Ensuring a Good Data Split Between Data Sets (Training and Test) using K-means Clustering and Decision Tree Analysis

International Journal on Soft Computing ◽

10.5121/ijsc.2021.12101 ◽

2021 ◽

Vol 12 (1) ◽

pp. 1-11

Author(s):

Kishore Sugali ◽

Chris Sprunger ◽

Venkata N Inukollu

Keyword(s):

Decision Tree ◽

Software Testing ◽

Training Data ◽

Data Sets ◽

Full Data ◽

Data Set ◽

Full Dataset ◽

Development Methodology ◽

Testing Data ◽

Long Time

Artificial Intelligence and Machine Learning have been around for a long time. In recent years, there has been a surge in popularity for applications integrating AI and ML technology. As with traditional development, software testing is a critical component of a successful AI/ML application. However, the development methodology used in AI/ML differs significantly from traditional development, and these differences give rise to various software testing challenges. This paper focuses on the challenge of effectively splitting the data into training and testing data sets. By applying a k-Means clustering strategy to the data set followed by a decision tree, we can significantly increase the likelihood that the training data set represents the domain of the full data set, and thus avoid training a model that is likely to fail because it has learned only a subset of the full data domain.
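
One way to picture the goal of a representative split is a cluster-stratified split: once each sample has a cluster id, a share of every cluster is held out so that no cluster is missing from the training set. The sketch below is not the paper's procedure. It assumes the cluster labels were already produced by a clustering step such as k-means, and the function name `cluster_stratified_split` and its parameters are hypothetical.

```python
import random
from collections import defaultdict


def cluster_stratified_split(cluster_labels, test_fraction=0.25, seed=0):
    """Split sample indices so every cluster contributes to the training set.

    cluster_labels[i] is the cluster id of sample i (assumed to come from a
    prior clustering step such as k-means).
    """
    rng = random.Random(seed)
    by_cluster = defaultdict(list)
    for index, label in enumerate(cluster_labels):
        by_cluster[label].append(index)

    train, test = [], []
    for label in sorted(by_cluster):
        members = by_cluster[label]
        rng.shuffle(members)
        # Hold out a share of each cluster, but always keep at least one member for training.
        n_test = min(int(len(members) * test_fraction), len(members) - 1)
        test.extend(members[:n_test])
        train.extend(members[n_test:])
    return sorted(train), sorted(test)


# Hypothetical cluster ids for twelve samples in three clusters.
labels = [0, 0, 0, 0, 1, 1, 1, 1, 2, 2, 2, 2]
train_idx, test_idx = cluster_stratified_split(labels)
print(train_idx, test_idx)
```

A purely random split can, by chance, leave a small cluster entirely in the test set; stratifying by cluster rules that case out.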


Machine Learning-Based Detection for Cyber Security Attacks on Connected and Autonomous Vehicles

Mathematics ◽

10.3390/math8081311 ◽

2020 ◽

Vol 8 (8) ◽

pp. 1311

Author(s):

Qiyi He ◽

Xiaolin Meng ◽

Rong Qu ◽

Ruijie Xi

Keyword(s):

Machine Learning ◽

Decision Tree ◽

Cyber Security ◽

Unified Modeling Language ◽

Attack Detection ◽

Machine Learning Algorithms ◽

Training Data ◽

Cyber Attack ◽

Tree Model ◽

Data Set

Connected and Autonomous Vehicle (CAV)-related initiatives have become some of the fastest expanding in recent years and have started to affect people's daily lives. More and more companies and research organizations have announced their initiatives, and some have started CAV road trials. Governments around the world have also introduced policies to support and accelerate the deployment of CAVs. Alongside these developments, issues such as CAV cyber security have become prominent, forming an essential part of the complications of CAV deployment. There is, however, no universally agreed upon or recognized framework for CAV cyber security. In this paper, following the UK CAV cyber security principles, we propose a UML (Unified Modeling Language)-based CAV cyber security framework, on the basis of which we classify the potential vulnerabilities of CAV systems. With this framework, a new CAV communication cyber-attack data set (named CAV-KDD) is generated from the widely tested benchmark data set KDD99. This data set focuses on communication-based CAV cyber-attacks. Two classification models are developed using two machine learning algorithms, namely Decision Tree and Naive Bayes, based on the CAV-KDD training data set. The accuracy, precision, and runtime of the two models when identifying each type of communication-based attack are compared and analysed. The Decision Tree model is found to require a shorter runtime and to be more appropriate for CAV communication attack detection.
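
The accuracy and per-attack-type precision metrics named in the abstract can be computed as in the following sketch. The label sequences are invented for illustration and are not results from the CAV-KDD data set; the function names are likewise hypothetical.

```python
def accuracy(y_true, y_pred):
    """Fraction of samples whose predicted label equals the true label."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def precision(y_true, y_pred, positive):
    """Of the samples predicted as `positive`, the fraction that truly are."""
    predicted_positive = [t for t, p in zip(y_true, y_pred) if p == positive]
    if not predicted_positive:
        return 0.0
    return sum(t == positive for t in predicted_positive) / len(predicted_positive)


# Made-up true and predicted labels for six connections.
y_true = ["dos", "dos", "normal", "probe", "normal", "dos"]
y_pred = ["dos", "normal", "normal", "probe", "dos", "dos"]

print(accuracy(y_true, y_pred))
print(precision(y_true, y_pred, "dos"))
print(precision(y_true, y_pred, "probe"))
```

Reporting precision separately for each attack type shows whether a model with good overall accuracy still raises many false alarms for one particular class.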


Data Mining Implementation Using Naïve Bayes Algorithm and Decision Tree J48 In Determining Concentration Selection

International Journal of Quantitative Research and Modeling ◽

10.46336/ijqrm.v1i3.72 ◽

2020 ◽

Vol 1 (3) ◽

pp. 123-134

Author(s):

Budiman Budiman ◽

Reni Nursyanti ◽

R Yadi Rakhman Alamsyah ◽

Imannudin Akbar

Keyword(s):

Data Mining ◽

Decision Tree ◽

Naive Bayes ◽

Naïve Bayes ◽

Training Data ◽

Study Program ◽

Data Set ◽

Lower Accuracy ◽

Accuracy Result ◽

Bayes Algorithm

Computerization of society has substantially improved the ability to generate and collect data from a variety of sources, and large amounts of data have flooded almost every aspect of people's lives. AMIK HASS Bandung has an Informatic Management Study Program with three areas of concentration that students can select in the fourth semester: Computerized Accounting, Computer Administration, and Multimedia. Concentration selection should be determined precisely from past data, so the academic section needs a pattern or rule for predicting it. In this work, the data mining techniques used were Naive Bayes and Decision Tree J48, run with WEKA tools. The data set contained 111 instances and was evaluated in percentage-split mode, with 75% used as training data to build the models and 25% as test data to evaluate both models. Naive Bayes obtained the highest accuracy, 71.4%, with 20 of the 28 test instances correctly classified. Decision Tree J48 had a lower accuracy of 64.3%, with 18 of the 28 test instances correctly classified. Decision Tree J48 produced 4 patterns or rules for determining concentration selection, which the academic section can use to assist students in choosing a concentration.
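
The reported figures can be checked with a few lines of arithmetic. The sketch assumes the 75% training share is rounded to the nearest whole instance, which is an assumption about how the split was computed rather than something the abstract states.

```python
total_records = 111

# Assumed rounding: 75% of 111 is 83.25, taken as 83 training instances.
train_size = round(total_records * 0.75)
test_size = total_records - train_size


def accuracy_percent(correct, evaluated):
    """Accuracy as a percentage, rounded to one decimal place."""
    return round(100.0 * correct / evaluated, 1)


naive_bayes_acc = accuracy_percent(20, test_size)
j48_acc = accuracy_percent(18, test_size)
print(train_size, test_size, naive_bayes_acc, j48_acc)
```

Under that assumption the hold-out set has 28 instances, and 20/28 and 18/28 reproduce the 71.4% and 64.3% accuracies quoted for Naive Bayes and Decision Tree J48.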


Research on college English teaching based on data mining technology

EURASIP Journal on Wireless Communications and Networking ◽

10.1186/s13638-021-02071-6 ◽

2021 ◽

Vol 2021 (1) ◽

Author(s):

Jinhui Duan ◽

Rui Gao

Keyword(s):

Data Mining ◽

Decision Tree ◽

Data Conversion ◽

Tree Model ◽

Mining Technology ◽

Processing Technologies ◽

College English ◽

English Teaching ◽

School Teaching

To improve the efficiency and quality of college English teaching, we analyzed the feasibility and application process of data mining technology in college English teaching. The entire process of data classification mining was realized, and a new teaching program was proposed. The object and target of data mining were determined. Online surveys were used to collect data. Data integration, data cleaning, data conversion, data reduction, and other pre-processing technologies were adopted. The decision tree was generated using the C4.5 algorithm and then pruned, and the resulting decision tree model was analyzed. A detailed survey of students' English learning at university was carried out. The results showed that the qualified rate of students' English performance increased from 20–30% to 50–60%. The classification rules therefore provide theoretical support for school teaching decisions, and this method can improve the quality of English teaching.
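
C4.5 chooses the attribute to split on by gain ratio, that is, information gain divided by the entropy of the split itself. The sketch below computes that criterion for a categorical attribute; the survey attribute and pass/fail labels are invented for illustration and do not come from the paper's data.

```python
import math
from collections import Counter


def entropy(values):
    """Shannon entropy (in bits) of a list of categorical values."""
    total = len(values)
    return -sum((n / total) * math.log2(n / total) for n in Counter(values).values())


def gain_ratio(attribute_values, labels):
    """Information gain of splitting on the attribute, divided by the split information."""
    total = len(labels)
    groups = {}
    for value, label in zip(attribute_values, labels):
        groups.setdefault(value, []).append(label)
    # Weighted entropy of the class labels after the split.
    remainder = sum(len(g) / total * entropy(g) for g in groups.values())
    info_gain = entropy(labels) - remainder
    split_info = entropy(attribute_values)
    return info_gain / split_info if split_info > 0 else 0.0


# Hypothetical survey answers and outcomes for four students.
result = ["pass", "pass", "fail", "fail"]
studies_daily = ["yes", "yes", "no", "no"]       # separates the classes perfectly
owns_dictionary = ["yes", "no", "yes", "no"]     # carries no information about the class

print(gain_ratio(studies_daily, result))
print(gain_ratio(owns_dictionary, result))
```

Dividing by the split information penalizes attributes with many distinct values, which plain information gain would otherwise tend to favor.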


A Spheriform Quantization Method Based on Sub-Region Inherent Dimension

Applied Mechanics and Materials ◽

10.4028/www.scientific.net/amm.556-562.4244 ◽

2014 ◽

Vol 556-562 ◽

pp. 4244-4247

Author(s):

Zhen Duo Wang ◽

Jing Wang ◽

Huan Wang

Keyword(s):

Machine Learning ◽

Data Mining ◽

Decision Tree ◽

Real Data ◽

Data Driven ◽

Data Sets ◽

Quantization Method ◽

Data Set ◽

Testing Data ◽

C4.5 Decision Tree

Quantization is a significant task in machine learning and data mining, because many mining and learning methods in these fields, such as learning and classification, require the testing data set to contain partitioned features. In this paper, we propose a spheriform quantization method based on sub-region inherent dimension, which determines the number and size of the quantization intervals in a data-driven way. The method assumes that a quantized cluster of points can be contained in a lower intrinsic m-dimensional spheriform space of expected radius. The sample points in the spheriform can be obtained by adaptively selecting the neighborhood at the initial observation based on sub-region inherent dimension. Experimental results and analysis on real UCI data sets demonstrate that, when a C4.5 decision tree is used as the classifier, our method significantly improves classification accuracy over traditional quantization methods.


Comparison of Naive Bayes Method, K-NN (K-Nearest Neighbor) and Decision Tree for Predicting the Graduation of ‘Aisyiyah University Students of Yogyakarta

International Journal of Health Science and Technology ◽

10.31101/ijhst.v2i1.1829 ◽

2021 ◽

Vol 2 (1) ◽

Author(s):

Tikaridha Hardiani

Keyword(s):

Data Mining ◽

Decision Tree ◽

Nearest Neighbor ◽

Naive Bayes ◽

Training Data ◽

K Nearest Neighbor ◽

Classification Technique ◽

Testing Data ◽

Student Graduation ◽

The University

The number of students at Universitas ‘Aisyiyah Yogyakarta (UNISA) has been increasing, including in the Faculty of Health Sciences. In 2016 the total number of UNISA students was 1851. The growing number of students each year leads to large amounts of data stored in the university database. These data provide useful information for predicting whether students will graduate on time, with a study period of 4 years, or late, with a study period of more than 4 years. This can be done with a data mining technique, namely classification. The classification technique requires data on students who have graduated as training data and data on students who are still studying at the university as testing data. The training data comprised 501 records with 10 goals, and the testing data comprised 428 records. The data mining process followed the Cross-Industry Standard Process for Data Mining (CRISP-DM). The algorithms used in this study were Naive Bayes, K-Nearest Neighbor (K-NN), and Decision Tree. The three algorithms were compared on accuracy using RapidMiner software. The K-NN algorithm was the best at predicting student graduation, with an accuracy of 91.82%. The K-NN algorithm predicted that 100% of the students in the Nursing study program at Universitas ‘Aisyiyah Yogyakarta would graduate on time.


Data Mining Based Intelligent System for Voting Behavior Analysis

Applied Mechanics and Materials ◽

10.4028/www.scientific.net/amm.284-287.3070 ◽

2013 ◽

Vol 284-287 ◽

pp. 3070-3073

Author(s):

Duen Kai Chen

Keyword(s):

Data Mining ◽

Behavior Analysis ◽

Voting Behavior ◽

Intelligent System ◽

Data Sets ◽

Tree Model ◽

Mining Technology ◽

Identification Rate ◽

Voter Identification ◽

Election Studies

In this study, we report an intelligent system for voting behavior analysis based on data mining technology. The previous literature shows an increasing number of studies applying information technology to facilitate voting behavior analysis. We built a likely voter identification model using data mining technology; the classification algorithm constructs a decision tree model to distinguish voters from non-voters. The model is evaluated by its accuracy and by the number of attributes needed to correctly identify likely voters. Our goal is to use only a small number of survey questions while maintaining the accuracy rates of other similar models. The model was built and tested on Taiwan’s Election and Democratization Study (TEDS) data sets. According to the experimental results, the proposed model can improve the likely voter identification rate, and this finding is consistent with previous studies based on the American National Election Studies.


Predicting customer churn in mobile industry using data mining technology

Industrial Management & Data Systems ◽

10.1108/imds-12-2015-0509 ◽

2017 ◽

Vol 117 (1) ◽

pp. 90-109 ◽

Cited By ~ 16

Author(s):

Eui-Bang Lee ◽

Jinwha Kim ◽

Sang-Gun Lee

Keyword(s):

Data Mining ◽

Decision Tree ◽

Online News ◽

Partial Least Square ◽

Churn Prediction ◽

Mining Technology ◽

Content Type ◽

Customer Churn ◽

Mobile Industry ◽

Using Data

Purpose: The purpose of this paper is to identify the influence of the frequency of word exposure in online news, based on the availability heuristic concept. This differs from most churn prediction studies, which focus on subscriber data. Design/methodology/approach: This study examined churn prediction through words presented in previous studies and additionally identified words that generate churn, using data mining technology in combination with logistic regression, decision tree graphing, neural network models, and a partial least square (PLS) model. Findings: This study found prediction rates similar to those delivered by subscriber data-based analyses. In addition, because previous studies do not clearly suggest the effects of the factors, this study uses decision tree graphing and PLS modeling to identify which words deliver positive or negative influences. Originality/value: These findings imply an expansion of churn prediction, advertising effect, and various psychological studies. The study also proposes concrete ideas to advance the competitive advantage of companies, which not only helps corporate development but also improves industry-wide efficiency.


Private Evaluation of Decision Trees using Sublinear Cost

Proceedings on Privacy Enhancing Technologies ◽

10.2478/popets-2019-0015 ◽

2019 ◽

Vol 2019 (1) ◽

pp. 266-286 ◽

Cited By ~ 6

Author(s):

Anselme Tueno ◽

Florian Kerschbaum ◽

Stefan Katzenbeisser

Keyword(s):

Decision Tree ◽

Decision Trees ◽

Main Idea ◽

Computation Time ◽

Tree Model ◽

Real World Data ◽

Data Set ◽

Attribute Vector ◽

Garbled Circuits ◽

Large Trees

Decision trees are widespread machine learning models used for data classification and have many applications in areas such as healthcare, remote diagnostics, and spam filtering. In this paper, we address the problem of privately evaluating a decision tree on private data. In this scenario, the server holds a private decision tree model and the client wants to classify its private attribute vector using the server’s private model. The goal is to obtain the classification while preserving the privacy of both the decision tree and the client input. After the computation, only the classification result is revealed to the client, while nothing is revealed to the server. Many existing protocols require a constant number of rounds. However, some of these protocols perform as many comparisons as there are decision nodes in the entire tree, and others transform the whole plaintext decision tree into an oblivious program, resulting in higher communication costs. The main idea of our novel solution is to represent the tree as an array. We then execute only d comparisons, where d is the depth of the tree. Each comparison is performed using a small garbled circuit, which outputs secret shares of the index of the next node. We get the inputs to the comparison by obliviously indexing the tree and the attribute vector. We implement oblivious array indexing using either garbled circuits, Oblivious Transfer, or Oblivious RAM (ORAM). Using ORAM, this results in the first protocol with sublinear cost in the size of the tree. We implemented and evaluated our solution using the different array indexing procedures mentioned above. As a result, we are not only able to provide the first protocol with sublinear cost for large trees, but also reduce the communication cost for the large real-world data set “Spambase” from 18 MB to 1.2 MB and the computation time from 17 seconds to less than 1 second in a LAN setting, compared to the best related work.
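
The array representation and the fixed number of d comparisons can be pictured with a plaintext sketch. It offers no privacy at all: the garbled circuits, secret sharing, and oblivious indexing of the protocol are omitted. The heap-order layout of a complete binary tree, the `Node` fields, and the example tree are assumptions made for illustration and may differ from the paper's encoding.

```python
from dataclasses import dataclass


@dataclass
class Node:
    attribute: int      # index into the client's attribute vector
    threshold: float    # go left when x[attribute] < threshold, otherwise right


def evaluate(nodes, leaf_labels, x, depth):
    """Walk a complete binary tree stored in heap order using exactly `depth` comparisons."""
    index = 0
    for _ in range(depth):
        node = nodes[index]
        go_right = int(x[node.attribute] >= node.threshold)
        # Children of node i sit at positions 2i + 1 (left) and 2i + 2 (right).
        index = 2 * index + 1 + go_right
    # Indices past the decision nodes address the leaves.
    return leaf_labels[index - len(nodes)]


# Hypothetical depth-2 tree: three decision nodes followed by four leaves.
nodes = [Node(0, 5.0), Node(1, 2.0), Node(1, 8.0)]
leaf_labels = ["A", "B", "C", "D"]

print(evaluate(nodes, leaf_labels, [3.0, 9.0], depth=2))
print(evaluate(nodes, leaf_labels, [7.0, 1.0], depth=2))
```

Because each step only reads one array entry and one attribute, the work per classification grows with the depth of the tree rather than with the total number of decision nodes, which is what makes a sublinear private protocol possible once the two reads are made oblivious.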
