Rule Extraction from Privacy Preserving Neural Network: Application to Banking

2011 ◽  
Vol 403-408 ◽  
pp. 920-928 ◽  
Author(s):  
Nekuri Naveen ◽  
V. Ravi ◽  
C. Raghavendra Rao

Over the last two decades, privacy policies in areas such as banking, finance and medical research have restricted data owners from sharing data for data mining purposes. This issue has opened up a new area of research, namely privacy preserving data mining. In this paper, we propose a privacy preservation method that employs a Particle Swarm Optimization (PSO) trained Auto Associative Neural Network (PSOAANN). The modified (privacy preserved) input values are fed to a decision tree (DT) and to a rule induction algorithm, Ripper, for rule extraction. The performance of the hybrid is tested on four benchmark and bankruptcy datasets using 10-fold cross validation. The results are compared with those obtained using the original datasets, where privacy is not preserved. The proposed hybrid approach achieved good results on all datasets.

Author(s):  
Panny Agustia Rahayuningsih

Cancer is among the ten leading causes of death worldwide. It is a malignant disease that is difficult to cure once it has spread too widely. However, detecting cancer cells as early as possible can reduce the risk of death. This study aims to predict the rate of early cancer deaths among the European population using five classification algorithms (Decision Tree, Naïve Bayes, k-Nearest Neighbour, Random Forest and Neural Network) and to determine which of them is best suited to this task. The study proceeds through several stages: dataset collection, data preprocessing, the proposed method, testing the method using 10-fold cross validation, evaluation of the results, and a t-test of differences. The alpha value used is 0.05. If the probability is > 0.05, H0 is accepted; if the probability is < 0.05, H0 is rejected. The best-performing algorithm was the Neural Network, with an accuracy of 98.35%. According to the t-test, the best models are Random Forest and Neural Network; Naïve Bayes is fairly good, Decision Tree is adequate, and the weakest algorithm is k-Nearest Neighbour (k-NN).


2021 ◽  
Vol 56 (4) ◽  
pp. 220-240
Author(s):  
Shimaa Ouf ◽  
Ahmed I. B. ElSeddawy

Systems based on data mining techniques could have a crucial impact on employees’ lifestyles by predicting heart disease. Many scientific papers use data mining techniques to predict heart disease. However, few have addressed the four cross-validation techniques for splitting the data set, which play an important role in selecting the best technique for predicting heart disease. It is important to choose the optimal combination of cross-validation technique and data mining classification technique to enhance the performance of the prediction models. This paper applies four cross-validation techniques (holdout, k-fold cross-validation, stratified k-fold cross-validation, and repeated random) with eight data mining classification techniques (Linear Discriminant Analysis, Logistic Regression, Support Vector Machine, KNN, Decision Tree, Naïve Bayes, Random Forest, and Neural Network) to improve the accuracy of heart disease prediction and select the best prediction models. It analyzes these techniques on a small and a large dataset collected from different data sources such as Kaggle and the UCI machine-learning repository. Accuracy, precision, recall, and F-measure were used to measure the performance of the prediction models. Experiments on the two datasets show that for the large dataset (70000 records), the combination with the highest accuracy is holdout cross-validation with the neural network, at 71.82%. For the small dataset (303 records), Repeated Random with Random Forest is the optimal combination, with an accuracy of 89.01%. The best models will be recommended to physicians in business organizations to help them classify employees as cardiac or non-cardiac at an early stage. Early detection of heart disease in employees will improve productivity in the business organization.
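The four splitting schemes named in this abstract can be sketched as index generators. This is a minimal illustration under assumptions, not the authors' code: the function names, the 30% test fraction, and the seeds are hypothetical, and only the Python standard library is used.

```python
import random
from collections import defaultdict


def holdout_split(n, test_fraction=0.3, seed=0):
    """Single shuffled train/test split over indices 0..n-1."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    n_test = max(1, round(n * test_fraction))
    return idx[n_test:], idx[:n_test]


def k_fold_splits(n, k=10, seed=0):
    """k disjoint test folds; every index is tested exactly once."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    return [
        ([j for m, f in enumerate(folds) if m != i for j in f], folds[i])
        for i in range(k)
    ]


def stratified_k_fold_splits(labels, k=10, seed=0):
    """k-fold in which each fold keeps roughly the overall class proportions."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for i, y in enumerate(labels):
        by_class[y].append(i)
    folds = [[] for _ in range(k)]
    for members in by_class.values():
        rng.shuffle(members)
        for pos, i in enumerate(members):
            folds[pos % k].append(i)  # deal each class round-robin over folds
    return [
        ([j for m, f in enumerate(folds) if m != i for j in f], folds[i])
        for i in range(k)
    ]


def repeated_random_splits(n, repeats=5, test_fraction=0.3, seed=0):
    """Independent hold-out splits; test sets may overlap across repeats."""
    return [holdout_split(n, test_fraction, seed + r) for r in range(repeats)]
```

Each function returns `(train_indices, test_indices)` pairs, so any of the eight classifiers could be evaluated by looping over the pairs and averaging the chosen metric.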


Respati ◽  
2020 ◽  
Vol 15 (1) ◽  
pp. 30
Author(s):  
Nahrowi Hamdani ◽  
Arief Setyanto ◽  
Sudarmawan Sudarmawan

This research is motivated by the desire to make use of the academic records of students living in dormitories that provide character education through a learning program that Universitas Muhammadiyah Yogyakarta offers to some of its students. The relationship between dormitory mentoring and academic achievement on campus has not been specifically studied. Previous research found by the authors explains the relationship between campus grades and graduation. One element of the dormitory's vision is academic achievement, and data are available ranging from registration scores to dormitory report cards, together with campus graduation data. The authors therefore want to determine whether dormitory students graduate on time, which requires data mining for prediction; Logistic Regression and Neural Network were chosen as the algorithms. Using data from the 2014-2015 cohorts for training and testing, five iterations of k-fold cross validation gave an accuracy of 65% for Logistic Regression and 69% for Neural Network. The Neural Network algorithm thus tends to perform better than Logistic Regression. Keywords: data mining, graduation, classification, neural network, prediction, logistic regression.


2016 ◽  
Vol 7 (2) ◽  
pp. 75-80
Author(s):  
Adhi Kusnadi ◽  
Risyad Ananda Putra

Indonesia has a relatively large population. Over a 5-year period, the government holds an annual procurement program for 1 million FLPP housing units. The program aims to provide decent homes for low-income people. FLPP housing development requires precision and speed on the part of the developer, but it is often hampered by the bank process, because the outcome and speed of data processing at the bank are difficult to predict. Knowing a consumer's ability to obtain subsidized credit has many advantages: developers can plan cash flow better and can replace consumers who would be rejected before they enter the bank process. For that reason, a system was built to help developers. Many methods can be used to create such an application; one of them is data mining with a classification tree. Under 10-fold cross-validation, the application achieves an accuracy of 92%. Index Terms: Data Mining, Classification Tree, Housing, FLPP, 10-fold Cross-Validation, Consumer Capability


2019 ◽  
Vol 23 (1) ◽  
pp. 67-77 ◽  
Author(s):  
Yao Yevenyo Ziggah ◽  
Hu Youjian ◽  
Alfonso Rodrigo Tierra ◽  
Prosper Basommi Laari

The popularity of the Artificial Neural Network (ANN) methodology has been growing in a wide variety of areas in geodesy and the geospatial sciences. Its ability to perform coordinate transformation between different datums is well documented in the literature. When ANN methods are applied to coordinate transformation, usually only the train-test (hold-out cross-validation) approach is used to evaluate their performance. Here, the data set is divided into two disjoint subsets, namely training (model building) and testing (model validation). However, one major drawback of the hold-out procedure is inappropriate data partitioning: an improper split of the data could lead to high variance and bias in the results. Moreover, hold-out cross-validation is not suitable when the dataset is sparse. For these reasons, the K-fold cross-validation approach has been recommended. Consequently, this study, for the first time, explored the potential of the K-fold cross-validation method for assessing the performance of a radial basis function neural network (RBFNN) and the Bursa-Wolf model under a data-insufficient situation in the Ghana geodetic reference network. The statistical analysis of the results revealed that incorrect data partitioning could lead to false reporting of the predictive performance of the transformation model. The RBFNN and the Bursa-Wolf model produced transformation accuracies of 0.229 m and 0.469 m, respectively, with maximum horizontal errors of 0.881 m and 2.131 m, respectively. The obtained results are applicable under the cadastral surveying and plan production requirements set by the Ghana Survey and Mapping Division. This study will contribute to the use of the K-fold cross-validation approach in developing countries that face the same sparse dataset situation as Ghana, as well as in the geodetic sciences, where ANN users seldom apply this statistical resampling technique.
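For orientation, a K-fold error estimate of the kind described here can be written as follows. The notation is an assumption introduced for illustration and the abstract does not state the exact error measure used: the $n$ control points are partitioned into folds $D_1, \dots, D_K$, and $(\hat{E}_i^{(-k)}, \hat{N}_i^{(-k)})$ denotes the transformed easting and northing of point $i$ predicted by a model fitted without fold $D_k$.

```latex
% Horizontal error of point i, predicted by the model trained without its fold
e_i^{(-k)} = \sqrt{\bigl(E_i - \hat{E}_i^{(-k)}\bigr)^2 + \bigl(N_i - \hat{N}_i^{(-k)}\bigr)^2},
\qquad i \in D_k

% K-fold estimate: every point is used for testing exactly once
\mathrm{CV}_K = \frac{1}{n} \sum_{k=1}^{K} \sum_{i \in D_k} e_i^{(-k)}
```

Because each point contributes exactly one test error, the estimate does not depend on a single, possibly unlucky, train-test partition, which is the weakness of the hold-out procedure that the abstract points out.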


2020 ◽  
Vol 10 (6) ◽  
pp. 1999 ◽  
Author(s):  
Milica M. Badža ◽  
Marko Č. Barjaktarović

The classification of brain tumors is performed by biopsy, which is not usually conducted before definitive brain surgery. Improvements in technology and machine learning can help radiologists in tumor diagnostics without invasive measures. A machine-learning algorithm that has achieved substantial results in image segmentation and classification is the convolutional neural network (CNN). We present a new CNN architecture for brain tumor classification of three tumor types. The developed network is simpler than already-existing pre-trained networks, and it was tested on T1-weighted contrast-enhanced magnetic resonance images. The performance of the network was evaluated using four approaches: combinations of two 10-fold cross-validation methods and two databases. The generalization capability of the network was tested with one of the 10-fold methods, subject-wise cross-validation, and the improvement was tested by using an augmented image database. The best result for the 10-fold cross-validation method was obtained for the record-wise cross-validation for the augmented data set, and, in that case, the accuracy was 96.56%. With good generalization capability and good execution speed, the newly developed CNN architecture could be used as an effective decision-support tool for radiologists in medical diagnostics.
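The difference between the two 10-fold methods can be sketched as below. The sketch assumes the usual meaning of the terms, which the abstract does not spell out: record-wise folds shuffle individual images, while subject-wise folds keep all images of one patient in the same fold. The function names and the `subject_ids` input are hypothetical.

```python
import random


def record_wise_folds(n_records, k=10, seed=0):
    """Shuffle individual records; one subject's images may land in several folds."""
    idx = list(range(n_records))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def subject_wise_folds(subject_ids, k=10, seed=0):
    """Assign whole subjects to folds so that no subject spans two folds."""
    subjects = sorted(set(subject_ids))
    random.Random(seed).shuffle(subjects)
    fold_of = {s: pos % k for pos, s in enumerate(subjects)}
    folds = [[] for _ in range(k)]
    for i, s in enumerate(subject_ids):
        folds[fold_of[s]].append(i)  # record i follows its subject's fold
    return folds
```

With subject-wise folds the test images always come from patients the network has never seen, which is why that variant is the one suited to measuring generalization; record-wise folds can place images from the same patient on both sides of the split.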

