Data Mining Probabilistic Classifiers for Extracting Knowledge from Maternal Health Datasets

Data mining is an important sub-process of the Knowledge Discovery in Databases (KDD), or Knowledge Discovery Process (KDP), methodology, in which data mining techniques and algorithms are applied to the target data. In this paper, the authors attempt to discover knowledge by classifying maternal healthcare data for the State of Jammu and Kashmir, India (since declared a Union Territory by the Government of India). The data were collected from the Health Management Information System (HMIS) web portal provided by the Ministry of Health and Family Welfare (MoHFW), Government of India. They comprise diverse parameters pertaining to maternal health, and the study covers all districts of Jammu and Kashmir. Two data mining classifiers, Bayesian TAN and Naïve Bayes, were applied to classify the districts into High MMR and Low MMR districts based on data from 2014 to 2018. Accuracy, F-measure, Area Under the Curve (AUC), and Gini were used to evaluate the performance of the models built with Bayesian TAN and Naïve Bayes.
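
The evaluation measures named in this abstract are closely related: for a binary classifier, the Gini coefficient is commonly defined as 2 × AUC − 1, and the F-measure is the harmonic mean of precision and recall. The following is a minimal sketch of how these could be computed from predicted scores. The labels and scores are hypothetical and are not taken from the paper; the function names are illustrative only.

```python
def auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly; ties count 0.5."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one example of each class")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def gini(labels, scores):
    """Gini coefficient derived from AUC: Gini = 2 * AUC - 1."""
    return 2.0 * auc(labels, scores) - 1.0


def f_measure(labels, scores, threshold=0.5):
    """F1: harmonic mean of precision and recall at a fixed decision threshold."""
    preds = [1 if s >= threshold else 0 for s in scores]
    tp = sum(1 for y, p in zip(labels, preds) if y == 1 and p == 1)
    fp = sum(1 for y, p in zip(labels, preds) if y == 0 and p == 1)
    fn = sum(1 for y, p in zip(labels, preds) if y == 1 and p == 0)
    return 2 * tp / (2 * tp + fp + fn) if tp else 0.0


# Hypothetical data (assumption): 1 = High MMR district, 0 = Low MMR district.
labels = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.6, 0.3, 0.2]
print(auc(labels, scores), gini(labels, scores), f_measure(labels, scores))
```

With these made-up scores, 8 of the 9 positive/negative pairs are ranked correctly, so AUC is 8/9 and Gini is 7/9.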

2019 ◽  
Vol 8 (4) ◽  
pp. 3335-3343

Knowledge Discovery in Databases (KDD) is a methodology for discovering knowledge from very large databases through a sequence of stages: Data Selection, Data Preprocessing, Data Transformation, Data Mining, and Interpretation/Evaluation. Data mining is the sub-process of KDD in which mining algorithms are applied to the data. In this paper, the authors attempt to discover new knowledge by classifying child immunization data for the State of Jammu and Kashmir, India. The data were collected from the Health Management Information System (HMIS) web portal provided by the Ministry of Health and Family Welfare (MoHFW), Government of India. They comprise diverse parameters pertaining to the immunization of children, and the study covers all districts of Jammu and Kashmir. Two classifiers, Bayesian TAN and Naïve Bayes, were employed to classify the districts into High IMR and Low IMR districts based on data from 2014 to 2018. Several evaluation measures were used to assess the performance of the models built with Bayesian TAN and Naïve Bayes.


Author(s):  
Fauziah Nur ◽  
M. Zarlis ◽  
Benny Benyamin Nasution

Data mining is a technique for processing large volumes of data for grouping, and it is used within the Knowledge Discovery in Databases (KDD) process. Its grouping methods include Naïve Bayes, Nearest Neighbour, decision trees (KD-Tree), ID3, K-Means, text mining, and DBSCAN. In this work, the authors group new vocational high school students of the 2014/2015 academic year according to criteria in the student data, applying the K-Means clustering algorithm. Placement into a major is usually based only on a student's grades; in this study, the grouping instead uses several student criteria: parental income, the number of children the parents support, and the student's test score. The authors use these criteria so that the resulting grouping is closer to optimal. The aim is to form major groups for students using K-Means clustering. The grouping produced three groups: not admitted, software engineering, and computer and network engineering. The cluster centres are Cluster-1 = 1.4; 2.2; 2.2, Cluster-2 = 2.28; 1.64; 4, and Cluster-3 = 5; 2; 6. These centres were obtained after several iterations, yielding the optimal cluster centres.
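
The iterative procedure that yields such cluster centres can be sketched as follows. This is a minimal version of the standard K-Means (Lloyd) iteration, not the authors' implementation. The nine student records, their coding as (income, dependants, test score), and the initial centres are all hypothetical.

```python
import math


def kmeans(points, centers, max_iter=100):
    """Alternate nearest-centre assignment and centroid update until assignments stop changing."""
    centers = [list(c) for c in centers]
    assignment = None
    for _ in range(max_iter):
        new_assignment = [
            min(range(len(centers)), key=lambda k: math.dist(p, centers[k]))
            for p in points
        ]
        if new_assignment == assignment:
            break  # converged: no point changed cluster
        assignment = new_assignment
        for k in range(len(centers)):
            members = [p for p, a in zip(points, assignment) if a == k]
            if members:  # keep the previous centre if a cluster ends up empty
                centers[k] = [sum(col) / len(members) for col in zip(*members)]
    return centers, assignment


# Hypothetical coded records (assumption): (parental income, dependants, test score).
students = [
    (1, 2, 2), (2, 2, 2), (1, 3, 3),
    (2, 1, 4), (3, 2, 4), (2, 2, 5),
    (5, 2, 6), (5, 1, 6), (4, 2, 6),
]
initial_centers = [students[0], students[3], students[6]]  # illustrative choice
centers, assignment = kmeans(students, initial_centers)
print(centers)
print(assignment)
```

Each centre is a three-component vector, one component per criterion, which matches the three-valued cluster centres reported in the abstract.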


2018 ◽  
Vol 4 (1) ◽  
pp. 6-12
Author(s):  
Eka Miranda

The aim of this study is to classify customers from a transaction table using the Knowledge Discovery from Data (KDD) approach and the Naïve Bayes Classifier data mining method, producing knowledge that is useful for decisions about managing customers. Data mining with the Naïve Bayes Classifier is used to extract knowledge from this large volume of data. The customers are classified using a transaction table from the motor vehicle purchasing process. The research methods comprise data collection to establish information requirements, using the fact-finding techniques of Thomas Connolly and Carolyn Begg, which include interviews and requirements or preferences, and a knowledge discovery process following the KDD approach. The study classifies customers into two classes, potential and non-potential customers, using the predictor attributes Occupation (Pekerjaan), Payment Type (Jenis Bayar), Tenor, and Age (Usia). The results show that the Naïve Bayes Classifier was able to classify customers into these two classes, with the following values: Sensitivity 97%, Specificity 99.8%, Precision 99.8%, Recall 97%, Accuracy 97%, Error Rate 3%.
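
A Naïve Bayes classifier over categorical attributes such as these can be sketched in a few lines. The version below is a generic illustration with Laplace smoothing, not the study's implementation. The six training records and their attribute values are hypothetical; only the four attribute names and the two class labels come from the abstract.

```python
import math
from collections import Counter, defaultdict


def train_nb(rows, labels):
    """Count class frequencies and per-class attribute value frequencies."""
    class_counts = Counter(labels)
    value_counts = defaultdict(Counter)  # (attribute index, class) -> value counts
    vocab = defaultdict(set)             # attribute index -> distinct values seen
    for row, y in zip(rows, labels):
        for j, v in enumerate(row):
            value_counts[(j, y)][v] += 1
            vocab[j].add(v)
    return class_counts, value_counts, vocab


def predict_nb(model, row, alpha=1.0):
    """Return the class with the highest log posterior under the independence assumption."""
    class_counts, value_counts, vocab = model
    total = sum(class_counts.values())
    best, best_lp = None, -math.inf
    for y, n_y in class_counts.items():
        lp = math.log(n_y / total)  # log prior
        for j, v in enumerate(row):
            # Laplace-smoothed conditional probability P(value | class)
            lp += math.log((value_counts[(j, y)][v] + alpha) / (n_y + alpha * len(vocab[j])))
        if lp > best_lp:
            best, best_lp = y, lp
    return best


# Hypothetical records (assumption): (occupation, payment type, tenor, age group).
rows = [
    ("employee", "cash", "short", "adult"),
    ("employee", "credit", "long", "adult"),
    ("entrepreneur", "cash", "short", "senior"),
    ("student", "credit", "long", "young"),
    ("student", "credit", "long", "young"),
    ("unemployed", "credit", "long", "adult"),
]
labels = ["potential"] * 3 + ["not potential"] * 3
model = train_nb(rows, labels)
print(predict_nb(model, ("employee", "cash", "short", "adult")))
print(predict_nb(model, ("student", "credit", "long", "young")))
```

Smoothing keeps an attribute value that never occurred with a class from forcing that class's probability to zero.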


2019 ◽  
Vol 15 (2) ◽  
pp. 275-280
Author(s):  
Agus Setiyono ◽  
Hilman F Pardede

It is now common for cellphones to receive spam messages. The large number of messages received makes it difficult for people to classify them manually as spam or not spam. One way to overcome this problem is to use data mining for automatic classification. In this paper, we investigate three data mining techniques, namely Support Vector Machine, Multinomial Naïve Bayes, and Decision Tree, for automatic spam detection. Our experimental results show that Support Vector Machine is the best of the three evaluated algorithms: it achieves 98.33% accuracy, while Multinomial Naïve Bayes achieves 98.13% and Decision Tree 97.10%.


2020 ◽  
Vol 10 (1) ◽  
pp. 12
Author(s):  
Ekka Pujo Ariesanto Akhmad

The bank's marketing department has collected data on bank customers through marketing and promoting credit cards by telephone (telemarketing). The bank's evaluation of its credit card telemarketing has so far been neither productive nor effective. One appropriate way to evaluate the bank's credit card telemarketing reports is to use data mining techniques. The purpose of using data mining is to identify the tendencies and patterns of customers who are likely to subscribe to the credit card offered by the bank. The research method uses the Cross Industry Standard Process for Data Mining (CRISP-DM) with a Genetic Algorithm for Feature Selection (GAFS) and Naive Bayes (NB). The results show that the bank credit card telemarketing dataset has 15 attributes, consisting of 14 regular attributes and 1 special attribute. Because the bank telemarketing dataset contains high-dimensional data, the GAFS method was applied. After applying GAFS, 7 optimal attributes were obtained, consisting of 6 regular attributes and 1 special attribute. The six regular attributes are job (pekerjaan), balance, housing (rumah), loan (pinjaman), duration (durasi), and poutcome; the special attribute is target. The results show that the NB algorithm alone achieves an accuracy of 86.71%, and that combining GAFS with NB raises the accuracy to 90.27% for predicting which bank customers take up the credit card.


2018 ◽  
Vol 12 (2) ◽  
pp. 119-126 ◽  
Author(s):  
Vikas Chaurasia ◽  
Saurabh Pal ◽  
BB Tiwari

Breast cancer is the second leading cancer among women. Around 1.1 million cases were recorded in 2004. Observed rates of this cancer increase with industrialization and urbanization, and also with facilities for early detection. It remains much more common in high-income countries but is now increasing rapidly in middle- and low-income countries, including within Africa, much of Asia, and Latin America. Breast cancer is fatal in under half of all cases and is the leading cause of death from cancer in women, accounting for 16% of all cancer deaths worldwide. The objective of this paper is to report on breast cancer research in which we used available technological advancements to develop prediction models for breast cancer survivability. We used three popular data mining algorithms (Naïve Bayes, RBF Network, J48) to develop the prediction models using a large dataset (683 breast cancer cases). We also used 10-fold cross-validation to obtain an unbiased estimate of the three models' performance for comparison purposes. The results (based on average accuracy on the Breast Cancer dataset) indicated that Naïve Bayes is the best predictor, with 97.36% accuracy on the holdout sample (better than any prediction accuracy reported in the literature); RBF Network came second with 96.77% accuracy, and J48 came third with 93.41% accuracy.
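
The 10-fold cross-validation mentioned here can be sketched as below: the data are shuffled, split into ten folds, and each fold is held out once while a model is fitted on the other nine. This is a generic illustration, not the authors' setup. The labels are synthetic, and a majority-class baseline stands in for the real classifiers; only the sample count of 683 comes from the abstract.

```python
import random
from collections import Counter


def k_fold_indices(n_samples, k=10, seed=0):
    """Yield (train_idx, test_idx) pairs; every sample is held out exactly once."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        train_idx = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train_idx, folds[i]


def cross_validated_accuracy(X, y, fit, predict, k=10, seed=0):
    """Average held-out accuracy over the k folds."""
    scores = []
    for train_idx, test_idx in k_fold_indices(len(X), k, seed):
        model = fit([X[i] for i in train_idx], [y[i] for i in train_idx])
        correct = sum(predict(model, X[i]) == y[i] for i in test_idx)
        scores.append(correct / len(test_idx))
    return sum(scores) / len(scores)


# Stand-in model (assumption): always predict the most frequent training label.
def fit_majority(X_train, y_train):
    return Counter(y_train).most_common(1)[0][0]


def predict_majority(model, x):
    return model


# Synthetic data (assumption): 683 samples, roughly one third labelled 1.
X = list(range(683))
y = [1 if i % 3 == 0 else 0 for i in X]
print(cross_validated_accuracy(X, y, fit_majority, predict_majority))
```

Swapping `fit_majority` and `predict_majority` for a real classifier's training and prediction functions gives the per-model average accuracy that such comparisons report.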


2019 ◽  
Vol 3 (3) ◽  
pp. 103
Author(s):  
Ni Wayan Wardani ◽  
Ni Kadek Ariasih

Customers are one of the main assets of a retail company. A company must be able to recognise the character of its customers so that it can retain existing customers and keep them from ceasing to buy and moving to a competing retailer (churn). One suitable model for recognising customer character is the RFM (Recency, Frequency, Monetary) model. The RFM model produces customer classes, and within each class data mining can be used to analyse or predict whether a customer will remain a customer or churn. The data come from customer and sales records at UD. Mawar Sari. The customer classes produced by the RFM model for UD. Mawar Sari are Dormant, Everyday, Golden, and Superstar. The prediction models in this study were built with the Decision Tree C4.5 and Naïve Bayes algorithms. In all customer classes, Naïve Bayes performed better than Decision Tree C4.5, with Recall 95.92%, Precision 84.15%, and Accuracy 83.49%. The customer classes with high churn potential are Dormant B, Dormant E, and Dormant F.

Keywords: Churn Prediction, RFM, C4.5, Naïve Bayes
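
The three RFM quantities can be derived directly from a sales table, as in the sketch below. The transactions and the reference date are hypothetical. The abstract does not state the rules that map RFM values to the Dormant, Everyday, Golden, and Superstar classes, so no such thresholds are shown.

```python
from datetime import date


def rfm(transactions, today):
    """Summarise (customer, purchase_date, amount) rows into recency, frequency, monetary."""
    summary = {}
    for customer, when, amount in transactions:
        last, freq, money = summary.get(customer, (None, 0, 0.0))
        if last is None or when > last:
            last = when  # track the most recent purchase date
        summary[customer] = (last, freq + 1, money + amount)
    return {
        customer: {
            "recency_days": (today - last).days,  # days since last purchase
            "frequency": freq,                    # number of purchases
            "monetary": money,                    # total amount spent
        }
        for customer, (last, freq, money) in summary.items()
    }


# Hypothetical sales records (assumption).
sales = [
    ("A", date(2019, 1, 5), 100.0),
    ("A", date(2019, 1, 29), 50.0),
    ("B", date(2019, 1, 1), 20.0),
]
scores = rfm(sales, today=date(2019, 1, 31))
print(scores)
```

A long recency combined with low frequency is the usual signal of a customer at risk of churn, which is consistent with the Dormant classes being flagged in the abstract.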


2019 ◽  
Vol 4 (2) ◽  
Author(s):  
Diah Puspitasari ◽  
Syifa Sintia Al Khautsar ◽  
Wida Prima Mustika

Cooperatives are organisations that can help people, especially small and medium-sized communities. They play an important role in the community's economic growth, for example by offering basic commodities at relatively low prices, and some cooperatives also offer savings and loan services. A constraint faced by this cooperative is that borrowers find it difficult to repay loan instalments, which leads to bad credit. This is because the cooperative carries out credit analysis in person, namely by having the applicant fill out the loan application form along with the requirements and by conducting a field survey. Lending to borrowers therefore needs to be evaluated. To minimise these problems, data mining is used to detect the customer criteria that predict bad loans and to determine whether applicants are eligible for credit. The data mining technique used is classification with the Naive Bayes method. Testing of the resulting model gave an accuracy of 59%, a sensitivity (True Positive Rate, TP Rate, or Recall) of 46.80%, a specificity (True Negative Rate, TN Rate) of 69.81%, a Positive Predictive Value (PPV, or Precision) of 57.89%, and a Negative Predictive Value (NPV) of 59.67%.
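
All five reported measures follow from the four cells of a binary confusion matrix. The sketch below uses the standard definitions, with specificity computed as the true negative rate TN / (TN + FP) and precision as the PPV. The counts are an assumption: the abstract does not give the confusion matrix, and the values below are simply one set of 100 cases that reproduces the reported percentages to within rounding.

```python
def binary_metrics(tp, fn, tn, fp):
    """Standard confusion-matrix measures for a two-class problem."""
    return {
        "accuracy": (tp + tn) / (tp + fn + tn + fp),
        "sensitivity": tp / (tp + fn),  # true positive rate, recall
        "specificity": tn / (tn + fp),  # true negative rate
        "ppv": tp / (tp + fp),          # positive predictive value, precision
        "npv": tn / (tn + fn),          # negative predictive value
    }


# Assumed counts (not given in the abstract), chosen to match the reported figures.
metrics = binary_metrics(tp=22, fn=25, tn=37, fp=16)
for name, value in metrics.items():
    print(f"{name}: {value:.2%}")
```

That a single small table is consistent with all five figures suggests they were computed from one test set of about 100 records, although the abstract does not confirm this.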


2019 ◽  
Vol 1 (1) ◽  
pp. 14-28
Author(s):  
Ahmad Haidar Mirza

Data mining is a process that uses statistical techniques, mathematics, artificial intelligence, and machine learning to extract and identify useful information and related knowledge from large databases. It finds new patterns by filtering large amounts of data, using pattern recognition technology alongside statistical and mathematical techniques. The patterns found can provide information that is useful for generating economic benefit, effectiveness, and efficiency. The Naive Bayes Classifier is one data mining method that can be used to support effective and efficient promotion strategies. In this study, the Naive Bayes Classifier is used to predict interest in study programmes based on the calculations performed. The data used are new student registration data from 2014 to 2016 at Bina Darma University. The result of the study is a new model that is expected to provide important information to assist the Marketing Team of Bina Darma University Palembang in making policy and implementing an appropriate marketing strategy. The results are expected to support promotion strategies, improving the effectiveness and efficiency of promotion and increasing the number of new students who register.


Data mining usually refers to discovering specific patterns in, or analysing, data from a large dataset. Classification is an efficient data mining technique in which the classes into which data are sorted are predefined using existing datasets. Classifying medical records by symptoms using computerized methods and storing the predicted information in digital format is of great importance for diagnosing various diseases. This paper concentrates on finding the algorithm with the highest accuracy so that a cost-effective algorithm can be identified. The data mining classification algorithms are compared on their accuracy in identifying the correct records according to the diagnosis report and on their execution speed, that is, how fast the records are classified. The algorithms studied are the Naive Bayes classifier, the C4.5 tree classifier, and K-Nearest Neighbor (KNN), with the goal of predicting which is best suited to classifying any kind of medical dataset. The Breast Cancer, Iris, and Hypothyroid datasets are used to determine which of the three algorithms classifies the datasets with the highest accuracy in finding the records of patients with particular health problems. The experimental results, presented as tables and graphs, show the performance and importance of the Naïve Bayes, C4.5, and K-Nearest Neighbor algorithms. Overall, the C4.5 algorithm performed considerably better than Naïve Bayes and K-Nearest Neighbor.

