Analysis and Implementation of Data Mining Algorithms for Deploying ID3, CHAID and Naive Bayes for Random Dataset

Data mining is one of the most researched fields in computer science, and many studies have sought to extract and analyse important information from raw data. Traditional data mining techniques such as classification, clustering and statistical analysis process small-scale data with great efficiency and accuracy. Social networking interactions, business transactions and other communications, however, produce Big data: data at a scale that traditional data mining techniques cannot handle. Traditional algorithms are generally unable to store and process data at this scale, and those that can have very high response times. Big data contains hidden information that, if analysed intelligently, can be highly beneficial to business organizations. In this paper, we analyse the advancement from traditional data mining algorithms to Big data mining algorithms. Applications of traditional data mining algorithms can be incorporated straightforwardly into Big data mining algorithms. Several studies have compared traditional data mining with Big data mining, but very few have analysed the most important algorithms within a single work, which is the core motive of our paper. Readers can easily observe the differences between these algorithms, along with their pros and cons. Mathematical concepts underpin data mining algorithms: means and Euclidean distance calculations in K-means, vectors and margins in SVM, and Bayes' theorem and conditional probability in Naïve Bayes are real examples. Classification and clustering are the most important applications of data mining. In this paper, the K-means, SVM and Naïve Bayes algorithms are analysed in detail to observe their accuracy and response time from both a conceptual and an empirical perspective. Big data technologies such as Hadoop and MapReduce are used to implement the Big data mining algorithms. Performance evaluation metrics such as speedup, scaleup and response time are used to compare traditional mining with Big data mining.
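The abstract names speedup and scaleup without defining them. The sketch below uses the definitions commonly found in the parallel data mining literature, which may differ from the ones the authors applied: speedup is the baseline run time divided by the parallel run time on the same data, and scaleup is the run time for the base data on one node divided by the run time for m times the data on m nodes. The function names and all timings are hypothetical.

```python
def speedup(t_baseline: float, t_parallel: float) -> float:
    """Baseline run time divided by parallel run time on the same dataset."""
    if t_baseline <= 0 or t_parallel <= 0:
        raise ValueError("run times must be positive")
    return t_baseline / t_parallel


def scaleup(t_base: float, t_scaled: float) -> float:
    """Run time for the base data on one node divided by the run time
    for m times the data on m nodes; 1.0 would be perfect scaleup."""
    if t_base <= 0 or t_scaled <= 0:
        raise ValueError("run times must be positive")
    return t_base / t_scaled


# Hypothetical timings in seconds, for illustration only.
single_node = 120.0   # traditional single-machine run
eight_nodes = 30.0    # same data on an assumed 8-node cluster
s = speedup(single_node, eight_nodes)
efficiency = s / 8    # fraction of ideal linear speedup
print(f"speedup={s:.2f}, efficiency={efficiency:.2f}")
print(f"scaleup={scaleup(50.0, 62.5):.2f}")
```

A speedup well below the node count, as in this made-up example, usually points to communication or scheduling overhead rather than to the mining algorithm itself.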

AUMENTANDO DESEMPENHO DE ALGORITMOS DE MINERAÇÃO DE DADOS UTILIZANDO A PLATAFORMA CUDA (Increasing the Performance of Data Mining Algorithms Using the CUDA Platform)

Colloquium Exactarum ◽

10.5747/ce.2018.v10.n1.e226 ◽

2018 ◽

pp. 90-102

Author(s):

Matheus Varela Ferreira ◽

Francisco Assis da Silva ◽

Leandro Luiz de Almeida ◽

Danillo Roberto Pereira

Keyword(s):

Data Mining ◽

Comparative Analysis ◽

Data Processing ◽

Processing Time ◽

Naive Bayes ◽

Naïve Bayes ◽

Short Term ◽

Data Mining Algorithms ◽

Mining Algorithms

With the increasing need to make decisions in the short term, industries (pharmaceutical, petrochemical, aeronautics, etc.) have been seeking new ways to reduce the time the data mining process takes to yield knowledge. In recent years, many technological resources have been used to meet this need; one example is CUDA, a platform that enables GeForce GPUs to be used in conjunction with CPUs for data processing, significantly reducing processing time. This work presents a comparative analysis of the processing time of two versions of several data mining algorithms (Apriori, AprioriAll, Naïve Bayes and K-Means): one running on the CPU only and one running on the CPU in conjunction with the GPU through the CUDA platform. The experiments showed that satisfactory results can be obtained using the CUDA platform.

A Tentative analysis of Liver Disorder using Data mining Algorithms J48, Decision Table and Naive Bayes

International Journal of Computing Algorithm ◽

10.20894/ijcoa.101.006.001.009 ◽

2017 ◽

Vol 6 (1) ◽

pp. 37-40 ◽

Cited By ~ 1

Author(s):

P. Kuppan ◽

N. Manoharan

Keyword(s):

Data Mining ◽

Naive Bayes ◽

Liver Disorder ◽

Naïve Bayes ◽

Decision Table ◽

Data Mining Algorithms ◽

Using Data ◽

Mining Algorithms

Predictive Factors of Infant Mortality Using Data Mining in Iran

Journal of Comprehensive Pediatrics ◽

10.5812/compreped.108575 ◽

2021 ◽

Vol 12 (1) ◽

Author(s):

Mahmoud Hajipour ◽

Niloufar Taherpour ◽

Haleh Fateh ◽

Ebrahim Yousefi ◽

Koorosh Etemad ◽

...

Keyword(s):

Risk Factors ◽

Data Mining ◽

Infant Mortality ◽

Rural Areas ◽

Naive Bayes ◽

Naïve Bayes ◽

Support Vector ◽

Data Mining Algorithms ◽

Using Data ◽

Mining Algorithms

Objectives: Reducing infant mortality worldwide is one of the Millennium Development Goals. The aim of this study was to determine the factors related to infant mortality using data mining algorithms. Methods: This population-based case-control study was conducted in eight provinces of Iran. A total of 2,386 mothers (1,076 cases and 1,310 controls) were enrolled. Data were extracted from the mothers' health records and recorded using checklists in health centers. We employed several data mining algorithms, such as the AdaBoost classifier, Support Vector Machine, Artificial Neural Networks, Random Forests, K-Nearest Neighbors and Naïve Bayes, to identify the important predictors of infant death; a binary logistic regression model was used to clarify the role of each selected predictor. Results: In this study, 58.7% of infant deaths occurred in rural areas, and 55.6% of them were boys. Among the data mining models, Naïve Bayes and Random Forest were highly capable of predicting related factors. The results also showed that events during pregnancy (such as dental disorders, high blood pressure and loss of parents), factors related to infants (such as low birth weight) and factors related to mothers (such as consanguineous marriage and a gap between pregnancies of less than 3 years) were all risk factors, while age at pregnancy of 18 to 35 years and a high level of education were protective factors. Conclusions: Infant mortality is the consequence of a variety of factors, including factors related to infants themselves, factors related to their mothers, and events during pregnancy. Owing to the high accuracy and ability of modern modeling compared with traditional modeling, machine learning tools are recommended for identifying risk factors of infant mortality.

Naive Bayes algorithm performance for smartphone sentiment analysis in social media

International Journal Artificial Intelligent and Informatics ◽

10.33292/ijarlit.v1i2.23 ◽

2018 ◽

Vol 1 (2) ◽

pp. 76

Author(s):

Monalisa Fatmawati Sarifah

Keyword(s):

Communication Technology ◽

Naive Bayes ◽

Analytical Techniques ◽

Naïve Bayes ◽

Algorithm Performance ◽

Data Mining Algorithms ◽

Exchange Information ◽

Learning Technique ◽

Bayes Algorithm ◽

Mining Algorithms

Indonesia, with a population of 250 million, is a large market, and Millennials tend to be more adaptive to developments in communication technology [1]. This creates many opportunities for various groups, one of which is the demand for smartphones, which make it easier for people to exchange information [2]. The shift in sales among smartphone brands in Indonesia is influenced by massive advertising by smartphone vendors (smartphone capitalists) aimed at consumers [3]. Public enthusiasm for this platform is considerable, and the many comments the public makes about smartphone brands are worth processing into information. Making use of that information requires analytical techniques so that the resulting information can help many parties. The method used in this study is Naïve Bayes classification, a learning technique among data mining algorithms that uses probability and statistical methods [4]. The method is used to classify comments made by the community about smartphone brands into positive, negative and neutral comments. The purpose of this study was to find out how many positive, negative and neutral comments the community gave to smartphone brands, so that smartphone brands can more easily set policies or plan development in the future.
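The abstract does not describe the classifier's internals, so the following is only a minimal sketch of one common variant, multinomial Naïve Bayes with add-one (Laplace) smoothing, applied to three-way comment classification. The whitespace tokenizer, the function names and the tiny training set are all hypothetical and are not taken from the study.

```python
import math
from collections import Counter, defaultdict


def train(docs):
    """docs: list of (text, label). Returns the counts the classifier needs."""
    class_counts = Counter()
    word_counts = defaultdict(Counter)   # label -> word frequency
    vocab = set()
    for text, label in docs:
        tokens = text.lower().split()
        class_counts[label] += 1
        word_counts[label].update(tokens)
        vocab.update(tokens)
    return class_counts, word_counts, vocab


def predict(text, model):
    """Return the label with the highest log posterior for the comment."""
    class_counts, word_counts, vocab = model
    total_docs = sum(class_counts.values())
    scores = {}
    for label, n_docs in class_counts.items():
        total_words = sum(word_counts[label].values())
        score = math.log(n_docs / total_docs)          # log prior
        for tok in text.lower().split():
            if tok not in vocab:
                continue                               # ignore unseen words
            # Add-one smoothing avoids zero probabilities.
            score += math.log((word_counts[label][tok] + 1)
                              / (total_words + len(vocab)))
        scores[label] = score
    return max(scores, key=scores.get)


# Made-up comments, for illustration only.
training = [
    ("battery life is great", "positive"),
    ("camera is great and fast", "positive"),
    ("screen cracked and battery is bad", "negative"),
    ("slow and bad camera", "negative"),
    ("phone arrived today", "neutral"),
    ("released in march", "neutral"),
]
model = train(training)
print(predict("great camera", model))
print(predict("bad battery", model))
```

Counting the predicted labels over all collected comments would then give the per-brand totals of positive, negative and neutral comments that the study set out to measure.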

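IMPLEMENTASI DATA MINING UNTUK MEMPREDIKSI PEMESANAN DRIVER GO-JEK ONLINE DENGAN MENGGUNAKAN METODE NAIVE BAYES (STUDI KASUS: PT. GO-JEK INDONESIA) (Implementation of Data Mining to Predict Online Go-Jek Driver Orders Using the Naive Bayes Method; Case Study: PT. Go-Jek Indonesia)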

KOMIK (Konferensi Nasional Teknologi Informasi dan Komputer) ◽

10.30865/komik.v2i1.972 ◽

2018 ◽

Vol 2 (1) ◽

Author(s):

Delisman Laia ◽

Efori Buulolo ◽

Matias Julyus Fika Sirait

Keyword(s):

Data Mining ◽

Naive Bayes ◽

Naïve Bayes ◽

Training Data ◽

Transportation Industry ◽

Data Set ◽

Data Mining Algorithms ◽

Taxi Service ◽

Bayes Algorithm ◽

Using Data

PT. Go-Jek Indonesia is a service company. Go-Jek online is a technology-based motorcycle taxi service that leads the transportation industry revolution. Predicting Go-Jek driver orders with data mining algorithms addresses a problem faced by PT. Go-Jek Indonesia: forecasting the level of online driver orders in order to determine busy and quiet periods. The proposed method is Naive Bayes, an algorithm that classifies data into given classes. The purpose of this study is to examine the prediction patterns of each attribute in the data set using the Naive Bayes algorithm, and to test a model built from the training data against the testing data to see whether the resulting pattern is good. The predictions are based on previous driver order data, organized by day and time over one month. The Naive Bayes algorithm is used to predict the online Go-Jek driver orders expected each day across periods such as morning, afternoon and evening. The results of this study make it easier for the company to analyze the data on each Go-Jek driver booking when making policies that concern both drivers and customers.

Keywords: Go-jek Driver, Data Mining, Naive Bayes
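For categorical attributes such as day and time of day, Naive Bayes multiplies a class prior by one conditional probability per attribute. The sketch below illustrates that calculation with add-one smoothing. The attribute names, the "busy"/"quiet" labels and every row of data are hypothetical stand-ins, since the abstract does not give the study's actual data set or class definitions.

```python
import math
from collections import Counter, defaultdict


def fit(rows):
    """rows: list of (features_dict, label). Collects the frequency tables."""
    label_counts = Counter()
    value_counts = defaultdict(Counter)   # (label, attribute) -> value counts
    attr_values = defaultdict(set)        # attribute -> distinct values seen
    for features, label in rows:
        label_counts[label] += 1
        for attr, value in features.items():
            value_counts[(label, attr)][value] += 1
            attr_values[attr].add(value)
    return label_counts, value_counts, attr_values


def posterior(features, model):
    """Return normalized class probabilities for one set of attribute values."""
    label_counts, value_counts, attr_values = model
    n = sum(label_counts.values())
    log_scores = {}
    for label, count in label_counts.items():
        score = math.log(count / n)                      # log prior
        for attr, value in features.items():
            k = len(attr_values[attr])                   # values per attribute
            seen = value_counts[(label, attr)][value]
            score += math.log((seen + 1) / (count + k))  # add-one smoothing
        log_scores[label] = score
    # Normalize in a numerically stable way.
    top = max(log_scores.values())
    weights = {lab: math.exp(s - top) for lab, s in log_scores.items()}
    total = sum(weights.values())
    return {lab: w / total for lab, w in weights.items()}


# Made-up order history, for illustration only.
history = [
    ({"day": "mon", "period": "morning"}, "busy"),
    ({"day": "mon", "period": "afternoon"}, "quiet"),
    ({"day": "mon", "period": "evening"}, "busy"),
    ({"day": "sat", "period": "morning"}, "quiet"),
    ({"day": "sat", "period": "afternoon"}, "quiet"),
    ({"day": "sat", "period": "evening"}, "busy"),
]
model = fit(history)
result = posterior({"day": "mon", "period": "evening"}, model)
print(result)
```

The sketch assumes every queried attribute value was seen during fitting; a production version would need a policy for unseen values.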

KLASIFIKASI SMS SPAM MENGGUNAKAN SUPPORT VECTOR MACHINE (SMS Spam Classification Using Support Vector Machine)

Jurnal Pilar Nusa Mandiri ◽

10.33480/pilar.v15i2.693 ◽

2019 ◽

Vol 15 (2) ◽

pp. 275-280

Author(s):

Agus Setiyono ◽

Hilman F Pardede

Keyword(s):

Data Mining ◽

Support Vector Machine ◽

Decision Tree ◽

Naive Bayes ◽

Naïve Bayes ◽

Support Vector ◽

Spam Detection ◽

Support Vector Machine Algorithm ◽

Data Mining Techniques ◽

To Receive

It is now common for a cellphone to receive spam messages. The large number of messages received makes it difficult for a human to classify them as spam or not spam. One way to overcome this problem is to use data mining for automatic classification. In this paper, we investigate three data mining techniques for automatic spam detection: Support Vector Machine, Multinomial Naïve Bayes and Decision Tree. Our experimental results show that Support Vector Machine is the best of the three evaluated algorithms: it achieves 98.33% accuracy, while Multinomial Naïve Bayes achieves 98.13% and Decision Tree 97.10%.

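Evaluasi Telemarketing Kartu Kredit Bank Menggunakan Algoritma Genetika untuk Seleksi Fitur dan Naive Bayes (Evaluation of Bank Credit Card Telemarketing Using a Genetic Algorithm for Feature Selection and Naive Bayes)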

Jurnal Aplikasi Pelayaran dan Kepelabuhanan ◽

10.30649/japk.v10i1.71 ◽

2020 ◽

Vol 10 (1) ◽

pp. 12

Author(s):

Ekka Pujo Ariesanto Akhmad

Keyword(s):

Data Mining ◽

Naive Bayes ◽

Naïve Bayes ◽

Standard Process ◽

Industry Standard

The bank's marketing department has collected data from bank customers by marketing or promoting credit cards by telephone (telemarketing). The credit card telemarketing evaluations the bank has carried out so far have been neither very fruitful nor effective. One appropriate way to evaluate the bank's credit card telemarketing reports is to use data mining techniques. The goal of using data mining is to identify the tendencies and patterns of customers who are likely to subscribe to the credit card offered by the bank. The research method uses the Cross Industry Standard Process for Data Mining (CRISP-DM) with a Genetic Algorithm for Feature Selection (GAFS) and Naive Bayes (NB). The results show that the bank credit card telemarketing dataset has 15 attributes, consisting of 14 regular attributes and 1 special attribute. Because the bank telemarketing dataset contains high-dimensional data, the GAFS method was applied. After applying GAFS, 7 optimal attributes were obtained, consisting of 6 regular attributes and 1 special attribute. The six regular attributes are job, balance, housing, loan, duration and poutcome; the special attribute is the target. The results show that the NB algorithm achieves an accuracy of 86.71%, and that GAFS combined with NB raises the accuracy to 90.27% for predicting which bank customers will take up a credit card.

Novel Adverse Events of Iloperidone: A Disproportionality Analysis in US Food and Drug Administration Adverse Event Reporting System (FAERS) Database

Current Drug Safety ◽

10.2174/1574886313666181026100000 ◽

2019 ◽

Vol 14 (1) ◽

pp. 21-26 ◽

Cited By ~ 2

Author(s):

Viswam Subeesh ◽

Eswaran Maheswari ◽

Hemendra Singh ◽

Thomas Elsa Beulah ◽

Ann Mary Swaroop

Keyword(s):

Data Mining ◽

Adverse Event ◽

Adverse Events ◽

Reporting System ◽

Adverse Event Reporting System ◽

Adverse Event Reporting ◽

Disproportionality Analysis ◽

Positive Signal ◽

Data Mining Algorithms ◽

Mining Algorithms

Background: A signal is defined as “reported information on a possible causal relationship between an adverse event and a drug, of which the relationship is unknown or incompletely documented previously”. Objective: To detect novel adverse events of iloperidone by disproportionality analysis of the FDA Adverse Event Reporting System (FAERS) database using Data Mining Algorithms (DMAs). Methodology: The US FAERS database contains 1028 iloperidone-associated Drug Event Combinations (DECs) reported from 2010 Q1 to 2016 Q3. We considered a DEC for disproportionality analysis only if at least ten reports were present in the database for the given adverse event and the event had not been detected earlier (in clinical trials). Two data mining algorithms, namely Reporting Odds Ratio (ROR) and Information Component (IC), were applied retrospectively over this period. Values of ROR - 1.96SE > 1 and IC - 2SD > 0 were taken as the thresholds for a positive signal. Results: The mean age of patients with iloperidone-associated events was 44 years [95% CI: 36-51]; age was not mentioned in twenty-one reports. The data mining algorithms showed a positive signal for akathisia (ROR - 1.96SE = 43.15, IC - 2SD = 2.99), dyskinesia (21.24, 3.06), peripheral oedema (6.67, 1.08), priapism (425.7, 9.09) and sexual dysfunction (26.6, 1.5), all of which were well above the pre-set thresholds. Conclusion: Data mining of the FDA AERS database generated five potential signals associated with iloperidone. These results call for further clinical surveillance to quantify and validate the possible risks of the adverse events reported for iloperidone.
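The ROR criterion above can be illustrated with a 2x2 contingency table. The sketch below assumes the conventional reading of "ROR - 1.96SE" as the lower limit of the 95% confidence interval computed on the log scale, which the abstract does not spell out. The function names and the report counts are hypothetical and are not taken from FAERS. The Information Component is omitted because its calculation depends on prior choices the abstract does not state.

```python
import math


def reporting_odds_ratio(a, b, c, d):
    """ROR with its 95% confidence interval from a 2x2 table.

    a: reports with the drug and the event
    b: reports with the drug and any other event
    c: reports with other drugs and the event
    d: reports with other drugs and any other event
    """
    if min(a, b, c, d) <= 0:
        raise ValueError("all four cell counts must be positive")
    ror = (a * d) / (b * c)
    se_log = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)  # SE of ln(ROR)
    lower = math.exp(math.log(ror) - 1.96 * se_log)
    upper = math.exp(math.log(ror) + 1.96 * se_log)
    return ror, lower, upper


def is_positive_signal(a, b, c, d, min_reports=10):
    """Apply the minimum report count and the lower-limit > 1 threshold."""
    _, lower, _ = reporting_odds_ratio(a, b, c, d)
    return a >= min_reports and lower > 1.0


# Hypothetical counts, for illustration only.
ror, lower, upper = reporting_odds_ratio(20, 1000, 500, 1_000_000)
print(f"ROR={ror:.2f}, 95% CI lower={lower:.2f}, upper={upper:.2f}")
print(is_positive_signal(20, 1000, 500, 1_000_000))
```

The `min_reports=10` default mirrors the study's rule of considering only combinations with at least ten reports.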
