Migrating From Data Mining to Big Data Mining

2018 ◽  
Vol 7 (3.4) ◽  
pp. 13
Author(s):  
Gourav Bathla ◽  
Himanshu Aggarwal ◽  
Rinkle Rani

Data mining is one of the most researched fields in computer science. Much research has been carried out to extract and analyse important information from raw data. Traditional data mining algorithms such as classification, clustering and statistical analysis can process small-scale data with great efficiency and accuracy. Social networking interactions, business transactions and other communications produce Big data: data at a scale that is beyond the capability of traditional data mining techniques. Traditional data mining algorithms are observed to be incapable of storing and processing data at this scale, and where some algorithms are capable, their response time is very high. Big data contains hidden information which, if analysed intelligently, can be highly beneficial for business organizations. In this paper, we analyse the advancement from traditional data mining algorithms to Big data mining algorithms. Applications of traditional data mining algorithms can be incorporated into Big data mining algorithms in a straightforward way. Several studies have compared traditional data mining with Big data mining, but very few have analysed the most important algorithms within one research work, which is the core motive of our paper. Readers can easily observe the differences between these algorithms, along with their pros and cons. Mathematical concepts are applied in data mining algorithms: means and Euclidean distance calculation in K-means, vectors and margins in SVM, and Bayes' theorem and conditional probability in the Naïve Bayes algorithm are real examples. Classification and clustering are the most important applications of data mining. In this paper, the K-means, SVM and Naïve Bayes algorithms are analysed in detail to observe accuracy and response time from both conceptual and empirical perspectives. Big data technologies such as Hadoop and MapReduce are used for implementing Big data mining algorithms. Performance evaluation metrics such as speedup, scaleup and response time are used to compare traditional mining with Big data mining.
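The role of means and Euclidean distance in K-means, mentioned in the abstract above, can be illustrated with a minimal single-machine sketch. This is not the paper's implementation (which targets Hadoop and MapReduce); the function names `euclidean`, `mean` and `kmeans`, the fixed initial centroids and the `max_iter` parameter are assumptions made for illustration only.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two points of equal dimension."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def mean(points):
    """Coordinate-wise mean of a non-empty list of points."""
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))


def kmeans(points, centroids, max_iter=100):
    """Plain K-means: assign each point to its nearest centroid,
    then move each centroid to the mean of its cluster.

    `centroids` holds the caller-supplied starting positions
    (an assumption; real implementations usually pick them randomly).
    """
    centroids = [tuple(c) for c in centroids]
    clusters = [[] for _ in centroids]
    for _ in range(max_iter):
        # Assignment step: nearest centroid by Euclidean distance.
        clusters = [[] for _ in centroids]
        for p in points:
            idx = min(range(len(centroids)),
                      key=lambda i: euclidean(p, centroids[i]))
            clusters[idx].append(p)
        # Update step: an empty cluster keeps its previous centroid.
        new_centroids = [mean(c) if c else centroids[i]
                         for i, c in enumerate(clusters)]
        if new_centroids == centroids:
            break  # converged: centroids stopped moving
        centroids = new_centroids
    return centroids, clusters


if __name__ == "__main__":
    data = [(0, 0), (0, 2), (10, 10), (10, 12)]
    final_centroids, final_clusters = kmeans(data, [(0, 0), (10, 10)])
    print(final_centroids)
    print(final_clusters)
```

In a MapReduce setting, the assignment step would typically map onto the map phase and the mean computation onto the reduce phase, although the abstract does not describe the authors' exact decomposition.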

2020 ◽  
Vol 1641 ◽  
pp. 012068
Author(s):  
Diah Puspitasari ◽  
Kresna Ramanda ◽  
Adi Supriyatna ◽  
Mochamad Wahyudi ◽  
Erma Delima Sikumbang ◽  
...  

2018 ◽  
pp. 90-102
Author(s):  
Matheus Varela Ferreira ◽  
Francisco Assis da Silva ◽  
Leandro Luiz de Almeida ◽  
Danillo Roberto Pereira

With the increasing need to make decisions in the short term, industry (pharmaceutical, petrochemical, aeronautics, etc.) has been seeking new ways to reduce the time the data mining process takes to yield knowledge. In recent years, many technological resources have been used to address this need; one example is CUDA. CUDA is a platform that enables the use of GeForce GPUs in conjunction with CPUs for data processing, significantly reducing processing time. This work performs a comparative analysis of the processing time of two versions of several data mining algorithms (Apriori, AprioriAll, Naïve Bayes and K-Means), one running on the CPU only and one on the CPU in conjunction with the GPU through the CUDA platform. The experiments showed that satisfactory results can be obtained using the CUDA platform.


2021 ◽  
Vol 12 (1) ◽  
Author(s):  
Mahmoud Hajipour ◽  
Niloufar Taherpour ◽  
Haleh Fateh ◽  
Ebrahim Yousefi ◽  
Koorosh Etemad ◽  
...  

Objectives: Reducing infant mortality worldwide is one of the Millennium Development Goals. The aim of this study was to determine the factors related to infant mortality using data mining algorithms. Methods: This population-based case-control study was conducted in eight provinces of Iran. A total of 2,386 mothers (1,076 cases and 1,310 controls) were enrolled in this study. Data were extracted from the mothers' health records and recorded using checklists in health centers. We employed several data mining algorithms, including the AdaBoost classifier, Support Vector Machine, Artificial Neural Networks, Random Forests, K-nearest neighbors, and Naïve Bayes, to identify the important predictors of infant death; a binary logistic regression model was used to clarify the role of each selected predictor. Results: In this study, 58.7% of infant deaths occurred in rural areas, of which 55.6% were boys. Among the data mining models, Naïve Bayes and Random Forest were highly capable of predicting related factors. The results also showed that events during pregnancy (such as dental disorders, high blood pressure and loss of parents), factors related to infants (such as low birth weight), and factors related to mothers (such as consanguineous marriage and a gap between pregnancies of < 3 years) were all risk factors, while age at pregnancy (18-35 years) and a high level of education were protective factors. Conclusions: Infant mortality is the consequence of a variety of factors, including factors related to infants themselves, to their mothers, and to events during pregnancy. Owing to the high accuracy and ability of modern modeling compared to traditional modeling, it is recommended to use machine learning tools to identify risk factors for infant mortality.

