Various viewpoints analysis of the actual and large-scale data by using the data mining technique

Abstract

This paper concerns the evolutionary induction of decision trees (DTs) for large-scale data. Such a global approach is an alternative to top-down inducers. It searches for the tree structure and the tests simultaneously, and in many situations this yields more accurate and smaller classifiers. However, as a population-based, iterative approach, it can be too computationally demanding to apply directly to big data mining. The paper demonstrates that this barrier can be overcome by smart distributed/parallel processing. We also ask whether the global approach can truly compete with greedy systems on large-scale data. For this purpose, we propose a novel multi-GPU approach. It combines knowledge of global DT induction and evolutionary algorithm parallelization with efficient use of GPU memory and computing resources. The searches for the tree structure and tests are performed simultaneously on a CPU, while the fitness calculations are delegated to GPUs. A data-parallel decomposition strategy and the CUDA framework are applied. Experimental validation is performed on both artificial and real-life datasets, and in both cases the obtained acceleration is very satisfactory. The solution can process even billions of instances in a few hours on a single workstation equipped with 4 GPUs. The impact of data characteristics (size and dimension) on the convergence and speedup of the evolutionary search is also shown. As the number of GPUs grows, nearly linear scalability is observed, which suggests that the data size boundaries for evolutionary DT mining are fading.
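The division of labour described above (evolutionary search on the CPU, fitness evaluation spread across GPUs by splitting the data) can be illustrated with a small Python sketch. It is not the authors' CUDA implementation: each data chunk stands in for one GPU and a thread pool stands in for the devices. The `Node` tree representation, the helper names, and the fitness formula with its size penalty `alpha` are hypothetical choices made only for illustration. The point it shows is that each device returns only a partial error count, so the host combines just a few integers per candidate tree.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass
from typing import List, Optional, Sequence, Tuple

# One labelled instance: (feature vector, class label).
Instance = Tuple[Sequence[float], int]


@dataclass
class Node:
    """Hypothetical binary DT node: an inner node tests x[feature] <= threshold, a leaf holds a label."""
    feature: int = 0
    threshold: float = 0.0
    left: Optional["Node"] = None
    right: Optional["Node"] = None
    label: Optional[int] = None

    def is_leaf(self) -> bool:
        return self.label is not None


def predict(node: Node, x: Sequence[float]) -> int:
    # Route the instance down the tree until a leaf is reached.
    while not node.is_leaf():
        node = node.left if x[node.feature] <= node.threshold else node.right
    return node.label


def tree_size(node: Node) -> int:
    # Total number of nodes (inner nodes plus leaves).
    if node.is_leaf():
        return 1
    return 1 + tree_size(node.left) + tree_size(node.right)


def count_errors(tree: Node, chunk: List[Instance]) -> int:
    # Work done by one simulated "device": misclassifications in its slice of the data.
    return sum(1 for x, y in chunk if predict(tree, x) != y)


def split(data: List[Instance], k: int) -> List[List[Instance]]:
    # Data-parallel decomposition into at most k contiguous chunks (assumes non-empty data).
    size = -(-len(data) // k)  # ceiling division
    return [data[i:i + size] for i in range(0, len(data), size)]


def fitness(tree: Node, data: List[Instance], n_devices: int = 4, alpha: float = 0.01) -> float:
    # Assumed fitness (lower is better): error rate plus alpha * tree size.
    # This is an illustrative choice, not the formula used in the paper.
    chunks = split(data, n_devices)
    with ThreadPoolExecutor(max_workers=n_devices) as pool:
        partial_errors = pool.map(lambda chunk: count_errors(tree, chunk), chunks)
        errors = sum(partial_errors)  # host-side reduction of per-device counts
    return errors / len(data) + alpha * tree_size(tree)
```

For a one-split tree that misclassifies one of four instances, `fitness(tree, data, n_devices=4, alpha=0.01)` gives roughly 0.25 + 0.01 * 3 = 0.28. Because only counts cross the device boundary, the reduction cost stays small as devices are added, which is one plausible reason a data-parallel decomposition can scale well.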

Classification and metaclassification in large scale data mining application for estimation of software projects

2010 IEEE 9th International Conference on Cybernetic Intelligent Systems ◽

10.1109/ukricis.2010.5898136 ◽

2010 ◽

Cited By ~ 1

Author(s):

Dorota Dzega ◽

Wieslaw Pietruszkiewicz

Keyword(s):

Data Mining ◽

Large Scale ◽

Software Projects ◽

Large Scale Data ◽

Data Mining Application ◽

Scale Data

Large-Scale Data Mining to Optimize Patient-Centered Scheduling at Health Centers

Journal of Healthcare Informatics Research ◽

10.1007/s41666-018-0030-0 ◽

2018 ◽

Vol 3 (1) ◽

pp. 1-18

Author(s):

Kislaya Kunjan ◽

Huanmei Wu ◽

Tammy R. Toscos ◽

Bradley N. Doebbeling

Keyword(s):

Data Mining ◽

Large Scale ◽

Health Centers ◽

Patient Centered ◽

Large Scale Data ◽

Scale Data

Large scale data mining approach for gene-specific standardization of microarray gene expression data

Bioinformatics ◽

10.1093/bioinformatics/btl500 ◽

2006 ◽

Vol 22 (23) ◽

pp. 2898-2904 ◽

Cited By ~ 12

Author(s):

S. Yoon ◽

Y. Yang ◽

J. Choi ◽

J. Seong

Keyword(s):

Gene Expression ◽

Data Mining ◽

Large Scale ◽

Microarray Gene Expression Data ◽

Expression Data ◽

Microarray Gene Expression ◽

Large Scale Data ◽

Data Mining Approach ◽

Microarray Gene ◽

Scale Data

Scalable 2-Pass Data Mining Technique for Large Scale Spatio-temporal Datasets

Lecture Notes in Computer Science - Knowledge-Based Intelligent Information and Engineering Systems ◽

10.1007/978-3-540-74827-4_99 ◽

2007 ◽

pp. 785-792

Author(s):

Tahar Kechadi ◽

Michela Bertolotto

Keyword(s):

Data Mining ◽

Large Scale ◽

Data Mining Technique ◽

Mining Technique ◽

Spatio Temporal

Large scale data mining using genetics-based machine learning

Proceeding of the fifteenth annual conference companion on Genetic and evolutionary computation conference companion - GECCO '13 Companion ◽

10.1145/2464576.2480807 ◽

2013 ◽

Author(s):

Jaume Bacardit ◽

Xavier Llorà

Keyword(s):

Machine Learning ◽

Data Mining ◽

Large Scale ◽

Large Scale Data ◽

Scale Data

Large-scale data mining using genetics-based machine learning

Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery ◽

10.1002/widm.1078 ◽

2013 ◽

Vol 3 (1) ◽

pp. 37-61 ◽

Cited By ~ 38

Author(s):

Jaume Bacardit ◽

Xavier Llorà

Keyword(s):

Machine Learning ◽

Data Mining ◽

Large Scale ◽

Large Scale Data ◽

Scale Data

Introduction to Special Issue on Large-Scale Data Mining

ACM Transactions on Knowledge Discovery from Data ◽

10.1145/1921632.1921633 ◽

2011 ◽

Vol 5 (2) ◽

pp. 1-1

Keyword(s):

Data Mining ◽

Large Scale ◽

Special Issue ◽

Large Scale Data ◽

Scale Data

Hierarchical visual data mining for large-scale data

Computational Statistics ◽

10.1007/bf02915281 ◽

2004 ◽

Vol 19 (1) ◽

pp. 147-158 ◽

Cited By ~ 5

Author(s):

Matthew Ward ◽

Wei Peng ◽

Xiaoning Wang

Keyword(s):

Data Mining ◽

Large Scale ◽

Visual Data ◽

Visual Data Mining ◽

Large Scale Data ◽

Scale Data

PENGELOMPOKAN PERSENTASE BUTA HURUF UMUR 15-44 MENURUT PROVINSI MENGGUNAKAN ALGORITMA K-MEANS [Clustering the Illiteracy Percentage at Ages 15-44 by Province Using the K-Means Algorithm]

KLIK - KUMPULAN JURNAL ILMU KOMPUTER ◽

10.20527/klik.v7i3.329 ◽

2020 ◽

Vol 7 (3) ◽

pp. 230

Author(s):

Saifullah Saifullah ◽

Nani Hidayati

Keyword(s):

Data Mining ◽

Human Resources ◽

Data Clustering ◽

Large Scale ◽

Market Basket ◽

Large Scale Data ◽

The Government ◽

Large Scale Data Processing ◽

Scale Data

Data mining is a method often needed in large-scale data processing, so it has important applications in many fields, including industry, finance, weather, and science and technology. Data mining techniques include methods such as classification, clustering, regression, variable selection, and market basket analysis. Illiteracy is one of the factors that limit the quality of human resources, and eradicating illiteracy in the community is one of the basic requirements for improving that quality. The purpose of this study is to cluster the illiterate population by province in Indonesia. Clustering the illiteracy data for the 15-44 age group yields 1 node (province) in the high group, 27 nodes in the low group, and 6 nodes in the medium group. The results of this study serve as input for the government in determining province-based illiteracy eradication policies in Indonesia.

Keywords: Illiteracy, Data Mining, K-Means Clustering
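The study groups provinces into high, medium, and low illiteracy clusters with K-means, but the abstract does not state the features, initialisation, or distance measure used. The following Python sketch shows Lloyd's K-means on a single assumed feature (each province's illiteracy percentage) with k = 3. The function names, the deterministic seeding, and the province names and rates in the example are hypothetical and are not data from the paper.

```python
from typing import Dict, List, Optional, Sequence, Tuple


def kmeans_1d(values: Sequence[float], k: int = 3, max_iter: int = 100) -> Tuple[List[int], List[float]]:
    """Lloyd's K-means on one feature; returns (cluster index per value, centroids). Requires k >= 2."""
    ordered = sorted(values)
    # Deterministic initialisation: centroids spread evenly over the sorted values
    # (an assumption for reproducibility; random or k-means++ seeding is more common).
    centroids = [ordered[round(i * (len(ordered) - 1) / (k - 1))] for i in range(k)]
    assignment: Optional[List[int]] = None
    for _ in range(max_iter):
        # Assignment step: each value joins its nearest centroid.
        new_assignment = [min(range(k), key=lambda j: abs(v - centroids[j])) for v in values]
        if new_assignment == assignment:
            break  # converged: assignments no longer change
        assignment = new_assignment
        # Update step: move each centroid to the mean of its members.
        for j in range(k):
            members = [v for v, a in zip(values, assignment) if a == j]
            if members:
                centroids[j] = sum(members) / len(members)
    return assignment, centroids


def group_provinces(names: Sequence[str], rates: Sequence[float]) -> Dict[str, List[str]]:
    # Cluster into three groups and name them by centroid order: low < medium < high.
    assignment, centroids = kmeans_1d(rates, k=3)
    order = sorted(range(3), key=lambda j: centroids[j])
    tag = {order[0]: "low", order[1]: "medium", order[2]: "high"}
    groups: Dict[str, List[str]] = {"low": [], "medium": [], "high": []}
    for name, cluster in zip(names, assignment):
        groups[tag[cluster]].append(name)
    return groups
```

For instance, with invented rates `[1.0, 1.2, 1.1, 5.0, 5.3, 12.0]` for provinces `P1` to `P6`, `group_provinces` places P1 to P3 in the low group, P4 and P5 in the medium group, and P6 in the high group. Naming clusters by sorted centroid is what turns K-means' arbitrary cluster indices into ordered high/medium/low labels like those the study reports.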