Various viewpoints analysis of the actual and large-scale data by using the data mining technique

Author(s):  
K. Tamura ◽  
K. Matsuura ◽  
H. Imai
Author(s):  
Krzysztof Jurczuk ◽  
Marcin Czajkowski ◽  
Marek Kretowski

Abstract: This paper concerns the evolutionary induction of decision trees (DTs) for large-scale data. Such a global approach is one of the alternatives to top-down inducers. It searches for the tree structure and the tests simultaneously, and thus in many situations improves both the prediction accuracy and the size of the resulting classifiers. However, it is a population-based, iterative approach that can be too computationally demanding to apply directly to big data mining. The paper demonstrates that this barrier can be overcome by smart distributed/parallel processing. Moreover, we ask whether the global approach can truly compete with greedy systems on large-scale data. For this purpose, we propose a novel multi-GPU approach. It combines knowledge of global DT induction and evolutionary algorithm parallelization with efficient utilization of GPU memory and computing resources. The searches for the tree structure and the tests are performed simultaneously on a CPU, while the fitness calculations are delegated to GPUs. A data-parallel decomposition strategy and the CUDA framework are applied. Experimental validation is performed on both artificial and real-life datasets. In both cases, the obtained acceleration is very satisfactory. The solution is able to process even billions of instances in a few hours on a single workstation equipped with 4 GPUs. The impact of data characteristics (size and dimension) on the convergence and speedup of the evolutionary search is also shown. When the number of GPUs grows, nearly linear scalability is observed, which suggests that data size boundaries for evolutionary DT mining are fading.
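The division of labor described in this abstract (search on the host, fitness evaluation split across devices by data) can be illustrated with a minimal sketch. This is not the authors' implementation: plain Python functions stand in for CUDA kernels, and the names `split_chunks`, `count_errors`, `accuracy`, and `stump`, as well as the toy data, are hypothetical.

```python
# Sketch only: plain Python functions stand in for GPU kernels.
from typing import Callable, List, Sequence, Tuple

Instance = Tuple[Sequence[float], int]  # (features, class label)
Tree = Callable[[Sequence[float]], int]  # a tree maps features to a label


def split_chunks(data: List[Instance], n_devices: int) -> List[List[Instance]]:
    """Data-parallel decomposition: one contiguous chunk per device."""
    size = (len(data) + n_devices - 1) // n_devices  # ceiling division
    return [data[i * size:(i + 1) * size] for i in range(n_devices)]


def count_errors(tree: Tree, chunk: List[Instance]) -> int:
    """Work done per device: count misclassified instances in its chunk."""
    return sum(1 for features, label in chunk if tree(features) != label)


def accuracy(tree: Tree, data: List[Instance], n_devices: int = 4) -> float:
    """Host side: gather the partial error counts and reduce them."""
    chunks = split_chunks(data, n_devices)
    errors = sum(count_errors(tree, chunk) for chunk in chunks)
    return 1.0 - errors / len(data)


def stump(features: Sequence[float]) -> int:
    """Hypothetical candidate tree with a single test."""
    return 1 if features[0] > 0.5 else 0


# Toy data, assumed for illustration only.
data = [([0.1], 0), ([0.9], 1), ([0.7], 0), ([0.3], 0), ([0.8], 1)]
print(accuracy(stump, data))
```

Because each chunk contributes only an integer error count, the reduction on the host is cheap, which is one plausible reason a data-parallel split scales well as devices are added.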


2018 ◽  
Vol 3 (1) ◽  
pp. 1-18
Author(s):  
Kislaya Kunjan ◽  
Huanmei Wu ◽  
Tammy R. Toscos ◽  
Bradley N. Doebbeling

2004 ◽  
Vol 19 (1) ◽  
pp. 147-158
Author(s):  
Matthew Ward ◽  
Wei Peng ◽  
Xiaoning Wang

2020 ◽  
Vol 7 (3) ◽  
pp. 230
Author(s):  
Saifullah Saifullah ◽  
Nani Hidayati

Data mining is a method often needed in large-scale data processing, so it plays an important role in many fields, including industry, finance, weather, science, and technology. Data mining techniques include classification, clustering, regression, variable selection, and market basket analysis. Illiteracy is one of the factors that hinder the quality of human resources, and eradicating illiteracy in the community is one of the basic requirements for improving that quality. The purpose of this study is to cluster illiterate communities by province in Indonesia. The result is a clustering of illiteracy data according to the proportion aged 15-44: the high group contains 1 node, the low group 27 nodes, and the medium group 6 nodes. The results of this study serve as input for the government in determining illiteracy eradication policies in Indonesia by province.

Keywords: Illiteracy, Data mining, K-Means Clustering
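As a rough illustration of grouping provinces into low, medium, and high clusters with K-means, the sketch below runs Lloyd's algorithm on a single feature. It assumes one illiteracy rate per province; the rates, the initial centroids, and the function name `kmeans_1d` are hypothetical and are not taken from the study.

```python
def kmeans_1d(values, centroids, iterations=100):
    """Lloyd's algorithm on one-dimensional data."""
    centroids = list(centroids)
    for _ in range(iterations):
        # Assignment step: attach each value to its nearest centroid.
        clusters = [[] for _ in centroids]
        for v in values:
            nearest = min(range(len(centroids)),
                          key=lambda i: abs(v - centroids[i]))
            clusters[nearest].append(v)
        # Update step: move each centroid to the mean of its cluster.
        updated = [sum(c) / len(c) if c else centroids[i]
                   for i, c in enumerate(clusters)]
        if updated == centroids:  # assignments have stabilized
            break
        centroids = updated
    return centroids, clusters


# Hypothetical illiteracy rates in percent, not data from the study.
rates = [0.5, 0.8, 1.1, 1.3, 4.9, 5.4, 6.0, 21.0]
# Assumed initialization: minimum, mean, and maximum of the data.
initial = [min(rates), sum(rates) / len(rates), max(rates)]

centroids, clusters = kmeans_1d(rates, initial)
for name, members in zip(["low", "medium", "high"], clusters):
    print(name, members)
```

Starting the centroids in ascending order keeps the cluster indices aligned with the low, medium, and high labels for this data; a random initialization would need the centroids sorted before labeling.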

