Management of Data Streams for Large-Scale Data Mining

Author(s):  
Jon R. Wright ◽  
Gregg T. Vesonder ◽  
Tamraparni Dasu

In an enterprise setting, a major challenge for any data mining operation is managing data streams or feeds, both data and metadata, to ensure a stable and certifiably accurate flow of data. Data feeds in this environment can be complex, numerous and opaque. The management of frequently changing data and metadata presents a considerable challenge. In this paper, we articulate the technical issues involved in the task of managing enterprise data and propose a multi-disciplinary solution, derived from fields such as knowledge engineering and statistics, to understand, standardize, and automate information acquisition and quality management in preparation for enterprise mining.

2008 ◽  
pp. 2644-2658


2011 ◽  
Vol 26 (1) ◽  
pp. 25-29 ◽  
Author(s):  
Frans Coenen

Abstract: Data mining has become a well-established discipline within the domain of artificial intelligence (AI) and knowledge engineering (KE). It has its roots in machine learning and statistics, but encompasses other areas of computer science. It has received much interest over the last decade as advances in computer hardware have provided the processing power to enable large-scale data mining to be conducted. Unlike other innovations in AI and KE, data mining can be argued to be an application rather than a technology and thus can be expected to remain topical for the foreseeable future. This paper presents a brief review of the history of data mining, up to the present day, and some insights into future directions.


Author(s):  
Krzysztof Jurczuk ◽  
Marcin Czajkowski ◽  
Marek Kretowski

Abstract: This paper concerns the evolutionary induction of decision trees (DT) for large-scale data. Such a global approach is one of the alternatives to top-down inducers. It searches for the tree structure and the tests simultaneously, and in many situations this improves both the predictive performance and the size of the resulting classifiers. However, it is a population-based, iterative approach that can be too computationally demanding to apply directly to big data mining. The paper demonstrates that this barrier can be overcome by smart distributed/parallel processing. Moreover, we ask whether the global approach can truly compete with greedy systems on large-scale data. For this purpose, we propose a novel multi-GPU approach. It combines knowledge of global DT induction and evolutionary algorithm parallelization with efficient use of GPU memory and computing resources. The searches for the tree structure and tests are performed simultaneously on a CPU, while the fitness calculations are delegated to GPUs. A data-parallel decomposition strategy and the CUDA framework are applied. Experimental validation is performed on both artificial and real-life datasets. In both cases, the obtained acceleration is very satisfactory. The solution is able to process even billions of instances in a few hours on a single workstation equipped with 4 GPUs. The impact of data characteristics (size and dimension) on the convergence and speedup of the evolutionary search is also shown. As the number of GPUs grows, nearly linear scalability is observed, which suggests that the data-size boundaries for evolutionary DT mining are fading.
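The data-parallel decomposition described in this abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: worker threads stand in for GPUs, the tuple-based tree encoding and the names `predict`, `chunk_errors`, `split`, and `fitness` are hypothetical, and fitness is assumed to be plain accuracy. The search loop (here just a list of candidate trees) stays on the CPU, while each worker counts errors on its own slice of the data and the partial counts are summed afterwards.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical tree encoding (an assumption for this sketch):
# internal node = (feature_index, threshold, left, right); leaf = int class label.
def predict(tree, row):
    while isinstance(tree, tuple):
        feature, threshold, left, right = tree
        tree = left if row[feature] <= threshold else right
    return tree

def chunk_errors(tree, rows, labels):
    """Count misclassified instances in one data chunk (the per-device work)."""
    return sum(1 for row, label in zip(rows, labels) if predict(tree, row) != label)

def split(seq, parts):
    """Split seq into `parts` contiguous chunks of near-equal size."""
    size, rest = divmod(len(seq), parts)
    chunks, start = [], 0
    for i in range(parts):
        end = start + size + (1 if i < rest else 0)
        chunks.append(seq[start:end])
        start = end
    return chunks

def fitness(tree, rows, labels, n_devices=4):
    """Accuracy of `tree`; errors are counted per chunk, then reduced on the host."""
    row_chunks = split(rows, n_devices)
    label_chunks = split(labels, n_devices)
    with ThreadPoolExecutor(max_workers=n_devices) as pool:
        errors = pool.map(chunk_errors, [tree] * n_devices, row_chunks, label_chunks)
        total_errors = sum(errors)
    return 1.0 - total_errors / len(labels)

# Toy data: one feature, two classes.
rows = [[0.1], [0.4], [0.6], [0.9], [0.2], [0.8]]
labels = [0, 0, 1, 1, 0, 0]

# A tiny "population": one split on feature 0, plus two single-leaf trees.
population = [(0, 0.5, 0, 1), 0, 1]
scores = [fitness(tree, rows, labels, n_devices=2) for tree in population]
```

Because each chunk is scored independently, adding workers only changes how the data is sliced, which is the property that makes the near-linear scaling reported in the abstract plausible.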


2018 ◽  
Vol 3 (1) ◽  
pp. 1-18
Author(s):  
Kislaya Kunjan ◽  
Huanmei Wu ◽  
Tammy R. Toscos ◽  
Bradley N. Doebbeling

2020 ◽  
Vol 204 ◽  
pp. 106186 ◽  
Author(s):  
Fang Liu ◽  
Yanwei Yu ◽  
Peng Song ◽  
Yangyang Fan ◽  
Xiangrong Tong

2016 ◽  
Vol 194 ◽  
pp. 107-116 ◽  
Author(s):  
Jingsong Shan ◽  
Jianxin Luo ◽  
Guiqiang Ni ◽  
Zhaofeng Wu ◽  
Weiwei Duan
