Evolutionary Computation Access on Incremental Map Reduce for Mining Large Scale Data

In the current era, data updates arrive constantly from domains such as social networks, finance, healthcare, and e-commerce; the data therefore grows large and computation over it becomes difficult. A framework is proposed for mining data early and refreshing the computed result as new data arrives. The framework combines an incremental MapReduce method on Hadoop with an evolutionary computation algorithm to reduce time complexity and increase accuracy. The proposed approach applies key-pair-level incremental iterative processing to MapReduce for mining big data and uses particle swarm optimization to avoid recomputation from scratch when new data arrives, thereby reducing the I/O overhead of accessing previously computed states. Experimental results for three iterative algorithms on Hadoop showed good performance compared with traditional MapReduce using sequential computation access.
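
The key-pair-level incremental idea behind this framework can be sketched briefly: reduce results are cached per key, and only the keys touched by newly arrived records are recomputed instead of re-running the job from scratch. The following is a minimal sketch of that idea only; the names (map_phase, incremental_reduce, cached_state) and the word-count-style job are illustrative assumptions, not the paper's implementation, and the PSO component is omitted.

```python
# Minimal sketch of key-pair-level incremental reduce (assumed word-count-style job).
# cached_state holds previously reduced values per key; only keys present in the
# newly arrived delta are recomputed, avoiding a full re-run from scratch.
from collections import defaultdict

def map_phase(records):
    """Emit (key, value) pairs from raw records (illustrative tokenizer)."""
    for record in records:
        for token in record.split():
            yield token, 1

def incremental_reduce(cached_state, delta_records):
    """Merge the reduce results of newly arrived records into the cached state."""
    delta = defaultdict(int)
    for key, value in map_phase(delta_records):
        delta[key] += value                                    # reduce only the delta
    for key, value in delta.items():
        cached_state[key] = cached_state.get(key, 0) + value   # merge per key
    return cached_state

# Usage: the first batch builds the state, later batches only touch changed keys.
state = incremental_reduce({}, ["big data mining", "incremental map reduce"])
state = incremental_reduce(state, ["incremental mining"])      # reuses cached results
print(state)
```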

2019 · Vol 15 (3) · pp. 64-78 · Author(s): Chandrakala D, Sumathi S, Saran Kumar A, Sathish J

Detection and realization of new trends from a corpus are achieved through Emergent Trend Detection (ETD) methods, a principal application of text mining. This article discusses the influence of Particle Swarm Optimization (PSO) on Dynamic Adaptive Self-Organizing Maps (DASOM) in the design of an efficient ETD scheme that optimizes the neural parameters of the network. This hybrid machine learning scheme is designed to achieve maximum accuracy with minimum computational time. The efficiency and scalability of the proposed scheme are analyzed and compared with standard algorithms such as SOM, DASOM, and linear regression analysis. The system is trained and tested on the DBLP database (University of Trier, Germany). The article establishes the superiority of the hybrid DASOM algorithm over the well-known algorithms in handling high-dimensional, large-scale data to detect emergent trends from the corpus.
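
As a rough illustration of how PSO can tune a self-organizing map's neural parameters, the sketch below encodes two SOM hyperparameters (learning rate and neighborhood radius) as a particle's position and uses quantization error on toy data as the fitness. The toy SOM, the chosen parameters, and the PSO coefficients are assumptions for illustration, not the article's actual DASOM formulation.

```python
# Minimal sketch: PSO tuning two SOM hyperparameters (learning rate, neighborhood
# radius), with quantization error on toy data as the fitness to minimize.
import numpy as np

rng = np.random.default_rng(0)
data = rng.random((100, 3))                       # toy 3-dimensional dataset

def som_quantization_error(learning_rate, radius, grid=5, epochs=10):
    """Train a tiny SOM and return the mean distance to each best-matching unit."""
    weights = rng.random((grid * grid, data.shape[1]))
    coords = np.array([(i, j) for i in range(grid) for j in range(grid)], float)
    for epoch in range(epochs):
        lr = learning_rate * (1 - epoch / epochs)            # decaying learning rate
        for x in data:
            bmu = np.argmin(np.linalg.norm(weights - x, axis=1))
            grid_dist = np.linalg.norm(coords - coords[bmu], axis=1)
            influence = np.exp(-(grid_dist ** 2) / (2 * radius ** 2))
            weights += lr * influence[:, None] * (x - weights)
    return float(np.mean([np.min(np.linalg.norm(weights - x, axis=1)) for x in data]))

# Standard global-best PSO over (learning_rate, radius).
low, high = np.array([0.01, 0.5]), np.array([1.0, 3.0])
n_particles, iterations = 8, 15
pos = rng.uniform(low, high, size=(n_particles, 2))
vel = np.zeros_like(pos)
pbest = pos.copy()
pbest_fit = np.array([som_quantization_error(*p) for p in pos])
gbest = pbest[np.argmin(pbest_fit)]

for _ in range(iterations):
    r1, r2 = rng.random((2, n_particles, 1))
    vel = 0.7 * vel + 1.5 * r1 * (pbest - pos) + 1.5 * r2 * (gbest - pos)
    pos = np.clip(pos + vel, low, high)
    fit = np.array([som_quantization_error(*p) for p in pos])
    improved = fit < pbest_fit
    pbest[improved], pbest_fit[improved] = pos[improved], fit[improved]
    gbest = pbest[np.argmin(pbest_fit)]

print("best (learning_rate, radius):", gbest, "quantization error:", pbest_fit.min())
```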


2009 · Vol 28 (11) · pp. 2737-2740 · Author(s): Xiao ZHANG, Shan WANG, Na LIAN

2016 · Author(s): John W. Williams, Simon Goring, Eric Grimm, Jason McLachlan

2008 · Vol 9 (10) · pp. 1373-1381 · Author(s): Ding-yin Xia, Fei Wu, Xu-qing Zhang, Yue-ting Zhuang

2021 · Vol 77 (2) · pp. 98-108 · Author(s): R. M. Churchill, C. S. Chang, J. Choi, J. Wong, S. Klasky, ...

Author(s): Krzysztof Jurczuk, Marcin Czajkowski, Marek Kretowski

This paper concerns the evolutionary induction of decision trees (DTs) for large-scale data. Such a global approach is one of the alternatives to top-down inducers: it searches for the tree structure and the node tests simultaneously, and thus in many situations improves the prediction quality and size of the resulting classifiers. However, being population-based and iterative, it can be too computationally demanding to apply directly to big data mining. The paper demonstrates that this barrier can be overcome by smart distributed/parallel processing. Moreover, we ask whether the global approach can truly compete with greedy systems on large-scale data. For this purpose, we propose a novel multi-GPU approach. It combines the knowledge of global DT induction and evolutionary algorithm parallelization with efficient utilization of GPU memory and computing resources. The searches for the tree structure and node tests are performed simultaneously on a CPU, while the fitness calculations are delegated to GPUs. A data-parallel decomposition strategy and the CUDA framework are applied. Experimental validation is performed on both artificial and real-life datasets; in both cases, the obtained acceleration is very satisfactory. The solution is able to process even billions of instances in a few hours on a single workstation equipped with 4 GPUs. The impact of data characteristics (size and dimensionality) on the convergence and speedup of the evolutionary search is also shown. As the number of GPUs grows, nearly linear scalability is observed, which suggests that the data-size boundaries for evolutionary DT mining are fading.
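
The data-parallel decomposition described above can be illustrated with a small sketch: the training set is split into chunks (one per GPU in the paper), each chunk returns a partial classification count for a candidate tree, and the CPU-side evolutionary loop aggregates the partial results into a fitness value. The sketch below substitutes CPU worker processes for GPUs and uses a toy tree encoding; all names and the accuracy-based fitness are illustrative assumptions rather than the authors' CUDA implementation.

```python
# Sketch of data-parallel fitness decomposition: the dataset is split into chunks
# (GPUs in the paper, worker processes here); each chunk computes a partial count
# of correctly classified instances, and the results are aggregated into a fitness.
from concurrent.futures import ProcessPoolExecutor
import numpy as np

def classify(tree, x):
    """Route one instance down a toy tree: node = (feature, threshold, left, right), leaf = label."""
    while isinstance(tree, tuple):
        feature, threshold, left, right = tree
        tree = left if x[feature] <= threshold else right
    return tree

def partial_fitness(args):
    """Per-chunk work (one worker stands in for one GPU): count correct predictions."""
    tree, X_chunk, y_chunk = args
    return sum(classify(tree, x) == y for x, y in zip(X_chunk, y_chunk))

def fitness(tree, chunks, executor):
    """Aggregate per-chunk partial counts into overall accuracy."""
    correct = sum(executor.map(partial_fitness, [(tree, X, y) for X, y in chunks]))
    total = sum(len(y) for _, y in chunks)
    return correct / total

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    X = rng.random((10_000, 4))
    y = (X[:, 0] > 0.5).astype(int)                      # toy labels
    n_devices = 4                                        # stands in for 4 GPUs
    chunks = list(zip(np.array_split(X, n_devices), np.array_split(y, n_devices)))
    candidate = (0, 0.5, 0, 1)                           # split on feature 0 at 0.5
    with ProcessPoolExecutor(max_workers=n_devices) as pool:
        print("accuracy-based fitness:", fitness(candidate, chunks, pool))
```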

