Parallel Association Rules Pruning Algorithm on Hadoop MapReduce

Author(s):  
Mohamed A. Alasow ◽  
Salahadin A. Mohammed ◽  
El-Sayed M. El-Alfy
Author(s):  
Eduardo P. S. Castro ◽  
Thiago D. Maia ◽  
Marluce R. Pereira ◽  
Ahmed A. A. Esmin ◽  
Denilson A. Pereira

AbstractSeveral Apriori algorithm implementations for mining association rules have been proposed in the literature using the Hadoop-MapReduce framework and, more recently, Spark. However, none of the works have made a detailed assessment of its performance, for example, comparing it with other implementations in various characteristics of data sets. In this work, we present a review of the main algorithms proposed for Hadoop-MapReduce and compared their implementations in a single environment under several different situations. Moreover, these algorithms had their implementations adapted to Spark, and also compared under the same circumstances. Based on the results of the experiments, we present a framework for recommending the Apriori implementation most appropriate for solving a given problem, according to the data set characteristics and minimum required support. The results show that Spark implementations overcome Hadoop-MapReduce implementations at runtime in most experiments. However, there is no single implementation that is the best in all the evaluated situations.


2020 ◽  
Vol 16 (10) ◽  
pp. 1627
Author(s):  
Pei Shujun ◽  
Zhang Yu ◽  
Liang Chao

Author(s):  
LAKSHMI PRANEETHA

Now-a-days data streams or information streams are gigantic and quick changing. The usage of information streams can fluctuate from basic logical, scientific applications to vital business and money related ones. The useful information is abstracted from the stream and represented in the form of micro-clusters in the online phase. In offline phase micro-clusters are merged to form the macro clusters. DBSTREAM technique captures the density between micro-clusters by means of a shared density graph in the online phase. The density data in this graph is then used in reclustering for improving the formation of clusters but DBSTREAM takes more time in handling the corrupted data points In this paper an early pruning algorithm is used before pre-processing of information and a bloom filter is used for recognizing the corrupted information. Our experiments on real time datasets shows that using this approach improves the efficiency of macro-clusters by 90% and increases the generation of more number of micro-clusters within in a short time.


2014 ◽  
Vol 1 (1) ◽  
pp. 339-342
Author(s):  
Mirela Danubianu ◽  
Dragos Mircea Danubianu

AbstractSpeech therapy can be viewed as a business in logopaedic area that aims to offer services for correcting language. A proper treatment of speech impairments ensures improved efficiency of therapy, so, in order to do that, a therapist must continuously learn how to adjust its therapy methods to patient's characteristics. Using Information and Communication Technology in this area allowed collecting a lot of data regarding various aspects of treatment. These data can be used for a data mining process in order to find useful and usable patterns and models which help therapists to improve its specific education. Clustering, classification or association rules can provide unexpected information which help to complete therapist's knowledge and to adapt the therapy to patient's needs.


Sign in / Sign up

Export Citation Format

Share Document