Research on Association Rule Mining Algorithm Based on Distributed Data

2014 ◽  
Vol 998-999 ◽  
pp. 899-902 ◽  
Author(s):  
Cheng Luo ◽  
Ying Chen

Most existing data mining algorithms operate in a centralized environment, but large-scale databases usually exist in distributed form. The distributed data mining algorithm FDM and its improved variants suffer from two problems: frequent itemsets are lost, and network communication costs too much. This paper proposes an association rule mining algorithm based on distributed data (ARADD). ARADD includes a mapping mark array mechanism, which not only keeps the integrity of the frequent itemsets but also reduces the cost of network communication. Experiments demonstrate the efficiency of the algorithm.

Complexity ◽  
2021 ◽  
Vol 2021 ◽  
pp. 1-11
Author(s):  
You Wu ◽  
Zheng Wang ◽  
Shengqi Wang

Data mining is currently a frontier research topic in the field of information and database technology and is recognized as one of the most promising key technologies. It involves multiple technologies, such as mathematical statistics, fuzzy theory, neural networks, and artificial intelligence, so its technical content is relatively high and it is difficult to implement. In this article, we study the basic concepts, processes, and algorithms of association rule mining. To improve the efficiency of data mining in large-scale database applications, we propose an incremental association rule mining algorithm based on fast clustering. First, the feasibility of mining performance appraisal data is studied; then the business process needed to realize the information system is analyzed, and the business process-related links and the corresponding data input interfaces are designed; finally, the data flow for data processing is designed, including the data foundation and the database model. To mine large-scale databases efficiently, database development tools are used to implement the system settings and program design of this algorithm. Incorporated into the human resource management system of colleges and universities, the algorithm carried out successful association broadcasting, realized visualization, and finally discovered valuable information.


2013 ◽  
Vol 327 ◽  
pp. 197-200
Author(s):  
Guo Fang Kuang ◽  
Ying Cun Cao

Materials are the substances that humans use to manufacture machines, components, devices, and other products. Association rules originated in the field of data mining, where they are used to find associations between itemsets in large amounts of data. Apriori is a breadth-first algorithm that repeatedly scans the database to obtain the frequent itemsets, that is, the itemsets whose support is greater than the minimum support. This paper presents the construction of a materials science information model based on association rule mining. Experiments on data sets show that the proposed algorithm is effective and reasonable.
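The level-wise (breadth-first) search that the abstract attributes to Apriori can be sketched as follows. This is a minimal textbook-style illustration, not the authors' implementation: the function name `apriori`, the use of absolute counts as support, and the `>=` threshold comparison are all assumptions made for the example.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return {itemset: count} for itemsets whose count is >= min_support.

    Assumption: support is an absolute transaction count, not a fraction.
    """
    transactions = [frozenset(t) for t in transactions]

    # Level 1: one scan of the database to count single items.
    counts = {}
    for t in transactions:
        for item in t:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    current = {s: c for s, c in counts.items() if c >= min_support}
    frequent = dict(current)

    k = 2
    while current:
        # Join step: merge pairs of frequent (k-1)-itemsets into k-itemsets.
        candidates = {a | b for a, b in combinations(list(current), 2)
                      if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must be frequent.
        candidates = {c for c in candidates
                      if all(frozenset(s) in current
                             for s in combinations(c, k - 1))}
        # One more database scan to count the surviving candidates.
        counts = {c: sum(1 for t in transactions if c <= t)
                  for c in candidates}
        current = {s: c for s, c in counts.items() if c >= min_support}
        frequent.update(current)
        k += 1
    return frequent
```

Each pass of the `while` loop corresponds to one additional scan of the database, which is why the number of scans grows with the length of the longest frequent itemset.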


Author(s):  
Mafruz Zaman Ashrafi

Data mining is an iterative and interactive process that explores and analyzes voluminous digital data to discover valid, novel, and meaningful patterns (Mohammed, 1999). Since digital data may have terabytes of records, data mining techniques aim to find patterns using computationally efficient techniques. It is related to a subarea of statistics called exploratory data analysis. During the past decade, data mining techniques have been used in various business, government, and scientific applications. Association rule mining (Agrawal, Imielinsky & Sawmi, 1993) is one of the most studied fields in the data-mining domain. The key strength of association mining is completeness. It has the ability to discover all associations within a given dataset. Two important constraints of association rule mining are support and confidence (Agrawal & Srikant, 1994). These constraints are used to measure the interestingness of a rule. The motivation of association rule mining comes from market-basket analysis that aims to discover customer purchase behavior. However, its applications are not limited only to market-basket analysis; rather, they are used in other applications, such as network intrusion detection, credit card fraud detection, and so forth. The widespread use of computers and the advances in network technologies have enabled modern organizations to distribute their computing resources among different sites. Various business applications used by such organizations normally store their day-to-day data in each respective site. Data of such organizations increases in size every day. Discovering useful patterns from such organizations using a centralized data mining approach is not always feasible, because merging datasets from different sites into a centralized site incurs large network communication costs (Ashrafi, David & Kate, 2004). Furthermore, data from these organizations are not only distributed over various locations, but are also fragmented vertically.
Therefore, it becomes more difficult, if not impossible, to combine them in a central location. As a result, Distributed Association Rule Mining (DARM) has emerged as an active subarea of data-mining research. Consider the following example. A supermarket may have several data centers spread over various regions across the country. Each of these centers may have gigabytes of data. In order to find customer purchase behavior from these datasets, one can employ an association rule mining algorithm in one of the regional data centers. However, applying a mining algorithm to a single data center will not yield all the potential patterns, because customer purchase patterns in one region will differ from those in the others. So, in order to obtain all potential patterns, we rely on some kind of distributed association rule mining algorithm that can incorporate all data centers. Distributed systems, by nature, require communication. Since distributed association rule mining algorithms generate rules from different datasets spread over various geographical sites, they consequently require external communications in every step of the process (Ashrafi, David & Kate, 2004; Assaf & Ron, 2002; Cheung, Ng, Fu & Fu, 1996). For this reason, DARM algorithms aim to reduce communication costs so that the total cost of generating global association rules is less than the cost of combining the datasets of all participating sites at a centralized site.
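The supermarket example can be made concrete with a small sketch of support and confidence computed from per-site counts. This is only an illustration of the general idea that sites can exchange itemset counts instead of raw transactions; it is not any of the cited DARM algorithms, and the helper names `local_counts`, `merge_counts`, `support`, and `confidence` are hypothetical.

```python
def local_counts(transactions, itemsets):
    """At one site, count how many local transactions contain each itemset."""
    return {s: sum(1 for t in transactions if s <= t) for s in itemsets}


def merge_counts(per_site_counts):
    """Sum the per-site counts into global counts (only counts are shipped)."""
    total = {}
    for counts in per_site_counts:
        for itemset, count in counts.items():
            total[itemset] = total.get(itemset, 0) + count
    return total


def support(count, n_transactions):
    """Fraction of all transactions that contain the itemset."""
    return count / n_transactions


def confidence(counts, antecedent, consequent):
    """Estimate of P(consequent | antecedent) from global counts."""
    return counts[antecedent | consequent] / counts[antecedent]


# Two hypothetical regional data centers.
site_a = [frozenset({"bread", "milk"}), frozenset({"bread"})]
site_b = [frozenset({"bread", "milk"}), frozenset({"milk"}),
          frozenset({"bread", "milk", "eggs"})]

bread, milk = frozenset({"bread"}), frozenset({"milk"})
itemsets = [bread, milk, bread | milk]

global_counts = merge_counts([local_counts(site_a, itemsets),
                              local_counts(site_b, itemsets)])
n_total = len(site_a) + len(site_b)

rule_support = support(global_counts[bread | milk], n_total)
rule_confidence = confidence(global_counts, bread, milk)
```

In this toy setting each site sends three integers rather than its transactions, which is the kind of saving the passage describes when it compares communication cost against centralizing the data.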


2019 ◽  
Vol 8 (S2) ◽  
pp. 9-12
Author(s):  
R. Smeeta Mary ◽  
K. Perumal

Finding frequent itemsets is one of the essential topics in data mining. Data mining helps identify the best knowledge for different decision makers. Frequent itemset generation is the prerequisite for, and the most time-consuming step of, association rule mining. In this paper we suggest a new algorithm for frequent itemset detection that works with datasets in a distributed manner. The proposed algorithm introduces a new method that finds frequent itemsets without needing to create candidate itemsets. The approach can be implemented using a horizontal representation of the transaction datasets and allocating prime values. It explores all the frequent itemsets present in the input and identifies the maximum frequent itemset according to the support. It was applied to different transaction databases and compared with two well-known algorithms, FP-Growth and Parallel Apriori, at different support levels. The experiments showed that the proposed algorithm attains a major time improvement over both algorithms.
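The abstract does not say how prime values are allocated or used, so the following is only one plausible reading, offered as a sketch: each item is assigned a distinct prime, a transaction is stored as the product of its items' primes, and an itemset is contained in a transaction exactly when its product divides the transaction's product. The function names `first_primes`, `encode`, and `support_count` are hypothetical, and this is not claimed to be the authors' algorithm.

```python
from math import prod


def first_primes(n):
    """Return the first n primes by trial division against earlier primes."""
    primes = []
    candidate = 2
    while len(primes) < n:
        if all(candidate % p for p in primes):
            primes.append(candidate)
        candidate += 1
    return primes


def encode(transactions):
    """Assign one prime per item (in sorted item order) and encode each
    transaction as the product of the primes of its items."""
    items = sorted({item for t in transactions for item in t})
    prime_of = dict(zip(items, first_primes(len(items))))
    codes = [prod(prime_of[item] for item in t) for t in transactions]
    return prime_of, codes


def support_count(itemset, prime_of, codes):
    """Count transactions whose code is divisible by the itemset's code."""
    key = prod(prime_of[item] for item in itemset)
    return sum(1 for code in codes if code % key == 0)
```

With this encoding a containment test is a single modulo operation, although the products grow quickly as the number of distinct items increases.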


2010 ◽  
Vol 6 (4) ◽  
pp. 30-45 ◽  
Author(s):  
M. Rajalakshmi ◽  
T. Purusothaman ◽  
S. Pratheeba

Distributed association rule mining is an integral part of data mining that extracts useful information hidden in distributed data sources. As local frequent itemsets are globalized from data sources, sensitive information about individual data sources needs strong protection. Different privacy-preserving data mining approaches for distributed environments have been proposed, but in the existing approaches, collusion among the participating sites reveals sensitive information about the other sites. In this paper, the authors propose a collusion-free algorithm for mining global frequent itemsets in a distributed environment with minimal communication among sites. This algorithm uses the techniques of splitting and sanitizing the itemsets and communicates to random sites in two different phases, thus making it difficult for the colluders to retrieve sensitive information. Results show that the consequence of collusion is reduced to a greater extent without affecting mining performance, and they confirm optimal communication among sites.


Author(s):  
Lin Lin ◽  
Mei-Ling Shyu ◽  
Shu-Ching Chen

The explosive growth and increasing complexity of multimedia data have created a high demand for multimedia services and applications in various areas so that people can access and distribute the data easily. Unfortunately, traditional keyword-based information retrieval is no longer suitable. Instead, multimedia data mining and content-based multimedia information retrieval have become the key technologies in modern societies. Among many data mining techniques, association rule mining (ARM) is considered one of the most popular approaches to extract useful information from multimedia data in terms of relationships between variables. In this paper, a novel rule-based semantic concept classification framework using weighted association rule mining (WARM), which captures the significance degrees of the feature-value pairs to improve the applicability of ARM, is proposed to deal with major issues and challenges in large-scale video semantic concept classification. Unlike traditional ARM, in which the rules are generated by frequency count and the items in a rule are equally important, our proposed WARM algorithm utilizes multiple correspondence analysis (MCA) to explore the relationships among features and concepts and to signify the different contributions of the features in rule generation. To the best of the authors' knowledge, this is one of the first WARM-based classifiers in the field of multimedia concept retrieval. The experimental results on the benchmark TRECVID data demonstrate that the proposed framework is able to handle large-scale and imbalanced video data with promising classification and retrieval performance.


2014 ◽  
Vol 571-572 ◽  
pp. 57-62
Author(s):  
Si Hui Shu ◽  
Zi Zhi Lin

Association rule mining is one of the most important and well-researched techniques of data mining. The key procedure of association rule mining is to find frequent itemsets, and the frequent itemsets are easily obtained from the maximum frequent itemsets, so finding maximum frequent itemsets is one of the most important strategies of association data mining. This paper introduces algorithms for mining maximum frequent itemsets based on a compression matrix. The approach obtains all maximum frequent itemsets simply by removing a set of rows and columns of the transaction matrix, and it is easily programmed as a recursive algorithm. The new algorithm optimizes the matrix-based association rule mining algorithms proposed by researchers in recent years, greatly reducing time and space complexity and substantially improving the efficiency of association rule mining.


2018 ◽  
Vol 189 ◽  
pp. 10012 ◽  
Author(s):  
Ming Yin ◽  
Wenjie Wang ◽  
Yang Liu ◽  
Dan Jiang

The FP-Growth algorithm is an association rule mining algorithm based on the frequent pattern tree (FP-Tree), which doesn’t need to generate a large number of candidate sets. However, constructing the FP-Tree requires two scans of the original transaction database, and the FP-Tree must then be mined recursively to generate frequent itemsets. In addition, the algorithm can’t work effectively when the dataset is dense. To address the large memory usage and low time efficiency of this algorithm, this paper proposes an improved algorithm based on an adjacency table that is stored in a hash table, which considerably reduces lookup time. The experimental results show that the improved algorithm performs well, especially when mining frequent itemsets in dense data sets.
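The two database scans needed to build a classic FP-Tree can be sketched as below. This illustrates only the standard construction that the abstract takes as its baseline, not the paper's adjacency-table improvement; the names `FPNode` and `build_fp_tree` are hypothetical, the header table and node links used for recursive mining are omitted, and ties in item frequency are broken alphabetically as an arbitrary assumption.

```python
from collections import Counter


class FPNode:
    """One node of the FP-Tree: an item, its count, and its child nodes."""

    def __init__(self, item, parent=None):
        self.item = item
        self.count = 0
        self.parent = parent
        self.children = {}


def build_fp_tree(transactions, min_support):
    # Scan 1: count every item and keep those meeting the minimum support.
    counts = Counter(item for t in transactions for item in set(t))
    frequent = {item: c for item, c in counts.items() if c >= min_support}

    root = FPNode(None)
    # Scan 2: insert each transaction, keeping only frequent items and
    # ordering them by descending global count so prefixes are shared.
    for t in transactions:
        items = sorted((i for i in set(t) if i in frequent),
                       key=lambda i: (-frequent[i], i))
        node = root
        for item in items:
            child = node.children.get(item)
            if child is None:
                child = FPNode(item, node)
                node.children[item] = child
            child.count += 1
            node = child
    return root, frequent
```

Because transactions that share a frequent prefix share a path, the tree is compact on sparse data; on dense data many long paths remain, which is consistent with the weakness the abstract points out.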

