Research on Association Rule Mining Algorithm Based on Distributed Data

Most existing data mining algorithms run in a centralized environment, but large-scale databases exist in distributed form. The distributed data mining algorithm FDM and its improved variants suffer from two problems: frequent itemsets are lost, and network communication costs too much. To address these problems, this paper proposes an association rule mining algorithm based on distributed data (ARADD). ARADD includes a mapping mark array mechanism, which not only keeps the integrity of the frequent itemsets but also reduces the cost of network communication. Experiments demonstrate the efficiency of the algorithm.
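To make the distributed setting concrete, the following is a minimal sketch of the naive baseline: every site counts its own itemsets and the counts are summed to obtain global support. It is not ARADD or FDM, whose details the abstract does not give; the function names `local_counts` and `global_frequent` and the sample data are assumptions made for illustration only.

```python
from collections import Counter
from itertools import combinations

def local_counts(transactions, k):
    """Count every k-itemset that occurs in one site's transactions."""
    counts = Counter()
    for t in transactions:
        for itemset in combinations(sorted(t), k):
            counts[itemset] += 1
    return counts

def global_frequent(sites, k, min_support):
    """Sum per-site counts; keep itemsets whose global support ratio >= min_support."""
    total = sum(len(s) for s in sites)
    merged = Counter()
    for s in sites:
        # In a real deployment this update would be a network message per site.
        merged.update(local_counts(s, k))
    return {i: c for i, c in merged.items() if c / total >= min_support}

# Hypothetical data for two sites.
site_a = [{"bread", "milk"}, {"bread", "butter"}, {"milk"}]
site_b = [{"bread", "milk"}, {"bread", "milk", "butter"}]
result = global_frequent([site_a, site_b], k=2, min_support=0.5)
print(result)
```

Shipping every local count, as this sketch does, is the kind of communication cost that distributed algorithms try to reduce.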

Human Resource Allocation Based on Fuzzy Data Mining Algorithm

Complexity ◽

10.1155/2021/9489114 ◽

2021 ◽

Vol 2021 ◽

pp. 1-11

Author(s):

You Wu ◽

Zheng Wang ◽

Shengqi Wang

Keyword(s):

Data Mining ◽

Human Resource ◽

Business Process ◽

Association Rule ◽

Association Rule Mining ◽

Large Scale ◽

Data Mining Algorithm ◽

Rule Mining ◽

Mining Algorithm ◽

Database Technology

Data mining is currently a frontier research topic in the field of information and database technology and is recognized as one of the most promising key technologies. It involves multiple technologies, such as mathematical statistics, fuzzy theory, neural networks, and artificial intelligence, has relatively high technical content, and is difficult to implement. In this article, we study the basic concepts, processes, and algorithms of association rule mining technology. Aiming at large-scale database applications, and in order to improve the efficiency of data mining, we propose an incremental association rule mining algorithm based on clustering, that is, using fast clustering. First, the feasibility of performance appraisal data mining is studied. Then, the business process needed to realize the information system is analyzed, the business process-related links and the corresponding data input interface are designed, and the data processing flow is designed, including the data foundation and database model. Aiming at highly efficient mining of large-scale databases, database development tools are used to implement the specific system settings and program design of this algorithm. Incorporated into the human resource management system of colleges and universities, the algorithm carried out successful association broadcasting, realized visualization, and finally discovered valuable information.

An Optimized Distributed Association Rule Mining Algorithm in Parallel and Distributed Data Mining with XML Data for Improved Response Time

International Journal of Computer Science and Information Technology ◽

10.5121/ijcsit.2010.2208 ◽

2010 ◽

Vol 2 (2) ◽

pp. 90-103 ◽

Cited By ~ 9

Author(s):

Sujni Paul

Keyword(s):

Data Mining ◽

Response Time ◽

Association Rule ◽

Association Rule Mining ◽

Distributed Data Mining ◽

Distributed Data ◽

Rule Mining ◽

Xml Data ◽

Mining Algorithm ◽

Distributed Association

The Novel Model of Construct Materials Science and Information Based on Association Rule Mining

Applied Mechanics and Materials ◽

10.4028/www.scientific.net/amm.327.197 ◽

2013 ◽

Vol 327 ◽

pp. 197-200

Author(s):

Guo Fang Kuang ◽

Ying Cun Cao

Keyword(s):

Experimental Data ◽

Data Mining ◽

Association Rule ◽

Association Rule Mining ◽

Materials Science ◽

Frequent Itemsets ◽

Data Sets ◽

The Novel ◽

Rule Mining ◽

Novel Model

Materials are the substances that humans use to manufacture machines, components, devices, and other products. Association rules originated in the field of data mining, where they are used to find associations between itemsets in large amounts of data. Apriori is a breadth-first algorithm that repeatedly scans the database to obtain the frequent itemsets, that is, those whose support is greater than the minimum support. This paper presents the construction of a materials science and information model based on association rule mining. Experimental data sets show that the proposed algorithm is effective and reasonable.
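The breadth-first, scan-per-level behaviour of Apriori mentioned in this abstract can be sketched as follows. This is a textbook-style illustration, not the model proposed in the paper; the function name `apriori`, the sample transactions, and the choice to treat a count equal to the threshold as frequent are assumptions.

```python
from itertools import combinations

def apriori(transactions, min_support):
    """Return {itemset: support count} for itemsets with count >= min_support."""
    transactions = [frozenset(t) for t in transactions]
    # Level 1 candidates: every single item.
    candidates = {frozenset([i]) for t in transactions for i in t}
    frequent = {}
    k = 1
    while candidates:
        # One pass over the database per level to count the candidates.
        counts = {c: sum(1 for t in transactions if c <= t) for c in candidates}
        level = {c: n for c, n in counts.items() if n >= min_support}
        frequent.update(level)
        # Join frequent k-itemsets into (k+1)-candidates ...
        k += 1
        candidates = {a | b for a in level for b in level if len(a | b) == k}
        # ... and prune any candidate that has an infrequent (k-1)-subset.
        candidates = {
            c for c in candidates
            if all(frozenset(s) in level for s in combinations(c, k - 1))
        }
    return frequent

# Hypothetical transactions.
data = [{"a", "b", "c"}, {"a", "b"}, {"a", "c"}, {"b", "c"}, {"a", "b", "c"}]
frequent = apriori(data, min_support=3)
print(sorted("".join(sorted(s)) for s in frequent))
```

The loop ends when a level produces no candidates, so the number of database passes is bounded by the size of the largest frequent itemset plus one.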

Distributed Association Rule Mining

Encyclopedia of Data Warehousing and Mining, Second Edition ◽

10.4018/978-1-60566-010-3.ch108 ◽

2011 ◽

pp. 695-700

Author(s):

Mafruz Zaman Ashrafi

Keyword(s):

Data Mining ◽

Association Rule ◽

Association Rule Mining ◽

Data Centers ◽

Digital Data ◽

Purchase Behavior ◽

Rule Mining ◽

Communication Costs ◽

Mining Algorithm ◽

Distributed Association

Data mining is an iterative and interactive process that explores and analyzes voluminous digital data to discover valid, novel, and meaningful patterns (Mohammed, 1999). Since digital data may have terabytes of records, data mining techniques aim to find patterns using computationally efficient techniques. It is related to a subarea of statistics called exploratory data analysis. During the past decade, data mining techniques have been used in various business, government, and scientific applications. Association rule mining (Agrawal, Imielinski & Swami, 1993) is one of the most studied fields in the data-mining domain. The key strength of association mining is completeness: it has the ability to discover all associations within a given dataset. Two important constraints of association rule mining are support and confidence (Agrawal & Srikant, 1994). These constraints are used to measure the interestingness of a rule. The motivation of association rule mining comes from market-basket analysis, which aims to discover customer purchase behavior. However, its applications are not limited to market-basket analysis; it is also used in other applications, such as network intrusion detection, credit card fraud detection, and so forth. The widespread use of computers and the advances in network technologies have enabled modern organizations to distribute their computing resources among different sites. Various business applications used by such organizations normally store their day-to-day data at each respective site. Data of such organizations increases in size every day. Discovering useful patterns from such organizations using a centralized data mining approach is not always feasible, because merging datasets from different sites into a centralized site incurs large network communication costs (Ashrafi, David & Kate, 2004). Furthermore, data from these organizations are not only distributed over various locations, but are also fragmented vertically. Therefore, it becomes more difficult, if not impossible, to combine them in a central location. As a result, Distributed Association Rule Mining (DARM) has emerged as an active subarea of data-mining research. Consider the following example. A supermarket may have several data centers spread over various regions across the country. Each of these centers may have gigabytes of data. In order to find customer purchase behavior from these datasets, one can employ an association rule mining algorithm in one of the regional data centers. However, applying a mining algorithm to a particular data center will not allow us to obtain all the potential patterns, because customer purchase patterns of one region will vary from the others. So, in order to obtain all potential patterns, we rely on some kind of distributed association rule mining algorithm, which can incorporate all data centers. Distributed systems, by nature, require communication. Since distributed association rule mining algorithms generate rules from different datasets spread over various geographical sites, they consequently require external communications in every step of the process (Ashrafi, David & Kate, 2004; Assaf & Ron, 2002; Cheung, Ng, Fu & Fu, 1996). As a result, DARM algorithms aim to reduce communication costs in such a way that the total cost of generating global association rules is less than the cost of combining the datasets of all participating sites into a centralized site.
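The two constraints named above, support and confidence, can be written down in a few lines. The sketch below uses the standard definitions (support as the fraction of transactions containing an itemset, confidence as support of the whole rule divided by support of its antecedent); the function names and the market-basket data are assumptions made for illustration.

```python
def support(transactions, itemset):
    """Fraction of transactions that contain every item of itemset."""
    itemset = set(itemset)
    return sum(1 for t in transactions if itemset <= set(t)) / len(transactions)

def confidence(transactions, antecedent, consequent):
    """Estimated P(consequent | antecedent); assumes the antecedent occurs at least once."""
    both = set(antecedent) | set(consequent)
    return support(transactions, both) / support(transactions, antecedent)

# Hypothetical baskets.
baskets = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]
s = support(baskets, {"bread", "milk"})
c = confidence(baskets, {"bread"}, {"milk"})
print(s, c)
```

Here the rule bread → milk holds in two of the three baskets that contain bread, so a confidence threshold above two thirds would reject it even though its support is one half.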

An Enhanced Approach to Mine Maximal Frequent Itemset using Maximal Frequent Itemset Prima Algorithm (MFIPA)

Asian Journal of Computer Science and Technology ◽

10.51983/ajcst-2019.8.s2.2035 ◽

2019 ◽

Vol 8 (S2) ◽

pp. 9-12

Author(s):

R. Smeeta Mary ◽

K. Perumal

Keyword(s):

Data Mining ◽

Association Rule ◽

Association Rule Mining ◽

Frequent Itemsets ◽

Frequent Itemset ◽

Decision Makers ◽

New Method ◽

Rule Mining

Finding frequent itemsets is one of the essential topics in data mining. Data mining helps identify the best knowledge for different decision makers. Frequent itemset generation is the precondition for, and the most time-consuming step of, association rule mining. In this paper we suggest a new algorithm for frequent itemset detection that works with datasets in a distributed manner. The proposed algorithm introduces a new method for finding frequent itemsets without the need to create candidate itemsets. The approach can be implemented using a horizontal representation of the transaction datasets and allocating prime values. It explores all the frequent itemsets present in the input and, according to the support, identifies the maximal frequent itemset. It was applied to different transaction databases and compared with two well-known algorithms, FP-Growth and Parallel Apriori, at different support levels. The experiments showed that the proposed algorithm attains a major time improvement over both algorithms.
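The abstract does not say how the prime values are used. One common way to exploit such an encoding, shown here purely as an assumption and not as the MFIPA algorithm itself, is to give each item a distinct prime and represent a transaction as the product of its items' primes, so that an itemset is contained in a transaction exactly when its product divides the transaction's product. All names below are hypothetical.

```python
from math import prod

def assign_primes(items):
    """Map each distinct item to a distinct prime (simple trial division)."""
    primes, n = [], 2
    while len(primes) < len(items):
        if all(n % p for p in primes):
            primes.append(n)
        n += 1
    return dict(zip(sorted(items), primes))

def encode(itemset, prime_of):
    """Product of the primes of the items in itemset."""
    return prod(prime_of[i] for i in itemset)

def support_count(itemset, encoded, prime_of):
    """Number of encoded transactions divisible by the itemset's code."""
    key = encode(itemset, prime_of)
    return sum(1 for code in encoded if code % key == 0)

# Hypothetical horizontal transaction database.
transactions = [{"a", "b"}, {"a", "c"}, {"a", "b", "c"}]
prime_of = assign_primes({"a", "b", "c"})
encoded = [encode(t, prime_of) for t in transactions]
count_ab = support_count({"a", "b"}, encoded, prime_of)
print(prime_of, encoded, count_ab)
```

The containment test becomes a single modulo operation, at the price of integers that grow with transaction length.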

Collusion-Free Privacy Preserving Data Mining

International Journal of Intelligent Information Technologies ◽

10.4018/jiit.2010100103 ◽

2010 ◽

Vol 6 (4) ◽

pp. 30-45 ◽

Cited By ~ 7

Author(s):

M. Rajalakshmi ◽

T. Purusothaman ◽

S. Pratheeba

Keyword(s):

Data Mining ◽

Association Rule ◽

Privacy Preserving ◽

Frequent Itemsets ◽

Data Sources ◽

Sensitive Information ◽

Distributed Data ◽

Distributed Environment ◽

Rule Mining ◽

Privacy Preserving Data Mining

Distributed association rule mining is an integral part of data mining that extracts useful information hidden in distributed data sources. As local frequent itemsets are globalized from data sources, sensitive information about individual data sources needs strong protection. Different privacy preserving data mining approaches for distributed environments have been proposed, but in the existing approaches collusion among the participating sites reveals sensitive information about the other sites. In this paper, the authors propose a collusion-free algorithm for mining global frequent itemsets in a distributed environment with minimal communication among sites. This algorithm uses the techniques of splitting and sanitizing the itemsets and communicates with random sites in two different phases, thus making it difficult for the colluders to retrieve sensitive information. Results show that the consequence of collusion is reduced to a great extent without affecting mining performance, and confirm optimal communication among sites.

Rule-Based Semantic Concept Classification from Large-Scale Video Collections

International Journal of Multimedia Data Engineering and Management ◽

10.4018/jmdem.2013010103 ◽

2013 ◽

Vol 4 (1) ◽

pp. 46-67 ◽

Cited By ~ 3

Author(s):

Lin Lin ◽

Mei-Ling Shyu ◽

Shu-Ching Chen

Keyword(s):

Data Mining ◽

Information Retrieval ◽

Association Rule ◽

Association Rule Mining ◽

Large Scale ◽

Multimedia Data ◽

Semantic Concept ◽

Rule Mining ◽

Rule Based ◽

Concept Classification

The explosive growth and increasing complexity of multimedia data have created a high demand for multimedia services and applications in various areas so that people can access and distribute the data easily. Unfortunately, traditional keyword-based information retrieval is no longer suitable. Instead, multimedia data mining and content-based multimedia information retrieval have become the key technologies in modern societies. Among many data mining techniques, association rule mining (ARM) is considered one of the most popular approaches to extract useful information from multimedia data in terms of relationships between variables. In this paper, a novel rule-based semantic concept classification framework using weighted association rule mining (WARM), capturing the significance degrees of the feature-value pairs to improve the applicability of ARM, is proposed to deal with major issues and challenges in large-scale video semantic concept classification. Unlike traditional ARM, in which the rules are generated by frequency count and the items in one rule are equally important, our proposed WARM algorithm utilizes multiple correspondence analysis (MCA) to explore the relationships among features and concepts and to signify the different contributions of the features in rule generation. To the best of the authors' knowledge, this is one of the first WARM-based classifiers in the field of multimedia concept retrieval. The experimental results on the benchmark TRECVID data demonstrate that the proposed framework is able to handle large-scale and imbalanced video data with promising classification and retrieval performance.

Analytical Study of Association Rule Mining Algorithm for Retrieving Frequent Itemsets in Big Datasets

International Journal of Computer Sciences and Engineering ◽

10.26438/ijcse/v6i7.424436 ◽

2018 ◽

Vol 6 (7) ◽

pp. 424-436

Author(s):

Sachin Kumar Pandey

Keyword(s):

Association Rule ◽

Association Rule Mining ◽

Analytical Study ◽

Frequent Itemsets ◽

Rule Mining ◽

Mining Algorithm

Algorithms of Mining Maximum Frequent Itemsets Based on Compression Matrix

Applied Mechanics and Materials ◽

10.4028/www.scientific.net/amm.571-572.57 ◽

2014 ◽

Vol 571-572 ◽

pp. 57-62

Author(s):

Si Hui Shu ◽

Zi Zhi Lin

Keyword(s):

Data Mining ◽

Association Rule ◽

Association Rule Mining ◽

Recursive Algorithm ◽

Frequent Itemsets ◽

Rule Mining ◽

Temporal Complexity ◽

Data Mining Algorithms ◽

Association Data ◽

Mining Algorithms

Association rule mining is one of the most important and well-researched techniques of data mining. The key procedure of association rule mining is to find frequent itemsets, and the frequent itemsets are easily obtained from the maximum frequent itemsets, so finding maximum frequent itemsets is one of the most important strategies of association data mining. Algorithms for mining maximum frequent itemsets based on a compression matrix are introduced in this paper. The approach obtains all maximum frequent itemsets by simply removing a set of rows and columns of the transaction matrix, which is easily programmed as a recursive algorithm. The new algorithm optimizes the known matrix-based association rule mining algorithms given by some researchers in recent years; it greatly reduces the temporal and spatial complexity and greatly improves the efficiency of association rule mining.
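The abstract does not state which rows and columns are removed. A generic pruning rule for a 0/1 transaction matrix, given here as an assumption rather than as the paper's algorithm, is to drop item columns whose support is below the threshold and transaction rows with fewer than k items, since neither can contribute to a frequent k-itemset. The function name `compress` and the sample matrix are hypothetical.

```python
def compress(rows, items, min_support, k):
    """Repeatedly drop infrequent columns and rows with fewer than k ones.

    rows:  list of 0/1 lists, one per transaction
    items: column labels, same length as each row
    """
    while True:
        col_sums = [sum(col) for col in zip(*rows)]
        keep = [j for j, s in enumerate(col_sums) if s >= min_support]
        new_items = [items[j] for j in keep]
        new_rows = [[r[j] for j in keep] for r in rows]
        # A row with fewer than k ones cannot contain any k-itemset.
        new_rows = [r for r in new_rows if sum(r) >= k]
        if new_items == items and len(new_rows) == len(rows):
            return new_rows, new_items
        rows, items = new_rows, new_items

# Hypothetical transaction matrix over items a, b, c, d.
matrix = [
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 0, 1],
    [1, 1, 1, 0],
]
small, kept = compress(matrix, ["a", "b", "c", "d"], min_support=2, k=2)
print(kept, small)
```

Removing a row can lower column sums and removing a column can lower row sums, which is why the sketch repeats until nothing changes.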

An improvement of FP-Growth association rule mining algorithm based on adjacency table

MATEC Web of Conferences ◽

10.1051/matecconf/201818910012 ◽

2018 ◽

Vol 189 ◽

pp. 10012 ◽

Cited By ~ 2

Author(s):

Ming Yin ◽

Wenjie Wang ◽

Yang Liu ◽

Dan Jiang

Keyword(s):

Association Rule ◽

Association Rule Mining ◽

Hash Table ◽

Frequent Itemsets ◽

Frequent Pattern ◽

Data Sets ◽

Rule Mining ◽

Mining Algorithm ◽

Mining Frequent Itemsets ◽

Improved Algorithm

The FP-Growth algorithm is an association rule mining algorithm based on the frequent pattern tree (FP-Tree), which doesn’t need to generate a large number of candidate sets. However, constructing the FP-Tree requires two scans of the original transaction database, followed by recursive mining of the FP-Tree to generate frequent itemsets. In addition, the algorithm can’t work effectively when the dataset is dense. To solve the problems of large memory usage and low time efficiency in this algorithm, this paper proposes an improved algorithm based on an adjacency table, using a hash table to store the adjacency table, which considerably reduces search time. The experimental results show that the improved algorithm performs well, especially for mining frequent itemsets in dense data sets.
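The two database scans needed to build an FP-Tree can be sketched as follows. This shows only standard FP-Tree construction, not the recursive mining step and not the adjacency-table improvement the paper proposes; the class `Node`, the function `build_fp_tree`, and the sample data are assumptions made for illustration.

```python
from collections import Counter

class Node:
    """One FP-Tree node: an item, its count, and child links keyed by item."""
    def __init__(self, item, parent=None):
        self.item = item
        self.parent = parent
        self.count = 0
        self.children = {}

def build_fp_tree(transactions, min_support):
    # Scan 1: count how many transactions contain each item.
    freq = Counter(i for t in transactions for i in set(t))
    freq = {i: c for i, c in freq.items() if c >= min_support}
    root = Node(None)
    # Scan 2: insert each transaction's frequent items, most frequent first,
    # so that transactions sharing a prefix share a path in the tree.
    for t in transactions:
        ordered = sorted((i for i in set(t) if i in freq),
                         key=lambda i: (-freq[i], i))
        node = root
        for item in ordered:
            if item not in node.children:
                node.children[item] = Node(item, node)
            node = node.children[item]
            node.count += 1
    return root, freq

# Hypothetical transactions.
db = [["a", "b"], ["b", "c"], ["a", "b", "c"], ["b"]]
root, freq = build_fp_tree(db, min_support=2)
b = root.children["b"]
print(b.count, {k: v.count for k, v in b.children.items()})
```

Ties in frequency are broken alphabetically here so that the tree shape is deterministic; any fixed order would do.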
