Right-Hand Side Expanding Algorithm for Maximal Frequent Itemset Mining

In association rule mining, all frequent itemsets are found first, and the confidence of each association rule is then calculated from the support of the frequent itemsets. Because every non-empty subset of a frequent itemset is itself frequent, all frequent itemsets can be obtained simply by finding all maximal frequent itemsets (MFIs), that is, frequent itemsets whose supersets are not frequent. In this study, an algorithm named right-hand side expanding (RHSE), which can accurately find all MFIs, was proposed. First, an Expanding Operation was designed, which, starting from any given frequent itemset, could add items according to certain rules and form supersets of the given frequent itemset. These supersets were all MFIs. Next, this operation was applied to each frequent 1-itemset in turn as the starting point, and all MFIs were found in the end. Owing to the special design of the Expanding Operation, each MFI could be found. Moreover, the path by which each MFI was found was unique, which avoided redundancy in time and space complexity. Because it avoids redundant computation while still finding all MFIs, the algorithm runs fast and is applicable to big data consisting of massive high-dimensional transactions. Finally, a detailed experimental report on 10 open standard transaction sets was given, including results on big data with transactions numbering in the millions.
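
The relationship between frequent itemsets and MFIs described above can be illustrated with a small brute-force sketch. This is not the RHSE algorithm or its Expanding Operation, whose rules the abstract does not spell out; the function names and the toy transactions are assumptions made only for illustration.

```python
from itertools import combinations

def frequent_itemsets(transactions, min_support):
    """Return {itemset: support count} for every frequent itemset (brute force)."""
    items = sorted({item for t in transactions for item in t})
    result = {}
    for size in range(1, len(items) + 1):
        found_any = False
        for candidate in combinations(items, size):
            cset = frozenset(candidate)
            count = sum(1 for t in transactions if cset <= t)
            if count >= min_support:
                result[cset] = count
                found_any = True
        if not found_any:
            break  # downward closure: no larger itemset can be frequent
    return result

def maximal_frequent_itemsets(frequent):
    """Keep only the frequent itemsets that have no frequent proper superset."""
    return {s for s in frequent if not any(s < other for other in frequent)}

# Hypothetical toy transaction set.
transactions = [frozenset(t) for t in (
    {"a", "b", "c"}, {"a", "b"}, {"a", "c"}, {"b", "c"}, {"a", "b", "c"})]
frequent = frequent_itemsets(transactions, min_support=3)
mfis = maximal_frequent_itemsets(frequent)
print(sorted(sorted(s) for s in mfis))
```

Every frequent itemset in this example is a subset of one of the returned MFIs, which is why the MFIs alone are enough to recover the full set of frequent itemsets (though not their individual support counts).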

Data Mining Itemset of Big Data Using Pre-Processing Based on Mapreduce FrameWork with ETL Tools

APTIKOM Journal on Computer Science and Information Technologies ◽

10.11591/aptikom.j.csit.103 ◽

2017 ◽

Vol 2 (2) ◽

pp. 57-62

Author(s):

Padmanathan Anantharaman ◽

H.V. Ramakrishan

Keyword(s):

Big Data ◽

Clustering Algorithm ◽

Programming Model ◽

Hybrid Approach ◽

Processing Technique ◽

Frequent Itemsets ◽

Frequent Itemset ◽

Frequent Itemset Mining ◽

Itemset Mining ◽

Dataset Size

As data volumes continue to grow, they quickly consume the capacity of data warehouses and application databases, forcing IT organizations into costly upgrades of expensive databases and data warehouse hardware appliances. An enormous amount of data is also being generated through the Internet of Things (IoT) as technologies advance and people use them in day-to-day activities; this data is termed Big Data and has its own characteristics and challenges. Frequent itemset mining algorithms aim to discover frequent itemsets in a transactional database, but as the dataset size increases, traditional frequent itemset mining can no longer handle it. The MapReduce programming model solves the problem of large datasets, but it has a large communication cost, which reduces execution efficiency. This work proposes a new pre-processing technique that applies k-means before the BigFIM algorithm. The resulting ClustBigFIM uses a hybrid approach: clustering with the k-means algorithm to generate clusters from huge datasets, and Apriori and Eclat to mine frequent itemsets from the generated clusters using the MapReduce programming model. Results show that the execution efficiency of the ClustBigFIM algorithm is increased by applying the k-means clustering algorithm before BigFIM as a pre-processing technique.
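
The MapReduce programming model mentioned above can be sketched on a single machine for the simplest step of frequent itemset mining, counting the support of individual items. This is a plain-Python emulation of the map, shuffle, and reduce phases, not ClustBigFIM or BigFIM; the function names and the sample partitions are assumptions for illustration, and no k-means step is included.

```python
from collections import defaultdict

def map_phase(transaction):
    """Mapper: emit (item, 1) once for each distinct item in a transaction."""
    return [(item, 1) for item in set(transaction)]

def shuffle(pairs):
    """Group emitted values by key, as the framework would between map and reduce."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped

def reduce_phase(grouped, min_support):
    """Reducer: sum the counts per item and keep only the frequent items."""
    totals = {item: sum(values) for item, values in grouped.items()}
    return {item: total for item, total in totals.items() if total >= min_support}

def frequent_items(partitions, min_support):
    """Run the mapper over every partition, then shuffle and reduce."""
    emitted = []
    for partition in partitions:
        for transaction in partition:
            emitted.extend(map_phase(transaction))
    return reduce_phase(shuffle(emitted), min_support)

# Hypothetical data split into two partitions, as if stored on two nodes.
partitions = [
    [["bread", "milk"], ["bread", "beer"]],
    [["milk", "beer", "bread"], ["milk"]],
]
counts = frequent_items(partitions, min_support=3)
print(counts)
```

In a real cluster, the list of emitted pairs is what travels over the network during the shuffle, which is where the communication cost noted in the abstract arises.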

Enhancement of Classification using FPFF-ANN for Big data Analysis in Distributed Environment

International Journal of Innovative Technology and Exploring Engineering - Special Issue ◽

10.35940/ijitee.g5712.069820 ◽

2020 ◽

Vol 9 (8) ◽

pp. 1033-1040

Keyword(s):

Big Data ◽

Frequent Itemsets ◽

Age Group ◽

Distributed Environment ◽

Qualitative Information ◽

Rule Mining ◽

Computation Cost ◽

Server Software ◽

As Relationship ◽

F Measure

The generation of massive amounts of information from any source or group, at any time, anywhere, and from any device is termed Big Data. The generation of big data has become a serious challenge: these data must be handled, extracted, and accessed in a short length of time. The detection of frequent itemsets is a significant issue in data mining that helps in producing qualitative information for business insight and for decision makers. For extracting the necessary itemsets from big data, a variety of big data analytical techniques have evolved, such as association rule mining, genetic algorithms, machine learning, and the FP-growth algorithm. In this paper we propose the FP-ANN algorithm, which enhances the FP-growth algorithm with neural networks using the feed-forward approach. The proposed algorithm uses a Twitter social dataset for the collection of frequent itemsets, and the comparative analysis of this approach is done using different performance measures such as precision, recall, F-measure, time complexity, and computation cost and time. The simulation of the proposed work is done using the JDK, JavaBeans, and WAMP server software. The experimental results show that the proposed algorithm gives better results in terms of time complexity and computation cost and time. It also gives improved results for precision, recall, and F-measure.
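
For reference, precision, recall, and F-measure can be computed as in the sketch below. How the paper applies these measures to its own output is not stated in the abstract, so comparing a predicted set of itemsets against a reference set is an assumption here, as are the function name and the sample data.

```python
def precision_recall_f1(predicted, reference):
    """Compute precision, recall and F1 of a predicted set against a reference set."""
    predicted, reference = set(predicted), set(reference)
    true_positives = len(predicted & reference)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(reference) if reference else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    # F1 is the harmonic mean of precision and recall.
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1

# Hypothetical reference and predicted collections of itemsets.
reference = {frozenset("a"), frozenset("b"), frozenset("c"),
             frozenset("ab"), frozenset("bc")}
predicted = {frozenset("a"), frozenset("b"), frozenset("ab"), frozenset("ac")}
precision, recall, f1 = precision_recall_f1(predicted, reference)
print(precision, recall, round(f1, 4))
```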

An Enhanced Approach to Mine Maximal Frequent Itemset using Maximal Frequent Itemset Prima Algorithm (MFIPA)

Asian Journal of Computer Science and Technology ◽

10.51983/ajcst-2019.8.s2.2035 ◽

2019 ◽

Vol 8 (S2) ◽

pp. 9-12

Author(s):

R. Smeeta Mary ◽

K. Perumal

Keyword(s):

Data Mining ◽

Association Rule ◽

Association Rule Mining ◽

Frequent Itemsets ◽

Frequent Itemset ◽

Decision Makers ◽

New Method ◽

Rule Mining

Finding frequent itemsets is one of the essential topics in data mining. Data mining helps in identifying the best knowledge for different decision makers. Frequent itemset generation is the prerequisite and the most time-consuming step of association rule mining. In this paper we propose a new algorithm for frequent itemset detection that works with datasets in a distributed manner. The proposed algorithm introduces a new method for finding frequent itemsets without the need to generate candidate itemsets. The proposed approach can be implemented using a horizontal representation of the transaction datasets and by allocating prime values. It explores all the frequent itemsets present in the input and identifies the maximal frequent itemset according to the support. It was applied to different transaction databases and compared with two well-known algorithms, FP-Growth and Parallel Apriori, at different support levels. The experiments showed that the proposed algorithm attains a major time improvement over both algorithms.
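
One common way to use prime values in itemset mining is sketched below: each item receives a distinct prime, a transaction is stored as the product of its items' primes, and an itemset is contained in a transaction exactly when the itemset's product divides the transaction's product. Whether MFIPA allocates primes in this way is an assumption, since the abstract gives no detail; all names and data here are hypothetical.

```python
from math import prod

def first_primes(n):
    """Return the first n primes using trial division by the primes found so far."""
    primes = []
    candidate = 2
    while len(primes) < n:
        if all(candidate % p for p in primes):
            primes.append(candidate)
        candidate += 1
    return primes

def encode(itemset, prime_of):
    """Encode a set of distinct items as the product of their primes."""
    return prod(prime_of[item] for item in itemset)

def support(itemset, encoded_transactions, prime_of):
    """Count the transactions whose code is divisible by the itemset's code."""
    code = encode(itemset, prime_of)
    return sum(1 for t in encoded_transactions if t % code == 0)

# Hypothetical transactions in horizontal form (one set of items per row).
transactions = [{"a", "b"}, {"a", "b", "c"}, {"b", "c"}, {"a"}]
items = sorted({item for t in transactions for item in t})
prime_of = dict(zip(items, first_primes(len(items))))
encoded = [encode(t, prime_of) for t in transactions]
print(prime_of, encoded, support({"a", "b"}, encoded, prime_of))
```

The divisibility test relies on unique prime factorization, so it only works when each item appears at most once per transaction.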

Implementation of Improved Association Rule Mining Algorithms for Fast Mining with Efficient Tree Structures on Large Datasets

International Journal of Engineering and Advanced Technology - Regular Issue ◽

10.35940/ijeat.b3876.129219 ◽

2019 ◽

Vol 9 (2) ◽

pp. 5136-5141

Keyword(s):

Association Rule ◽

Frequent Itemsets ◽

Large Datasets ◽

Frequent Itemset ◽

Rule Mining ◽

Tree Structures ◽

Significant Area ◽

Dataset Size ◽

Mining Algorithms ◽

Mining Frequent Itemsets

Association rule mining (ARM) is a significant area of knowledge mining that produces association rules, which are essential for decision making. Frequent itemset mining is challenged by large datasets: as the dataset size increases, the burden and time needed to discover rules also increase. In this paper, ARM algorithms with tree structures such as the FP-tree, and FIN with the POC tree and the PPC tree, are discussed for reducing overhead and running time. These algorithms use highly efficient data structures for mining frequent itemsets from the database. FIN uses the nodeset, a unique and novel data structure, to extract frequent itemsets, and the POC tree to store frequent itemset information. These techniques are extremely helpful in marketing. The proposed and implemented techniques show improved performance in terms of time and efficiency.
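
As an illustration of the tree structures named above, the following is a minimal sketch of FP-tree construction only. It omits the header table and node links that FP-growth uses for mining, and it does not cover the POC tree, the PPC tree, or nodesets; the class name, function name, and sample data are assumptions.

```python
from collections import Counter

class FPNode:
    """One node of a simplified FP-tree: an item, a count, and child nodes."""
    def __init__(self, item, parent=None):
        self.item = item
        self.count = 0
        self.parent = parent
        self.children = {}

def build_fp_tree(transactions, min_support):
    """Insert each transaction's frequent items, most frequent first, into a prefix tree."""
    counts = Counter(item for t in transactions for item in set(t))
    frequent = {item: c for item, c in counts.items() if c >= min_support}
    root = FPNode(None)
    for t in transactions:
        # Descending support, then item name, so the ordering is deterministic.
        ordered = sorted((i for i in set(t) if i in frequent),
                         key=lambda i: (-frequent[i], i))
        node = root
        for item in ordered:
            child = node.children.get(item)
            if child is None:
                child = FPNode(item, node)
                node.children[item] = child
            child.count += 1
            node = child
    return root, frequent

# Hypothetical transactions; "d" is infrequent and is dropped from the tree.
transactions = [["a", "b"], ["b", "c", "d"], ["a", "b", "c"], ["b", "c"]]
root, frequent = build_fp_tree(transactions, min_support=2)
b = root.children["b"]
print(b.count, b.children["c"].count, b.children["c"].children["a"].count)
```

Because transactions that share frequent items share a path from the root, the tree is usually much smaller than the raw transaction list.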

Association Rule Mining Algorithms for Big Data using RDD-ECLAT Algorithms

10.21203/rs.3.rs-935690/v1 ◽

2021 ◽

Author(s):

Martha ◽

Ramdas Vankdothu ◽

Hameed Mohd Abdul ◽

Rekha Gangula

Keyword(s):

Data Mining ◽

Big Data ◽

Frequent Itemset ◽

Frequent Itemset Mining ◽

New Paradigm ◽

Rule Mining ◽

Data Intensive ◽

Itemset Mining ◽

Real World Datasets ◽

Mining Algorithms

The revolution in technology for storing and processing big data has led to data-intensive computing as a new paradigm. To find valuable and precise knowledge in big data, efficient and scalable data mining techniques are required. In data mining, different techniques are applied depending on the kind of knowledge to be mined. Association rules are generated from the frequent itemsets computed by frequent itemset mining (FIM) algorithms. This work addresses the problem of designing scalable and efficient frequent itemset mining algorithms on the Spark RDD framework. The research done in this thesis aims to improve the performance (in terms of execution time) of the existing Spark-based frequent itemset mining algorithms and to efficiently re-design other frequent itemset mining algorithms on Spark. The particular problem of interest is re-designing the Eclat algorithm for the distributed computing environment of Spark. The paper proposes and implements a parallel Eclat algorithm using the Spark RDD architecture, dubbed RDD-Eclat. EclatV1 is the earliest version, followed by EclatV2, EclatV3, EclatV4, and EclatV5. Each version results from applying a different technique or heuristic to the preceding variant. Following EclatV1, the filtered transaction technique is used, followed by heuristics for equivalence class partitioning in EclatV4 and EclatV5. EclatV2 and EclatV3 differ slightly algorithmically, as do EclatV4 and EclatV5. Experiments are conducted on synthetic and real-world datasets.
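
The core idea of Eclat, a vertical layout in which each item maps to the set of transaction IDs that contain it and supports are obtained by intersecting those sets, can be sketched on a single machine as follows. This is not RDD-Eclat or any of the EclatV1 to EclatV5 variants and uses no Spark API; the function names and sample data are assumptions.

```python
def eclat(transactions, min_support):
    """Return {itemset tuple: support} using depth-first tidset intersection."""
    # Vertical layout: item -> set of transaction ids containing it.
    tidsets = {}
    for tid, transaction in enumerate(transactions):
        for item in transaction:
            tidsets.setdefault(item, set()).add(tid)

    frequent = {}

    def extend(prefix, candidates):
        # Each candidate is (item, tidset of prefix + item) and is already frequent.
        for index, (item, tids) in enumerate(candidates):
            itemset = prefix + (item,)
            frequent[itemset] = len(tids)
            next_candidates = []
            for other, other_tids in candidates[index + 1:]:
                shared = tids & other_tids
                if len(shared) >= min_support:
                    next_candidates.append((other, shared))
            extend(itemset, next_candidates)

    initial = sorted(
        ((item, tids) for item, tids in tidsets.items() if len(tids) >= min_support),
        key=lambda pair: pair[0],
    )
    extend((), initial)
    return frequent

# Hypothetical toy transaction set.
transactions = [{"a", "b", "c"}, {"a", "b"}, {"a", "c"}, {"b", "c"}, {"a", "b", "c"}]
result = eclat(transactions, min_support=3)
print(result)
```

All candidates that share a prefix form one equivalence class, and each class can be explored independently of the others, which is what makes the method a natural fit for partitioning across workers.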

Multi-Objective Optimization for High-Dimensional Maximal Frequent Itemset Mining

Applied Sciences ◽

10.3390/app11198971 ◽

2021 ◽

Vol 11 (19) ◽

pp. 8971

Author(s):

Yalong Zhang ◽

Wei Yu ◽

Xuan Ma ◽

Hisakazu Ogura ◽

Dongfen Ye

Keyword(s):

Big Data ◽

Optimal Solution ◽

Solution Space ◽

Frequent Itemsets ◽

Frequent Itemset ◽

High Dimensional ◽

Lethal Gene ◽

Multi Objective Optimization ◽

Multi Objective ◽

Evolution Algorithms

The solution space of frequent itemsets generally grows exponentially because of the high-dimensional attributes of big data. However, the premise of big data association rule analysis is to mine the frequent itemsets in high-dimensional transaction sets. Traditional and classical algorithms such as the Apriori and FP-Growth algorithms, as well as their derivative algorithms, are unacceptable in practical big data analysis in an explosive solution space because of their huge consumption of storage space and running time. A multi-objective optimization algorithm was proposed to mine the frequent itemsets of high-dimensional data. First, all frequent 2-itemsets were generated by scanning the transaction sets, and new items were then added to them to form the objects of population evolution. The algorithm aims to search for maximal frequent itemsets so as to cover more non-empty subsets, because all non-empty subsets of a frequent itemset are themselves frequent itemsets. During the operation of the algorithm, lethal gene fragments in individuals were recorded and eliminated so that individuals may resurge. In the end, the set of Pareto optimal solutions of the frequent itemsets was obtained. All non-empty subsets of these solutions were frequent itemsets, and all of their supersets were non-frequent itemsets. Finally, the practicability and validity of the proposed algorithm on big data were proven by experiments.
