A parallel approach for high utility-based frequent pattern mining in a big data environment

Frequent pattern mining is an essential data-mining task whose goal is to discover knowledge in the form of repeated patterns. Many efficient pattern-mining algorithms have been developed over the last two decades, yet most do not scale to the kind of data we face today, the so-called “Big Data”. Scalable parallel algorithms hold the key to solving the problem in this context. This paper reviews recent advances in parallel frequent pattern mining and analyses them through the Big Data lens. Load balancing and work partitioning are the major challenges to overcome, and they continue to demand innovative methods as Big Data grows without limit. A greater challenge than before is mining frequent patterns from unstructured data. To accomplish this, a Semi Structured Doc-Model and pattern ranking are used.

A MapReduce-Based Parallel Frequent Pattern Growth Algorithm for Spatiotemporal Association Analysis of Mobile Trajectory Big Data

Complexity ◽ 10.1155/2018/2818251 ◽ 2018 ◽ Vol 2018 ◽ pp. 1-16 ◽ Cited By ~ 7

Author(s): Dawen Xia ◽ Xiaonan Lu ◽ Huaqing Li ◽ Wendong Wang ◽ Yantao Li ◽ ...

Keyword(s): Big Data ◽ Association Analysis ◽ Intelligent Transportation Systems ◽ Large Scale ◽ Pattern Mining ◽ Frequent Pattern Mining ◽ Transportation Systems ◽ Frequent Pattern ◽ Trajectory Data ◽ Pattern Growth

Frequent pattern mining is an effective approach for spatiotemporal association analysis of mobile trajectory big data in data-driven intelligent transportation systems. While existing parallel algorithms have been successfully applied to frequent pattern mining of large-scale trajectory data, two major challenges remain: how to overcome Hadoop's inherent weaknesses in handling taxi trajectory big data that consists of massive numbers of small files, and how to discover implicit spatiotemporal frequent patterns with MapReduce. To address these challenges, this paper presents a MapReduce-based Parallel Frequent Pattern growth (MR-PFP) algorithm that analyzes the spatiotemporal characteristics of taxi operations using large-scale taxi trajectories, with strategies for processing massive small files on a Hadoop platform. More specifically, we first implement three methods, namely Hadoop Archives (HAR), CombineFileInputFormat (CFIF), and Sequence Files (SF), to overcome the existing defects of Hadoop, and then propose two strategies based on their performance evaluations. Next, we incorporate SF into the Frequent Pattern growth (FP-growth) algorithm and implement the optimized FP-growth algorithm on a MapReduce framework. Finally, we use MR-PFP to analyze the characteristics of taxi operations in both the spatial and temporal dimensions in parallel. The results demonstrate that MR-PFP is superior to the existing Parallel FP-growth (PFP) algorithm in efficiency and scalability.
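
To make the idea of running frequent pattern mining on a MapReduce-style framework more concrete, the following is a minimal single-process sketch of the general partitioning scheme commonly used by parallel FP-growth: count item supports, order items by frequency, assign items to groups, and send each group only the transaction prefixes it needs so that groups can be mined independently. It is not the authors' MR-PFP implementation. All function names, the round-robin grouping, and the toy data are assumptions made for illustration, the small-file handling (HAR, CFIF, SF) is not modelled, and brute-force subset enumeration stands in for building an FP-tree inside each group.

```python
from collections import Counter, defaultdict
from itertools import combinations


def count_supports(transactions):
    """Counting step: how many transactions contain each item."""
    counts = Counter()
    for t in transactions:
        counts.update(set(t))
    return counts


def group_dependent_shards(transactions, rank, group_of):
    """Map step: for each group, emit the shortest frequency-ordered prefix
    of the transaction that still ends in one of that group's items."""
    shards = defaultdict(list)
    for t in transactions:
        items = sorted((i for i in set(t) if i in rank), key=rank.get)
        seen = set()
        for j in range(len(items) - 1, -1, -1):
            g = group_of[items[j]]
            if g not in seen:
                seen.add(g)
                shards[g].append(items[: j + 1])
    return shards


def mine_shard(gid, shard, group_of, min_support):
    """Reduce step: count patterns whose last (least frequent) item belongs
    to this group. Brute force replaces the FP-tree, so toy inputs only."""
    counts = Counter()
    for t in shard:
        for j, last in enumerate(t):
            if group_of[last] != gid:
                continue
            for k in range(j + 1):
                for prefix in combinations(t[:j], k):
                    counts[prefix + (last,)] += 1
    return {p: c for p, c in counts.items() if c >= min_support}


def parallel_fp_sketch(transactions, min_support, num_groups):
    counts = count_supports(transactions)
    # Frequent items, most frequent first (ties broken alphabetically).
    flist = sorted((i for i, c in counts.items() if c >= min_support),
                   key=lambda i: (-counts[i], i))
    rank = {item: r for r, item in enumerate(flist)}
    group_of = {item: r % num_groups for item, r in rank.items()}
    shards = group_dependent_shards(transactions, rank, group_of)
    patterns = {}
    # Each iteration is independent and could run as a separate reducer.
    for gid, shard in shards.items():
        patterns.update(mine_shard(gid, shard, group_of, min_support))
    return patterns


if __name__ == "__main__":
    toy = [["a", "b", "c"], ["a", "b"], ["a", "c"], ["b", "c"], ["a", "b", "c"]]
    for pattern, support in sorted(parallel_fp_sketch(toy, 3, 2).items()):
        print(pattern, support)
```

Because every pattern is counted only in the group that owns its least frequent item, no two groups report the same pattern and the per-group results can simply be merged.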

MISFP-Growth: Hadoop-Based Frequent Pattern Mining with Multiple Item Support

Applied Sciences ◽ 10.3390/app9102075 ◽ 2019 ◽ Vol 9 (10) ◽ pp. 2075 ◽ Cited By ~ 4

Author(s): Chen-Shu Wang ◽ Jui-Yen Chang

Keyword(s): Big Data ◽ Data Analytics ◽ Pattern Mining ◽ High Efficiency ◽ Big Data Analytics ◽ Frequent Pattern Mining ◽ Experimental Results ◽ Frequent Pattern ◽ Multiple Item ◽ Two Phases

In practice, a single item support cannot comprehensively address the complexity of items in large datasets. In this study, we propose a big data analytics framework, the Multiple Item Support Frequent Patterns (MISFP-growth) algorithm, that uses Hadoop-based parallel computing to achieve high-efficiency mining of itemsets with multiple item supports (MIS). The proposed architecture consists of two phases. First, in the counting support phase, a Hadoop MapReduce architecture is employed to determine the support for each item. Next, in the analytics phase, sub-transaction blocks are generated according to MIS and the MISFP-growth algorithm identifies the frequency of patterns. To help decision makers set MIS, we also propose the concept of classification of item (COI), which classifies items of higher homogeneity into the same class, so that the items inherit the class support as their item support. Three experiments were conducted to validate the proposed Hadoop-based MISFP-growth algorithm. The experimental results show an approximately 38% reduction in execution time on parallel architectures. The proposed MISFP-growth algorithm can be implemented on a distributed computing framework. Furthermore, according to the experimental results, the enhanced performance of the proposed algorithm indicates that it could have big data analytics applications.
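
The two phases and the COI idea described above can be illustrated with a small sketch. This is not the MISFP-growth algorithm itself: it assumes the rule commonly used in the multiple-item-support literature, namely that an itemset is frequent when its support reaches the lowest MIS among its items, and the abstract does not confirm that the paper uses exactly this criterion. The function names, item classes, thresholds (given as absolute counts), and transactions are all hypothetical, and plain enumeration replaces the FP-growth step.

```python
from collections import Counter
from functools import reduce
from itertools import combinations


def map_count(block):
    """Mapper: per-block count of transactions containing each item."""
    counts = Counter()
    for t in block:
        counts.update(set(t))
    return counts


def count_supports(blocks):
    """Reducer: merge per-block counters into global item supports."""
    return reduce(lambda a, b: a + b, map(map_count, blocks), Counter())


def item_mis(item_class, class_support):
    """COI idea: each item inherits the support threshold of its class."""
    return {item: class_support[cls] for item, cls in item_class.items()}


def frequent_itemsets_mis(transactions, mis, max_size=2):
    """Keep itemsets whose support reaches the lowest MIS of their items
    (assumed criterion; enumeration stands in for FP-growth)."""
    counts = Counter()
    for t in transactions:
        items = sorted(i for i in set(t) if i in mis)
        for k in range(1, max_size + 1):
            counts.update(combinations(items, k))
    return {p: c for p, c in counts.items()
            if c >= min(mis[i] for i in p)}


if __name__ == "__main__":
    # Hypothetical data: two blocks of transactions, two item classes.
    blocks = [
        [["bread", "milk"], ["bread", "caviar"], ["bread", "milk"]],
        [["milk"], ["bread", "caviar", "milk"]],
    ]
    print(count_supports(blocks))  # counting support phase

    mis = item_mis(
        {"bread": "staple", "milk": "staple", "caviar": "luxury"},
        {"staple": 3, "luxury": 2},
    )
    transactions = [t for block in blocks for t in block]
    for pattern, support in sorted(frequent_itemsets_mis(transactions, mis).items()):
        print(pattern, support)  # analytics phase
```

With a single threshold of 3 for every item, the rarer item in this toy data would be discarded along with every pattern containing it; giving its class a lower threshold keeps those patterns without lowering the bar for the common items.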

A Review: Frequent Pattern Mining Techniques in Static and Stream Data Environment

Indian Journal of Science and Technology ◽ 10.17485/ijst/2016/v9i45/106350 ◽ 2016 ◽ Vol 9 (45)

Author(s): Simarpreet ◽ Varun Singla

Keyword(s): Pattern Mining ◽ Frequent Pattern Mining ◽ Frequent Pattern ◽ Stream Data ◽ Data Environment
