Efficient Large Scale Frequent Itemset Mining with Hybrid Partitioning Approach

Author(s):  
Priyanka R. ◽  
Mohammed Ibrahim M. ◽  
Ranjith Kumar M.

In today’s world, voluminous data are generated from various sources in various forms. Mining or analyzing this large-scale data efficiently, so as to make it useful to mankind, is difficult with existing approaches. Frequent itemset mining is one such analysis technique, used in many fields such as finance and health care, where the main focus is gathering frequent patterns and grouping them meaningfully in order to draw useful insights from the data. Major applications include customer segmentation in marketing, shopping cart analysis, customer relationship management, web usage mining, player tracking, and so on. Many parallel algorithms, such as the Dist-Eclat and BigFIM algorithms, are available to perform large-scale frequent itemset mining. In the Dist-Eclat algorithm, datasets are partitioned using a round-robin technique; replacing this with a hybrid partitioning approach can improve the overall efficiency of the system. The system works as follows: initially, the collected data are distributed by MapReduce. The local frequent k-itemsets are then computed using an FP-tree and sent to the map phase. Later, the mining results are combined at the central node. Finally, the global frequent itemsets are gathered by MapReduce. The proposed system is expected to improve efficiency by using a hybrid partitioning approach on the datasets, based on the identification of frequent items.
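The two-round workflow described above can be sketched in plain Python. This is a minimal, single-process simulation of the map and reduce phases: the function names, the toy transactions, and the use of simple candidate counting in place of FP-tree construction are all illustrative assumptions, not the paper's implementation.

```python
from itertools import combinations
from collections import Counter

def local_frequent(partition, k, min_support):
    """Map phase: count candidate k-itemsets within one data partition."""
    counts = Counter()
    for transaction in partition:
        for itemset in combinations(sorted(transaction), k):
            counts[itemset] += 1
    return {s: c for s, c in counts.items() if c >= min_support}

def global_frequent(partitions, k, min_support):
    """Reduce phase: merge local counts and keep globally frequent itemsets."""
    merged = Counter()
    for part in partitions:
        # Local min_support of 1 so no globally frequent itemset is lost.
        for itemset, count in local_frequent(part, k, min_support=1).items():
            merged[itemset] += count
    return {s: c for s, c in merged.items() if c >= min_support}

# Round-robin partitioning of transactions across two "workers"
transactions = [{"a", "b", "c"}, {"a", "b"}, {"b", "c"}, {"a", "c"}]
partitions = [transactions[0::2], transactions[1::2]]
print(global_frequent(partitions, k=2, min_support=2))
```

In a real deployment, each `local_frequent` call would run on a separate MapReduce worker; here the round-robin split only mimics Dist-Eclat's partitioning scheme.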

Author(s):  
Nur Rokhman ◽  
Amelia Nursanti

The implementation of parallel algorithms has recently become a very interesting research area. Parallelism is well suited to large-scale data processing, and MapReduce is one of the parallel and distributed programming models. Implementing parallel programs involves many difficulties; Cascading provides an easy scheme over the Hadoop system, which implements the MapReduce model. Frequent itemsets are the objects that appear most often in a dataset. Frequent Itemset Mining (FIM) requires complex computation and becomes a complicated problem when applied to large-scale data. This paper discusses the implementation of the MapReduce model on Cascading for FIM. The experiment uses the Amazon product co-purchasing network metadata dataset and shows that the simple mechanism of Cascading can be used to solve the FIM problem. It gives time complexity O(n), more efficient than the non-parallel approach, which has complexity O(n²/m).
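The map, group-by, and reduce stages that Cascading wires together over Hadoop can be illustrated with a single-pass item frequency count in plain Python. Cascading itself is a Java API; the pipe names in the comments and the toy co-purchase data below are hypothetical stand-ins.

```python
from collections import defaultdict

def map_phase(transactions):
    """Emit (item, 1) pairs, as a Cascading 'Each' pipe would."""
    for transaction in transactions:
        for item in transaction:
            yield item, 1

def reduce_phase(pairs):
    """Group by key and sum, as 'GroupBy' + 'Every' pipes would."""
    totals = defaultdict(int)
    for item, one in pairs:
        totals[item] += one
    return dict(totals)

def frequent_items(transactions, min_support):
    """Keep the items whose count meets the support threshold."""
    counts = reduce_phase(map_phase(transactions))
    return {item: c for item, c in counts.items() if c >= min_support}

purchases = [["book", "dvd"], ["book"], ["dvd", "cd"], ["book", "cd"]]
print(frequent_items(purchases, min_support=2))
```

Each transaction is touched once, which is where the O(n) behaviour of the pipeline comes from; the grouping step is what Hadoop distributes across machines.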


2017 ◽  
Vol 7 (4) ◽  
pp. 37-49
Author(s):  
Amrit Pal ◽  
Manish Kumar

Frequent itemset mining is a well-known area in data mining. Most of the available frequent itemset mining techniques require complete information about the data, from which association rules can be generated. The amount of data is increasing day by day, taking the form of Big Data, which requires changes to the algorithms so they can work on such large-scale data. Parallel implementations of the mining techniques can provide a solution to this problem. This paper surveys frequent itemset mining techniques that can be used in a parallel environment. Programming models like MapReduce provide an efficient architecture for working with Big Data; the paper also provides information about the issues and feasibility of implementing these techniques in such an environment.


2018 ◽  
Vol 439-440 ◽  
pp. 19-38 ◽  
Author(s):  
Kang-Wook Chon ◽  
Sang-Hyun Hwang ◽  
Min-Soo Kim

2019 ◽  
Vol 34 (1) ◽  
pp. 101-123 ◽  
Author(s):  
Taito Lee ◽  
Shin Matsushima ◽  
Kenji Yamanishi

Abstract: We consider the class of linear predictors over all logical conjunctions of binary attributes, which we refer to as the class of combinatorial binary models (CBMs) in this paper. CBMs are highly interpretable, but naïve learning of them from labeled data incurs a computational cost that grows exponentially with the length of the conjunctions; on the other hand, for large-scale datasets, long conjunctions are effective for learning predictors. To overcome this computational difficulty, we propose an algorithm, GRAfting for Binary datasets (GRAB), which efficiently learns CBMs within the L1-regularized loss minimization framework. The key idea of GRAB is to adopt weighted frequent itemset mining for the most time-consuming step of the grafting algorithm, which is designed to solve large-scale L1-regularized empirical risk minimization (L1-RERM) problems iteratively. Furthermore, we show experimentally that linear predictors of CBMs are effective in terms of both prediction accuracy and knowledge discovery.
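The expensive step that grafting repeats, and that GRAB accelerates, is selecting the conjunction feature whose loss gradient is largest in magnitude. The sketch below shows that selection criterion with a brute-force scan under squared loss on toy data; it is not the GRAB implementation (GRAB replaces exactly this scan with weighted frequent itemset mining), and all names and data are illustrative.

```python
from itertools import combinations

def conj_feature(x, conj):
    """Indicator feature: 1 iff every binary attribute in the conjunction is set."""
    return int(all(x[j] for j in conj))

def grafting_step(X, y, w, bias, max_len=2):
    """One brute-force grafting step for a CBM under squared loss: scan all
    conjunctions up to max_len and return the one whose gradient has the
    largest magnitude. The scan is exponential in max_len, which is the
    bottleneck GRAB removes."""
    n, d = len(X), len(X[0])
    residual = [bias + sum(w.get(c, 0.0) * conj_feature(x, c) for c in w) - yi
                for x, yi in zip(X, y)]
    best, best_grad = None, 0.0
    for length in range(1, max_len + 1):
        for conj in combinations(range(d), length):
            grad = sum(r * conj_feature(x, conj) for x, r in zip(X, residual)) / n
            if abs(grad) > abs(best_grad):
                best, best_grad = conj, grad
    return best, best_grad

# Toy data: the label fires exactly when attributes 0 AND 1 are both set.
X = [[1, 0, 0], [0, 1, 0], [1, 1, 0], [1, 1, 1]]
y = [0, 0, 1, 1]
conj, grad = grafting_step(X, y, w={}, bias=sum(y) / len(y))
print(conj)   # (0, 1)
```

In full grafting, the selected conjunction would be added to `w` only if its gradient magnitude exceeds the L1 regularization strength, and the model would then be re-optimized over the active features.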


The process of extracting the most frequently bought items from a transactional database is termed frequent itemset mining. Although it gives an idea of the best-selling itemsets, the method fails to identify the most profitable items in the database; it is not uncommon for the intersection between frequent itemsets and profitable itemsets to be minimal. The process of extracting the most profitable itemsets is termed Greater Profitable Itemset (GPI) mining. Among the various approaches to mining GPIs, [7] proposed a two-phase algorithm that optimizes the regeneration of GPIs when the profit value of any item changes: the first phase keeps track of the pruned items, and the second phase uses this record to regenerate GPIs efficiently. This paper proposes an enhancement to the way these changes are tracked by storing the pruned itemsets according to their constituent items, unlike the earlier algorithm, which stored records iteration-wise. By storing the itemsets according to their constituent items, we ensure that only the required items are retrieved; in contrast, the earlier algorithm would fetch all the items pruned in any iteration, regardless of their relevance. By fetching only relevant itemsets, the proposed method significantly reduces the computational requirements.
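The proposed item-keyed tracking can be sketched as an inverted index from each item to the pruned itemsets containing it, so that a profit change to one item fetches only the itemsets that could be affected. This is a minimal sketch of the indexing idea only; the class name, methods, and toy itemsets are assumptions, not the paper's data structures.

```python
from collections import defaultdict

class PrunedIndex:
    """Index pruned itemsets by constituent item (the proposed scheme),
    so a profit change to one item retrieves only the itemsets containing
    it, instead of every itemset pruned in some iteration."""

    def __init__(self):
        self.by_item = defaultdict(set)

    def add(self, itemset):
        # Register the pruned itemset under each of its constituent items.
        for item in itemset:
            self.by_item[item].add(frozenset(itemset))

    def affected_by(self, item):
        # Only itemsets containing the changed item, regardless of the
        # iteration in which they were pruned.
        return self.by_item.get(item, set())

index = PrunedIndex()
index.add({"pen", "ink"})
index.add({"pen", "paper"})
index.add({"stapler"})
print(index.affected_by("pen"))   # only the two itemsets containing "pen"
```

An iteration-wise store would instead return every itemset pruned in a given pass, forcing the second phase to filter out the irrelevant ones itself.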


2013 ◽  
Vol 7 (4) ◽  
pp. 1-39 ◽  
Author(s):  
Antonella Guzzo ◽  
Luigi Moccia ◽  
Domenico Saccà ◽  
Edoardo Serra

Author(s):  
Jismy Joseph ◽  
Kesavaraj G

Nowadays, Frequent Itemset Mining (FIM) is an essential task for retrieving frequently occurring patterns, correlations, events, or associations in a transactional database. Understanding such frequent patterns helps in taking substantial decisions in decisive situations. Multiple algorithms have been proposed for finding such patterns; however, their time and space complexity increases rapidly with the number of items in a dataset, so it is necessary to analyze their efficiency on different datasets. The aim of this paper is to evaluate the performance of the frequent itemset mining algorithms Apriori and Frequent Pattern (FP) growth by comparing their features. This study shows that the FP-growth algorithm is more efficient than the Apriori algorithm for rule generation and frequent pattern mining.
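The level-wise candidate generation that makes Apriori costly, and that FP-growth avoids, can be seen in a minimal sketch. This is a textbook-style Apriori with toy baskets, not the implementation compared in the paper; its repeated passes over the data at each level k are exactly what FP-growth's single tree construction replaces.

```python
from itertools import combinations
from collections import Counter

def apriori(transactions, min_support):
    """Minimal Apriori: level-wise candidate generation with support pruning."""
    transactions = [frozenset(t) for t in transactions]
    # Level 1: frequent single items.
    counts = Counter(item for t in transactions for item in t)
    frequent = {frozenset([i]) for i, c in counts.items() if c >= min_support}
    all_frequent = {s: counts[next(iter(s))] for s in frequent}
    k = 2
    while frequent:
        # Join step: unions of frequent (k-1)-itemsets that have size k.
        candidates = {a | b for a in frequent for b in frequent if len(a | b) == k}
        counts = Counter()
        for t in transactions:          # one full data pass per level
            for c in candidates:
                if c <= t:
                    counts[c] += 1
        frequent = {c for c in candidates if counts[c] >= min_support}
        all_frequent.update((c, counts[c]) for c in frequent)
        k += 1
    return all_frequent

baskets = [["milk", "bread"], ["milk", "bread", "eggs"], ["bread"], ["milk", "eggs"]]
result = apriori(baskets, min_support=2)
print(sorted(tuple(sorted(s)) for s in result))
```

The anti-monotone property justifies the pruning: a k-itemset can only be frequent if all of its (k-1)-subsets are, which is why candidates are built solely from the previous frequent level.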

