A new approximate method for mining frequent itemsets from big data

Author(s):  
Timur Valiullin ◽  
Zhexue Huang ◽  
Chenghao Wei ◽  
Jianfei Yin ◽  
Dingming Wu ◽  
...  

Mining frequent itemsets in transaction databases is an important task in many applications. It becomes more challenging when dealing with a large transaction database because traditional algorithms are not scalable due to memory limits. In this paper, we propose a new approach for the approximate mining of frequent itemsets in a big transaction database. Our approach is suitable for mining big transaction databases since it produces approximate frequent itemsets from a subset of the entire database, and it can be implemented in a distributed environment. Our algorithm is able to efficiently produce highly accurate results; however, it can miss some true frequent itemsets. To address this problem and reduce the number of false-negative frequent itemsets, we introduce an additional parameter to the algorithm so that it discovers most of the frequent itemsets contained in the entire data set. In this article, we present an empirical evaluation of the results of the proposed approach.
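The abstract does not give the algorithm itself, but the core idea (mine a sample of the database, and compensate for false negatives with an extra slack parameter that lowers the support threshold) can be sketched as follows. All function names and the brute-force counting pass are illustrative assumptions, not the authors' method.

```python
import random
from itertools import combinations

def frequent_itemsets(transactions, min_support, max_size=3):
    """Brute-force count of itemsets up to max_size items; keep those whose
    relative support meets min_support. Adequate for small examples only."""
    n = len(transactions)
    counts = {}
    for t in transactions:
        items = sorted(set(t))
        for k in range(1, max_size + 1):
            for combo in combinations(items, k):
                counts[combo] = counts.get(combo, 0) + 1
    return {s for s, c in counts.items() if c / n >= min_support}

def approximate_mine(transactions, min_support, sample_frac=0.5,
                     slack=0.05, seed=0):
    """Mine a random sample with a slightly lowered threshold.

    The slack parameter plays the role of the paper's additional parameter:
    it recovers itemsets whose sample support fell just below min_support,
    reducing false negatives at the cost of a few false positives.
    """
    rng = random.Random(seed)
    k = max(1, int(sample_frac * len(transactions)))
    sample = rng.sample(transactions, k)
    return frequent_itemsets(sample, min_support - slack)
```

With `sample_frac=1.0` and `slack=0.0` the approximate miner degenerates to exact mining, which makes the trade-off between the two parameters easy to study on small data.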

Big Data refers to the generation of massive amounts of information from any source, at any time, anywhere, and from any device. This growth of big data makes it a serious challenge to store, extract, and access these data in a short period of time. The discovery of frequent itemsets is an important problem in data mining that helps generate qualitative information for business insight and supports decision makers. To extract the necessary itemsets from big data, a variety of big data analytical techniques have evolved, such as association rule mining, genetic algorithms, machine learning, and the FP-growth algorithm. In this paper we propose the FP-ANN algorithm, which augments the FP-growth computation with a feed-forward neural network. The proposed algorithm uses a Twitter social dataset for mining frequent itemsets, and a comparative analysis of the approach is performed using different performance measures such as precision, recall, F-measure, time complexity, and computation cost and time. The simulation of the proposed work is done using JDK, JavaBeans, and WampServer software. The experimental results show that the proposed algorithm performs better in terms of time complexity as well as computation cost and time. It also gives improved results for precision, recall, and F-measure.
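The evaluation metrics named above (precision, recall, F-measure) are standard for comparing a mined itemset collection against the true frequent itemsets. A minimal sketch of that comparison, with illustrative names not taken from the paper:

```python
def evaluate_itemsets(mined, truth):
    """Precision, recall, and F-measure of a mined itemset collection
    against the ground-truth frequent itemsets."""
    mined, truth = set(mined), set(truth)
    tp = len(mined & truth)                       # correctly mined itemsets
    precision = tp / len(mined) if mined else 0.0  # fraction of mined that are true
    recall = tp / len(truth) if truth else 0.0     # fraction of true that were mined
    denom = precision + recall
    f_measure = (2 * precision * recall / denom) if denom else 0.0
    return precision, recall, f_measure
```

For example, mining two of three true itemsets plus one spurious one yields precision, recall, and F-measure all equal to 2/3.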


Author(s):  
Ebrahim Ansari Chelche ◽  
G.H. Dastghaibyfard ◽  
M.H. Sadreddini ◽  
Morteza Keshtakaran ◽  
Hani Kaabi

2006 ◽  
Vol 28 (1) ◽  
pp. 23-36 ◽  
Author(s):  
Chedy Raïssi ◽  
Pascal Poncelet ◽  
Maguelonne Teisseire

2021 ◽  
Vol 25 (4) ◽  
pp. 907-927
Author(s):  
Adam Krechowicz

Proper distribution of data items can significantly improve the performance of data processing in a distributed environment. However, typical data storage systems as well as distributed computation frameworks pay little attention to this aspect. In this paper the author introduces two custom data-item addressing methods for distributed data storage, using the Scalable Distributed Two-Layer Datastore as an example. The basic idea of these methods is to ensure that data items stored on the same cluster node are similar to each other, following the concepts of data clustering. However, most data clustering mechanisms have serious problems with scalability, which is a severe limitation in Big Data applications. The proposed methods efficiently distribute a data set over a set of buckets. As the experimental results show, all proposed methods generate good results efficiently in comparison with traditional clustering techniques such as k-means, agglomerative, and BIRCH clustering. Experiments in a distributed environment showed that proper data distribution can significantly improve the effectiveness of Big Data processing.
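The similarity-preserving placement idea described above can be sketched as a single-pass, clustering-style assignment of items to buckets: each item goes to the bucket whose running centroid is nearest, so co-located items stay similar without a full offline clustering pass. This is an assumed illustration of the concept, not the paper's actual addressing method.

```python
import math

def assign_to_buckets(points, num_buckets):
    """Greedy single-pass bucket assignment: each point joins the bucket
    with the nearest running centroid; the centroid is then updated
    incrementally. Returns the bucket index for each input point."""
    centroids = []   # running mean vector per bucket
    counts = []      # number of points per bucket
    assignment = []
    for p in points:
        if len(centroids) < num_buckets:
            # Seed each bucket with one of the first points seen.
            centroids.append(list(p))
            counts.append(1)
            assignment.append(len(centroids) - 1)
            continue
        b = min(range(num_buckets),
                key=lambda i: math.dist(p, centroids[i]))
        counts[b] += 1
        # Incremental mean update keeps the pass O(n * num_buckets).
        centroids[b] = [c + (x - c) / counts[b]
                        for c, x in zip(centroids[b], p)]
        assignment.append(b)
    return assignment
```

Unlike k-means, this single pass never revisits earlier points, which is what makes it attractive when the data set is too large to cluster globally.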


IEEE Access ◽  
2019 ◽  
Vol 7 ◽  
pp. 181688-181705 ◽  
Author(s):  
Muhammad Yasir ◽  
Muhammad Asif Habib ◽  
Muhammad Ashraf ◽  
Shahzad Sarwar ◽  
Muhammad Umar Chaudhry ◽  
...  

Author(s):  
Khadija A. Almohsen ◽  
Huda Al-Jobori

The growth in the usage of the web, especially e-commerce websites, has led to the development of recommender systems (RSs), which aim to personalize the web content for each user and reduce the cognitive load of information on the user. However, as the world enters the Big Data era and lives through the contemporary data explosion, the main goal of an RS becomes providing millions of high-quality recommendations in a few seconds for the increasing number of users and items. One of the successful techniques of RSs is collaborative filtering (CF), which makes recommendations for users based on what other like-minded users have preferred. Despite its success, CF faces challenges posed by Big Data, such as scalability, sparsity, and cold start. As a consequence, new CF approaches that overcome these problems, such as singular value decomposition (SVD), have been studied. This paper surveys the literature on RSs and reviews their current state along with the main concerns surrounding them due to Big Data. Furthermore, it thoroughly investigates SVD, one of the promising approaches expected to perform well in tackling Big Data challenges, and provides an implementation of it using successful Big Data tools (i.e., Apache Hadoop and Spark). This implementation is intended to validate the applicability of existing contributions to the field of SVD-based RSs, as well as the effectiveness of Hadoop and Spark in developing large-scale systems. The implementation has been evaluated empirically by measuring the mean absolute error, which gave results comparable to experiments conducted previously by other researchers on a relatively smaller data set in a non-distributed environment. This demonstrated the scalability of SVD-based RSs and their applicability to Big Data.
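The core mechanics of SVD-based rating prediction and the mean-absolute-error evaluation mentioned above can be sketched in a few lines. This is a minimal, non-distributed illustration, assuming a dense rating matrix; the surveyed systems factorize only observed entries and run on Hadoop/Spark, which this sketch does not attempt.

```python
import numpy as np

def svd_predict(ratings, k=2):
    """Reconstruct a rating matrix from its rank-k truncated SVD.

    The low-rank reconstruction fills the matrix with smoothed scores,
    which is the basic prediction step of SVD-based collaborative filtering.
    """
    U, s, Vt = np.linalg.svd(ratings, full_matrices=False)
    return U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]

def mean_absolute_error(truth, pred):
    """MAE between known ratings and predicted ratings."""
    return float(np.mean(np.abs(truth - pred)))
```

Choosing `k` trades accuracy on observed ratings against generalization: at full rank the reconstruction reproduces the input exactly, while smaller `k` yields the smoothed predictions used for recommendation.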


Author(s):  
Rashmi Awasthy ◽  
Rajesh Shrivastava ◽  
Bharat Solanki

Due to the increasing use of very large databases and data warehouses, mining useful information and helpful knowledge from transactions is evolving into an important research area. Frequent itemset (FI) mining is one of the most researched areas of data mining. In this paper, a new approach is proposed to efficiently mine privacy-preserving frequent itemsets on large transaction databases.

