A new approximate method for mining frequent itemsets from big data

Author(s):  
Timur Valiullin ◽  
Zhexue Huang ◽  
Chenghao Wei ◽  
Jianfei Yin ◽  
Dingming Wu ◽  
...  

Mining frequent itemsets in transaction databases is an important task in many applications. It becomes more challenging when dealing with a large transaction database because traditional algorithms are not scalable due to memory limits. In this paper, we propose a new approach for the approximate mining of frequent itemsets in a big transaction database. Our approach is suitable for mining big transaction databases since it produces approximate frequent itemsets from a subset of the entire database, and it can be implemented in a distributed environment. Our algorithm is able to efficiently produce highly accurate results; however, it can miss some true frequent itemsets. To address this problem and reduce the number of false-negative frequent itemsets, we introduce an additional parameter to the algorithm so that it discovers most of the frequent itemsets contained in the entire data set. In this article, we present an empirical evaluation of the results of the proposed approach.
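The abstract does not give the algorithm itself, but the core idea (mine a sample of the database, and compensate for false negatives with an extra slack parameter that lowers the support threshold) can be sketched as follows. All function names and the brute-force counting pass are illustrative assumptions, not the authors' method.

```python
import random
from itertools import combinations

def frequent_itemsets(transactions, min_support, max_size=3):
    """Brute-force count of itemsets up to max_size items; keep those whose
    relative support meets min_support. Adequate for small examples only."""
    n = len(transactions)
    counts = {}
    for t in transactions:
        items = sorted(set(t))
        for k in range(1, max_size + 1):
            for combo in combinations(items, k):
                counts[combo] = counts.get(combo, 0) + 1
    return {s for s, c in counts.items() if c / n >= min_support}

def approximate_mine(transactions, min_support, sample_frac=0.5,
                     slack=0.05, seed=0):
    """Mine a random sample with a slightly lowered threshold.

    The slack parameter plays the role of the paper's additional parameter:
    it recovers itemsets whose sample support fell just below min_support,
    reducing false negatives at the cost of a few false positives.
    """
    rng = random.Random(seed)
    k = max(1, int(sample_frac * len(transactions)))
    sample = rng.sample(transactions, k)
    return frequent_itemsets(sample, min_support - slack)
```

With `sample_frac=1.0` and `slack=0.0` the approximate miner degenerates to exact mining, which makes the trade-off between the two parameters easy to study on small data.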

Big Data refers to the generation of massive amounts of information from any source, at any time, anywhere, and from any device. This growth of big data makes it a serious challenge to store, extract, and access these data in a short period of time. The discovery of frequent itemsets is an important problem in data mining that helps generate qualitative information for business insight and supports decision makers. To extract the necessary itemsets from big data, a variety of big data analytical techniques have evolved, such as association rule mining, genetic algorithms, machine learning, and the FP-growth algorithm. In this paper we propose the FP-ANN algorithm, which augments the FP-growth computation with a feed-forward neural network. The proposed algorithm uses a Twitter social dataset for mining frequent itemsets, and a comparative analysis of the approach is performed using different performance measures such as precision, recall, F-measure, time complexity, and computation cost and time. The simulation of the proposed work is done using JDK, JavaBeans, and WampServer software. The experimental results show that the proposed algorithm performs better in terms of time complexity as well as computation cost and time. It also gives improved results for precision, recall, and F-measure.
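The evaluation metrics named above (precision, recall, F-measure) are standard for comparing a mined itemset collection against the true frequent itemsets. A minimal sketch of that comparison, with illustrative names not taken from the paper:

```python
def evaluate_itemsets(mined, truth):
    """Precision, recall, and F-measure of a mined itemset collection
    against the ground-truth frequent itemsets."""
    mined, truth = set(mined), set(truth)
    tp = len(mined & truth)                       # correctly mined itemsets
    precision = tp / len(mined) if mined else 0.0  # fraction of mined that are true
    recall = tp / len(truth) if truth else 0.0     # fraction of true that were mined
    denom = precision + recall
    f_measure = (2 * precision * recall / denom) if denom else 0.0
    return precision, recall, f_measure
```

For example, mining two of three true itemsets plus one spurious one yields precision, recall, and F-measure all equal to 2/3.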


Author(s):  
Ebrahim Ansari Chelche ◽  
G.H. Dastghaibyfard ◽  
M.H. Sadreddini ◽  
Morteza Keshtakaran ◽  
Hani Kaabi

2006 ◽  
Vol 28 (1) ◽  
pp. 23-36 ◽  
Author(s):  
Chedy Raïssi ◽  
Pascal Poncelet ◽  
Maguelonne Teisseire

2021 ◽  
Vol 25 (4) ◽  
pp. 907-927
Author(s):  
Adam Krechowicz

Proper distribution of data items can significantly improve the performance of data processing in a distributed environment. However, typical data storage systems as well as distributed computation frameworks pay little attention to this aspect. In this paper the author introduces two custom data-item addressing methods for distributed data storage, using the Scalable Distributed Two-Layer Datastore as an example. The basic idea of these methods is to ensure that data items stored on the same cluster node are similar to each other, following the concepts of data clustering. However, most data clustering mechanisms have serious problems with scalability, which is a severe limitation in Big Data applications. The proposed methods efficiently distribute a data set over a set of buckets. As the experimental results show, all proposed methods generate good results efficiently in comparison with traditional clustering techniques such as k-means, agglomerative, and BIRCH clustering. Experiments in a distributed environment showed that proper data distribution can significantly improve the effectiveness of Big Data processing.
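The similarity-preserving placement idea described above can be sketched as a single-pass, clustering-style assignment of items to buckets: each item goes to the bucket whose running centroid is nearest, so co-located items stay similar without a full offline clustering pass. This is an assumed illustration of the concept, not the paper's actual addressing method.

```python
import math

def assign_to_buckets(points, num_buckets):
    """Greedy single-pass bucket assignment: each point joins the bucket
    with the nearest running centroid; the centroid is then updated
    incrementally. Returns the bucket index for each input point."""
    centroids = []   # running mean vector per bucket
    counts = []      # number of points per bucket
    assignment = []
    for p in points:
        if len(centroids) < num_buckets:
            # Seed each bucket with one of the first points seen.
            centroids.append(list(p))
            counts.append(1)
            assignment.append(len(centroids) - 1)
            continue
        b = min(range(num_buckets),
                key=lambda i: math.dist(p, centroids[i]))
        counts[b] += 1
        # Incremental mean update keeps the pass O(n * num_buckets).
        centroids[b] = [c + (x - c) / counts[b]
                        for c, x in zip(centroids[b], p)]
        assignment.append(b)
    return assignment
```

Unlike k-means, this single pass never revisits earlier points, which is what makes it attractive when the data set is too large to cluster globally.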


IEEE Access ◽  
2019 ◽  
Vol 7 ◽  
pp. 181688-181705 ◽  
Author(s):  
Muhammad Yasir ◽  
Muhammad Asif Habib ◽  
Muhammad Ashraf ◽  
Shahzad Sarwar ◽  
Muhammad Umar Chaudhry ◽  
...  

Author(s):  
Khadija A. Almohsen ◽  
Huda Al-Jobori

The growth in the usage of the web, especially e-commerce websites, has led to the development of recommender systems (RSs), which aim to personalize the web content for each user and reduce the cognitive load of information on the user. However, as the world enters the Big Data era and lives through the contemporary data explosion, the main goal of an RS becomes providing millions of high-quality recommendations in a few seconds for the increasing number of users and items. One of the successful techniques of RSs is collaborative filtering (CF), which makes recommendations for users based on what other like-minded users have preferred. Despite its success, CF faces challenges posed by Big Data, such as scalability, sparsity, and cold start. As a consequence, new CF approaches that overcome these problems, such as singular value decomposition (SVD), have been studied. This paper surveys the literature on RSs and reviews their current state along with the main concerns surrounding them due to Big Data. Furthermore, it thoroughly investigates SVD, one of the promising approaches expected to perform well in tackling Big Data challenges, and provides an implementation of it using successful Big Data tools (i.e., Apache Hadoop and Spark). This implementation is intended to validate the applicability of existing contributions to the field of SVD-based RSs, as well as the effectiveness of Hadoop and Spark in developing large-scale systems. The implementation has been evaluated empirically by measuring the mean absolute error, which gave results comparable to experiments conducted previously by other researchers on a relatively smaller data set in a non-distributed environment. This demonstrated the scalability of SVD-based RSs and their applicability to Big Data.
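The core mechanics of SVD-based rating prediction and the mean-absolute-error evaluation mentioned above can be sketched in a few lines. This is a minimal, non-distributed illustration, assuming a dense rating matrix; the surveyed systems factorize only observed entries and run on Hadoop/Spark, which this sketch does not attempt.

```python
import numpy as np

def svd_predict(ratings, k=2):
    """Reconstruct a rating matrix from its rank-k truncated SVD.

    The low-rank reconstruction fills the matrix with smoothed scores,
    which is the basic prediction step of SVD-based collaborative filtering.
    """
    U, s, Vt = np.linalg.svd(ratings, full_matrices=False)
    return U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]

def mean_absolute_error(truth, pred):
    """MAE between known ratings and predicted ratings."""
    return float(np.mean(np.abs(truth - pred)))
```

Choosing `k` trades accuracy on observed ratings against generalization: at full rank the reconstruction reproduces the input exactly, while smaller `k` yields the smoothed predictions used for recommendation.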


Author(s):  
Rashmi Awasthy ◽  
Rajesh Shrivastava ◽  
Bharat Solanki

Due to the increasing use of very large databases and data warehouses, mining useful information and helpful knowledge from transactions is evolving into an important research area. Frequent itemset (FI) mining is one of the most researched areas of data mining. In this paper, a new approach is proposed to efficiently mine privacy-preserving frequent itemsets on large transaction databases.

