Analysis and Evaluation of Schemes for Secure Sum in Collaborative Frequent Itemset Mining across Horizontally Partitioned Data

Journal of Engineering ◽

10.1155/2014/470416 ◽

2014 ◽

Vol 2014 ◽

pp. 1-10 ◽

Cited By ~ 5

Author(s):

Nirali R. Nanavati ◽

Prakash Lalwani ◽

Devesh C. Jinwala

Keyword(s):

Privacy Preservation ◽

State Of The Art ◽

Empirical Evaluation ◽

Research Direction ◽

Frequent Itemset ◽

Public Key ◽

Frequent Itemset Mining ◽

Itemset Mining ◽

Partitioned Data ◽

Real Market

Privacy-preserving collaborative distributed frequent itemset mining (PPDFIM) is an important research direction. The current state of the art for secure sum in PPDFIM over a horizontally partitioned data model comprises primarily public key based homomorphic schemes, which are expensive in terms of communication and computation cost. The existing non-public-key scheme by Clifton et al. for secure sum in PPDFIM is efficient but prone to security attacks. In this paper, we propose Shamir’s secret sharing based approaches and a symmetric key based scheme to calculate the secure sum in PPDFIM. These schemes are information-theoretically secure under the standard assumptions. We further give a detailed theoretical and empirical evaluation of our proposed schemes for PPDFIM using a real market basket dataset. Our experimental analysis also shows that our schemes have a lower execution cost than the public key based scheme for secure sum in PPDFIM.
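
To make the idea of a secret-sharing-based secure sum concrete, the following is a minimal sketch of the textbook construction, not the authors' exact protocol. Each party splits its local support count into Shamir shares, every party adds up the shares it receives, and only the total is reconstructed. The modulus `PRIME`, the function names, and the choice of threshold equal to the number of parties are assumptions made for illustration, and all parties are simulated in one process.

```python
import secrets

# Assumed field modulus (a Mersenne prime); it must exceed any possible sum.
PRIME = 2**61 - 1


def make_shares(secret, xs, threshold):
    """Split `secret` into one share per x in `xs` using a random polynomial
    of degree threshold - 1 whose constant term is the secret."""
    coeffs = [secret % PRIME]
    coeffs += [secrets.randbelow(PRIME) for _ in range(threshold - 1)]

    def evaluate(x):
        acc = 0
        for c in reversed(coeffs):  # Horner evaluation modulo PRIME
            acc = (acc * x + c) % PRIME
        return acc

    return [evaluate(x) for x in xs]


def reconstruct(points):
    """Lagrange-interpolate the shared polynomial at x = 0."""
    total = 0
    for i, (xi, yi) in enumerate(points):
        num, den = 1, 1
        for j, (xj, _) in enumerate(points):
            if i != j:
                num = (num * -xj) % PRIME
                den = (den * (xi - xj)) % PRIME
        total = (total + yi * num * pow(den, -1, PRIME)) % PRIME
    return total


def secure_sum(local_counts):
    """Simulate n parties computing the sum of their private counts."""
    n = len(local_counts)
    xs = list(range(1, n + 1))  # public evaluation point of each party
    # Row p holds the shares party p sends out, one per receiving party.
    sent = [make_shares(count, xs, threshold=n) for count in local_counts]
    # Party q adds the shares it received; this is a share of the total.
    summed = [sum(sent[p][q] for p in range(n)) % PRIME for q in range(n)]
    return reconstruct(list(zip(xs, summed)))
```

For example, `secure_sum([12, 7, 30])` returns 49, while no individual share reveals a party's own count. A real deployment would exchange the shares over secure channels instead of a shared list.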

A novel privacy-preserving scheme for collaborative frequent itemset mining across vertically partitioned data

Security and Communication Networks ◽

10.1002/sec.1377 ◽

2015 ◽

Vol 8 (18) ◽

pp. 4407-4420 ◽

Cited By ~ 7

Author(s):

Nirali R. Nanavati ◽

Devesh C. Jinwala

Keyword(s):

Privacy Preserving ◽

Frequent Itemset ◽

Frequent Itemset Mining ◽

Itemset Mining ◽

Partitioned Data ◽

Vertically Partitioned Data

Mining Frequent Weighted Itemsets without Storing Transaction IDs and Generating Candidates

International Journal of Uncertainty Fuzziness and Knowledge-Based Systems ◽

10.1142/s0218488517500052 ◽

2017 ◽

Vol 25 (01) ◽

pp. 111-144 ◽

Cited By ~ 20

Author(s):

Gangin Lee ◽

Unil Yun ◽

Keun Ho Ryu

Keyword(s):

State Of The Art ◽

Frequent Itemset ◽

Experimental Results ◽

Frequent Itemset Mining ◽

Memory Usage ◽

Tree Structures ◽

Prefix Tree ◽

Itemset Mining ◽

Mining Methods ◽

Mining Algorithms

Weighted itemset mining, an important area of frequent itemset mining, finds meaningful itemsets by taking into account the different importance, or weight, of each item in a database. Because of its merits, weighted itemset mining has been studied actively. One of its methods, FWI (Frequent Weighted Itemset) mining, calculates the weights of transactions from the weights of items and then finds FWIs based on the transaction weights. However, previous FWI mining methods still have limitations in terms of runtime and memory usage. For this reason, in this paper we propose two algorithms for mining FWIs more efficiently from databases with item weights. In contrast to previous approaches, which store transaction IDs for mining FWIs, the proposed methods employ new types of prefix tree structures and mine these patterns more efficiently without storing any transaction ID. Through extensive experiments, we show that the proposed algorithms outperform state-of-the-art FWI mining algorithms in terms of runtime, memory usage, and scalability.
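
The step from item weights to transaction weights can be illustrated with a short sketch. It assumes a definition commonly used in the FWI literature: a transaction's weight is the mean weight of its items, and an itemset's weighted support is the total weight of the transactions containing it divided by the total weight of all transactions. The abstract does not state these formulas, and the function names and toy data are hypothetical. The sketch scans transactions directly and does not reproduce the prefix tree structures the paper proposes.

```python
def transaction_weight(transaction, item_weights):
    """Assumed definition: mean weight of the items in the transaction."""
    return sum(item_weights[item] for item in transaction) / len(transaction)


def weighted_support(itemset, transactions, item_weights):
    """Share of the total transaction weight carried by the transactions
    that contain every item of `itemset`."""
    weights = [transaction_weight(t, item_weights) for t in transactions]
    covered = sum(
        w for t, w in zip(transactions, weights) if set(itemset) <= set(t)
    )
    return covered / sum(weights)


# Hypothetical toy database with three items and three transactions.
item_weights = {"a": 0.75, "b": 0.25, "c": 0.5}
transactions = [{"a", "b"}, {"a", "c"}, {"b", "c"}]

# An itemset is a frequent weighted itemset if this value meets a threshold.
ws_a = weighted_support({"a"}, transactions, item_weights)
```

Here the transaction weights are 0.5, 0.625, and 0.375, so `ws_a` is 1.125 / 1.5 = 0.75.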

Parallel State of the Art Algorithms for Frequent Itemset Mining – A Concise Descriptive Summary of Scalable Approaches

International Journal of Scientific Research and Management ◽

10.18535/ijsrm/v5i6.34 ◽

2017 ◽

Author(s):

Nafisur Rahman ◽

Samar Wazir

Keyword(s):

State Of The Art ◽

Frequent Itemset ◽

Frequent Itemset Mining ◽

Itemset Mining

Frequent Itemset Mining A Metadata Based Approach for Knowledge Discovery

International Journal of Computer Sciences and Engineering ◽

10.26438/ijcse/v6i3.316320 ◽

2018 ◽

Vol 6 (3) ◽

pp. 316-320

Author(s):

Basavaraj A. Goudannavar ◽

Prashant Bhat

Keyword(s):

Knowledge Discovery ◽

Frequent Itemset ◽

Frequent Itemset Mining ◽

Itemset Mining

Inverse Frequent Itemset Mining Based on FP-Tree

Journal of Software ◽

10.3724/sp.j.1001.2008.00338 ◽

2008 ◽

Vol 19 (2) ◽

pp. 338-350 ◽

Cited By ~ 2

Author(s):

Yu-Hong GUO

Keyword(s):

Frequent Itemset ◽

Frequent Itemset Mining ◽

Itemset Mining

A Synopsis Based Approach for Itemset Frequency Estimation over Massive Multi-Transaction Stream

ACM Transactions on Knowledge Discovery from Data ◽

10.1145/3465238 ◽

2021 ◽

Vol 16 (2) ◽

pp. 1-30

Author(s):

Guangtao Wang ◽

Gao Cong ◽

Ying Zhang ◽

Zhen Hai ◽

Jieping Ye

Keyword(s):

Frequency Estimation ◽

Frequent Itemsets ◽

Frequent Itemset ◽

Experimental Results ◽

Closure Property ◽

Frequent Itemset Mining ◽

Itemset Mining ◽

Minimum Value ◽

Downward Closure ◽

Bounded Size

Streams in which multiple transactions are associated with the same key are prevalent in practice, e.g., a customer has multiple shopping records arriving at different times. Itemset frequency estimation on such streams is very challenging since sampling based methods, such as the widely used reservoir sampling, cannot be used. In this article, we propose a novel k-Minimum Value (KMV) synopsis based method to estimate the frequency of itemsets over multi-transaction streams. First, we extract the KMV synopses for each item from the stream. Then, we propose a novel estimator to estimate the frequency of an itemset over the KMV synopses. Compared with the existing estimator, our method is not only more accurate and more efficient to calculate but also follows the downward-closure property. These properties enable our new estimator to be incorporated into existing frequent itemset mining (FIM) algorithms (e.g., FP-Growth) to mine frequent itemsets over multi-transaction streams. To demonstrate this, we implement a KMV synopsis based FIM algorithm by integrating our estimator into existing FIM algorithms, and we prove that it guarantees the accuracy of FIM with a bounded size of KMV synopsis. Experimental results on massive streams show that our estimator significantly improves the accuracy of both itemset frequency estimation and FIM compared to the existing estimators.
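
As background for the KMV synopsis mentioned above, the following sketch shows the classic building block only: hash every key to a pseudo-uniform number in [0, 1), keep the k smallest distinct hash values, and estimate the number of distinct keys as (k - 1) divided by the k-th smallest value. It is not the new itemset estimator proposed in the article. The SHA-256 based hash and the function names are assumptions for illustration, and the sketch materialises all hashes at once, whereas a streaming version would maintain a bounded heap.

```python
import hashlib
import heapq


def unit_hash(value):
    """Map a value to a pseudo-uniform float in [0, 1) using SHA-256."""
    digest = hashlib.sha256(str(value).encode("utf-8")).digest()
    # Keep 53 bits so the quotient is exactly representable and below 1.
    return (int.from_bytes(digest[:8], "big") >> 11) / 2**53


def kmv_synopsis(values, k):
    """Return the k smallest distinct hash values, in ascending order."""
    return heapq.nsmallest(k, {unit_hash(v) for v in values})


def estimate_distinct(synopsis, k):
    """Exact when fewer than k distinct values were seen; otherwise the
    classic KMV estimate (k - 1) / (k-th smallest hash value)."""
    if len(synopsis) < k:
        return float(len(synopsis))
    return (k - 1) / synopsis[k - 1]


# Hypothetical usage: customer keys of the transactions containing one item.
synopsis = kmv_synopsis(["c1", "c2", "c1", "c3"], k=16)
distinct_customers = estimate_distinct(synopsis, k=16)
```

With only three distinct keys and k = 16 the synopsis is not full, so the estimate is exactly 3.0. For larger inputs the relative error shrinks roughly with the square root of k.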

Frequent Itemset Mining and Multi-Layer Network-Based Analysis of RDF Databases

Mathematics ◽

10.3390/math9040450 ◽

2021 ◽

Vol 9 (4) ◽

pp. 450

Author(s):

Gergely Honti ◽

János Abonyi

Keyword(s):

Climate Change ◽

Extraction Process ◽

Knowledge Extraction ◽

Frequent Itemset ◽

Frequent Itemset Mining ◽

Underlying Structure ◽

Multilayer Network ◽

Interdisciplinary Science ◽

Academic Knowledge ◽

Itemset Mining

Triplestores, or resource description framework (RDF) stores, are purpose-built databases used to organise, store and share data with context. Knowledge extraction from a large amount of interconnected data requires effective tools and methods to address the complexity and the underlying structure of semantic information. We propose a method that generates an interpretable multilayered network from an RDF database. The method utilises frequent itemset mining (FIM) of the subjects, predicates and objects of the RDF data, and automatically extracts informative subsets of the database for the analysis. The results are used to form layers in an analysable multidimensional network. The methodology enables a consistent, transparent, multi-aspect-oriented knowledge extraction from the linked dataset. To demonstrate the usability and effectiveness of the methodology, we analyse how the science of sustainability and climate change is structured using the Microsoft Academic Knowledge Graph. In the case study, the FIM forms networks of disciplines to reveal the significant interdisciplinary science communities in sustainability and climate change. The constructed multilayer network then enables an analysis of the significant disciplines and interdisciplinary scientific areas. To demonstrate the proposed knowledge extraction process, we search for interdisciplinary science communities and then measure and rank their multidisciplinary effects. The analysis identifies discipline similarities, pinpointing the similarity between atmospheric science and meteorology as well as between geomorphology and oceanography. The results confirm that frequent itemset mining provides informative sampled subsets of RDF databases which can be simultaneously analysed as layers of a multilayer network.
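
One possible reading of FIM over RDF data can be sketched as follows. The abstract does not say how transactions are built, so this sketch assumes that each triple is one transaction whose three items are tagged by role (subject, predicate, object), and it counts itemsets by brute force. The function name and the toy triples are hypothetical, and the construction of network layers from the mined itemsets is not shown.

```python
from collections import Counter
from itertools import combinations


def frequent_itemsets(triples, min_support):
    """Count every role-tagged sub-pattern of each triple and keep those
    occurring in at least `min_support` triples."""
    counts = Counter()
    for s, p, o in triples:
        items = (("s", s), ("p", p), ("o", o))
        for size in (1, 2, 3):
            for combo in combinations(items, size):
                counts[combo] += 1
    return {itemset: n for itemset, n in counts.items() if n >= min_support}


# Hypothetical toy graph.
triples = [
    ("paper1", "hasField", "Meteorology"),
    ("paper2", "hasField", "Meteorology"),
    ("paper2", "hasField", "Oceanography"),
    ("paper3", "cites", "paper1"),
]
frequent = frequent_itemsets(triples, min_support=2)
```

In this toy graph the pattern `(("p", "hasField"), ("o", "Meteorology"))` is frequent with a count of 2, and the triples matching such a pattern form the kind of informative subset the abstract describes.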
