Swapping-based Data Sanitization Method for Hiding Sensitive Frequent Itemset in Transaction Database

Author(s):  
Dedi Gunawan ◽  
Yusuf Sulistyo Nugroho ◽  
Maryam -
2018 ◽  
Vol 7 (4.1) ◽  
pp. 134
Author(s):  
Julaily Aida Jusoh ◽  
Mustafa Man ◽  
Wan Aezwani Wan Abu Bakar

Pattern mining is a subfield of data mining that uncovers interesting, unexpected, and useful patterns in transaction databases. Such patterns can be frequent or infrequent. An abundant literature has been devoted to frequent pattern mining, and a large number of efficient algorithms exist for frequent itemset mining in transaction databases. Nonetheless, infrequent pattern mining, which discovers patterns that rarely occur in a transaction database, has emerged as an interesting problem, and a growing number of researchers consider that rare pattern occurrences may offer valuable information in the knowledge discovery process. R-Eclat is a novel algorithm that finds infrequent patterns in a transaction database. Its multiple variants perform differently when mining infrequent patterns. This paper proposes IF-Postdiffset as a new variant of the R-Eclat algorithm and compares the execution time of infrequent pattern mining across the different variants of R-Eclat.
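The abstract does not describe how R-Eclat or IF-Postdiffset work internally. Purely as a point of reference, the sketch below shows the generic diffset idea from dEclat (an Eclat variant that stores, for each itemset, the transaction ids lost relative to its prefix instead of the full tidset), adapted to report itemsets whose support falls below an upper bound. This is not the authors' algorithm; the names `rare_itemsets`, `max_sup`, and `min_sup` are hypothetical, and supports are absolute counts.

```python
def _extend(prefix_class, max_sup, min_sup, out):
    """Depth-first dEclat-style search over one prefix class.

    Each member is (itemset, diffset, support). For siblings PX and PY:
        d(PXY)   = d(PY) - d(PX)
        sup(PXY) = sup(PX) - |d(PXY)|
    """
    for i, (x, dx, sx) in enumerate(prefix_class):
        if sx < max_sup:              # "rare": support below the upper bound
            out[x] = sx
        next_class = []
        for y, dy, _ in prefix_class[i + 1:]:
            d = dy - dx               # diffset of the combined itemset
            s = sx - len(d)           # its support, derived from the diffset
            if s >= min_sup:          # drop only itemsets too rare to keep
                next_class.append((x + (y[-1],), d, s))
        if next_class:
            _extend(next_class, max_sup, min_sup, out)


def rare_itemsets(transactions, max_sup, min_sup=1):
    """Return {itemset: support} for itemsets with min_sup <= support < max_sup."""
    all_tids = set(range(len(transactions)))
    tidsets = {}
    for tid, t in enumerate(transactions):
        for item in t:
            tidsets.setdefault(item, set()).add(tid)
    # With the empty prefix, d(X) = T - t(X), so sup(X) = |T| - |d(X)|.
    top = [((item,), all_tids - tidsets[item], len(tidsets[item]))
           for item in sorted(tidsets) if len(tidsets[item]) >= min_sup]
    out = {}
    _extend(top, max_sup, min_sup, out)
    return out
```

For the toy transactions `[{"a","b"}, {"a","c"}, {"a","b","c"}, {"a"}]`, `rare_itemsets(tx, max_sup=2)` returns `{("a","b","c"): 1, ("b","c"): 1}`. Because support is anti-monotone, rarity cannot be used to prune supersets, so this naive search visits every itemset with support at least `min_sup`; dedicated rare-pattern miners prune far more aggressively.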


In recent years, frequent itemset mining (FIM) has come to play a vital role in data mining tasks. This paper investigates FIM in transaction data, that is, extracting hidden patterns from transaction data. It addresses two main limitations of the Apriori algorithm: first, it scans the complete database in every pass to compute the support of every generated itemset; second, it is sensitive to variations in the user-defined minimum support (min_sup) threshold. The proposed method, Frequent Itemset Mining in Unique Scan (FIMUS), needs only a single scan of the transaction database to extract frequent itemsets. A distinctive feature is that it generates a fixed number of candidate itemsets independently of the min_sup threshold, which reduces execution time on large databases. FIMUS is compared with the Apriori algorithm on benchmark dense databases, and the experimental results confirm its scalability.
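The abstract names two properties of FIMUS (a single database scan, and a candidate set that does not depend on min_sup) without describing its procedure. The minimal sketch below is not FIMUS; it only illustrates those two properties with a naive approach, assuming a hypothetical cap `max_len` on itemset size: one pass counts every itemset of up to `max_len` items found in each transaction, after which any min_sup can be applied without rescanning.

```python
from collections import Counter
from itertools import combinations


def single_scan_counts(transactions, max_len):
    """One pass over the data: count every itemset of 1..max_len items
    contained in each transaction. The candidates do not depend on min_sup."""
    counts = Counter()
    for t in transactions:
        items = sorted(set(t))
        for k in range(1, max_len + 1):
            for itemset in combinations(items, k):
                counts[itemset] += 1
    return counts


def frequent(counts, min_sup):
    """Filter the precomputed counts for any threshold, with no new scan."""
    return {s: c for s, c in counts.items() if c >= min_sup}
```

With `tx = [["a","b"], ["a","c"], ["a","b","c"], ["a"]]`, a single call `single_scan_counts(tx, 3)` serves both `frequent(counts, 2)` and `frequent(counts, 3)`. The cost is exponential in transaction length, so this naive version only suits short transactions or a small `max_len`.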


2019 ◽  
Vol 3 (2) ◽  
pp. 9
Author(s):  
Ibnu Rusydi

<p><em>Sales transaction data is a very valuable asset in business processes. Beyond calculating profits and revenue, large amounts of transaction data can be used for various purposes, including generating new knowledge from the transaction database. One way to process the data and generate new knowledge from it is to apply data mining techniques. The technique used in this case is the FP-Growth algorithm, whose data structure is a tree called the FP-Tree. Using the FP-Tree, the FP-Growth algorithm can extract itemsets directly from the tree. The research was conducted by collecting data from a case study at the Medan Haji Hospital Pharmacy, where the variable taken is daily drug transaction data. The results are new knowledge about this sales data, obtained by applying the FP-Growth algorithm, which builds an FP-Tree to find frequent itemsets that are useful for developing investment plans in the study area.</em></p><p><em>Keywords: Data Mining, Association Rules, Frequent Itemset, FP-Growth.</em></p>
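To make the FP-Tree idea concrete, here is a compact, textbook-style FP-Growth sketch. It is not the study's implementation: `min_sup` is assumed to be an absolute count, items are assumed to be sortable strings, and the names `build_tree` and `fp_growth` are hypothetical. Each transaction's frequent items are inserted into a prefix tree in descending-support order; mining then recurses on the conditional pattern base (the prefix paths) of each item.

```python
from collections import defaultdict


class Node:
    __slots__ = ("item", "count", "parent", "children")

    def __init__(self, item, parent):
        self.item, self.count, self.parent, self.children = item, 0, parent, {}


def build_tree(weighted_transactions, min_sup):
    """Build an FP-tree from (items, count) pairs; return (header, frequent)."""
    support = defaultdict(int)
    for items, cnt in weighted_transactions:
        for item in set(items):
            support[item] += cnt
    frequent = {i: s for i, s in support.items() if s >= min_sup}
    root = Node(None, None)
    header = defaultdict(list)  # item -> every tree node carrying that item
    for items, cnt in weighted_transactions:
        # keep frequent items only, in descending support order (ties by name)
        path = sorted((i for i in set(items) if i in frequent),
                      key=lambda i: (-frequent[i], i))
        node = root
        for item in path:
            child = node.children.get(item)
            if child is None:
                child = node.children[item] = Node(item, node)
                header[item].append(child)
            child.count += cnt
            node = child
    return header, frequent


def fp_growth(transactions, min_sup):
    """Return {frozenset(itemset): support} for all frequent itemsets."""
    result = {}

    def mine(weighted, suffix):
        header, frequent = build_tree(weighted, min_sup)
        for item, sup in frequent.items():
            itemset = suffix | {item}
            result[itemset] = sup
            # conditional pattern base: prefix paths above each `item` node
            base = []
            for node in header[item]:
                path, p = [], node.parent
                while p.item is not None:
                    path.append(p.item)
                    p = p.parent
                if path:
                    base.append((path, node.count))
            if base:
                mine(base, itemset)

    mine([(t, 1) for t in transactions], frozenset())
    return result
```

On the classic toy baskets `[["bread","milk"], ["bread","diaper","beer","eggs"], ["milk","diaper","beer","cola"], ["bread","milk","diaper","beer"], ["bread","milk","diaper","cola"]]` (illustrative data, not the pharmacy data) with `min_sup=3`, it finds four frequent single items and four frequent pairs, including `{diaper, beer}` with support 3.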


2015 ◽  
Vol 713-715 ◽  
pp. 1765-1768
Author(s):  
Hui Wang

We present MaxMining, a new algorithm for mining maximal frequent itemsets from big transaction databases. MaxMining uses depth-first traversal and an iterative method. It represents the transaction database in vertical tidset format and traverses the search space with effective pruning strategies that reduce the search space dramatically. MaxMining removes all non-maximal frequent itemsets to obtain the exact set of maximal frequent itemsets directly, without enumerating all frequent itemsets step by step from smaller ones. It backtracks directly to the proper ancestor rather than level by level, skipping redundant frequent itemsets. We found that MaxMining can be more effective at finding all maximal frequent itemsets in big databases than many previously proposed algorithms that use ordinary pruning strategies.
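A frequent itemset is maximal when none of its proper supersets is frequent. The sketch below shows the general shape of a depth-first search over a vertical tidset layout; it does not reproduce MaxMining's pruning strategies or its direct backtracking. Assumptions: `min_sup` is an absolute count, items are sortable, and `maximal_frequent` is a hypothetical name. A node with no frequent extension is reported unless it is a subset of an already recorded maximal set; with items explored in sorted order, any superset of such a node is visited earlier, so this check is sufficient.

```python
def maximal_frequent(transactions, min_sup):
    """Return the maximal frequent itemsets as a list of sets."""
    # Vertical layout: item -> set of ids of transactions containing it.
    tidsets = {}
    for tid, t in enumerate(transactions):
        for item in t:
            tidsets.setdefault(item, set()).add(tid)
    root = [((i,), tidsets[i]) for i in sorted(tidsets)
            if len(tidsets[i]) >= min_sup]
    maximal = []

    def dfs(cls):
        for idx, (x, tx) in enumerate(cls):
            # Extend x with each later sibling; support = size of tidset intersection.
            ext = []
            for y, ty in cls[idx + 1:]:
                t = tx & ty
                if len(t) >= min_sup:
                    ext.append((x + (y[-1],), t))
            if ext:
                dfs(ext)
            elif not any(set(x) <= m for m in maximal):
                # No frequent extension and not covered by a known maximal set.
                maximal.append(set(x))

    dfs(root)
    return maximal
```

On the five toy baskets used in the FP-Growth literature (bread, milk, diaper, beer, cola, eggs) with `min_sup=3`, it returns the four frequent pairs `{beer, diaper}`, `{bread, diaper}`, `{bread, milk}`, and `{diaper, milk}`, and none of the frequent single items, since each is covered by a pair.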


2021 ◽  
Vol 16 (2) ◽  
pp. 1-30
Author(s):  
Guangtao Wang ◽  
Gao Cong ◽  
Ying Zhang ◽  
Zhen Hai ◽  
Jieping Ye

Streams in which multiple transactions are associated with the same key are prevalent in practice; for example, a customer has multiple shopping records arriving at different times. Itemset frequency estimation on such streams is very challenging because sampling-based methods, such as the widely used reservoir sampling, cannot be applied. In this article, we propose a novel k-Minimum Value (KMV) synopsis-based method to estimate the frequency of itemsets over multi-transaction streams. First, we extract a KMV synopsis for each item from the stream. Then, we propose a novel estimator of the frequency of an itemset over the KMV synopses. Compared to the existing estimator, our method is not only more accurate and more efficient to calculate but also satisfies the downward-closure property. These properties enable our new estimator to be incorporated into existing frequent itemset mining (FIM) algorithms (e.g., FP-Growth) to mine frequent itemsets over multi-transaction streams. To demonstrate this, we implement a KMV synopsis-based FIM algorithm by integrating our estimator into existing FIM algorithms, and we prove that it guarantees the accuracy of FIM with a bounded KMV synopsis size. Experimental results on massive streams show that our estimator significantly improves the accuracy of both itemset frequency estimation and FIM compared to the existing estimators.
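The sketch below illustrates the KMV synopsis itself and a standard KMV intersection estimator from the distinct-value estimation literature; it is not the article's novel estimator. Assumptions, all labeled here and in the comments: an itemset's frequency is taken to be the number of distinct keys whose records, across all of that key's transactions, contain every item of the itemset; keys are hashed to [0, 1) with BLAKE2b; `KMV`, `build_synopses`, and `estimate_itemset` are hypothetical names. A synopsis keeps the k smallest hash values of the keys seen for one item, and (k - 1) / h_k, where h_k is the k-th smallest value, estimates the number of distinct keys.

```python
import hashlib
import heapq


def unit_hash(key):
    """Hash a key to a float in [0, 1), deterministically across runs."""
    digest = hashlib.blake2b(str(key).encode(), digest_size=8).digest()
    return int.from_bytes(digest, "big") / 2**64


class KMV:
    """Keep the k smallest distinct hash values of the keys seen so far."""

    def __init__(self, k):
        self.k = k
        self._heap = []       # max-heap of kept values, stored negated
        self._members = set()

    def add(self, key):
        h = unit_hash(key)
        if h in self._members:
            return
        if len(self._heap) < self.k:
            heapq.heappush(self._heap, -h)
            self._members.add(h)
        elif h < -self._heap[0]:
            # Replace the largest kept value with the new, smaller one.
            evicted = -heapq.heapreplace(self._heap, -h)
            self._members.discard(evicted)
            self._members.add(h)

    def values(self):
        return self._members


def build_synopses(stream, k):
    """stream: iterable of (key, items). Returns item -> KMV over keys."""
    synopses = {}
    for key, items in stream:
        for item in items:
            synopses.setdefault(item, KMV(k)).add(key)
    return synopses


def estimate_itemset(synopses, k):
    """Estimate the number of distinct keys present in every given synopsis
    (standard KMV intersection estimator; assumed, not the article's method)."""
    union = sorted(set().union(*(s.values() for s in synopses)))[:k]
    if len(union) < k:
        # Fewer than k keys overall: every synopsis is complete, so be exact.
        return len(set.intersection(*(set(s.values()) for s in synopses)))
    kth = union[-1]
    in_all = sum(1 for h in union if all(h in s.values() for s in synopses))
    return (in_all / k) * (k - 1) / kth
```

For example, a key that bought item `a` in one transaction and item `b` in another counts toward `{a, b}` under the assumed semantics, which is what intersecting per-item key synopses measures. Accuracy improves roughly with the square root of `k`.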


Mathematics ◽  
2021 ◽  
Vol 9 (4) ◽  
pp. 450
Author(s):  
Gergely Honti ◽  
János Abonyi

Triplestores, or resource description framework (RDF) stores, are purpose-built databases used to organise, store and share data with context. Knowledge extraction from a large amount of interconnected data requires effective tools and methods to address the complexity and the underlying structure of semantic information. We propose a method that generates an interpretable multilayered network from an RDF database. The method applies frequent itemset mining (FIM) to the subjects, predicates and objects of the RDF data, and automatically extracts informative subsets of the database for the analysis. The results are used to form layers in an analysable multidimensional network. The methodology enables consistent, transparent, multi-aspect-oriented knowledge extraction from the linked dataset. To demonstrate the usability and effectiveness of the methodology, we analyse how the sciences of sustainability and climate change are structured using the Microsoft Academic Knowledge Graph. In the case study, FIM forms networks of disciplines that reveal the significant interdisciplinary science communities in sustainability and climate change. The constructed multilayer network then enables an analysis of the significant disciplines and interdisciplinary scientific areas. To demonstrate the proposed knowledge extraction process, we search for interdisciplinary science communities and then measure and rank their multidisciplinary effects. The analysis identifies discipline similarities, pinpointing the similarity between atmospheric science and meteorology as well as between geomorphology and oceanography. The results confirm that frequent itemset mining provides informative sampled subsets of RDF databases, which can be simultaneously analysed as layers of a multilayer network.
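The abstract does not specify how transactions are formed from subjects, predicates and objects. The minimal sketch below assumes one possible construction: each subject becomes one transaction whose items are the objects of a single chosen predicate, and frequent object pairs become weighted edges of one network layer. Only pairs are counted here, whereas a full FIM algorithm would also find larger itemsets. The function names and the `ex:` identifiers in the usage note are hypothetical toy values, not data from the study.

```python
from collections import Counter, defaultdict
from itertools import combinations


def transactions_by_subject(triples, predicate):
    """Group the objects of one predicate by subject: one transaction per subject."""
    tx = defaultdict(set)
    for s, p, o in triples:
        if p == predicate:
            tx[s].add(o)
    return list(tx.values())


def frequent_pair_layer(transactions, min_sup):
    """Count co-occurring object pairs; frequent pairs become weighted edges."""
    counts = Counter()
    for t in transactions:
        for pair in combinations(sorted(t), 2):
            counts[pair] += 1
    return {pair: c for pair, c in counts.items() if c >= min_sup}
```

For toy triples in which `ex:paper1` and `ex:paper2` both have `ex:hasField` values `ex:FieldA` and `ex:FieldB`, `frequent_pair_layer(transactions_by_subject(triples, "ex:hasField"), 2)` yields the single edge `("ex:FieldA", "ex:FieldB")` with weight 2; triples with other predicates are ignored for this layer.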

