Modifying Transactional Databases to Hide Sensitive Association Rules

Author(s):  
Syam Menon ◽  
Abhijeet Ghoshal ◽  
Sumit Sarkar

Although firms recognize the value in sharing data with supply chain partners, many remain reluctant to share for fear of sensitive information potentially making its way to competitors. Approaches that can help hide sensitive information could alleviate such concerns and increase the number of firms that are willing to share. Sensitive information in transactional databases often manifests itself in the form of association rules. The sensitive association rules can be concealed by altering transactions so that they remain hidden when the data are mined by the partner. The problem of hiding these rules in the data is computationally difficult (NP-hard), and extant approaches are all heuristic in nature. To our knowledge, this is the first paper that introduces the problem as a nonlinear integer formulation to hide the sensitive association rules while minimizing the alterations needed in the data set. We apply transformations that linearize the constraints and derive various results that help reduce the size of the problem to be solved. Our results show that although the nonlinear integer formulations are not practical, the linearizations and problem-reduction steps make a significant impact on solvability and solution time. This approach mitigates potential risks associated with sharing and should increase data sharing among supply chain partners.
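To make the idea of concealing a rule by altering transactions concrete, the following is a minimal greedy sketch. It is not the integer formulation described in the abstract and gives no optimality guarantee; the function name `hide_itemset`, the absolute-count threshold `min_support`, and the choice to alter the shortest transactions first are all assumptions made for illustration.

```python
def hide_itemset(transactions, sensitive, min_support):
    """Greedy sketch (hypothetical, not the paper's method).

    Removes one item of `sensitive` from supporting transactions until
    fewer than `min_support` transactions still contain the whole itemset.
    Returns (sanitized copy, number of altered transactions).
    """
    sensitive = frozenset(sensitive)
    sanitized = [set(t) for t in transactions]      # work on a copy
    supporting = [t for t in sanitized if sensitive <= t]
    supporting.sort(key=len)                        # assumption: alter short baskets first
    excess = len(supporting) - min_support + 1      # how many must stop supporting
    victim = sorted(sensitive)[0]                   # assumption: arbitrary item to delete
    altered = 0
    for t in supporting[:max(excess, 0)]:
        t.discard(victim)                           # mutates the set inside `sanitized`
        altered += 1
    return sanitized, altered


# Hypothetical toy data: {"a", "b"} is the sensitive itemset.
transactions = [{"a", "b", "c"}, {"a", "b"}, {"a", "b", "d"}, {"b", "c"}, {"a", "c"}]
sanitized, altered = hide_itemset(transactions, {"a", "b"}, min_support=2)
remaining = sum(1 for t in sanitized if {"a", "b"} <= t)
```

A rule whose items are no longer jointly frequent cannot be reported by a miner using the same threshold. Each deletion also damages the data, which is why the number of alterations is the quantity an exact approach would try to minimize.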

2020 ◽  
Vol 31 (2) ◽  
pp. 473-490 ◽  
Author(s):  
Abhijeet Ghoshal ◽  
Jing Hao ◽  
Syam Menon ◽  
Sumit Sarkar

Although retailers recognize the potential value of sharing transactional data with supply chain partners, many remain reluctant to share. However, there is evidence that the extent of sharing would be greater if information sensitive to retailers could be concealed before sharing. Extant research has only considered sensitive information at the organizational level. This is rarely the case in reality; the retail industry has adapted its offerings to region-wide differences in customer tastes for decades. Differences in customer characteristics across regions lead to region-specific sensitive information in addition to any at the organizational level. This is the first paper to propose an approach to solve this version of the problem. Region-level requirements increase the size of an already difficult (NP-hard) problem substantially, making adaptations of existing approaches impractical. We present an ensemble approach that draws intuition from Lagrangian relaxation to conceal sensitive patterns at the organizational and regional levels with minimal damage to the data set. Extensive computational experiments show that it identifies optimal or near-optimal solutions even when other approaches fail, doing so without any loss in recommendation effectiveness. This mitigates potential risks associated with sharing and should increase data sharing among partners in the supply chain.


Author(s):  
Ling Feng

The discovery of association rules from large amounts of structured or semi-structured data is an important data mining problem [Agrawal et al. 1993, Agrawal and Srikant 1994, Miyahara et al. 2001, Termier et al. 2002, Braga et al. 2002, Cong et al. 2002, Braga et al. 2003, Xiao et al. 2003, Maruyama and Uehara 2000, Wang and Liu 2000]. It has crucial applications in decision support and marketing strategy. The most prototypical application of association rules is market basket analysis using transaction databases from supermarkets. These databases contain sales transaction records, each of which details items bought by a customer in the transaction. Mining association rules is the process of discovering knowledge such as “80% of customers who bought diapers also bought beer, and 35% of customers bought both diapers and beer”, which can be expressed as “diaper ⇒ beer” (35%, 80%), where 80% is the confidence level of the rule, and 35% is the support level of the rule indicating how frequently the customers bought both diapers and beer. In general, an association rule takes the form X ⇒ Y (s, c), where X and Y are sets of items, and s and c are support and confidence, respectively. In the XML Era, mining association rules is confronted with more challenges than in the traditional well-structured world due to the inherent flexibilities of XML in both structure and semantics [Feng and Dillon 2005]. First, XML data has a more complex hierarchical structure than a database record. Second, elements in XML data have contextual positions, which thus carry the order notion. Third, XML data appears to be much bigger than traditional data. To address these challenges, the classic association rule mining framework originating with transactional databases needs to be re-examined.
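As a small illustration of the support and confidence measures defined above, the sketch below computes both for a single rule X ⇒ Y. The helper names `support` and `confidence` and the five toy baskets are assumptions chosen for the example; the numbers do not reproduce the 35% and 80% figures quoted in the abstract.

```python
def support(transactions, itemset):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    hits = sum(1 for t in transactions if itemset <= set(t))
    return hits / len(transactions)


def confidence(transactions, antecedent, consequent):
    """support(X ∪ Y) / support(X): how often Y appears when X does."""
    both = support(transactions, set(antecedent) | set(consequent))
    ante = support(transactions, antecedent)
    return both / ante if ante else 0.0


# Hypothetical toy baskets.
transactions = [
    {"diapers", "beer"},
    {"diapers", "beer", "milk"},
    {"diapers", "milk"},
    {"beer"},
    {"milk", "bread"},
]
s = support(transactions, {"diapers", "beer"})        # 2 of 5 baskets
c = confidence(transactions, {"diapers"}, {"beer"})   # 2 of the 3 diaper baskets
```

Here the rule diapers ⇒ beer has support 0.4 and confidence 2/3: two of the five baskets contain both items, and two of the three baskets with diapers also contain beer.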


Author(s):  
Ling Zhou ◽  
Stephen Yau

Association rule mining among frequent items has been extensively studied in data mining research. However, in recent years, there is an increasing demand for mining infrequent items (such as rare but expensive items). Since exploring interesting relationships among infrequent items has not been discussed much in the literature, in this chapter, the authors propose two simple, practical and effective schemes to mine association rules among rare items. Their algorithms can also be applied to frequent items with bounded length. Experiments are performed on the well-known IBM synthetic database. The authors’ schemes compare favorably to Apriori and FP-growth under the situation being evaluated. In addition, they explore quantitative association rule mining in transactional databases among infrequent items by associating quantities of items: some interesting examples are drawn to illustrate the significance of such mining.


Author(s):  
G. Bhavani ◽  
S. Sivakumari

Data mining extracts useful information from large amounts of data. A central challenge is to discover unseen patterns without exposing sensitive knowledge. Privacy Preserving Data Mining (PPDM) deals with the issue of maintaining the privacy of information; it protects sensitive information from disclosure. PPDM techniques are designed to keep sensitive information hidden even after data mining has been performed. One practice for hiding sensitive association rules is termed association rule hiding. The main objective of an association rule hiding algorithm is to slightly adjust the original database so that no sensitive association rule can be derived from it. This article presents a detailed survey of various association rule hiding techniques for preserving privacy in data mining. First, different techniques developed by previous researchers are studied in detail. Then, a comparative analysis is carried out to identify the limitations of each technique, followed by suggestions for future improvement in association rule hiding for privacy preservation.


Author(s):  
Suma B. ◽  
Shobha G.

Association rule mining is a well-known data mining technique used for extracting hidden correlations between data items in large databases. In the majority of the situations, data mining results contain sensitive information about individuals and publishing such data will violate individual secrecy. The challenge of association rule mining is to preserve the confidentiality of sensitive rules when releasing the database to external parties. The association rule hiding technique conceals the knowledge extracted by the sensitive association rules by modifying the database. In this paper, we introduce a border-based algorithm for hiding sensitive association rules. The main purpose of this approach is to conceal the sensitive rule set while maintaining the utility of the database and association rule mining results at the highest level. The performance of the algorithm in terms of the side effects is demonstrated using experiments conducted on two real datasets. The results show that the information loss is minimized without sacrificing the accuracy.


Author(s):  
Meera Sharma ◽  
Abhishek Tandon ◽  
Madhu Kumari ◽  
V. B. Singh

Bug triaging is the process of deciding what to do with incoming bug reports. In this paper, we have mined association rules for predicting the assignee of a newly reported bug using different bug attributes, namely, severity, priority, component and operating system. To deal with the problem of large data sets, we have divided the large data set into subsets using the k-means clustering algorithm. We have used an Apriori algorithm in MATLAB to generate association rules. We have extracted the association rules for the top 5 assignees in each cluster. The proposed method has been empirically validated on 14,696 bug reports of the Mozilla open source software projects, namely, Seamonkey, Firefox and Bugzilla. In our approach, we observe that when these attributes (severity, priority, component and operating system) are taken as antecedents, essential rules outnumber redundant rules, whereas in [M. Sharma and V. B. Singh, Clustering-based association rule mining for bug assignee prediction, Int. J. Business Intell. Data Mining 11(2) (2017) 130–150.] essential rules are fewer than redundant rules in every cluster. The proposed method provides an improvement over the existing techniques for the bug assignment problem.


2020 ◽  
Vol 7 (2) ◽  
pp. 262-276
Author(s):  
Alexander J.P. Sibarani

With sales transactions taking place every day, the volume of data keeps growing. These data serve not only as an archive for the company; they can also be used and processed into information that helps increase drug sales. A recurring problem at Apotik Pusaka Arta is that drugs customers want are often unavailable or out of stock, because the pharmacy does not monitor its stock and does not make use of its existing sales transaction data, which usually end up as an unused archive. To solve this problem, a data mining application using the Apriori algorithm was built. The method the author applies in this study is association rules. Association rule mining is a data mining technique for determining the relationships between items in a given data set. The technique looks for combinations that occur frequently within an itemset (a set of items). In this study, association rules are used to analyze how often drugs are sold together, based on data from past transactions. The Apriori algorithm as applied in this application succeeded in finding the most frequent item combinations in the transaction data and then forming association patterns from those combinations. The application's output shows which drugs customers frequently buy together, and thus reveals drug sales patterns.


2006 ◽  
Vol 532-533 ◽  
pp. 1024-1027 ◽  
Author(s):  
Shou Ning Qu ◽  
Qin Wang ◽  
Kui Liu ◽  
De Jun Xu

In this paper, the association rule algorithm and its defects are analyzed. An improved algorithm is put forward and applied to analyze the associations among product fittings in SCM. The improved algorithm can identify which kinds of fittings or sets of items should be matched to obtain a salable product or a higher profit. It can thus not only guide customers' consumption but also help the entrepreneur make a detailed and efficient internal plan.


Author(s):  
Vladimír Bartík

Association rules are one of the most frequently used types of knowledge discovered from databases. The problem of discovering association rules was first introduced in (Agrawal, Imielinski & Swami, 1993). Here, association rules are discovered from transactional databases – a set of transactions where a transaction is a set of items. An association rule is an expression of a form A ⇒ B where A and B are sets of items. A typical application is market basket analysis. Here, the transaction is the content of a basket and items are products. For example, if a rule milk ∧ juice ⇒ coffee is discovered, it is interpreted as: “If the customer buys milk and juice, s/he is likely to buy coffee too.” These rules are called single-dimensional Boolean association rules (Han & Kamber, 2001). The potential usefulness of the rule is expressed by means of two metrics – support and confidence. A lot of algorithms have been developed for mining association rules in transactional databases. The best known is the Apriori algorithm (Agrawal & Srikant, 1994), which has many modifications, e.g. (Kotásek & Zendulka, 2000). These algorithms usually consist of two phases: discovery of frequent itemsets and generation of association rules from them. A frequent itemset is a set of items having support greater than a threshold called minimum support. Association rule generation is controlled by another threshold referred to as minimum confidence. Association rules discovered can have a more general form and their mining is more complex than mining rules from transactional databases. In relational databases, association rules are ordinarily discovered from data of one table (it can be the result of joining several other tables). The table can have many columns (attributes) defined on domains of different types. It is useful to distinguish two types of attributes. A categorical attribute (also called nominal) has a finite number of possible values with no ordering among the values (e.g. a country of a customer). A quantitative attribute is a numeric attribute, domain of which is infinite or very large. In addition, it has an implicit ordering among values (e.g. age and salary of a customer). An association rule (Age = [20…30]) ∧ (Country = “Czech Rep.”) ⇒ (Salary = [1000$...2000$]) says that if the customer is between 20 and 30 and is from the Czech Republic, s/he is likely to earn between 1000$ and 2000$ per month. Such rules with two or more predicates (items) containing different attributes are also called multidimensional association rules. If some attributes of rules are quantitative, the rules are called quantitative association rules (Han & Kamber, 2001). If a table contains only categorical attributes, it is possible to use modified algorithms for mining association rules in transactional databases. The crucial problem is to process quantitative attributes because their domains are very large and these algorithms cannot be used. Quantitative attributes must be discretized. This article deals with mining multidimensional association rules from relational databases, with main focus on distance-based methods. One of them is a novel method developed by the authors.
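The two phases mentioned above (frequent itemset discovery, then rule generation) can be sketched in a few lines of Python. This is a compact, unoptimized illustration in the spirit of Apriori, not the distance-based method of the article; the function names, the toy baskets, and the decision to treat both thresholds as inclusive (≥) are assumptions.

```python
from itertools import combinations


def frequent_itemsets(transactions, min_support):
    """Phase 1: level-wise search for itemsets with support >= min_support."""
    n = len(transactions)
    items = sorted({i for t in transactions for i in t})
    level = [frozenset([i]) for i in items]
    frequent = {}
    while level:
        counts = {c: sum(1 for t in transactions if c <= t) for c in level}
        kept = {c: cnt / n for c, cnt in counts.items() if cnt / n >= min_support}
        frequent.update(kept)
        # Join frequent k-itemsets into (k+1)-candidates, then drop any
        # candidate with an infrequent k-subset (the Apriori property).
        k = len(next(iter(kept))) + 1 if kept else 0
        candidates = {a | b for a in kept for b in kept if len(a | b) == k}
        level = [c for c in candidates
                 if all(frozenset(s) in kept for s in combinations(c, k - 1))]
    return frequent


def generate_rules(frequent, min_confidence):
    """Phase 2: split each frequent itemset into antecedent => consequent."""
    rules = []
    for itemset, sup in frequent.items():
        for r in range(1, len(itemset)):
            for ante in map(frozenset, combinations(itemset, r)):
                conf = sup / frequent[ante]   # subsets of a frequent set are frequent
                if conf >= min_confidence:
                    rules.append((ante, itemset - ante, sup, conf))
    return rules


# Hypothetical toy baskets.
baskets = [
    {"milk", "juice", "coffee"},
    {"milk", "juice", "coffee"},
    {"milk", "juice"},
    {"milk", "bread"},
    {"juice", "coffee"},
]
freq = frequent_itemsets(baskets, min_support=0.4)
rules = generate_rules(freq, min_confidence=0.6)
```

On this toy data the rule milk ∧ juice ⇒ coffee is among those produced, since the three items occur together in two of five baskets and milk and juice occur together in three. A quantitative attribute such as age would first have to be mapped to interval items before a procedure like this could be applied.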


2008 ◽  
Vol 17 (06) ◽  
pp. 1109-1129 ◽  
Author(s):  
BASILIS BOUTSINAS ◽  
COSTAS SIOTOS ◽  
ANTONIS GEROLIMATOS

One of the most important data mining problems is learning association rules of the form "90% of the customers that purchase product x also purchase product y". Discovering association rules from huge volumes of data requires substantial processing power. In this paper we present an efficient distributed algorithm for mining association rules that reduces the time complexity by a magnitude that makes it suitable for scaling up to very large data sets. The proposed algorithm is based on partitioning the initial data set into subsets and processing each subset in parallel. By reducing the support threshold while processing the subsets, the proposed algorithm can maintain the set of association rules that would be extracted by applying an association rule mining algorithm to all the data. These claims are confirmed by empirical tests that we present and which also demonstrate the utility of the method.
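The partition-and-merge idea can be illustrated with the rough sketch below. It is not the authors' algorithm: the function names, the `local_factor` used to lower the threshold inside each subset, the cap of two items per itemset, the use of threads, and the final global counting pass are all assumptions made to keep the example short. It also assumes a non-empty list of transactions.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from itertools import combinations


def local_frequent(partition, threshold, max_len=2):
    """Itemsets (up to max_len items) with relative support >= threshold in one subset."""
    counts = Counter()
    for t in partition:
        for k in range(1, max_len + 1):
            counts.update(frozenset(c) for c in combinations(sorted(t), k))
    return {s for s, c in counts.items() if c / len(partition) >= threshold}


def partitioned_mining(transactions, min_support, n_parts=2, local_factor=0.5):
    """Mine each subset with a reduced threshold, then verify candidates globally."""
    size = -(-len(transactions) // n_parts)          # ceiling division
    parts = [transactions[i:i + size] for i in range(0, len(transactions), size)]
    with ThreadPoolExecutor() as pool:               # subsets are processed concurrently
        local_sets = pool.map(
            lambda p: local_frequent(p, local_factor * min_support), parts)
        candidates = set().union(*local_sets)
    n = len(transactions)
    result = {}
    for c in candidates:                             # one pass to confirm global support
        sup = sum(1 for t in transactions if c <= t) / n
        if sup >= min_support:
            result[c] = sup
    return result


# Hypothetical toy data.
data = [{"x", "y"}, {"x", "y"}, {"x", "z"}, {"y", "z"}, {"x", "y", "z"}, {"z"}]
result = partitioned_mining(data, min_support=0.5)
```

Lowering the local threshold makes it less likely that an itemset frequent in the full data is missed because it is unevenly spread across subsets, at the cost of more candidates to check in the final pass.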

