Anonymization Based on Improved Bucketization (AIB): A Privacy-Preserving Data Publishing Technique for Improving Data Utility in Healthcare Data

2021 ◽  
Vol 11 (12) ◽  
pp. 3164-3173
Author(s):  
R. Indhumathi ◽  
S. Sathiya Devi

Data sharing is essential in present-day biomedical research. Large quantities of medical information are gathered for different objectives of analysis and study, and because these collections are so large, anonymity is essential. It is therefore important to preserve privacy and prevent the leakage of patients' sensitive information. Most anonymization methods, such as generalization, suppression, and perturbation, are proposed to overcome information leaks but degrade the utility of the collected data: during data sanitization, utility is automatically diminished. The main drawback Privacy Preserving Data Publishing faces is maintaining the tradeoff between privacy and data utility. To address this issue, an efficient algorithm called Anonymization based on Improved Bucketization (AIB) is proposed, which increases the utility of published data while maintaining privacy. The Bucketization technique is used in this paper together with a clustering method. The proposed work is divided into four stages: (i) vertical and horizontal partitioning, (ii) assigning a sensitive index to attributes in each cluster, (iii) verifying each cluster against a privacy threshold, and (iv) examining each Quasi-Identifier (QI) for privacy breaches. To increase the utility of published data, the threshold value is determined from the distribution of elements in each attribute, and the anonymization method is applied only to the specific QI elements. As a result, data utility is improved. Finally, the evaluation results validate the design and demonstrate that it is effective in improving data utility.
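The AIB algorithm itself is not specified in this abstract; as a rough illustration of the bucketization idea, the following minimal Python sketch (the function names and the rarity-based scoring rule are hypothetical, not taken from the paper) assigns a frequency-based sensitive index within a cluster and checks the cluster against a privacy threshold:

```python
from collections import Counter

def sensitive_index(values):
    """Assign each distinct sensitive value an index based on its
    frequency in the cluster (rarer values are more identifying)."""
    counts = Counter(values)
    total = len(values)
    # Hypothetical scoring: rarity of each value within the cluster.
    return {v: 1 - c / total for v, c in counts.items()}

def violates_threshold(cluster_sa, threshold):
    """Check a cluster of sensitive-attribute values against a privacy
    threshold: flag it if any single value dominates the cluster."""
    counts = Counter(cluster_sa)
    return max(counts.values()) / len(cluster_sa) > threshold

# Example: a cluster of diagnoses from a synthetic health table.
cluster = ["flu", "flu", "flu", "cancer"]
print(violates_threshold(cluster, 0.5))  # True: "flu" is 75% of the cluster
```

A cluster that fails such a check would then be the one selected for anonymization, which is what limits the utility loss to specific QI elements.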

2019 ◽  
pp. 1518-1538
Author(s):  
Sowmyarani C. N. ◽  
Dayananda P.

Privacy attacks on individual records are a major concern in privacy preserving data publishing. An intruder who wants to learn the private information of a particular person will first acquire background knowledge about that person. This background knowledge may be gained through publicly available information, such as a voter ID, or through social networks. By combining this background information with published data, the intruder may obtain the private information, causing a privacy attack on that person. There are many privacy attack models; the most popular are discussed in this chapter. The study of these attack models plays a significant role in the invention of robust privacy preserving models.


Author(s):  
Wei Chang ◽  
Jie Wu

Many smartphone-based applications need microdata, but publishing a microdata table may leak respondents' privacy. Conventional research on privacy-preserving data publishing focuses on providing identical privacy protection to all data requesters. However, rather than staying within a small coterie, information usually propagates from friend to friend. The authors study the privacy-preserving data publishing problem on a mobile social network. Along a propagation path, a series of tables is locally created at each participant, and the tables' privacy levels should be gradually enhanced. However, the tradeoff between these tables' overall utility and their individual privacy requirements is not trivial: any inappropriate sanitization operation under a lower privacy requirement may cause dramatic utility loss on the subsequent tables. To solve the problem, the authors propose an approximation algorithm that previews future privacy requirements. Extensive results show that this approach successfully increases the overall data utility and meets the strengthening privacy requirements.


2010 ◽  
Vol 45 (1) ◽  
pp. 151-159 ◽  
Author(s):  
Michal Sramka

Many databases contain data about individuals that are valuable for research, marketing, and decision making. Sharing or publishing data about individuals is, however, prone to privacy attacks, breaches, and disclosures. The concern here is about individuals' privacy: keeping the sensitive information about individuals private to them. Data mining in this setting has been shown to be a powerful tool to breach privacy and make disclosures. In contrast, data mining can also be used in practice to aid data owners in deciding how to share and publish their databases. We present and discuss the role and uses of data mining in these scenarios and also briefly discuss other approaches to private data analysis.


2015 ◽  
Vol 12 (4) ◽  
pp. 1193-1216 ◽  
Author(s):  
Hong Zhu ◽  
Shengli Tian ◽  
Genyuan Du ◽  
Meiyi Xie

In privacy preserving data publishing, to reduce the correlation loss between the sensitive attribute (SA) and non-sensitive attributes (NSAs) caused by anonymization methods (such as generalization, anatomy, slicing, and randomization), records with the same NSA values should be placed in the same blocks to meet the anonymization demands of ℓ-diversity. However, there are often many blocks (of the initial partition) in which there are more than ℓ records with different SA values, and the frequencies of the different SA values are uneven. Therefore, anonymization on the initial partition causes more correlation loss. To reduce the correlation loss as far as possible, an optimizing model is first proposed in this paper. Then, according to the optimizing model, a refined partition of the initial partition is generated, and anonymization is applied to the refined partition. Although anonymization on the refined partition can be used on top of any existing partitioning method to reduce correlation loss, we demonstrate that a new partitioning method tailored for refined partitioning can further improve data utility. An experimental evaluation shows that our approach efficiently reduces correlation loss.
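The abstract does not give the paper's partitioning details; the following sketch (hypothetical names, distinct ℓ-diversity only) illustrates the kind of block-level checks involved: whether a block is ℓ-diverse, and how uneven its SA frequencies are, which is what drives the extra correlation loss described above:

```python
from collections import Counter

def is_l_diverse(block, l):
    """Distinct ℓ-diversity: the block's sensitive attribute must take
    at least ℓ distinct values."""
    return len(set(block)) >= l

def sa_skew(block):
    """Frequency skew of sensitive values in a block: ratio of the most
    frequent value's count to the least frequent one's. A skew near 1
    means even frequencies, i.e. less correlation loss after anonymization."""
    counts = Counter(block).values()
    return max(counts) / min(counts)

blocks = [["flu", "hiv", "flu", "cold"], ["flu", "flu", "flu"]]
print([is_l_diverse(b, 3) for b in blocks])  # [True, False]
print(sa_skew(blocks[0]))                    # 2.0: "flu" twice vs. others once
```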




Author(s):  
Deepak Narula ◽  
Pardeep Kumar ◽  
Shuchita Upadhyaya

In the modern era, providing security to an individual is always a matter of concern when a huge volume of electronic data is gathered daily. Securing the gathered data is not only a matter of concern but also remains a notable topic of research. The concept of Privacy Preserving Data Publishing (PPDP) is to allow access to the published data without disclosing unrequired information about an individual. Hence, PPDP faces the problem of publishing useful data while keeping sensitive information about individuals private. A variety of anonymization techniques can be found in the literature, but they suffer from different kinds of problems in terms of data information loss, discernibility, and average equivalence class size. This paper proposes an amalgamated approach and verifies it with respect to information loss, the discernibility value, and the average equivalence class size metric. The results are encouraging compared to existing *k*-anonymity based algorithms such as Datafly, Mondrian, and Incognito on various publicly available datasets.
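The discernibility and average equivalence class size metrics named here are standard in the k-anonymity literature; a minimal Python sketch of the two (function names are ours, the formulas are the usual ones), computed over generalized quasi-identifier tuples:

```python
from collections import Counter

def avg_equiv_class_size(qi_tuples, k):
    """C_avg metric: |T| / (number of equivalence classes * k).
    Values close to 1 mean classes are near the minimum size k."""
    classes = Counter(qi_tuples)
    return len(qi_tuples) / (len(classes) * k)

def discernibility(qi_tuples):
    """Discernibility metric: sum of squared equivalence-class sizes.
    Larger classes hide records better but cost more utility."""
    return sum(c * c for c in Counter(qi_tuples).values())

# Synthetic 2-anonymous table keyed by generalized quasi-identifiers.
table = [("3*", "M"), ("3*", "M"), ("4*", "F"), ("4*", "F"), ("4*", "F")]
print(avg_equiv_class_size(table, k=2))  # 5 / (2 * 2) = 1.25
print(discernibility(table))             # 2**2 + 3**2 = 13
```

Lower values of both metrics mean the anonymized table retains more utility, which is the axis on which Datafly, Mondrian, and Incognito are typically compared.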


2021 ◽  
Author(s):  
Wen-Yang Lin ◽  
Jie-Teng Wang

BACKGROUND Increasingly, spontaneous reporting systems (SRS) have been established to collect adverse drug events and foster research on ADR detection and analysis. SRS data contain personal information, so their publication requires data anonymization to prevent the disclosure of individual privacy. We previously proposed a privacy model called MS(k, θ*)-bounding and the associated MS-Anonymization algorithm to anonymize SRS data. In the real world, SRS data are usually released periodically, e.g., FAERS, to accommodate newly collected adverse drug events. Different anonymized releases of SRS data available to an attacker may thwart our single-release method, MS(k, θ*)-bounding. OBJECTIVE We investigate the privacy threat caused by periodical releases of SRS data and propose anonymization methods that prevent the disclosure of personal private information while maintaining the utility of the published data. METHODS We identify some potential attacks on periodical releases of SRS data, namely BFL-attacks, which are mainly caused by follow-up cases. We present a new privacy model called PPMS(k, θ*)-bounding and propose the associated PPMS-Anonymization algorithm along with two improvements, PPMS+-Anonymization and PPMS++-Anonymization. Empirical evaluations were performed using 32 selected FAERS quarterly datasets, from 2004Q1 to 2011Q4. The performance of the three proposed versions of PPMS-Anonymization was compared against MS-Anonymization in several respects: data distortion, measured by Normalized Information Loss (NIS); privacy risk of the anonymized data, measured by Dangerous Identity Ratio (DIR) and Dangerous Sensitivity Ratio (DSR); and data utility, measured by the bias of signal counting and strength (PRR). RESULTS The results show that our new method can prevent privacy disclosure for periodical releases of SRS data with a reasonable sacrifice of data utility and acceptable deviation in the strength of ADR signals. The best version, PPMS++-Anonymization, achieves nearly the same quality as MS-Anonymization in both privacy protection and data utility. CONCLUSIONS The proposed PPMS(k, θ*)-bounding model and PPMS-Anonymization algorithm effectively anonymize SRS datasets in the periodical data publishing scenario, preventing the series of releases from disclosing personal sensitive information through BFL-attacks while maintaining data utility for ADR signal detection.
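The PRR (proportional reporting ratio) used above to measure ADR signal strength has a standard 2×2 contingency-table form; a minimal sketch with toy counts (not taken from FAERS or the paper):

```python
def prr(a, b, c, d):
    """Proportional Reporting Ratio for a (drug, adverse event) pair:
    a = reports of the drug with the event, b = the drug without the event,
    c = other drugs with the event, d = other drugs without it.
    PRR > 2 (with enough cases) is a common signal-detection heuristic."""
    return (a / (a + b)) / (c / (c + d))

# Toy counts from a hypothetical SRS quarter.
print(round(prr(20, 80, 40, 860), 2))  # (20/100) / (40/900) = 4.5
```

Anonymization distorts the counts a, b, c, d, so the deviation of PRR before and after anonymization is a natural utility measure for SRS data.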


2021 ◽  
Vol 11 (22) ◽  
pp. 10740
Author(s):  
Jong Kim

There has recently been an increasing need for the collection and sharing of microdata containing information regarding an individual entity. Because microdata typically contain sensitive information about an individual, releasing them directly for public use may violate existing privacy requirements. Thus, extensive studies have been conducted on privacy-preserving data publishing (PPDP), which ensures that any released microdata satisfy the privacy policy requirements. Most existing privacy-preserving data publishing algorithms consider a scenario in which a data publisher, receiving a request for the release of data containing personal information, anonymizes the data prior to publishing, a process that is usually conducted offline. However, with the increasing demand for data sharing among various parties, it is more desirable to integrate the data anonymization functionality into existing systems that support online query processing. Thus, we developed a novel scheme that efficiently anonymizes query results on the fly and thereby supports efficient online privacy-preserving data publishing. In particular, given a user's query, the proposed approach effectively estimates the generalization level of each quasi-identifier attribute, achieving the k-anonymity property in the query result datasets based on statistical information, without applying k-anonymity to all actual datasets, which is a costly procedure. The experimental results show that significant gains in processing time can be achieved through the proposed method.
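The paper's statistical estimation procedure is not detailed in this abstract; as a simplified illustration of choosing a generalization level that yields k-anonymity, the following sketch (the numeric range hierarchy and function names are hypothetical) picks the smallest level at which every generalized bin holds at least k records:

```python
from collections import Counter

def generalize(value, level):
    """Hypothetical hierarchy for a numeric quasi-identifier:
    level 0 keeps the value, each further level widens the range 10x."""
    if level == 0:
        return str(value)
    width = 10 ** level
    low = (value // width) * width
    return f"{low}-{low + width - 1}"

def min_level_for_k(values, k, max_level=3):
    """Pick the smallest generalization level at which every
    generalized bin contains at least k records."""
    for level in range(max_level + 1):
        bins = Counter(generalize(v, level) for v in values)
        if min(bins.values()) >= k:
            return level
    return max_level

ages = [23, 27, 29, 31, 34, 38, 41, 45]
level = min_level_for_k(ages, k=4)
print(level, sorted({generalize(a, level) for a in ages}))  # 2 ['0-99']
```

The scheme described above would estimate such a level from precomputed statistics (e.g., attribute histograms) rather than scanning the actual records, which is where the processing-time gain comes from.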



