Privacy Preserving Anonymity for Periodical Releases of Spontaneous ADE Reporting Data: Algorithm Development and Validation (Preprint)

Mapping Intimacies ◽

10.2196/preprints.28752 ◽

2021 ◽

Author(s):

Wen-Yang Lin ◽

Jie-Teng Wang

Keyword(s):

Adverse Drug Events ◽

Personal Information ◽

Data Publishing ◽

Published Data ◽

Sensitive Information ◽

Personal Privacy ◽

Data Utility ◽

Data Anonymization ◽

Privacy Model ◽

Bounding Model

BACKGROUND: Spontaneous reporting systems (SRS) are increasingly being established to collect adverse drug events (ADEs) and to foster research on the detection and analysis of adverse drug reactions (ADRs). SRS data contain personal information, so publishing them requires anonymization to prevent the disclosure of individual privacy. We previously proposed a privacy model called MS(k, θ*)-bounding and the associated MS-Anonymization algorithm to anonymize SRS data. In practice, SRS data such as the FDA Adverse Event Reporting System (FAERS) are usually released periodically to accommodate newly collected adverse drug events. An attacker with access to several anonymized releases may defeat our single-release method, MS(k, θ*)-bounding.

OBJECTIVE: We investigate the privacy threat posed by periodical releases of SRS data and propose anonymization methods that prevent the disclosure of personal information while maintaining the utility of the published data.

METHODS: We identify potential attacks on periodical releases of SRS data, namely BFL-attacks, which are mainly caused by follow-up cases. We present a new privacy model called PPMS(k, θ*)-bounding and propose the associated PPMS-Anonymization algorithm along with two improvements, PPMS+-Anonymization and PPMS++-Anonymization. Empirical evaluations were performed using 32 selected FAERS quarterly datasets, from 2004Q1 to 2011Q4. The three versions of PPMS-Anonymization were compared with MS-Anonymization in terms of data distortion, measured by Normalized Information Loss (NIS); privacy risk of the anonymized data, measured by Dangerous Identity Ratio (DIR) and Dangerous Sensitivity Ratio (DSR); and data utility, measured by the bias in signal counting and signal strength (PRR).

RESULTS: The results show that the new method can prevent privacy disclosure for periodical releases of SRS data with a reasonable sacrifice of data utility and an acceptable deviation in the strength of ADR signals. The best version, PPMS++-Anonymization, achieves nearly the same quality as MS-Anonymization in both privacy protection and data utility.

CONCLUSIONS: The proposed PPMS(k, θ*)-bounding model and PPMS-Anonymization algorithm are effective in anonymizing SRS datasets in the periodical data publishing scenario, protecting the series of releases against the disclosure of sensitive personal information through BFL-attacks while maintaining data utility for ADR signal detection.
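
The abstract above uses PRR (proportional reporting ratio) as its measure of signal strength. As a rough illustration, the sketch below computes PRR from a 2×2 contingency table using the conventional definition, and compares the value before and after anonymization. The function names `prr` and `prr_deviation`, the example counts, and the relative-deviation measure are assumptions made for illustration; the abstract does not state how the paper defines its bias measure.

```python
def prr(a: int, b: int, c: int, d: int) -> float:
    """Proportional reporting ratio from a 2x2 contingency table.

    a: reports mentioning the drug and the event
    b: reports mentioning the drug but not the event
    c: reports mentioning the event but not the drug
    d: reports mentioning neither
    """
    if a + b == 0 or c + d == 0 or c == 0:
        raise ValueError("PRR is undefined for this table")
    return (a / (a + b)) / (c / (c + d))


def prr_deviation(original: tuple, anonymized: tuple) -> float:
    """Relative change in PRR after anonymization (hypothetical measure)."""
    before = prr(*original)
    after = prr(*anonymized)
    return abs(after - before) / before


# Made-up counts: anonymization suppressed two drug-event reports.
original_counts = (20, 80, 100, 9800)
anonymized_counts = (18, 82, 100, 9800)

print(prr(*original_counts))
print(prr_deviation(original_counts, anonymized_counts))
```

A small deviation would suggest that a drug-event signal detected on the original data would likely still be detected on the anonymized release.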


Anonymization Based on Improved Bucketization (AIB): A Privacy-Preserving Data Publishing Technique for Improving Data Utility in Healthcare Data

Journal of Medical Imaging and Health Informatics ◽

10.1166/jmihi.2021.3901 ◽

2021 ◽

Vol 11 (12) ◽

pp. 3164-3173

Author(s):

R. Indhumathi ◽

S. Sathiya Devi

Keyword(s):

Medical Information ◽

Threshold Value ◽

Privacy Preserving ◽

Data Publishing ◽

Published Data ◽

Sensitive Information ◽

Data Utility ◽

Healthcare Data ◽

Privacy Preserving Data Publishing ◽

Horizontal Partitioning

Data sharing is essential in present-day biomedical research. A large quantity of medical information is gathered for different objectives of analysis and study. Because so much data is collected, anonymity is essential, and it is important to preserve privacy and prevent the leakage of patients' sensitive information. Most anonymization methods, such as generalisation, suppression and perturbation, are proposed to prevent information leaks, but they degrade the utility of the collected data, because utility is inevitably diminished during data sanitization. The main difficulty in Privacy Preserving Data Publishing is therefore maintaining the trade-off between privacy and data utility. To address this issue, an efficient algorithm called Anonymization based on Improved Bucketization (AIB) is proposed, which increases the utility of published data while maintaining privacy. The bucketization technique is used in this paper together with a clustering method. The proposed work is divided into four stages: (i) vertical and horizontal partitioning, (ii) assigning a sensitive index to attributes in the cluster, (iii) verifying each cluster against the privacy threshold, and (iv) examining the quasi-identifiers (QI) for privacy breaches. To increase the utility of published data, the threshold value is determined from the distribution of elements in each attribute, and the anonymization method is applied only to the specific QI element. As a result, data utility is improved. Finally, the evaluation results validate the design and demonstrate that it is effective in improving data utility.
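
To make the bucketization idea concrete, here is a minimal sketch of generic bucketization: records are partitioned horizontally into buckets, then split vertically into a quasi-identifier table and a sensitive-value table linked only by a bucket ID. This is not the AIB algorithm; the clustering step, sensitive index and privacy threshold described above are omitted, and the function name `bucketize`, the fixed bucket size and the sample records are all assumptions.

```python
import random
from typing import Dict, List, Tuple

Record = Dict[str, object]


def bucketize(records: List[Record], qi_attrs: List[str], sensitive_attr: str,
              bucket_size: int, seed: int = 0) -> Tuple[List[Record], List[Record]]:
    """Split records into a QI table and a sensitive table joined by bucket ID."""
    rng = random.Random(seed)
    # Horizontal partitioning: sort by QI values so similar records share a bucket.
    ordered = sorted(records, key=lambda r: tuple(r[a] for a in qi_attrs))
    qi_table: List[Record] = []
    sensitive_table: List[Record] = []
    for start in range(0, len(ordered), bucket_size):
        bucket = ordered[start:start + bucket_size]
        bucket_id = start // bucket_size
        # Vertical partitioning: QI values are kept unmodified in one table...
        for r in bucket:
            row = {a: r[a] for a in qi_attrs}
            row["bucket"] = bucket_id
            qi_table.append(row)
        # ...and sensitive values go to another, shuffled within the bucket
        # so that row order no longer links a value to a person.
        values = [r[sensitive_attr] for r in bucket]
        rng.shuffle(values)
        for v in values:
            sensitive_table.append({"bucket": bucket_id, sensitive_attr: v})
    return qi_table, sensitive_table


records = [
    {"age": 34, "zip": "10001", "disease": "flu"},
    {"age": 29, "zip": "10002", "disease": "asthma"},
    {"age": 51, "zip": "10003", "disease": "diabetes"},
    {"age": 47, "zip": "10001", "disease": "flu"},
]
qi_table, sensitive_table = bucketize(records, ["age", "zip"], "disease", bucket_size=2)
```

Because the quasi-identifiers are published without generalization, bucketization tends to retain more utility than generalization-based methods, which is the property the abstract aims to improve on.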


K-Anonymity technique for privacy protection: a proof of concept study

10.5753/sbseg.2019.13987 ◽

2019 ◽

Author(s):

Italo Santos ◽

Emanuel Coutinho ◽

Leonardo Moreira

Keyword(s):

System Architecture ◽

Privacy Protection ◽

Personal Information ◽

Personal Space ◽

Sensitive Information ◽

Proof Of Concept ◽

Data Set ◽

Data Anonymization ◽

Privacy Model ◽

Anonymized Data

Privacy is a concept directly related to people's interest in maintaining personal space without the interference of others. In this paper, we focus on studying the k-anonymity technique, since many generalization algorithms are based on this privacy model. We develop a proof of concept that uses the k-anonymity technique to anonymize raw data and generate a new file with the anonymized data. We present the system architecture and describe an experiment using the Adult data set, which contains sensitive information and in which each record corresponds to the personal information of one person. Finally, we summarize our work and discuss future work.
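
As a brief illustration of the k-anonymity model discussed above, the sketch below checks whether every combination of quasi-identifier values occurs at least k times, and shows how generalizing an age into a ten-year range can make a small table 2-anonymous. The function names, the choice of quasi-identifiers and the sample rows (loosely modelled on Adult-style attributes) are assumptions, not the paper's implementation.

```python
from collections import Counter
from typing import Dict, List


def is_k_anonymous(records: List[Dict[str, object]],
                   quasi_identifiers: List[str], k: int) -> bool:
    """True if each quasi-identifier combination appears in at least k records."""
    counts = Counter(tuple(r[q] for q in quasi_identifiers) for r in records)
    return all(c >= k for c in counts.values())


def generalize_age(age: int, width: int = 10) -> str:
    """Replace an exact age with a range such as '20-29'."""
    low = (age // width) * width
    return f"{low}-{low + width - 1}"


raw = [
    {"age": 23, "sex": "Male", "income": "<=50K"},
    {"age": 27, "sex": "Male", "income": ">50K"},
    {"age": 35, "sex": "Female", "income": "<=50K"},
    {"age": 38, "sex": "Female", "income": "<=50K"},
]
# Generalize the age column only; other attributes are copied unchanged.
generalized = [{**r, "age": generalize_age(r["age"])} for r in raw]

print(is_k_anonymous(raw, ["age", "sex"], k=2))
print(is_k_anonymous(generalized, ["age", "sex"], k=2))
```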


Efficiently Supporting Online Privacy-Preserving Data Publishing in a Distributed Computing Environment

Applied Sciences ◽

10.3390/app112210740 ◽

2021 ◽

Vol 11 (22) ◽

pp. 10740

Author(s):

Jong Kim

Keyword(s):

Personal Information ◽

Privacy Preserving ◽

Online Privacy ◽

Data Publishing ◽

Sensitive Information ◽

Data Anonymization ◽

Query Result ◽

Individual Entity ◽

Privacy Preserving Data Publishing ◽

Increasing Demand

There has recently been an increasing need for the collection and sharing of microdata containing information regarding an individual entity. Because microdata typically contain sensitive information on an individual, releasing it directly for public use may violate existing privacy requirements. Thus, extensive studies have been conducted on privacy-preserving data publishing (PPDP), which ensures that any microdata released satisfy the privacy policy requirements. Most existing privacy-preserving data publishing algorithms consider a scenario in which a data publisher, receiving a request for the release of data containing personal information, anonymizes the data prior to publishing—a process that is usually conducted offline. However, with the increasing demand for the sharing of data among various parties, it is more desirable to integrate the data anonymization functionality into existing systems that are capable of supporting online query processing. Thus, we developed a novel scheme that is able to efficiently anonymize the query results on the fly, and thus support efficient online privacy-preserving data publishing. In particular, given a user’s query, the proposed approach effectively estimates the generalization level of each quasi-identifier attribute, thereby achieving the k-anonymity property in the query result datasets based on the statistical information without applying k-anonymity on all actual datasets, which is a costly procedure. The experiment results show that, through the proposed method, significant gains in processing time can be achieved.
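
One way to picture choosing a generalization level from statistics alone is sketched below: given a precomputed histogram of a numeric quasi-identifier, it returns the finest bin width at which every non-empty bin holds at least k records, without touching the underlying rows. This is only an illustrative assumption about how such an estimate could work for a single attribute; the function name, the candidate widths and the histogram are hypothetical and do not reproduce the paper's scheme.

```python
from collections import Counter
from typing import Dict, List, Optional


def min_generalization_width(histogram: Dict[int, int],
                             widths: List[int], k: int) -> Optional[int]:
    """Return the first width (finest first) whose bins all contain >= k records.

    histogram maps an attribute value to its record count; widths must be
    listed from finest to coarsest. Returns None if no width is sufficient.
    """
    for width in widths:
        bins: Counter = Counter()
        for value, count in histogram.items():
            bins[value // width] += count
        if all(c >= k for c in bins.values()):
            return width
    return None


# Made-up statistics for the ages matching a user's query.
age_histogram = {21: 1, 22: 2, 25: 1, 33: 3, 38: 1, 41: 3}
level = min_generalization_width(age_histogram, widths=[1, 5, 10, 20], k=3)
print(level)
```

With several quasi-identifiers the bins interact, so a real system would need joint statistics or a conservative bound; this sketch deliberately ignores that.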


Data Privacy Protection Based on Micro Aggregation with Dynamic Sensitive Attribute Updating

Sensors ◽

10.3390/s18072307 ◽

2018 ◽

Vol 18 (7) ◽

pp. 2307 ◽

Cited By ~ 2

Author(s):

Yancheng Shi ◽

Zhenjiang Zhang ◽

Han-Chieh Chao ◽

Bo Shen

Keyword(s):

Privacy Protection ◽

Data Privacy ◽

Large Scale ◽

Data Centers ◽

Personal Information ◽

Rapid Development ◽

Data Availability ◽

Personal Privacy ◽

Data Anonymization ◽

Data Privacy Protection

With the rapid development of information technology, large-scale personal data, including data collected by sensors or IoT devices, is stored in the cloud or in data centers. In some cases, the owners of the cloud or data centers need to publish the data. How to make the best use of the data despite the risk of personal information leakage has therefore become a popular research topic. The most common method of data privacy protection is data anonymization, which has two main problems: (1) the availability of information is reduced after clustering and cannot be flexibly adjusted; (2) most methods are static, so releasing the data multiple times can leak personal privacy. To solve these problems, this article makes two contributions. The first is a new clustering method based on micro-aggregation, in which data availability and privacy protection can be adjusted flexibly by considering distance and information entropy. The second is a dynamic update mechanism that guarantees that individual privacy is not compromised after the data has been released multiple times, while minimizing information loss. Finally, the algorithm is simulated on real data sets. The availability and advantages of the method are demonstrated by measuring the running time, the average information loss and the number of forged data records.
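
For readers unfamiliar with micro-aggregation, the following is a minimal univariate sketch: values are sorted, grouped into clusters of at least k, and each value is replaced by its cluster mean. It does not implement the distance- and entropy-based clustering or the dynamic update mechanism described above; the function name and the sample values are assumptions.

```python
from typing import List


def microaggregate(values: List[float], k: int) -> List[float]:
    """Replace each value with the mean of its group of at least k neighbours."""
    if k < 1 or len(values) < k:
        raise ValueError("need at least k values and k >= 1")
    # Indices sorted by value, so groups contain similar values.
    order = sorted(range(len(values)), key=lambda i: values[i])
    n_groups = len(values) // k
    result = [0.0] * len(values)
    for g in range(n_groups):
        start = g * k
        # The last group absorbs the remainder so no group is smaller than k.
        end = len(values) if g == n_groups - 1 else start + k
        group = order[start:end]
        mean = sum(values[i] for i in group) / len(group)
        for i in group:
            result[i] = mean
    return result


ages = [21, 45, 23, 47, 25, 90, 49]
print(microaggregate(ages, k=3))
```

Larger k gives stronger protection but pulls values further from their originals, which is the availability and privacy trade-off the abstract proposes to tune.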


A Secure Protocol for High-Dimensional Big Data Providing Data Privacy

Research Anthology on Privatizing and Securing Data ◽

10.4018/978-1-7998-8954-0.ch015 ◽

2021 ◽

pp. 327-343

Author(s):

Anitha J. ◽

Prasad S. P.

Keyword(s):

Big Data ◽

Data Storage ◽

Data Privacy ◽

Personal Information ◽

Technological Development ◽

High Dimensional ◽

Sensitive Information ◽

Data Anonymization ◽

Secure Protocol ◽

Data Owner

Due to recent technological development, the huge amount of data generated by social networking, sensor networks, the internet, etc., adds more challenges to data storage and processing tasks. During privacy-preserving data publishing (PPDP), the collected data may contain sensitive information about the data owner. Directly releasing this data for further processing may violate the privacy of the data owner, so the data must be modified so that it does not disclose any personal information. Existing data anonymization techniques use a fixed scheme and handle only a small number of dimensions. There are various types of attacks on data privacy, such as linkage attacks, homogeneity attacks, and background knowledge attacks. To provide an effective technique for maintaining data privacy and preventing linkage attacks in big data, this paper proposes a privacy-preserving protocol, UNION, for multi-party data providers. Experiments show that, compared with existing anonymization techniques, this technique provides better data utility when handling high-dimensional data and better scalability with respect to data size.


Personal Privacy Data Protection in Location Recommendation System

Journal of Physics Conference Series ◽

10.1088/1742-6596/2138/1/012026 ◽

2021 ◽

Vol 2138 (1) ◽

pp. 012026

Author(s):

Linrui Han

Keyword(s):

Data Storage ◽

Data Protection ◽

Data Security ◽

Recommendation System ◽

Data Availability ◽

Data Publishing ◽

Published Data ◽

Personal Privacy ◽

Location Recommendation ◽

Privacy Issues

At present, there are many location-based recommendation algorithms and systems, covering location calculation, route calculation, and so on. However, in general data publishing, the privacy issues in the published data have not received sufficient attention or protection. The purpose of this article is to investigate the effectiveness of personal privacy data protection in location recommendation systems. This paper first introduces the basis and importance of research on data security and secrecy, analyses personal privacy issues in data publishing in the era of big data, summarizes the research status in the field of security and secrecy at home and abroad, and introduces the process of data security and the role of users in it. Then, some classic privacy security modules in this field are introduced, and the privacy of data storage security concepts in the current situation mentioned in this paper is analyzed. A geographic location-based privacy protection scheme for the mobile cloud is proposed. Privacy analysis, sensitive attribute generalization information analysis, route synthesis analysis and related experiments are performed on the location recommendation system. The experimental results show that the proposed scheme is more secure and loses less data availability.


A Secure Protocol for High-Dimensional Big Data Providing Data Privacy

Handbook of Research on Machine and Deep Learning Applications for Cyber Security - Advances in Information Security, Privacy, and Ethics ◽

10.4018/978-1-5225-9611-0.ch016 ◽

2020 ◽

pp. 347-363

Author(s):

Anitha J. ◽

Prasad S. P.

Keyword(s):

Big Data ◽

Data Storage ◽

Data Privacy ◽

Personal Information ◽

Technological Development ◽

High Dimensional ◽

Sensitive Information ◽

Data Anonymization ◽

Secure Protocol ◽

Data Owner


Evaluation of proposed amalgamated anonymization approach

Indonesian Journal of Electrical Engineering and Computer Science ◽

10.11591/ijeecs.v16.i3.pp1439-1446 ◽

2019 ◽

Vol 16 (3) ◽

pp. 1439

Author(s):

Deepak Narula ◽

Pardeep Kumar ◽

Shuchita Upadhyaya

Keyword(s):

Equivalence Class ◽

Class Size ◽

Information Loss ◽

Data Publishing ◽

Published Data ◽

Sensitive Information ◽

Electronic Data ◽

Current Scenario ◽

Privacy Preserving Data Publishing ◽

Modern Era

In the modern era, securing an individual's data is always a matter of concern, given the huge volume of electronic data gathered daily. Securing the gathered data is not only a practical concern but also a notable topic of research. Privacy Preserving Data Publishing (PPDP) is concerned with giving access to published data without disclosing information about an individual that is not required. PPDP therefore faces the problem of publishing useful data while keeping an individual's sensitive information private. A variety of anonymization techniques can be found in the literature, but they suffer from different kinds of problems in terms of information loss, discernibility and average equivalence class size. This paper proposes an amalgamated approach and verifies it with respect to information loss, the discernibility value and the average equivalence class size metric. The results are encouraging compared with existing k-anonymity based algorithms such as Datafly, Mondrian and Incognito on various publicly available datasets.
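
Two of the metrics named above have widely used definitions that are easy to state in code. The sketch below assumes the conventional forms: the discernibility metric as the sum of squared equivalence class sizes, and the normalized average equivalence class size as n / (number of classes × k). Whether the paper uses exactly these forms is an assumption, and the function names and sample table are illustrative.

```python
from collections import Counter
from typing import Dict, List


def equivalence_class_sizes(records: List[Dict[str, object]],
                            quasi_identifiers: List[str]) -> List[int]:
    """Sizes of the groups of records that share all quasi-identifier values."""
    counts = Counter(tuple(r[q] for q in quasi_identifiers) for r in records)
    return list(counts.values())


def discernibility(sizes: List[int]) -> int:
    """Each record is penalized by the size of its class: sum of |E|^2."""
    return sum(s * s for s in sizes)


def average_class_size(sizes: List[int], k: int) -> float:
    """Normalized average class size; 1.0 means every class has exactly k records."""
    return sum(sizes) / (len(sizes) * k)


# A made-up, already generalized table with classes of size 2, 2 and 4.
table = (
    [{"age": "20-29", "zip": "100**"}] * 2
    + [{"age": "30-39", "zip": "100**"}] * 2
    + [{"age": "40-49", "zip": "101**"}] * 4
)
sizes = equivalence_class_sizes(table, ["age", "zip"])
print(discernibility(sizes))
print(average_class_size(sizes, k=2))
```

Lower values are better for both metrics, since large equivalence classes mean records have been generalized more than k-anonymity strictly requires.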


Privacy Preservation and Analytical Utility of E-Learning Data Mashups in the Web of Data

Applied Sciences ◽

10.3390/app11188506 ◽

2021 ◽

Vol 11 (18) ◽

pp. 8506

Author(s):

Mercedes Rodriguez-Garcia ◽

Antonio Balderas ◽

Juan Manuel Dodero

Keyword(s):

Learning Environments ◽

Privacy Preservation ◽

Personal Information ◽

Empirical Evaluation ◽

Published Data ◽

Data Sets ◽

Sensitive Information ◽

Essential Information ◽

Data Mashups ◽

Analytical Utility

Virtual learning environments contain valuable data about students that can be correlated and analyzed to optimize learning. Modern learning environments based on data mashups that collect and integrate data from multiple sources are relevant for learning analytics systems because they provide insights into students’ learning. However, data sets involved in mashups may contain personal information of a sensitive nature that raises legitimate privacy concerns. Typical privacy preservation methods take preemptive approaches that limit the data published in a mashup through access control and authentication schemes. Such limitations may reduce the analytical utility of the data exposed for gaining insights into students’ learning. To reconcile utility and privacy preservation of published data, this research proposes a new data mashup protocol capable of merging and k-anonymizing data sets in cloud-based learning environments without jeopardizing the analytical utility of the information. The implementation of the protocol is based on linked data, so that the data sets involved in the mashups are semantically described, thereby enabling their combination with relevant educational data sources. The k-anonymized data sets returned by the protocol still retain the essential information needed to support general data exploration and statistical analysis tasks. The analytical and empirical evaluation shows that the proposed protocol prevents the re-identification of individuals’ sensitive information.


Anonymization of Daily Activity Data by Using ℓ-diversity Privacy Model

ACM Transactions on Management Information Systems ◽

10.1145/3456876 ◽

2021 ◽

Vol 12 (3) ◽

pp. 1-21

Author(s):

Pooja Parameshwarappa ◽

Zhiyuan Chen ◽

Güneş Koru

Keyword(s):

Daily Activity ◽

Distance Measure ◽

Sensitive Information ◽

Activity Data ◽

Weighted Distance ◽

Data Utility ◽

Publishing Activity ◽

Privacy Model ◽

Multi Level ◽

Privacy Risks

In the age of IoT, the collection of activity data has become ubiquitous. Publishing activity data can be quite useful for various purposes, such as estimating the level of assistance required by older adults and facilitating early diagnosis and treatment of certain diseases. However, publishing activity data comes with privacy risks: each dimension, i.e., the activity of a person at any given point in time, can be used to identify a person as well as to reveal sensitive information about the person, such as not being at home at that time. Unfortunately, conventional anonymization methods have shortcomings when it comes to anonymizing activity data. Activity datasets considered for publication are often flat, with many dimensions but typically not many rows, which makes the existing anonymization techniques either inapplicable due to very few rows, or else either inefficient or ineffective in preserving utility. This article proposes novel multi-level clustering-based approaches using a non-metric weighted distance measure that enforce the ℓ-diversity model. Experimental results show that the proposed methods preserve data utility and are orders of magnitude more efficient than the existing methods.
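
As a small illustration of the ℓ-diversity model mentioned above, the sketch below checks the simplest variant, distinct ℓ-diversity: every group of records sharing the same quasi-identifier values must contain at least ℓ different sensitive values. It does not implement the article's multi-level clustering or weighted distance measure; the function name, attribute names and sample rows are assumptions.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple


def is_distinct_l_diverse(records: List[Dict[str, object]],
                          quasi_identifiers: List[str],
                          sensitive: str, l: int) -> bool:
    """True if each equivalence class holds at least l distinct sensitive values."""
    groups: Dict[Tuple, Set[object]] = defaultdict(set)
    for r in records:
        key = tuple(r[q] for q in quasi_identifiers)
        groups[key].add(r[sensitive])
    return all(len(values) >= l for values in groups.values())


# Made-up activity rows: the time slot is the quasi-identifier here.
rows = [
    {"slot": "08:00-09:00", "activity": "cooking"},
    {"slot": "08:00-09:00", "activity": "sleeping"},
    {"slot": "09:00-10:00", "activity": "away"},
    {"slot": "09:00-10:00", "activity": "away"},
]
print(is_distinct_l_diverse(rows, ["slot"], "activity", l=2))
```

In this example the second time slot contains only the value "away", so an observer who can place a person in that group learns the person was not at home, which is the kind of disclosure ℓ-diversity is meant to rule out.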
