data anonymization
Recently Published Documents

TOTAL DOCUMENTS: 234 (five years: 106)
H-INDEX: 15 (five years: 2)

2022 ◽  
Vol 25 (1) ◽  
pp. 1-25
Author(s):  
Sibghat Ullah Bazai ◽  
Julian Jang-Jaccard ◽  
Hooman Alavizadeh

Multi-dimensional data anonymization approaches (e.g., Mondrian) provide more fine-grained data privacy by applying a different anonymization strategy to each attribute. Many variations of multi-dimensional anonymization have been implemented on distributed processing platforms (e.g., MapReduce, Spark) to take advantage of their scalability and parallelism. According to our critical analysis of overheads, neither existing iteration-based nor recursion-based approaches provide effective mechanisms for creating the optimal number and relative size of resilient distributed datasets (RDDs), and thus they suffer heavily from performance overheads. To solve this issue, we propose a novel hybrid approach for implementing a multi-dimensional data anonymization strategy (e.g., Mondrian) that is both scalable and high-performance. Our hybrid approach creates far fewer RDDs, each with smaller partitions, than existing approaches. This optimal approach to RDD creation and operations is critical for the many multi-dimensional data anonymization applications whose execution complexity is tremendous. The new mechanism in our proposed hybrid approach can dramatically reduce the critical overheads involved in re-computation, shuffle operations, message exchange, and cache management.
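
To make the partitioning idea concrete, below is a minimal single-machine sketch of Mondrian-style multi-dimensional anonymization that recursively splits on the widest quasi-identifier and generalizes each resulting partition to attribute ranges. The column names and the value of k are hypothetical, and this is not the authors' Spark/RDD implementation.

```python
# Minimal single-machine Mondrian sketch: recursively split records on the
# quasi-identifier with the widest range until a partition can no longer be
# divided into halves of size >= k, then generalize each attribute to its range.
import pandas as pd

def mondrian(df, quasi_ids, k):
    # Stop when the partition cannot yield two sub-partitions of size >= k.
    if len(df) < 2 * k:
        return [df]
    # Pick the quasi-identifier with the widest range in this partition.
    spans = {q: df[q].max() - df[q].min() for q in quasi_ids}
    dim = max(spans, key=spans.get)
    median = df[dim].median()
    left, right = df[df[dim] <= median], df[df[dim] > median]
    # If the median split would violate k-anonymity, keep the partition intact.
    if len(left) < k or len(right) < k:
        return [df]
    return mondrian(left, quasi_ids, k) + mondrian(right, quasi_ids, k)

def generalize(partition, quasi_ids):
    out = partition.copy()
    for q in quasi_ids:
        out[q] = f"[{partition[q].min()}-{partition[q].max()}]"
    return out

# Hypothetical toy data with 'age' and 'zip' as quasi-identifiers and k = 2.
data = pd.DataFrame({"age": [23, 25, 31, 34, 45, 47],
                     "zip": [1010, 1012, 2040, 2041, 3300, 3305],
                     "diagnosis": list("ABCDEF")})
anonymized = pd.concat(generalize(p, ["age", "zip"])
                       for p in mondrian(data, ["age", "zip"], k=2))
print(anonymized)
```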


2021 ◽  
Vol 2 (4) ◽  
pp. 1-23
Author(s):  
Ahmed Aleroud ◽  
Fan Yang ◽  
Sai Chaithanya Pallaprolu ◽  
Zhiyuan Chen ◽  
George Karabatis

Network traces are considered a primary source of information for researchers, who use them to investigate research problems such as identifying user behavior, analyzing network hierarchy, maintaining network security, classifying packet flows, and more. However, most organizations are reluctant to share their data with a third party or the public due to privacy concerns. Data anonymization prior to sharing therefore becomes a convenient solution for both organizations and researchers. Although several anonymization algorithms are available, few of them provide sufficient privacy (an organizational need), acceptable data utility (a researcher need), and efficient data analysis at the same time. This article introduces a condensation-based differential privacy anonymization approach that achieves an improved tradeoff between privacy and utility compared to existing techniques and produces anonymized network trace data that can be shared publicly without lowering its utility value. Our solution also does not incur extra computation overhead for the data analyzer. A prototype system has been implemented, and experiments have shown that the proposed approach preserves privacy and allows data analysis without revealing the original data even when injection attacks are launched against it. When the anonymized datasets are given as input to graph-based intrusion detection techniques, they yield almost the same intrusion detection rates as the original datasets, with only a negligible impact.
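
As a rough, generic illustration of combining condensation with differentially private noise (not the authors' algorithm), the sketch below groups a numeric trace attribute into fixed-size clusters and releases only Laplace-perturbed cluster means; the field, group size, and privacy parameters are hypothetical.

```python
# Sketch: condense records into groups of size k and release only
# Laplace-perturbed group means (epsilon-differentially private means).
import numpy as np

def condense_and_perturb(values, k=10, epsilon=1.0, value_range=1.0):
    values = np.sort(np.asarray(values, dtype=float))
    # Form contiguous groups of exactly k sorted values (drop the remainder).
    groups = [values[i:i + k] for i in range(0, len(values) - len(values) % k, k)]
    noisy_means = []
    for g in groups:
        # Sensitivity of a mean over k values bounded by value_range is value_range / k.
        noise = np.random.laplace(scale=value_range / (epsilon * len(g)))
        noisy_means.append(g.mean() + noise)
    return noisy_means

# Hypothetical packet sizes from a network trace (40-1500 bytes).
packet_sizes = np.random.randint(40, 1500, size=200)
print(condense_and_perturb(packet_sizes, k=20, epsilon=0.5, value_range=1460))
```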


2021 ◽  
Vol 1 (2) ◽  
pp. 18-22
Author(s):  
Strahil Sokolov ◽  
Stanislava Georgieva

This paper presents a new approach to the processing and categorization of text from patient documents in the Bulgarian language using Natural Language Processing and Edge AI. The proposed algorithm contains several phases: personal data anonymization, pre-processing and conversion of the text to vectors, model training, and recognition. The accuracy achieved in the experiments is comparable to that of modern approaches.
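
As a rough sketch of such a pipeline (pattern-based anonymization, conversion of text to TF-IDF vectors, and training a linear classifier), the example below uses hypothetical masking rules, sample sentences, and category labels; it is not the authors' implementation and omits the Edge AI deployment aspect.

```python
# Sketch: mask personal data, vectorize the text, and train a simple classifier.
import re
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

def anonymize(text):
    # Hypothetical masking rules: 10-digit personal numbers and phone numbers.
    text = re.sub(r"\b\d{10}\b", "<EGN>", text)
    text = re.sub(r"\+?\d[\d\s-]{7,}\d", "<PHONE>", text)
    return text

# Hypothetical Bulgarian sentences ("chest pain" vs. "fever and cough") and labels.
documents = ["Пациент с ЕГН 8001014509 се оплаква от болка в гърдите.",
             "Пациентът съобщава за висока температура и кашлица."]
labels = ["cardiology", "pulmonology"]

model = make_pipeline(TfidfVectorizer(), LogisticRegression())
model.fit([anonymize(d) for d in documents], labels)
print(model.predict([anonymize("Болка в гърдите при усилие.")]))
```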


2021 ◽  
Vol 72 ◽  
pp. 1163-1214
Author(s):  
Konstantinos Nikolaidis ◽  
Stein Kristiansen ◽  
Thomas Plagemann ◽  
Vera Goebel ◽  
Knut Liestøl ◽  
...  

Good training data is a prerequisite for developing useful Machine Learning applications. However, in many domains existing data sets cannot be shared due to privacy regulations (e.g., from medical studies). This work investigates a simple yet unconventional approach to anonymized data synthesis that enables third parties to benefit from such anonymized data. We explore the feasibility of learning implicitly from visually unrealistic, task-relevant stimuli, which are synthesized by exciting the neurons of a trained deep neural network; neuronal excitation thus serves as a generator of synthetic stimuli. The stimuli are then used to train new classification models. Furthermore, we extend this framework to inhibit representations that are associated with specific individuals. We use sleep monitoring data from both an open and a large closed clinical study, as well as electroencephalogram sleep-stage classification data, to evaluate whether (1) end-users can create and successfully use customized classification models, and (2) the identity of study participants is protected. Extensive comparative empirical investigation shows that different algorithms trained on the stimuli are able to generalize successfully on the same task as the original model. Architectural and algorithmic similarity between the new and original models plays an important role in performance. For similar architectures, performance is close to that obtained with the original data (e.g., an accuracy difference of 0.56%-3.82% and a Kappa coefficient difference of 0.02-0.08). Further experiments show that the stimuli can provide state-of-the-art resilience against adversarial association and membership inference attacks.
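
A minimal PyTorch sketch of the underlying idea, synthesizing a stimulus by gradient ascent on the input so that it excites a chosen output neuron of a trained network, is given below; the model, input shape, and hyperparameters are hypothetical and do not reproduce the authors' framework.

```python
# Sketch: gradient ascent on the input to excite one output neuron of a trained
# classifier, yielding a synthetic "stimulus" that can be used as training data.
import torch

def synthesize_stimulus(model, target_class, input_shape, steps=200, lr=0.1):
    model.eval()
    x = torch.randn(1, *input_shape, requires_grad=True)   # start from noise
    optimizer = torch.optim.Adam([x], lr=lr)
    for _ in range(steps):
        optimizer.zero_grad()
        logits = model(x)
        # Maximize the target neuron's activation (minimize its negative),
        # with a small L2 penalty to keep the stimulus bounded.
        loss = -logits[0, target_class] + 1e-3 * x.pow(2).sum()
        loss.backward()
        optimizer.step()
    return x.detach()

# Hypothetical 1-D classifier over 3000-sample epochs with 5 sleep stages.
model = torch.nn.Sequential(torch.nn.Flatten(), torch.nn.Linear(3000, 5))
stimulus = synthesize_stimulus(model, target_class=2, input_shape=(1, 3000))
print(stimulus.shape)   # torch.Size([1, 1, 3000])
```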


2021 ◽  
Author(s):  
R Sudha ◽  
G Pooja ◽  
V Revathy ◽  
S Dilip Kumar

The use of official online banking sites has increased rapidly nowadays. In online transactions, attackers need only a small amount of information to steal the private data of bank users and carry out fraudulent activities. Credit card fraud is one of the major causes of commercial losses in online banking and has a significant impact on clients. In existing privacy models, fraudulent transactions are discovered only after the transaction has been completed. In this paper, therefore, a three-level server system is implemented to partition the intermediate gateway with better security. User details, transaction details, and account details are treated as sensitive attributes and stored in separate databases. A data suppression scheme that replaces string and numerical characters with special symbols is also implemented as an alternative to traditional cryptography schemes. The quasi-identifiers are hidden using an anonymization algorithm so that transactions can be carried out efficiently.
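
A minimal sketch of the character-level suppression described, replacing string and numerical characters of sensitive attributes with special symbols, is shown below; the field names and the masking rule are hypothetical rather than the paper's system.

```python
# Sketch: suppress sensitive attributes by replacing alphanumeric characters
# with '*', keeping only the last few characters for reference.
import re

def suppress(value, keep_last=2):
    head, tail = value[:-keep_last], value[-keep_last:]
    return re.sub(r"[A-Za-z0-9]", "*", head) + tail

transaction = {
    "account_no": "1234567890123456",
    "holder_name": "Jane Doe",
    "amount": "250.00",   # treated as non-sensitive and left untouched
}
masked = {k: suppress(v) if k in ("account_no", "holder_name") else v
          for k, v in transaction.items()}
print(masked)   # account_no and holder_name keep only their last two characters
```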


2021 ◽  
Vol 43 (1) ◽  
pp. 331-346
Author(s):  
Jakub Kociubiński

The rapid growth of data-gathering technologies has, on the one hand, provided public authorities with a valuable tool for counteracting crime, but on the other has given rise to concerns over potentially excessive intrusion into persons' privacy. In order to mitigate the risk of authoritarian behavior stemming from the moral hazard of being able to conduct ever more effective surveillance, public authorities must impose certain self-limitations on the use of such data. In this context, the use of unmanned aerial vehicles, which may serve other, non-surveillance purposes, may inadvertently lead to the collection of someone's personal data. This paper provides a propaedeutic analysis of the legal challenges associated with the collateral collection of personal data by unmanned aerial platforms operated by public bodies, and with the subsequent use of said data. The analysis is carried out through the lens of the standards set out in the European Convention on Human Rights (ECHR). In order to answer the paper's research question, namely whether the current acquis on Article 8 of the ECHR, which sets out the basic right to privacy and the exceptions thereto, requires adjustment, the analysis begins with an overview of the existing case-law on the ECHR's standards for collecting and processing personal data, with an emphasis on its relevance to the technical specificities of drone operations. The inquiry then focuses on the standards for operating unmanned platforms during which personal data may be collaterally collected in public places. While it stands to reason that anyone within such a public space must reasonably expect that his or her privacy will be somewhat limited, a distinction must be made between mere recording and the subsequent use of such data for a purpose different from the one for which it was originally gathered. The next part of the analysis covers a legal assessment of situations in which sensors installed on a drone used by public authorities over public spaces record persons within their domicile, i.e., their place of living. The analysis carried out in this paper leads to the conclusion that, while the core of the pre-existing ECHR case-law can be successfully applied per analogiam to the operation of unmanned aerial platforms, due to technical and operational factors there is no feasible way to provide adequate information about whether monitoring is being conducted, who is carrying it out, etc., in the manner possible for stationary closed-circuit cameras. Therefore, it is necessary to place greater emphasis on ex officio data anonymization.


2021 ◽  
Vol 11 (22) ◽  
pp. 10740
Author(s):  
Jong Kim

There has recently been an increasing need for the collection and sharing of microdata containing information regarding individual entities. Because microdata typically contain sensitive information about individuals, releasing them directly for public use may violate existing privacy requirements. Thus, extensive studies have been conducted on privacy-preserving data publishing (PPDP), which ensures that any released microdata satisfy the privacy policy requirements. Most existing privacy-preserving data publishing algorithms consider a scenario in which a data publisher, receiving a request for the release of data containing personal information, anonymizes the data prior to publishing, a process that is usually conducted offline. However, with the increasing demand for data sharing among various parties, it is more desirable to integrate the data anonymization functionality into existing systems capable of supporting online query processing. We therefore developed a novel scheme that efficiently anonymizes query results on the fly and thus supports efficient online privacy-preserving data publishing. In particular, given a user's query, the proposed approach effectively estimates the generalization level of each quasi-identifier attribute, thereby achieving the k-anonymity property in the query result set based on statistical information, without applying k-anonymity to all actual datasets, which is a costly procedure. The experimental results show that the proposed method achieves significant gains in processing time.
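
The sketch below is a rough analogue of estimating a generalization level from precomputed statistics so that every group in a query result reaches size k; the attribute, candidate levels, and data are hypothetical and do not reproduce the proposed scheme.

```python
# Sketch: choose the smallest bucket width (generalization level) whose
# per-bucket counts in the query result all reach k.
from collections import Counter

def choose_generalization_level(values, levels, k):
    """levels: candidate bucket widths, ordered from least to most general."""
    for width in levels:
        buckets = Counter(v // width * width for v in values)
        if min(buckets.values()) >= k:
            return width        # least general level that satisfies k-anonymity
    return levels[-1]           # fall back to the most general level

# Hypothetical query result on an 'age' quasi-identifier with k = 5.
ages = [21, 22, 23, 25, 26, 31, 33, 35, 36, 38, 41, 44, 45, 47, 49]
width = choose_generalization_level(ages, levels=[5, 10, 20], k=5)
print(width, [f"[{a // width * width}-{a // width * width + width - 1}]" for a in ages])
```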


2021 ◽  
Author(s):  
Franziska Boenisch ◽  
Reinhard Munz ◽  
Marcel Tiepelt ◽  
Simon Hanisch ◽  
Christiane Kuhn ◽  
...  

2021 ◽  
Vol 2089 (1) ◽  
pp. 012050
Author(s):  
Thirupathi Lingala ◽  
C Kishor Kumar Reddy ◽  
B V Ramana Murthy ◽  
Rajashekar Shastry ◽  
YVSS Pragathi

Data anonymization should support the analysts who intend to use the anonymized data. Releasing datasets that contain personal information requires anonymization that balances privacy concerns against the utility of the data. This work shows how choosing anonymization techniques with the data analyst's requirements in mind improves effectiveness both quantitatively, by minimizing the discrepancy between querying the original data and querying the anonymized result, and qualitatively, by simplifying the workflow for querying the data.
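
A minimal sketch of quantifying that discrepancy, comparing a count query on the original data with the same query on a generalized copy, is shown below; the data and the generalization rule are hypothetical.

```python
# Sketch: measure utility loss as the relative error between a count query run
# on the original data and the same query run on a generalized (anonymized) copy.
import pandas as pd

original = pd.DataFrame({"age": [23, 31, 34, 38, 45, 52, 58, 61]})

# Generalize age into 20-year bands (each record keeps only the band's lower bound).
anonymized = original.assign(age=(original["age"] // 20) * 20)

def count_age_30_to_50(df):
    # The same predicate runs on both versions; on generalized data some records
    # shift into or out of the range, which is the discrepancy being measured.
    return int(((df["age"] >= 30) & (df["age"] < 50)).sum())

orig_count = count_age_30_to_50(original)     # 4
anon_count = count_age_30_to_50(anonymized)   # 3
print(abs(orig_count - anon_count) / orig_count)   # 0.25 relative error
```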

