data anonymization
Recently Published Documents

TOTAL DOCUMENTS: 234 (five years: 106)
H-INDEX: 15 (five years: 2)

2022 ◽  
Vol 25 (1) ◽  
pp. 1-25
Author(s):  
Sibghat Ullah Bazai ◽  
Julian Jang-Jaccard ◽  
Hooman Alavizadeh

Multi-dimensional data anonymization approaches (e.g., Mondrian) provide more fine-grained data privacy by applying a different anonymization strategy to each attribute. Many variations of multi-dimensional anonymization have been implemented on distributed processing platforms (e.g., MapReduce, Spark) to take advantage of their scalability and parallelism. According to our critical analysis of overheads, neither existing iteration-based nor recursion-based approaches provide effective mechanisms for creating the optimal number and relative size of resilient distributed datasets (RDDs), and thus they suffer heavily from performance overheads. To solve this issue, we propose a novel hybrid approach for implementing a multi-dimensional data anonymization strategy (e.g., Mondrian) that is both scalable and high-performance. Our hybrid approach creates far fewer RDDs, each with smaller partitions, than existing approaches. This optimal approach to RDD creation and operations is critical for the many multi-dimensional data anonymization applications whose execution complexity is tremendous. The new mechanism in our proposed hybrid approach can dramatically reduce the critical overheads involved in re-computation, shuffle operations, message exchange, and cache management.
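
To make the partitioning idea concrete, below is a minimal single-machine sketch of Mondrian-style multi-dimensional anonymization that recursively splits on the widest quasi-identifier and generalizes each resulting partition to attribute ranges. The column names and the value of k are hypothetical, and this is not the authors' Spark/RDD implementation.

```python
# Minimal single-machine Mondrian sketch: recursively split records on the
# quasi-identifier with the widest range until a partition can no longer be
# divided into halves of size >= k, then generalize each attribute to its range.
import pandas as pd

def mondrian(df, quasi_ids, k):
    # Stop when the partition cannot yield two sub-partitions of size >= k.
    if len(df) < 2 * k:
        return [df]
    # Pick the quasi-identifier with the widest range in this partition.
    spans = {q: df[q].max() - df[q].min() for q in quasi_ids}
    dim = max(spans, key=spans.get)
    median = df[dim].median()
    left, right = df[df[dim] <= median], df[df[dim] > median]
    # If the median split would violate k-anonymity, keep the partition intact.
    if len(left) < k or len(right) < k:
        return [df]
    return mondrian(left, quasi_ids, k) + mondrian(right, quasi_ids, k)

def generalize(partition, quasi_ids):
    out = partition.copy()
    for q in quasi_ids:
        out[q] = f"[{partition[q].min()}-{partition[q].max()}]"
    return out

# Hypothetical toy data with 'age' and 'zip' as quasi-identifiers and k = 2.
data = pd.DataFrame({"age": [23, 25, 31, 34, 45, 47],
                     "zip": [1010, 1012, 2040, 2041, 3300, 3305],
                     "diagnosis": list("ABCDEF")})
anonymized = pd.concat(generalize(p, ["age", "zip"])
                       for p in mondrian(data, ["age", "zip"], k=2))
print(anonymized)
```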


2021 ◽  
Vol 2 (4) ◽  
pp. 1-23
Author(s):  
Ahmed Aleroud ◽  
Fan Yang ◽  
Sai Chaithanya Pallaprolu ◽  
Zhiyuan Chen ◽  
George Karabatis

Network traces are considered a primary source of information for researchers, who use them to investigate research problems such as identifying user behavior, analyzing network hierarchy, maintaining network security, classifying packet flows, and more. However, most organizations are reluctant to share their data with a third party or the public due to privacy concerns. Data anonymization prior to sharing therefore becomes a convenient solution for both organizations and researchers. Although several anonymization algorithms are available, few of them provide sufficient privacy (an organizational need), acceptable data utility (a researcher need), and efficient data analysis at the same time. This article introduces a condensation-based differential privacy anonymization approach that achieves an improved tradeoff between privacy and utility compared to existing techniques and produces anonymized network trace data that can be shared publicly without lowering its utility value. Our solution also does not incur extra computation overhead for the data analyzer. A prototype system has been implemented, and experiments have shown that the proposed approach preserves privacy and allows data analysis without revealing the original data even when injection attacks are launched against it. When the anonymized datasets are given as input to graph-based intrusion detection techniques, they yield almost the same intrusion detection rates as the original datasets, with only a negligible impact.
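
As a rough, generic illustration of combining condensation with differentially private noise (not the authors' algorithm), the sketch below groups a numeric trace attribute into fixed-size clusters and releases only Laplace-perturbed cluster means; the field, group size, and privacy parameters are hypothetical.

```python
# Sketch: condense records into groups of size k and release only
# Laplace-perturbed group means (epsilon-differentially private means).
import numpy as np

def condense_and_perturb(values, k=10, epsilon=1.0, value_range=1.0):
    values = np.sort(np.asarray(values, dtype=float))
    # Form contiguous groups of exactly k sorted values (drop the remainder).
    groups = [values[i:i + k] for i in range(0, len(values) - len(values) % k, k)]
    noisy_means = []
    for g in groups:
        # Sensitivity of a mean over k values bounded by value_range is value_range / k.
        noise = np.random.laplace(scale=value_range / (epsilon * len(g)))
        noisy_means.append(g.mean() + noise)
    return noisy_means

# Hypothetical packet sizes from a network trace (40-1500 bytes).
packet_sizes = np.random.randint(40, 1500, size=200)
print(condense_and_perturb(packet_sizes, k=20, epsilon=0.5, value_range=1460))
```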


2021 ◽  
Vol 1 (2) ◽  
pp. 18-22
Author(s):  
Strahil Sokolov ◽  
Stanislava Georgieva

This paper presents a new approach to the processing and categorization of text from patient documents in the Bulgarian language using Natural Language Processing and Edge AI. The proposed algorithm contains several phases: personal data anonymization, pre-processing and conversion of the text to vectors, model training, and recognition. The accuracy achieved in the experiments is comparable to that of modern approaches.
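
As a rough sketch of such a pipeline (pattern-based anonymization, conversion of text to TF-IDF vectors, and training a linear classifier), the example below uses hypothetical masking rules, sample sentences, and category labels; it is not the authors' implementation and omits the Edge AI deployment aspect.

```python
# Sketch: mask personal data, vectorize the text, and train a simple classifier.
import re
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

def anonymize(text):
    # Hypothetical masking rules: 10-digit personal numbers and phone numbers.
    text = re.sub(r"\b\d{10}\b", "<EGN>", text)
    text = re.sub(r"\+?\d[\d\s-]{7,}\d", "<PHONE>", text)
    return text

# Hypothetical Bulgarian sentences ("chest pain" vs. "fever and cough") and labels.
documents = ["Пациент с ЕГН 8001014509 се оплаква от болка в гърдите.",
             "Пациентът съобщава за висока температура и кашлица."]
labels = ["cardiology", "pulmonology"]

model = make_pipeline(TfidfVectorizer(), LogisticRegression())
model.fit([anonymize(d) for d in documents], labels)
print(model.predict([anonymize("Болка в гърдите при усилие.")]))
```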


2021 ◽  
Vol 72 ◽  
pp. 1163-1214
Author(s):  
Konstantinos Nikolaidis ◽  
Stein Kristiansen ◽  
Thomas Plagemann ◽  
Vera Goebel ◽  
Knut Liestøl ◽  
...  

Good training data is a prerequisite for developing useful Machine Learning applications. However, in many domains existing data sets cannot be shared due to privacy regulations (e.g., from medical studies). This work investigates a simple yet unconventional approach to anonymized data synthesis that enables third parties to benefit from such anonymized data. We explore the feasibility of learning implicitly from visually unrealistic, task-relevant stimuli, which are synthesized by exciting the neurons of a trained deep neural network; neuronal excitation thus serves as a generator of synthetic stimuli. The stimuli are then used to train new classification models. Furthermore, we extend this framework to inhibit representations that are associated with specific individuals. We use sleep monitoring data from both an open and a large closed clinical study, as well as electroencephalogram sleep-stage classification data, to evaluate whether (1) end-users can create and successfully use customized classification models, and (2) the identity of study participants is protected. Extensive comparative empirical investigation shows that different algorithms trained on the stimuli are able to generalize successfully on the same task as the original model. Architectural and algorithmic similarity between the new and original models plays an important role in performance. For similar architectures, performance is close to that obtained with the original data (e.g., an accuracy difference of 0.56%-3.82% and a Kappa coefficient difference of 0.02-0.08). Further experiments show that the stimuli can provide state-of-the-art resilience against adversarial association and membership inference attacks.
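
A minimal PyTorch sketch of the underlying idea, synthesizing a stimulus by gradient ascent on the input so that it excites a chosen output neuron of a trained network, is given below; the model, input shape, and hyperparameters are hypothetical and do not reproduce the authors' framework.

```python
# Sketch: gradient ascent on the input to excite one output neuron of a trained
# classifier, yielding a synthetic "stimulus" that can be used as training data.
import torch

def synthesize_stimulus(model, target_class, input_shape, steps=200, lr=0.1):
    model.eval()
    x = torch.randn(1, *input_shape, requires_grad=True)   # start from noise
    optimizer = torch.optim.Adam([x], lr=lr)
    for _ in range(steps):
        optimizer.zero_grad()
        logits = model(x)
        # Maximize the target neuron's activation (minimize its negative),
        # with a small L2 penalty to keep the stimulus bounded.
        loss = -logits[0, target_class] + 1e-3 * x.pow(2).sum()
        loss.backward()
        optimizer.step()
    return x.detach()

# Hypothetical 1-D classifier over 3000-sample epochs with 5 sleep stages.
model = torch.nn.Sequential(torch.nn.Flatten(), torch.nn.Linear(3000, 5))
stimulus = synthesize_stimulus(model, target_class=2, input_shape=(1, 3000))
print(stimulus.shape)   # torch.Size([1, 1, 3000])
```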


2021 ◽  
Author(s):  
R Sudha ◽  
G Pooja ◽  
V Revathy ◽  
S Dilip Kumar

The use of official online banking sites has increased rapidly nowadays. In online transactions, attackers need only a small amount of information to steal the private data of bank users and carry out fraudulent activities. Credit card fraud is one of the major causes of commercial losses in online banking and has a significant impact on clients. In existing privacy models, fraudulent transactions are discovered only after the transaction has been completed. In this paper, therefore, a three-level server system is implemented to partition the intermediate gateway with better security. User details, transaction details, and account details are treated as sensitive attributes and stored in separate databases. A data suppression scheme that replaces string and numerical characters with special symbols is also implemented as an alternative to traditional cryptography schemes. The quasi-identifiers are hidden using an anonymization algorithm so that transactions can be carried out efficiently.
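
A minimal sketch of the character-level suppression described, replacing string and numerical characters of sensitive attributes with special symbols, is shown below; the field names and the masking rule are hypothetical rather than the paper's system.

```python
# Sketch: suppress sensitive attributes by replacing alphanumeric characters
# with '*', keeping only the last few characters for reference.
import re

def suppress(value, keep_last=2):
    head, tail = value[:-keep_last], value[-keep_last:]
    return re.sub(r"[A-Za-z0-9]", "*", head) + tail

transaction = {
    "account_no": "1234567890123456",
    "holder_name": "Jane Doe",
    "amount": "250.00",   # treated as non-sensitive and left untouched
}
masked = {k: suppress(v) if k in ("account_no", "holder_name") else v
          for k, v in transaction.items()}
print(masked)   # account_no and holder_name keep only their last two characters
```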


2021 ◽  
Vol 43 (1) ◽  
pp. 331-346
Author(s):  
Jakub Kociubiński

The rapid growth of data-gathering technologies has, on the one hand, provided public authorities with a valuable tool for counteracting crime, but on the other has given rise to concerns over potentially excessive intrusion into persons' privacy. In order to mitigate the risk of authoritarian behavior stemming from the moral hazard of being able to conduct ever more effective surveillance, public authorities must impose certain self-limitations on the use of such data. In this context, the use of unmanned aerial vehicles, which may serve other, non-surveillance purposes, may inadvertently lead to the collection of someone's personal data. This paper provides a propaedeutic analysis of the legal challenges associated with the collateral collection of personal data by unmanned aerial platforms operated by public bodies, and with the subsequent use of said data. The analysis is carried out through the lens of the standards set out in the European Convention on Human Rights (ECHR). In order to answer the paper's research question, namely whether the current acquis on Article 8 of the ECHR, which sets out the basic right to privacy and the exceptions thereto, requires adjustment, the analysis begins with an overview of the existing case-law on the ECHR's standards for collecting and processing personal data, with an emphasis on its relevance to the technical specificities of drone operations. The inquiry then focuses on the standards for operating unmanned platforms during which personal data may be collaterally collected in public places. While it stands to reason that anyone within such a public space must reasonably expect that his or her privacy will be somewhat limited, a distinction must be made between mere recording and the subsequent use of such data for a purpose different from the one for which it was originally gathered. The next part of the analysis covers a legal assessment of situations in which sensors installed on a drone used by public authorities over public spaces record persons within their domicile, i.e., their place of living. The analysis carried out in this paper leads to the conclusion that, while the core of the pre-existing ECHR case-law can be successfully applied per analogiam to the operation of unmanned aerial platforms, due to technical and operational factors there is no feasible way to provide adequate information about whether monitoring is being conducted, who is carrying it out, etc., in the manner possible for stationary closed-circuit cameras. Therefore, it is necessary to place greater emphasis on ex officio data anonymization.


2021 ◽  
Vol 11 (22) ◽  
pp. 10740
Author(s):  
Jong Kim

There has recently been an increasing need for the collection and sharing of microdata containing information regarding individual entities. Because microdata typically contain sensitive information about individuals, releasing them directly for public use may violate existing privacy requirements. Thus, extensive studies have been conducted on privacy-preserving data publishing (PPDP), which ensures that any released microdata satisfy the privacy policy requirements. Most existing privacy-preserving data publishing algorithms consider a scenario in which a data publisher, receiving a request for the release of data containing personal information, anonymizes the data prior to publishing, a process that is usually conducted offline. However, with the increasing demand for data sharing among various parties, it is more desirable to integrate the data anonymization functionality into existing systems capable of supporting online query processing. We therefore developed a novel scheme that efficiently anonymizes query results on the fly and thus supports efficient online privacy-preserving data publishing. In particular, given a user's query, the proposed approach effectively estimates the generalization level of each quasi-identifier attribute, thereby achieving the k-anonymity property in the query result set based on statistical information, without applying k-anonymity to all actual datasets, which is a costly procedure. The experimental results show that the proposed method achieves significant gains in processing time.
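
The sketch below is a rough analogue of estimating a generalization level from precomputed statistics so that every group in a query result reaches size k; the attribute, candidate levels, and data are hypothetical and do not reproduce the proposed scheme.

```python
# Sketch: choose the smallest bucket width (generalization level) whose
# per-bucket counts in the query result all reach k.
from collections import Counter

def choose_generalization_level(values, levels, k):
    """levels: candidate bucket widths, ordered from least to most general."""
    for width in levels:
        buckets = Counter(v // width * width for v in values)
        if min(buckets.values()) >= k:
            return width        # least general level that satisfies k-anonymity
    return levels[-1]           # fall back to the most general level

# Hypothetical query result on an 'age' quasi-identifier with k = 5.
ages = [21, 22, 23, 25, 26, 31, 33, 35, 36, 38, 41, 44, 45, 47, 49]
width = choose_generalization_level(ages, levels=[5, 10, 20], k=5)
print(width, [f"[{a // width * width}-{a // width * width + width - 1}]" for a in ages])
```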


2021 ◽  
Author(s):  
Franziska Boenisch ◽  
Reinhard Munz ◽  
Marcel Tiepelt ◽  
Simon Hanisch ◽  
Christiane Kuhn ◽  
...  

2021 ◽  
Vol 2089 (1) ◽  
pp. 012050
Author(s):  
Thirupathi Lingala ◽  
C Kishor Kumar Reddy ◽  
B V Ramana Murthy ◽  
Rajashekar Shastry ◽  
YVSS Pragathi

Data anonymization should support the analysts who intend to use the anonymized data. Releasing datasets that contain personal information requires anonymization that balances privacy concerns against the utility of the data. This work shows how choosing anonymization techniques with the data analyst's requirements in mind improves effectiveness both quantitatively, by minimizing the discrepancy between querying the original data and querying the anonymized result, and qualitatively, by simplifying the workflow for querying the data.
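
A minimal sketch of quantifying that discrepancy, comparing a count query on the original data with the same query on a generalized copy, is shown below; the data and the generalization rule are hypothetical.

```python
# Sketch: measure utility loss as the relative error between a count query run
# on the original data and the same query run on a generalized (anonymized) copy.
import pandas as pd

original = pd.DataFrame({"age": [23, 31, 34, 38, 45, 52, 58, 61]})

# Generalize age into 20-year bands (each record keeps only the band's lower bound).
anonymized = original.assign(age=(original["age"] // 20) * 20)

def count_age_30_to_50(df):
    # The same predicate runs on both versions; on generalized data some records
    # shift into or out of the range, which is the discrepancy being measured.
    return int(((df["age"] >= 30) & (df["age"] < 50)).sum())

orig_count = count_age_30_to_50(original)     # 4
anon_count = count_age_30_to_50(anonymized)   # 3
print(abs(orig_count - anon_count) / orig_count)   # 0.25 relative error
```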

