Semantic-enabled architecture for auditable privacy-preserving data analysis

Semantic Web ◽  
2022 ◽  
pp. 1-34
Author(s):  
Fajar J. Ekaputra ◽  
Andreas Ekelhart ◽  
Rudolf Mayer ◽  
Tomasz Miksa ◽  
Tanja Šarčević ◽  
...  

Small and medium-sized organisations face challenges in acquiring, storing and analysing personal data, particularly sensitive data (e.g., data of a medical nature), due to data protection regulations, such as the GDPR in the EU, which stipulates high standards in data protection. Consequently, these organisations often refrain from collecting data centrally, which means losing the potential of data analytics and learning from aggregated user data. To enable organisations to leverage the full potential of the collected personal data, two main technical challenges need to be addressed: (i) organisations must preserve the privacy of individual users and honour their consent, while (ii) being able to provide data and algorithmic governance, e.g., in the form of audit trails, to increase trust in the results and support reproducibility of the data analysis tasks performed on the collected data. Such an auditable, privacy-preserving data analysis is currently challenging to achieve, as existing methods and tools offer only partial solutions to this problem, e.g., data representation of audit trails and user consent, automatic checking of usage policies, or data anonymisation. To the best of our knowledge, no existing approach provides an integrated architecture for auditable, privacy-preserving data analysis. To address these gaps, as the main contribution of this paper, we propose the WellFort approach, a semantic-enabled architecture for auditable, privacy-preserving data analysis which provides secure storage for users’ sensitive data with explicit consent, and delivers a trusted, auditable analysis environment for executing data analytic processes in a privacy-preserving manner. Additional contributions include the adaptation of Semantic Web technologies as an integral part of the WellFort architecture, and the demonstration of the approach through a feasibility study with a prototype supporting use cases from the medical domain.
Our evaluation shows that WellFort enables privacy-preserving analysis of data while automatically collecting sufficient information to support its auditability.
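The combination of explicit consent checking and automated audit-trail collection described in the abstract can be illustrated with a minimal sketch. The consent store, purposes, and record layout below are hypothetical stand-ins invented for the example, not WellFort's actual data model or API:

```python
import datetime
import json

# Hypothetical in-memory stand-ins for a consent store and an audit trail.
consent_store = {
    "user-1": {"research", "statistics"},
    "user-2": {"statistics"},
}
audit_trail: list[str] = []

def records_usable_for(purpose: str, records: list[dict]) -> list[dict]:
    """Keep only records whose owner consented to `purpose`, and log the decision."""
    usable = [r for r in records if purpose in consent_store.get(r["owner"], set())]
    # Every release decision is appended to the audit trail for later inspection.
    audit_trail.append(json.dumps({
        "timestamp": datetime.datetime.now(datetime.timezone.utc).isoformat(),
        "purpose": purpose,
        "requested": len(records),
        "released": len(usable),
    }))
    return usable

records = [{"owner": "user-1", "value": 7}, {"owner": "user-2", "value": 3}]
released = records_usable_for("research", records)  # only user-1 consented to research
```

In a semantic-enabled architecture the consent store would be expressed in RDF and queried rather than kept in a dictionary, but the control flow (filter on consent, then log) is the same.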

2020 ◽  
Vol 30 (Supplement_5) ◽  
Author(s):  
J Doetsch ◽  
I Lopes ◽  
R Redinha ◽  
H Barros

Abstract The usage and exchange of “big data” is at the forefront of the data science agenda, where Record Linkage plays a prominent role in biomedical research. In an era of ubiquitous data exchange and big data, Record Linkage is almost inevitable, but it raises ethical and legal problems, namely personal data and privacy protection. Record Linkage refers to the general merging of data to consolidate facts about an individual or an event that are not available in any separate record. This article provides an overview of ethical challenges and research opportunities in linking routine data on health and education with cohort data from very preterm (VPT) infants in Portugal. Portuguese, European and international law was reviewed on data processing, protection and privacy. A three-stage analysis was carried out: i) the interplay of the three levels of law relevant to Record Linkage; ii) the impact of data protection and privacy rights on data processing; iii) the challenges and opportunities of the data linkage process for research. A framework to discuss the process and its implications for data protection and privacy was created. The GDPR functions as the most substantial legal basis for the protection of personal data in Record Linkage, and explicit written consent is considered the appropriate basis for processing sensitive data. In Portugal, retrospective access to routine data is permitted if anonymised; for health data, if it meets data processing requirements declared with explicit consent; for education data, if the data processing rules are complied with. Routine health and education data can be linked to cohort data if the rights of the data subject and the requirements and duties of processors and controllers are respected.
A strong ethical context through the application of the GDPR in all phases of research needs to be established to achieve Record Linkage between cohort and routinely collected records for health and education data of VPT infants in Portugal. Key messages: the GDPR is the most important legal framework for the protection of personal data; however, its uniform approach granting freedom to its Member States hampers Record Linkage processes among EU countries. The question remains whether the gap between data protection and privacy is adequately balanced at the three legal levels to guarantee freedom for research and the improvement of the health of data subjects.


2019 ◽  
Vol 1 (1) ◽  
pp. 483-491 ◽  
Author(s):  
Makhamisa Senekane

The ubiquity of data, including multimedia data such as images, enables easy mining and analysis of such data. However, such analysis might involve the use of sensitive data such as medical records (including radiological images) and financial records. Privacy-preserving machine learning is an approach aimed at analysing such data in a way that does not compromise privacy. There are various privacy-preserving data analysis approaches, such as k-anonymity, l-diversity, t-closeness and Differential Privacy (DP). Currently, DP is the gold standard of privacy-preserving data analysis due to its robustness against background-knowledge attacks. In this paper, we report a scheme for privacy-preserving image classification using a Support Vector Machine (SVM) and DP. SVM is chosen as the classification algorithm because, unlike variants of artificial neural networks, its training converges to a global optimum. The SVM kernels used are linear and Radial Basis Function (RBF), while ε-differential privacy was the DP framework used. The proposed scheme achieved an accuracy of up to 98%. The results obtained underline the utility of using SVM and DP for privacy-preserving image classification.
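The ε-differential-privacy framework mentioned above is commonly realised with the Laplace mechanism, which perturbs a query answer with noise scaled to sensitivity/ε. The following stdlib-only sketch is illustrative (the records and parameter values are invented for the example), not the authors' implementation:

```python
import math
import random

def laplace_noise(scale: float, rng: random.Random) -> float:
    # Inverse-CDF sampling of the Laplace(0, scale) distribution.
    u = rng.random() - 0.5
    return -scale * math.copysign(1.0, u) * math.log(1.0 - 2.0 * abs(u))

def laplace_mechanism(true_value: float, sensitivity: float, epsilon: float,
                      rng: random.Random) -> float:
    """Release true_value under epsilon-DP by adding Laplace(sensitivity/epsilon) noise."""
    return true_value + laplace_noise(sensitivity / epsilon, rng)

# Example: privately release the count of positive cases in a small cohort.
# A counting query changes by at most 1 when one record changes, so sensitivity = 1.
records = [1, 0, 1, 1, 0, 1, 0, 0, 1, 1]
noisy_count = laplace_mechanism(sum(records), sensitivity=1.0, epsilon=0.5,
                                rng=random.Random(42))
```

Smaller ε means stronger privacy but larger noise; in a DP training pipeline the same trade-off applies to the statistics (or gradients) the learner consumes.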


2016 ◽  
Vol 3 (1) ◽  
Author(s):  
Andrew Nicholas Cormack

Most studies on the use of digital student data adopt an ethical framework derived from human-subjects research, based on the informed consent of the experimental subject. However, consent gives universities little guidance on the use of learning analytics as a routine part of educational provision: which purposes are legitimate and which analyses involve an unacceptable risk of harm. Obtaining consent when students join a course will not give them meaningful control over their personal data three or more years later. Relying on consent may also exclude those most likely to benefit from early interventions. This paper proposes an alternative framework based on European data protection law. Separating the processes of analysis (pattern-finding) and intervention (pattern-matching) gives students and staff continuing protection from inadvertent harm during data analysis; students have a fully informed choice whether or not to accept individual interventions; and organisations obtain clear guidance on how to conduct analysis, which analyses should not proceed, and when and how interventions should be offered. The framework provides formal support for practices that are already being adopted and helps with several open questions in learning analytics, including its application to small groups and alumni, automated processing, and privacy-sensitive data.


2020 ◽  
Vol 6(161) ◽  
pp. 47-67
Author(s):  
Karol Grzybowski

By adapting the provisions of the Labour Code to EU regulations on personal data protection, the legislator has explicitly allowed employers to process the personal data of employees and applicants for employment on the basis of their consent. However, the new provisions exclude the processing of data on convictions on this basis and limit the possibility of giving effective consent to the processing of sensitive data. The article attempts to analyze the adopted solutions in the context of the constitutional guarantee of informational self-determination. The author defends the thesis that the provisions of Article 22¹ᵃ § 1 and Article 22¹ᵇ § 1 of the Labour Code disproportionately interfere with an individual’s right to dispose of data concerning him or her. These provisions do not meet the criterion of the necessity of the intervention. The protective goal of the regulation, as established by the legislator, may be achieved by means of the legal instruments indicated in the article, which do not undermine the freedom aspect of informational self-determination.


2020 ◽  
Vol 7 (1) ◽  
Author(s):  
Mahsa Shabani ◽  
Tom Goffin ◽  
Heidi Mertes

Abstract In response to concerns related to privacy in the context of coronavirus disease 2019 (COVID-19), European and national Data Protection Authorities (DPAs) recently issued guidelines and recommendations addressing a variety of issues related to the processing of personal data for preventive purposes. One of the recurring questions in these guidelines relates to the rights and responsibilities of employers and employees in reporting, recording, and communicating COVID-19 cases in the workplace. National DPAs in some cases adopted different approaches regarding duties in reporting and communicating COVID-19 cases; however, they unanimously stressed the importance of adopting privacy-preserving approaches to avoid raising concerns about surveillance and stigmatization. We stress that, in view of the increasing use of new data collection and sharing tools such as ‘tracing and warning’ apps, the associated privacy-related risks should be evaluated in an ongoing manner. In addition, the intricacies of the different settings where such apps may be used should be taken into consideration when assessing the associated risks and benefits.


2020 ◽  
pp. 1-9
Author(s):  
Tataru Stefan Razvan ◽  
Irene Nica

Sports activities attract an impressive number of participants and take a multitude of forms, in leisure and performance sports, on and off the sports ground. Given that the sports industry processes a variety of personal data of athletes, including sensitive data such as information concerning health, we aim to analyse the impact of the General Data Protection Regulation on sports activities. In the first part of the study we analysed the incidence of sport in daily life and the forms of organisation of sports structures. Subsequently, we focused our attention in particular on the way in which the personal data of athletes are processed, the rights athletes enjoy under the new European regulations, and the measures that operators should take to protect these data.


2021 ◽  
Vol 54 (1) ◽  
pp. 1-35
Author(s):  
Nikolaus Marsch ◽  
Timo Rademacher

German data protection laws all contain provisions that allow public authorities to process personal data whenever this is ‘necessary’ for the respective authority to fulfil its tasks or, in the case of sensitive data within the meaning of art. 9 GDPR, if this is ‘absolutely necessary’. Therewith, in theory, data protection law provides for a high degree of administrative flexibility, e.g. to cope with unforeseen situations like the coronavirus pandemic. However, these provisions, referred to in German doctrine as ‘Generalklauseln’ (general clauses or ‘catch-all’ provisions in English), are hardly used, as legal orthodoxy assumes that they are too vague to form a sufficiently clear legal basis for public-purpose processing under the strict terms of the German fundamental right to informational self-determination (art. 2(1) and 1(1) German Basic Law). As this orthodoxy appears to be supported by the case law of the German Constitutional Court, legislators have dutifully reacted by creating a plethora of sector-specific laws and provisions to enable data processing by public authorities. As a consequence, German administrative data protection law has become highly detailed and confusing, even for legal experts, thereby betraying the very purpose of legal clarity and foreseeability that scholars intended to foster by requiring ever more detailed legal bases. In our paper, we examine the reasons that underlie the German ‘ban’ on using the ‘Generalklauseln’. We conclude that these reasons do not justify the ban in general, but only in specific areas and/or processing situations such as security and criminal law. Finally, we list several arguments that speak in favour of a more ‘daring’ approach when it comes to using the ‘Generalklauseln’ for public-purpose data processing.


2018 ◽  
Vol 42 (3) ◽  
pp. 290-303 ◽  
Author(s):  
Montserrat Batet ◽  
David Sánchez

Purpose – To overcome the limitations of purely statistical approaches to data protection, the purpose of this paper is to propose Semantic Disclosure Control (SeDC): an inherently semantic privacy protection paradigm that, by relying on state-of-the-art semantic technologies, rethinks privacy and data protection in terms of the meaning of the data.
Design/methodology/approach – The need for data protection mechanisms able to manage data from a semantic perspective is discussed and the limitations of statistical approaches are highlighted. Then, SeDC is presented by detailing how it can be enforced to detect and protect sensitive data.
Findings – So far, data privacy has been tackled from a statistical perspective; that is, available solutions focus only on the distribution of the data values. This contrasts with the semantic way in which humans understand and manage (sensitive) data. As a result, current solutions present limitations both in preventing disclosure risks and in preserving the semantics (utility) of the protected data.
Practical implications – SeDC captures more general, realistic and intuitive notions of privacy and information disclosure than purely statistical methods. As a result, it is better suited to protecting heterogeneous and unstructured data, which are the most common in current data release scenarios. Moreover, SeDC preserves the semantics of the protected data better than statistical approaches, which is crucial when using protected data for research.
Social implications – Individuals are increasingly aware of the privacy threats that the uncontrolled collection and exploitation of their personal data may produce. In this respect, SeDC offers an intuitive notion of privacy protection that users can easily understand. It also naturally captures the (non-quantitative) privacy notions stated in current legislation on personal data protection.
Originality/value – In contrast to statistical approaches to data protection, SeDC assesses disclosure risks and enforces data protection from a semantic perspective. As a result, it offers more general, intuitive, robust and utility-preserving protection of data, regardless of their type and structure.


2021 ◽  
Vol 13 (20) ◽  
pp. 11459
Author(s):  
Szu-Chuang Li ◽  
Yi-Wen Chen ◽  
Yennun Huang

The development of big data analysis technologies has changed how organizations work. Tech giants, such as Google and Facebook, are well positioned because they possess not only big data sets but also the in-house capability to analyze them. For small and medium-sized enterprises (SMEs), which have limited resources, capacity, and a relatively small collection of data, the ability to conduct data analysis collaboratively is key. Personal data protection regulations have become stricter due to incidents of private data being leaked, making it more difficult for SMEs to perform interorganizational data analysis. This problem can be resolved by anonymizing the data such that reidentifying an individual is no longer a concern or by deploying technical procedures that enable interorganizational data analysis without the exchange of actual data, such as data deidentification, data synthesis, and federated learning. Herein, we compared the technical options and their compliance with personal data protection regulations from several countries and regions. Using the EU’s GDPR (General Data Protection Regulation) as the main point of reference, technical studies, legislative studies, related regulations, and government-sponsored reports from various countries and regions were also reviewed. Alignment of the technical description with the government regulations and guidelines revealed that the solutions are compliant with the personal data protection regulations. Current regulations require “reasonable” privacy preservation efforts from data controllers; potential attackers are not assumed to be experts with knowledge of the target data set. This means that relevant requirements can be fulfilled without considerably sacrificing data utility. However, the potential existence of an extremely knowledgeable adversary when the stakes of data leakage are high still needs to be considered carefully.
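Of the technical options the study compares (data deidentification, data synthesis, and federated learning), deidentification is commonly assessed via k-anonymity: every record must be indistinguishable from at least k−1 others on its quasi-identifiers. A minimal sketch follows; the toy table and generalisation scheme (age bands, masked ZIP prefixes) are invented for illustration:

```python
from collections import Counter

def k_anonymity(records: list[dict], quasi_identifiers: list[str]) -> int:
    """Return the k-anonymity level of a table: the size of the smallest
    group of records sharing identical quasi-identifier values."""
    groups = Counter(tuple(r[q] for q in quasi_identifiers) for r in records)
    return min(groups.values()) if groups else 0

# Toy table after generalisation: ages bucketed into bands, ZIP codes masked.
table = [
    {"age": "30-39", "zip": "101**", "diagnosis": "flu"},
    {"age": "30-39", "zip": "101**", "diagnosis": "asthma"},
    {"age": "40-49", "zip": "102**", "diagnosis": "flu"},
    {"age": "40-49", "zip": "102**", "diagnosis": "diabetes"},
]
k = k_anonymity(table, ["age", "zip"])  # each quasi-identifier group has 2 records
```

The "extremely knowledgeable adversary" caveat in the abstract applies here: k-anonymity alone does not bound what an attacker with background knowledge learns about the sensitive attribute, which is why l-diversity, t-closeness, and differential privacy exist as stricter notions.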

