DAMS: A Distributed Analytics Metadata Schema

2021 ◽  
pp. 1-17
Author(s):  
Sascha Welten ◽  
Laurenz Neumann ◽  
Yeliz Ucer Yediel ◽  
Luiz Olavo Bonino da Silva Santos ◽  
Stefan Decker ◽  
...  

Abstract: In recent years, implementations enabling Distributed Analytics (DA) have gained considerable attention due to their ability to perform complex analysis tasks on decentralised data by bringing the analysis to the data. These concepts offer privacy-enhancing alternatives to data centralisation approaches, whose applicability is restricted for sensitive data due to ethical, legal or social constraints. Nevertheless, an inherent problem of DA-enabling architectures is the black-box-like behaviour of the highly distributed components, which stems from the lack of semantically enriched descriptions, in particular the absence of basic metadata for datasets or analysis tasks. To address these problems, we propose a metadata schema for DA infrastructures, which provides a vocabulary to enrich the involved entities with descriptive semantics. We first perform a requirement analysis with domain experts to identify the necessary metadata items, which form the foundation of our schema. We then transform the obtained domain expert knowledge into user stories and derive the most significant semantic content. In the final step, we enable machine-readability via RDF(S) and SHACL serialisations. We deploy our schema in a proof-of-concept monitoring dashboard to validate its contribution to the transparency of DA architectures. Additionally, we evaluate the schema’s compliance with the FAIR principles. The evaluation shows that the schema succeeds in increasing transparency while complying with most of the FAIR principles. Because a common metadata model is critical for enhancing the compatibility between multiple DA infrastructures, our work lowers data access and analysis barriers. It represents an initial, infrastructure-independent foundation for the FAIRification of DA and the underlying scientific data management.
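To give a flavour of what an RDF(S) serialisation of such dataset metadata could look like, here is a minimal sketch using the rdflib library. The namespace, class and property names below are hypothetical placeholders, not the published DAMS vocabulary:

```python
from rdflib import Graph, Literal, Namespace, RDF, URIRef
from rdflib.namespace import DCTERMS

# Hypothetical namespace for illustration only -- not the actual DAMS schema.
DAMS = Namespace("http://example.org/dams#")

g = Graph()
g.bind("dams", DAMS)
g.bind("dcterms", DCTERMS)

# An illustrative dataset IRI, annotated with a few descriptive triples.
dataset = URIRef("http://example.org/datasets/cohort-42")
g.add((dataset, RDF.type, DAMS.Dataset))
g.add((dataset, DCTERMS.title, Literal("Patient cohort 42")))
g.add((dataset, DAMS.analysisPolicy, Literal("analysis-to-data only")))

print(g.serialize(format="turtle"))
```

In a DA setting, such machine-readable descriptions are what a monitoring dashboard or another infrastructure would consume to discover which datasets and analysis tasks exist at each site.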

Logistics ◽  
2021 ◽  
Vol 5 (3) ◽  
pp. 46
Author(s):  
Houssein Hellani ◽  
Layth Sliman ◽  
Abed Ellatif Samhat ◽  
Ernesto Exposito

Data transparency is essential in the modern supply chain to improve trust and boost collaboration among partners. In this context, Blockchain is a promising technology to provide full transparency across the entire supply chain. However, Blockchain was originally designed for full transparency and uncontrolled data access, which leads many market actors to avoid it out of concern for their confidentiality. In this paper, we highlight the requirements and challenges of supply chain transparency. We then investigate a set of supply chain projects that tackle data transparency issues by utilizing Blockchain in their core platform in different manners. Furthermore, we analyze the projects’ techniques and the tools utilized to customize transparency. As a result of these analyses, we identified that further enhancements are needed to strike a balance between the data transparency and the process opacity required by different partners, to ensure the confidentiality of their processes and to control access to sensitive data.
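One widely used pattern for balancing on-chain transparency with confidentiality, sketched below purely as an illustration and not attributed to any of the surveyed projects, is a salted hash commitment: the ledger stores only a digest, while the raw record stays off-chain and is revealed selectively to authorized partners, who can then verify it against the chain:

```python
import hashlib
import os

def commit(record: bytes, salt=None):
    """Return a (digest, salt) pair; the digest goes on-chain, the salt stays private."""
    salt = salt or os.urandom(16)
    digest = hashlib.sha256(salt + record).hexdigest()
    return digest, salt

def verify(record: bytes, salt: bytes, digest: str) -> bool:
    """An authorized partner, given the record and salt, checks the on-chain digest."""
    return hashlib.sha256(salt + record).hexdigest() == digest

digest, salt = commit(b"shipment #42: 500 units, origin warehouse A")
print(verify(b"shipment #42: 500 units, origin warehouse A", salt, digest))  # True
```

The chain thus proves the record existed and was not altered, without exposing its content to every market actor.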


2021 ◽  
Author(s):  
Mark Howison ◽  
Mintaka Angell ◽  
Michael Hicklen ◽  
Justine S. Hastings

A Secure Data Enclave is a system that allows data owners to control data access and ensure data security while facilitating approved uses of data by other parties. This model of data use offers additional protections and technical controls for the data owner compared to the more commonly used approach of transferring data from the owner to another party through a data sharing agreement. Under the data use model, the data owner retains full transparency and auditing over the other party’s access, which can be difficult to achieve in practice with even the best legal instrument for data sharing. We describe the key technical requirements for a Secure Data Enclave and provide a reference architecture for its implementation on the Amazon Web Services platform using managed cloud services.
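As a rough illustration of the kind of technical control such an enclave could attach to its storage, here is a minimal sketch of a least-privilege IAM-style bucket policy. The bucket name, account ID and role are hypothetical placeholders, not the paper's reference architecture:

```python
import json

# Illustrative policy: a single approved analyst role gets read-only access,
# and all unencrypted (non-TLS) transport is denied. Access events would be
# captured by the platform's audit logging for the data owner to review.
enclave_read_policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "ApprovedAnalystReadOnly",
            "Effect": "Allow",
            "Principal": {"AWS": "arn:aws:iam::123456789012:role/ApprovedAnalyst"},
            "Action": ["s3:GetObject"],
            "Resource": "arn:aws:s3:::example-enclave-bucket/*",
        },
        {
            "Sid": "DenyUnencryptedTransport",
            "Effect": "Deny",
            "Principal": "*",
            "Action": "s3:*",
            "Resource": "arn:aws:s3:::example-enclave-bucket/*",
            "Condition": {"Bool": {"aws:SecureTransport": "false"}},
        },
    ],
}

print(json.dumps(enclave_read_policy, indent=2))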


Author(s):  
Oksana Romaniuk ◽  
Bohdan Zadvornyi

The article provides a theoretical and methodological substantiation of flexibility development through the practical application of stretching techniques. Scientific data on the organisation and methodological features of stretching exercises are generalised. The semantic content and a structural-componential model of stretching use in the process of flexibility development are presented, together with an assessment of age-related changes in this characteristic. In particular, the parameters that allow the methodology to be recommended for both individual and group use are highlighted. In addition, the diversity of physiological mechanisms by which stretching influences the human body is analysed, with particular attention to its effects on the mental and physical spheres. The generalised scientific data on the theoretical and practical aspects of flexibility development with the help of stretching techniques indicate that this method is a priority in many types of physical activity, irrespective of the scope of its practical application.


2014 ◽  
Vol 8 (2) ◽  
pp. 13-24 ◽  
Author(s):  
Arkadiusz Liber

Introduction: Medical documentation ought to be accessible with its integrity preserved and personal data protected. One way of protecting it against disclosure is anonymization. Contemporary methods ensure anonymity without the possibility of controlling access to sensitive data; it seems that the future of sensitive data processing systems belongs to personalized methods. In the first part of the paper the k-Anonymity, (X,Y)-Anonymity, (α,k)-Anonymity and (k,e)-Anonymity methods are discussed. These well-known elementary methods are the subject of a significant number of publications. The works of Samarati, Sweeney, Wang, Wong and Zhang were used as source papers for this part; the selection of these publications is justified by the wider research reviews led, for instance, by Fung, Wang, Fu and Yu. It should be noted, however, that anonymization methods derive from the methods of statistical database protection of the 1970s. Owing to the interrelated content and literature references, the first and second parts of this article constitute an integral whole.

Aim of the study: The analysis of anonymization methods, the analysis of methods for protecting anonymized data, and the study of a new type of privacy protection enabling the entity concerned to control the disclosure of its sensitive data.

Material and methods: Analytical and algebraic methods.

Results: Material supporting the choice and analysis of ways to anonymize medical data, and a new privacy protection solution enabling the control of sensitive data by the entities the data concern.

Conclusions: The paper analyses solutions for data anonymization that ensure privacy protection in medical data sets. The methods of k-Anonymity, (X,Y)-Anonymity, (α,k)-Anonymity, (k,e)-Anonymity, (X,Y)-Privacy, LKC-Privacy, l-Diversity, (X,Y)-Linkability, t-Closeness, Confidence Bounding and Personalized Privacy are described, explained and analyzed. Solutions for the control of sensitive data by their owner are also analyzed. Beyond the existing anonymization methods, the analysis covers methods for protecting anonymized data, in particular δ-Presence, ε-Differential Privacy, (d,γ)-Privacy, (α,β)-Distributing Privacy and protection against (c,t)-isolation. Moreover, the author introduces a new solution for controlled privacy protection, based on marking a protected field and multi-key encryption of the sensitive value. The suggested way of marking the fields conforms to the XML standard. For the encryption, a cipher with (n,p) different keys was selected, in which p of the n keys are used to decipher the content. The proposed solution enables brand-new methods of controlling the privacy of sensitive data disclosure.
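For readers unfamiliar with the baseline property underlying these methods, here is a minimal sketch of the k-Anonymity check itself: every combination of quasi-identifier values must occur at least k times in the released table. The toy records and quasi-identifiers are illustrative only:

```python
from collections import Counter

def is_k_anonymous(records, quasi_identifiers, k):
    """True iff every quasi-identifier combination appears at least k times."""
    groups = Counter(
        tuple(record[attr] for attr in quasi_identifiers) for record in records
    )
    return all(count >= k for count in groups.values())

records = [
    {"zip": "537**", "age": "30-39", "diagnosis": "flu"},
    {"zip": "537**", "age": "30-39", "diagnosis": "asthma"},
    {"zip": "537**", "age": "40-49", "diagnosis": "flu"},
]
# False: the ("537**", "40-49") group contains only one record.
print(is_k_anonymous(records, ["zip", "age"], k=2))
```

The more refined notions listed above ((α,k)-Anonymity, l-Diversity, t-Closeness, and so on) tighten this basic grouping condition with additional constraints on the sensitive values inside each group.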


Author(s):  
John Jenkins ◽  
Xiaocheng Zou ◽  
Houjun Tang ◽  
Dries Kimpe ◽  
Robert Ross ◽  
...  

2016 ◽  
pp. 1756-1773
Author(s):  
Grzegorz Spyra ◽  
William J. Buchanan ◽  
Peter Cruickshank ◽  
Elias Ekonomou

This paper proposes a new model for identity and its underlying metadata. The approach enables identity metadata to be securely spanned across many boundaries, such as health-care, financial and educational institutions, and any other that stores and processes sensitive personal data. It introduces the new concepts of the Compound Personal Record (CPR) and the Compound Identifiable Data (CID) ontology, which aim to move toward an "own your own data" model. The CID model ensures the authenticity of identity metadata; high availability via a unified Cloud-hosted XML data structure; and privacy through encryption, obfuscation and anonymity applied to ontology-based, XML-distributed content. Additionally, CID is enabled for identity federation via XML ontologies. The paper also suggests that access to sensitive data should be strictly governed through an access control model with granular policy enforcement on the service side. This includes the involvement of the relevant access control model entities, which are enabled to authorize ad-hoc break-glass data access, providing high accountability for data access attempts.
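The break-glass idea mentioned above can be sketched in a few lines: an ad-hoc override of the normal policy is permitted, but only with a justification and an audit record. The roles and policy shape below are hypothetical, not the paper's actual access control model:

```python
import logging

logging.basicConfig(level=logging.INFO)
audit_log = logging.getLogger("cid.audit")

# Illustrative role-to-permission policy; real deployments would enforce
# this on the service side with far more granular rules.
POLICY = {"physician": {"read"}, "clerk": set()}

def authorize(role, action, break_glass=False, justification=None):
    """Normal policy check, with an audited break-glass escape hatch."""
    if action in POLICY.get(role, set()):
        return True
    if break_glass and justification:
        # Access is granted, but the override is recorded for accountability.
        audit_log.warning(
            "BREAK-GLASS access by role=%s action=%s reason=%s",
            role, action, justification,
        )
        return True
    return False

print(authorize("clerk", "read"))                                   # False
print(authorize("clerk", "read", break_glass=True,
                justification="emergency care"))                    # True, logged
```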


2014 ◽  
Vol 25 (3) ◽  
pp. 48-71 ◽  
Author(s):  
Stepan Kozak ◽  
David Novak ◽  
Pavel Zezula

The general trend in data management is to outsource data to third-party systems that provide data retrieval as a service. This approach naturally raises privacy concerns about the (potentially sensitive) data. Recently, quite extensive research has been done on privacy-preserving outsourcing of traditional exact-match and keyword search. However, little attention has been paid to outsourcing of similarity search, which is essential for content-based retrieval in current multimedia, sensor and scientific data. In this paper, the authors propose a scheme for outsourcing similarity search. They define evaluation criteria for these systems with an emphasis on usability, privacy and efficiency in real applications. These criteria can serve as a general guideline for practical system analysis, and the authors use them to survey and mutually compare existing approaches. As the main result, the authors propose a novel dynamic similarity index, EM-Index, which works for an arbitrary metric space and ensures data privacy, making it suitable for search systems outsourced, for example, to a cloud environment. In comparison with other approaches, the index is fully dynamic (update operations are efficient) and aims to transfer as much load as possible from clients to the server.
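A standard building block of metric-space indexes of this kind (shown here as a generic sketch, not the EM-Index itself) is pivot-based filtering: precomputed distances to a pivot yield a triangle-inequality lower bound, so many objects can be discarded for a range query without computing their true distance:

```python
def range_query(query, radius, objects, pivot, dist):
    """Return objects within `radius` of `query`, skipping those the
    triangle inequality |d(q,p) - d(o,p)| > radius rules out."""
    d_qp = dist(query, pivot)
    results = []
    for obj, d_op in objects:          # objects stored with distance to pivot
        if abs(d_qp - d_op) > radius:  # lower bound exceeds radius: safe skip
            continue
        if dist(query, obj) <= radius:
            results.append(obj)
    return results

dist = lambda a, b: abs(a - b)         # 1-D toy metric for illustration
pivot = 0.0
objects = [(x, dist(x, pivot)) for x in [1.0, 2.5, 7.0, 9.0]]
print(range_query(3.0, 1.0, objects, pivot, dist))  # [2.5]
```

In an outsourced setting, the attraction of such filtering is that the server can prune candidates using stored pivot distances alone, which is one way to shift load from clients to the server.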


2005 ◽  
Vol 44 (02) ◽  
pp. 161-167 ◽  
Author(s):  
J. L. Oliveira ◽  
J. P. Sanchez ◽  
V. López-Alonso ◽  
F. Martin-Sanchez ◽  
V. Maojo ◽  
...  

Summary. Objectives: The goal of this paper is to identify how Grid technology can be applied to the development and deployment of integration systems that bring together distributed and heterogeneous biomedical information sources for medical applications.

Methods: The integration of new genetic and medical knowledge into clinical workflows requires new paradigms for information management, in which the ability to access and relate disparate data sources is essential. We adopt a requirements perspective, based on the user needs identified in the development of the INFOGENMED system, to assess current Grid technology against those requirements.

Results: The gap between Grid features and the needs of distributed biomedical information integration is characterized. Results from prospective studies are also reported.

Conclusions: Grid infrastructures offer advanced features for the deployment of collaborative computational environments across virtual organizations. New Grid developments are in line with the problem of multi-site information integration. From the INFOGENMED point of view, Grid infrastructures need to evolve to implement structured data access services and semantic content description and discovery.


Computers ◽  
2020 ◽  
Vol 9 (1) ◽  
pp. 1 ◽  
Author(s):  
Yeong-Cherng Hsu ◽  
Chih-Hsin Hsueh ◽  
Ja-Ling Wu

With the growing popularity of cloud computing, it is convenient for data owners to outsource their data to a cloud server. By utilizing the massive storage and computational resources of the cloud, data owners can also provide a platform for users to make query requests. However, due to privacy concerns, sensitive data should be encrypted before outsourcing. In this work, a novel privacy-preserving K-nearest-neighbour (K-NN) search scheme over an encrypted outsourced cloud dataset is proposed. The problem is to let the cloud server find the K nearest points with respect to an encrypted query on the encrypted dataset outsourced by data owners, and return the results to the querying user. Compared with other existing methods, our approach leverages the cloud more by shifting most of the required computational load from data owners and query users to the cloud server. In addition, there is no need for data owners to share their secret key with others. In a nutshell, in the proposed scheme, data points and user queries are encrypted attribute-wise and the entire search algorithm is performed in the encrypted domain; therefore, our approach not only preserves data privacy and query privacy but also hides the data access pattern from the cloud server. Moreover, by using a tree structure, the proposed scheme can answer query requests in sublinear time, according to our performance analysis. Finally, experimental results demonstrate the practicability and efficiency of our method.
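To see why a tree structure yields sublinear queries, here is the plaintext analogue of the search being protected, using a standard k-d tree. This is only an illustration of the indexing idea; the paper's encrypted-domain protocol is not reproduced here:

```python
import numpy as np
from scipy.spatial import cKDTree

rng = np.random.default_rng(0)
points = rng.random((10_000, 2))   # stands in for the (here unencrypted) outsourced data
tree = cKDTree(points)             # spatial index, built once on the server side

query = np.array([0.5, 0.5])
distances, indices = tree.query(query, k=3)   # K-NN lookup without a full scan
print(indices, distances)
```

A brute-force scan would compare the query against all 10,000 points; the tree prunes most of them, which is the property the encrypted scheme aims to retain while hiding the data, the query, and the access pattern.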

