Software-defined data protection

Most modern data processing pipelines run on top of a distributed storage layer, and securing the whole system, and the storage layer in particular, against accidental or malicious misuse is crucial to ensuring compliance to rules and regulations. Enforcing data protection and privacy rules, however, stands at odds with the requirement to achieve higher and higher access bandwidths and processing rates in large data processing pipelines. In this work we describe our proposal for the path forward that reconciles the two goals. We call our approach "Software-Defined Data Protection" (SDP). Its premise is simple, yet powerful: decoupling often changing policies from request-level enforcement allows distributed smart storage nodes to implement the latter at line-rate. Existing and future data protection frameworks can be translated to the same hardware interface which allows storage nodes to offload enforcement efficiently both for company-specific rules and regulations, such as GDPR or CCPA. While SDP is a promising approach, there are several remaining challenges to making this vision reality. As we explain in the paper, overcoming these will require collaboration across several domains, including security, databases and specialized hardware design.

Download Full-text

A Benchmark for Suitability of Alluxio over Spark

International Journal of Innovative Technology and Exploring Engineering - Special Issue ◽

10.35940/ijitee.a8190.1110120 ◽

2020 ◽

Vol 10 (1) ◽

pp. 245-250

Keyword(s):

Big Data ◽

Data Processing ◽

Data Storage ◽

Storage Systems ◽

Distributed Storage ◽

Storage System ◽

Large Data ◽

Time Data ◽

Big Data Applications ◽

Access To Data

Big data applications play an important role in real time data processing. Apache Spark is a data processing framework with in-memory data engine that quickly processes large data sets. It can also distribute data processing tasks across multiple computers, either on its own or in tandem with other distributed computing tools. Spark’s in-memory processing cannot share data between the applications and hence, the RAM memory will be insufficient for storing petabytes of data. Alluxio is a virtual distributed storage system that leverages memory for data storage and provides faster access to data in different storage systems. Alluxio helps to speed up data intensive Spark applications, with various storage systems. In this work, the performance of applications on Spark as well as Spark running over Alluxio have been studied with respect to several storage formats such as Parquet, ORC, CSV, and JSON; and four types of queries from Star Schema Benchmark (SSB). A benchmark is evolved to suggest the suitability of Spark Alluxio combination for big data applications. It is found that Alluxio is suitable for applications that use databases of size more than 2.6 GB storing data in JSON and CSV formats. Spark is found suitable for applications that use storage formats such as parquet and ORC with database sizes less than 2.6GB.

Download Full-text

Record linkage of routine data with cohorts’ data of infants under European and Portuguese law

European Journal of Public Health ◽

10.1093/eurpub/ckaa166.178 ◽

2020 ◽

Vol 30 (Supplement_5) ◽

Author(s):

J Doetsch ◽

I Lopes ◽

R Redinha ◽

H Barros

Keyword(s):

Big Data ◽

Data Processing ◽

Data Protection ◽

Record Linkage ◽

Data Science ◽

Personal Data ◽

Routine Data ◽

Cohort Data ◽

Education Data ◽

Explicit Consent

Abstract The usage and exchange of “big data” is at the forefront of the data science agenda where Record Linkage plays a prominent role in biomedical research. In an era of ubiquitous data exchange and big data, Record Linkage is almost inevitable, but raises ethical and legal problems, namely personal data and privacy protection. Record Linkage refers to the general merging of data information to consolidate facts about an individual or an event that are not available in a separate record. This article provides an overview of ethical challenges and research opportunities in linking routine data on health and education with cohort data from very preterm (VPT) infants in Portugal. Portuguese, European and International law has been reviewed on data processing, protection and privacy. A three-stage analysis was carried out: i) interplay of threefold law-levelling for Record Linkage at different levels; ii) impact of data protection and privacy rights for data processing, iii) data linkage process' challenges and opportunities for research. A framework to discuss the process and its implications for data protection and privacy was created. The GDPR functions as utmost substantial legal basis for the protection of personal data in Record Linkage, and explicit written consent is considered the appropriate basis for the processing sensitive data. In Portugal, retrospective access to routine data is permitted if anonymised; for health data if it meets data processing requirements declared with an explicit consent; for education data if the data processing rules are complied. Routine health and education data can be linked to cohort data if rights of the data subject and requirements and duties of processors and controllers are respected. A strong ethical context through the application of the GDPR in all phases of research need to be established to achieve Record Linkage between cohort and routine collected records for health and education data of VPT infants in Portugal. Key messages GDPR is the most important legal framework for the protection of personal data, however, its uniform approach granting freedom to its Member states hampers Record Linkage processes among EU countries. The question remains whether the gap between data protection and privacy is adequately balanced at three legal levels to guarantee freedom for research and the improvement of health of data subjects.

Download Full-text

Reconciliation of anti-money laundering instruments and European data protection requirements in permissionless blockchain spaces

Journal of Cybersecurity ◽

10.1093/cybsec/tyab004 ◽

2021 ◽

Vol 7 (1) ◽

Author(s):

Iwona Karasek-Wojciechowicz

Keyword(s):

Data Processing ◽

Money Laundering ◽

Data Protection ◽

Task Force ◽

Policy Instruments ◽

Personal Data ◽

General Data Protection Regulation ◽

Terrorist Financing ◽

The Government ◽

High Level

AbstractThis article is an attempt to reconcile the requirements of the EU General Data Protection Regulation (GDPR) and anti-money laundering and combat terrorist financing (AML/CFT) instruments used in permissionless ecosystems based on distributed ledger technology (DLT). Usually, analysis is focused only on one of these regulations. Covering by this research the interplay between both regulations reveals their incoherencies in relation to permissionless DLT. The GDPR requirements force permissionless blockchain communities to use anonymization or, at the very least, strong pseudonymization technologies to ensure compliance of data processing with the GDPR. At the same time, instruments of global AML/CFT policy that are presently being implemented in many countries following the recommendations of the Financial Action Task Force, counteract the anonymity-enhanced technologies built into blockchain protocols. Solutions suggested in this article aim to induce the shaping of permissionless DLT-based networks in ways that at the same time would secure the protection of personal data according to the GDPR rules, while also addressing the money laundering and terrorist financing risks created by transactions in anonymous blockchain spaces or those with strong pseudonyms. Searching for new policy instruments is necessary to ensure that governments do not combat the development of all privacy-blockchains so as to enable a high level of privacy protection and GDPR-compliant data processing. This article indicates two AML/CFT tools which may be helpful for shaping privacy-blockchains that can enable the feasibility of such tools. The first tool is exceptional government access to transactional data written on non-transparent ledgers, obfuscated by advanced anonymization cryptography. The tool should be optional for networks as long as another effective AML/CFT measures are accessible for the intermediaries or for the government in relation to a given network. If these other measures are not available and the network does not grant exceptional access, the regulations should allow governments to combat the development of those networks. Effective tools in that scope should target the value of privacy-cryptocurrency, not its users. Such tools could include, as a tool of last resort, state attacks which would undermine the trust of the community in a specific network.

Download Full-text

The Data Protection Impact Assessment, or: How the General Data Protection Regulation May Still Come to Foster Ethically Responsible Data Processing

SSRN Electronic Journal ◽

10.2139/ssrn.2695398 ◽

2015 ◽

Author(s):

Claudia Quelle

Keyword(s):

Data Processing ◽

Impact Assessment ◽

Data Protection ◽

General Data Protection Regulation ◽

General Data

Download Full-text

BIG DATA PROCESSING IN THE DIGITALIZATION OF ENTERPRISE ACTIVITIES

Bulletin Series of Physics & Mathematical Sciences ◽

10.51889/2021-3.1728-7901.09 ◽

2021 ◽

Vol 75 (3) ◽

pp. 76-82

Author(s):

G.T. Balakayeva ◽

◽

D.K. Darkenbayev ◽

M. Turdaliyev ◽

◽

...

Keyword(s):

Social Networks ◽

Growth Rate ◽

Big Data ◽

Data Processing ◽

New Technologies ◽

Large Data ◽

Modern Society ◽

Information Flows ◽

Professional Activities ◽

The Past

The growth rate of these enterprises has increased significantly in the last decade. Research has shown that over the past two decades, the amount of data has increased approximately tenfold every two years - this exceeded Moore's Law, which doubles the power of processors. About thirty thousand gigabytes of data are accumulated every second, and their processing requires an increase in the efficiency of data processing. Uploading videos, photos and letters from users on social networks leads to the accumulation of a large amount of data, including unstructured ones. This leads to the need for enterprises to work with big data of different formats, which must be prepared in a certain way for further work in order to obtain the results of modeling and calculations. In connection with the above, the research carried out in the article on processing and storing large data of an enterprise, developing a model and algorithms, as well as using new technologies is relevant. Undoubtedly, every year the information flows of enterprises will increase and in this regard, it is important to solve the issues of storing and processing large amounts of data. The relevance of the article is due to the growing digitalization, the increasing transition to professional activities online in many areas of modern society. The article provides a detailed analysis and research of these new technologies.

Download Full-text

Multi-Variable, High Order, Performance Models (2005C)

Fluids Engineering ◽

10.1115/imece2005-79416 ◽

2005 ◽

Cited By ~ 3

Author(s):

David Japikse ◽

Oleg Dubitsky ◽

Kerry N. Oliphant ◽

Robert J. Pelton ◽

Daniel Maynes ◽

...

Keyword(s):

Data Processing ◽

Large Data ◽

High Order ◽

Large Data Sets ◽

Data Sets ◽

Performance Models ◽

Statistical Accuracy ◽

Evaluation Methodologies ◽

New Models ◽

The Impact

In the course of developing advanced data processing and advanced performance models, as presented in companion papers, a number of basic scientific and mathematical questions arose. This paper deals with questions such as uniqueness, convergence, statistical accuracy, training, and evaluation methodologies. The process of bringing together large data sets and utilizing them, with outside data supplementation, is considered in detail. After these questions are focused carefully, emphasis is placed on how the new models, based on highly refined data processing, can best be used in the design world. The impact of this work on designs of the future is discussed. It is expected that this methodology will assist designers to move beyond contemporary design practices.

Download Full-text

SECURE DEPENDABLE SELECTIVE STORAGE SERVICES AND SUPPORT FOR DYNAMIC DATA OPERATIONS IN CLOUD COMPUTING

Graduate Research in Engineering and Technology ◽

10.47893/gret.2013.1026 ◽

2013 ◽

pp. 35-40

Author(s):

VINITHA S P ◽

GURUPRASAD E

Keyword(s):

Cloud Computing ◽

Service Providers ◽

Distributed Storage ◽

Large Data ◽

Cloud Service ◽

Dynamic Data ◽

Cloud Data ◽

Data Auditing ◽

Cloud Servers ◽

Selective Storage

Cloud computing has been envisioned as the next generation architecture of IT enterprise. It moves the application software and databases to the centralized large data centers where management of data and services may not be fully trustworthy. This unique paradigm brings out many new security challenges like, maintaining correctness and integrity of data in cloud. Integrity of cloud data may be lost due to unauthorized access, modification or deletion of data. Lacking of availability of data may be due to the cloud service providers (CSP), in order to increase their margin of profit by reducing the cost, CSP may discard rarely accessed data without detecting in timely fashion. To overcome above issues, flexible distributed storage, token utilizing, signature creations used to ensure integrity of data, auditing mechanism used assists in maintaining the correctness of data and also locating, identifying of server where exactly the data has been corrupted and also dependability and availability of data achieved through distributed storage of data in cloud. Further in order to ensure authorized access to cloud data a admin module has been proposed in our previous conference paper, which prevents unauthorized users from accessing data and also selective storage scheme based on different parameters of cloud servers proposed in previous paper, in order to provide efficient storage of data in the cloud. In order to provide more efficiency in this paper dynamic data operations are supported such as updating, deletion and addition of data.

Download Full-text