Big Data Warehouse for Healthcare-Sensitive Data Applications

Obesity is a major public health problem worldwide, and the prevalence of childhood obesity is of particular concern. Effective interventions for preventing and treating childhood obesity aim to change behaviour and exposure at the individual, community, and societal levels. However, monitoring and evaluating such changes is very challenging. The EU Horizon 2020 project “Big Data against Childhood Obesity (BigO)” aims to gather large-scale data from a large number of children using different sensor technologies, in order to create comprehensive obesity prevalence models that support data-driven predictions about the impact of specific policies on a community. It further provides real-time monitoring of population responses, supported by meaningful real-time data analysis and visualisations. Since BigO involves monitoring and storing personal data related to the behaviours of a potentially vulnerable population, data representation, security, and access control are crucial. In this paper, we briefly present the BigO system architecture and focus on the components of the system that deal with data access control, storage, and anonymisation, and on their interfaces with the rest of the system. We propose a three-layered data warehouse architecture. The back-end layer consists of a database management system for data collection and for the de-identification and anonymisation of the original datasets. The access control layer implements role-based permissions and secured views. Lastly, the controller layer regulates the protocols for all data access and data analysis. We further present the data representation methods and storage models, taking the privacy and security mechanisms into account. The data privacy and security plans are devised based on the types of personal data collected, the types of users, data storage, data transmission, and data analysis. 
We discuss in detail the challenges of privacy protection in this large distributed data-driven application and implement novel privacy-aware data analysis protocols to ensure that the proposed models guarantee the privacy and security of datasets. Finally, we present the BigO system architecture and its implementation that integrates privacy-aware protocols.
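
The three-layer split described in this abstract (de-identification in the back end, role-based secured views in the access control layer, and a controller that mediates every request) can be illustrated with a minimal Python sketch. The abstract does not describe BigO's actual schema, roles, or algorithms, so everything here is an assumption: the field names (`child_id`, `age_group`, `daily_steps`), the two roles, and the use of an HMAC-based keyed hash for pseudonymisation are hypothetical. Note that keyed hashing is pseudonymisation rather than full anonymisation; on its own it does not prevent re-identification from the remaining attributes.

```python
import hashlib
import hmac
from dataclasses import dataclass, field

# Hypothetical secret key; a real deployment would keep it outside the warehouse.
PSEUDONYM_KEY = b"example-secret-key"

# Hypothetical direct identifiers removed by the back-end layer.
DIRECT_IDENTIFIERS = {"name", "address"}

# Access control layer: each (hypothetical) role maps to a secured view,
# i.e. the only columns that role may see.
ROLE_VIEWS = {
    "researcher": {"child_id", "age_group", "daily_steps"},
    "policy_maker": {"age_group", "daily_steps"},
}


def pseudonymise(identifier: str) -> str:
    """Back-end layer: replace an ID with a truncated keyed SHA-256 hash."""
    digest = hmac.new(PSEUDONYM_KEY, identifier.encode(), hashlib.sha256)
    return digest.hexdigest()[:16]


def de_identify(record: dict) -> dict:
    """Drop direct identifiers and pseudonymise the child ID."""
    cleaned = {k: v for k, v in record.items() if k not in DIRECT_IDENTIFIERS}
    cleaned["child_id"] = pseudonymise(record["child_id"])
    return cleaned


def secured_view(role: str, rows: list) -> list:
    """Project rows onto the columns the role is allowed to see."""
    allowed = ROLE_VIEWS.get(role)
    if allowed is None:
        raise PermissionError(f"role {role!r} has no secured view")
    return [{k: v for k, v in row.items() if k in allowed} for row in rows]


@dataclass
class Controller:
    """Controller layer: single entry point that audits every request."""
    rows: list
    audit_log: list = field(default_factory=list)

    def query(self, user: str, role: str) -> list:
        try:
            result = secured_view(role, self.rows)
        except PermissionError:
            self.audit_log.append(f"DENIED {user} as {role}")
            raise
        self.audit_log.append(f"{user} as {role}: {len(result)} rows")
        return result
```

For example, `Controller(rows=[de_identify(r) for r in raw]).query("u1", "policy_maker")` returns only the `age_group` and `daily_steps` columns, while a request with an unknown role raises `PermissionError` and leaves a `DENIED` entry in the audit log. Routing every request through one controller object is what makes the audit trail complete.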

An Efficient Ciphertext Policy-Attribute Based Encryption for Big Data Access Control in Cloud Computing

2017 Ninth International Conference on Advanced Computing (ICoAC) ◽

10.1109/icoac.2017.8441507 ◽

2017 ◽

Cited By ~ 4

Author(s):

P. Praveen Kumar ◽

P. Syam Kumar ◽

P.J.A. Alphonse

Keyword(s):

Cloud Computing ◽

Big Data ◽

Access Control ◽

Data Access ◽

Data Access Control ◽

Attribute Based Encryption ◽

Ciphertext Policy

Striped Data Analysis Framework

EPJ Web of Conferences ◽

10.1051/epjconf/202024506042 ◽

2020 ◽

Vol 245 ◽

pp. 06042

Author(s):

Oliver Gutsche ◽

Igor Mandrichenko

Keyword(s):

Data Analysis ◽

Data Storage ◽

High Energy Physics ◽

Data Access ◽

High Energy ◽

Data Representation ◽

General Idea ◽

Common Data Model ◽

Local File ◽

Energy Physics

A columnar data representation is known to be an efficient way to store data, especially when an analysis typically uses only a small fraction of the available data structures. A data representation such as Apache Parquet goes a step beyond a purely columnar representation by also splitting the data horizontally, which allows easy parallelization of data analysis. Building on the general idea of columnar data storage, and working on the [LDRD Project], we have developed a striped data representation which, we believe, is better suited to the needs of High Energy Physics data analysis. A traditional columnar approach allows for efficient analysis of complex data structures. While keeping all the benefits of columnar data representations, the striped mechanism goes further by enabling easy parallelization of computations without requiring special hardware. We will present an implementation and some performance characteristics of such a data representation mechanism using either a distributed NoSQL database or a local file system, unified under the same API and data representation model. The representation is efficient and at the same time simple, so that it allows for a common data model and APIs for a wide range of underlying storage mechanisms, such as distributed NoSQL databases and local file systems. Striped storage adopts NumPy arrays as its basic data representation format, which makes it easy and efficient to use in Python applications. The Striped Data Server is a web service that hides the server implementation details from the end user, easily exposes data to WAN users, and makes it possible to use well-known, mature data caching solutions to further increase data access efficiency. We are considering the Striped Data Server as the core of an enterprise-scale data analysis platform for High Energy Physics and similar areas of data processing. 
We have been testing this architecture with a 2 TB dataset from a CMS dark matter search and plan to scale it up to hundreds of TB or even to the PB scale. We will present the striped format, the Striped Data Server architecture, and performance test results.
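
The striping idea can be sketched in a few lines: every column is cut at the same event boundaries, so each stripe is a self-contained horizontal slice that can be processed independently, and an analysis only touches the columns it needs. This is a minimal illustration under stated assumptions, not the Striped Data Server API (which the abstract does not describe): the column names `met` and `n_jets`, the stripe size, and the helper functions are hypothetical, and plain Python lists stand in for the NumPy arrays the real format uses so the example has no dependencies.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, List, Sequence

# One stripe: the same slice of events for every column (hypothetical layout).
Stripe = Dict[str, Sequence[float]]


def make_stripes(columns: Dict[str, Sequence[float]], stripe_size: int) -> List[Stripe]:
    """Cut every column at the same event boundaries into fixed-size stripes."""
    lengths = {len(col) for col in columns.values()}
    if len(lengths) != 1:
        raise ValueError("all columns must describe the same number of events")
    n_events = lengths.pop()
    return [
        {name: col[start:start + stripe_size] for name, col in columns.items()}
        for start in range(0, n_events, stripe_size)
    ]


def count_above(stripe: Stripe, column: str, threshold: float) -> int:
    """Per-stripe analysis: reads only the single column it needs."""
    return sum(1 for value in stripe[column] if value > threshold)


def parallel_count(stripes: List[Stripe], column: str, threshold: float,
                   workers: int = 4) -> int:
    """Process stripes independently and combine the partial results."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = pool.map(lambda s: count_above(s, column, threshold), stripes)
        return sum(partials)
```

Here a thread pool stands in for whatever distribution mechanism a real deployment would use. Because each stripe carries all the columns for its events, the same per-stripe function could equally run in separate processes or on separate nodes, and only the small partial counts need to be combined.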

A medical big data access control model based on fuzzy trust prediction and regression analysis

Applied Soft Computing ◽

10.1016/j.asoc.2022.108423 ◽

2022 ◽

pp. 108423

Author(s):

Rong Jiang ◽

Yang Xin ◽

Zhenxing Chen ◽

Ying Zhang

Keyword(s):

Big Data ◽

Regression Analysis ◽

Access Control ◽

Data Access ◽

Control Model ◽

Access Control Model ◽

Trust Prediction ◽

Data Access Control ◽

Medical Big Data ◽

Fuzzy Trust

Data Analysis Services Related to the IoT and Big Data

Web Services ◽

10.4018/978-1-5225-7501-6.ch048 ◽

2019 ◽

pp. 882-903

Author(s):

Izabella V. Lokshina ◽

Barbara J. Durkin ◽

Cees J.M. Lanting

Keyword(s):

Big Data ◽

Data Analysis ◽

Business Models ◽

Third Party ◽

Data Driven ◽

Easy Access ◽

Provision Of Services ◽

And Control ◽

Measurement And Control ◽

The Internet Of Things

The Internet of Things (IoT) provides the tools for the development of a major, global data-driven ecosystem. When accessible to people and businesses, this information can make every area of life, including business, more data-driven. In this ecosystem, with its emphasis on Big Data, there has been a focus on building business models for the provision of services, the so-called Internet of Services (IoS). These models assume the existence and development of the necessary IoT measurement and control instruments, communications infrastructure, and easy access to the data collected and information generated by any party. Different business models may support opportunities that generate revenue and value for various types of customers. This paper contributes to the literature by considering business models and opportunities for third-party data analysis services and discusses access to information generated by third parties in relation to Big Data techniques and potential business opportunities.

Big Data-Driven Privacy and Security Issues and Challenges

Services and Business Process Reengineering - Privacy and Security Issues in Big Data ◽

10.1007/978-981-16-1007-3_2 ◽

2021 ◽

pp. 17-32

Author(s):

Selvakumar Samuel ◽

Kesava Pillai Rajadorai ◽

Vazeerudeen Abdul Hameed

Keyword(s):

Big Data ◽

Data Driven ◽

Privacy And Security ◽

Security Issues

Extracting Value from Industrial Alarms and Events: A Data-Driven Approach Based on Exploratory Data Analysis

Sensors ◽

10.3390/s19122772 ◽

2019 ◽

Vol 19 (12) ◽

pp. 2772 ◽

Cited By ~ 3

Author(s):

Aguinaldo Bezerra ◽

Ivanovitch Silva ◽

Luiz Affonso Guedes ◽

Diego Silva ◽

Gustavo Leitão ◽

...

Keyword(s):

Big Data ◽

Data Analysis ◽

Data Exchange ◽

Data Science ◽

Exploratory Data Analysis ◽

Data Driven ◽

Industrial Data ◽

Industrial Big Data ◽

Data Driven Approach ◽

Exploratory Data

Alarm and event logs are an immense but latent source of knowledge that is commonly undervalued in industry. However, the current landscape of massive data exchange, high efficiency, and strong competitiveness, boosted by the Industry 4.0 and IIoT (Industrial Internet of Things) paradigms, does not accommodate such underuse of data and demands more incisive approaches to analyzing industrial data. Advances in Data Science and Big Data (or, more precisely, Industrial Big Data) have enabled novel approaches to data analysis that can be great allies in extracting hitherto hidden information from plant operation data. To that end, this work proposes Exploratory Data Analysis (EDA) as a promising data-driven approach to industrial alarm and event analysis. The approach proved fully able to increase industrial perception by extracting insights and valuable information from real-world industrial data without making prior assumptions.
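
Two common first steps in an exploratory analysis of an alarm log are ranking alarm tags by frequency and flagging "chattering" alarms that fire repeatedly within a short time. The sketch below shows both under stated assumptions; it is not the procedure used in the paper, whose abstract does not list its specific analyses. The log format (a list of `(timestamp, tag)` pairs), the tag names, and the chattering thresholds are hypothetical.

```python
from collections import Counter, defaultdict
from datetime import datetime, timedelta
from typing import Iterable, Set, Tuple

Event = Tuple[datetime, str]  # (activation time, alarm tag) -- hypothetical log format


def alarm_frequency(events: Iterable[Event]) -> Counter:
    """Count activations per tag; most_common() ranks the noisiest tags."""
    return Counter(tag for _, tag in events)


def chattering_tags(events: Iterable[Event], window: timedelta,
                    min_repeats: int) -> Set[str]:
    """Flag tags with at least `min_repeats` activations inside any span of length `window`."""
    times_by_tag = defaultdict(list)
    for timestamp, tag in events:
        times_by_tag[tag].append(timestamp)

    flagged = set()
    for tag, times in times_by_tag.items():
        times.sort()
        start = 0
        # Sliding window over the sorted activation times of one tag.
        for end in range(len(times)):
            while times[end] - times[start] > window:
                start += 1
            if end - start + 1 >= min_repeats:
                flagged.add(tag)
                break
    return flagged
```

Both functions make no assumption about which alarms matter beforehand, in keeping with the EDA spirit of letting the data reveal the "bad actors"; the window and repeat count are the only tuning knobs and would need to be chosen per plant.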

Secure and Verifiable Policy Update Outsourcing for Big Data Access Control in the Cloud

IEEE Transactions on Parallel and Distributed Systems ◽

10.1109/tpds.2014.2380373 ◽

2015 ◽

Vol 26 (12) ◽

pp. 3461-3470 ◽

Cited By ~ 47

Author(s):

Kan Yang ◽

Xiaohua Jia ◽

Kui Ren

Keyword(s):

Big Data ◽

Access Control ◽

Data Access ◽

Data Access Control

A Medical Big Data Analysis Algorithm Based on Access Control System

International Journal of Reasoning-based Intelligent Systems ◽

10.1504/ijris.2018.10009929 ◽

2018 ◽

Vol 10 (2) ◽

pp. 1

Author(s):

XiaoRong Diao ◽

Xingyan Yao ◽

Yegang Chen ◽

Jun Luo

Keyword(s):

Big Data ◽

Control System ◽

Data Analysis ◽

Access Control ◽

Big Data Analysis ◽

Analysis Algorithm ◽

Access Control System ◽

Medical Big Data

A 7.11mJ/Gb/query data-driven machine learning processor (D2MLP) for big data analysis and applications

2014 Symposium on VLSI Circuits Digest of Technical Papers ◽

10.1109/vlsic.2014.6858422 ◽

2014 ◽

Author(s):

Chang-Hung Tsai ◽

Tung-Yu Wu ◽

Shu-Yu Hsu ◽

Chia-Ching Chu ◽

Fang-Ju Ku ◽

...

Keyword(s):

Machine Learning ◽

Big Data ◽

Data Analysis ◽

Big Data Analysis ◽

Data Driven

Challenges for the DOE methodology related to the introduction of Industry 4.0

Production Engineering Archives ◽

10.30657/pea.2020.26.33 ◽

2020 ◽

Vol 26 (4) ◽

pp. 190-194

Author(s):

Jacek Pietraszek ◽

Norbert Radek ◽

Andrii V. Goroshko

Keyword(s):

Big Data ◽

Data Analysis ◽

Data Integration ◽

Design Of Experiments ◽

Industry 4.0 ◽

Data Stream ◽

Data Driven ◽

Industrial Data ◽

Existing Problems

The introduction of solutions conventionally called Industry 4.0 into industry has made it necessary to change many of the traditional procedures of industrial data analysis based on the DOE (Design of Experiments) methodology. The increase in the number of controlled and observed factors, the intensity of the data stream, and the size of the analyzed datasets have revealed the shortcomings of the existing procedures. Modifying these procedures by adapting Big Data solutions and data-driven methods is becoming an increasingly pressing need. The article presents the current DOE methods, considers the existing problems caused by the introduction of mass automation and data integration under Industry 4.0, and indicates the most promising areas in which to look for solutions.
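
The pressure that a growing number of factors puts on classical DOE is easy to see in the simplest design, the two-level full factorial, which needs 2^k runs for k factors. The sketch below is a textbook illustration rather than a method from the article: it generates such a design in coded units (-1 for the low level, +1 for the high level) and estimates each main effect as the difference between the mean response at the high and low levels. The function names are hypothetical helpers chosen for this example.

```python
from itertools import product
from typing import List, Sequence, Tuple


def full_factorial(k: int) -> List[Tuple[int, ...]]:
    """All 2**k runs of a two-level design in coded units (-1 = low, +1 = high)."""
    return list(product((-1, 1), repeat=k))


def main_effects(design: Sequence[Tuple[int, ...]],
                 responses: Sequence[float]) -> List[float]:
    """Main effect of each factor: mean response at +1 minus mean response at -1."""
    if len(design) != len(responses):
        raise ValueError("one response per run is required")
    effects = []
    for j in range(len(design[0])):
        high = [y for run, y in zip(design, responses) if run[j] == 1]
        low = [y for run, y in zip(design, responses) if run[j] == -1]
        effects.append(sum(high) / len(high) - sum(low) / len(low))
    return effects
```

With 10 factors the design already has 1,024 runs, and with 20 factors it has 1,048,576, which illustrates the scaling problem the abstract attributes to the growing number of controlled and observed factors.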
