Big Data Warehouse for Healthcare-Sensitive Data Applications

Sensors ◽  
2021 ◽  
Vol 21 (7) ◽  
pp. 2353
Author(s):  
Arsalan Shahid ◽  
Thien-An Ngoc Nguyen ◽  
M-Tahar Kechadi

Obesity is a major public health problem worldwide, and the prevalence of childhood obesity is of particular concern. Effective interventions for preventing and treating childhood obesity aim to change behaviour and exposure at the individual, community, and societal levels. However, monitoring and evaluating such changes is very challenging. The EU Horizon 2020 project “Big Data against Childhood Obesity (BigO)” aims to gather large-scale data from a large number of children using different sensor technologies in order to create comprehensive obesity prevalence models for data-driven predictions about the effect of specific policies on a community. It further provides real-time monitoring of the population responses, supported by meaningful real-time data analysis and visualisations. Since BigO involves monitoring and storing personal data related to the behaviours of a potentially vulnerable population, the data representation, security, and access control are crucial. In this paper, we briefly present the BigO system architecture and focus on the components of the system that deal with data access control, storage, anonymisation, and the corresponding interfaces with the rest of the system. We propose a three-layered data warehouse architecture. The back-end layer consists of a database management system for data collection, de-identification, and anonymisation of the original datasets. The role-based permissions and secured views are implemented in the access control layer. Lastly, the controller layer regulates the data access protocols for any data access and data analysis. We further present the data representation methods and the storage models, taking the privacy and security mechanisms into account. The data privacy and security plans are devised based on the types of personal data collected, the types of users, data storage, data transmission, and data analysis.
We discuss in detail the challenges of privacy protection in this large distributed data-driven application and implement novel privacy-aware data analysis protocols to ensure that the proposed models guarantee the privacy and security of datasets. Finally, we present the BigO system architecture and its implementation that integrates privacy-aware protocols.
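
The three layers described above can be illustrated with a minimal Python sketch. It is not the BigO implementation: the field names, the roles in `ROLE_VIEWS`, the key `PSEUDONYM_KEY`, and the five-year age bands are all assumptions chosen for illustration, and a keyed hash is only one possible de-identification technique.

```python
import hashlib
import hmac

# Hypothetical secret that would be held only by the back-end layer (assumption).
PSEUDONYM_KEY = b"replace-with-a-secret-key"

# Hypothetical roles and the fields each role may read (assumption).
ROLE_VIEWS = {
    "clinician": {"pid", "age_group", "steps_per_day"},
    "policy_maker": {"age_group", "steps_per_day"},
}


def deidentify(record):
    """Back-end layer: swap the direct identifier for a keyed hash and coarsen the age."""
    digest = hmac.new(PSEUDONYM_KEY, record["child_id"].encode(), hashlib.sha256)
    lower = (record["age"] // 5) * 5  # start of a five-year age band
    return {
        "pid": digest.hexdigest()[:16],
        "age_group": f"{lower}-{lower + 4}",
        "steps_per_day": record["steps_per_day"],
    }


def secured_view(record, role):
    """Access control layer: keep only the fields the role is permitted to read."""
    allowed = ROLE_VIEWS.get(role, set())
    return {key: value for key, value in record.items() if key in allowed}


def query(records, role):
    """Controller layer: every read passes through de-identification and the role view."""
    return [secured_view(deidentify(r), role) for r in records]


rows = [{"child_id": "c-001", "age": 11, "steps_per_day": 8200}]
print(query(rows, "policy_maker"))
```

In this sketch an unknown role receives empty records, so access is denied by default rather than granted by default.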

2020 ◽  
Vol 245 ◽  
pp. 06042
Author(s):  
Oliver Gutsche ◽  
Igor Mandrichenko

A columnar data representation is known to be an efficient way to store data, specifically in cases where the analysis is often based on only a small fragment of the available data structures. A data representation like Apache Parquet is a step forward from a columnar representation, as it also splits data horizontally to allow for easy parallelization of data analysis. Based on the general idea of columnar data storage, working on the [LDRD Project], we have developed a striped data representation, which, we believe, is better suited to the needs of High Energy Physics data analysis. A traditional columnar approach allows for efficient data analysis of complex structures. While keeping all the benefits of columnar data representations, the striped mechanism goes further by enabling easy parallelization of computations without requiring special hardware. We will present an implementation and some performance characteristics of such a data representation mechanism using a distributed NoSQL database or a local file system, unified under the same API and data representation model. The representation is efficient and at the same time simple, so that it allows for a common data model and APIs for a wide range of underlying storage mechanisms such as distributed NoSQL databases and local file systems. Striped storage adopts NumPy arrays as its basic data representation format, which makes it easy and efficient to use in Python applications. The Striped Data Server is a web service that hides the server implementation details from the end user, easily exposes data to WAN users, and makes it possible to use well-known and mature data caching solutions to further increase data access efficiency. We are considering the Striped Data Server as the core of an enterprise-scale data analysis platform for High Energy Physics and similar areas of data processing.
We have been testing this architecture with a 2 TB dataset from a CMS dark matter search and plan to expand it to multiple 100 TB or even PB scale. We will present the striped format, the Striped Data Server architecture, and performance test results.
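
The idea of combining columnar storage with a horizontal split can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the Striped Data Server API: the function names `to_stripes` and `sum_column`, the column names `pt` and `eta`, and the fixed stripe size are all hypothetical.

```python
import numpy as np


def to_stripes(columns, stripe_size):
    """Cut every column (a 1-D NumPy array) into consecutive stripes of at most stripe_size rows."""
    n_rows = len(next(iter(columns.values())))
    stripes = []
    for start in range(0, n_rows, stripe_size):
        stripes.append(
            {name: col[start:start + stripe_size] for name, col in columns.items()}
        )
    return stripes


def sum_column(stripes, name):
    """Process each stripe independently, then combine the partial results."""
    # Each partial sum touches one stripe only, so this loop could be spread over workers.
    partials = [float(stripe[name].sum()) for stripe in stripes]
    return sum(partials)


# Hypothetical event data: one array per attribute, one entry per event.
events = {
    "pt": np.array([10.0, 20.0, 30.0, 40.0, 50.0]),
    "eta": np.array([0.1, -0.2, 0.3, -0.4, 0.5]),
}
stripes = to_stripes(events, stripe_size=2)
print(len(stripes), sum_column(stripes, "pt"))
```

An analysis that needs only `pt` never reads `eta`, which is the columnar benefit, while the per-stripe loop is the part that lends itself to parallel execution.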


Web Services ◽  
2019 ◽  
pp. 882-903
Author(s):  
Izabella V. Lokshina ◽  
Barbara J. Durkin ◽  
Cees J.M. Lanting

The Internet of Things (IoT) provides the tools for the development of a major, global data-driven ecosystem. When accessible to people and businesses, this information can make every area of life, including business, more data-driven. In this ecosystem, with its emphasis on Big Data, there has been a focus on building business models for the provision of services, the so-called Internet of Services (IoS). These models assume the existence and development of the necessary IoT measurement and control instruments, communications infrastructure, and easy access to the data collected and information generated by any party. Different business models may support opportunities that generate revenue and value for various types of customers. This paper contributes to the literature by considering business models and opportunities for third-party data analysis services and discusses access to information generated by third parties in relation to Big Data techniques and potential business opportunities.


Sensors ◽  
2019 ◽  
Vol 19 (12) ◽  
pp. 2772 ◽  
Author(s):  
Aguinaldo Bezerra ◽  
Ivanovitch Silva ◽  
Luiz Affonso Guedes ◽  
Diego Silva ◽  
Gustavo Leitão ◽  
...  

Alarm and event logs are an immense but latent source of knowledge commonly undervalued in industry. However, the current landscape of massive data exchange, high efficiency, and strong competitiveness, boosted by the Industry 4.0 and IIoT (Industrial Internet of Things) paradigms, does not accommodate such misuse of data and demands more incisive approaches to analyzing industrial data. Advances in Data Science and Big Data (or more precisely, Industrial Big Data) have been enabling novel approaches to data analysis, which can be great allies in extracting hitherto hidden information from plant operation data. To address this, this work proposes the use of Exploratory Data Analysis (EDA) as a promising data-driven approach to pave the way for industrial alarm and event analysis. This approach proved capable of increasing industrial perception by extracting insights and valuable information from real-world industrial data without making prior assumptions.
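
As a rough illustration of what a first exploratory pass over an alarm log might look like, the sketch below counts the most frequent alarm tags and the number of alarms per hour. The log rows, tag names, and function names are invented for this example and are not data or methods from the paper.

```python
from collections import Counter
from datetime import datetime

# Hypothetical alarm log rows: (timestamp, tag, priority). Not data from the paper.
LOG = [
    ("2019-01-01T08:05:00", "PT-101.HI", "high"),
    ("2019-01-01T08:07:00", "PT-101.HI", "high"),
    ("2019-01-01T08:40:00", "LT-202.LO", "low"),
    ("2019-01-01T09:15:00", "PT-101.HI", "high"),
]


def top_alarms(log, n=3):
    """Return the n most frequent alarm tags with their counts."""
    return Counter(tag for _, tag, _ in log).most_common(n)


def alarms_per_hour(log):
    """Count alarms by hour of day to expose bursts of activity."""
    return dict(Counter(datetime.fromisoformat(ts).hour for ts, _, _ in log))


print(top_alarms(LOG, 1))
print(alarms_per_hour(LOG))
```

Simple counts like these make no prior assumption about the process; they only surface which alarms dominate and when they cluster.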


2020 ◽  
Vol 26 (4) ◽  
pp. 190-194
Author(s):  
Jacek Pietraszek ◽  
Norbert Radek ◽  
Andrii V. Goroshko

The introduction of solutions conventionally called Industry 4.0 into industry has made it necessary to change many of the traditional procedures of industrial data analysis based on the DOE (Design of Experiments) methodology. The increase in the number of controlled and observed factors considered, the intensity of the data stream, and the size of the analyzed datasets revealed the shortcomings of the existing procedures. Modifying procedures by adapting Big Data solutions and data-driven methods is becoming an increasingly pressing need. The article presents the current methods of DOE, considers the existing problems caused by the introduction of mass automation and data integration under Industry 4.0, and indicates the most promising areas in which to look for possible solutions.
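
To make the effect of a growing number of factors concrete, the sketch below builds a full factorial design, a standard DOE design used here purely as an illustration and not as the procedure discussed in the article. The factor names and the helper names `full_factorial` and `run_count` are assumptions.

```python
from itertools import product


def full_factorial(levels):
    """Enumerate every combination of factor levels as a list of runs."""
    names = list(levels)
    return [dict(zip(names, combo)) for combo in product(*(levels[n] for n in names))]


def run_count(num_factors, levels_per_factor=2):
    """Number of runs a full factorial design needs."""
    return levels_per_factor ** num_factors


# Hypothetical two-level factors coded as -1 (low) and +1 (high).
design = full_factorial({"temp": [-1, 1], "pressure": [-1, 1], "speed": [-1, 1]})
print(len(design))    # 8 runs for 3 factors
print(run_count(20))  # 1048576 runs for 20 factors
```

Three two-level factors need 8 runs, while twenty need more than a million, which shows why exhaustive designs stop being practical as the number of monitored factors grows.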

