Semantics for Big Data access & integration: Improving industrial equipment design through increased data usability

Author(s):  
Jenny Weisenberg Williams ◽  
Paul Cuddihy ◽  
Justin McHugh ◽  
Kareem S. Aggour ◽  
Arvind Menon ◽  
...  
2019 ◽  
Vol 6 (1) ◽  
Author(s):  
Mahdi Torabzadehkashi ◽  
Siavash Rezaei ◽  
Ali HeydariGorji ◽  
Hosein Bobarshad ◽  
Vladimir Alves ◽  
...  

Abstract. In the era of big data applications, the demand for more sophisticated data centers and high-performance data processing mechanisms is increasing drastically. Data originally reside in storage systems; to process data, application servers must fetch them from storage devices, which imposes the cost of moving data through the system. This cost grows with the distance between the processing engines and the data, which is the key motivation for distributed processing platforms such as Hadoop that move processing closer to the data. Computational storage devices (CSDs) push the "move processing to data" paradigm to its ultimate boundary by deploying embedded processing engines inside the storage devices themselves. In this paper, we introduce Catalina, an efficient and flexible computational storage platform that provides a seamless environment for in-place data processing. Catalina is the first CSD equipped with a dedicated application processor running a full-fledged operating system that provides filesystem-level data access to applications, so a vast spectrum of applications can be ported to run on Catalina CSDs. Thanks to these unique features, to the best of our knowledge, Catalina is the only in-storage processing platform that can be seamlessly deployed in clusters to run distributed applications such as Hadoop MapReduce and HPC applications in-place, without any modifications to the underlying distributed processing framework. As a proof of concept, we build a fully functional Catalina prototype and a CSD-equipped platform using 16 Catalina CSDs to run Intel HiBench Hadoop and HPC benchmarks, investigating the benefits of deploying Catalina CSDs in distributed processing environments. The experimental results show up to a 2.2× improvement in performance and a 4.3× reduction in energy consumption for running Hadoop MapReduce benchmarks.
Additionally, thanks to the Neon SIMD engines, the performance and energy efficiency of DFT algorithms improve by up to 5.4× and 8.9×, respectively.
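The "move processing to data" paradigm the abstract describes can be illustrated with a minimal MapReduce-style sketch. This is not Catalina's actual API (the paper runs unmodified Hadoop); it is a toy word count, assuming each chunk is processed where it is stored so that only small partial results travel over the interconnect.

```python
from collections import defaultdict

def map_phase(chunk):
    """Mapper: runs where the data chunk lives (e.g. inside a CSD)."""
    counts = defaultdict(int)
    for word in chunk.split():
        counts[word] += 1
    return counts

def reduce_phase(partials):
    """Reducer: merges small partial results instead of moving raw data."""
    total = defaultdict(int)
    for partial in partials:
        for word, n in partial.items():
            total[word] += n
    return dict(total)

# Each chunk would be mapped by the storage device holding it; only
# the compact per-chunk counts cross the network to the reducer.
chunks = ["big data moves process to data", "data stays where data lives"]
result = reduce_phase(map_phase(c) for c in chunks)
```

The benefit Catalina targets is exactly this asymmetry: raw data stays put, and only summaries the size of `result` move.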


2020 ◽  
Vol 1 ◽  
pp. 1-23
Author(s):  
Majid Hojati ◽  
Colin Robertson

Abstract. With new forms of digital spatial data driving new applications for monitoring and understanding environmental change, there are growing demands on traditional GIS tools for spatial data storage, management, and processing. Discrete Global Grid Systems (DGGS) tessellate the globe into multiresolution grids, providing a global spatial fabric capable of storing heterogeneous spatial data with improved performance in data access, retrieval, and analysis. While DGGS-based GIS may hold potential for next-generation big data GIS platforms, few studies have tried to implement them as a framework for operational spatial analysis. Cellular Automata (CA) is a classic dynamic modeling framework that has been used with the traditional raster data model for environmental modeling tasks such as wildfire spread and urban expansion. The main objectives of this paper are to (i) investigate the possibility of using a DGGS for running dynamic spatial analysis, (ii) evaluate CA as a generic data model for modeling dynamic phenomena within a DGGS data model, and (iii) evaluate an in-database approach to CA modelling. To do so, a case study in wildfire spread modelling is developed. Results demonstrate that a DGGS data model not only provides the ability to integrate different data sources, but also supports spatial analysis without geometry-based operations, yielding a simplified architecture and a common spatial fabric on which a wide array of spatial algorithms can be developed. While considerable work remains to be done, CA modelling within a DGGS-based GIS is a robust and flexible modelling framework for big-data GIS analysis in an environmental monitoring context.
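The classic CA update the abstract builds on can be sketched in a few lines. This toy fire-spread step uses a square raster with 4-neighbour ignition; the paper's DGGS cells and in-database evaluation are not modeled here, and the three-state scheme is an illustrative assumption.

```python
# Toy cellular-automaton fire spread (raster version, not DGGS).
# States: 0 = unburned fuel, 1 = burning, 2 = burned out.
def step(grid):
    """One CA update: each burning cell ignites its 4-neighbours, then burns out."""
    rows, cols = len(grid), len(grid[0])
    new = [row[:] for row in grid]
    for r in range(rows):
        for c in range(cols):
            if grid[r][c] == 1:
                new[r][c] = 2  # burning cell burns out
                for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                    nr, nc = r + dr, c + dc
                    if 0 <= nr < rows and 0 <= nc < cols and grid[nr][nc] == 0:
                        new[nr][nc] = 1  # unburned neighbour ignites
    return new

grid = [[0, 0, 0],
        [0, 1, 0],
        [0, 0, 0]]
grid = step(grid)  # fire spreads from the centre cell to its 4 neighbours
```

In a DGGS setting the same transition rule would run over hexagonal or triangular cell neighbourhoods rather than a row/column raster, which is precisely the generalization the paper evaluates.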


Author(s):  
Eddy L. Borges-Rey

This chapter explores the challenges that emerge from a narrow understanding of the principles underpinning Big data, framed in the context of the teaching and learning of Science and Mathematics. This study considers the materiality of computerised data and examines how notions of data access, data sampling, data sense-making and data collection are nowadays contested by datafied public and private bodies, hindering the capacity of citizens to effectively understand and make better use of the data they generate or engage with. The study offers insights from secondary and documentary research and its results suggest that understanding data in less constraining terms, namely: a) as capable of secondary agency, b) as the vital fluid of societal institutions, c) as gathered or accessed by new data brokers and through new technologies and techniques, and d) as mediated by the constant interplay between public and corporate spheres and philosophies, could greatly enhance the teaching and learning of Science and Mathematics in the framework of current efforts to advance data literacy.


2022 ◽  
pp. 431-454
Author(s):  
Pinar Kirci

The term big data is used to describe huge datasets. The "4 V" characterization implies volume, variety, velocity, and value, and applies to many areas, especially medical images, electronic medical records (EMR), and biometrics data. Processing and managing such datasets through the storage, analysis, and visualization stages is challenging. Recent improvements in communication and transmission technologies provide efficient solutions. Big data solutions should be multithreaded, and data access approaches should be tailored to large amounts of semi-structured and unstructured data. To cope with these difficulties, software programming frameworks are used with a distributed file system (DFS) whose blocks are far larger than the disk blocks of an ordinary operating system, so that computing tasks can be multithreaded across them. The huge datasets involved in healthcare data storage and analysis need new solutions, because old-fashioned, traditional analytic tools have become inadequate.
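The block-oriented, multithreaded access pattern described above can be sketched as follows. This is a minimal illustration, not any specific framework's API: the tiny `BLOCK_SIZE`, the function names, and the per-block byte count are all assumptions chosen to keep the example self-contained (real DFS blocks, e.g. in HDFS, are typically 64-128 MB).

```python
from concurrent.futures import ThreadPoolExecutor

BLOCK_SIZE = 8  # illustrative; DFS blocks are orders of magnitude larger

def split_into_blocks(data, block_size=BLOCK_SIZE):
    """Split a payload into DFS-style fixed-size blocks."""
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]

def process_block(block):
    """Per-block work unit; here, simply count the bytes in the block."""
    return len(block)

data = b"semi-structured and unstructured records"
blocks = split_into_blocks(data)

# Each block is an independent unit of work, so a thread pool (or a
# cluster of nodes, in a real DFS) can process them in parallel.
with ThreadPoolExecutor(max_workers=4) as pool:
    sizes = list(pool.map(process_block, blocks))
total = sum(sizes)
```

Making the block the unit of scheduling is what lets a framework scale the same computation from one multithreaded machine to a cluster.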

