Grid Data Handling

2013 ◽  
pp. 294-321
Author(s):  
Alexandru Costan

To accommodate the needs of large-scale distributed systems, scalable data storage and management strategies are required, allowing applications to efficiently cope with continuously growing, highly distributed data. This chapter addresses the key issues of data handling in grid environments focusing on storing, accessing, managing and processing data. We start by providing the background for the data storage issue in grid environments. We outline the main challenges addressed by distributed storage systems: high availability which translates into high resilience and consistency, corruption handling regarding arbitrary faults, fault tolerance, asynchrony, fairness, access control and transparency. The core part of the chapter presents how existing solutions cope with these high requirements. The most important research results are organized along several themes: grid data storage, distributed file systems, data transfer and retrieval and data management. Important characteristics such as performance, efficient use of resources, fault tolerance, security, and others are strongly determined by the adopted system architectures and the technologies behind them. For each topic, we shortly present previous work, describe the most recent achievements, highlight their advantages and limitations, and indicate future research trends in distributed data storage and management.

2011 ◽  
pp. 112-139
Author(s):  
Alexandru Costan

To accommodate the needs of large-scale distributed systems, scalable data storage and management strategies are required, allowing applications to efficiently cope with continuously growing, highly distributed data. This chapter addresses the key issues of data handling in grid environments focusing on storing, accessing, managing and processing data. We start by providing the background for the data storage issue in grid environments. We outline the main challenges addressed by distributed storage systems: high availability which translates into high resilience and consistency, corruption handling regarding arbitrary faults, fault tolerance, asynchrony, fairness, access control and transparency. The core part of the chapter presents how existing solutions cope with these high requirements. The most important research results are organized along several themes: grid data storage, distributed file systems, data transfer and retrieval and data management. Important characteristics such as performance, efficient use of resources, fault tolerance, security, and others are strongly determined by the adopted system architectures and the technologies behind them. For each topic, we shortly present previous work, describe the most recent achievements, highlight their advantages and limitations, and indicate future research trends in distributed data storage and management.


2018 ◽  
Vol 2018 ◽  
pp. 1-14 ◽  
Author(s):  
Vasileios Moysiadis ◽  
Panagiotis Sarigiannidis ◽  
Ioannis Moscholios

In the emerging area of the Internet of Things (IoT), the exponential growth of the number of smart devices leads to a growing need for efficient data storage mechanisms. Cloud Computing was an efficient solution so far to store and manipulate such huge amount of data. However, in the next years it is expected that Cloud Computing will be unable to handle the huge amount of the IoT devices efficiently due to bandwidth limitations. An arising technology which promises to overwhelm many drawbacks in large-scale networks in IoT is Fog Computing. Fog Computing provides high-quality Cloud services in the physical proximity of mobile users. Computational power and storage capacity could be offered from the Fog, with low latency and high bandwidth. This survey discusses the main features of Fog Computing, introduces representative simulators and tools, highlights the benefits of Fog Computing in line with the applications of large-scale IoT networks, and identifies various aspects of issues we may encounter when designing and implementing social IoT systems in the context of the Fog Computing paradigm. The rationale behind this work lies in the data storage discussion which is performed by taking into account the importance of storage capabilities in modern Fog Computing systems. In addition, we provide a comprehensive comparison among previously developed distributed data storage systems which consist of a promising solution for data storage allocation in Fog Computing.


2021 ◽  
Vol 8 (1) ◽  
Author(s):  
Houshyar Honar Pajooh ◽  
Mohammed A. Rashid ◽  
Fakhrul Alam ◽  
Serge Demidenko

AbstractThe diversity and sheer increase in the number of connected Internet of Things (IoT) devices have brought significant concerns associated with storing and protecting a large volume of IoT data. Storage volume requirements and computational costs are continuously rising in the conventional cloud-centric IoT structures. Besides, dependencies of the centralized server solution impose significant trust issues and make it vulnerable to security risks. In this paper, a layer-based distributed data storage design and implementation of a blockchain-enabled large-scale IoT system are proposed. It has been developed to mitigate the above-mentioned challenges by using the Hyperledger Fabric (HLF) platform for distributed ledger solutions. The need for a centralized server and a third-party auditor was eliminated by leveraging HLF peers performing transaction verifications and records audits in a big data system with the help of blockchain technology. The HLF blockchain facilitates storing the lightweight verification tags on the blockchain ledger. In contrast, the actual metadata are stored in the off-chain big data system to reduce the communication overheads and enhance data integrity. Additionally, a prototype has been implemented on embedded hardware showing the feasibility of deploying the proposed solution in IoT edge computing and big data ecosystems. Finally, experiments have been conducted to evaluate the performance of the proposed scheme in terms of its throughput, latency, communication, and computation costs. The obtained results have indicated the feasibility of the proposed solution to retrieve and store the provenance of large-scale IoT data within the Big Data ecosystem using the HLF blockchain. The experimental results show the throughput of about 600 transactions, 500 ms average response time, about 2–3% of the CPU consumption at the peer process and approximately 10–20% at the client node. The minimum latency remained below 1 s however, there is an increase in the maximum latency when the sending rate reached around 200 transactions per second (TPS).


Author(s):  
Ismail Akturk ◽  
Xinqi Wang ◽  
Tevfik Kosar

The unbounded increase in the size of data generated by scientific applications necessitates collaboration and sharing among the nation’s education and research institutions. Simply purchasing high-capacity, high-performance storage systems and adding them to the existing infrastructure of the collaborating institutions does not solve the underlying and highly challenging data handling problem. Scientists are compelled to spend a great deal of time and energy on solving basic data-handling issues, such as the physical location of data, how to access it, and/or how to move it to visualization and/or compute resources for further analysis. This chapter presents the design and implementation of a reliable and efficient distributed data storage system, PetaShare, which spans multiple institutions across the state of Louisiana. At the back-end, PetaShare provides a unified name space and efficient data movement across geographically distributed storage sites. At the front-end, it provides light-weight clients the enable easy, transparent, and scalable access. In PetaShare, the authors have designed and implemented an asynchronously replicated multi-master metadata system for enhanced reliability and availability. The authors also present a high level cross-domain metadata schema to provide a structured systematic view of multiple science domains supported by PetaShare.


2015 ◽  
Vol 9 ◽  
pp. 7011-7025
Author(s):  
Nikolai Pleshchinskii ◽  
Alexander Tormasov ◽  
Dmitrii Tumakov

Big data is all about the developing challenge that associations face in today’s world, As they manage enormous and quickly developing wellsprings of information or data, with the complex range of analysis and the problem includes computing infrastructure, accessing mixed data both structured and unstructured data from various sources such as networking, Recording and stored images. Hadoop is the open source software framework includes no of compartments that are specifically designed for solving large-scale distributed data storage. MapReduce is a parallel programming design for processing


Author(s):  
D. V. Gribanov

Introduction. This article is devoted to legal regulation of digital assets turnover, utilization possibilities of distributed computing and distributed data storage systems in activities of public authorities and entities of public control. The author notes that some national and foreign scientists who study a “blockchain” technology (distributed computing and distributed data storage systems) emphasize its usefulness in different activities. Data validation procedure of digital transactions, legal regulation of creation, issuance and turnover of digital assets need further attention.Materials and methods. The research is based on common scientific (analysis, analogy, comparing) and particular methods of cognition of legal phenomena and processes (a method of interpretation of legal rules, a technical legal method, a formal legal method and a formal logical one).Results of the study. The author conducted an analysis which resulted in finding some advantages of the use of the “blockchain” technology in the sphere of public control which are as follows: a particular validation system; data that once were entered in the system of distributed data storage cannot be erased or forged; absolute transparency of succession of actions while exercising governing powers; automatic repeat of recurring actions. The need of fivefold validation of exercising governing powers is substantiated. The author stresses that the fivefold validation shall ensure complex control over exercising of powers by the civil society, the entities of public control and the Russian Federation as a federal state holding sovereignty over its territory. The author has also conducted a brief analysis of judicial decisions concerning digital transactions.Discussion and conclusion. The use of the distributed data storage system makes it easier to exercise control due to the decrease of risks of forge, replacement or termination of data. The author suggests defining digital transaction not only as some actions with digital assets, but also as actions toward modification and addition of information about legal facts with a purpose of its establishment in the systems of distributed data storage. The author suggests using the systems of distributed data storage for independent validation of information about activities of the bodies of state authority. In the author’s opinion, application of the “blockchain” technology may result not only in the increase of efficiency of public control, but also in the creation of a new form of public control – automatic control. It is concluded there is no legislation basis for regulation of legal relations concerning distributed data storage today.


Sign in / Sign up

Export Citation Format

Share Document