An Architecture for Distributed Electronic Documents Storage in Decentralized Blockchain B2B Applications

Computers ◽  
2021 ◽  
Vol 10 (11) ◽  
pp. 142
Author(s):  
Obadah Hammoud ◽  
Ivan Tarkhanov ◽  
Artyom Kosmarski

This paper investigates the problem of distributed storage of electronic documents (both metadata and files) in decentralized blockchain-based B2B systems (DApps). The need to reduce the cost of implementing such systems and the insufficient elaboration of the issue of storing big data in DLT are considered. An approach for building such systems is proposed that optimizes the size of the required storage (by using erasure coding) while providing secure data storage in the geographically distributed systems of a company or a consortium of companies. The novelty of this solution is that we are the first to combine enterprise DLT with distributed file storage in which the availability of files is controlled. The results of our experiment demonstrate that the speed of the described DApp is comparable to that of known B2C torrent projects, which justifies the choice of Hyperledger Fabric and Ethereum Enterprise. The test results also show that public blockchain networks are not suitable for creating such a B2B system. The proposed system solves the main challenges of distributed data storage by grouping data into clusters and managing them with a load balancer, while preventing data tampering by means of a blockchain network. The proposed DApp storage methodology scales horizontally in terms of distributed file storage and can be deployed on cloud computing technologies, while minimizing the required storage space. We compare this approach with known methods of file storage in distributed systems, including central storage, torrents, IPFS, and Storj. The reliability of this approach is calculated and compared to that of traditional solutions based on full backup.
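
To make the erasure-coding trade-off concrete, here is a minimal sketch of the general idea (my own illustration, not the authors' implementation; the (10, 6) code parameters and the 95% per-node availability are assumed purely for the example), comparing a k-of-n erasure code with plain replication:

    from math import comb

    def erasure_overhead(n: int, k: int) -> float:
        """Stored bytes / original bytes for an (n, k) code:
        n fragments, any k of which reconstruct the file."""
        return n / k

    def replication_overhead(copies: int) -> float:
        return float(copies)

    def availability(n: int, k: int, p_node: float) -> float:
        """Probability the file is readable when each of the n fragment
        holders is independently up with probability p_node (>= k up)."""
        return sum(comb(n, i) * p_node**i * (1 - p_node)**(n - i)
                   for i in range(k, n + 1))

    if __name__ == "__main__":
        print(f"(10,6) erasure code: {erasure_overhead(10, 6):.2f}x storage, "
              f"availability {availability(10, 6, 0.95):.6f}")
        # 3-way replication is an (n=3, k=1) scheme for availability purposes
        print(f"3-way replication:   {replication_overhead(3):.2f}x storage, "
              f"availability {availability(3, 1, 0.95):.6f}")

Under these assumed parameters the erasure code stores roughly 1.67x the original data versus 3x for triple replication, while still tolerating the loss of any four fragments.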

Author(s):  
Lee P. Brintle ◽  
Elizabeth A. Koppes ◽  
J. K. Wu

Abstract The Tracked Vehicle Workstation (TVWS) is a distributed, object-oriented design environment that stores the large amounts of data associated with mechanical system design and analysis across heterogeneous UNIX machines on a network. A Distributed ASCII File Storage (DAFS) server was developed to provide an easy-to-port, easy-to-modify means of retrieving and updating files on remote machines. This paper describes the techniques the TVWS environment has used previously, including commercial solutions, single-node storage, and the Network File System (NFS), along with their difficulties, as well as the current method based on DAFS.


Information ◽  
2018 ◽  
Vol 9 (11) ◽  
pp. 286 ◽  
Author(s):  
Yonggen Gu ◽  
Dingding Hou ◽  
Xiaohong Wu ◽  
Jie Tao ◽  
Yanqiong Zhang

Distributed data storage has received growing attention due to its advantages in reliability, availability, and scalability, and it brings both opportunities and challenges for storage transactions. Traditional transaction systems for storage resources generally run in a centralized mode, which results in high cost, vendor lock-in, and single-point-of-failure risk. To overcome these shortcomings, and considering a storage policy with erasure coding, in this paper we propose a decentralized transaction method for cloud storage based on a smart contract, which takes into account the resource cost of distributed data storage. First, to guarantee availability and decrease the storage cost, a reverse Vickrey-Clarke-Groves (VCG) auction mechanism is proposed for storage resource selection and transaction. We then deploy and implement the proposed mechanism by designing a corresponding smart contract. In particular, we address the problem of how to implement a VCG-like mechanism in a blockchain environment. We simulate the proposed storage transaction method on a private Ethereum chain. The simulation results show that the proposed transaction model can realize competitive trading of storage resources and ensure the safe and economic operation of resource trading.
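
As a rough illustration of the mechanism's flavor, the sketch below implements a textbook reverse VCG auction for procuring k identical storage slots from single-unit bidders, in which each winner's VCG payment reduces to the lowest losing ask; the paper's actual mechanism also accounts for the erasure-coding storage policy and runs inside a smart contract, neither of which this sketch models:

    from typing import List, Tuple

    def reverse_vcg(asks: List[Tuple[str, float]], k: int) -> List[Tuple[str, float]]:
        """asks: (provider, asking price) pairs; procure k identical slots.
        Each winner is paid the externality it imposes on the others, which
        for single-unit bidders reduces to the lowest losing ask."""
        if len(asks) <= k:
            raise ValueError("need more than k asks to price the winners")
        ranked = sorted(asks, key=lambda a: a[1])
        clearing = ranked[k][1]                  # the (k+1)-th lowest ask
        return [(name, clearing) for name, _ in ranked[:k]]

    if __name__ == "__main__":
        asks = [("A", 5.0), ("B", 7.0), ("C", 6.0), ("D", 9.0)]
        print(reverse_vcg(asks, k=2))            # A and C win, each paid 7.0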


2014 ◽  
Vol 687-691 ◽  
pp. 2710-2713
Author(s):  
Jing Yang

With the rapid development of computer and network technology, the distributed storage and management of massive data has become widely accepted, but it also exposes obvious drawbacks, such as heterogeneous data storage structures and storage environments and other data handling problems. This paper examines how to improve data storage performance in a distributed environment, analyzes current data storage technology and data storage performance in distributed environments, summarizes the requirements of distributed storage database design, and provides a theoretical basis for the standardization of distributed data storage performance.


2017 ◽  
Vol 5 (1) ◽  
pp. 60
Author(s):  
Agus Maman Abadi ◽  
Karyati Karyati ◽  
Musthofa Musthofa ◽  
Emut Emut

Abstract The increasing need to store large amounts of data presents a new challenge. One way to address this challenge is to use distributed data storage systems. One strategy implemented in such systems is the regenerating code technique. The codes used in this technique are based on the algebraic structure of fields. Some studies have also been carried out to create codes based on another algebraic structure, namely modules. In this study, we assess the use of module-based codes in the regenerating code technique. The study shows that module-based codes have properties that can potentially be used in the regenerating code technique. Keywords: distributed storage, regenerating code technique, module code
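
For context, the classical field-based regenerating-code setting is governed by the cut-set bound below (standard notation assumed: a file of M symbols, n nodes storing \alpha symbols each, any k nodes sufficient for reconstruction, and repair downloading \beta symbols from each of d helper nodes); module-based codes used for regeneration would be subject to the same trade-off:

    % Cut-set bound for regenerating codes (classical field-based setting)
    M \le \sum_{i=0}^{k-1} \min\{\alpha,\ (d-i)\beta\}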


2011 ◽  
Vol 19 (1) ◽  
pp. 27-43
Author(s):  
Tevfik Kosar ◽  
Ismail Akturk ◽  
Mehmet Balman ◽  
Xinqi Wang

Modern collaborative science has placed an increasing burden on data management infrastructure to handle the increasingly large data archives being generated. Besides functionality, reliability and availability are also key factors in delivering a data management system that can efficiently and effectively meet the challenges posed and compounded by the unbounded increase in the size of data generated by scientific applications. We have developed a reliable and efficient distributed data storage system, PetaShare, which spans multiple institutions across the state of Louisiana. At the back-end, PetaShare provides a unified name space and efficient data movement across geographically distributed storage sites. At the front-end, it provides lightweight clients that enable easy, transparent, and scalable access. In PetaShare, we have designed and implemented an asynchronously replicated multi-master metadata system for enhanced reliability and availability, and an advanced buffering system for improved data transfer performance. In this paper, we present the details of our design and implementation, show performance results, and describe our experience in developing a reliable and efficient distributed data management system for data-intensive science.
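
The abstract does not spell out the replication protocol, so the following is only a minimal sketch of one way an asynchronously replicated multi-master metadata system can be organized (local commit first, background shipping of updates to peers, last-writer-wins conflict resolution; all of these design details are assumptions rather than PetaShare's actual design):

    import time
    from collections import deque

    class MetadataMaster:
        """One metadata master; writes commit locally and replicate lazily."""

        def __init__(self, name: str):
            self.name = name
            self.store = {}        # key -> (timestamp, value)
            self.outbox = deque()  # updates not yet shipped to peers
            self.peers = []        # other MetadataMaster instances

        def put(self, key: str, value: str) -> None:
            update = (time.time(), self.name, key, value)
            self._apply(update)         # commit locally, do not wait for peers
            self.outbox.append(update)

        def _apply(self, update) -> None:
            ts, _origin, key, value = update
            current = self.store.get(key)
            if current is None or ts > current[0]:  # last-writer-wins (assumed)
                self.store[key] = (ts, value)

        def replicate(self) -> None:
            """Ship queued updates to all peers, e.g. from a background thread."""
            while self.outbox:
                update = self.outbox.popleft()
                for peer in self.peers:
                    peer._apply(update)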


2021 ◽  
Vol 21 (1) ◽  
pp. 24-36
Author(s):  
Francisco Neves ◽  
Ricardo Vilaça ◽  
José Pereira

Modern containerized distributed systems, such as big data storage and processing stacks or microservice-based applications, are inherently hard to monitor and optimize, as resource usage does not directly match hardware resources due to multiple virtualization layers. For instance, although inter-application traffic is an important factor, as it directly indicates how components interact, it has not been possible to monitor it accurately in an application-independent way and without severe overhead, thus putting it out of reach of cloud platforms. In this paper we present an efficient black-box monitoring approach for gathering detailed structural information about collaborating processes in a distributed system that can be queried for various purposes, as it includes information about processes, containers, and hosts, as well as resource usage and the amount of data exchanged. The key to achieving high detail and low overhead without custom application instrumentation is to use a kernel-aided, event-driven strategy. We validate a prototype implementation by applying it to multi-platform microservice deployments, evaluate its performance with micro-benchmarks, and demonstrate its usefulness for container placement in a distributed data storage and processing stack (i.e., Cassandra and Spark).
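
To illustrate the aggregation half of such a monitor, here is a minimal sketch that folds kernel-emitted socket events into a queryable process-to-process traffic matrix; the event source itself (the kernel-aided tracing) is assumed and left out, and the SocketEvent fields are hypothetical rather than the paper's actual schema:

    from collections import defaultdict
    from dataclasses import dataclass

    @dataclass(frozen=True)
    class SocketEvent:
        """One send observed by the kernel tracer (hypothetical schema)."""
        src_pid: int
        dst_pid: int
        container: str   # container of the sending process
        nbytes: int

    class TrafficMatrix:
        def __init__(self) -> None:
            self._pair_bytes = defaultdict(int)       # (src, dst) -> bytes
            self._container_bytes = defaultdict(int)  # container -> bytes sent

        def ingest(self, ev: SocketEvent) -> None:
            self._pair_bytes[(ev.src_pid, ev.dst_pid)] += ev.nbytes
            self._container_bytes[ev.container] += ev.nbytes

        def between(self, src_pid: int, dst_pid: int) -> int:
            return self._pair_bytes[(src_pid, dst_pid)]

        def container_egress(self, container: str) -> int:
            return self._container_bytes[container]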


2021 ◽  
Vol 251 ◽  
pp. 02035
Author(s):  
Adrian Eduard Negru ◽  
Latchezar Betev ◽  
Mihai Carabaș ◽  
Costin Grigoraș ◽  
Nicolae Țăpuş ◽  
...  

CERN uses the world's largest scientific computing grid, WLCG, for distributed data storage and processing. Monitoring of the CPU and storage resources is an important and essential element to detect operational issues in its systems, for example in the storage elements, and to ensure their proper and efficient function. The processing of experiment data depends strongly on the data access quality, as well as its integrity, and both of these key parameters must be assured for the data lifetime. Given the substantial amount of data, O(200 PB), already collected by ALICE and kept at various storage elements around the globe, scanning every single data chunk would be a very expensive process, both in terms of computing resource usage and in terms of execution time. In this paper, we describe a distributed file crawler that addresses these natural limits by periodically extracting and analyzing statistically significant samples of files from storage elements; it evaluates the results and is integrated with the existing monitoring solution, MonALISA.
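
The core sampling idea can be sketched in a few lines (the catalogue and fetch interfaces below are hypothetical placeholders, not the crawler's actual API): draw a random sample of the catalogued files on a storage element, verify their checksums, and report the observed bad-file fraction instead of scanning everything:

    import hashlib
    import random

    def verify_sample(catalogue, fetch, sample_size: int, seed: int = 0) -> float:
        """catalogue: list of (path, expected_md5) pairs for one storage
        element; fetch(path) -> bytes. Returns the bad-file fraction
        observed in a random sample."""
        rng = random.Random(seed)
        sample = rng.sample(catalogue, min(sample_size, len(catalogue)))
        bad = sum(1 for path, md5 in sample
                  if hashlib.md5(fetch(path)).hexdigest() != md5)
        return bad / len(sample)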


Author(s):  
Ismail Akturk ◽  
Xinqi Wang ◽  
Tevfik Kosar

The unbounded increase in the size of data generated by scientific applications necessitates collaboration and sharing among the nation's education and research institutions. Simply purchasing high-capacity, high-performance storage systems and adding them to the existing infrastructure of the collaborating institutions does not solve the underlying and highly challenging data handling problem. Scientists are compelled to spend a great deal of time and energy on solving basic data-handling issues, such as the physical location of data, how to access it, and/or how to move it to visualization and/or compute resources for further analysis. This chapter presents the design and implementation of a reliable and efficient distributed data storage system, PetaShare, which spans multiple institutions across the state of Louisiana. At the back-end, PetaShare provides a unified name space and efficient data movement across geographically distributed storage sites. At the front-end, it provides lightweight clients that enable easy, transparent, and scalable access. In PetaShare, the authors have designed and implemented an asynchronously replicated multi-master metadata system for enhanced reliability and availability. The authors also present a high-level cross-domain metadata schema to provide a structured, systematic view of the multiple science domains supported by PetaShare.


2019 ◽  
Vol 207 ◽  
pp. 08003
Author(s):  
Alexander Kryukov ◽  
Minh-Duc Nguyen

In this paper we present the architecture of a distributed data storage for astroparticle physics. The main advantage of the proposed architecture is the possibility to extract data at both the file and event level for further processing and analysis. The storage also provides users with a special service that allows them to aggregate data from different storages into a single sample. This feature permits applying multi-messenger methods for more sophisticated investigation of the data. Users can access the storage through both a Web interface and an Application Programming Interface (API).
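
A client of such an aggregation service might look like the following minimal sketch; the /api/events route and the query parameters are hypothetical, since the abstract does not specify the API's actual routes:

    import json
    from urllib.parse import urlencode
    from urllib.request import urlopen

    def aggregate_events(storage_urls, selection: dict) -> list:
        """Query each storage site's (hypothetical) /api/events endpoint with
        the same selection and merge the returned records into one sample."""
        merged = []
        for base in storage_urls:
            with urlopen(f"{base}/api/events?{urlencode(selection)}") as resp:
                merged.extend(json.load(resp))
        return merged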


Author(s):  
Saman Tabatabaeian ◽  
Rajendra P. Lal ◽  
Wilson Naik

Distributed data storage systems are used to store data reliably over a distributed collection of storage locations, called peers. Coding schemes are used to store a portion of the data in the peers, ensuring the complete retrieval of data during peer failures. This has applications in various areas such as wireless networks and sensor networks. In this framework we consider a large file to be stored in a distributed manner over a few peers of limited capacity. Each peer stores a portion of the coded data, without knowledge of the contents of the other peers. Random coding is one of the coding schemes used for this. In [1], coding coefficients are chosen randomly from a finite field to encode the data; the encoding is basically a linear combination of file pieces (the pieces are elements of finite fields). The data downloader downloads these coded data from several peers and decodes them to recover the original data. The decoding is basically solving a system of linear equations over a finite field, which is the most time-consuming step in the whole process. We give a simple C++ implementation of the schemes in [1] and plot the results. We are trying to find a scheme where the coding vectors can be chosen such that the decoding complexity is reduced significantly. We also discuss a dynamic setting where nodes enter and leave the system intermittently.
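
Both halves of the scheme are easy to sketch (a toy illustration of the idea over the prime field GF(257), not the authors' C++ implementation): encoding draws random coefficients and forms linear combinations of the file pieces, and decoding performs Gaussian elimination over the field, which is exactly the step whose cost the abstract highlights:

    import random

    P = 257  # prime field size, assumed for this toy example

    def encode(pieces, rng):
        """One coded piece: random coefficients and the corresponding
        linear combination of the file pieces over GF(P)."""
        coeffs = [rng.randrange(P) for _ in pieces]
        value = sum(c * x for c, x in zip(coeffs, pieces)) % P
        return coeffs, value

    def decode(rows, k):
        """rows: (coeffs, value) pairs collected from peers. Gauss-Jordan
        elimination over GF(P) using the first k independent rows; this is
        the expensive step discussed in the abstract."""
        a = [list(c) + [v] for c, v in rows]   # augmented matrix
        used = 0
        for col in range(k):
            piv = next((r for r in range(used, len(a)) if a[r][col]), None)
            if piv is None:
                raise ValueError("rank deficient: collect more coded pieces")
            a[used], a[piv] = a[piv], a[used]
            inv = pow(a[used][col], P - 2, P)  # inverse via Fermat's little theorem
            a[used] = [x * inv % P for x in a[used]]
            for r in range(len(a)):
                if r != used and a[r][col]:
                    f = a[r][col]
                    a[r] = [(x - f * y) % P for x, y in zip(a[r], a[used])]
            used += 1
        return [a[i][k] for i in range(k)]

    if __name__ == "__main__":
        rng = random.Random(1)
        pieces = [42, 7, 199]                            # toy file, k = 3
        coded = [encode(pieces, rng) for _ in range(5)]  # gather 5 > k pieces
        print(decode(coded, k=3))                        # -> [42, 7, 199]

Because randomly drawn coefficient rows can occasionally be linearly dependent, the demo collects a few more coded pieces than k and the decoder uses the first k independent rows.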

