Erasure Coding
Recently Published Documents


TOTAL DOCUMENTS: 207 (last five years: 58)

H-INDEX: 15 (last five years: 2)

2021 ◽ Vol 17 (4) ◽ pp. 1-38
Author(s): Takayuki Fukatani ◽ Hieu Hanh Le ◽ Haruo Yokota

With the recent performance improvements in commodity hardware, low-cost commodity-server-based storage has become a practical alternative to dedicated storage appliances. Because of the high failure rate of commodity servers, a server-based storage system must maintain data redundancy across multiple servers. However, the extra storage capacity required for this redundancy significantly increases system cost. Although erasure coding (EC) is a promising method for reducing the amount of redundant data, it requires distributing and encoding data across servers, and these processes incur considerable network traffic and processing overhead. The performance impact is especially significant for random-I/O-intensive applications. In this article, we propose a new lightweight redundancy control for server-based storage. Our method uses a local-filesystem-based approach that avoids distributing data by adding redundancy to locally stored user data, and it switches the redundancy method of user data between replication and EC according to the workload, improving capacity efficiency while maintaining high performance. Our experiments show up to 230% better online-transaction-processing performance for our method compared with CephFS, a widely used alternative. We also confirmed that our method prevents unexpected performance degradation while achieving better capacity efficiency.
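The switching policy at the heart of this method can be illustrated with a short sketch. The following Python is a minimal, hypothetical illustration of workload-based switching between replication and EC; the `RedundancyPolicy` class, its thresholds, and the write-rate heuristic are assumptions for illustration, not the paper's actual algorithm.

```python
# Hypothetical sketch: keep write-hot objects replicated (cheap in-place
# updates), convert write-cold objects to erasure coding (better capacity
# efficiency). Thresholds and interfaces are illustrative assumptions.
from dataclasses import dataclass, field
from enum import Enum
import time


class Mode(Enum):
    REPLICATION = "replication"    # e.g., 3 full copies
    ERASURE_CODING = "ec"          # e.g., k=4 data + m=2 parity chunks


@dataclass
class ObjectState:
    mode: Mode = Mode.REPLICATION
    write_count: int = 0
    window_start: float = field(default_factory=time.monotonic)


class RedundancyPolicy:
    """Pick a redundancy mode from the object's recent write rate."""

    def __init__(self, hot_writes_per_sec: float = 10.0,
                 window_sec: float = 60.0):
        self.hot_rate = hot_writes_per_sec
        self.window = window_sec

    def record_write(self, obj: ObjectState) -> None:
        obj.write_count += 1

    def decide(self, obj: ObjectState) -> Mode:
        elapsed = max(time.monotonic() - obj.window_start, 1e-9)
        rate = obj.write_count / elapsed
        if elapsed >= self.window:                 # start a fresh window
            obj.write_count, obj.window_start = 0, time.monotonic()
        # Random-write-heavy objects stay replicated; cold ones are
        # re-encoded with EC to reclaim capacity.
        return Mode.REPLICATION if rate >= self.hot_rate else Mode.ERASURE_CODING
```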


2021 ◽ Vol 5 (1)
Author(s): Andreas J. Peters ◽ Daniel C. van der Ster

CephFS is a network filesystem built upon the Reliable Autonomic Distributed Object Store (RADOS). At CERN we have demonstrated its reliability and elasticity while operating several 100-to-1000 TB clusters that provide NFS-like storage to infrastructure applications and services. At the same time, our lab developed EOS to offer high-performance 100 PB-scale storage for the LHC at extremely low cost while also supporting the complete set of security and functional APIs required by the particle-physics user community. This work evaluates the performance of CephFS on this cost-optimized hardware when it is combined with EOS to supply the missing functionality. To this end, we have set up a proof-of-concept Ceph Octopus cluster on high-density JBOD servers (840 TB each) with 100 GbE networking. The system uses EOS to provide an overlaid namespace and protocol gateways for HTTP(S) and XROOTD, and uses CephFS as an erasure-coded object-storage backend. The solution also lets operators aggregate several CephFS instances and adds features such as third-party copy, SciTokens, and high-level user and quota management. Using simple benchmarks, we measure the cost/performance trade-offs of different erasure-coding layouts as well as the network overheads of these coding schemes. We demonstrate some relevant limitations of the CephFS metadata server and offer improved tunings that are generally applicable. To conclude, we reflect on the advantages and drawbacks of this architecture, such as RADOS-level free-space requirements and double-network penalties, and offer ideas for future improvements.
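The capacity/network trade-offs between erasure-coding layouts that the paper benchmarks follow directly from the layout parameters. The Python sketch below computes the standard figures for a Reed-Solomon-style k+m layout; the example layouts are illustrative choices, not necessarily the ones evaluated at CERN.

```python
# Standard capacity/network figures for a Reed-Solomon-style k+m layout:
# each user byte costs (k+m)/k raw bytes, any m chunks may be lost, and a
# full-stripe write sends k+m chunks for k chunks of user data.

def layout_stats(k: int, m: int) -> dict:
    return {
        "layout": f"{k}+{m}",
        "storage_overhead": (k + m) / k,   # raw bytes per user byte
        "tolerated_failures": m,           # chunks that may be lost
        "write_fanout": k + m,             # chunks sent per full-stripe write
        "read_fanout": k,                  # chunks fetched for a normal read
    }

# Illustrative layouts, not necessarily those benchmarked in the paper.
for k, m in [(2, 2), (4, 2), (8, 3), (16, 4)]:
    s = layout_stats(k, m)
    print(f"{s['layout']:>5}: {s['storage_overhead']:.2f}x storage, "
          f"tolerates {s['tolerated_failures']} failures, "
          f"write fan-out {s['write_fanout']}, read fan-out {s['read_fanout']}")
```

Wider stripes (larger k) cut the storage overhead but raise the read fan-out and the blast radius of a slow node, which is why the network-overhead measurements matter as much as the capacity figures.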


Computers ◽ 2021 ◽ Vol 10 (11) ◽ pp. 142
Author(s): Obadah Hammoud ◽ Ivan Tarkhanov ◽ Artyom Kosmarski

This paper investigates the problem of distributed storage of electronic documents (both metadata and files) in decentralized blockchain-based B2B systems (DApps). We consider the need to reduce the cost of implementing such systems and the still insufficiently explored issue of storing big data in DLT. We propose an approach for building such systems that optimizes the required storage size (via erasure coding) while providing secure data storage across the geographically distributed systems of a company or a consortium of companies. The novelty of this solution is that we are the first to combine enterprise DLT with distributed file storage in which the availability of files is controlled. The results of our experiment demonstrate that the speed of the described DApp is comparable to known B2C torrent projects, and they justify the choice of Hyperledger Fabric and Ethereum Enterprise for its implementation. The test results show that public blockchain networks are not suitable for building such a B2B system. The proposed system addresses the main challenges of distributed data storage by grouping data into clusters managed by a load balancer, while preventing data tampering using a blockchain network. The proposed DApp storage methodology scales easily in the horizontal direction in terms of distributed file storage and can be deployed on cloud-computing platforms while minimizing the required storage space. We compare this approach with known methods of file storage in distributed systems, including central storage, torrents, IPFS, and Storj. We calculate the reliability of this approach and compare the result with traditional solutions based on full backup.
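To see why erasure coding shrinks the required storage relative to full replication, consider the toy single-parity code below. It is a minimal Python sketch, not the paper's scheme: a production system like the one described would use a Reed-Solomon code with several parity shards, whereas this example tolerates the loss of exactly one shard.

```python
# Toy single-parity erasure code: k data shards plus one XOR parity shard.
# Storage cost is (k+1)/k of the original instead of 3x for triple
# replication, at the price of tolerating only one lost shard.

def encode(data: bytes, k: int) -> list:
    shard_len = -(-len(data) // k)                  # ceiling division
    padded = data.ljust(k * shard_len, b"\0")       # zero-pad to k shards
    shards = [padded[i * shard_len:(i + 1) * shard_len] for i in range(k)]
    parity = bytearray(shard_len)
    for shard in shards:                            # parity = XOR of shards
        for i, b in enumerate(shard):
            parity[i] ^= b
    return shards + [bytes(parity)]

def recover(shards: list) -> list:
    """Rebuild the single missing shard (marked None) by XOR-ing the rest."""
    missing = shards.index(None)
    present = [s for s in shards if s is not None]
    rebuilt = bytearray(len(present[0]))
    for s in present:
        for i, b in enumerate(s):
            rebuilt[i] ^= b
    shards[missing] = bytes(rebuilt)
    return shards

shards = encode(b"electronic document payload", k=4)
shards[2] = None                                    # simulate a failed node
assert b"".join(recover(shards)[:4]).rstrip(b"\0") == b"electronic document payload"
```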


2021
Author(s): Lei Sun ◽ Qiang Cao ◽ Shucheng Wang ◽ Changsheng Xie

2021 ◽ Vol 11 (18) ◽ pp. 8727
Author(s): Dong-Jin Shin ◽ Jeong-Joon Kim

Research has been conducted on transferring blocks efficiently and reducing network costs when decoding and recovering data in an erasure-coding-based distributed file system. Software-defined network (SDN) controllers can collect network data and manage it more efficiently. However, the available bandwidth varies dynamically with the amount of data on the network, and data transfer becomes inefficient when nodes and switches fail, owing to the higher latency of the existing routing paths. We propose deep-Q-network erasure coding (DQN-EC), which combines erasure coding with a DQN that learns the dynamically changing network elements, to solve these routing problems. Using the SDN controller, DQN-EC collects the status, number, and block sizes of the nodes holding stored blocks during erasure coding. The fat-tree network topology used for the experimental evaluation provides the elements of typical network packets, the bandwidth of the nodes and switches, and other information. The collected data undergo deep reinforcement learning to avoid node and switch failures and to provide optimized routing paths by selecting switches that conduct block transfers efficiently. DQN-EC achieves 2.5-times-faster block transmission and 0.4-times-higher network throughput than the open-shortest-path-first (OSPF) routing algorithm. The bottleneck bandwidth and transmission link cost are reduced, improving the recovery time approximately twofold.
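The core of DQN-EC's learning loop is the Q-value update behind deep Q-networks. The sketch below strips the approach down to tabular Q-learning over a toy switch graph, which exposes the same update rule without the neural network; the topology, rewards, and hyperparameters are invented for illustration and are not the paper's.

```python
# Tabular Q-learning for next-hop selection on a toy switch graph.
# A DQN replaces the Q table with a neural network over richer SDN state;
# the update rule below is the common core.
import random
from collections import defaultdict

# Toy topology: switch -> {neighbor: link_cost}; lower cost ~ more bandwidth.
LINKS = {
    "s1": {"s2": 1.0, "s3": 3.0},
    "s2": {"s4": 1.0},
    "s3": {"s4": 1.0},
    "s4": {},
}
DEST = "s4"                       # node holding the block to transfer

Q = defaultdict(float)            # Q[(switch, next_hop)]
alpha, gamma, eps = 0.5, 0.9, 0.2 # learning rate, discount, exploration

def step(node: str) -> str:
    nbrs = list(LINKS[node])
    hop = random.choice(nbrs) if random.random() < eps else \
          max(nbrs, key=lambda n: Q[(node, n)])
    # Penalize costly links; reward reaching the destination.
    reward = -LINKS[node][hop] + (10.0 if hop == DEST else 0.0)
    best_next = max((Q[(hop, n)] for n in LINKS[hop]), default=0.0)
    Q[(node, hop)] += alpha * (reward + gamma * best_next - Q[(node, hop)])
    return hop

for _ in range(500):              # train over repeated block transfers
    node = "s1"
    while node != DEST:
        node = step(node)
```

After training, the greedy path from s1 prefers the cheap s1-s2-s4 route; in DQN-EC the same mechanism steers block transfers away from failed or congested switches.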


2021 ◽ Vol 2021 ◽ pp. 1-13
Author(s): Zijian Li ◽ Chuqiao Xiao

In distributed database systems, efficiency and availability become critical considerations as cluster scale grows. A common approach to high availability in a cluster is replication, but this is inefficient owing to its low storage utilization. Erasure coding can provide data reliability while ensuring high storage utilization; however, because of the large number of encoding and decoding operations it demands from the CPU, it is not well suited to frequently updated data. To optimize the storage efficiency of the data in the distributed system without affecting its availability, this paper proposes a data-temperature recognition algorithm that classifies data tablets into three types, cold, warm, and hot, according to access frequency. Combining three-way replication with erasure coding, we propose ER-store, a hybrid storage mechanism for the different data types. We also design the data-temperature conversion cycle around the read-write separation architecture of the distributed database system, which reduces the computational overhead caused by frequent updates under erasure coding. We implemented this design on the CBase database system, which is based on the read-write separation architecture, and the experimental results show that it saves 14.6%–18.3% of storage space while meeting the system's efficient-access performance requirements.
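The hot/warm/cold split by access frequency can be sketched in a few lines of Python. The thresholds and the temperature-to-placement mapping below are assumptions for illustration; the paper's data-temperature recognition algorithm and ER-store's actual placement rules are more involved.

```python
# Hypothetical sketch of temperature-based placement: frequently updated
# tablets stay on three replicas (cheap random updates), cold tablets move
# to an erasure-coded layout (better storage utilization).

def temperature(accesses_per_day: float, hot: float = 100.0,
                warm: float = 10.0) -> str:
    if accesses_per_day >= hot:
        return "hot"
    if accesses_per_day >= warm:
        return "warm"
    return "cold"

# Assumed mapping from temperature to redundancy scheme.
PLACEMENT = {"hot": "3-replica", "warm": "3-replica", "cold": "erasure-coded"}

for tablet, rate in {"orders": 540.0, "sessions": 35.0, "archive": 0.2}.items():
    t = temperature(rate)
    print(f"{tablet}: {t} -> {PLACEMENT[t]}")
```

Re-evaluating temperatures only once per conversion cycle, as the paper does on top of the read-write separation architecture, avoids re-encoding a tablet on every access-pattern fluctuation.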

