Erasure Coding
Recently Published Documents


TOTAL DOCUMENTS: 207 (last five years: 58)

H-INDEX: 15 (last five years: 2)

2021 ◽ Vol 17 (4) ◽ pp. 1-38
Author(s): Takayuki Fukatani ◽ Hieu Hanh Le ◽ Haruo Yokota

With the recent performance improvements in commodity hardware, low-cost commodity-server-based storage has become a practical alternative to dedicated storage appliances. Because of the high failure rate of commodity servers, a server-based storage system must maintain data redundancy across multiple servers. However, the extra storage capacity required for this redundancy significantly increases system cost. Although erasure coding (EC) is a promising method for reducing the amount of redundant data, it requires distributing and encoding data across servers, and these processes incur considerable network traffic and processing overhead. The performance impact is especially significant for random-I/O-intensive applications. In this article, we propose a new lightweight redundancy control for server-based storage. Our method uses a local-filesystem-based approach that avoids distributing data by adding redundancy to locally stored user data, and it switches the redundancy method of user data between replication and EC according to the workload, improving capacity efficiency while maintaining high performance. Our experiments show up to 230% better online-transaction-processing performance for our method compared with CephFS, a widely used alternative. We also confirmed that our method prevents unexpected performance degradation while achieving better capacity efficiency.
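The switching policy at the heart of this method can be illustrated with a short sketch. The following Python is a minimal, hypothetical illustration of workload-based switching between replication and EC; the `RedundancyPolicy` class, its thresholds, and the write-rate heuristic are assumptions for illustration, not the paper's actual algorithm.

```python
# Hypothetical sketch: keep write-hot objects replicated (cheap in-place
# updates), convert write-cold objects to erasure coding (better capacity
# efficiency). Thresholds and interfaces are illustrative assumptions.
from dataclasses import dataclass, field
from enum import Enum
import time


class Mode(Enum):
    REPLICATION = "replication"    # e.g., 3 full copies
    ERASURE_CODING = "ec"          # e.g., k=4 data + m=2 parity chunks


@dataclass
class ObjectState:
    mode: Mode = Mode.REPLICATION
    write_count: int = 0
    window_start: float = field(default_factory=time.monotonic)


class RedundancyPolicy:
    """Pick a redundancy mode from the object's recent write rate."""

    def __init__(self, hot_writes_per_sec: float = 10.0,
                 window_sec: float = 60.0):
        self.hot_rate = hot_writes_per_sec
        self.window = window_sec

    def record_write(self, obj: ObjectState) -> None:
        obj.write_count += 1

    def decide(self, obj: ObjectState) -> Mode:
        elapsed = max(time.monotonic() - obj.window_start, 1e-9)
        rate = obj.write_count / elapsed
        if elapsed >= self.window:                 # start a fresh window
            obj.write_count, obj.window_start = 0, time.monotonic()
        # Random-write-heavy objects stay replicated; cold ones are
        # re-encoded with EC to reclaim capacity.
        return Mode.REPLICATION if rate >= self.hot_rate else Mode.ERASURE_CODING
```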


2021 ◽ Vol 5 (1)
Author(s): Andreas J. Peters ◽ Daniel C. van der Ster

CephFS is a network filesystem built upon the Reliable Autonomic Distributed Object Store (RADOS). At CERN we have demonstrated its reliability and elasticity while operating several 100-to-1000 TB clusters that provide NFS-like storage to infrastructure applications and services. At the same time, our lab developed EOS to offer high-performance 100 PB-scale storage for the LHC at extremely low cost while also supporting the complete set of security and functional APIs required by the particle-physics user community. This work evaluates the performance of CephFS on this cost-optimized hardware when it is combined with EOS to supply the missing functionality. To this end, we have set up a proof-of-concept Ceph Octopus cluster on high-density JBOD servers (840 TB each) with 100 GbE networking. The system uses EOS to provide an overlaid namespace and protocol gateways for HTTP(S) and XROOTD, and uses CephFS as an erasure-coded object-storage backend. The solution also lets operators aggregate several CephFS instances and adds features such as third-party copy, SciTokens, and high-level user and quota management. Using simple benchmarks, we measure the cost/performance trade-offs of different erasure-coding layouts as well as the network overheads of these coding schemes. We demonstrate some relevant limitations of the CephFS metadata server and offer improved tunings that are generally applicable. To conclude, we reflect on the advantages and drawbacks of this architecture, such as RADOS-level free-space requirements and double-network penalties, and offer ideas for future improvements.
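The capacity/network trade-offs between erasure-coding layouts that the paper benchmarks follow directly from the layout parameters. The Python sketch below computes the standard figures for a Reed-Solomon-style k+m layout; the example layouts are illustrative choices, not necessarily the ones evaluated at CERN.

```python
# Standard capacity/network figures for a Reed-Solomon-style k+m layout:
# each user byte costs (k+m)/k raw bytes, any m chunks may be lost, and a
# full-stripe write sends k+m chunks for k chunks of user data.

def layout_stats(k: int, m: int) -> dict:
    return {
        "layout": f"{k}+{m}",
        "storage_overhead": (k + m) / k,   # raw bytes per user byte
        "tolerated_failures": m,           # chunks that may be lost
        "write_fanout": k + m,             # chunks sent per full-stripe write
        "read_fanout": k,                  # chunks fetched for a normal read
    }

# Illustrative layouts, not necessarily those benchmarked in the paper.
for k, m in [(2, 2), (4, 2), (8, 3), (16, 4)]:
    s = layout_stats(k, m)
    print(f"{s['layout']:>5}: {s['storage_overhead']:.2f}x storage, "
          f"tolerates {s['tolerated_failures']} failures, "
          f"write fan-out {s['write_fanout']}, read fan-out {s['read_fanout']}")
```

Wider stripes (larger k) cut the storage overhead but raise the read fan-out and the blast radius of a slow node, which is why the network-overhead measurements matter as much as the capacity figures.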


Computers ◽ 2021 ◽ Vol 10 (11) ◽ pp. 142
Author(s): Obadah Hammoud ◽ Ivan Tarkhanov ◽ Artyom Kosmarski

This paper investigates the problem of distributed storage of electronic documents (both metadata and files) in decentralized blockchain-based B2B systems (DApps). We consider the need to reduce the cost of implementing such systems and the still insufficiently explored issue of storing big data in DLT. We propose an approach for building such systems that optimizes the required storage size (via erasure coding) while providing secure data storage across the geographically distributed systems of a company or a consortium of companies. The novelty of this solution is that we are the first to combine enterprise DLT with distributed file storage in which the availability of files is controlled. The results of our experiment demonstrate that the speed of the described DApp is comparable to known B2C torrent projects, and they justify the choice of Hyperledger Fabric and Ethereum Enterprise for its implementation. The test results show that public blockchain networks are not suitable for building such a B2B system. The proposed system addresses the main challenges of distributed data storage by grouping data into clusters managed by a load balancer, while preventing data tampering using a blockchain network. The proposed DApp storage methodology scales easily in the horizontal direction in terms of distributed file storage and can be deployed on cloud-computing platforms while minimizing the required storage space. We compare this approach with known methods of file storage in distributed systems, including central storage, torrents, IPFS, and Storj. We calculate the reliability of this approach and compare the result with traditional solutions based on full backup.
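To see why erasure coding shrinks the required storage relative to full replication, consider the toy single-parity code below. It is a minimal Python sketch, not the paper's scheme: a production system like the one described would use a Reed-Solomon code with several parity shards, whereas this example tolerates the loss of exactly one shard.

```python
# Toy single-parity erasure code: k data shards plus one XOR parity shard.
# Storage cost is (k+1)/k of the original instead of 3x for triple
# replication, at the price of tolerating only one lost shard.

def encode(data: bytes, k: int) -> list:
    shard_len = -(-len(data) // k)                  # ceiling division
    padded = data.ljust(k * shard_len, b"\0")       # zero-pad to k shards
    shards = [padded[i * shard_len:(i + 1) * shard_len] for i in range(k)]
    parity = bytearray(shard_len)
    for shard in shards:                            # parity = XOR of shards
        for i, b in enumerate(shard):
            parity[i] ^= b
    return shards + [bytes(parity)]

def recover(shards: list) -> list:
    """Rebuild the single missing shard (marked None) by XOR-ing the rest."""
    missing = shards.index(None)
    present = [s for s in shards if s is not None]
    rebuilt = bytearray(len(present[0]))
    for s in present:
        for i, b in enumerate(s):
            rebuilt[i] ^= b
    shards[missing] = bytes(rebuilt)
    return shards

shards = encode(b"electronic document payload", k=4)
shards[2] = None                                    # simulate a failed node
assert b"".join(recover(shards)[:4]).rstrip(b"\0") == b"electronic document payload"
```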


2021
Author(s): Lei Sun ◽ Qiang Cao ◽ Shucheng Wang ◽ Changsheng Xie

2021 ◽ Vol 11 (18) ◽ pp. 8727
Author(s): Dong-Jin Shin ◽ Jeong-Joon Kim

Research has been conducted on transferring blocks efficiently and reducing network costs when decoding and recovering data in an erasure-coding-based distributed file system. Software-defined network (SDN) controllers can collect network data and manage it more efficiently. However, the available bandwidth varies dynamically with the amount of data on the network, and data transfer becomes inefficient when nodes and switches fail, owing to the higher latency of the existing routing paths. We propose deep-Q-network erasure coding (DQN-EC), which combines erasure coding with a DQN that learns the dynamically changing network elements, to solve these routing problems. Using the SDN controller, DQN-EC collects the status, number, and block sizes of the nodes holding stored blocks during erasure coding. The fat-tree network topology used for the experimental evaluation provides the elements of typical network packets, the bandwidth of the nodes and switches, and other information. The collected data undergo deep reinforcement learning to avoid node and switch failures and to provide optimized routing paths by selecting switches that conduct block transfers efficiently. DQN-EC achieves 2.5-times-faster block transmission and 0.4-times-higher network throughput than the open-shortest-path-first (OSPF) routing algorithm. The bottleneck bandwidth and transmission link cost are reduced, improving the recovery time approximately twofold.
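The core of DQN-EC's learning loop is the Q-value update behind deep Q-networks. The sketch below strips the approach down to tabular Q-learning over a toy switch graph, which exposes the same update rule without the neural network; the topology, rewards, and hyperparameters are invented for illustration and are not the paper's.

```python
# Tabular Q-learning for next-hop selection on a toy switch graph.
# A DQN replaces the Q table with a neural network over richer SDN state;
# the update rule below is the common core.
import random
from collections import defaultdict

# Toy topology: switch -> {neighbor: link_cost}; lower cost ~ more bandwidth.
LINKS = {
    "s1": {"s2": 1.0, "s3": 3.0},
    "s2": {"s4": 1.0},
    "s3": {"s4": 1.0},
    "s4": {},
}
DEST = "s4"                       # node holding the block to transfer

Q = defaultdict(float)            # Q[(switch, next_hop)]
alpha, gamma, eps = 0.5, 0.9, 0.2 # learning rate, discount, exploration

def step(node: str) -> str:
    nbrs = list(LINKS[node])
    hop = random.choice(nbrs) if random.random() < eps else \
          max(nbrs, key=lambda n: Q[(node, n)])
    # Penalize costly links; reward reaching the destination.
    reward = -LINKS[node][hop] + (10.0 if hop == DEST else 0.0)
    best_next = max((Q[(hop, n)] for n in LINKS[hop]), default=0.0)
    Q[(node, hop)] += alpha * (reward + gamma * best_next - Q[(node, hop)])
    return hop

for _ in range(500):              # train over repeated block transfers
    node = "s1"
    while node != DEST:
        node = step(node)
```

After training, the greedy path from s1 prefers the cheap s1-s2-s4 route; in DQN-EC the same mechanism steers block transfers away from failed or congested switches.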


2021 ◽ Vol 2021 ◽ pp. 1-13
Author(s): Zijian Li ◽ Chuqiao Xiao

In distributed database systems, efficiency and availability become critical considerations as cluster scale grows. A common approach to high availability in a cluster is replication, but this is inefficient owing to its low storage utilization. Erasure coding can provide data reliability while ensuring high storage utilization; however, because of the large number of encoding and decoding operations it demands from the CPU, it is not well suited to frequently updated data. To optimize the storage efficiency of the data in the distributed system without affecting its availability, this paper proposes a data-temperature recognition algorithm that classifies data tablets into three types, cold, warm, and hot, according to access frequency. Combining three-way replication with erasure coding, we propose ER-store, a hybrid storage mechanism for the different data types. We also design the data-temperature conversion cycle around the read-write separation architecture of the distributed database system, which reduces the computational overhead caused by frequent updates under erasure coding. We implemented this design on the CBase database system, which is based on the read-write separation architecture, and the experimental results show that it saves 14.6%–18.3% of storage space while meeting the system's efficient-access performance requirements.
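The hot/warm/cold split by access frequency can be sketched in a few lines of Python. The thresholds and the temperature-to-placement mapping below are assumptions for illustration; the paper's data-temperature recognition algorithm and ER-store's actual placement rules are more involved.

```python
# Hypothetical sketch of temperature-based placement: frequently updated
# tablets stay on three replicas (cheap random updates), cold tablets move
# to an erasure-coded layout (better storage utilization).

def temperature(accesses_per_day: float, hot: float = 100.0,
                warm: float = 10.0) -> str:
    if accesses_per_day >= hot:
        return "hot"
    if accesses_per_day >= warm:
        return "warm"
    return "cold"

# Assumed mapping from temperature to redundancy scheme.
PLACEMENT = {"hot": "3-replica", "warm": "3-replica", "cold": "erasure-coded"}

for tablet, rate in {"orders": 540.0, "sessions": 35.0, "archive": 0.2}.items():
    t = temperature(rate)
    print(f"{tablet}: {t} -> {PLACEMENT[t]}")
```

Re-evaluating temperatures only once per conversion cycle, as the paper does on top of the read-write separation architecture, avoids re-encoding a tablet on every access-pattern fluctuation.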

