Malicious Node Identification in Coded Distributed Storage Systems under Pollution Attacks

In coding-based distributed storage systems (DSSs), a set of storage nodes (SNs) holds coded fragments of a data unit that collectively allow the original information to be recovered. It is well known that data modification (a.k.a. pollution attack) is the Achilles’ heel of such coding systems: intentional modification of a single coded fragment can prevent reconstruction of the original information because of the error propagation induced by the decoding algorithm. The challenge we take on in this work is to devise an algorithm that identifies polluted coded fragments within the set encoding a data unit, and to characterize its performance. To this end, we provide the following contributions: (i) We devise MIND (Malicious node IdeNtification in DSS), an algorithm that is general with respect to the encoding mechanism chosen for the DSS, copes with a heterogeneous allocation of coded fragments to SNs, and successfully identifies polluted coded fragments in a low-redundancy scenario; (ii) We formally prove both termination and correctness of MIND; (iii) We derive an accurate analytical characterization of MIND performance (hit probability and complexity); (iv) We develop a C++ prototype that implements MIND to validate the performance predictions of the analytical model. Finally, to show the applicability of our work, we define performance and robustness metrics for an allocation of coded fragments to SNs, and we apply the results of the analytical characterization of MIND performance to select coded fragment allocations that yield robustness to collusion as well as the highest probability of identifying actual attackers.
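The error-propagation effect described in this abstract can be illustrated with a toy example. The sketch below is not MIND and is not taken from the paper: it assumes a hypothetical Reed-Solomon-style code over a small prime field in which a data unit (a, b) is stored as n evaluations of the line a + b*x, so that any two fragments suffice to decode. The names `encode`, `decode`, `consistent`, and `find_polluted` are illustrative only, and the identification step is a brute-force consistency check that works here because redundancy is high relative to the single polluted fragment.

```python
from itertools import combinations

P = 257  # small prime field, chosen only for illustration


def encode(a, b, n):
    """Fragment x holds the line a + b*x evaluated at x (mod P), for x = 1..n."""
    return {x: (a + b * x) % P for x in range(1, n + 1)}


def decode(frags):
    """Recover (a, b) from exactly two fragments given as {x: y}."""
    (x1, y1), (x2, y2) = sorted(frags.items())
    # pow(v, -1, P) is the modular inverse (Python 3.8+).
    b = ((y2 - y1) * pow(x2 - x1, -1, P)) % P
    a = (y1 - b * x1) % P
    return a, b


def consistent(frags):
    """True if every pair of fragments decodes to the same data unit."""
    results = {decode(dict(pair)) for pair in combinations(frags.items(), 2)}
    return len(results) == 1


def find_polluted(frags):
    """Brute force: fragment ids whose removal leaves a consistent set.

    With no pollution at all, every id is returned, so callers should
    check consistent(frags) first.
    """
    return [x for x in frags
            if consistent({k: v for k, v in frags.items() if k != x})]


frags = encode(42, 7, 4)
frags[3] = (frags[3] + 1) % P           # pollute a single fragment
wrong = decode({1: frags[1], 3: frags[3]})  # decoding now yields wrong data
suspects = find_polluted(frags)
```

In this toy setting one altered symbol changes both decoded values whenever the polluted fragment takes part in decoding, which is why identification has to rely on redundancy across fragments.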


On the impact of pollution attacks on coding-based distributed storage systems

IEEE Transactions on Information Forensics and Security ◽

10.1109/tifs.2022.3140924 ◽

2022 ◽

pp. 1-1

Author(s):

Rossano Gaeta

Keyword(s):

Storage Systems ◽

Distributed Storage ◽

Pollution Attacks ◽

Distributed Storage Systems ◽

The Impact


An Application-Oriented Cache Allocation and Prefetching Method for Long-Running Applications in Distributed Storage Systems

Chinese Journal of Electronics ◽

10.1049/cje.2019.05.004 ◽

2019 ◽

Vol 28 (4) ◽

pp. 773-780 ◽

Cited By ~ 1

Author(s):

Chang Guo ◽

Ying Li ◽

Hongzhi Liu ◽

Zhonghai Wu

Keyword(s):

Storage Systems ◽

Distributed Storage ◽

Distributed Storage Systems ◽

Cache Allocation


Analysis of a Stochastic Model of Replication in Large Distributed Storage Systems

ACM SIGMETRICS Performance Evaluation Review ◽

10.1145/3143314.3078531 ◽

2017 ◽

Vol 45 (1) ◽

pp. 51-51

Author(s):

Wen Sun ◽

Véronique Simon ◽

Sébastien Monnet ◽

Philippe Robert ◽

Pierre Sens

Keyword(s):

Stochastic Model ◽

Storage Systems ◽

Distributed Storage ◽

Distributed Storage Systems


Optimal Node Selection for Data Regeneration in Heterogeneous Distributed Storage Systems

2015 44th International Conference on Parallel Processing ◽

10.1109/icpp.2015.48 ◽

2015 ◽

Cited By ~ 6

Author(s):

Qingyuan Gong ◽

Jiaqi Wang ◽

Dongsheng Wei ◽

Jin Wang ◽

Xin Wang

Keyword(s):

Storage Systems ◽

Distributed Storage ◽

Node Selection ◽

Optimal Node ◽

Distributed Storage Systems ◽

Selection For


A Generic Transformation to Enable Optimal Repair in MDS Codes for Distributed Storage Systems

IEEE Transactions on Information Theory ◽

10.1109/tit.2018.2855059 ◽

2018 ◽

Vol 64 (9) ◽

pp. 6257-6267 ◽

Cited By ~ 13

Author(s):

Jie Li ◽

Xiaohu Tang ◽

Chao Tian

Keyword(s):

Storage Systems ◽

Distributed Storage ◽

Mds Codes ◽

Distributed Storage Systems ◽

Generic Transformation


Data placement strategy in data center distributed storage systems

2016 IEEE International Conference on Communication Systems (ICCS) ◽

10.1109/iccs.2016.7833566 ◽

2016 ◽

Cited By ~ 1

Author(s):

Yang Qin ◽

Xiao Ai ◽

Lingjian Chen ◽

Weihong Yang

Keyword(s):

Data Center ◽

Storage Systems ◽

Distributed Storage ◽

Data Placement ◽

Distributed Storage Systems


Tree-Structured Parallel Regeneration Based on Regenerating Codes for Multiple Data Losses in Distributed Storage Systems

Advanced Materials Research ◽

10.4028/www.scientific.net/amr.918.295 ◽

2014 ◽

Vol 918 ◽

pp. 295-300

Author(s):

Peng Fei You ◽

Yu Xing Peng ◽

Zhen Huang ◽

Chang Jian Wang

Keyword(s):

Storage Systems ◽

Distributed Storage ◽

Data Loss ◽

Data Reliability ◽

Data Redundancy ◽

Regeneration Time ◽

Multiple Data ◽

Distributed Storage Systems ◽

Regenerating Codes ◽

Reliability And Availability

In distributed storage systems, erasure codes are an attractive data redundancy solution that can provide the same reliability as replication while requiring much less storage space. Multiple data losses occur frequently, and the lost data should be regenerated to maintain data redundancy in distributed storage systems. Regeneration for multiple data losses should finish as soon as possible, because the regeneration time influences the data reliability and availability of distributed storage systems. However, multiple data losses are usually regenerated by handling each single data loss one by one, which leads to a high overall regeneration time and severely reduces the data reliability and availability of distributed storage systems. In this paper, we propose a tree-structured parallel regeneration scheme based on regenerating codes (TPRORC) for multiple data losses in distributed storage systems. In our scheme, multiple regeneration trees based on regenerating codes are constructed. First, these trees are created independently; each one shares no edges with the others and is responsible for one data loss. Second, every regeneration tree has the least network traffic and bandwidth-optimized paths for regenerating its data loss. The scheme can thus perform parallel regeneration for multiple data losses by using multiple optimized topology trees, in which network bandwidth is utilized efficiently and the individual regenerations overlap in time. Our simulation results show that the tree-structured parallel regeneration scheme reduces the regeneration time significantly compared to other regular regeneration schemes.
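The opening claim of this abstract, that erasure codes match the reliability of replication with less storage, can be made concrete with a short calculation. The sketch below is not from the paper: it assumes an (n, k) MDS erasure code, which stores n fragments of size 1/k of the object and tolerates any n - k fragment losses, and compares it with plain replication. The function names and the (6, 4) parameters are hypothetical, chosen only for illustration.

```python
from fractions import Fraction


def replication_overhead(tolerated_losses: int) -> Fraction:
    """Storage used per unit of data when keeping tolerated_losses + 1 full copies."""
    if tolerated_losses < 0:
        raise ValueError("tolerated_losses must be non-negative")
    return Fraction(tolerated_losses + 1)


def mds_overhead(n: int, k: int) -> Fraction:
    """Storage used per unit of data by an (n, k) MDS code: n fragments of size 1/k."""
    if not 0 < k <= n:
        raise ValueError("require 0 < k <= n")
    return Fraction(n, k)


# Both configurations survive any two lost nodes.
rep = replication_overhead(2)   # three full copies
mds = mds_overhead(6, 4)        # six fragments, any four of which decode
```

Under these assumptions replication needs three times the object size, while the (6, 4) code needs one and a half times. The price is that regenerating a lost fragment requires contacting several surviving nodes, which is the cost the regeneration schemes in this abstract aim to reduce.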
