Malicious Node Identification in Coded Distributed Storage Systems under Pollution Attacks

Author(s): Rossano Gaeta, Marco Grangetto

In coding-based distributed storage systems (DSSs), a set of storage nodes (SNs) hold coded fragments of a data unit that collectively allow one to recover the original information. It is well known that data modification (a.k.a. pollution attack) is the Achilles’ heel of such coding systems: intentionally modifying a single coded fragment can prevent reconstruction of the original information, because the decoding algorithm propagates the error. The challenge we take on in this work is to devise an algorithm that identifies polluted coded fragments within the set encoding a data unit, and to characterize its performance. To this end, we make the following contributions: (i) We devise MIND (Malicious node IdeNtification in DSS), an algorithm that is general with respect to the encoding mechanism chosen for the DSS, copes with a heterogeneous allocation of coded fragments to SNs, and effectively identifies polluted coded fragments in a low-redundancy scenario; (ii) We formally prove both MIND termination and correctness; (iii) We derive an accurate analytical characterization of MIND performance (hit probability and complexity); (iv) We develop a C++ prototype that implements MIND to validate the performance predictions of the analytical model. Finally, to show the applicability of our work, we define performance and robustness metrics for an allocation of coded fragments to SNs, and we apply the results of the analytical characterization of MIND performance to select coded fragment allocations that yield robustness to collusion as well as the highest probability of identifying actual attackers.
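The pollution effect described above can be reproduced with a toy code. The sketch below is not MIND: it assumes a Reed-Solomon-style (n, k) code over the prime field GF(251) and a trusted SHA-256 digest of the original data unit, and it identifies polluted fragments by brute force, decoding every k-subset of fragments and clearing any fragment that appears in a subset whose output matches the digest. All names (`encode`, `decode`, `digest`, `suspects`) are hypothetical. Because each k-subset decodes through an invertible matrix, any modification of a fragment in that subset changes the decoded output, so when at least k fragments are unpolluted the flagged set is exactly the polluted one. The cost grows as C(n, k) decodings, so this serves only as an illustration.

```python
import hashlib
from itertools import combinations

P = 251  # prime field size; toy assumption, not the code used by any real DSS


def encode(data, n):
    """Reed-Solomon-style encoding: fragment i is the polynomial with
    coefficients `data` evaluated at x = i + 1, computed mod P."""
    return [sum(d * pow(x, j, P) for j, d in enumerate(data)) % P
            for x in range(1, n + 1)]


def decode(indices, values, k):
    """Recover the k data symbols from k fragments (Gauss-Jordan elimination mod P)."""
    # Augmented row for fragment i: [x^0, ..., x^(k-1) | value] with x = i + 1.
    rows = [[pow(i + 1, j, P) for j in range(k)] + [v]
            for i, v in zip(indices, values)]
    for col in range(k):
        # Distinct nonzero x values make the Vandermonde matrix invertible.
        pivot = next(r for r in range(col, k) if rows[r][col] != 0)
        rows[col], rows[pivot] = rows[pivot], rows[col]
        inv = pow(rows[col][col], P - 2, P)  # inverse via Fermat's little theorem
        rows[col] = [(a * inv) % P for a in rows[col]]
        for r in range(k):
            if r != col and rows[r][col] != 0:
                f = rows[r][col]
                rows[r] = [(a - f * b) % P for a, b in zip(rows[r], rows[col])]
    return [rows[j][k] for j in range(k)]


def digest(data):
    """Trusted reference checksum of a data unit (assumed available to the verifier)."""
    return hashlib.sha256(bytes(data)).hexdigest()


def suspects(fragments, k, reference):
    """Brute force: a fragment is cleared if it belongs to some k-subset whose
    decoded output matches the reference digest; the rest are flagged."""
    cleared = set()
    for subset in combinations(range(len(fragments)), k):
        decoded = decode(subset, [fragments[i] for i in subset], k)
        if digest(decoded) == reference:
            cleared.update(subset)
    return sorted(set(range(len(fragments))) - cleared)


if __name__ == "__main__":
    data = [17, 42, 99, 3]                 # one data unit of k = 4 symbols
    frags = encode(data, 6)                # n = 6 fragments, two redundant
    frags[2] = (frags[2] + 1) % P          # pollute fragment 2
    print(decode((0, 1, 2, 3), frags[:4], 4) == data)  # False: the error propagates
    print(suspects(frags, 4, digest(data)))            # [2]
```

Decoding any k-subset that contains the polluted fragment yields a wrong data unit, which is the error propagation the abstract refers to; the redundant fragments are what make identification possible at all.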

2017, Vol 45 (1), p. 51
Author(s): Wen Sun, Véronique Simon, Sébastien Monnet, Philippe Robert, Pierre Sens

2014, Vol 918, pp. 295-300
Author(s): Peng Fei You, Yu Xing Peng, Zhen Huang, Chang Jian Wang

In distributed storage systems, erasure codes are an attractive data redundancy solution: they can provide the same reliability as replication while requiring much less storage space. Multiple data losses occur frequently, and the lost data must be regenerated to maintain data redundancy. Regeneration of multiple data losses should be completed as quickly as possible, because the regeneration time affects the data reliability and availability of the system. However, multiple data losses are usually repaired by regenerating each lost fragment one at a time, which results in a long overall regeneration time and severely reduces data reliability and availability. In this paper, we propose a tree-structured parallel regeneration scheme based on regenerating codes (TPRORC) for multiple data losses in distributed storage systems. In our scheme, multiple regeneration trees based on regenerating codes are constructed. First, the trees are created independently: no tree shares any edge with the others, and each tree is responsible for one data loss. Second, each regeneration tree incurs the least network traffic and uses bandwidth-optimized paths to regenerate its data loss. The scheme can therefore regenerate multiple data losses in parallel over multiple optimized topology trees, using network bandwidth efficiently and overlapping the individual regenerations in time. Our simulation results show that the tree-structured parallel regeneration scheme significantly reduces regeneration time compared with conventional regeneration schemes.
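The edge-disjointness constraint on the regeneration trees can be illustrated with a greedy sketch. This is not TPRORC's construction: it assumes the network is given as a list of undirected links, that each lost fragment has a known newcomer node and a fixed set of helper nodes, and it minimizes hop count with breadth-first search rather than optimizing bandwidth or traffic. The names `bfs_parents` and `build_disjoint_trees` are hypothetical. Each tree is the union of the BFS paths from the helpers to the newcomer, and its links are removed from the free pool before the next tree is built, so no two trees share an edge.

```python
from collections import deque


def bfs_parents(root, free_links):
    """Breadth-first search over the still-unused links; returns parent pointers."""
    adjacency = {}
    for link in free_links:
        u, v = tuple(link)
        adjacency.setdefault(u, []).append(v)
        adjacency.setdefault(v, []).append(u)
    parent = {root: None}
    queue = deque([root])
    while queue:
        u = queue.popleft()
        for v in sorted(adjacency.get(u, [])):  # sorted for a deterministic result
            if v not in parent:
                parent[v] = u
                queue.append(v)
    return parent


def build_disjoint_trees(links, requests):
    """Greedily build one regeneration tree per lost fragment so that no two
    trees share a link.

    links: undirected (u, v) pairs between storage nodes (no self-loops).
    requests: (newcomer, helpers) pairs, one per lost fragment.
    Returns one set of links per request, or None if the helpers are unreachable.
    """
    free = {frozenset(link) for link in links}
    trees = []
    for newcomer, helpers in requests:
        parent = bfs_parents(newcomer, free)
        if any(h not in parent for h in helpers):
            trees.append(None)  # cannot be served with the remaining links
            continue
        # The union of the BFS paths helper -> newcomer is a subtree of the BFS tree.
        tree = set()
        for h in helpers:
            node = h
            while parent[node] is not None:
                tree.add(frozenset((node, parent[node])))
                node = parent[node]
        free -= tree  # reserve the links so later trees stay edge-disjoint
        trees.append(tree)
    return trees


if __name__ == "__main__":
    links = [(0, 1), (0, 2), (1, 2), (1, 3), (2, 3), (3, 4), (4, 5), (2, 5), (0, 5)]
    trees = build_disjoint_trees(links, [(0, [3, 4]), (5, [1, 3])])
    print(all(t is not None for t in trees))  # True
    print(trees[0].isdisjoint(trees[1]))      # True
```

Because trees are built in request order, an early tree can consume links that a later one needed; the sketch simply reports such a request as `None`, whereas a real scheduler would have to reorder requests or fall back to sequential regeneration.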

