Hekate: a tool for gauging Data Deduplication Performance

Author(s):  
Lars Nielsen ◽  
Daniel E. Lucani
2013 ◽  
Vol 33 (9) ◽  
pp. 2493-2496
Author(s):  
Xueqiong LIU ◽  
Gang WU ◽  
Houping DENG

Author(s):  
B. Tirapathi reddy ◽  
Maddireddy Vaishnavi ◽  
Makireddy Lalitha ◽  
Papineni Poojitha ◽  
Vakalapudi Bhavya Sri Kanthi

2021 ◽  
Author(s):  
Xuming Ye ◽  
Jia Tang ◽  
Wenlong Tian ◽  
Ruixuan Li ◽  
Weijun Xiao ◽  
...  

2018 ◽  
Vol 7 (2.4) ◽  
pp. 46 ◽  
Author(s):  
Shubhanshi Singhal ◽  
Akanksha Kaushik ◽  
Pooja Sharma

Due to the drastic growth of digital data, data deduplication has become a standard component of modern backup systems. It reduces data redundancy, saves storage space, and simplifies the management of data chunks. The process is performed in three steps: chunking, fingerprinting, and indexing of fingerprints. In chunking, data files are divided into chunks, and the chunk boundaries are decided by the value of a divisor. For each chunk, a unique identifying value, known as a fingerprint, is computed using a hash function (e.g., MD5, SHA-1, SHA-256). Finally, these fingerprints are stored in an index to detect redundant chunks, that is, chunks with the same fingerprint value. The chunk size is an important factor in chunking and should be optimal for the deduplication system to perform well. Genetic algorithms (GA) are gaining popularity and can be applied to find the best value of the divisor. Indexing also enhances the performance of the system by reducing the search time. Binary search tree (BST) based indexing has a search time complexity of O(log n), which is the minimum among searching algorithms. A new model is proposed that uses a GA to find the value of the divisor; it is the first attempt to apply a GA in the field of data deduplication. The second improvement in the proposed system is that a BST is used to index the fingerprints. The performance of the proposed system is evaluated on the VMDK, Linux, and Quanto datasets, and a good improvement in deduplication ratio is achieved.
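
The three steps described in the abstract (chunking by a divisor, fingerprinting, and BST indexing) can be illustrated with a minimal Python sketch. It is not the authors' implementation. The rolling sum hash, the `window`, `min_size`, and `max_size` parameters, the boundary test `rolling % divisor == divisor - 1`, and the names `chunk`, `BSTIndex`, and `deduplicate` are all assumptions made for illustration. The divisor is a fixed argument here, and the genetic algorithm search for its best value is not shown.

```python
import hashlib
import random


def chunk(data, divisor=256, window=16, min_size=32, max_size=2048):
    """Content-defined chunking (illustrative, not the paper's scheme).

    A boundary is declared where the sum of the last `window` bytes
    satisfies rolling % divisor == divisor - 1, subject to size limits.
    """
    chunks, start, rolling = [], 0, 0
    for i, byte in enumerate(data):
        rolling += byte
        if i - start >= window:
            rolling -= data[i - window]  # drop the byte leaving the window
        size = i - start + 1
        at_boundary = size >= min_size and rolling % divisor == divisor - 1
        if at_boundary or size >= max_size:
            chunks.append(data[start:i + 1])
            start, rolling = i + 1, 0
    if start < len(data):
        chunks.append(data[start:])  # trailing partial chunk
    return chunks


class Node:
    __slots__ = ("key", "left", "right")

    def __init__(self, key):
        self.key, self.left, self.right = key, None, None


class BSTIndex:
    """Plain (unbalanced) binary search tree keyed by fingerprint."""

    def __init__(self):
        self.root = None
        self.size = 0

    def add(self, key):
        """Insert key; return True if it was new, False if already present."""
        if self.root is None:
            self.root = Node(key)
            self.size = 1
            return True
        node = self.root
        while True:
            if key == node.key:
                return False
            side = "left" if key < node.key else "right"
            child = getattr(node, side)
            if child is None:
                setattr(node, side, Node(key))
                self.size += 1
                return True
            node = child


def deduplicate(data, divisor=256):
    """Return (total_bytes, unique_bytes) after chunk-level deduplication."""
    index = BSTIndex()
    total = unique = 0
    for piece in chunk(data, divisor):
        total += len(piece)
        fingerprint = hashlib.sha256(piece).digest()
        if index.add(fingerprint):  # first time this chunk is seen
            unique += len(piece)
    return total, unique


if __name__ == "__main__":
    rng = random.Random(0)
    block = bytes(rng.randrange(256) for _ in range(5000))
    total, unique = deduplicate(block * 4)  # the same block stored four times
    print("deduplication ratio:", total / unique)
```

A smaller divisor makes boundaries more frequent and chunks smaller, which is the trade-off the proposed GA is meant to tune. An unbalanced BST can degrade to linear search time on sorted keys. Cryptographic fingerprints behave like random keys, so the expected depth stays logarithmic in this sketch.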

