A Novel approach of data deduplication for distributed storage

2018 ◽  
Vol 7 (2.4) ◽  
pp. 46 ◽  
Author(s):  
Shubhanshi Singhal ◽  
Akanksha Kaushik ◽  
Pooja Sharma

Due to the drastic growth of digital data, data deduplication has become a standard component of modern backup systems. It reduces data redundancy, saves storage space, and simplifies the management of data chunks. The process is performed in three steps: chunking, fingerprinting, and indexing of fingerprints. In chunking, data files are divided into chunks, and each chunk boundary is determined by the value of the divisor. For each chunk, a unique identifying value, known as a fingerprint, is computed using a hash function (e.g. MD5, SHA-1, SHA-256). Finally, these fingerprints are stored in an index to detect redundant chunks, i.e. chunks that have the same fingerprint value. In chunking, the chunk size is an important factor that should be optimal for better performance of the deduplication system. The genetic algorithm (GA) is gaining popularity and can be applied to find the best value of the divisor. Secondly, indexing also enhances the performance of the system by reducing search time. Binary search tree (BST) based indexing has a time complexity of O(log n), which is the minimum among search algorithms. A new model is proposed that uses GA to find the value of the divisor; this is the first attempt to apply GA in the field of data deduplication. The second improvement in the proposed system is that a BST index tree is used to index the fingerprints. The performance of the proposed system is evaluated on the VMDK, Linux, and Quanto datasets, and a good improvement in deduplication ratio is achieved.
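The three-step pipeline described above (chunking, fingerprinting, indexing) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. The following are all hypothetical choices:

- the rolling hash (a Rabin-Karp-style polynomial hash);
- the boundary rule `hash % divisor == divisor - 1`;
- the parameters `window`, `min_size`, and `max_size`.

In addition, the divisor is passed in directly rather than tuned by a genetic algorithm, and the fingerprint index is a plain, unbalanced BST keyed by SHA-256 hex digests.

```python
import hashlib
from typing import Iterator, List

BASE = 257            # hypothetical rolling-hash base
MOD = (1 << 31) - 1   # hypothetical rolling-hash modulus (a Mersenne prime)


def content_defined_chunks(data: bytes, divisor: int, window: int = 16,
                           min_size: int = 64, max_size: int = 4096) -> Iterator[bytes]:
    """Cut a chunk where the rolling hash of the last `window` bytes satisfies
    hash % divisor == divisor - 1, subject to minimum and maximum chunk sizes."""
    pow_w = pow(BASE, window - 1, MOD)
    h, start = 0, 0
    for i, byte in enumerate(data):
        if i >= window:
            h = (h - data[i - window] * pow_w) % MOD   # drop the byte leaving the window
        h = (h * BASE + byte) % MOD                     # add the byte entering the window
        size = i - start + 1
        if (size >= min_size and h % divisor == divisor - 1) or size >= max_size:
            yield data[start:i + 1]
            start = i + 1
    if start < len(data):
        yield data[start:]                              # trailing partial chunk


class _Node:
    __slots__ = ("key", "chunk", "left", "right")

    def __init__(self, key: str, chunk: bytes) -> None:
        self.key, self.chunk = key, chunk
        self.left = None
        self.right = None


class FingerprintBST:
    """Unbalanced binary search tree mapping fingerprint -> stored chunk."""

    def __init__(self) -> None:
        self.root = None

    def insert(self, key: str, chunk: bytes) -> bool:
        """Insert a new fingerprint; return False if it was already indexed."""
        if self.root is None:
            self.root = _Node(key, chunk)
            return True
        node = self.root
        while True:
            if key == node.key:
                return False                            # duplicate chunk
            if key < node.key:
                if node.left is None:
                    node.left = _Node(key, chunk)
                    return True
                node = node.left
            else:
                if node.right is None:
                    node.right = _Node(key, chunk)
                    return True
                node = node.right

    def get(self, key: str) -> bytes:
        node = self.root
        while node is not None:
            if key == node.key:
                return node.chunk
            node = node.left if key < node.key else node.right
        raise KeyError(key)


class Deduplicator:
    def __init__(self, divisor: int) -> None:
        self.divisor = divisor
        self.index = FingerprintBST()
        self.logical_bytes = 0   # bytes written by clients
        self.stored_bytes = 0    # bytes actually kept after deduplication

    def write(self, data: bytes) -> List[str]:
        """Store data and return its recipe: the list of chunk fingerprints."""
        recipe = []
        for chunk in content_defined_chunks(data, self.divisor):
            fp = hashlib.sha256(chunk).hexdigest()      # fingerprinting step
            if self.index.insert(fp, chunk):            # indexing step
                self.stored_bytes += len(chunk)
            recipe.append(fp)
        self.logical_bytes += len(data)
        return recipe

    def read(self, recipe: List[str]) -> bytes:
        return b"".join(self.index.get(fp) for fp in recipe)

    def dedup_ratio(self) -> float:
        return self.logical_bytes / self.stored_bytes if self.stored_bytes else 1.0
```

The divisor sets the expected distance between boundaries, roughly one cut per `divisor` bytes once `min_size` is reached. That makes it the natural parameter for a search procedure such as GA to tune.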

2018 ◽  
Vol 10 (4) ◽  
pp. 43-66 ◽  
Author(s):  
Shubhanshi Singhal ◽  
Pooja Sharma ◽  
Rajesh Kumar Aggarwal ◽  
Vishal Passricha

This article describes how data deduplication efficiently eliminates redundant data by selecting and storing only a single instance of it, and why it is becoming popular in storage systems. Digital data is growing much faster than storage volumes, which underscores the importance of data deduplication to scientists and researchers. Data deduplication is considered the most successful and efficient data reduction technique because it is computationally efficient and offers lossless data reduction. It is applicable to various storage systems, e.g. local storage, distributed storage, and cloud storage. This article discusses the background, components, and key features of data deduplication to help the reader understand the design issues and challenges in this field.


2019 ◽  
Vol 11 (1) ◽  
pp. 49-70
Author(s):  
Mohsin Altaf Wani ◽  
Manzoor Ahmad

Modern GPUs perform computation at a very high rate compared with CPUs; as a result, they are increasingly used for general-purpose parallel computation. Determining a statically optimal binary search tree is an optimization problem: finding the arrangement of nodes in a binary search tree that minimizes average search time. Knuth's modification to the dynamic programming algorithm improves its time complexity from O(n^3) to O(n^2). We develop multiple GPU-based implementations of this algorithm using different approaches. Choosing a suitable GPU implementation for a given workload provides a speedup of up to four times over the other GPU-based implementations. Compared with a conventional single-threaded CPU-based implementation, we achieve a speedup factor of 409 on the older GTX 570 and 745 on the more modern GTX 1060.
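Knuth's speed-up can be sketched in a few lines. The function below is a minimal CPU-only illustration, not the GPU implementations the abstract describes. It assumes the keys-only variant of the problem:

- each key i has an access frequency `freq[i]`;
- unsuccessful searches are ignored;
- the cost of a tree is the sum of frequency × depth, with the root at depth 1.

For each range of keys, the candidate roots are limited to the interval between the optimal roots of the two sub-ranges one key shorter. This restriction brings the total work down from O(n^3) to O(n^2).

```python
from typing import List, Tuple


def optimal_bst(freq: List[int]) -> Tuple[int, List[List[int]]]:
    """Return (minimum weighted search cost, root table) for keys 0..n-1.

    cost[i][j] and root[i][j] describe the optimal tree over keys i..j-1.
    """
    n = len(freq)
    prefix = [0]
    for f in freq:
        prefix.append(prefix[-1] + f)          # prefix[j] - prefix[i] = weight of keys i..j-1
    cost = [[0] * (n + 1) for _ in range(n + 1)]
    root = [[0] * (n + 1) for _ in range(n + 1)]
    for i in range(n):                         # single-key trees
        cost[i][i + 1] = freq[i]
        root[i][i + 1] = i
    for length in range(2, n + 1):
        for i in range(0, n - length + 1):
            j = i + length
            best, best_r = None, -1
            # Knuth: root[i][j-1] <= root[i][j] <= root[i+1][j]
            for r in range(root[i][j - 1], root[i + 1][j] + 1):
                c = cost[i][r] + cost[r + 1][j]
                if best is None or c < best:
                    best, best_r = c, r
            cost[i][j] = best + prefix[j] - prefix[i]
            root[i][j] = best_r
    return cost[0][n], root
```

For example, with frequencies `[34, 8, 50]` the optimal tree places the third key (index 2) at the root and has cost 142.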


2020 ◽  
Vol 18 (1) ◽  
pp. 1-10
Author(s):  
A. D. GBADEBO ◽  
A. T. AKINWALE ◽  
S. AKINLEYE

The task of storing items to allow fast access to an item given its key is a ubiquitous problem in many organizations. A treap uses both a key and a priority for searching in databases. When the keys are drawn from a large totally ordered set, items are usually stored in some sort of search tree. The simplest form of such a tree is a binary search tree. In this tree, a set X of n items is stored at the nodes of a rooted binary tree in which some item y ∈ X is chosen to be stored at the root. A heap is an array object that can be viewed as a nearly complete binary tree, in which each node corresponds to the array element that stores the node's value. Both algorithms were run on the same sorting task under identical experimental environments and conditions. This was implemented by means of threads that call each of the two methods simultaneously. The server kept records of individual search times, which formed the basis of the comparison. The treap was found to be faster than heap sort for sorting and searching elements on systems with homogeneous properties.
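The two structures compared above can be sketched as follows. This is a minimal illustration, not the code used in the study:

- Priorities are drawn uniformly at random, with a seed for reproducibility.
- The tree is a max-heap on priority.
- An in-order traversal yields the keys in sorted order ("treap sort").
- The `heap_sort` helper uses Python's standard `heapq` module as the heap-based baseline.

```python
import heapq
import random
from typing import Iterable, List, Optional


class TreapNode:
    __slots__ = ("key", "priority", "left", "right")

    def __init__(self, key: int, priority: float) -> None:
        self.key = key
        self.priority = priority
        self.left = None
        self.right = None


def _rotate_right(node: TreapNode) -> TreapNode:
    pivot = node.left
    node.left = pivot.right
    pivot.right = node
    return pivot


def _rotate_left(node: TreapNode) -> TreapNode:
    pivot = node.right
    node.right = pivot.left
    pivot.left = node
    return pivot


class Treap:
    """BST ordered by key, max-heap ordered by random priority."""

    def __init__(self, seed: Optional[int] = None) -> None:
        self.root = None
        self._rng = random.Random(seed)

    def insert(self, key: int) -> None:
        self.root = self._insert(self.root, key, self._rng.random())

    def _insert(self, node, key, priority):
        if node is None:
            return TreapNode(key, priority)
        if key < node.key:
            node.left = self._insert(node.left, key, priority)
            if node.left.priority > node.priority:
                node = _rotate_right(node)   # restore heap order on priorities
        else:
            node.right = self._insert(node.right, key, priority)
            if node.right.priority > node.priority:
                node = _rotate_left(node)
        return node

    def contains(self, key: int) -> bool:
        node = self.root
        while node is not None:
            if key == node.key:
                return True
            node = node.left if key < node.key else node.right
        return False

    def inorder(self) -> List[int]:
        """Iterative in-order traversal: keys in non-decreasing order."""
        out, stack, node = [], [], self.root
        while stack or node is not None:
            while node is not None:
                stack.append(node)
                node = node.left
            node = stack.pop()
            out.append(node.key)
            node = node.right
        return out


def heap_sort(values: Iterable[int]) -> List[int]:
    heap = list(values)
    heapq.heapify(heap)
    return [heapq.heappop(heap) for _ in range(len(heap))]
```

Rotations preserve the in-order sequence, so the treap stays a valid BST while random priorities keep its expected depth logarithmic.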


2020 ◽  
Vol 17 (8) ◽  
pp. 3631-3635
Author(s):  
L. Mary Gladence ◽  
Priyanka Reddy ◽  
Apoorva Shetty ◽  
E. Brumancia ◽  
Senduru Srinivasulu

Data deduplication is one of the main techniques for eliminating duplicate copies of data and has been widely used in distributed storage to reduce storage space and save data-transfer bandwidth. To preserve the confidentiality of sensitive data while still supporting deduplication, it has been proposed that data be encrypted before being outsourced. Unlike conventional deduplication systems, the duplicate check here also takes users' differential privileges into account, not only the data itself. Security analysis shows that our approach is secure with respect to the proposed security model. For deduplication, the M3 encryption algorithm and the DES algorithm are used. M3 encryption is chosen because, compared with other current techniques, it is more effective, more secure, and faster. DES, the second algorithm, is used to open the file and decrypt it securely into human-readable form. As a proof of concept, a prototype of the proposed duplicate-check scheme is implemented and tested. The results show that, compared with conventional operations, the proposed duplicate-check scheme incurs minimal overhead.
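Keyed tags can illustrate a duplicate check that depends on the user's privileges and not only on the data. The sketch below is a hypothetical stand-in, not the paper's M3/DES construction:

- Each privilege is assumed to hold a secret key.
- The tag is HMAC-SHA256 over the file's SHA-256 digest.
- The storage side keeps only a set of tags.

As a result, identical data uploaded under different privileges is not treated as a duplicate. Encryption of the file contents before outsourcing is a separate step and is omitted here.

```python
import hashlib
import hmac


def duplicate_check_tag(data: bytes, privilege_key: bytes) -> str:
    """Hypothetical tag: HMAC-SHA256 keyed by the privilege, over SHA-256(data)."""
    digest = hashlib.sha256(data).digest()
    return hmac.new(privilege_key, digest, hashlib.sha256).hexdigest()


class TagStore:
    """Storage-side index that sees only tags, never keys or plaintext."""

    def __init__(self) -> None:
        self._tags = set()

    def needs_upload(self, data: bytes, privilege_key: bytes) -> bool:
        """Return True (and record the tag) if no duplicate exists under this privilege."""
        tag = duplicate_check_tag(data, privilege_key)
        if tag in self._tags:
            return False          # same data, same privilege: deduplicate
        self._tags.add(tag)
        return True
```

Because the tag is deterministic for a given (data, privilege) pair, the server can detect duplicates without learning which privilege key produced them.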


Symmetry ◽  
2020 ◽  
Vol 12 (7) ◽  
pp. 1186
Author(s):  
Fahed Jubair ◽  
Mohammed Hawa

Pathfinding is the problem of finding the shortest path between a pair of nodes in a graph. In the context of uniform-cost undirected grid maps, heuristic search algorithms such as A* and weighted A* (WA*) have been predominantly used for pathfinding. However, the lack of knowledge about obstacle shapes in a grid map often leads heuristic search algorithms to unnecessarily explore areas where a viable path is not available. We refer to such areas in a grid map as blocked areas (BAs). This paper introduces a preprocessing algorithm that analyzes the geometry of obstacles in a grid map and stores knowledge about blocked areas in a memory-efficient balanced binary search tree data structure. During actual pathfinding, a search algorithm accesses the binary search tree to identify blocked areas in a grid map and therefore avoid exploring them. As a result, the search time is significantly reduced. The scope of the paper covers maps in which obstacles are represented as horizontal and vertical line segments. The impact of using blocked-area knowledge during pathfinding in A* and WA* is evaluated using a publicly available benchmark set consisting of sixty grid maps of mazes and rooms. In mazes, the search time for both A* and WA* is reduced by 28%, on average. In rooms, the search time for both A* and WA* is reduced by 30%, on average. This is achieved while preserving the search optimality of A* and the search suboptimality of WA*.
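The way a search uses blocked-area knowledge can be sketched with a plain grid A*. This is a minimal sketch under the following assumptions:

- The map is a 4-connected list of strings with `#` for obstacles, and every move costs 1.
- The heuristic is Manhattan distance.
- The blocked-area cells have already been computed and are passed in as a set. The paper stores blocked areas in a balanced binary search tree; a hash set stands in here.
- Setting `weight` above 1 gives weighted A*.
- The caller must ensure the start and goal do not lie inside a blocked area.

`EXAMPLE_GRID` is a hypothetical map.

```python
import heapq
from typing import AbstractSet, Optional, Sequence, Tuple

Cell = Tuple[int, int]

EXAMPLE_GRID = [
    ".....",
    ".#.#.",
    ".#.#.",
    ".###.",
    ".....",
]


def astar(grid: Sequence[str], start: Cell, goal: Cell,
          blocked: AbstractSet[Cell] = frozenset(),
          weight: float = 1.0) -> Tuple[Optional[int], int]:
    """Return (path cost or None, number of expanded cells)."""
    rows, cols = len(grid), len(grid[0])

    def h(cell: Cell) -> int:
        return abs(cell[0] - goal[0]) + abs(cell[1] - goal[1])

    best_g = {start: 0}
    open_heap = [(weight * h(start), 0, start)]   # entries are (f, g, cell)
    closed = set()
    expanded = 0
    while open_heap:
        _, g, cell = heapq.heappop(open_heap)
        if cell in closed:
            continue                     # stale entry for an already-expanded cell
        closed.add(cell)
        expanded += 1
        if cell == goal:
            return g, expanded
        r, c = cell
        for nxt in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            nr, nc = nxt
            if not (0 <= nr < rows and 0 <= nc < cols):
                continue
            if grid[nr][nc] == "#" or nxt in blocked:
                continue                 # obstacle, or pruned blocked-area cell
            new_g = g + 1
            if new_g < best_g.get(nxt, float("inf")):
                best_g[nxt] = new_g
                heapq.heappush(open_heap, (new_g + weight * h(nxt), new_g, nxt))
    return None, expanded
```

In `EXAMPLE_GRID`, the cells (1, 2) and (2, 2) form a dead-end pocket. Passing them as `blocked` leaves the path cost from (0, 0) to (4, 4) at 8, while A* expands two fewer cells.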


2000 ◽  
Vol 11 (03) ◽  
pp. 485-513 ◽  
Author(s):  
SEONGHUN CHO ◽  
SARTAJ SAHNI

We develop a new class of weight-balanced binary search trees called β-balanced binary search trees (β-BBSTs). β-BBSTs are designed to have reduced internal path length. As a result, they are expected to exhibit good search time characteristics. Individual search, insert, and delete operations in an n-node β-BBST take O(log n) time for suitable values of β. Experimental results comparing the performance of β-BBSTs, WB(α) trees, AVL trees, red/black trees, treaps, deterministic skip lists, and skip lists are presented. Two simplified versions of β-BBSTs are also developed.
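The link between internal path length and search time can be made concrete. The sketch below does not implement β-BBST rebalancing, since the abstract does not give its rules. It only builds plain BSTs from a given insertion order and measures their internal path length (IPL): the sum of node depths, with the root at depth 0. For n nodes, an average successful search makes 1 + IPL/n comparisons. Any balancing scheme that lowers IPL therefore lowers expected search cost.

```python
from typing import Iterable, Iterator, Optional


class Node:
    __slots__ = ("key", "left", "right")

    def __init__(self, key: int) -> None:
        self.key = key
        self.left = None
        self.right = None


def build_bst(keys: Iterable[int]) -> Optional[Node]:
    """Insert keys in the given order into an unbalanced BST."""
    root = None
    for key in keys:
        new = Node(key)
        if root is None:
            root = new
            continue
        node = root
        while True:
            if key < node.key:
                if node.left is None:
                    node.left = new
                    break
                node = node.left
            else:
                if node.right is None:
                    node.right = new
                    break
                node = node.right
    return root


def node_depths(root: Optional[Node]) -> Iterator[int]:
    """Yield the depth of every node (root at depth 0), iteratively."""
    stack = [(root, 0)] if root is not None else []
    while stack:
        node, depth = stack.pop()
        yield depth
        for child in (node.left, node.right):
            if child is not None:
                stack.append((child, depth + 1))


def internal_path_length(root: Optional[Node]) -> int:
    return sum(node_depths(root))


def average_search_cost(root: Optional[Node]) -> float:
    """Average comparisons for a successful search: 1 + IPL / n."""
    depths = list(node_depths(root))
    return 1 + sum(depths) / len(depths) if depths else 0.0
```

Inserting 1..7 in sorted order yields a chain with IPL 21. Inserting them in the order 4, 2, 6, 1, 3, 5, 7 yields a perfectly balanced tree with IPL 10.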

