Inline Data Deduplication for SSD-Based Distributed Storage

Author(s): Binqi Zhang, Chen Wang, Bing Bing Zhou, Albert Y. Zomaya

2018, Vol 7 (2.4), pp. 46
Author(s): Shubhanshi Singhal, Akanksha Kaushik, Pooja Sharma

Due to the drastic growth of digital data, data deduplication has become a standard component of modern backup systems. It reduces data redundancy, saves storage space, and simplifies the management of data chunks. The process is performed in three steps: chunking, fingerprinting, and indexing of fingerprints. In chunking, data files are divided into chunks, and the chunk boundary is decided by the value of the divisor. For each chunk, a unique identifying value, known as a fingerprint, is computed using a hash function (e.g., MD5, SHA-1, SHA-256). Finally, these fingerprints are stored in an index to detect redundant chunks, i.e., chunks having the same fingerprint values. In chunking, the chunk size is an important factor and should be optimal for good deduplication performance. The genetic algorithm (GA) is gaining popularity and can be applied to find the best value of the divisor. Secondly, indexing also enhances the performance of the system by reducing search time. Binary search tree (BST) based indexing has a search time complexity of O(log n), which is among the lowest for comparison-based search structures. A new model is proposed that applies a GA to find the value of the divisor; this is the first attempt to apply a GA in the field of data deduplication. The second improvement in the proposed system is that a BST is used to index the fingerprints. The performance of the proposed system is evaluated on the VMDK, Linux, and Quanto datasets, and a good improvement is achieved in the deduplication ratio.
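For illustration, the sketch below shows the pipeline the abstract describes: divisor-driven content-defined chunking, SHA-256 fingerprinting, and a BST index used for duplicate detection. The divisor value and chunk-size limits are illustrative assumptions, and the running hash is a toy stand-in for a Rabin-style rolling hash; the paper's GA-selected divisor is not reproduced.

```python
import hashlib

def chunk(data: bytes, divisor: int = 4096, min_size: int = 1024, max_size: int = 16384):
    """Content-defined chunking: declare a boundary when the running hash of the
    current chunk hits a divisor-defined value (toy stand-in for a rolling hash)."""
    chunks, start, rolling = [], 0, 0
    for i, b in enumerate(data):
        rolling = (rolling * 31 + b) & 0xFFFFFFFF
        size = i - start + 1
        if (size >= min_size and rolling % divisor == divisor - 1) or size >= max_size:
            chunks.append(data[start:i + 1])
            start, rolling = i + 1, 0
    if start < len(data):
        chunks.append(data[start:])
    return chunks

def fingerprint(chunk_bytes: bytes) -> str:
    return hashlib.sha256(chunk_bytes).hexdigest()  # per-chunk fingerprint

class BSTIndex:
    """Unbalanced binary search tree over fingerprints (O(log n) lookups on average)."""
    class Node:
        __slots__ = ("key", "left", "right")
        def __init__(self, key):
            self.key, self.left, self.right = key, None, None

    def __init__(self):
        self.root = None

    def insert_if_absent(self, key: str) -> bool:
        """Return True if the fingerprint is new (the chunk must be stored)."""
        if self.root is None:
            self.root = self.Node(key)
            return True
        node = self.root
        while True:
            if key == node.key:
                return False  # duplicate chunk detected
            branch = "left" if key < node.key else "right"
            nxt = getattr(node, branch)
            if nxt is None:
                setattr(node, branch, self.Node(key))
                return True
            node = nxt

index = BSTIndex()
data = b"example data " * 10_000
unique = [c for c in chunk(data) if index.insert_if_absent(fingerprint(c))]
print(f"{len(unique)} unique chunks stored out of {len(chunk(data))}")
```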


Author(s): MD. Jareena Begum, B. Haritha

Cloud computing plays an essential role in the business landscape, as computing resources are delivered on demand to clients over the Internet. It provides on-demand and ubiquitous access to a centralized pool of configurable resources such as networks, applications, and services. This ensures that the majority of enterprises and a large number of users externalize their data to the cloud server. Recently, secure deduplication techniques have attracted considerable interest in both academic and industrial communities. The primary advantage of using cloud storage, from the clients' perspective, is that they can reduce their expenditure on purchasing and maintaining storage infrastructure. With the growing data size of cloud computing, a decline in data volumes could help providers reduce the cost of running huge storage systems and save on power usage. Hence, data deduplication techniques have been proposed to improve storage efficiency in cloud storage. In addition, considering the protection of sensitive files, the files are often encrypted with some encryption algorithm before being stored in cloud storage. In this paper, we propose strategies for secure data deduplication.


2018, Vol 10 (4), pp. 43-66
Author(s): Shubhanshi Singhal, Pooja Sharma, Rajesh Kumar Aggarwal, Vishal Passricha

This article describes how data deduplication efficiently eliminates redundant data by selecting and storing only a single instance of it, and how it is becoming popular in storage systems. Digital data is growing much faster than storage volumes, which underlines the importance of data deduplication for scientists and researchers. Data deduplication is considered the most successful and efficient technique of data reduction because it is computationally efficient and offers lossless data reduction. It is applicable to various storage systems, e.g., local storage, distributed storage, and cloud storage. This article discusses the background, components, and key features of data deduplication, which helps the reader to understand the design issues and challenges in this field.


2020, Vol 17 (8), pp. 3631-3635
Author(s): L. Mary Gladence, Priyanka Reddy, Apoorva Shetty, E. Brumancia, Senduru Srinivasulu

Data deduplication is one of the main techniques for eliminating duplicate copies of data and is widely used in distributed storage to minimize storage space and save transfer bandwidth. Convergent encryption has been proposed to encrypt the data before outsourcing, preserving the confidentiality of sensitive data while still facilitating deduplication. Unlike conventional deduplication systems, users' differential privileges are also considered in the duplicate check, in addition to the data itself. Security analysis shows that the approach is secure with respect to the definitions set out in the proposed security model. For this deduplication, the M3 encryption algorithm and the DES algorithm are used: M3 encryption is compared with current techniques for effectiveness, security, and speed, and DES is used to decrypt files back into a human-readable form within a secure scheme. As a proof of concept, a prototype of the proposed authorized duplicate check scheme is implemented and testbed experiments are conducted using the prototype. The results show that, compared with conventional operations, the proposed duplicate check scheme incurs only marginal overhead.
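As a hedged illustration of why content-derived (convergent) encryption enables deduplication of encrypted data, the sketch below derives both the key and the nonce from the plaintext so that identical files encrypt to identical ciphertexts. It uses AES-GCM from the `cryptography` package purely as a stand-in; the paper's M3 and DES constructions are not reproduced.

```python
import hashlib
from cryptography.hazmat.primitives.ciphers.aead import AESGCM

def convergent_encrypt(plaintext: bytes) -> tuple[bytes, bytes]:
    """Derive the key from the content itself, so identical plaintexts
    always yield identical ciphertexts and can be deduplicated."""
    key = hashlib.sha256(plaintext).digest()      # content-derived key
    nonce = hashlib.sha256(key).digest()[:12]     # deterministic nonce (needed for dedup)
    ciphertext = AESGCM(key).encrypt(nonce, plaintext, None)
    return key, ciphertext

def convergent_decrypt(key: bytes, ciphertext: bytes) -> bytes:
    nonce = hashlib.sha256(key).digest()[:12]
    return AESGCM(key).decrypt(nonce, ciphertext, None)

# Two users encrypting the same file independently produce the same ciphertext,
# so the server only needs to keep one copy.
k1, c1 = convergent_encrypt(b"quarterly-report.pdf contents")
k2, c2 = convergent_encrypt(b"quarterly-report.pdf contents")
assert c1 == c2 and convergent_decrypt(k1, c1) == b"quarterly-report.pdf contents"
```

The determinism is what makes server-side deduplication possible, and it is also the main security trade-off of convergent encryption: anyone who can guess a plaintext can confirm whether it is stored.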


Author(s): Hema S. and Dr. Kangaiammal A.

Cloud services increase data availability so as to offer flawless service to the client. Because of this increasing data availability, more redundancy and more memory space are required to store such data. Cloud computing requires essential storage and efficient protection for all types of data. With the amount of data produced seeing an exponential increase with time, storing replicated data contents is inevitable. Hence, using storage optimization approaches becomes an important prerequisite for enormous storage domains like cloud storage. Data deduplication is the technique which compresses the data by eliminating replicated copies of similar data, and it is widely utilized in cloud storage to conserve bandwidth and minimize storage space. Although data deduplication eliminates data redundancy and data replication, it also presents significant data privacy and security problems for the end user. Considering this, in this work a novel security-based deduplication model is proposed to reduce the hash value computed for a given file size and to provide additional security for cloud storage. In the proposed method, the hash value of a given file is reduced employing the Distributed Storage Hash Algorithm (DSHA), and to provide security the file is encrypted using an Improved Blowfish Encryption Algorithm (IBEA). The framework also proposes an enhanced fuzzy-based intrusion detection system (EFIDS) that defines rules for the major attacks and thereby alerts the system automatically. Finally, the combination of data deduplication and the encryption technique allows cloud users to effectively manage their cloud storage by avoiding repeated data encroachment; it also saves bandwidth and alerts the system to attackers. The results of experiments reveal that the discussed algorithm yields improved throughput and bytes saved per second in comparison with other chunking algorithms.
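DSHA, IBEA, and EFIDS are the paper's own components and are not reproduced here. The sketch below only illustrates the assumed overall flow, with SHA-256 standing in for DSHA and Fernet (from the `cryptography` package) standing in for IBEA: hash the file, check the store for that hash, and upload the encrypted blob only when the hash is new.

```python
import hashlib
from cryptography.fernet import Fernet

class DedupStore:
    """Server-side index: file hash -> encrypted blob (stand-in for cloud storage)."""
    def __init__(self):
        self.blobs = {}

    def has(self, file_hash: str) -> bool:
        return file_hash in self.blobs

    def put(self, file_hash: str, blob: bytes) -> None:
        self.blobs[file_hash] = blob

def upload(store: DedupStore, data: bytes, key: bytes) -> str:
    file_hash = hashlib.sha256(data).hexdigest()         # stand-in for DSHA
    if not store.has(file_hash):                         # duplicate check before upload
        store.put(file_hash, Fernet(key).encrypt(data))  # stand-in for IBEA
    return file_hash

store, key = DedupStore(), Fernet.generate_key()
upload(store, b"backup image v1", key)
upload(store, b"backup image v1", key)                   # detected as duplicate, not re-stored
print(len(store.blobs), "blob(s) stored for 2 uploads")  # -> 1
```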


2020
Author(s): Dorian Burihabwa

Cloud storage has durably entered the stage as the go-to solution for business and personal storage. By virtually extending storage capabilities to infinity, cloud storage enables companies and individuals to focus on content creation without fear of running out of space or losing data. But as users entrust more and more data to the cloud, they also have to accept a loss of control over the data they offload to it. At a time when online services seem to make a significant part of their profits by exploiting customer data, concerns over the privacy and integrity of said data naturally arise. Are their online documents read by the storage provider or its employees? Is the content of these documents shared with third-party partners of the storage provider? What happens if the provider goes bankrupt? Whatever answer is offered by the storage provider, the loss of control should be cause for concern. But storage providers also have to worry about trust and reliability. As they build distributed solutions to accommodate their customers' needs, these concerns of control extend to the infrastructure they operate on. Conciliating security, confidentiality, resilience, and performance over large sets of distributed storage nodes is a tricky balancing act. And even when a suitable balance can be found, it often comes at the expense of increased storage overhead. In this dissertation, we try to mitigate these issues by focusing on three aspects. First, we study solutions to empower users with flexible tooling ensuring security, integrity, and redundancy in distributed storage settings. By leveraging public cloud storage offerings to build a configurable file system and storage middleware, we show that securing cloud storage from the client side is an effective way of maintaining control. Second, we build a distributed archive whose resilience goes beyond standard redundancy schemes. To achieve this, we implement Recast, which relies on a data entanglement scheme that encodes and distributes data over a set of storage nodes to ensure durability at a manageable cost. Finally, we look into offsetting the increase in storage overhead by means of data reduction. This is made possible by the use of Generalised Deduplication, a scheme that improves over classical data deduplication by detecting similarities beyond exact matches.
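As a rough illustration of the idea behind Generalised Deduplication, the toy sketch below splits every chunk into a basis (deduplicated across chunks) and a deviation (kept per chunk), so near-identical chunks share a single stored basis. The nibble-masking transform is purely illustrative; the actual scheme uses error-correcting-code transforms and packs the deviation far more compactly.

```python
import hashlib

def split(chunk: bytes) -> tuple[bytes, bytes]:
    """Toy transform: the basis keeps the high nibble of every byte,
    the deviation keeps the low nibble, so similar chunks share a basis."""
    basis = bytes(b & 0xF0 for b in chunk)
    deviation = bytes(b & 0x0F for b in chunk)
    return basis, deviation

bases = {}    # basis fingerprint -> basis (deduplicated pool)
stored = []   # per-chunk record: (basis fingerprint, deviation)

for chunk in [b"sensor reading 21.4", b"sensor reading 21.7", b"sensor reading 25.4"]:
    basis, deviation = split(chunk)
    fp = hashlib.sha256(basis).hexdigest()
    bases.setdefault(fp, basis)          # store each basis only once
    stored.append((fp, deviation))

print(f"{len(stored)} chunks stored, {len(bases)} unique basis kept")  # 3 chunks, 1 basis
```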


Data deduplication is one of the most prominent compression strategies for eliminating identical copies of repeated data in distributed storage, reducing the required storage space and preserving transmission capacity. Data compression achieves a logical reduction of storage space through hashing. To protect the confidentiality of sensitive data while supporting deduplication, the convergent encryption technique has been proposed to encrypt the data before outsourcing. To better protect data security, this work attempts to formally address the problem of authorized data deduplication. Different from traditional deduplication, the differential privileges of users are further considered in the duplicate check, in addition to the data itself, improving their storage capacity and security. Several new deduplication constructions supporting authorized duplicate check in a hybrid cloud architecture are also presented. Security analysis shows that the constructions prevent unauthorized access. As a proof of concept, a prototype of the proposed authorized duplicate check scheme is implemented and testbed experiments are conducted using the prototype. The proposed authorized duplicate check scheme incurs negligible overhead compared with ordinary operations. Deduplication has been shown to achieve high space and cost savings, and many distributed storage providers are now adopting it; deduplication can reduce storage needs by up to 90-95 percent for backup.
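To make the authorized duplicate check idea concrete, the sketch below binds each file fingerprint to a privilege with an HMAC token, so a duplicate is only recognized for users holding the same privilege key. The privilege names, key distribution, and use of HMAC-SHA-256 are illustrative assumptions, not the paper's hybrid-cloud constructions.

```python
import hashlib, hmac

# Privilege keys would be issued by a private cloud / key server in the
# hybrid-cloud setting; they are hard-coded here purely for illustration.
PRIVILEGE_KEYS = {"finance": b"finance-secret", "engineering": b"eng-secret"}

def duplicate_token(privilege: str, data: bytes) -> str:
    """Token binds the file fingerprint to a privilege."""
    fingerprint = hashlib.sha256(data).digest()
    return hmac.new(PRIVILEGE_KEYS[privilege], fingerprint, hashlib.sha256).hexdigest()

class DedupServer:
    def __init__(self):
        self.tokens = set()  # tokens of files already stored

    def check_and_record(self, token: str) -> bool:
        """Return True if a duplicate with the same privilege already exists."""
        duplicate = token in self.tokens
        self.tokens.add(token)
        return duplicate

server = DedupServer()
report = b"Q3 budget"
print(server.check_and_record(duplicate_token("finance", report)))      # False: first copy
print(server.check_and_record(duplicate_token("finance", report)))      # True: duplicate for finance
print(server.check_and_record(duplicate_token("engineering", report)))  # False: different privilege
```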


Cloud computing enables organizations to consume a computing resource, such as a virtual machine (VM), storage, or an application, as a utility, just like electricity, rather than building and maintaining computing infrastructure in house. In cloud computing, the most important component is the data center, where users' data is stored. In data centers, the same data may be uploaded multiple times, or data can be hacked; therefore, when using cloud services the data should be encrypted before being stored. With the constant and exponential increase in the number of users and the size of their data, data deduplication becomes more and more a necessity for cloud storage providers. By storing a single unique copy of duplicate data, cloud providers greatly reduce their storage and data transfer costs. The encrypted data can also be securely accessed by authorized data holders who receive the symmetric keys used for decryption. The results demonstrate the superior efficiency and effectiveness of the scheme for big-data deduplication in cloud storage. Its performance is evaluated through extensive analysis and computer simulations with the help of logs captured at the time of deduplication.

