Improving Data Availability for Deduplication in Cloud Storage

2018 ◽  
Vol 10 (2) ◽  
pp. 70-89 ◽  
Author(s):  
Jun Li ◽  
Mengshu Hou

This article describes how deduplication technology is introduced in cloud storage to reduce the amount of stored data. By adopting this technology, duplicated data can be eliminated and users can reduce their storage requirements. However, deduplication also reduces data availability. To solve this problem, the authors propose a method to improve data availability in the deduplication storage system. Based on data chunk reference counts and access frequencies, it adds redundant information for data chunks to ensure data availability while minimizing storage overhead. Extensive experiments are conducted to evaluate the effectiveness of the improved method, with WFD, CDC, and sliding-block deduplication technology used for comparison. The experimental results show that the proposed method achieves higher data availability than the conventional method while incurring little additional storage overhead.
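
As a rough illustration of the popularity-driven redundancy the authors describe, the sketch below maps a chunk's reference count and access frequency to a replica count, so that heavily shared or frequently accessed chunks receive extra copies. The thresholds, the replication-based policy, and the function name are illustrative assumptions, not the paper's concrete method.

```python
# Illustrative sketch of a popularity-driven redundancy policy: chunks that
# are referenced by many files or read often receive extra replicas, so the
# loss of one shared chunk cannot make many files unavailable. The thresholds
# and replication scheme are assumptions, not the paper's exact method.

def replication_level(ref_count: int, access_freq: float,
                      base: int = 1, max_replicas: int = 4) -> int:
    """Map a chunk's reference count and access frequency to a replica count."""
    level = base
    if ref_count > 10:        # shared by many files -> large blast radius
        level += 1
    if ref_count > 100:
        level += 1
    if access_freq > 0.5:     # hot chunk (e.g., accesses per hour)
        level += 1
    return min(level, max_replicas)
```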

Author(s):  
Anil Kumar G. ◽  
Shantala C. P.

Owing to the highly distributed nature of the cloud storage system, incorporating a higher degree of security for vulnerable data is a challenging task. Among the various security concerns, data privacy remains one of the unsolved problems in this regard. The prime reason is that existing approaches to data privacy do not offer data integrity and a secure data deduplication process at the same time, both of which are essential to ensure a higher degree of resistance against all forms of dynamic threats over cloud and internet systems. Data integrity and data deduplication are thus associated phenomena that influence data privacy. This manuscript therefore discusses the explicit research contributions toward data integrity, data privacy, and data deduplication. It also highlights the potential open research issues, followed by a discussion of possible future directions of work toward addressing the existing problems.


2019 ◽  
Vol 30 (04) ◽  
pp. 551-570 ◽  
Author(s):  
Wenjuan Meng ◽  
Jianhua Ge ◽  
Tao Jiang

A cloud storage system that incorporates both deletion and deduplication functionalities has security and efficiency advantages over existing solutions that provide only one of them. However, the security models of secure data deletion and data deduplication are not compatible with each other, which causes security and efficiency vulnerabilities under coercive adversaries. To address these challenges, we define and construct a scheme whose security relies on the proper erasure of keys in the wrapped key tree and the periodic update of the deduplication encryption keys. Moreover, we enhance the efficiency of the proposed scheme by introducing incremental data update, where only the changed part is encrypted/decrypted and uploaded/downloaded when data is updated. Further security analysis shows that the proposed scheme is secure against coercive attacks. Finally, a practical implementation shows that our scheme is efficient in computation, storage, and communication for both the cloud storage server and its users.
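
The incremental update idea can be illustrated with a short sketch: split the file into chunks and flag only the chunks whose content changed, so that only those need to be re-encrypted and re-uploaded. The fixed chunk size and SHA-256 comparison are assumptions for illustration; the paper's actual construction operates within its wrapped-key-tree scheme.

```python
# Minimal sketch of incremental update for encrypted cloud storage: the file
# is split into fixed-size chunks, and only chunks whose content changed are
# re-encrypted and re-uploaded. Chunk size and hashing are illustrative
# assumptions, not the scheme's concrete construction.

import hashlib

CHUNK = 4096

def chunks(data: bytes):
    return [data[i:i + CHUNK] for i in range(0, len(data), CHUNK)]

def changed_chunks(old: bytes, new: bytes):
    """Return (index, chunk) pairs that must be re-encrypted and uploaded."""
    old_digests = [hashlib.sha256(c).digest() for c in chunks(old)]
    updates = []
    for i, c in enumerate(chunks(new)):
        if i >= len(old_digests) or old_digests[i] != hashlib.sha256(c).digest():
            updates.append((i, c))
    return updates
```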


IJARCCE ◽  
2017 ◽  
Vol 6 (4) ◽  
pp. 316-323
Author(s):  
Bhos Komal ◽  
Ingale Karuna ◽  
Hattikatti Susmita ◽  
Jadhav Sachin ◽  
Mirajkar SS ◽  
...  

Cloud computing is an efficient technology that provides large-scale data file storage with security. However, the content owner cannot control data access by unauthorized clients, nor how the data is stored and used. Some previous approaches combine data access control with data deduplication for cloud storage systems, but encrypted data is not handled effectively by current industrial deduplication solutions: the deduplication is unguarded against brute-force attacks and fails to support data access control. Data deduplication is a widely used data-reduction technique that eliminates multiple copies of redundant data; it reduces the space needed to store data and thus saves bandwidth. To overcome the above problems, an efficient content discovery and preserving deduplication (ECDPD) algorithm was proposed that detects the client file range and block range of deduplication when storing data files in the cloud storage system. ECDPD actively supports data access control. Experimental evaluations show that the proposed ECDPD method reduces Data Uploading Time (DUT) by 3.802 milliseconds and Data Downloading Time (DDT) by 3.318 milliseconds compared with existing approaches.
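
The abstract does not give ECDPD's internals, but the block-level deduplication check it builds on can be sketched as follows: fingerprint each block and transfer only those fingerprints that are absent from the server-side index. The index structure and hash choice here are illustrative assumptions.

```python
# Generic block-level deduplication check of the kind ECDPD builds on; the
# abstract does not specify ECDPD's internals, so the index structure and
# SHA-256 fingerprint here are illustrative assumptions only.

import hashlib

def dedup_upload(blocks, index, store):
    """Upload only blocks whose fingerprints are not yet in the server index."""
    for block in blocks:
        fp = hashlib.sha256(block).hexdigest()
        if fp in index:
            index[fp] += 1          # duplicate: just bump the reference count
        else:
            index[fp] = 1
            store[fp] = block       # new content: actually transfer the block
```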


2014 ◽  
Vol 644-650 ◽  
pp. 1915-1918
Author(s):  
Shao Min Zhang ◽  
Hai Pu Dong ◽  
Bao Yi Wang

With the development of computer technology, massive information has brought huge challenges to storage system reliability. A heuristic greedy (HG) algorithm is proposed to optimize the calculation path and reduce the XOR operations and computational complexity of data recovery. It applies Cauchy Reed-Solomon (CRS) codes to the cloud storage system HDFS and turns the multiplication operations of CRS coding into binary matrix multiplications. The performance analysis shows that the approach effectively improves the fault tolerance, storage-space utilization, and timeliness of the cloud file system while reducing additional storage overhead.
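
The core CRS trick the paper relies on can be sketched briefly: once the Cauchy coding matrix over GF(2^w) is expanded into a binary matrix, encoding needs only XOR operations, with each parity packet formed as the XOR of the data packets selected by a matrix row. The HG algorithm would further reduce the XOR count by reusing common subexpressions, which this naive sketch, with its toy matrix, does not attempt.

```python
# Sketch of XOR-only CRS encoding: after the Cauchy matrix over GF(2^w) is
# expanded into a binary matrix, each parity packet is the XOR of the data
# packets whose bit in the corresponding matrix row is 1. Any bit matrix
# passed in is a toy example, not a real Cauchy expansion.

def xor_encode(bit_matrix, data_packets):
    """Row i of bit_matrix selects which data packets are XORed together
    to form parity packet i."""
    parity = []
    for row in bit_matrix:
        acc = bytes(len(data_packets[0]))          # all-zero accumulator
        for bit, packet in zip(row, data_packets):
            if bit:
                acc = bytes(a ^ b for a, b in zip(acc, packet))
        parity.append(acc)
    return parity
```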


Webology ◽  
2021 ◽  
Vol 18 (Special Issue 01) ◽  
pp. 288-301
Author(s):  
G. Sujatha ◽  
Dr. Jeberson Retna Raj

Data storage is one of the significant cloud services available to cloud users. Since the volume of outsourced information grows extremely large, data deduplication techniques must be implemented in the cloud storage space for efficient utilization. The cloud storage space supports all kinds of digital data, such as text, audio, video, and images. In a hash-based deduplication system, a cryptographic hash value is calculated for all data irrespective of type and stored in memory for future reference; using these hash values alone, duplicate copies can be identified. The problem in this existing scenario is the size of the hash table: to find a duplicate copy, all the hash values must be checked in the worst case, irrespective of data type. At the same time, not all kinds of digital data suit the same hash table structure. In this study, we propose an approach that maintains multiple hash tables for the different kinds of digital data. Having a dedicated hash table for each digital data type improves the search time for duplicate data.
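
A minimal sketch of the proposed type-partitioned lookup: one hash table per media type, so a duplicate check scans only fingerprints of the same kind of data. The table layout, the SHA-256 fingerprint, and the caller-supplied media type are illustrative assumptions.

```python
# Sketch of type-partitioned deduplication lookup: one hash table per media
# type, so a duplicate check only consults fingerprints of the same type.
# The type labels and SHA-256 fingerprint are illustrative assumptions.

import hashlib

tables = {"text": {}, "audio": {}, "video": {}, "image": {}}

def is_duplicate(data: bytes, media_type: str) -> bool:
    """Check (and record) a fingerprint in the table for its media type only."""
    fp = hashlib.sha256(data).hexdigest()
    table = tables[media_type]
    if fp in table:
        return True
    table[fp] = True
    return False
```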


2018 ◽  
Vol 7 (S1) ◽  
pp. 16-19
Author(s):  
B. Rasina Begum ◽  
P. Chithra

Cloud computing provides a scalable platform for large amounts of data and for processes that serve various applications and services on demand. The storage services offered by clouds have become a new source of profit growth by providing comparably cheaper, scalable, location-independent platforms for managing users' data. Clients use cloud storage to enjoy high-end applications and services drawn from a shared pool of configurable computing resources, which reduces the difficulty of local data storage and maintenance but raises severe security issues for users' outsourced data. Data redundancy promotes data reliability in cloud storage; at the same time, it increases storage space, bandwidth, and security threats due to server vulnerabilities. Data deduplication helps to improve storage utilization, and backups are smaller, which means less hardware and backup media; however, it introduces many security issues. Data reliability is a particularly risky issue in a deduplication storage system because only a single copy of each file is stored on the server and shared by all its data owners. If such a shared file/chunk were lost, a large amount of data would become unreachable. The main aim of this work is to implement a deduplication system in cloud storage without sacrificing security, combining deduplication with convergent-key cryptography at reduced overhead.
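
Convergent-key cryptography, which the work combines with deduplication, can be sketched as follows: the encryption key is derived from the file content itself, so identical plaintexts always produce identical ciphertexts that the server can deduplicate without reading them. The AES-GCM cipher and deterministic nonce derivation below (via the third-party cryptography package) are illustrative choices, not the paper's construction, and real deployments need extra hardening against brute-force attacks on predictable files.

```python
# Sketch of convergent encryption, the standard way to reconcile encryption
# with deduplication: the key depends only on the content, so equal plaintexts
# yield equal ciphertexts. Each key encrypts exactly one message, so the
# deterministic nonce is safe here; cipher choice is an assumption.

import hashlib
from cryptography.hazmat.primitives.ciphers.aead import AESGCM

def convergent_encrypt(plaintext: bytes) -> tuple[bytes, bytes]:
    key = hashlib.sha256(plaintext).digest()      # key derived from content
    nonce = hashlib.sha256(key).digest()[:12]     # deterministic nonce
    ciphertext = AESGCM(key).encrypt(nonce, plaintext, None)
    return key, ciphertext    # key kept by the owner, ciphertext deduplicated
```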


Author(s):  
Hema S and Dr.Kangaiammal A

Cloud services increase data availability so as to offer flawless service to the client. Because of this increasing availability, more redundancy and more memory space are required to store such data. Cloud computing requires substantial storage and efficient protection for all types of data. With the amount of data produced increasing exponentially over time, storing replicated data contents is inevitable; hence, storage optimization approaches become an important prerequisite for enormous storage domains like cloud storage. Data deduplication is a technique that compresses data by eliminating replicated copies of similar data, and it is widely utilized in cloud storage to conserve bandwidth and minimize storage space. Although data deduplication eliminates data redundancy and data replication, it likewise presents significant data privacy and security problems for the end user. Considering this, a novel security-based deduplication model is proposed in this work to reduce the hash value of a given file and provide additional security for cloud storage. In the proposed method, the hash value of a given file is reduced by employing the Distributed Storage Hash Algorithm (DSHA), and to provide security the file is encrypted using an Improved Blowfish Encryption Algorithm (IBEA). This framework also proposes an enhanced fuzzy-based intrusion detection system (EFIDS) that defines rules for the major attacks, thereby alerting the system automatically. Finally, the combination of data-exclusion and encryption techniques allows cloud users to effectively manage their cloud storage by avoiding repeated data encroachment; it also saves bandwidth and alerts the system to attackers. The results of experiments reveal that the discussed algorithm yields improved throughput and bytes saved per second in comparison with other chunking algorithms.
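
Since DSHA and IBEA are the authors' own algorithms and are not specified in the abstract, the sketch below only illustrates the general dedupe-then-encrypt pipeline, with SHA-256 standing in as the fingerprint and PyCryptodome's standard Blowfish (CBC mode) standing in for IBEA.

```python
# Sketch of a dedupe-then-encrypt pipeline. SHA-256 and standard Blowfish are
# stand-ins for the paper's DSHA and IBEA, which the abstract does not
# specify -- assumptions for illustration, not the authors' design.

import hashlib, os
from Crypto.Cipher import Blowfish
from Crypto.Util.Padding import pad

def store_file(data: bytes, key: bytes, index: dict, store: dict) -> str:
    fp = hashlib.sha256(data).hexdigest()   # fingerprint for duplicate check
    if fp not in index:                     # only new content is encrypted/stored
        iv = os.urandom(8)                  # Blowfish block size is 8 bytes
        cipher = Blowfish.new(key, Blowfish.MODE_CBC, iv)
        store[fp] = iv + cipher.encrypt(pad(data, Blowfish.block_size))
    index[fp] = index.get(fp, 0) + 1        # track how many owners share it
    return fp
```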


2019 ◽  
Vol 10 (1) ◽  
pp. 1-29 ◽  
Author(s):  
Anindita Sarkar Mondal ◽  
Madhupa Sanyal ◽  
Samiran Chattapadhyay ◽  
Kartick Chandra Mondal

Big Data management is an interesting research challenge for all storage vendors. Since data can be structured or unstructured, a variety of storage systems have been designed to meet storage requirements according to organizations' demands. The article focuses on different kinds of storage systems, their architecture, and their implementations. The first portion of the article describes examples of structured (PostgreSQL) and unstructured databases (MongoDB, OrientDB, and Neo4j), along with their data models and a comparative performance analysis between them. The second portion focuses on cloud storage systems; as an example, Google Cloud Storage and, mainly, its implementation details are discussed. The aim of the article is not to eulogize any particular storage system, but to point out clearly that every storage system has a role to play in the industry. It is up to the enterprise to identify its requirements and deploy the appropriate storage systems.
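
To make the structured/unstructured contrast concrete, the sketch below issues the same lookup through PostgreSQL's relational interface and MongoDB's document interface. The connection details and the users schema/collection are hypothetical.

```python
# Illustrative contrast of the structured vs. unstructured data models the
# article surveys: the same lookup via a relational and a document store.
# Connection strings and the `users` schema/collection are hypothetical.

import psycopg2                      # PostgreSQL client
from pymongo import MongoClient      # MongoDB client

# Relational: fixed schema, declarative SQL over typed columns.
pg = psycopg2.connect("dbname=demo user=demo")
with pg.cursor() as cur:
    cur.execute("SELECT name, email FROM users WHERE age > %s", (30,))
    rows = cur.fetchall()

# Document store: schemaless JSON-like documents, query by example.
mongo = MongoClient()
docs = list(mongo.demo.users.find({"age": {"$gt": 30}}, {"name": 1, "email": 1}))
```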


Author(s):  
Sunil S ◽  
A Ananda Shankar

A cloud storage system provides convenient file storage and sharing services for distributed clients. To preserve the privacy of data holders, a scheme is proposed to manage encrypted data storage with deduplication. It can flexibly support data sharing with deduplication even when the data holder is offline, without intruding on the privacy of data holders. It is an effective approach for verifying data ownership and checking duplicate storage with secure challenges and big-data support. Cloud data deduplication is integrated with data access control in a simple way, thereby reconciling deduplication and encryption. We prove the security and assess the performance of the scheme through analysis and simulation; the results show its efficiency, effectiveness, and applicability. In the proposed system, uploaded data is stored in the cloud by date, so that it is available to the data holders who need it when they need it. A web log record indicates whether a search keyword is repeated: records containing only repeated search data are retained in primary storage in the cloud, while all other records are stored on a temporary storage server. This step reduces the size of the web log, thereby avoiding the burden on memory and speeding up analysis.
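
The log-tiering step can be sketched as follows: records whose search keyword repeats are kept in primary cloud storage, while one-off records are routed to the temporary storage server. The record format and the repetition test are assumptions for illustration.

```python
# Sketch of the tiered web-log placement described above: repeated-keyword
# records stay in primary cloud storage, the rest go to a temporary store.
# The record format and repeat test are illustrative assumptions.

from collections import Counter

def tier_logs(records):
    """Split log records into (primary, temporary) by keyword repetition."""
    counts = Counter(r["keyword"] for r in records)
    primary = [r for r in records if counts[r["keyword"]] > 1]
    temporary = [r for r in records if counts[r["keyword"]] == 1]
    return primary, temporary
```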

