Improving Data Availability for Deduplication in Cloud Storage

2018 ◽  
Vol 10 (2) ◽  
pp. 70-89 ◽  
Author(s):  
Jun Li ◽  
Mengshu Hou

This article describes how deduplication technology is introduced in cloud storage to reduce the amount of stored data. By adopting this technology, duplicated data can be eliminated and users can reduce their storage requirements. However, deduplication also reduces data availability. To solve this problem, the authors propose a method to improve data availability in the deduplication storage system. Based on data chunk reference counts and access frequencies, it adds redundant information for data chunks to ensure data availability while minimizing storage overhead. Extensive experiments are conducted to evaluate the effectiveness of the improved method, with WFD, CDC, and sliding-block deduplication technology used for comparison. The experimental results show that the proposed method achieves higher data availability than the conventional method while incurring little additional storage overhead.
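
As a rough illustration of the popularity-driven redundancy the authors describe, the sketch below maps a chunk's reference count and access frequency to a replica count, so that heavily shared or frequently accessed chunks receive extra copies. The thresholds, the replication-based policy, and the function name are illustrative assumptions, not the paper's concrete method.

```python
# Illustrative sketch of a popularity-driven redundancy policy: chunks that
# are referenced by many files or read often receive extra replicas, so the
# loss of one shared chunk cannot make many files unavailable. The thresholds
# and replication scheme are assumptions, not the paper's exact method.

def replication_level(ref_count: int, access_freq: float,
                      base: int = 1, max_replicas: int = 4) -> int:
    """Map a chunk's reference count and access frequency to a replica count."""
    level = base
    if ref_count > 10:        # shared by many files -> large blast radius
        level += 1
    if ref_count > 100:
        level += 1
    if access_freq > 0.5:     # hot chunk (e.g., accesses per hour)
        level += 1
    return min(level, max_replicas)
```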

Author(s):  
Anil Kumar G. ◽  
Shantala C. P.

Owing to the highly distributed nature of the cloud storage system, incorporating a higher degree of security for vulnerable data is a challenging task. Among the various security concerns, data privacy remains one of the unsolved problems in this regard. The prime reason is that existing approaches to data privacy do not offer data integrity and a secure data deduplication process at the same time, both of which are essential to ensure a higher degree of resistance against all forms of dynamic threats over cloud and internet systems. Data integrity and data deduplication are thus associated phenomena that influence data privacy. This manuscript therefore discusses the explicit research contributions toward data integrity, data privacy, and data deduplication. It also highlights the potential open research issues, followed by a discussion of possible future directions of work toward addressing the existing problems.


2019 ◽  
Vol 30 (04) ◽  
pp. 551-570 ◽  
Author(s):  
Wenjuan Meng ◽  
Jianhua Ge ◽  
Tao Jiang

A cloud storage system that incorporates both deletion and deduplication functionalities has security and efficiency advantages over existing solutions that provide only one of them. However, the security models of secure data deletion and data deduplication are not compatible with each other, which causes security and efficiency vulnerabilities under coercive adversaries. To address these challenges, we define and construct a scheme whose security relies on the proper erasure of keys in the wrapped key tree and the periodic update of the deduplication encryption keys. Moreover, we enhance the efficiency of the proposed scheme by introducing incremental data update, where only the changed part is encrypted/decrypted and uploaded/downloaded when data is updated. Further security analysis shows that the proposed scheme is secure against coercive attacks. Finally, a practical implementation shows that our scheme is efficient in computation, storage, and communication for both the cloud storage server and its users.
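
The incremental update idea can be illustrated with a short sketch: split the file into chunks and flag only the chunks whose content changed, so that only those need to be re-encrypted and re-uploaded. The fixed chunk size and SHA-256 comparison are assumptions for illustration; the paper's actual construction operates within its wrapped-key-tree scheme.

```python
# Minimal sketch of incremental update for encrypted cloud storage: the file
# is split into fixed-size chunks, and only chunks whose content changed are
# re-encrypted and re-uploaded. Chunk size and hashing are illustrative
# assumptions, not the scheme's concrete construction.

import hashlib

CHUNK = 4096

def chunks(data: bytes):
    return [data[i:i + CHUNK] for i in range(0, len(data), CHUNK)]

def changed_chunks(old: bytes, new: bytes):
    """Return (index, chunk) pairs that must be re-encrypted and uploaded."""
    old_digests = [hashlib.sha256(c).digest() for c in chunks(old)]
    updates = []
    for i, c in enumerate(chunks(new)):
        if i >= len(old_digests) or old_digests[i] != hashlib.sha256(c).digest():
            updates.append((i, c))
    return updates
```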


IJARCCE ◽  
2017 ◽  
Vol 6 (4) ◽  
pp. 316-323
Author(s):  
Bhos Komal ◽  
Ingale Karuna ◽  
Hattikatti Susmita ◽  
Jadhav Sachin ◽  
Mirajkar SS ◽  
...  

Cloud computing is an efficient technology that provides large-scale data file storage with security. However, the content owner cannot control data access by unauthorized clients, nor how the data is stored and used. Some previous approaches combine data access control with data deduplication for cloud storage systems, but encrypted data is not handled effectively by current industrial deduplication solutions: the deduplication is unguarded against brute-force attacks and fails to support data access control. Data deduplication is a widely used data-reduction technique that eliminates multiple copies of redundant data; it reduces the space needed to store data and thus saves bandwidth. To overcome the above problems, an efficient content discovery and preserving deduplication (ECDPD) algorithm was proposed that detects the client file range and block range of deduplication when storing data files in the cloud storage system. ECDPD actively supports data access control. Experimental evaluations show that the proposed ECDPD method reduces Data Uploading Time (DUT) by 3.802 milliseconds and Data Downloading Time (DDT) by 3.318 milliseconds compared with existing approaches.
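
The abstract does not give ECDPD's internals, but the block-level deduplication check it builds on can be sketched as follows: fingerprint each block and transfer only those fingerprints that are absent from the server-side index. The index structure and hash choice here are illustrative assumptions.

```python
# Generic block-level deduplication check of the kind ECDPD builds on; the
# abstract does not specify ECDPD's internals, so the index structure and
# SHA-256 fingerprint here are illustrative assumptions only.

import hashlib

def dedup_upload(blocks, index, store):
    """Upload only blocks whose fingerprints are not yet in the server index."""
    for block in blocks:
        fp = hashlib.sha256(block).hexdigest()
        if fp in index:
            index[fp] += 1          # duplicate: just bump the reference count
        else:
            index[fp] = 1
            store[fp] = block       # new content: actually transfer the block
```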


2014 ◽  
Vol 644-650 ◽  
pp. 1915-1918
Author(s):  
Shao Min Zhang ◽  
Hai Pu Dong ◽  
Bao Yi Wang

With the development of computer technology, massive information has brought huge challenges to storage system reliability. A heuristic greedy (HG) algorithm is proposed to optimize the calculation path and reduce the XOR operations and computational complexity of data recovery. It applies Cauchy Reed-Solomon (CRS) codes to the cloud storage system HDFS and turns the multiplication operations of CRS coding into binary matrix multiplications. The performance analysis shows that the approach effectively improves the fault tolerance, storage-space utilization, and timeliness of the cloud file system while reducing additional storage overhead.
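
The core CRS trick the paper relies on can be sketched briefly: once the Cauchy coding matrix over GF(2^w) is expanded into a binary matrix, encoding needs only XOR operations, with each parity packet formed as the XOR of the data packets selected by a matrix row. The HG algorithm would further reduce the XOR count by reusing common subexpressions, which this naive sketch, with its toy matrix, does not attempt.

```python
# Sketch of XOR-only CRS encoding: after the Cauchy matrix over GF(2^w) is
# expanded into a binary matrix, each parity packet is the XOR of the data
# packets whose bit in the corresponding matrix row is 1. Any bit matrix
# passed in is a toy example, not a real Cauchy expansion.

def xor_encode(bit_matrix, data_packets):
    """Row i of bit_matrix selects which data packets are XORed together
    to form parity packet i."""
    parity = []
    for row in bit_matrix:
        acc = bytes(len(data_packets[0]))          # all-zero accumulator
        for bit, packet in zip(row, data_packets):
            if bit:
                acc = bytes(a ^ b for a, b in zip(acc, packet))
        parity.append(acc)
    return parity
```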


Webology ◽  
2021 ◽  
Vol 18 (Special Issue 01) ◽  
pp. 288-301
Author(s):  
G. Sujatha ◽  
Dr. Jeberson Retna Raj

Data storage is one of the significant cloud services available to cloud users. Since the volume of outsourced information grows extremely large, data deduplication techniques must be implemented in the cloud storage space for efficient utilization. The cloud storage space supports all kinds of digital data, such as text, audio, video, and images. In a hash-based deduplication system, a cryptographic hash value is calculated for all data irrespective of type and stored in memory for future reference; using these hash values alone, duplicate copies can be identified. The problem in this existing scenario is the size of the hash table: to find a duplicate copy, all the hash values must be checked in the worst case, irrespective of data type. At the same time, not all kinds of digital data suit the same hash table structure. In this study, we propose an approach that maintains multiple hash tables for the different kinds of digital data. Having a dedicated hash table for each digital data type improves the search time for duplicate data.
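
A minimal sketch of the proposed type-partitioned lookup: one hash table per media type, so a duplicate check scans only fingerprints of the same kind of data. The table layout, the SHA-256 fingerprint, and the caller-supplied media type are illustrative assumptions.

```python
# Sketch of type-partitioned deduplication lookup: one hash table per media
# type, so a duplicate check only consults fingerprints of the same type.
# The type labels and SHA-256 fingerprint are illustrative assumptions.

import hashlib

tables = {"text": {}, "audio": {}, "video": {}, "image": {}}

def is_duplicate(data: bytes, media_type: str) -> bool:
    """Check (and record) a fingerprint in the table for its media type only."""
    fp = hashlib.sha256(data).hexdigest()
    table = tables[media_type]
    if fp in table:
        return True
    table[fp] = True
    return False
```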


2018 ◽  
Vol 7 (S1) ◽  
pp. 16-19
Author(s):  
B. Rasina Begum ◽  
P. Chithra

Cloud computing provides a scalable platform for large amounts of data and for processes that serve various applications and services on demand. The storage services offered by clouds have become a new source of profit growth by providing comparably cheaper, scalable, location-independent platforms for managing users' data. Clients use cloud storage to enjoy high-end applications and services drawn from a shared pool of configurable computing resources, which reduces the difficulty of local data storage and maintenance but raises severe security issues for users' outsourced data. Data redundancy promotes data reliability in cloud storage; at the same time, it increases storage space, bandwidth, and security threats due to server vulnerabilities. Data deduplication helps to improve storage utilization, and backups are smaller, which means less hardware and backup media; however, it introduces many security issues. Data reliability is a particularly risky issue in a deduplication storage system because only a single copy of each file is stored on the server and shared by all its data owners. If such a shared file/chunk were lost, a large amount of data would become unreachable. The main aim of this work is to implement a deduplication system in cloud storage without sacrificing security, combining deduplication with convergent-key cryptography at reduced overhead.
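
Convergent-key cryptography, which the work combines with deduplication, can be sketched as follows: the encryption key is derived from the file content itself, so identical plaintexts always produce identical ciphertexts that the server can deduplicate without reading them. The AES-GCM cipher and deterministic nonce derivation below (via the third-party cryptography package) are illustrative choices, not the paper's construction, and real deployments need extra hardening against brute-force attacks on predictable files.

```python
# Sketch of convergent encryption, the standard way to reconcile encryption
# with deduplication: the key depends only on the content, so equal plaintexts
# yield equal ciphertexts. Each key encrypts exactly one message, so the
# deterministic nonce is safe here; cipher choice is an assumption.

import hashlib
from cryptography.hazmat.primitives.ciphers.aead import AESGCM

def convergent_encrypt(plaintext: bytes) -> tuple[bytes, bytes]:
    key = hashlib.sha256(plaintext).digest()      # key derived from content
    nonce = hashlib.sha256(key).digest()[:12]     # deterministic nonce
    ciphertext = AESGCM(key).encrypt(nonce, plaintext, None)
    return key, ciphertext    # key kept by the owner, ciphertext deduplicated
```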


Author(s):  
Hema S and Dr.Kangaiammal A

Cloud services increase data availability so as to offer flawless service to the client. Because of this increasing availability, more redundancy and more memory space are required to store such data. Cloud computing requires substantial storage and efficient protection for all types of data. With the amount of data produced increasing exponentially over time, storing replicated data contents is inevitable; hence, storage optimization approaches become an important prerequisite for enormous storage domains like cloud storage. Data deduplication is a technique that compresses data by eliminating replicated copies of similar data, and it is widely utilized in cloud storage to conserve bandwidth and minimize storage space. Although data deduplication eliminates data redundancy and data replication, it likewise presents significant data privacy and security problems for the end user. Considering this, a novel security-based deduplication model is proposed in this work to reduce the hash value of a given file and provide additional security for cloud storage. In the proposed method, the hash value of a given file is reduced by employing the Distributed Storage Hash Algorithm (DSHA), and to provide security the file is encrypted using an Improved Blowfish Encryption Algorithm (IBEA). This framework also proposes an enhanced fuzzy-based intrusion detection system (EFIDS) that defines rules for the major attacks, thereby alerting the system automatically. Finally, the combination of data-exclusion and encryption techniques allows cloud users to effectively manage their cloud storage by avoiding repeated data encroachment; it also saves bandwidth and alerts the system to attackers. The results of experiments reveal that the discussed algorithm yields improved throughput and bytes saved per second in comparison with other chunking algorithms.
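
Since DSHA and IBEA are the authors' own algorithms and are not specified in the abstract, the sketch below only illustrates the general dedupe-then-encrypt pipeline, with SHA-256 standing in as the fingerprint and PyCryptodome's standard Blowfish (CBC mode) standing in for IBEA.

```python
# Sketch of a dedupe-then-encrypt pipeline. SHA-256 and standard Blowfish are
# stand-ins for the paper's DSHA and IBEA, which the abstract does not
# specify -- assumptions for illustration, not the authors' design.

import hashlib, os
from Crypto.Cipher import Blowfish
from Crypto.Util.Padding import pad

def store_file(data: bytes, key: bytes, index: dict, store: dict) -> str:
    fp = hashlib.sha256(data).hexdigest()   # fingerprint for duplicate check
    if fp not in index:                     # only new content is encrypted/stored
        iv = os.urandom(8)                  # Blowfish block size is 8 bytes
        cipher = Blowfish.new(key, Blowfish.MODE_CBC, iv)
        store[fp] = iv + cipher.encrypt(pad(data, Blowfish.block_size))
    index[fp] = index.get(fp, 0) + 1        # track how many owners share it
    return fp
```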


2019 ◽  
Vol 10 (1) ◽  
pp. 1-29 ◽  
Author(s):  
Anindita Sarkar Mondal ◽  
Madhupa Sanyal ◽  
Samiran Chattapadhyay ◽  
Kartick Chandra Mondal

Big Data management is an interesting research challenge for all storage vendors. Since data can be structured or unstructured, a variety of storage systems have been designed to meet storage requirements according to organizations' demands. The article focuses on different kinds of storage systems, their architecture, and their implementations. The first portion of the article describes examples of structured (PostgreSQL) and unstructured databases (MongoDB, OrientDB, and Neo4j), along with their data models and a comparative performance analysis between them. The second portion focuses on cloud storage systems; as an example, Google Cloud Storage and, mainly, its implementation details are discussed. The aim of the article is not to eulogize any particular storage system, but to point out clearly that every storage system has a role to play in the industry. It is up to the enterprise to identify its requirements and deploy the appropriate storage systems.
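
To make the structured/unstructured contrast concrete, the sketch below issues the same lookup through PostgreSQL's relational interface and MongoDB's document interface. The connection details and the users schema/collection are hypothetical.

```python
# Illustrative contrast of the structured vs. unstructured data models the
# article surveys: the same lookup via a relational and a document store.
# Connection strings and the `users` schema/collection are hypothetical.

import psycopg2                      # PostgreSQL client
from pymongo import MongoClient      # MongoDB client

# Relational: fixed schema, declarative SQL over typed columns.
pg = psycopg2.connect("dbname=demo user=demo")
with pg.cursor() as cur:
    cur.execute("SELECT name, email FROM users WHERE age > %s", (30,))
    rows = cur.fetchall()

# Document store: schemaless JSON-like documents, query by example.
mongo = MongoClient()
docs = list(mongo.demo.users.find({"age": {"$gt": 30}}, {"name": 1, "email": 1}))
```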


Author(s):  
Sunil S ◽  
A Ananda Shankar

A cloud storage system provides convenient file storage and sharing services for distributed clients. To preserve the privacy of data holders, a scheme is proposed to manage encrypted data storage with deduplication. It can flexibly support data sharing with deduplication even when the data holder is offline, without intruding on the privacy of data holders. It is an effective approach for verifying data ownership and checking duplicate storage with secure challenges and big-data support. Cloud data deduplication is integrated with data access control in a simple way, thereby reconciling deduplication and encryption. We prove the security and assess the performance of the scheme through analysis and simulation; the results show its efficiency, effectiveness, and applicability. In the proposed system, uploaded data is stored in the cloud by date, so that it is available to the data holders who need it when they need it. A web log record indicates whether a search keyword is repeated: records containing only repeated search data are retained in primary storage in the cloud, while all other records are stored on a temporary storage server. This step reduces the size of the web log, thereby avoiding the burden on memory and speeding up analysis.
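
The log-tiering step can be sketched as follows: records whose search keyword repeats are kept in primary cloud storage, while one-off records are routed to the temporary storage server. The record format and the repetition test are assumptions for illustration.

```python
# Sketch of the tiered web-log placement described above: repeated-keyword
# records stay in primary cloud storage, the rest go to a temporary store.
# The record format and repeat test are illustrative assumptions.

from collections import Counter

def tier_logs(records):
    """Split log records into (primary, temporary) by keyword repetition."""
    counts = Counter(r["keyword"] for r in records)
    primary = [r for r in records if counts[r["keyword"]] > 1]
    temporary = [r for r in records if counts[r["keyword"]] == 1]
    return primary, temporary
```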

