Study on Deduplication on Distributed Cloud Environment

Author(s):  
Pradeep Nayak ◽  
Poornachandra S ◽  
Pawan J Acharya ◽  
Shravya ◽  
Shravani

Deduplication methods are designed to eliminate duplicate data so that only a single copy of each piece of information is stored. Data deduplication reduces the disk space needed to store back-ups: it tracks and removes second copies of data inside the storage unit, allowing only one instance of the data to be stored initially, while subsequent occurrences are given a reference pointer to the first stored copy. In a big data storage environment, a huge amount of data must be kept secure, so proper management, workflow, fraud detection, and analysis of data privacy are important topics to consider. This paper examines and evaluates the common deduplication techniques, which are presented in plain form. In this review, it was observed that the confidentiality and security of data are compromised at many levels in the common deduplication methods. Although much research is being carried out in various areas of cloud computing, work related to this topic is still scarce.
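The reference-pointer idea described above can be pictured with a minimal Python sketch; the in-memory store, the `put`/`get` interface, and the SHA-256 fingerprint are illustrative assumptions, not any of the schemes surveyed in this paper.

```python
import hashlib

class SingleInstanceStore:
    """Toy single-instance store: one physical copy per unique content;
    later uploads of the same bytes only receive a reference pointer."""

    def __init__(self):
        self._blobs = {}   # fingerprint -> stored bytes (the single copy)
        self._refs = {}    # fingerprint -> reference count

    def put(self, data: bytes) -> str:
        fp = hashlib.sha256(data).hexdigest()   # content fingerprint
        if fp not in self._blobs:
            self._blobs[fp] = data              # first occurrence: store it
        self._refs[fp] = self._refs.get(fp, 0) + 1
        return fp                               # pointer to the single stored copy

    def get(self, pointer: str) -> bytes:
        return self._blobs[pointer]

store = SingleInstanceStore()
p1 = store.put(b"backup block A")
p2 = store.put(b"backup block A")   # duplicate: no extra storage is used
assert p1 == p2 and len(store._blobs) == 1
```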

Author(s):  
A. Mohamed Divan Masood ◽  
S. K. Muthusundar

The explosive increase of data brings new challenges to data storage and management in cloud settings. These data typically have to be processed in a timely fashion in the cloud, so any added latency may cause an immense loss to the enterprises. Duplicate detection plays a very important role in data management. Data deduplication calculates a unique fingerprint for each data chunk by using hash algorithms such as MD5 and SHA-1. The computed fingerprint is then compared against the other chunks already held in a database dedicated to storing chunks. As an outcome, a deduplication system improves storage utilization while reducing reliability. Besides, the question of privacy for sensitive data also arises when it is outsourced by users to the cloud. Aiming to deal with the above security challenges, this paper makes the first effort to formalize the notion of a distributed reliable deduplication system. We offer new distributed deduplication systems with improved reliability in which the data chunks are distributed across a variety of cloud servers. The security requirements are met by an alternative to the convergent encryption used in previous deduplication systems.
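A minimal sketch of this fingerprint-and-lookup step, assuming SHA-1 fingerprints and a plain in-memory set standing in for the chunk database (a real deduplication index would be a persistent store):

```python
import hashlib

chunk_index = set()   # stand-in for the database of known chunk fingerprints

def is_duplicate(chunk: bytes) -> bool:
    """Compute the chunk's SHA-1 fingerprint and compare it against the index."""
    fingerprint = hashlib.sha1(chunk).hexdigest()
    if fingerprint in chunk_index:
        return True          # chunk already stored; skip writing it again
    chunk_index.add(fingerprint)
    return False

print(is_duplicate(b"some data chunk"))   # False: first time this chunk is seen
print(is_duplicate(b"some data chunk"))   # True: duplicate detected
```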


2018 ◽  
Vol 7 (3.12) ◽  
pp. 437
Author(s):  
R Aditya Balaji ◽  
R Pragadeeeshwaran ◽  
G K. Sandhia

The most common cloud service is data storage. In order to reduce the storage space, deduplication is used. Data deduplication is the process of removing redundant copies of the same data. If a file that is already present in the cloud is uploaded again, whether by the same user or a different one, it will not be uploaded a second time. The storage required is therefore decreased, but reliability is also reduced. Data are encrypted and stored in the cloud to protect the privacy of users, and this introduces new challenges. The proposed system uses the M3 algorithm for encryption and a chunking technique for deduplication. The results of the evaluation show that both security and reliability are increased in the proposed scheme.
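A rough sketch of the chunking side of such a scheme, assuming fixed-size 4 KB chunks, SHA-256 fingerprints, and a set-based duplicate check; the M3 encryption step and the actual cloud upload are not shown, and all names here are illustrative rather than taken from the proposed system.

```python
import hashlib

CHUNK_SIZE = 4096          # illustrative fixed chunk size (4 KB)
known_chunks = set()       # fingerprints of chunks already held in the cloud store

def upload_file(path: str):
    """Split a file into fixed-size chunks and upload only the unseen ones."""
    uploaded, skipped = 0, 0
    with open(path, "rb") as f:
        while chunk := f.read(CHUNK_SIZE):
            fp = hashlib.sha256(chunk).hexdigest()
            if fp in known_chunks:
                skipped += 1           # duplicate chunk: keep a reference only
            else:
                known_chunks.add(fp)   # new chunk: (encrypt and) store it
                uploaded += 1
    return uploaded, skipped

# Toy demo: a file of three identical chunks, uploaded twice.
with open("demo.bin", "wb") as f:
    f.write(b"A" * CHUNK_SIZE * 3)
print(upload_file("demo.bin"))   # (1, 2): identical chunks are stored once
print(upload_file("demo.bin"))   # (0, 3): everything is already known
```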


The enormous growth of digital data, especially data in unstructured formats, has brought a tremendous challenge to data analysis as well as to data storage systems, driving up the cost and straining the performance of backup systems. Traditional systems do not provide any optimization technique to keep duplicated data from being backed up. Deduplication of data has become an essential and cost-effective capacity-optimization technique that removes redundant data. The following paper reviews the deduplication process, the types of deduplication, and the techniques available for data deduplication. In addition, many approaches proposed by various researchers for deduplication in big data storage systems are studied and compared.


2019 ◽  
Vol 8 (4) ◽  
pp. 2329-2333

Frequently, real-world entities have two or more representations in databases. Duplicate records do not share a common key and/or they contain errors, which makes duplicate matching a difficult task. Errors are introduced as the result of transcription mistakes, incomplete information, a lack of standard formats, or any combination of these factors. In big data storage the data is extremely large, and storing it efficiently is a difficult task. To address this problem, the Hadoop tool provides HDFS, which manages data by maintaining replication of the data, but this increases duplication. In our proposed method, the big data stream is given to a fixed-size chunking algorithm to create fixed-size chunks. In this manuscript, we present a comprehensive study of the literature on crowdsourcing-based big data deduplication techniques. In our method, the map-reduce output is generated and the MapReduce model is then applied to find out whether the hash values are duplicates or not. To recognize duplicate hash values, the MapReduce model compares these hash values with the hash values already stored in the big data storage space. If a hash value is already present in the big data storage space, it is identified as a duplicate. If the hash values are duplicated, the data is not stored in the Hadoop Distributed File System (HDFS); otherwise, the data is stored in HDFS. We also cover various deduplication techniques for crowdsourced data.
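The duplicate check described above can be pictured with a toy, single-process simulation of the map and reduce phases; this is not actual Hadoop code, and the MD5 choice, the chunk contents, and the in-memory grouping are all simplifying assumptions.

```python
import hashlib
from collections import defaultdict

def map_phase(chunks):
    """Map: emit (hash, 1) for every fixed-size chunk."""
    for chunk in chunks:
        yield hashlib.md5(chunk).hexdigest(), 1

def reduce_phase(pairs):
    """Reduce: group by hash; any hash seen more than once is a duplicate."""
    counts = defaultdict(int)
    for h, c in pairs:
        counts[h] += c
    return {h: ("duplicate" if c > 1 else "store") for h, c in counts.items()}

chunks = [b"block-1", b"block-2", b"block-1"]        # toy fixed-size chunks
decisions = reduce_phase(map_phase(chunks))
for h, action in decisions.items():
    print(h[:8], action)   # duplicates are skipped; new chunks go to HDFS
```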

