Study on Deduplication on Distributed Cloud Environment

Author(s):  
Pradeep Nayak ◽  
Poornachandra S ◽  
Pawan J Acharya ◽  
Shravya ◽  
Shravani

Deduplication methods are designed to eliminate duplicate data so that only a single copy of each piece of information is stored. Data deduplication reduces the disk space needed to store back-ups: it tracks and removes second copies of data inside the storage unit, allowing only one instance of the data to be stored initially, while subsequent occurrences are given a reference pointer to the first stored copy. In a big data storage environment, a huge amount of data must be kept secure, so proper management, workflow, fraud detection, and analysis of data privacy are important topics to consider. This paper examines and evaluates the common deduplication techniques, which are presented in plain form. In this review, it was observed that the confidentiality and security of data are compromised at many levels in the common deduplication methods. Although much research is being carried out in various areas of cloud computing, work related to this topic is still scarce.
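The reference-pointer idea described above can be pictured with a minimal Python sketch; the in-memory store, the `put`/`get` interface, and the SHA-256 fingerprint are illustrative assumptions, not any of the schemes surveyed in this paper.

```python
import hashlib

class SingleInstanceStore:
    """Toy single-instance store: one physical copy per unique content;
    later uploads of the same bytes only receive a reference pointer."""

    def __init__(self):
        self._blobs = {}   # fingerprint -> stored bytes (the single copy)
        self._refs = {}    # fingerprint -> reference count

    def put(self, data: bytes) -> str:
        fp = hashlib.sha256(data).hexdigest()   # content fingerprint
        if fp not in self._blobs:
            self._blobs[fp] = data              # first occurrence: store it
        self._refs[fp] = self._refs.get(fp, 0) + 1
        return fp                               # pointer to the single stored copy

    def get(self, pointer: str) -> bytes:
        return self._blobs[pointer]

store = SingleInstanceStore()
p1 = store.put(b"backup block A")
p2 = store.put(b"backup block A")   # duplicate: no extra storage is used
assert p1 == p2 and len(store._blobs) == 1
```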

Author(s):  
A. Mohamed Divan Masood ◽  
S. K. Muthusundar

The explosive increase of data brings new challenges to data storage and management in cloud settings. These data typically have to be processed in a timely fashion in the cloud, so any added latency may cause an immense loss to the enterprises. Duplicate detection plays a very important role in data management. Data deduplication calculates a unique fingerprint for each data chunk by using hash algorithms such as MD5 and SHA-1. The computed fingerprint is then compared against the other chunks already held in a database dedicated to storing chunks. As an outcome, a deduplication system improves storage utilization while reducing reliability. Besides, the question of privacy for sensitive data also arises when it is outsourced by users to the cloud. Aiming to deal with the above security challenges, this paper makes the first effort to formalize the notion of a distributed reliable deduplication system. We offer new distributed deduplication systems with improved reliability in which the data chunks are distributed across a variety of cloud servers. The security requirements are met by an alternative to the convergent encryption used in previous deduplication systems.
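A minimal sketch of this fingerprint-and-lookup step, assuming SHA-1 fingerprints and a plain in-memory set standing in for the chunk database (a real deduplication index would be a persistent store):

```python
import hashlib

chunk_index = set()   # stand-in for the database of known chunk fingerprints

def is_duplicate(chunk: bytes) -> bool:
    """Compute the chunk's SHA-1 fingerprint and compare it against the index."""
    fingerprint = hashlib.sha1(chunk).hexdigest()
    if fingerprint in chunk_index:
        return True          # chunk already stored; skip writing it again
    chunk_index.add(fingerprint)
    return False

print(is_duplicate(b"some data chunk"))   # False: first time this chunk is seen
print(is_duplicate(b"some data chunk"))   # True: duplicate detected
```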


2018 ◽  
Vol 7 (3.12) ◽  
pp. 437
Author(s):  
R Aditya Balaji ◽  
R Pragadeeeshwaran ◽  
G K. Sandhia

The most common cloud service is data storage. In order to reduce the storage space, deduplication is used. Data deduplication is the process of removing redundant copies of the same data. If a file that is already present in the cloud is uploaded again, whether by the same user or a different one, it will not be uploaded a second time. The storage required is therefore decreased, but reliability is also reduced. Data are encrypted and stored in the cloud to protect the privacy of users, and this introduces new challenges. The proposed system uses the M3 algorithm for encryption and a chunking technique for deduplication. The results of the evaluation show that both security and reliability are increased in the proposed scheme.
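A rough sketch of the chunking side of such a scheme, assuming fixed-size 4 KB chunks, SHA-256 fingerprints, and a set-based duplicate check; the M3 encryption step and the actual cloud upload are not shown, and all names here are illustrative rather than taken from the proposed system.

```python
import hashlib

CHUNK_SIZE = 4096          # illustrative fixed chunk size (4 KB)
known_chunks = set()       # fingerprints of chunks already held in the cloud store

def upload_file(path: str):
    """Split a file into fixed-size chunks and upload only the unseen ones."""
    uploaded, skipped = 0, 0
    with open(path, "rb") as f:
        while chunk := f.read(CHUNK_SIZE):
            fp = hashlib.sha256(chunk).hexdigest()
            if fp in known_chunks:
                skipped += 1           # duplicate chunk: keep a reference only
            else:
                known_chunks.add(fp)   # new chunk: (encrypt and) store it
                uploaded += 1
    return uploaded, skipped

# Toy demo: a file of three identical chunks, uploaded twice.
with open("demo.bin", "wb") as f:
    f.write(b"A" * CHUNK_SIZE * 3)
print(upload_file("demo.bin"))   # (1, 2): identical chunks are stored once
print(upload_file("demo.bin"))   # (0, 3): everything is already known
```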


The enormous growth of digital data, especially data in unstructured formats, has brought a tremendous challenge to data analysis as well as to data storage systems, driving up the cost and straining the performance of backup systems. Traditional systems do not provide any optimization technique to keep duplicated data from being backed up. Deduplication of data has become an essential and cost-effective capacity-optimization technique that removes redundant data. The following paper reviews the deduplication process, the types of deduplication, and the techniques available for data deduplication. In addition, many approaches proposed by various researchers for deduplication in big data storage systems are studied and compared.


2019 ◽  
Vol 8 (4) ◽  
pp. 2329-2333

Frequently, real-world entities have two or more representations in databases. Duplicate records do not share a common key and/or they contain errors, which makes duplicate matching a difficult task. Errors are introduced as the result of transcription mistakes, incomplete information, a lack of standard formats, or any combination of these factors. In big data storage the data is extremely large, and storing it efficiently is a difficult task. To address this problem, the Hadoop tool provides HDFS, which manages data by maintaining replication of the data, but this increases duplication. In our proposed method, the big data stream is given to a fixed-size chunking algorithm to create fixed-size chunks. In this manuscript, we present a comprehensive study of the literature on crowdsourcing-based big data deduplication techniques. In our method, the map-reduce output is generated and the MapReduce model is then applied to find out whether the hash values are duplicates or not. To recognize duplicate hash values, the MapReduce model compares these hash values with the hash values already stored in the big data storage space. If a hash value is already present in the big data storage space, it is identified as a duplicate. If the hash values are duplicated, the data is not stored in the Hadoop Distributed File System (HDFS); otherwise, the data is stored in HDFS. We also cover various deduplication techniques for crowdsourced data.
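The duplicate check described above can be pictured with a toy, single-process simulation of the map and reduce phases; this is not actual Hadoop code, and the MD5 choice, the chunk contents, and the in-memory grouping are all simplifying assumptions.

```python
import hashlib
from collections import defaultdict

def map_phase(chunks):
    """Map: emit (hash, 1) for every fixed-size chunk."""
    for chunk in chunks:
        yield hashlib.md5(chunk).hexdigest(), 1

def reduce_phase(pairs):
    """Reduce: group by hash; any hash seen more than once is a duplicate."""
    counts = defaultdict(int)
    for h, c in pairs:
        counts[h] += c
    return {h: ("duplicate" if c > 1 else "store") for h, c in counts.items()}

chunks = [b"block-1", b"block-2", b"block-1"]        # toy fixed-size chunks
decisions = reduce_phase(map_phase(chunks))
for h, action in decisions.items():
    print(h[:8], action)   # duplicates are skipped; new chunks go to HDFS
```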

