Towards Efficient Big Data Storage With MapReduce Deduplication System

Author(s): Vijesh Joe, Jennifer S. Raj, Smys S.

In the big data era, demand for data storage and processing is high. Conventional approaches struggle at this scale, and de-duplication is an effective way to reduce storage space and computation time. Many existing approaches take a long time to identify similar data. A MapReduce de-duplication system is proposed to attain a high de-duplication ratio. MapReduce is a parallel processing approach that helps process a large number of files in less time. The proposed system uses the two threshold two divisor with switch (TTTD-S) algorithm for chunking. Switch is the average parameter used by TTTD-S to minimize chunk size variance. SHA-3 hashing and fractal tree indexing are used. In a fractal tree index, reads and writes take place at the same time. The evaluation parameters are data size after de-duplication, de-duplication ratio, throughput, hash time, chunk time, and de-duplication time. The performance of the system is tested on the College Scorecard and ZCTA datasets. The experimental results show that the proposed system can reduce duplication and processing time.
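
The pipeline described in the abstract (chunk, hash, look up in an index) can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: it uses plain two threshold two divisor chunking without the switch step (whose details the abstract does not give), a Gear-style rolling fingerprint chosen here for brevity, an in-memory dict as a stand-in for the fractal tree index, and a sequential loop where a MapReduce job would presumably process files in parallel. All size and divisor values and all function names are hypothetical, and the ratio is assumed to mean original size divided by stored size.

```python
import hashlib

# Hypothetical parameters for illustration; the abstract gives no values.
MIN_SIZE, MAX_SIZE = 64, 512            # the two thresholds
MAIN_DIVISOR, BACKUP_DIVISOR = 128, 64  # the two divisors

# Deterministic per-byte table for a Gear-style rolling fingerprint
# (an assumption; the abstract does not name the fingerprint function).
GEAR = [int.from_bytes(hashlib.sha3_256(bytes([i])).digest()[:4], "big")
        for i in range(256)]


def tttd_chunks(data: bytes):
    """Yield content-defined chunks (plain TTTD, switch step omitted)."""
    start, fp, backup = 0, 0, -1
    for i, byte in enumerate(data):
        fp = ((fp << 1) + GEAR[byte]) & 0xFFFFFFFF
        length = i - start + 1
        if length < MIN_SIZE:
            continue  # never cut before the minimum threshold
        if fp % BACKUP_DIVISOR == BACKUP_DIVISOR - 1:
            backup = i  # remember a fallback breakpoint
        cut = -1
        if fp % MAIN_DIVISOR == MAIN_DIVISOR - 1:
            cut = i  # main divisor matched: cut here
        elif length >= MAX_SIZE:
            # Maximum threshold reached: prefer the fallback breakpoint.
            cut = backup if backup >= 0 else i
        if cut >= 0:
            yield data[start:cut + 1]
            start, backup = cut + 1, -1
    if start < len(data):
        yield data[start:]  # trailing chunk, may be shorter than MIN_SIZE


def deduplicate(files):
    """Chunk and hash each file; store each distinct chunk only once."""
    index = {}    # digest -> chunk; stand-in for the fractal tree index
    recipes = []  # per-file digest lists, enough to rebuild each file
    total = 0
    for data in files:  # a MapReduce job would run this per-file step in parallel
        recipe = []
        for chunk in tttd_chunks(data):
            digest = hashlib.sha3_256(chunk).hexdigest()
            index.setdefault(digest, chunk)
            recipe.append(digest)
            total += len(chunk)
        recipes.append(recipe)
    stored = sum(len(chunk) for chunk in index.values())
    return index, recipes, total, stored


if __name__ == "__main__":
    sample = bytes(range(256)) * 20
    _, _, total, stored = deduplicate([sample, sample])
    print(f"de-duplication ratio: {total / stored:.2f}")
```

Because chunk boundaries depend on content rather than fixed offsets, two copies of the same data produce the same chunks and digests, so the second copy adds nothing to the index.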

2015, Vol 12 (6), pp. 106-115
Author(s): Hongbing Cheng, Chunming Rong, Kai Hwang, Weihong Wang, Yanyan Li

2019, Vol 15 (4), pp. 2338-2348
Author(s): Amritpal Singh, Sahil Garg, Kuljeet Kaur, Shalini Batra, Neeraj Kumar, ...

2015, Vol 50, pp. 264-269
Author(s): Thirumalaisamy Ragunathan, Sudheer Kumar Battula, Rathnamma Gopisetty, B. RangaSwamy, N. Geethanjali

Author(s): Anupama C. Raman

Unstructured data is growing exponentially. Present-day storage infrastructures such as Storage Area Networks and Network Attached Storage are not well suited to storing huge volumes of unstructured data. This has led to the development of new types of storage technologies, such as object-based storage. Huge amounts of both structured and unstructured data that need to be made available in real time for analytical insights are referred to as Big Data. Because of the distinct nature of big data, the storage infrastructures that hold it should possess some specific features. In this chapter, the authors examine the storage technology options available today and their suitability for storing big data. This chapter also provides a bird's eye view of cloud storage technology, which is widely used for big data storage.

