Towards Efficient Big Data Storage With MapReduce Deduplication System

In the big data era, demand for data storage and processing is high. Conventional approaches struggle to keep up, and de-duplication is an effective way to reduce both storage space and computation time. Many existing approaches, however, take a long time to identify similar data. A MapReduce de-duplication system is proposed to attain a high de-duplication ratio. MapReduce is a parallel processing approach that helps process a large number of files in less time. The proposed system uses the two threshold two divisor with switch (TTTD-S) algorithm for chunking. The switch is the average parameter used by TTTD-S to minimize chunk size variance. The system uses SHA-3 for hashing and fractal tree indexing; in a fractal tree index, reads and writes take place at the same time. The evaluation parameters are data size after de-duplication, de-duplication ratio, throughput, hash time, chunk time, and de-duplication time. The performance of the system is tested on the College Scorecard and ZCTA datasets. The experimental results show that the proposed system can reduce duplication and processing time.
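The chunk, hash, and index steps named in the abstract can be sketched roughly as follows. This is a simplified, single-process illustration and not the authors' implementation: the function names, the parameter values (`t_min`, `t_max`, `d_main`, `d_backup`, `switch_at`, `window`), the use of CRC32 as a stand-in for a rolling hash, and the halving of both divisors once a chunk passes the switch point are all assumptions. A plain `dict` stands in for the fractal tree index, and the MapReduce parallelism is omitted.

```python
import hashlib
import zlib


def tttd_s_chunks(data, t_min=64, t_max=512, d_main=128, d_backup=64,
                  switch_at=256, window=16):
    """Split data into variable-size chunks (TTTD-S-style sketch).

    Assumes t_min >= window so the hash window never reaches back
    past the start of the current chunk.
    """
    chunks = []
    start, n = 0, len(data)
    while start < n:
        end = min(start + t_max, n)
        cut, backup = None, None
        # Positions shorter than t_min are skipped (first threshold).
        for i in range(start + t_min - 1, end):
            length = i - start + 1
            # Assumed switch rule: halve both divisors past switch_at,
            # which makes a breakpoint easier to find.
            switched = length >= switch_at
            d = d_main // 2 if switched else d_main
            db = d_backup // 2 if switched else d_backup
            # CRC32 over the trailing window stands in for a rolling hash.
            h = zlib.crc32(data[i - window + 1:i + 1])
            if h % db == db - 1:
                backup = i + 1      # remember a fallback breakpoint
            if h % d == d - 1:
                cut = i + 1         # main divisor matched: cut here
                break
        if cut is None:
            hit_max = (end - start) == t_max
            # At t_max (second threshold) fall back to the backup
            # breakpoint if one exists; otherwise cut at the end.
            cut = backup if (hit_max and backup is not None) else end
        chunks.append(data[start:cut])
        start = cut
    return chunks


def deduplicate(data):
    """Return (index, recipe): unique chunks by SHA-3 digest, plus the
    ordered list of digests needed to rebuild the input."""
    index = {}   # digest -> chunk bytes; stand-in for a fractal tree index
    recipe = []
    for chunk in tttd_s_chunks(data):
        digest = hashlib.sha3_256(chunk).hexdigest()
        if digest not in index:
            index[digest] = chunk
        recipe.append(digest)
    return index, recipe


def restore(index, recipe):
    """Rebuild the original bytes from the index and recipe."""
    return b"".join(index[d] for d in recipe)


if __name__ == "__main__":
    sample = bytes(range(256)) * 1024   # highly repetitive input
    idx, rec = deduplicate(sample)
    stored = sum(len(c) for c in idx.values())
    # De-duplication ratio taken here as original size / stored size.
    print(len(sample), stored, len(sample) / stored)
```

Because chunk boundaries depend on content rather than on fixed offsets, repeated regions tend to produce identical chunks, and each distinct chunk is stored only once. In a MapReduce setting the chunking and hashing would presumably run in the map phase and the index lookup in the reduce phase, but that split is a guess rather than something the abstract states.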

Big Data Storage Concepts

Big Data ◽

10.1002/9781119701859.ch2 ◽

2021 ◽

pp. 31-52

Keyword(s):

Big Data ◽

Data Storage ◽

Big Data Storage

Secure big data storage and sharing scheme for cloud tenants

China Communications ◽

10.1109/cc.2015.7122469 ◽

2015 ◽

Vol 12 (6) ◽

pp. 106-115 ◽

Cited By ~ 33

Author(s):

Hongbing Cheng ◽

Chunming Rong ◽

Kai Hwang ◽

Weihong Wang ◽

Yanyan Li

Keyword(s):

Big Data ◽

Data Storage ◽

Sharing Scheme ◽

Big Data Storage

Algorithm for fuzzy based compression of gray JPEG images for big data storage

2016 2nd International Conference on Contemporary Computing and Informatics (IC3I) ◽

10.1109/ic3i.2016.7918019 ◽

2016 ◽

Cited By ~ 2

Author(s):

Navneet Kaur ◽

Navneet Bawa

Keyword(s):

Big Data ◽

Data Storage ◽

Jpeg Images ◽

Big Data Storage

A Method of Data Integrity Check and Repair in Big Data Storage Platform

Bio-inspired Information and Communication Technologies - Lecture Notes of the Institute for Computer Sciences, Social Informatics and Telecommunications Engineering ◽

10.1007/978-3-030-57115-3_15 ◽

2020 ◽

pp. 183-188

Author(s):

Jiaxin Li ◽

Yun Liu ◽

Zhenjiang Zhang ◽

Han-Chieh Chao

Keyword(s):

Big Data ◽

Data Storage ◽

Data Integrity ◽

Big Data Storage ◽

Integrity Check

Entity and Relational Queries over Big Data Storage

10.31979/etd.5kh4-nepw ◽

2015 ◽

Author(s):

Nachappa Achakalera Ponnappa

Keyword(s):

Big Data ◽

Data Storage ◽

Big Data Storage

Research on Evaluation Method of Big Data Storage Utilization

2016 4th Intl Conf on Applied Computing and Information Technology/3rd Intl Conf on Computational Science/Intelligence and Applied Informatics/1st Intl Conf on Big Data, Cloud Computing, Data Science & Engineering (ACIT-CSII-BCD) ◽

10.1109/acit-csii-bcd.2016.077 ◽

2016 ◽

Author(s):

Yang Xiaoshan ◽

Zhu Ligu ◽

Zhang Qicong ◽

Feng Dongyu

Keyword(s):

Big Data ◽

Data Storage ◽

Evaluation Method ◽

Research On Evaluation ◽

Big Data Storage

Fuzzy-Folded Bloom Filter-as-a-Service for Big Data Storage in the Cloud

IEEE Transactions on Industrial Informatics ◽

10.1109/tii.2018.2850053 ◽

2019 ◽

Vol 15 (4) ◽

pp. 2338-2348 ◽

Cited By ~ 6

Author(s):

Amritpal Singh ◽

Sahil Garg ◽

Kuljeet Kaur ◽

Shalini Batra ◽

Neeraj Kumar ◽

...

Keyword(s):

Big Data ◽

Data Storage ◽

Bloom Filter ◽

Big Data Storage

Novel Read Algorithms for Improving the Performance of Big Data Storage Systems

Procedia Computer Science ◽

10.1016/j.procs.2015.04.050 ◽

2015 ◽

Vol 50 ◽

pp. 264-269

Author(s):

Thirumalaisamy Ragunathan ◽

Sudheer Kumar Battula ◽

Rathnamma Gopisetty ◽

B. RangaSwamy ◽

N. Geethanjali

Keyword(s):

Big Data ◽

Data Storage ◽

Storage Systems ◽

Big Data Storage

Research of Big Data Storage System Based on Underground Space Information

10.1145/3491396.3506516 ◽

2021 ◽

Author(s):

Chunxiao Wang ◽

Zhigang Zhao ◽

Jian Zhang ◽

Jidong Huo

Keyword(s):

Big Data ◽

Data Storage ◽

Storage System ◽

Underground Space ◽

Data Storage System ◽

Big Data Storage

Storage Infrastructure for Big Data and Cloud

Advances in Data Mining and Database Management - Handbook of Research on Cloud Infrastructures for Big Data Analytics ◽

10.4018/978-1-4666-5864-6.ch005 ◽

2014 ◽

pp. 110-128 ◽

Cited By ~ 3

Author(s):

Anupama C. Raman

Keyword(s):

Big Data ◽

Real Time ◽

Data Storage ◽

Unstructured Data ◽

Storage Area Networks ◽

Storage Technology ◽

Object Based ◽

Storage Area ◽

Storage Technologies ◽

Big Data Storage

Unstructured data is growing exponentially. Present-day storage infrastructures such as Storage Area Networks and Network Attached Storage are not well suited to storing huge volumes of unstructured data. This has led to the development of new types of storage technologies, such as object-based storage. Big Data refers to huge amounts of both structured and unstructured data that need to be made available in real time for analytical insights. Because of the distinct nature of big data, the infrastructures that store it should possess some specific features. In this chapter, the authors examine the storage technology options available today and their suitability for storing big data. The chapter also provides a bird's-eye view of cloud storage technology, which is widely used for big data storage.
