Analysis and Experimental Study of HDFS Performance

TEM Journal ◽  
2021 ◽  
pp. 806-814
Author(s):  
Yordan Kalmukov ◽  
Milko Marinov ◽  
Tsvetelina Mladenova ◽  
Irena Valova

In the age of big data, the amount of data that people generate and use on a daily basis has far exceeded the storage and processing capabilities of a single computer system. That motivates the use of distributed big data storage and processing systems such as Hadoop, which provides a reliable, horizontally scalable, fault-tolerant and efficient service based on the Hadoop Distributed File System (HDFS) and MapReduce. The purpose of this research is to experimentally determine whether (and to what extent) the network communication speed, the file replication factor, the files' sizes and their number, and the location of the HDFS client influence the performance of the HDFS read/write operations.
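The replication factor studied above directly determines how much raw disk space a file consumes: HDFS splits a file into fixed-size blocks and stores a copy of each block on several DataNodes. As a minimal illustrative sketch (not part of the study's methodology; the 128 MiB block size and replication factor of 3 are common HDFS defaults, assumed here):

```python
def hdfs_storage_footprint(file_size_bytes: int, replication: int = 3,
                           block_size: int = 128 * 1024 * 1024) -> dict:
    """Estimate how a file occupies HDFS storage (illustrative only).

    HDFS splits a file into fixed-size blocks and keeps `replication`
    copies of each block across the cluster's DataNodes.
    """
    num_blocks = -(-file_size_bytes // block_size)  # ceiling division
    return {
        "blocks": num_blocks,
        "raw_bytes_used": file_size_bytes * replication,
    }

# A 1 GiB file with the default replication factor of 3
# occupies 3 GiB of raw cluster capacity across 8 blocks:
footprint = hdfs_storage_footprint(1024 ** 3)
print(footprint)  # {'blocks': 8, 'raw_bytes_used': 3221225472}
```

This is why the replication factor is a natural experimental variable: raising it improves fault tolerance and read locality but multiplies both the raw storage used and the write traffic in the replication pipeline.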

Apache Hadoop is a free, open-source Java framework maintained by the Apache Software Foundation. It stores large amounts of data efficiently and at low cost. Hadoop has two core components: HDFS (Hadoop Distributed File System) and MapReduce. HDFS is a highly fault-tolerant file system designed to be deployed on low-cost commodity hardware, and it provides high-speed access to application data. The Hadoop architecture is cluster-based and consists of two node types, the NameNode and the DataNodes; the DataNodes periodically send a heartbeat to the NameNode to coordinate data storage across the distributed file system, while MapReduce processes the distributed data within the cluster. Hadoop thus plays an important role wherever large quantities of data must be stored in a distributed file structure: it maintains large-volume storage and replicates data to provide reliability and to support recovery of big data for analysis and prediction.


2017 ◽  
Vol 8 (4) ◽  
pp. 31-44 ◽  
Author(s):  
Houcine Matallah ◽  
Ghalem Belalem ◽  
Karim Bouamrane

The technological revolution integrating multiple information sources, and the extension of computer science into different sectors, have led to an explosion in data quantities, reflected in the scaling of volumes, numbers and types. These massive increases have driven the development of new techniques for locating and accessing data, and the latest steps in this evolution have produced new technologies: Cloud and Big Data. The reference implementation for Cloud and Big Data storage is incontestably the Hadoop Distributed File System (HDFS). The latter is based on the separation of metadata from data, i.e. the centralization of metadata and its isolation from the storage servers. In this paper, the authors propose an approach to improve the metadata service of Hadoop, maintaining consistency without unduly compromising performance and scalability, by suggesting a mixed solution between centralization and distribution of metadata.
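The mixed centralization/distribution idea can be illustrated by partitioning the metadata namespace across several metadata servers, for instance by hashing each file path to a server. This is a hedged sketch of the general technique, not the authors' actual design; the function name and the choice of SHA-256 are assumptions for the example:

```python
import hashlib

def metadata_server_for(path: str, num_servers: int) -> int:
    """Map a file path to one of `num_servers` metadata servers by hashing.

    Distributing metadata this way spreads lookup load across servers,
    at the cost of the single global view that a fully centralized
    NameNode provides -- the trade-off the paper's mixed approach targets.
    """
    digest = hashlib.sha256(path.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_servers

# Every lookup of the same path deterministically lands on the same server:
server = metadata_server_for("/logs/2021/app.log", 4)
assert server == metadata_server_for("/logs/2021/app.log", 4)
```

A fixed modulo scheme like this reshuffles most paths whenever `num_servers` changes, which is why practical designs typically prefer consistent hashing or directory-subtree partitioning.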


The Hadoop Distributed File System (HDFS) and MapReduce (MR) are the key aspects of the Hadoop framework. Big data scenarios such as Facebook (FB) data processing, or Twitter analytics in which tweets are stored and processed, depend on the Hadoop framework for storage and processing, on top of which further analytics can be performed. The point here is that processing such huge amounts of data inevitably leads to high space and time consumption within the Hadoop framework. The problem is therefore twofold: large amounts of storage are used and, at the same time, processing times are high, and both need to be reduced to obtain faster responses from the framework. This effort matters because all the other ecosystem tools also depend on HDFS and MR for data storage and processing, so an alternative architecture that improves space usage and resource utilization would reduce the framework's time requirements across the board. The outcome of this work is faster data processing and lower space utilization for MR jobs as well as for ecosystem tools such as Hive, Flume, Sqoop and Pig Latin. The work proposes an alternative to HDFS and MR, which we name the Unified Space Allocation and Data Processing with Metadata based Distributed File System (USAMDFS).


2019 ◽  
Vol 8 (4) ◽  
pp. 2329-2333

Frequently, in reality, entities have two or more representations in databases. Duplicate records do not share a common key and often contain errors, which makes duplicate matching a difficult task. Errors are introduced as the result of transcription mistakes, incomplete information, a lack of standard formats, or any combination of these factors. In big data storage the data is extremely large, and storing it efficiently is a difficult task. To address this, the Hadoop framework provides HDFS, which manages data by maintaining replicas, but this in turn increases duplication. In our proposed method, the big data stream is fed to a fixed-size chunking algorithm to produce fixed-size chunks. In this manuscript, we present an exhaustive survey of the literature on crowdsourcing-based big data deduplication techniques. Our method first generates the map-reduce result, after which the MapReduce model is applied to compute hash values and determine whether they are duplicates. To recognize duplicate hash values, the MapReduce model compares these hash values with the hash values already kept in the big data storage space. If a hash value is already present in the big data storage space, the corresponding chunk is identified as a duplicate. If the hash value is a duplicate, the data is not stored in the Hadoop Distributed File System (HDFS); otherwise, the data is stored in HDFS. We additionally cover various deduplication techniques for crowdsourced data.
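The fixed-size chunking and hash-comparison steps described above can be sketched as follows. This is an illustrative single-machine Python version; the described scheme runs the comparison as a MapReduce job against hashes kept in big data storage, and the function names and SHA-256 choice here are assumptions for the example:

```python
import hashlib

def fixed_size_chunks(data: bytes, chunk_size: int = 4096):
    """Split the incoming data stream into fixed-size chunks."""
    for offset in range(0, len(data), chunk_size):
        yield data[offset:offset + chunk_size]

def deduplicate(data: bytes, stored_hashes: set[bytes],
                chunk_size: int = 4096) -> list[bytes]:
    """Return only the chunks whose hashes are not already stored.

    Mirrors the described scheme: hash each fixed-size chunk, compare
    the hash against those already kept in storage, and keep a chunk
    for writing to HDFS only when its hash is new.
    """
    new_chunks = []
    for chunk in fixed_size_chunks(data, chunk_size):
        h = hashlib.sha256(chunk).digest()
        if h not in stored_hashes:       # not a duplicate -> store it
            stored_hashes.add(h)
            new_chunks.append(chunk)
    return new_chunks

stored: set[bytes] = set()
first = deduplicate(b"A" * 8192, stored)  # two identical 4 KiB chunks
print(len(first))  # 1 -- the second chunk duplicates the first
```

With fixed-size chunking, identical chunks are detected only when their boundaries align, which is the usual motivation for the content-defined chunking variants also surveyed in deduplication literature.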


2015 ◽  
Vol 12 (6) ◽  
pp. 106-115 ◽  
Author(s):  
Hongbing Cheng ◽  
Chunming Rong ◽  
Kai Hwang ◽  
Weihong Wang ◽  
Yanyan Li
