Analysis and Experimental Study of HDFS Performance

TEM Journal ◽  
2021 ◽  
pp. 806-814
Author(s):  
Yordan Kalmukov ◽  
Milko Marinov ◽  
Tsvetelina Mladenova ◽  
Irena Valova

In the age of big data, the amount of data that people generate and use on a daily basis has far exceeded the storage and processing capabilities of a single computer system. That motivates the use of distributed big data storage and processing systems such as Hadoop, which provides a reliable, horizontally scalable, fault-tolerant and efficient service based on the Hadoop Distributed File System (HDFS) and MapReduce. The purpose of this research is to experimentally determine whether (and to what extent) the network communication speed, the file replication factor, the files' sizes and their number, and the location of the HDFS client influence the performance of the HDFS read/write operations.
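The replication factor studied above directly determines how much raw disk space a file consumes: HDFS splits a file into fixed-size blocks and stores a copy of each block on several DataNodes. As a minimal illustrative sketch (not part of the study's methodology; the 128 MiB block size and replication factor of 3 are common HDFS defaults, assumed here):

```python
def hdfs_storage_footprint(file_size_bytes: int, replication: int = 3,
                           block_size: int = 128 * 1024 * 1024) -> dict:
    """Estimate how a file occupies HDFS storage (illustrative only).

    HDFS splits a file into fixed-size blocks and keeps `replication`
    copies of each block across the cluster's DataNodes.
    """
    num_blocks = -(-file_size_bytes // block_size)  # ceiling division
    return {
        "blocks": num_blocks,
        "raw_bytes_used": file_size_bytes * replication,
    }

# A 1 GiB file with the default replication factor of 3
# occupies 3 GiB of raw cluster capacity across 8 blocks:
footprint = hdfs_storage_footprint(1024 ** 3)
print(footprint)  # {'blocks': 8, 'raw_bytes_used': 3221225472}
```

This is why the replication factor is a natural experimental variable: raising it improves fault tolerance and read locality but multiplies both the raw storage used and the write traffic in the replication pipeline.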

Apache Hadoop is a free, open-source Java framework maintained by the Apache Software Foundation. It stores large amounts of data efficiently and at low cost. Hadoop has two core components: HDFS (Hadoop Distributed File System) and MapReduce. HDFS is a highly fault-tolerant file system designed to be deployed on low-cost commodity hardware, and it provides high-speed access to application data. The Hadoop architecture is cluster-based and consists of two node types, the NameNode and the DataNodes; the DataNodes periodically send a heartbeat to the NameNode to coordinate data storage across the distributed file system, while MapReduce processes the distributed data within the cluster. Hadoop thus plays an important role wherever large quantities of data must be stored in a distributed file structure: it maintains large-volume storage and replicates data to provide reliability and to support recovery of big data for analysis and prediction.


2017 ◽  
Vol 8 (4) ◽  
pp. 31-44 ◽  
Author(s):  
Houcine Matallah ◽  
Ghalem Belalem ◽  
Karim Bouamrane

The technological revolution integrating multiple information sources, and the extension of computer science into different sectors, have led to an explosion in data quantities, reflected in the scaling of volumes, numbers and types. These massive increases have driven the development of new techniques for locating and accessing data, and the latest steps in this evolution have produced new technologies: Cloud and Big Data. The reference implementation for Cloud and Big Data storage is incontestably the Hadoop Distributed File System (HDFS). The latter is based on the separation of metadata from data, i.e. the centralization of metadata and its isolation from the storage servers. In this paper, the authors propose an approach to improve the metadata service of Hadoop, maintaining consistency without unduly compromising performance and scalability, by suggesting a mixed solution between centralization and distribution of metadata.
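The mixed centralization/distribution idea can be illustrated by partitioning the metadata namespace across several metadata servers, for instance by hashing each file path to a server. This is a hedged sketch of the general technique, not the authors' actual design; the function name and the choice of SHA-256 are assumptions for the example:

```python
import hashlib

def metadata_server_for(path: str, num_servers: int) -> int:
    """Map a file path to one of `num_servers` metadata servers by hashing.

    Distributing metadata this way spreads lookup load across servers,
    at the cost of the single global view that a fully centralized
    NameNode provides -- the trade-off the paper's mixed approach targets.
    """
    digest = hashlib.sha256(path.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_servers

# Every lookup of the same path deterministically lands on the same server:
server = metadata_server_for("/logs/2021/app.log", 4)
assert server == metadata_server_for("/logs/2021/app.log", 4)
```

A fixed modulo scheme like this reshuffles most paths whenever `num_servers` changes, which is why practical designs typically prefer consistent hashing or directory-subtree partitioning.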


The Hadoop Distributed File System (HDFS) and MapReduce (MR) are the key aspects of the Hadoop framework. Big data scenarios such as Facebook (FB) data processing, or Twitter analytics in which tweets are stored and processed, depend on the Hadoop framework for storage and processing, on top of which further analytics can be performed. The point here is that processing such huge amounts of data inevitably leads to high space and time consumption within the Hadoop framework. The problem is therefore twofold: large amounts of storage are used and, at the same time, processing times are high, and both need to be reduced to obtain faster responses from the framework. This effort matters because all the other ecosystem tools also depend on HDFS and MR for data storage and processing, so an alternative architecture that improves space usage and resource utilization would reduce the framework's time requirements across the board. The outcome of this work is faster data processing and lower space utilization for MR jobs as well as for ecosystem tools such as Hive, Flume, Sqoop and Pig Latin. The work proposes an alternative to HDFS and MR, which we name the Unified Space Allocation and Data Processing with Metadata based Distributed File System (USAMDFS).


2019 ◽  
Vol 8 (4) ◽  
pp. 2329-2333

Frequently, in reality, entities have two or more representations in databases. Duplicate records do not share a common key and often contain errors, which makes duplicate matching a difficult task. Errors are introduced as the result of transcription mistakes, incomplete information, a lack of standard formats, or any combination of these factors. In big data storage the data is extremely large, and storing it efficiently is a difficult task. To address this, the Hadoop framework provides HDFS, which manages data by maintaining replicas, but this in turn increases duplication. In our proposed method, the big data stream is fed to a fixed-size chunking algorithm to produce fixed-size chunks. In this manuscript, we present an exhaustive survey of the literature on crowdsourcing-based big data deduplication techniques. Our method first generates the map-reduce result, after which the MapReduce model is applied to compute hash values and determine whether they are duplicates. To recognize duplicate hash values, the MapReduce model compares these hash values with the hash values already kept in the big data storage space. If a hash value is already present in the big data storage space, the corresponding chunk is identified as a duplicate. If the hash value is a duplicate, the data is not stored in the Hadoop Distributed File System (HDFS); otherwise, the data is stored in HDFS. We additionally cover various deduplication techniques for crowdsourced data.
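The fixed-size chunking and hash-comparison steps described above can be sketched as follows. This is an illustrative single-machine Python version; the described scheme runs the comparison as a MapReduce job against hashes kept in big data storage, and the function names and SHA-256 choice here are assumptions for the example:

```python
import hashlib

def fixed_size_chunks(data: bytes, chunk_size: int = 4096):
    """Split the incoming data stream into fixed-size chunks."""
    for offset in range(0, len(data), chunk_size):
        yield data[offset:offset + chunk_size]

def deduplicate(data: bytes, stored_hashes: set[bytes],
                chunk_size: int = 4096) -> list[bytes]:
    """Return only the chunks whose hashes are not already stored.

    Mirrors the described scheme: hash each fixed-size chunk, compare
    the hash against those already kept in storage, and keep a chunk
    for writing to HDFS only when its hash is new.
    """
    new_chunks = []
    for chunk in fixed_size_chunks(data, chunk_size):
        h = hashlib.sha256(chunk).digest()
        if h not in stored_hashes:       # not a duplicate -> store it
            stored_hashes.add(h)
            new_chunks.append(chunk)
    return new_chunks

stored: set[bytes] = set()
first = deduplicate(b"A" * 8192, stored)  # two identical 4 KiB chunks
print(len(first))  # 1 -- the second chunk duplicates the first
```

With fixed-size chunking, identical chunks are detected only when their boundaries align, which is the usual motivation for the content-defined chunking variants also surveyed in deduplication literature.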


2015 ◽  
Vol 12 (6) ◽  
pp. 106-115 ◽  
Author(s):  
Hongbing Cheng ◽  
Chunming Rong ◽  
Kai Hwang ◽  
Weihong Wang ◽  
Yanyan Li
