A noble approach to develop dynamically scalable namenode in hadoop distributed file system using secondary storage

Tumpa Rani Shaha; Md. Nasim Akhtar; Fatema Tuj Johora; Md. Zakir Hossain; Mostafijur Rahman; R. B. Ahmad

doi:10.11591/ijeecs.v13.i2.pp729-736

A noble approach to develop dynamically scalable namenode in hadoop distributed file system using secondary storage

Indonesian Journal of Electrical Engineering and Computer Science ◽

10.11591/ijeecs.v13.i2.pp729-736 ◽

2019 ◽

Vol 13 (2) ◽

pp. 729

Author(s):

Tumpa Rani Shaha ◽

Md. Nasim Akhtar ◽

Fatema Tuj Johora ◽

Md. Zakir Hossain ◽

Mostafijur Rahman ◽

...

Keyword(s):

Data Storage ◽

File System ◽

Secondary Data ◽

Threshold Value ◽

Main Memory ◽

Distributed File System ◽

Access Time ◽

Secondary Memory ◽

Application Data ◽

Time Utilization

For scalable data storage, Hadoop is widely used nowadays. It provides a distributed file system that stores data on the compute nodes. Basically, it represents a master/slave architecture that consists of a NameNode and copious Data Nodes. Data Nodes contain application data and metadata of application data resides in the Main Memory of NameNode. In cached approach, they fragment the metadata depending on the last access time and move the least frequently used data to secondary memory. If the requested data is not found in main memory then the secondary data will be loaded again on the RAM. So when the secondary data reloads to the primary memory then the NameNode main memory limitation arises again. The focus of this research is to reduce the namespace problem of main memory and to make the system dynamically scalable. A new Metadata Fragmentation Algorithm is proposed that separates the metadata list of NameNode dynamically. The NameNode creates Secondary Memory File in perspective of the threshold value and allocates secondary memory location based on the requirement. According to the proposed algorithm the maximum third, out of fourth of main memory is used at the secondary file caching time. The free space aids in faster operation by Dynamically Scalable NameNode approach. This proposed algorithm shows that the space utilization is increased to 17% and time utilization is increased to 0.0005% with the comparison of the existing fragmentation algorithm.For scalable data storage, Hadoop is widely used nowadays. It provides a distributed file system that stores data on the compute nodes. Basically, it represents a master/slave architecture that consists of a NameNode and copious Data Nodes. Data Nodes contain application data and metadata of application data resides in the Main Memory of NameNode. In cached approach, they fragment the metadata depending on the last access time and move the least frequently used data to secondary memory. If the requested data is not found in main memory then the secondary data will be loaded again on the RAM. So when the secondary data reloads to the primary memory then the NameNode main memory limitation arises again. The focus of this research is to reduce the namespace problem of main memory and to make the system dynamically scalable. A new Metadata Fragmentation Algorithm is proposed that separates the metadata list of NameNode dynamically. The NameNode creates Secondary Memory File in perspective of the threshold value and allocates secondary memory location based on the requirement. According to the proposed algorithm the maximum third, out of fourth of main memory is used at the secondary file caching time. The free space aids in faster operation by Dynamically Scalable NameNode approach. This proposed algorithm shows that the space utilization is increased to 17% and time utilization is increased to 0.0005% with the comparison of the existing fragmentation algorithm.For scalable data storage, Hadoop is widely used nowadays. It provides a distributed file system that stores data on the compute nodes. Basically, it represents a master/slave architecture that consists of a NameNode and copious Data Nodes. Data Nodes contain application data and metadata of application data resides in the Main Memory of NameNode. In cached approach, they fragment the metadata depending on the last access time and move the least frequently used data to secondary memory. If the requested data is not found in main memory then the secondary data will be loaded again on the RAM. So when the secondary data reloads to the primary memory then the NameNode main memory limitation arises again. The focus of this research is to reduce the namespace problem of main memory and to make the system dynamically scalable. A new Metadata Fragmentation Algorithm is proposed that separates the metadata list of NameNode dynamically. The NameNode creates Secondary Memory File in perspective of the threshold value and allocates secondary memory location based on the requirement. According to the proposed algorithm the maximum third, out of fourth of main memory is used at the secondary file caching time. The free space aids in faster operation by Dynamically Scalable NameNode approach. This proposed algorithm shows that the space utilization is increased to 17% and time utilization is increased to 0.0005% with the comparison of the existing fragmentation algorithm.

Download Full-text

Modeling of distributed file System in big data storage by event- B

MATEC Web of Conferences ◽

10.1051/matecconf/201821004042 ◽

2018 ◽

Vol 210 ◽

pp. 04042

Author(s):

Ammar Alhaj Ali ◽

Pavel Varacha ◽

Said Krayem ◽

Roman Jasek ◽

Petr Zacek ◽

...

Keyword(s):

Big Data ◽

Data Storage ◽

High Performance ◽

File System ◽

Formal Method ◽

File Systems ◽

Distributed File System ◽

Distributed File Systems ◽

Data Systems ◽

Big Data Systems

Nowadays, a wide set of systems and application, especially in high performance computing, depends on distributed environments to process and analyses huge amounts of data. As we know, the amount of data increases enormously, and the goal to provide and develop efficient, scalable and reliable storage solutions has become one of the major issue for scientific computing. The storage solution used by big data systems is Distributed File Systems (DFSs), where DFS is used to build a hierarchical and unified view of multiple file servers and shares on the network. In this paper we will offer Hadoop Distributed File System (HDFS) as DFS in big data systems and we will present an Event-B as formal method that can be used in modeling, where Event-B is a mature formal method which has been widely used in a number of industry projects in a number of domains, such as automotive, transportation, space, business information, medical device and so on, And will propose using the Rodin as modeling tool for Event-B, which integrates modeling and proving as well as the Rodin platform is open source, so it supports a large number of plug-in tools.

Download Full-text

An Energy-Efficient and Fast Scheme for Hybrid Storage Class Memory in an AIoT Terminal System

Electronics ◽

10.3390/electronics9061013 ◽

2020 ◽

Vol 9 (6) ◽

pp. 1013

Author(s):

Hao Sun ◽

Lan Chen ◽

Xiaoran Hao ◽

Chenji Liu ◽

Mao Ni

Keyword(s):

Energy Consumption ◽

Data Storage ◽

File System ◽

Random Access ◽

Memory System ◽

Main Memory ◽

Fast Mode ◽

Hybrid Storage ◽

Storage Class Memory ◽

The Impact

Conventional main memory can no longer meet the requirements of low energy consumption and massive data storage in an artificial intelligence Internet of Things (AIoT) system. Moreover, the efficiency is decreased due to the swapping of data between the main memory and storage. This paper presents a hybrid storage class memory system to reduce the energy consumption and optimize IO performance. Phase change memory (PCM) brings the advantages of low static power and a large capacity to a hybrid memory system. In order to avoid the impact of poor write performance in PCM, a migration scheme implemented in the memory controller is proposed. By counting the write times and row buffer miss times in PCM simultaneously, the write-intensive data can be selected and migrated from PCM to dynamic random-access memory (DRAM) efficiently, which improves the performance of hybrid storage class memory. In addition, a fast mode with a tmpfs-based, in-memory file system is applied to hybrid storage class memory to reduce the number of data movements between memory and external storage. Experimental results show that the proposed system can reduce energy consumption by 46.2% on average compared with the traditional DRAM-only system. The fast mode increases the IO performance of the system by more than 30 times compared with the common ext3 file system.

Download Full-text

Data Storage Technology and its Development Based on Cloud Computing

Advanced Materials Research ◽

10.4028/www.scientific.net/amr.756-759.1275 ◽

2013 ◽

Vol 756-759 ◽

pp. 1275-1279

Author(s):

Lin Na Huang ◽

Feng Hua Liu

Keyword(s):

Cloud Computing ◽

Data Storage ◽

Cloud Storage ◽

High Performance ◽

File System ◽

Storage System ◽

Distributed File System ◽

Cloud Data ◽

Storage Technology ◽

Cloud Data Storage

Cloud storage of high performance is the basic condition for cloud computing. This article introduces the concept and advantage of cloud storage, discusses the infrastructure of cloud storage system as well as the architecture of cloud data storage, researches the details about the design of Distributed File System within cloud data storage, at the same time, puts forward different developing strategies for the enterprises according to the different roles that the enterprises are acting as during the developing process of cloud computing.

Download Full-text

Research of Cloud Storage Based on Hadoop Distributed File System

Applied Mechanics and Materials ◽

10.4028/www.scientific.net/amm.513-517.2472 ◽

2014 ◽

Vol 513-517 ◽

pp. 2472-2475

Author(s):

Yong Qi Han ◽

Yun Zhang ◽

Shui Yu

Keyword(s):

Data Storage ◽

File System ◽

Multimedia Data ◽

Distributed File System ◽

Cloud Platform ◽

Computing Technology ◽

Training Video ◽

File Storage ◽

Remote Training ◽

Hadoop Distributed File System

This paper discusses the application of cloud computing technology to store large amount of data in agricultural remote training video and other multimedia data, using four computers to build a Hadoop cloud platform, focus on the Hadoop Distributed File System (HDFS) principle and file storage, to achieve massive agricultural multimedia data storage.

Download Full-text

Design of Space Remote Sensing Data Storage Platform Based on Distributed File System

Proceedings of International Conference on Artificial Life and Robotics ◽

10.5954/icarob.2020.os11-7 ◽

2020 ◽

Vol 25 ◽

pp. 560-563

Author(s):

Di Li ◽

Yizhun Peng ◽

Ruixiang Bai ◽

Zhenjiang Chen ◽

Lianchen Zhao

Keyword(s):

Remote Sensing ◽

Data Storage ◽

File System ◽

Remote Sensing Data ◽

Distributed File System ◽

Sensing Data

Download Full-text

Performance Study on Indexing and Accessing of Small File in Hadoop Distributed File System

Journal of Information & Knowledge Management ◽

10.1142/s0219649221500519 ◽

2021 ◽

pp. 2150051

Author(s):

Anisha P Rodrigues ◽

Roshan Fernandes ◽

P. Vijaya ◽

Satish Chander

Keyword(s):

Processing Time ◽

Performance Metrics ◽

File System ◽

Distributed File System ◽

Access Time ◽

Memory Usage ◽

File Access ◽

Sequence File ◽

Hadoop Distributed File System ◽

Small File

Hadoop Distributed File System (HDFS) is developed to efficiently store and handle the vast quantity of files in a distributed environment over a cluster of computers. Various commodity hardware forms the Hadoop cluster, which is inexpensive and easily available. The large number of small files stored in HDFS consumed more memory which lags the performance because small files consumed heavy load on NameNode. Thus, the efficiency of indexing and accessing the small files on HDFS is improved by several techniques, such as archive files, New Hadoop Archive (New HAR), CombineFileInputFormat (CFIF), and Sequence file generation. The archive file combines the small files into single blocks. The new HAR file combines the smaller files into a single large file. The CFIF module merges the multiple files into a single split using NameNode, and the sequence file combines all the small files into a single sequence. The indexing and accessing of a small file in HDFS are evaluated using performance metrics, such as processing time and memory usage. The experiment shows that the sequence file generation approach is efficient when compared to other approaches concerning file access time is 1.5[Formula: see text]s, memory usage is 20 KB in multi-node, and the processing time is 0.1[Formula: see text]s.

Download Full-text

Research on Power Big Data Storage Platform Based on Distributed File System

Advances in Intelligent, Interactive Systems and Applications - Advances in Intelligent Systems and Computing ◽

10.1007/978-3-030-02804-6_99 ◽

2019 ◽

pp. 760-767

Author(s):

Liu Fei ◽

Pang Hao-Yuan ◽

Zhang Yi-Ying ◽

Liang Kun ◽

He Ye-Shen ◽

...

Keyword(s):

Big Data ◽

Data Storage ◽

File System ◽

Distributed File System ◽

Big Data Storage

Download Full-text

Fast and failure-consistent updates of application data in non-volatile main memory file system

2016 32nd Symposium on Mass Storage Systems and Technologies (MSST) ◽

10.1109/msst.2016.7897078 ◽

2016 ◽

Cited By ~ 1

Author(s):

Jiaxin Ou ◽

Jiwu Shu

Keyword(s):

File System ◽

Main Memory ◽

Application Data

Download Full-text

A Comprehensive Survey for Hadoop Distributed File System

Asian Journal of Research in Computer Science ◽

10.9734/ajrcos/2021/v11i230260 ◽

2021 ◽

pp. 46-57

Author(s):

Karwan Jameel Merceedi ◽

Nareen Abdulla Sabry

Keyword(s):

Distributed Systems ◽

Data Storage ◽

File System ◽

Low Cost ◽

File Systems ◽

Cost Effective ◽

Distributed File System ◽

Software Frameworks ◽

Hadoop Distributed File System ◽

Basic Ideas

In the last few days, data and the internet have become increasingly growing, occurring in big data. For these problems, there are many software frameworks used to increase the performance of the distributed system. This software is used for available ample data storage. One of the most beneficial software frameworks used to utilize data in distributed systems is Hadoop. This software creates machine clustering and formatting the work between them. Hadoop consists of two major components: Hadoop Distributed File System (HDFS) and Map Reduce (MR). By Hadoop, we can process, count, and distribute each word in a large file and know the number of affecting for each of them. The HDFS is designed to effectively store and transmit colossal data sets to high-bandwidth user applications. The differences between this and other file systems provided are relevant. HDFS is intended for low-cost hardware and is exceptionally tolerant to defects. Thousands of computers in a vast cluster both have directly associated storage functions and user programmers. The resource scales with demand while being cost-effective in all sizes by distributing storage and calculation through numerous servers. Depending on the above characteristics of the HDFS, many researchers worked in this field trying to enhance the performance and efficiency of the addressed file system to be one of the most active cloud systems. This paper offers an adequate study to review the essential investigations as a trend beneficial for researchers wishing to operate in such a system. The basic ideas and features of the investigated experiments were taken into account to have a robust comparison, which simplifies the selection for future researchers in this subject. According to many authors, this paper will explain what Hadoop is and its architectures, how it works, and its performance analysis in a distributed systems. In addition, assessing each Writing and compare with each other.

Download Full-text

A Data Storage and Mining Algorithm Based on Distributed File System

Advances in Intelligent Systems and Computing - International Conference on Applications and Techniques in Cyber Security and Intelligence ATCI 2018 ◽

10.1007/978-3-319-98776-7_71 ◽

2018 ◽

pp. 634-638

Author(s):

Jian Nong

Keyword(s):

Data Storage ◽

File System ◽

Distributed File System ◽

Mining Algorithm

Download Full-text