Using Hadoop Distributed and Deduplicated File System (HD2FS) in Astronomy

2019 ◽  
Vol 15 (S367) ◽  
pp. 464-466
Author(s):  
Paul Bartus

Abstract: In recent years, the amount of data has skyrocketed; as a consequence, data has become more expensive to store than to generate. Storage needs for astronomical data are following the same trend. Storage systems in astronomy contain redundant copies of data, such as identical files or identical sub-file regions. We propose the use of the Hadoop Distributed and Deduplicated File System (HD2FS) in astronomy. HD2FS is a deduplication storage system created to improve data storage capacity and efficiency in distributed file systems without compromising input/output performance. HD2FS can be developed by modifying existing storage system environments such as the Hadoop Distributed File System. By taking advantage of deduplication technology, we can better manage the underlying redundancy of data in astronomy and reduce the space needed to store these files in the file system, thus allowing more capacity per volume.
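The abstract does not give implementation details, but the core idea of block-level deduplication can be sketched as content fingerprinting: split each file into fixed-size blocks, hash each block, and store a block only once no matter how many files contain it. A minimal Python sketch, with the block size and hash function as assumptions rather than details of HD2FS:

```python
import hashlib

BLOCK_SIZE = 4 * 1024 * 1024  # assumed 4 MiB blocks; HD2FS's actual block size is not stated


def deduplicated_store(paths):
    """Store each distinct block once; return the block store and per-file manifests."""
    store = {}      # fingerprint -> unique block bytes
    manifests = {}  # file path -> ordered list of fingerprints
    raw_bytes = 0
    for path in paths:
        fingerprints = []
        with open(path, "rb") as f:
            while True:
                block = f.read(BLOCK_SIZE)
                if not block:
                    break
                raw_bytes += len(block)
                fp = hashlib.sha256(block).hexdigest()
                # keep the block only once, however many files contain it
                store.setdefault(fp, block)
                fingerprints.append(fp)
        manifests[path] = fingerprints
    stored_bytes = sum(len(b) for b in store.values())
    print(f"raw: {raw_bytes} B, stored after dedup: {stored_bytes} B")
    return store, manifests
```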

2018 ◽  
Vol 210 ◽  
pp. 04042
Author(s):  
Ammar Alhaj Ali ◽  
Pavel Varacha ◽  
Said Krayem ◽  
Roman Jasek ◽  
Petr Zacek ◽  
...  

Nowadays, a wide range of systems and applications, especially in high-performance computing, depends on distributed environments to process and analyze huge amounts of data. As the amount of data grows enormously, providing and developing efficient, scalable and reliable storage solutions has become one of the major issues in scientific computing. The storage solution used by big data systems is the Distributed File System (DFS), which builds a hierarchical and unified view of multiple file servers and shares on the network. In this paper we present the Hadoop Distributed File System (HDFS) as the DFS of big data systems and introduce Event-B as a formal method that can be used for modeling it. Event-B is a mature formal method that has been widely used in industrial projects in domains such as automotive, transportation, space, business information and medical devices. We also propose the Rodin platform as the modeling tool for Event-B: it integrates modeling and proving, and as an open-source platform it supports a large number of plug-in tools.
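Event-B models are written and proved in Rodin rather than in a general-purpose language, but the shape of such a model (state variables, invariants, and guarded events) can be approximated in Python for illustration. The replication scenario, names and replication factor below are assumptions for illustration, not the model from the paper:

```python
# Illustrative approximation of an Event-B style machine for HDFS block replication:
# state variables, an INVARIANT, and EVENTS whose WHERE guards must hold before
# their THEN actions run. All details here are assumed, not the authors' Rodin model.

REPLICATION_FACTOR = 3          # assumed target replication
replicas = {}                   # state variable: block_id -> set of datanode ids


def invariant():
    # every tracked block has between 1 and REPLICATION_FACTOR replicas
    return all(1 <= len(nodes) <= REPLICATION_FACTOR for nodes in replicas.values())


def create_block(block_id, node):                       # event
    if block_id not in replicas:                         # guard (WHERE)
        replicas[block_id] = {node}                      # action (THEN)
        assert invariant()


def add_replica(block_id, node):                         # event
    if (block_id in replicas and node not in replicas[block_id]
            and len(replicas[block_id]) < REPLICATION_FACTOR):   # guard
        replicas[block_id].add(node)                             # action
        assert invariant()
```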


Author(s):  
Karwan Jameel Merceedi ◽  
Nareen Abdulla Sabry

In recent years, data and internet usage have grown enormously, giving rise to big data. To address the resulting problems, many software frameworks are used to increase the performance of distributed systems and to provide ample data storage. One of the most beneficial frameworks for handling data in distributed systems is Hadoop, which clusters machines and coordinates the work among them. Hadoop consists of two major components: the Hadoop Distributed File System (HDFS) and MapReduce (MR). With Hadoop we can process a large file in a distributed way, for example counting how often each word occurs. HDFS is designed to store colossal data sets effectively and to stream them to user applications at high bandwidth, and the differences between it and other file systems are significant: HDFS targets low-cost hardware and is exceptionally fault tolerant. Thousands of computers in a vast cluster provide both directly attached storage and the execution of user programs, and by distributing storage and computation across numerous servers, the resource scales with demand while remaining cost-effective at every size. Given these characteristics of HDFS, many researchers have worked in this field, trying to enhance the performance and efficiency of the file system so that it becomes one of the most active cloud systems. This paper offers a study reviewing the essential investigations, as a useful guide for researchers wishing to work on such a system. The basic ideas and features of the investigated experiments were taken into account to build a robust comparison, which simplifies the selection for future researchers in this area. Drawing on many authors, the paper explains what Hadoop is, its architecture, how it works, and its performance analysis in distributed systems, and it assesses each work and compares them with one another.
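The word-count job mentioned above is the canonical MapReduce example. A minimal mapper and reducer pair in the Hadoop Streaming style, as a sketch rather than code from any of the surveyed works:

```python
# wordcount.py -- minimal mapper/reducer in the Hadoop Streaming style.
# Local test:  cat input.txt | python wordcount.py map | sort | python wordcount.py reduce
import sys


def mapper():
    for line in sys.stdin:
        for word in line.split():
            print(f"{word}\t1")              # emit (word, 1) pairs


def reducer():
    current, count = None, 0
    for line in sys.stdin:                   # input arrives sorted by key
        word, value = line.rsplit("\t", 1)
        if word != current:
            if current is not None:
                print(f"{current}\t{count}")
            current, count = word, 0
        count += int(value)
    if current is not None:
        print(f"{current}\t{count}")


if __name__ == "__main__":
    mapper() if sys.argv[1] == "map" else reducer()
```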


2011 ◽  
Vol 3 (3) ◽  
pp. 19-36 ◽  
Author(s):  
Theodoros Spyridopoulos ◽  
Vasilios Katos

This paper examines the feasibility of developing a forensic acquisition tool for a distributed file system. Using the GFS and KFS distributed file systems as vehicles, and through representative scenarios and examples, the authors develop forensic acquisition processes and examine the requirements that both the tool and the distributed file system must meet in order to facilitate the acquisition. The authors conclude that cloud storage has features that can be leveraged to perform acquisition (such as redundancy and replication triggers), but also exhibits a complexity higher than that of traditional storage systems, leading to a need for forensic readiness by design.
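As a concrete illustration of leveraging replication for acquisition, an HDFS analogue (the paper itself works with GFS and KFS, so this is an assumed parallel example, not the authors' procedure) is to enumerate each block's replica locations before imaging them, using the standard `hdfs fsck` command:

```python
# Sketch: list block replica locations for a target file in HDFS prior to acquisition.
# Assumes a configured Hadoop client on PATH; the target path is illustrative.
import subprocess


def list_block_locations(hdfs_path):
    result = subprocess.run(
        ["hdfs", "fsck", hdfs_path, "-files", "-blocks", "-locations"],
        capture_output=True, text=True, check=True)
    return result.stdout


if __name__ == "__main__":
    print(list_block_locations("/evidence/case42/logs.tar"))
```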


2013 ◽  
Vol 756-759 ◽  
pp. 1275-1279
Author(s):  
Lin Na Huang ◽  
Feng Hua Liu

High-performance cloud storage is a basic precondition for cloud computing. This article introduces the concept and advantages of cloud storage, discusses the infrastructure of a cloud storage system as well as the architecture of cloud data storage, and examines the design of the Distributed File System within cloud data storage. It also puts forward different development strategies for enterprises, according to the different roles they play during the development of cloud computing.


2014 ◽  
Vol 513-517 ◽  
pp. 2472-2475
Author(s):  
Yong Qi Han ◽  
Yun Zhang ◽  
Shui Yu

This paper discusses the application of cloud computing technology to store large amounts of data such as agricultural remote-training videos and other multimedia. Using four computers to build a Hadoop cloud platform, it focuses on the principles of the Hadoop Distributed File System (HDFS) and on file storage, in order to achieve massive agricultural multimedia data storage.
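The paper does not reproduce the commands used, but loading multimedia files into HDFS on such a cluster typically reduces to the standard file-system shell. A hedged sketch with illustrative paths:

```python
# Sketch: push local training videos into HDFS with the standard shell commands.
# Assumes the `hdfs` client is on PATH and the cluster is running; paths are illustrative.
import subprocess


def upload_videos(local_dir, hdfs_dir="/agri/training_videos"):
    subprocess.run(["hdfs", "dfs", "-mkdir", "-p", hdfs_dir], check=True)   # create target dir
    subprocess.run(["hdfs", "dfs", "-put", "-f", local_dir, hdfs_dir], check=True)  # upload
    subprocess.run(["hdfs", "dfs", "-ls", hdfs_dir], check=True)            # verify listing


if __name__ == "__main__":
    upload_videos("./videos")
```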


2014 ◽  
Vol 602-605 ◽  
pp. 3282-3284
Author(s):  
Fa Gui Liu ◽  
Xiao Jie Zhang

Distributed file systems such as HDFS face the threat of Advanced Persistent Threats (APTs). Although security mechanisms such as Kerberos and ACLs are implemented in distributed file systems, most are not sufficient to counter the threats posed by APTs. Based on observations of APT traits, we propose a trusted distributed file system based on HDFS, which provides a further layer of security against APTs beyond the current security mechanisms.
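The stock mechanisms the authors refer to, Kerberos authentication plus HDFS ACLs, are driven through the file-system shell. The sketch below shows the standard ACL commands on an illustrative directory; it demonstrates the existing mechanism, not the trusted extension the paper proposes:

```python
# Sketch: restrict an HDFS directory with POSIX-style ACLs (stock HDFS feature).
# The path and user name are illustrative; assumes a configured Hadoop client.
import subprocess


def restrict_dir(path="/secure/reports", user="auditor"):
    # drop any previously granted extended entries, keeping only the base ACL
    subprocess.run(["hdfs", "dfs", "-setfacl", "-b", path], check=True)
    # grant read/traverse access to a single named user
    subprocess.run(["hdfs", "dfs", "-setfacl", "-m", f"user:{user}:r-x", path], check=True)
    # print the resulting ACL for verification
    subprocess.run(["hdfs", "dfs", "-getfacl", path], check=True)


if __name__ == "__main__":
    restrict_dir()
```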


2014 ◽  
Vol 998-999 ◽  
pp. 1362-1365
Author(s):  
Wei Feng Gao ◽  
Tie Zhu Zhao ◽  
Ming Bin Lin

Distributed file systems are emerging as a key component of large-scale cloud storage platforms due to the continuous growth in the amount of application data. Performance modeling and analysis is an important concern in the distributed file system area. This paper focuses on performance prediction and modeling issues. An adaptive prediction model (APModel) is proposed to predict the performance of distributed file systems by capturing the correlations among different performance factors. We perform a series of experiments to validate the proposed prediction model. The experimental results indicate that the proposed approach achieves better prediction accuracy; it is practical and enables sufficient performance analysis for distributed file systems.
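The abstract does not specify the internals of APModel, so the following is only a hedged placeholder for how a predictor over correlated performance factors might look: an ordinary least-squares fit of throughput against a few assumed factors, with purely illustrative numbers:

```python
# Hedged sketch: predict distributed-file-system throughput from measured factors
# with ordinary least squares. Factors and values are illustrative; APModel itself
# is not described in the abstract.
import numpy as np

# columns: concurrent clients, average request size (MB), replication factor
X = np.array([[ 4,  64, 3],
              [ 8,  64, 3],
              [16, 128, 3],
              [32, 128, 2],
              [64, 256, 2]], dtype=float)
y = np.array([180.0, 310.0, 520.0, 700.0, 910.0])    # measured throughput (MB/s)

# fit y ~ X_aug @ w, with a bias column appended
X_aug = np.hstack([X, np.ones((X.shape[0], 1))])
w, *_ = np.linalg.lstsq(X_aug, y, rcond=None)

new_workload = np.array([24, 128, 3, 1.0])            # an unseen configuration (+ bias term)
print("predicted throughput (MB/s):", new_workload @ w)
```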


Apache Hadoop is an open-source framework for storing and processing massive amounts of data. The skeleton of Hadoop can be viewed as distributed computing across a cluster of computers. This chapter deals with the single-node and multi-node setup of the Hadoop environment, along with the Hadoop user commands and administration commands. Hadoop processes data on a cluster of machines built from commodity hardware. It has two components: the Hadoop Distributed File System for storage and MapReduce/YARN for processing. Single-node processing can be done through standalone or pseudo-distributed mode, whereas multi-node processing uses cluster mode. The execution procedure for each environment is briefly stated. The chapter then explores the Hadoop user commands for operations such as copying files to and from the distributed file system, running a jar, creating an archive, checking the version and classpath, and so on. Further, Hadoop administration manages the configuration, including functions such as cluster balancing, running the DFS, MapReduce administration, the namenode, the secondary namenode, and so on.
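A few of the user and administration commands the chapter covers, wrapped in a short Python driver so they can be run together; the file names, jar and main class are illustrative, while the commands themselves are standard Hadoop CLI:

```python
# Sketch: a handful of the Hadoop user and admin commands discussed above, driven
# from Python. Assumes a configured Hadoop installation; all paths are illustrative.
import subprocess


def run(cmd):
    print("$", " ".join(cmd))
    subprocess.run(cmd, check=True)


run(["hadoop", "version"])                                             # print the Hadoop version
run(["hadoop", "classpath"])                                           # show the effective classpath
run(["hdfs", "dfs", "-copyFromLocal", "data.csv", "/user/demo/"])      # copy a file into HDFS
run(["hdfs", "dfs", "-copyToLocal", "/user/demo/data.csv", "./out/"])  # copy it back out
run(["hadoop", "jar", "app.jar", "com.example.Job",
     "/user/demo", "/user/demo/out"])                                  # run a packaged job
run(["hdfs", "dfsadmin", "-report"])                                   # cluster health report
run(["hdfs", "balancer"])                                              # rebalance blocks across datanodes
```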


The Hadoop Distributed File System (HDFS) and MapReduce (MR) are the key aspects of the Hadoop framework. Big data scenarios such as Facebook data processing, or Twitter analytics such as storing and processing tweets, depend on the Hadoop framework for storage and processing, on top of which further analytics can be done. The issue is that processing such huge amounts of data leads to high space and time consumption in the Hadoop framework: large amounts of storage are used while processing time is also high, and both need to be reduced to get the fastest response from the framework. This matters because all the other ecosystem tools also depend on HDFS and MR for data storage and processing, so an alternative architecture is needed to improve space usage, make effective use of resources, and reduce the framework's time requirements. The outcome of this work is faster data processing and lower space utilization of the framework when running MR along with other ecosystem tools such as Hive, Flume, Sqoop and Pig Latin. The work proposes an alternative framework to HDFS and MR, which we name the Unified Space Allocation and Data Processing with Metadata based Distributed File System (USAMDFS).


IJARCCE ◽  
2016 ◽  
Vol 5 (12) ◽  
pp. 36-40 ◽  
Author(s):  
G Fayaz Hussain ◽  
Tarakeswar T
