distributed files
Recently Published Documents


TOTAL DOCUMENTS: 18 (FIVE YEARS 3)

H-INDEX: 4 (FIVE YEARS 0)

Author(s):  
Basireddy Ithihas Reddy

There has been great interest in computing experiments on shared-nothing clusters built from commodity machines, where multiple systems run in parallel and work closely together towards the same goal. The distributed execution engine MapReduce frequently handles the primary input-output workload for such clusters. We surveyed a number of file systems, including NTFS, ReFS, FAT, and FAT32 on Windows and Linux, and implemented a few distributed file systems. Distributed file systems (DFS) generally work well on many small files, but some do not produce the expected results on large files. We implemented benchmark testing algorithms for small and large files in each distributed file system, and the analysis is presented in this paper. The implementation issues we encountered with the various DFS are also discussed.
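The abstract does not include the benchmark itself; as a rough illustration of the kind of test it describes, the following Python sketch times writes and reads of many small files versus one large file on a mounted file-system path. The path /mnt/dfs, the file counts, and the sizes are placeholders, not values from the paper.

# Illustrative sketch (not the authors' benchmark): time sequential writes and
# reads of many small files versus one large file on a mounted DFS path.
import os
import time

def write_read_benchmark(root, file_count, file_size_bytes):
    """Write file_count files of file_size_bytes each under root,
    then read them back, returning (write_seconds, read_seconds)."""
    payload = os.urandom(file_size_bytes)
    os.makedirs(root, exist_ok=True)

    start = time.perf_counter()
    for i in range(file_count):
        with open(os.path.join(root, f"bench_{i}.bin"), "wb") as f:
            f.write(payload)
    write_seconds = time.perf_counter() - start

    start = time.perf_counter()
    for i in range(file_count):
        with open(os.path.join(root, f"bench_{i}.bin"), "rb") as f:
            f.read()
    read_seconds = time.perf_counter() - start
    return write_seconds, read_seconds

if __name__ == "__main__":
    # "/mnt/dfs" is a placeholder mount point for whichever DFS is under test.
    small = write_read_benchmark("/mnt/dfs/small", file_count=1000, file_size_bytes=4 * 1024)
    large = write_read_benchmark("/mnt/dfs/large", file_count=1, file_size_bytes=256 * 1024**2)
    print("small files (write s, read s):", small)
    print("large file  (write s, read s):", large)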


2020 ◽ Vol 8 (6) ◽ pp. 4034-4038

Today, the majority of industries use Hadoop for processing their data. Hadoop is an open-source, programming-based framework with many components, one of which is HDFS (Hadoop Distributed File System), used to store data. By default, Hadoop has no security mechanism. According to previous studies, authentication, authorization, and data encryption are the principal techniques for enhancing security in HDFS. Because a huge volume of data is stored in HDFS, encrypting that data consumes more time and requires more resources. In this paper we develop a DNA-based encryption algorithm that uses confusion and diffusion to secure data in HDFS. The proposed algorithm is efficient compared to other encryption algorithms.
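The paper's DNA-based scheme is not reproduced here; the Python sketch below only illustrates the two stages named in the abstract, confusion and diffusion, using a toy DNA encoding and a SHA-256 keystream. All of it is an assumption for illustration, not the authors' algorithm and not production cryptography.

# Hedged sketch of the confusion/diffusion idea with a DNA-style encoding.
import hashlib

BASES = "ACGT"  # two bits of data per DNA base

def to_dna(data: bytes) -> str:
    return "".join(BASES[(b >> shift) & 0b11] for b in data for shift in (6, 4, 2, 0))

def keystream(key: bytes, length: int) -> bytes:
    # Simple SHA-256-based stream, for illustration only.
    out = b""
    counter = 0
    while len(out) < length:
        out += hashlib.sha256(key + counter.to_bytes(4, "big")).digest()
        counter += 1
    return out[:length]

def encrypt(data: bytes, key: bytes) -> str:
    # Diffusion: chain each byte with the previous output byte and a keystream byte.
    stream = keystream(key, len(data))
    prev = 0
    diffused = bytearray()
    for b, k in zip(data, stream):
        prev = b ^ k ^ prev
        diffused.append(prev)
    # Confusion: key-dependent rotation of the DNA string.
    dna = to_dna(bytes(diffused))
    shift = int.from_bytes(hashlib.sha256(key).digest()[:4], "big") % max(len(dna), 1)
    return dna[shift:] + dna[:shift]

print(encrypt(b"block stored in HDFS", b"secret key"))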


Author(s):  
Dr. Bhalaji N. ◽  
Shanmuga Skandh Vinayak E

Ever since the concept of parallel processing and remote computation became feasible, cloud computing has been at the peak of its popularity. Although cloud computing is effective and feasible, using the cloud for frequent operations may not be the most optimal solution; the concept of FOG computing proves to be more optimal and efficient. In this paper, we propose a solution that improves the decentralization of FOG computing by implementing a secure distributed file system using IPFS and the Ethereum blockchain. Our proposed system proved efficient by successfully distributing data across a Raspberry Pi network. The outcome of this work will assist FOG architects in implementing this system in their infrastructure and will also help IoT developers build a decentralized Raspberry Pi network while providing more security to the data.
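As a rough illustration of the IPFS-plus-Ethereum data flow described above (not the authors' implementation), the following Python sketch adds a file to a local IPFS daemon and anchors the returned CID in an Ethereum transaction. The ipfshttpclient and web3.py (v6) packages, the local endpoints, the example file name, and the use of an unlocked node-managed account are all assumptions.

# Hedged sketch: store a file in IPFS, then record its CID on Ethereum.
import ipfshttpclient
from web3 import Web3

def store_file(path: str) -> tuple:
    """Add a file to IPFS and anchor its CID in a transaction's data field."""
    ipfs = ipfshttpclient.connect("/ip4/127.0.0.1/tcp/5001")
    cid = ipfs.add(path)["Hash"]          # content identifier of the stored file

    w3 = Web3(Web3.HTTPProvider("http://127.0.0.1:8545"))
    account = w3.eth.accounts[0]          # unlocked node-managed account (assumption)
    tx_hash = w3.eth.send_transaction({
        "from": account,
        "to": account,                    # self-transaction used only to anchor the CID
        "data": Web3.to_hex(text=cid),
    })
    return cid, tx_hash.hex()

if __name__ == "__main__":
    print(store_file("sensor_readings.csv"))

A production design would more likely record CIDs in a smart contract rather than in raw transaction data; this sketch only shows the overall data flow.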


Author(s):  
Jorji Nonaka ◽  
Eduardo C. Inacio ◽  
Kenji Ono ◽  
Mario A. R. Dantas ◽  
Yasuhiro Kawashima ◽  
...  

Leading-edge supercomputers, such as the K computer, have generated a vast amount of simulation results, and most of these datasets were stored on the file system for post-hoc analysis such as visualization. In this work, we first investigated the data generation trends of the K computer by analyzing operational log data files. We verified a tendency to generate large numbers of distributed files as simulation outputs; in most cases, the number of files has been proportional to the number of utilized computational nodes, that is, each computational node produces one or more files. Considering that the computational cost of visualization tasks is usually much smaller than that of large-scale numerical simulations, a flexible data input/output (I/O) management mechanism becomes highly useful for post-hoc visualization and analysis. We therefore focused on the xDMlib data management library and its flexible data I/O mechanism to enable flexible loading of large computational climate simulation results. In the proposed approach, a pre-processing step is executed on the target distributed files to generate the lightweight metadata needed to build the data assignment mapping used in the subsequent data loading process. We evaluated the proposed approach using a 32-node visualization cluster and the K computer. Besides the inevitable performance penalty of longer data loading times when using a smaller number of processes, the approach has the benefit of avoiding any data replication via copying, conversion, or extraction. In addition, users can freely select any number of nodes for post-hoc visualization and analysis, without caring about the number of distributed files.
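xDMlib's actual interface is not shown in the abstract; the following Python sketch only illustrates the general idea of a pre-processing step that writes lightweight metadata and a mapping step that assigns files to an arbitrary number of reader processes. The function names and the JSON metadata layout are assumptions.

# Illustrative sketch only (not the xDMlib API): lightweight metadata plus a
# size-balanced assignment of distributed files to visualization readers.
import json
import os

def build_metadata(output_dir: str, metadata_path: str) -> None:
    """Pre-processing: record each output file's name and size once, so the
    readers never have to scan the full directory again."""
    entries = [
        {"name": name, "bytes": os.path.getsize(os.path.join(output_dir, name))}
        for name in sorted(os.listdir(output_dir))
    ]
    with open(metadata_path, "w") as f:
        json.dump({"directory": output_dir, "files": entries}, f)

def assignment(metadata_path: str, num_readers: int) -> list:
    """Greedy size-balanced mapping: each reader gets roughly equal bytes,
    whether there are more files than readers or fewer."""
    with open(metadata_path) as f:
        files = json.load(f)["files"]
    buckets = [{"bytes": 0, "files": []} for _ in range(num_readers)]
    for entry in sorted(files, key=lambda e: e["bytes"], reverse=True):
        lightest = min(buckets, key=lambda b: b["bytes"])
        lightest["files"].append(entry["name"])
        lightest["bytes"] += entry["bytes"]
    return [b["files"] for b in buckets]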


2013 ◽ Vol 756-759 ◽ pp. 820-823
Author(s):  
Jun Xiong Sun ◽  
Yan Chen ◽  
Tao Ying Li ◽  
Ren Yuan Wang ◽  
Peng Hui Li

Storing system data on a single machine has two main problems: limited storage space and low reliability. The concept of distribution solves both problems fundamentally: many independent machines are integrated as a whole, so their separate resources are pooled together. This paper focuses on developing a system, based on SSH, XFire, and Hadoop, to help users store and manage distributed files. All files stored in HDFS are encrypted to protect users' privacy. To save resources, the system is designed to avoid uploading duplicate files by checking each file's MD5 string.
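As an illustration of the duplicate check described above, a minimal Python sketch follows; the SSH/XFire integration and the encryption step are omitted, and hdfs_upload is a placeholder callable, not the paper's code.

# Hedged sketch: skip an upload to HDFS when an identical file was stored before.
import hashlib

def md5_of(path: str) -> str:
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()

def upload_if_new(path: str, known_hashes: set, hdfs_upload) -> bool:
    """Upload only when the file's MD5 has not been seen before."""
    fingerprint = md5_of(path)
    if fingerprint in known_hashes:
        return False                      # duplicate: nothing is sent to HDFS
    hdfs_upload(path)                     # caller encrypts and writes to HDFS
    known_hashes.add(fingerprint)
    return True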

