distributed files
Recently Published Documents


TOTAL DOCUMENTS: 18 (FIVE YEARS 3)

H-INDEX: 4 (FIVE YEARS 0)

Author(s):  
Basireddy Ithihas Reddy

There has been great interest in computing experiments on shared-nothing clusters built from commodity machines, where multiple systems run in parallel and work closely together towards the same goal. The distributed execution engine MapReduce frequently handles the primary input-output workload for such clusters. We surveyed a number of file systems, including NTFS, ReFS, FAT, and FAT32 on Windows and Linux, and implemented a few distributed file systems. Distributed file systems (DFS) generally work well on many small files, but some do not produce the expected results on large files. We implemented benchmark testing algorithms for small and large files in each distributed file system, and the analysis is presented in this paper. The implementation issues we encountered with the various DFS are also discussed.
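The abstract does not include the benchmark itself; as a rough illustration of the kind of test it describes, the following Python sketch times writes and reads of many small files versus one large file on a mounted file-system path. The path /mnt/dfs, the file counts, and the sizes are placeholders, not values from the paper.

# Illustrative sketch (not the authors' benchmark): time sequential writes and
# reads of many small files versus one large file on a mounted DFS path.
import os
import time

def write_read_benchmark(root, file_count, file_size_bytes):
    """Write file_count files of file_size_bytes each under root,
    then read them back, returning (write_seconds, read_seconds)."""
    payload = os.urandom(file_size_bytes)
    os.makedirs(root, exist_ok=True)

    start = time.perf_counter()
    for i in range(file_count):
        with open(os.path.join(root, f"bench_{i}.bin"), "wb") as f:
            f.write(payload)
    write_seconds = time.perf_counter() - start

    start = time.perf_counter()
    for i in range(file_count):
        with open(os.path.join(root, f"bench_{i}.bin"), "rb") as f:
            f.read()
    read_seconds = time.perf_counter() - start
    return write_seconds, read_seconds

if __name__ == "__main__":
    # "/mnt/dfs" is a placeholder mount point for whichever DFS is under test.
    small = write_read_benchmark("/mnt/dfs/small", file_count=1000, file_size_bytes=4 * 1024)
    large = write_read_benchmark("/mnt/dfs/large", file_count=1, file_size_bytes=256 * 1024**2)
    print("small files (write s, read s):", small)
    print("large file  (write s, read s):", large)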


2020 ◽ Vol 8 (6) ◽ pp. 4034-4038

Today, the majority of industries use Hadoop for processing their data. Hadoop is an open-source, programming-based framework with many components, one of which is HDFS (Hadoop Distributed File System), used to store data. By default, Hadoop has no security mechanism. According to previous studies, authentication, authorization, and data encryption are the principal techniques for enhancing security in HDFS. Because a huge volume of data is stored in HDFS, encrypting that data consumes more time and requires more resources. In this paper we develop a DNA-based encryption algorithm that uses confusion and diffusion to secure data in HDFS. The proposed algorithm is efficient compared to other encryption algorithms.
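The paper's DNA-based scheme is not reproduced here; the Python sketch below only illustrates the two stages named in the abstract, confusion and diffusion, using a toy DNA encoding and a SHA-256 keystream. All of it is an assumption for illustration, not the authors' algorithm and not production cryptography.

# Hedged sketch of the confusion/diffusion idea with a DNA-style encoding.
import hashlib

BASES = "ACGT"  # two bits of data per DNA base

def to_dna(data: bytes) -> str:
    return "".join(BASES[(b >> shift) & 0b11] for b in data for shift in (6, 4, 2, 0))

def keystream(key: bytes, length: int) -> bytes:
    # Simple SHA-256-based stream, for illustration only.
    out = b""
    counter = 0
    while len(out) < length:
        out += hashlib.sha256(key + counter.to_bytes(4, "big")).digest()
        counter += 1
    return out[:length]

def encrypt(data: bytes, key: bytes) -> str:
    # Diffusion: chain each byte with the previous output byte and a keystream byte.
    stream = keystream(key, len(data))
    prev = 0
    diffused = bytearray()
    for b, k in zip(data, stream):
        prev = b ^ k ^ prev
        diffused.append(prev)
    # Confusion: key-dependent rotation of the DNA string.
    dna = to_dna(bytes(diffused))
    shift = int.from_bytes(hashlib.sha256(key).digest()[:4], "big") % max(len(dna), 1)
    return dna[shift:] + dna[:shift]

print(encrypt(b"block stored in HDFS", b"secret key"))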


Author(s):  
Dr. Bhalaji N. ◽  
Shanmuga Skandh Vinayak E

Ever since the concept of parallel processing and remote computation became feasible, cloud computing has been at the peak of its popularity. Although cloud computing is effective and feasible, using the cloud for frequent operations may not be the most optimal solution; the concept of FOG computing proves to be more optimal and efficient. In this paper, we propose a solution that improves the decentralization of FOG computing by implementing a secure distributed file system using IPFS and the Ethereum blockchain. Our proposed system proved efficient by successfully distributing data across a Raspberry Pi network. The outcome of this work will assist FOG architects in implementing this system in their infrastructure and will also help IoT developers build a decentralized Raspberry Pi network while providing more security to the data.
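As a rough illustration of the IPFS-plus-Ethereum data flow described above (not the authors' implementation), the following Python sketch adds a file to a local IPFS daemon and anchors the returned CID in an Ethereum transaction. The ipfshttpclient and web3.py (v6) packages, the local endpoints, the example file name, and the use of an unlocked node-managed account are all assumptions.

# Hedged sketch: store a file in IPFS, then record its CID on Ethereum.
import ipfshttpclient
from web3 import Web3

def store_file(path: str) -> tuple:
    """Add a file to IPFS and anchor its CID in a transaction's data field."""
    ipfs = ipfshttpclient.connect("/ip4/127.0.0.1/tcp/5001")
    cid = ipfs.add(path)["Hash"]          # content identifier of the stored file

    w3 = Web3(Web3.HTTPProvider("http://127.0.0.1:8545"))
    account = w3.eth.accounts[0]          # unlocked node-managed account (assumption)
    tx_hash = w3.eth.send_transaction({
        "from": account,
        "to": account,                    # self-transaction used only to anchor the CID
        "data": Web3.to_hex(text=cid),
    })
    return cid, tx_hash.hex()

if __name__ == "__main__":
    print(store_file("sensor_readings.csv"))

A production design would more likely record CIDs in a smart contract rather than in raw transaction data; this sketch only shows the overall data flow.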


Author(s):  
Jorji Nonaka ◽  
Eduardo C. Inacio ◽  
Kenji Ono ◽  
Mario A. R. Dantas ◽  
Yasuhiro Kawashima ◽  
...  

Leading-edge supercomputers, such as the K computer, have generated a vast amount of simulation results, and most of these datasets were stored on the file system for post-hoc analysis such as visualization. In this work, we first investigated the data generation trends of the K computer by analyzing operational log data files. We verified a tendency to generate large numbers of distributed files as simulation outputs; in most cases, the number of files has been proportional to the number of utilized computational nodes, that is, each computational node produces one or more files. Considering that the computational cost of visualization tasks is usually much smaller than that of large-scale numerical simulations, a flexible data input/output (I/O) management mechanism becomes highly useful for post-hoc visualization and analysis. We therefore focused on the xDMlib data management library and its flexible data I/O mechanism to enable flexible loading of large computational climate simulation results. In the proposed approach, a pre-processing step is executed on the target distributed files to generate the lightweight metadata needed to build the data assignment mapping used in the subsequent data loading process. We evaluated the proposed approach using a 32-node visualization cluster and the K computer. Besides the inevitable performance penalty of longer data loading times when using a smaller number of processes, the approach has the benefit of avoiding any data replication via copying, conversion, or extraction. In addition, users can freely select any number of nodes for post-hoc visualization and analysis, without caring about the number of distributed files.
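xDMlib's actual interface is not shown in the abstract; the following Python sketch only illustrates the general idea of a pre-processing step that writes lightweight metadata and a mapping step that assigns files to an arbitrary number of reader processes. The function names and the JSON metadata layout are assumptions.

# Illustrative sketch only (not the xDMlib API): lightweight metadata plus a
# size-balanced assignment of distributed files to visualization readers.
import json
import os

def build_metadata(output_dir: str, metadata_path: str) -> None:
    """Pre-processing: record each output file's name and size once, so the
    readers never have to scan the full directory again."""
    entries = [
        {"name": name, "bytes": os.path.getsize(os.path.join(output_dir, name))}
        for name in sorted(os.listdir(output_dir))
    ]
    with open(metadata_path, "w") as f:
        json.dump({"directory": output_dir, "files": entries}, f)

def assignment(metadata_path: str, num_readers: int) -> list:
    """Greedy size-balanced mapping: each reader gets roughly equal bytes,
    whether there are more files than readers or fewer."""
    with open(metadata_path) as f:
        files = json.load(f)["files"]
    buckets = [{"bytes": 0, "files": []} for _ in range(num_readers)]
    for entry in sorted(files, key=lambda e: e["bytes"], reverse=True):
        lightest = min(buckets, key=lambda b: b["bytes"])
        lightest["files"].append(entry["name"])
        lightest["bytes"] += entry["bytes"]
    return [b["files"] for b in buckets]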


2013 ◽ Vol 756-759 ◽ pp. 820-823
Author(s):  
Jun Xiong Sun ◽  
Yan Chen ◽  
Tao Ying Li ◽  
Ren Yuan Wang ◽  
Peng Hui Li

Storing system data on a single machine has two main problems: limited storage space and low reliability. The concept of distribution solves both problems fundamentally: many independent machines are integrated as a whole, so their separate resources are pooled together. This paper focuses on developing a system, based on SSH, XFire, and Hadoop, to help users store and manage distributed files. All files stored in HDFS are encrypted to protect users' privacy. To save resources, the system is designed to avoid uploading duplicate files by checking each file's MD5 string.
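As an illustration of the duplicate check described above, a minimal Python sketch follows; the SSH/XFire integration and the encryption step are omitted, and hdfs_upload is a placeholder callable, not the paper's code.

# Hedged sketch: skip an upload to HDFS when an identical file was stored before.
import hashlib

def md5_of(path: str) -> str:
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()

def upload_if_new(path: str, known_hashes: set, hdfs_upload) -> bool:
    """Upload only when the file's MD5 has not been seen before."""
    fingerprint = md5_of(path)
    if fingerprint in known_hashes:
        return False                      # duplicate: nothing is sent to HDFS
    hdfs_upload(path)                     # caller encrypts and writes to HDFS
    known_hashes.add(fingerprint)
    return True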

