XtreemFS

Author(s):  
Jan Stender ◽  
Michael Berlin ◽  
Alexander Reinefeld

Cloud computing poses new challenges to data storage. While cloud providers use shared distributed hardware, which is inherently unreliable and insecure, cloud users expect their data to be stored safely and securely, to be available at any time, and to be accessible in the same way as their locally stored data. In this chapter, the authors present XtreemFS, a file system for the cloud. XtreemFS reconciles the cloud providers' need for cheap scale-out storage solutions with the cloud users' need for reliable, secure, and easy data access. The main contributions of the chapter are: a description of the internal architecture of XtreemFS, which presents an approach to building large-scale distributed POSIX-compliant file systems on top of cheap, off-the-shelf hardware; a description of the XtreemFS security infrastructure, which guarantees the isolation of individual users despite shared and insecure storage and network resources; a comprehensive overview of the replication mechanisms in XtreemFS, which guarantee consistency, availability, and durability of data in the face of component failures; and an overview of the snapshot infrastructure of XtreemFS, which allows momentary states of the file system to be captured and frozen in a scalable and fault-tolerant fashion. The authors also compare XtreemFS with existing solutions and argue for its practicability and potential in the cloud storage market.

2013 ◽  
pp. 294-321
Author(s):  
Alexandru Costan

To accommodate the needs of large-scale distributed systems, scalable data storage and management strategies are required, allowing applications to efficiently cope with continuously growing, highly distributed data. This chapter addresses the key issues of data handling in grid environments, focusing on storing, accessing, managing, and processing data. We start by providing the background for the data storage issue in grid environments. We outline the main challenges addressed by distributed storage systems: high availability, which translates into high resilience and consistency; handling of corruption caused by arbitrary faults; fault tolerance; asynchrony; fairness; access control; and transparency. The core part of the chapter presents how existing solutions cope with these high requirements. The most important research results are organized along several themes: grid data storage, distributed file systems, data transfer and retrieval, and data management. Important characteristics such as performance, efficient use of resources, fault tolerance, and security are strongly determined by the adopted system architectures and the technologies behind them. For each topic, we briefly review previous work, describe the most recent achievements, highlight their advantages and limitations, and indicate future research trends in distributed data storage and management.


2013 ◽  
Vol 5 (1) ◽  
pp. 53-69
Author(s):  
Jacques Jorda ◽  
Aurélien Ortiz ◽  
Abdelaziz M’zoughi ◽  
Salam Traboulsi

Grid computing is commonly used for large-scale applications requiring huge computation capabilities. In such distributed architectures, data storage on the distributed storage resources must be handled by a dedicated storage system to ensure the required quality of service. In order to simplify data placement on nodes and to increase the performance of applications, a storage virtualization layer can be used. This layer can be a single parallel file system (like GPFS) or a more complex middleware. The latter is preferred, as it allows data placement on the nodes to be tuned to increase both the reliability and the performance of data access. Thus, in such a middleware, a dedicated monitoring system must be used to ensure optimal performance. In this paper, the authors briefly introduce the Visage middleware, a middleware for storage virtualization. They present the most broadly used grid monitoring systems and explain why these are not adequate for virtualized storage monitoring. The authors then present the architecture of their monitoring system dedicated to storage virtualization. They introduce the workload prediction model used to determine the best node for data placement, and demonstrate its accuracy in a simple experiment.
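The abstract does not give the paper's workload prediction model, but the idea of predicting per-node load and placing data on the least-loaded node can be sketched with a simple exponentially weighted moving average (EWMA); node names and load samples below are purely illustrative, and the real Visage model may differ.

```python
# Illustrative sketch (not the Visage model): predict each node's load with an
# exponentially weighted moving average (EWMA) and place new data on the node
# whose predicted load is lowest.

def ewma_update(prev, sample, alpha=0.5):
    """Blend the newest load sample into the running prediction."""
    return alpha * sample + (1 - alpha) * prev

def best_node(predicted_load):
    """Pick the node with the lowest predicted load for the next placement."""
    return min(predicted_load, key=predicted_load.get)

# Hypothetical load samples (e.g., I/O queue lengths) observed per node.
predicted = {"node-a": 0.0, "node-b": 0.0, "node-c": 0.0}
samples = {"node-a": [8, 9, 7], "node-b": [2, 3, 2], "node-c": [5, 6, 6]}

for node, history in samples.items():
    for s in history:
        predicted[node] = ewma_update(predicted[node], s)

print(best_node(predicted))  # node-b carries the lightest predicted load
```

A higher `alpha` makes the prediction react faster to load spikes, at the cost of more noise; tuning that trade-off is exactly what a dedicated monitoring system enables.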


2014 ◽  
Vol 573 ◽  
pp. 556-559
Author(s):  
A. Shenbaga Bharatha Priya ◽  
J. Ganesh ◽  
Mareeswari M. Devi

Infrastructure-as-a-Service (IaaS) provides an environment setup for any type of cloud. In a distributed file system (DFS), nodes simultaneously serve computing and storage functions; that is, data is processed and stored in parallel in the cloud. Here, a file is considered as data, or load. Each file is partitioned into a number of file chunks (FCs) allocated to distinct nodes so that MapReduce tasks can be performed in parallel over the nodes. Files and nodes can be dynamically created, deleted, and added. This results in load imbalance in a distributed file system; that is, the file chunks are not distributed as uniformly as possible among the chunk servers (CSs). Emerging distributed file systems in production strongly depend on a central node for chunk reallocation, or on distributed nodes maintaining global knowledge of all chunks. This dependence is clearly inadequate in a large-scale, failure-prone environment: the central load balancer is put under considerable workload that scales linearly with the system size, so it may become a performance bottleneck and a single point of failure, as well as a source of memory waste in distributed nodes. We therefore enhance the client-side module with a server-side module to create, delete, and update file chunks in the client module, manage the overall private cloud, and apply a dynamic load-balancing algorithm to perform auto-scaling in the private cloud. In this project, a fully distributed load-rebalancing algorithm is presented to cope with the load imbalance problem.
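The abstract describes the rebalancing goal without the algorithm's details. As a minimal sketch, assuming a centralized view for simplicity (the paper's algorithm is fully distributed), rebalancing can be expressed as moving chunks from servers above the ideal per-server share to servers below it; server and chunk names are hypothetical.

```python
# Simplified illustration of chunk rebalancing (not the paper's distributed
# protocol): move file chunks from overloaded chunk servers to underloaded
# ones until every server holds at most ceil(total / servers) chunks.
import math

def rebalance(placement):
    """placement: dict mapping server -> list of chunk ids; modified in place."""
    total = sum(len(chunks) for chunks in placement.values())
    ideal = math.ceil(total / len(placement))
    donors = [s for s in placement if len(placement[s]) > ideal]
    receivers = [s for s in placement if len(placement[s]) < ideal]
    for dst in receivers:
        while len(placement[dst]) < ideal and donors:
            src = donors[0]
            placement[dst].append(placement[src].pop())  # migrate one chunk
            if len(placement[src]) <= ideal:
                donors.pop(0)                            # donor is balanced now
    return placement

# Hypothetical imbalanced placement: cs1 holds 5 chunks, cs3 holds none.
servers = {"cs1": ["c1", "c2", "c3", "c4", "c5"], "cs2": ["c6"], "cs3": []}
rebalance(servers)
print({s: len(chunks) for s, chunks in servers.items()})  # 2 chunks everywhere
```

In the fully distributed setting the paper targets, each chunk server would estimate `ideal` from sampled peers rather than from global knowledge, which is what removes the central bottleneck.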


2018 ◽  
Vol 210 ◽  
pp. 04042
Author(s):  
Ammar Alhaj Ali ◽  
Pavel Varacha ◽  
Said Krayem ◽  
Roman Jasek ◽  
Petr Zacek ◽  
...  

Nowadays, a wide set of systems and applications, especially in high-performance computing, depends on distributed environments to process and analyze huge amounts of data. As the amount of data increases enormously, providing and developing efficient, scalable, and reliable storage solutions has become one of the major issues for scientific computing. The storage solution used by big data systems is the distributed file system (DFS), which builds a hierarchical and unified view of multiple file servers and shares on the network. In this paper we consider the Hadoop Distributed File System (HDFS) as the DFS in big data systems, and we present Event-B as a formal method that can be used for modeling. Event-B is a mature formal method that has been widely used in industry projects across a number of domains, such as automotive, transportation, space, business information, and medical devices. We also propose using Rodin as the modeling tool for Event-B: it integrates modeling and proving, and the Rodin platform is open source, so it supports a large number of plug-in tools.


2011 ◽  
Vol 1346 ◽  
Author(s):  
Hayri E. Akin ◽  
Dundar Karabay ◽  
Allen P. Mills ◽  
Cengiz S. Ozkan ◽  
Mihrimah Ozkan

DNA computing is a rapidly developing interdisciplinary area which could benefit from more experimental results to solve problems with the current biological tools. In this study, we have integrated microelectronics and molecular biology techniques to show the feasibility of a Hopfield neural network using DNA molecules. Adleman’s seminal paper in 1994 showed that DNA strands and specific molecular reactions can be used to solve the Hamiltonian Path Problem. This accomplishment opened the way for massively parallel processing power, remarkable energy efficiency, and compact data storage with DNA. However, in various studies, small departures from the ideal selectivity of DNA hybridization lead to significant undesired pairings of strands, which in turn cause difficulties in schemes for implementing large Boolean functions using DNA. The error-prone reactions in the Boolean architecture of the first DNA computers will therefore benefit from fault-tolerance or error-correction methods, and such methods would be essential for large-scale applications. In this study, we demonstrate the operation of a six-dimensional Hopfield associative memory storing various memories as an archetypal fault-tolerant neural network implemented using DNA molecular reactions. The response of the network suggests that the protocols could be scaled to a network of significantly larger dimensions. In addition, the results are read on a silicon CMOS platform, exploiting semiconductor processing knowledge for fast and accurate hybridization rates.
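The fault tolerance the abstract relies on is the standard error-correcting behavior of a Hopfield associative memory: a corrupted input converges back to the nearest stored pattern. The paper implements this with DNA reactions; as a numerical sketch of the same six-dimensional network, Hebbian storage and sign-threshold recall look like this (the stored pattern below is illustrative).

```python
# In-silico sketch of a six-dimensional Hopfield associative memory (the paper
# realizes this with DNA molecular reactions; here it is simulated numerically).
# Memories are stored with the Hebbian rule and recalled by iterating
# s <- sign(W s) until the state stops changing.

def train(memories, n):
    # Hebbian outer-product rule with a zero diagonal (no self-coupling).
    W = [[0] * n for _ in range(n)]
    for m in memories:
        for i in range(n):
            for j in range(n):
                if i != j:
                    W[i][j] += m[i] * m[j]
    return W

def recall(W, state, steps=10):
    for _ in range(steps):
        nxt = [1 if sum(W[i][j] * state[j] for j in range(len(state))) >= 0 else -1
               for i in range(len(state))]
        if nxt == state:        # fixed point reached: a stored (or spurious) memory
            break
        state = nxt
    return state

memory = [1, -1, 1, -1, 1, -1]   # one stored 6-bit pattern (illustrative)
W = train([memory], 6)
probe = [1, 1, 1, -1, 1, -1]     # corrupted copy: one bit flipped
print(recall(W, probe))          # converges back to the stored memory
```

The single-bit error is repaired in one update step, which is the same basin-of-attraction property that makes the DNA implementation tolerant of imperfect hybridization.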


Author(s):  
Eduardo Inacio ◽  
Mario Antonio Dantas

To meet the ever-increasing capacity and performance requirements of emerging data-intensive applications, highly distributed and multilayered back-end storage systems have been employed in large-scale high-performance computing (HPC) environments. A main component of these storage infrastructures is the parallel file system (PFS), a file system especially designed for absorbing bulk data transfers from applications with thousands of concurrent processes. Load distribution on PFS data servers constitutes a major source of intra-application input/output (I/O) performance variability. Although mitigating variability is desirable, as it is known to harm application-perceived performance, understanding and dealing with I/O performance variability in such complex environments remains a challenging task. In this research, a differentiated approach for evaluating and mitigating intra-application I/O performance variability over PFSs is proposed. More specifically, from the evaluation perspective, a comprehensive approach combining complementary methods is proposed. An analytical model, named DTSMaxLoad, provides estimates for the maximum load on a PFS data server. To complement DTSMaxLoad by modeling conditions and mechanisms that are hard to represent analytically, the Parallel I/O and Storage System (PIOSS) simulation model was proposed. Finally, for experimental evaluation on real environments, a flexible and distributed I/O performance evaluation tool, coined IOR-Extended (IORE), was proposed. Furthermore, a high-level file distribution approach for PFSs, called N-N Round-Robin (N2R2), was proposed, focusing on mitigating I/O performance variability for distributed applications where each process accesses an individual and independent file. An extensive experimental effort, including measurements on real environments, was conducted in this research work to evaluate each of the proposed approaches. In summary, this evaluation indicated that both the DTSMaxLoad and PIOSS modeling proposals can represent load distribution behavior on PFSs with significant fidelity. Moreover, the results demonstrated that N2R2 successfully reduced intra-application I/O performance variability in 270 distinct experimental scenarios, which ultimately translated into overall application I/O performance improvements.
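The abstract does not spell out the N2R2 layout itself. One plausible reading of an N-N round-robin policy, sketched below under that assumption (the paper's actual policy may differ), is to rotate each file's starting data server: when N processes each write their own file, the files' first stripes no longer all land on server 0.

```python
# Hedged sketch of an N-N round-robin style file distribution: rotate each
# file's first data server so that, with one independent file per process,
# the initial stripes spread evenly across servers instead of all starting
# on server 0. This is an illustrative interpretation, not the paper's code.

def round_robin_layout(num_files, stripes_per_file, num_servers):
    """Return, per file, the ordered list of servers holding its stripes."""
    layout = []
    for f in range(num_files):
        start = f % num_servers           # rotate the starting server per file
        layout.append([(start + s) % num_servers for s in range(stripes_per_file)])
    return layout

def server_load(layout, num_servers):
    """Count how many stripes each server ends up holding."""
    load = [0] * num_servers
    for stripes in layout:
        for srv in stripes:
            load[srv] += 1
    return load

layout = round_robin_layout(num_files=4, stripes_per_file=2, num_servers=4)
print(server_load(layout, 4))   # every server holds the same stripe count
```

Evening out the per-server stripe counts is what caps the maximum data-server load, which is exactly the quantity the DTSMaxLoad model estimates.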


Author(s):  
Nikola Davidović ◽  
Slobodan Obradović ◽  
Borislav Đorđević ◽  
Valentina Timčenko ◽  
Bojan Škorić

Rapid technological progress has led to a growing need for more data storage space. The appearance of big data requires larger storage space, faster access and exchange of data, as well as data security. RAID (Redundant Array of Independent Disks) technology is one of the most cost-effective ways to satisfy the needs for larger storage space, data access, and protection. However, connecting multiple secondary memory devices in RAID 0 improves the secondary memory system by providing greater storage capacity and increasing both read and write speeds, but it is not fault-tolerant or error-free. This paper provides an analysis of a system storing data on paired arrays of magnetic disks in a RAID 0 formation, with different numbers of queue entries for overlapped I/O, where the queue depth parameter has the value of 1 or 4. The paper presents a range of test results and analyses for the RAID 0 series under defined workload characteristics. The tests were carried out on the Microsoft Windows Server 2008 R2 Standard operating system, using 2, 3, 4, and 6 paired magnetic disks controlled by a Dell PERC 6/i hardware RAID controller. The measurement results were obtained with the ATTO Disk Benchmark. The obtained results have been analyzed and compared to the expected behavior.
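The speedup and the lack of fault tolerance both follow from RAID 0's address mapping, which can be sketched in a few lines: logical blocks are striped round-robin across the member disks, so sequential I/O engages all spindles at once, while every disk holds data no other disk duplicates (the stripe size below is an illustrative parameter).

```python
# Minimal sketch of the RAID 0 address mapping: logical blocks are striped
# round-robin across the member disks. Sequential I/O is spread over all
# spindles (the source of the speedup), but there is no redundancy, so
# losing any single disk loses the whole array.

def raid0_map(logical_block, num_disks, stripe_blocks=1):
    """Map a logical block number to (disk index, block offset on that disk)."""
    stripe = logical_block // stripe_blocks      # which stripe the block is in
    within = logical_block % stripe_blocks       # position inside the stripe
    disk = stripe % num_disks                    # stripes rotate across disks
    offset = (stripe // num_disks) * stripe_blocks + within
    return disk, offset

# With 4 disks and 1-block stripes, blocks 0..7 alternate across disks 0..3.
print([raid0_map(b, num_disks=4) for b in range(8)])
```

Increasing the overlapped-I/O queue depth (1 vs. 4 in the tests) lets the controller keep more of those disks busy simultaneously, which is why deeper queues tend to benefit larger arrays most.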


2021 ◽  
Vol 37 ◽  
pp. 01011
Author(s):  
D Devi ◽  
G Sai Rohith ◽  
S Shri Hari ◽  
K Sri Ramachandar

Data tampering and fraud in land records have increased drastically in the modern world. A data storage model using blockchain and the InterPlanetary File System (IPFS) is proposed in this work. Land records and farmers' information are stored in the InterPlanetary File System. To prevent data faking, the hash address of the respective data generated by IPFS is stored on the blockchain. This proposed system, when deployed on a large scale, can outperform the existing methods of securing user data. One of the latest technological advancements in the software industry is blockchain technology. This new technology has opened up a new platform for business relationships that delivers feasibility, protection, and low cost. It provides a new foundation of trust for transactions that can facilitate a very streamlined workflow and a faster economy.
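The tamper-evidence argument can be sketched end to end: the record goes to content-addressed storage (IPFS returns a hash derived from the content), only that hash goes on the chain, and any later change to the stored record no longer matches the on-chain hash. The sketch below stands in for both IPFS and the blockchain with plain dictionaries; all names and the sample record are illustrative, not the paper's implementation.

```python
# Hedged sketch of the proposed scheme: store the land record off-chain in a
# content-addressed store (a stand-in for IPFS) and anchor only its hash on a
# stand-in blockchain. Tampering with the stored record breaks verification.
import hashlib, json

off_chain = {}   # stands in for IPFS: content hash -> record

def ipfs_add(record):
    """Content-address the record, as IPFS would, and return its hash."""
    digest = hashlib.sha256(json.dumps(record, sort_keys=True).encode()).hexdigest()
    off_chain[digest] = record
    return digest

chain = []       # stands in for the blockchain ledger

def chain_append(record_hash):
    """Append a block anchoring the record hash, linked to the previous block."""
    prev = chain[-1]["block_hash"] if chain else "0" * 64
    block_hash = hashlib.sha256((prev + record_hash).encode()).hexdigest()
    chain.append({"prev": prev, "record_hash": record_hash, "block_hash": block_hash})

def verify(record_hash):
    """Recompute the stored record's hash and compare with the anchored one."""
    record = off_chain[record_hash]
    digest = hashlib.sha256(json.dumps(record, sort_keys=True).encode()).hexdigest()
    return digest == record_hash

h = ipfs_add({"parcel": 42, "owner": "farmer A"})   # illustrative record
chain_append(h)
print(verify(h))                       # True: record matches its anchored hash
off_chain[h]["owner"] = "attacker"     # tamper with the off-chain record
print(verify(h))                       # False: tampering is detected
```

Keeping only the fixed-size hash on-chain also keeps ledger growth independent of record size, which is what makes the design practical for bulky land documents.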


2021 ◽  
Vol 20 (5s) ◽  
pp. 1-20
Author(s):  
Qingfeng Zhuge ◽  
Hao Zhang ◽  
Edwin Hsing-Mean Sha ◽  
Rui Xu ◽  
Jun Liu ◽  
...  

Efficiently accessing remote file data remains a challenging problem for data processing systems. Developments in non-volatile dual in-line memory modules (NVDIMMs), in-memory file systems, and RDMA networks provide new opportunities for solving the problem of remote data access. A general understanding of NVDIMMs, such as Intel Optane DC Persistent Memory (DCPM), is that they expand main memory capacity at the cost of performance several times lower than DRAM. With the in-depth exploration presented in this paper, however, we show the interesting finding that the potential of NVDIMMs for high-performance remote in-memory accesses can be revealed through careful design. We explore multiple architectural structures for accessing remote NVDIMMs in a real system using Optane DCPM, and we compare the performance of these structures. Experiments show significant performance gaps among different ways of using NVDIMMs as memory address space accessible through an RDMA interface. Furthermore, we design and implement a prototype of a user-level in-memory file system, RIMFS, in device DAX mode on Optane DCPM. Comparing against the DAX-supported Linux file system Ext4-DAX, we show that the performance of remote reads on RIMFS over RDMA is 11.44 times higher than on remote Ext4-DAX on average. The experimental results also show that the performance of remote accesses on RIMFS is maintained on a heavily loaded data server with CPU utilization as high as 90%, while the performance of remote reads on Ext4-DAX is significantly reduced, by 49.3%, and the performance of local reads on Ext4-DAX is reduced even more, by 90.1%. The performance comparisons of writes exhibit the same trends.


2012 ◽  
Vol 241-244 ◽  
pp. 1556-1561
Author(s):  
Qi Meng Wu ◽  
Ke Xie ◽  
Ming Fa Zhu ◽  
Li Min Xiao ◽  
Li Ruan

Parallel file systems deploy multiple metadata servers to distribute the heavy metadata workload from clients. With an increasing number of metadata servers, metadata-intensive operations face problems related to coordination among the servers, compromising the performance gain. Consequently, a file system simulator is very helpful for trying out optimization ideas to solve these problems. In this paper, we propose DMFSsim to simulate metadata-intensive operations on large-scale distributed-metadata file systems. DMFSsim can flexibly replay traces of multiple metadata operations, supports several commonly used metadata distribution algorithms, and simulates the file system tree hierarchy and the underlying disk block management mechanisms of real systems. Extensive simulations show that DMFSsim is capable of demonstrating the performance of metadata-intensive operations in distributed-metadata file systems.
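The abstract names "commonly used metadata distribution algorithms" without listing them. One widely used policy such a simulator would replay is path hashing, sketched below; the function name and sample paths are illustrative, and subtree partitioning or other policies would simply replace `pick_mds`.

```python
# Illustrative sketch of one metadata distribution policy a simulator like
# DMFSsim could replay: hash the full path to choose a metadata server (MDS).
# Hashing spreads load evenly but scatters a directory's entries, which is
# the kind of coordination cost such simulators are built to measure.
import hashlib

def pick_mds(path, num_mds):
    """Deterministically map a file path to a metadata server index."""
    digest = hashlib.md5(path.encode()).hexdigest()
    return int(digest, 16) % num_mds

# Hypothetical trace of paths, assigned across 4 metadata servers.
paths = ["/home/a/f1", "/home/a/f2", "/home/b/f1", "/tmp/x"]
assignment = {p: pick_mds(p, num_mds=4) for p in paths}
print(assignment)
```

Note that files in the same directory may land on different servers, so an operation like `rename` of a directory can touch many servers at once; that cross-server coordination is the collaboration overhead the abstract refers to.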

