Modeling of Distributed File System in Big Data Storage by Event-B

2018 ◽  
Vol 210 ◽  
pp. 04042
Author(s):  
Ammar Alhaj Ali ◽  
Pavel Varacha ◽  
Said Krayem ◽  
Roman Jasek ◽  
Petr Zacek ◽  
...  

Nowadays, a wide range of systems and applications, especially in high-performance computing, depend on distributed environments to process and analyze huge amounts of data. As the amount of data increases enormously, providing and developing efficient, scalable, and reliable storage solutions has become one of the major issues for scientific computing. The storage solution used by big data systems is the Distributed File System (DFS), which builds a hierarchical and unified view of multiple file servers and shares on the network. In this paper we present the Hadoop Distributed File System (HDFS) as the DFS in big data systems and introduce Event-B as a formal method that can be used for modeling. Event-B is a mature formal method that has been widely used in industry projects in a number of domains, such as automotive, transportation, space, business information, and medical devices. We also propose using Rodin as the modeling tool for Event-B: the Rodin platform integrates modeling and proving, and, being open source, supports a large number of plug-in tools.
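To give a concrete flavor of the Event-B style applied to a file system, the minimal Python sketch below mimics a single Event-B-style event for an HDFS-like machine, with an explicit guard/action split. The state variables, event name, and replication constant are illustrative assumptions, not the Rodin model from the paper.

```python
# Minimal Python sketch of an Event-B-style event for an HDFS-like model.
# The machine state and the guard/action split are illustrative only.

REPLICATION = 3  # assumed replication factor

state = {
    "datanodes": {"dn1", "dn2", "dn3", "dn4"},  # live DataNodes
    "blocks": {},                               # block -> set of DataNodes
}

def evt_replicate_block(block, target):
    """Event: place one replica of `block` on `target`."""
    replicas = state["blocks"].setdefault(block, set())
    # Guards (in Event-B, the event fires only when all guards hold).
    if target not in state["datanodes"]:
        return False
    if target in replicas:
        return False
    if len(replicas) >= REPLICATION:
        return False
    # Action (in Event-B, a before-after predicate on the state).
    replicas.add(target)
    return True

if __name__ == "__main__":
    for dn in ["dn1", "dn2", "dn2", "dn3", "dn4"]:
        print(dn, evt_replicate_block("blk_001", dn))
    print(state["blocks"])
```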

Author(s):  
Przemysław Lisowski ◽  
Adam Piórkowski ◽  
Andrzej Lesniak

Storing large amounts of spatial data in GIS systems is problematic, and the problem is growing due to ever-increasing data production from a variety of data sources. The phenomenon of collecting huge amounts of data is called Big Data. Existing solutions are capable of processing and storing large volumes of spatial data and also demonstrate new approaches to data processing. Conventional techniques work with ordinary data but are not suitable for large datasets; they operate efficiently only when coupled with distributed file systems and task-reducing algorithms in the MapReduce style. This review focuses on the characteristics of large spatial data and discusses the opportunities offered by spatial big data systems. The work also draws attention to the problems of indexing and data access, and to proposed solutions in this area.
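As a toy illustration of the spatial indexing problem the review raises, the sketch below builds a uniform grid index whose cell key can double as a partition key in a distributed system. The cell size and coordinates are invented for illustration, not taken from any system discussed in the paper.

```python
# A minimal grid-index sketch for point data; cell size and key scheme
# are illustrative assumptions, not a specific system's index.
from collections import defaultdict

CELL = 0.5  # degrees per grid cell (assumed resolution)

def cell_key(lon, lat):
    """Map a coordinate to a grid-cell key usable as a partition key."""
    return (int(lon // CELL), int(lat // CELL))

index = defaultdict(list)

def insert(point_id, lon, lat):
    index[cell_key(lon, lat)].append(point_id)

def query_cell(lon, lat):
    """Return candidate points sharing the cell; refine with exact tests."""
    return index[cell_key(lon, lat)]

insert("a", 19.94, 50.06)   # illustrative coordinates
insert("b", 19.96, 50.07)
insert("c", 21.01, 52.23)
print(query_cell(19.95, 50.05))  # -> ['a', 'b'] (same cell)
```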


2019 ◽  
Vol 15 (S367) ◽  
pp. 464-466
Author(s):  
Paul Bartus

In recent years, the amount of data has skyrocketed. As a consequence, data has become more expensive to store than to generate. Storage needs for astronomical data are following the same trend. Storage systems in astronomy contain redundant copies of data, such as identical files or duplicated sub-file regions. We propose the use of the Hadoop Distributed and Deduplicated File System (HD2FS) in astronomy. HD2FS is a deduplication storage system that was created to improve data storage capacity and efficiency in distributed file systems without compromising input/output performance. HD2FS can be developed by modifying existing storage environments such as the Hadoop Distributed File System. By taking advantage of deduplication technology, we can better manage the underlying redundancy of astronomical data and reduce the space needed to store these files, thus allowing for more capacity per volume.
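The core idea behind deduplication can be sketched in a few lines: hash each chunk of a file and store identical chunks only once. The fixed-size chunking and in-memory chunk store below are simplifying assumptions for illustration, not HD2FS internals.

```python
# Minimal sketch of fixed-size chunk deduplication; the chunk size and
# in-memory chunk store are assumptions, not HD2FS internals.
import hashlib

CHUNK = 4096  # bytes; fixed-size chunking for simplicity

chunk_store = {}  # sha256 digest -> chunk bytes (stored once)

def dedup_write(data: bytes):
    """Split data into chunks, store each unique chunk once,
    and return the recipe (list of digests) for the file."""
    recipe = []
    for i in range(0, len(data), CHUNK):
        chunk = data[i:i + CHUNK]
        digest = hashlib.sha256(chunk).hexdigest()
        chunk_store.setdefault(digest, chunk)  # skip if already stored
        recipe.append(digest)
    return recipe

def dedup_read(recipe):
    return b"".join(chunk_store[d] for d in recipe)

payload = b"A" * 10000 + b"B" * 10000     # highly redundant input
r1 = dedup_write(payload)
r2 = dedup_write(payload)                 # identical copy: no new chunks
print(len(r1), "chunks referenced,", len(chunk_store), "stored once")
assert dedup_read(r1) == payload
```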


2021 ◽  
Vol 17 (3) ◽  
pp. 1-25
Author(s):  
Bohong Zhu ◽  
Youmin Chen ◽  
Qing Wang ◽  
Youyou Lu ◽  
Jiwu Shu

Non-volatile memory and remote direct memory access (RDMA) provide extremely high performance in storage and network hardware. However, existing distributed file systems strictly isolate the file system and network layers, and the heavy layered software design leaves the high-speed hardware under-exploited. In this article, we propose an RDMA-enabled distributed persistent memory file system, Octopus+, which redesigns the file system's internal mechanisms by closely coupling non-volatile memory and RDMA features. For data operations, Octopus+ directly accesses a shared persistent memory pool to reduce memory-copying overhead, and performs data fetching and pushing entirely at the clients to rebalance the load between server and network. For metadata operations, Octopus+ introduces self-identified remote procedure calls for immediate notification between file system and network, and an efficient distributed transaction mechanism for consistency. Octopus+ also adds a replication feature for better availability. Evaluations on Intel Optane DC Persistent Memory Modules show that Octopus+ achieves nearly the raw bandwidth for large I/Os and orders-of-magnitude better performance than existing distributed file systems.
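The "client-active" data path can be illustrated with a toy emulation: the server answers metadata lookups only, and the client copies file data out of a shared pool itself, much as one-sided RDMA reads avoid server CPU involvement. The pool layout and function names below are assumptions for illustration, not Octopus+ code.

```python
# Toy emulation of a client-active read path: the server returns only the
# location of the data in a shared pool; the client copies it itself.
# Real Octopus+ uses persistent memory plus one-sided RDMA verbs.

pool = bytearray(1 << 16)   # stand-in for the shared persistent memory pool
extent_table = {}           # filename -> (offset, length)

def server_write(name, data: bytes, offset: int):
    pool[offset:offset + len(data)] = data
    extent_table[name] = (offset, len(data))

def server_lookup(name):
    """Metadata-only RPC: the server returns a location, not the data."""
    return extent_table[name]

def client_read(name) -> bytes:
    off, length = server_lookup(name)      # small metadata message
    return bytes(pool[off:off + length])   # client-side one-sided copy

server_write("/a.txt", b"hello, persistent memory", offset=0)
print(client_read("/a.txt"))
```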


2020 ◽  
Vol ahead-of-print (ahead-of-print) ◽  
Author(s):  
Mohammed Anouar Naoui ◽  
Brahim Lejdel ◽  
Mouloud Ayad ◽  
Abdelfattah Amamra ◽  
Okba Kazar

Purpose: The purpose of this paper is to propose a distributed deep learning architecture for smart cities in big data systems.
Design/methodology/approach: We propose a multilayer architecture to describe distributed deep learning for smart cities in big data systems. The components of our system are the smart city layer, the big data layer, and the deep learning layer. The smart city layer is responsible for the smart city components, their Internet of Things sensors and effectors, and their integration in the system; the big data layer concerns data characteristics and data distribution over the system. The deep learning layer is the model of our system and is responsible for data analysis.
Findings: We apply the proposed architecture to a smart environment and to smart energy. For the smart environment, we study toluene forecasting in the Madrid smart city. For smart energy, we study wind energy forecasting in Australia. The proposed architecture can reduce execution time and improve deep learning models such as Long Short-Term Memory (LSTM).
Research limitations/implications: This research needs the application of other deep learning models, such as convolutional neural networks and autoencoders.
Practical implications: The findings of the research will be helpful for smart city architecture, providing a clear view of a smart city, its data storage, and its data analysis. Toluene forecasting in a smart environment can help decision-makers ensure environmental safety, and the smart energy model can give a clear prediction of power generation.
Originality/value: The findings of this study are expected to contribute valuable information to decision-makers for a better understanding of the keys to smart city architecture and its relation to data storage, processing, and analysis.
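As a hint of what the deep learning layer might run, here is a minimal Keras LSTM forecasting sketch on synthetic data. The window size, layer widths, and series are placeholder assumptions, not the paper's Madrid or Australia setups.

```python
# Minimal LSTM forecasting sketch standing in for the deep learning layer.
# Window size, layer widths, and the synthetic series are assumptions.
import numpy as np
import tensorflow as tf

WINDOW = 24  # past steps used to predict the next value (assumed)

# Synthetic univariate series as a placeholder for toluene / wind data.
series = np.sin(np.linspace(0, 60, 2000)).astype("float32")
X = np.stack([series[i:i + WINDOW] for i in range(len(series) - WINDOW)])
y = series[WINDOW:]
X = X[..., None]  # shape (samples, timesteps, features)

model = tf.keras.Sequential([
    tf.keras.layers.LSTM(32, input_shape=(WINDOW, 1)),
    tf.keras.layers.Dense(1),
])
model.compile(optimizer="adam", loss="mse")
model.fit(X, y, epochs=2, batch_size=64, verbose=0)
print("next-step forecast:", model.predict(X[-1:], verbose=0)[0, 0])
```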


Author(s):  
Armando Fandango ◽  
William Rivera

Scientific Big Data gathered at exascale needs to be stored, retrieved, and manipulated. The storage stack for scientific Big Data includes a file system at the system level for physical organization of the data, and a file format and input/output (I/O) system at the application level for logical organization of the data; both must be of a high-performance variety for exascale. High-performance file systems are designed for concurrent access, high-speed transmission, and fault tolerance. High-performance file formats and I/O systems are designed to give parallel and distributed applications easy and fast access to Big Data. These specialized file formats make it easier to store and access Big Data for scientific visualization and predictive analytics. This chapter provides a brief review of the characteristics of high-performance file systems such as Lustre and GPFS, and of high-performance file formats and I/O systems such as HDF5, NetCDF, MPI-IO, and HDFS.
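For a flavor of how such formats expose chunked, self-describing storage, the short h5py sketch below writes and partially reads an HDF5 dataset. The file name, dataset name, and attribute are illustrative choices, not taken from the chapter.

```python
# Minimal h5py sketch: write and read an HDF5 dataset with chunking and
# compression. Names and shapes are illustrative assumptions.
import numpy as np
import h5py

data = np.random.rand(1000, 1000)

with h5py.File("sky_survey.h5", "w") as f:        # hypothetical file name
    dset = f.create_dataset(
        "image", data=data, chunks=(100, 100), compression="gzip"
    )
    dset.attrs["telescope"] = "example"           # self-describing metadata

with h5py.File("sky_survey.h5", "r") as f:
    tile = f["image"][0:100, 0:100]               # partial (chunked) read
    print(tile.shape, dict(f["image"].attrs))
```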


Big data is one of the most influential technologies of the modern era. However, to support the maturity of big data systems, heterogeneous environments must be developed and sustained, which in turn requires the integration of technologies as well as concepts. Computing and storage are the two core components of any big data system. That said, big data storage needs to communicate with the execution engine and with other processing and visualization technologies to create a comprehensive solution. This brings the facet of big data file formats into the picture. This paper classifies available big data file formats into five categories, namely text-based, row-based, column-based, in-memory, and data storage services. It also compares the advantages, shortcomings, and possible use cases of the available big data file formats for Hadoop, which is the foundation for most big data computing technologies. Lastly, it discusses the trade-offs that must be considered when choosing a file format for a big data system, providing a framework for creating file format selection criteria.
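The row-versus-column trade-off is easy to demonstrate: the sketch below writes the same frame as text-based CSV and column-based Parquet, then reads back a single-column projection. File names are illustrative, and Parquet writing assumes a pyarrow (or fastparquet) engine is installed.

```python
# Text-based (CSV) vs column-based (Parquet) comparison sketch.
# File names are illustrative; to_parquet assumes pyarrow/fastparquet.
import os
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "id": np.arange(100_000),
    "value": np.random.rand(100_000),
})

df.to_csv("events.csv", index=False)   # row-oriented, human-readable
df.to_parquet("events.parquet")        # columnar, compressed, typed

print("csv bytes:    ", os.path.getsize("events.csv"))
print("parquet bytes:", os.path.getsize("events.parquet"))

# Columnar layouts let readers load a projection of columns only:
just_values = pd.read_parquet("events.parquet", columns=["value"])
print(just_values.shape)
```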


2013 ◽  
Vol 756-759 ◽  
pp. 1275-1279
Author(s):  
Lin Na Huang ◽  
Feng Hua Liu

High-performance cloud storage is a basic precondition for cloud computing. This article introduces the concept and advantages of cloud storage, discusses the infrastructure of a cloud storage system and the architecture of cloud data storage, and examines the design of the distributed file system within cloud data storage. It also puts forward different development strategies for enterprises according to the different roles they play during the development of cloud computing.
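One recurring design detail in such distributed file systems is replica placement. The toy sketch below spreads copies of a block across distinct racks; the rack map and policy are assumptions for illustration, not any specific system's algorithm.

```python
# Toy rack-aware replica placement; the rack map and the policy of one
# node per rack are illustrative assumptions only.
racks = {
    "rack1": ["dn1", "dn2"],
    "rack2": ["dn3", "dn4"],
    "rack3": ["dn5"],
}

def place_replicas(block_id, n=3):
    """Pick one node from each rack until n replicas are placed.
    hash() is salted per run; fine for an illustration."""
    chosen = []
    for nodes in racks.values():
        chosen.append(nodes[hash(block_id) % len(nodes)])
        if len(chosen) == n:
            break
    return chosen

print(place_replicas("blk_42"))  # e.g. ['dn2', 'dn3', 'dn5']
```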


2014 ◽  
Vol 602-605 ◽  
pp. 3282-3284
Author(s):  
Fa Gui Liu ◽  
Xiao Jie Zhang

Distributed file systems such as HDFS face the threat of Advanced Persistent Threats (APTs). Although security mechanisms such as Kerberos and ACLs are implemented in distributed file systems, most are not sufficient to counter the threats posed by APTs. Based on observations of APT traits, we propose a trusted distributed file system based on HDFS, which provides a further layer of security against APTs beyond the current security mechanisms.
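One building block of such a trusted file system could be integrity measurement: record trusted digests at write time and verify them before serving data, so tampering between writes becomes detectable. The sketch below shows that generic idea only; it is not the paper's mechanism.

```python
# Generic integrity-measurement sketch (not the paper's exact mechanism):
# keep trusted digests separately and verify blocks before serving them.
import hashlib

trusted_manifest = {}  # block id -> expected sha256 (trusted store)
block_store = {}       # block id -> bytes (untrusted storage)

def write_block(block_id, data: bytes):
    block_store[block_id] = data
    trusted_manifest[block_id] = hashlib.sha256(data).hexdigest()

def read_block(block_id) -> bytes:
    data = block_store[block_id]
    if hashlib.sha256(data).hexdigest() != trusted_manifest[block_id]:
        raise IOError(f"integrity check failed for {block_id}")
    return data

write_block("blk_7", b"payload")
block_store["blk_7"] = b"tampered"   # simulate an APT-style modification
try:
    read_block("blk_7")
except IOError as e:
    print(e)
```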


2014 ◽  
Vol 998-999 ◽  
pp. 1362-1365
Author(s):  
Wei Feng Gao ◽  
Tie Zhu Zhao ◽  
Ming Bin Lin

Distributed file systems are emerging as a key component of large-scale cloud storage platforms due to the continuous growth in the amount of application data. Performance modeling and analysis is an important concern in the distributed file system area. This paper focuses on performance prediction and modeling issues. An adaptive prediction model (APModel) is proposed to predict the performance of distributed file systems by capturing the performance correlation of different performance factors. We perform a series of experiments to validate the proposed prediction model. The experimental results indicate that the proposed approach achieves better prediction accuracy; it is practical and supports effective performance analysis for distributed file systems.
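In the spirit of factor-based prediction, the minimal sketch below fits a linear model from observed (factor, throughput) pairs and predicts a new configuration. The factors and numbers are invented for illustration; APModel itself is adaptive and more elaborate than this.

```python
# Minimal factor-based performance prediction: ordinary least squares over
# invented (request size, client count) -> throughput observations.
import numpy as np

# Columns: request size (MB), concurrent clients; target: throughput (MB/s).
X = np.array([[1, 4], [4, 4], [8, 8], [16, 8], [32, 16]], dtype=float)
y = np.array([80, 140, 210, 260, 330], dtype=float)

A = np.hstack([X, np.ones((len(X), 1))])       # add an intercept column
coef, *_ = np.linalg.lstsq(A, y, rcond=None)

def predict(size_mb, clients):
    return coef[0] * size_mb + coef[1] * clients + coef[2]

print("predicted MB/s at 24 MB, 12 clients:", round(predict(24, 12), 1))
```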

