A New Hybrid Storage System Base on Openstack

Nowadays, a wide range of systems and applications, especially in high-performance computing, depend on distributed environments to process and analyze huge amounts of data. As the amount of data grows enormously, providing efficient, scalable, and reliable storage solutions has become one of the major issues in scientific computing. Big data systems typically rely on Distributed File Systems (DFSs) for storage, where a DFS builds a hierarchical, unified view of multiple file servers and shares on the network. In this paper we consider the Hadoop Distributed File System (HDFS) as the DFS in big data systems and present Event-B as a formal method that can be used for modeling. Event-B is a mature formal method that has been widely used in industry projects across domains such as automotive, transportation, space, business information, and medical devices. We also propose Rodin as the modeling tool for Event-B: it integrates modeling and proving, and because the Rodin platform is open source, it supports a large number of plug-in tools.

Overview of Big Data-Intensive Storage and its Technologies for Cloud and Fog Computing

Research Anthology on Privatizing and Securing Data ◽

10.4018/978-1-7998-8954-0.ch005 ◽

2021 ◽

pp. 112-153

Author(s):

Richard S. Segall ◽

Jeffrey S Cook ◽

Gao Niu

Keyword(s):

Big Data ◽

Data Storage ◽

High Performance ◽

Storage Systems ◽

Fog Computing ◽

Storage Management ◽

Data Intensive Computing ◽

Computing Systems ◽

Application Performance ◽

Data Intensive

Computing systems are becoming increasingly data-intensive because of the explosion of data and the need to process it, so storage management is critical to application performance in such systems. However, when existing resource management frameworks in these systems lack support for storage management, applications suffer unpredictable performance degradation under input/output (I/O) contention. Storage management of data-intensive systems is therefore a challenge, and Big Data plays a major role in storage systems for data-intensive computing. This article addresses these difficulties and discusses High Performance Computing (HPC) systems, the background of storage systems for data-intensive applications, storage patterns and storage mechanisms for Big Data, the Top 10 Cloud Storage Systems for data-intensive computing in today's world, and the interface between Big Data Intensive Storage and Cloud/Fog Computing. Big Data storage, server statistics, and usage distributions for the Top 500 Supercomputers in the world are also presented graphically and discussed as data-intensive storage components that can be interfaced with fog-to-cloud interactions and enabling protocols.

Research and Design on Intranet Security File System

Advanced Materials Research ◽

10.4028/www.scientific.net/amr.433-440.4704 ◽

2012 ◽

Vol 433-440 ◽

pp. 4704-4709

Author(s):

Yan Shen Chen ◽

De Zhi Han

Keyword(s):

Data Storage ◽

Data Security ◽

High Performance ◽

File System ◽

Storage System ◽

Data Encryption ◽

Security Issue ◽

Storage And Retrieval ◽

System A ◽

Secure File System

To solve the data security issue in intranet massive storage systems, a Multi-Protocol Secure File System (MPSFS for short) is designed. First, MPSFS supports access by users over different protocols and provides a unified access interface, so it can achieve high performance in data storage and retrieval. Second, with the help of other technologies such as identity authentication, access control, and data encryption, MPSFS can effectively ensure data security in the intranet storage system. Experiments show that MPSFS provides good security and scalability for intranet massive storage systems and has little effect on network I/O performance.

Data Storage Technology and its Development Based on Cloud Computing

Advanced Materials Research ◽

10.4028/www.scientific.net/amr.756-759.1275 ◽

2013 ◽

Vol 756-759 ◽

pp. 1275-1279

Author(s):

Lin Na Huang ◽

Feng Hua Liu

Keyword(s):

Cloud Computing ◽

Data Storage ◽

Cloud Storage ◽

High Performance ◽

File System ◽

Storage System ◽

Distributed File System ◽

Cloud Data ◽

Storage Technology ◽

Cloud Data Storage

High-performance cloud storage is a basic condition for cloud computing. This article introduces the concept and advantages of cloud storage, discusses the infrastructure of cloud storage systems and the architecture of cloud data storage, and examines in detail the design of the Distributed File System within cloud data storage. It also puts forward different development strategies for enterprises according to the different roles they play in the development of cloud computing.

Overview of Big-Data-Intensive Storage and Its Technologies

Advances in Data Mining and Database Management - Handbook of Research on Big Data Storage and Visualization Techniques ◽

10.4018/978-1-5225-3142-5.ch002 ◽

2018 ◽

pp. 33-74

Author(s):

Richard S. Segall ◽

Jeffrey S. Cook

Keyword(s):

Big Data ◽

Data Storage ◽

Storage Systems ◽

Storage System ◽

Management Strategies ◽

Sensor Data ◽

Data Intensive Computing ◽

Data Intensive ◽

Future Challenges ◽

Data Storage System

This chapter gives a detailed discussion of storage systems for data-intensive computing using Big Data. It begins with a brief introduction to data-intensive computing and the types of parallel processing approaches, and highlights how data-intensive computing systems differ from other forms of computing. The importance of Big Data computing is then discussed. The current and future challenges of storage in genomics are discussed in detail, and storage and data management strategies are given. The chapter then focuses on the software challenges for storage. Storage use cases such as DataDirect Networks and SDSC are provided, along with a list of storage tools and their details. A short section discusses the sensor data storage system. A table then shows the top 10 cloud storage systems in the world for data-intensive computing using Big Data. Statistics for the Top 500 Big Data storage servers are also presented using images from the Top500 website.

Overview of Big Data-Intensive Storage and its Technologies for Cloud and Fog Computing

International Journal of Fog Computing ◽

10.4018/ijfc.2019010104 ◽

2019 ◽

Vol 2 (1) ◽

pp. 74-113 ◽

Cited By ~ 1

Author(s):

Richard S. Segall ◽

Jeffrey S Cook ◽

Gao Niu

Keyword(s):

Big Data ◽

Data Storage ◽

High Performance ◽

Storage Systems ◽

Fog Computing ◽

Storage Management ◽

Data Intensive Computing ◽

Computing Systems ◽

Application Performance ◽

Data Intensive

Computing systems are becoming increasingly data-intensive because of the explosion of data and the need to process it, so storage management is critical to application performance in such systems. However, when existing resource management frameworks in these systems lack support for storage management, applications suffer unpredictable performance degradation under input/output (I/O) contention. Storage management of data-intensive systems is therefore a challenge, and Big Data plays a major role in storage systems for data-intensive computing. This article addresses these difficulties and discusses High Performance Computing (HPC) systems, the background of storage systems for data-intensive applications, storage patterns and storage mechanisms for Big Data, the Top 10 Cloud Storage Systems for data-intensive computing in today's world, and the interface between Big Data Intensive Storage and Cloud/Fog Computing. Big Data storage, server statistics, and usage distributions for the Top 500 Supercomputers in the world are also presented graphically and discussed as data-intensive storage components that can be interfaced with fog-to-cloud interactions and enabling protocols.

A Benchmark for Suitability of Alluxio over Spark

International Journal of Innovative Technology and Exploring Engineering - Special Issue ◽

10.35940/ijitee.a8190.1110120 ◽

2020 ◽

Vol 10 (1) ◽

pp. 245-250

Keyword(s):

Big Data ◽

Data Processing ◽

Data Storage ◽

Storage Systems ◽

Distributed Storage ◽

Storage System ◽

Large Data ◽

Time Data ◽

Big Data Applications ◽

Access To Data

Big data applications play an important role in real-time data processing. Apache Spark is a data processing framework with an in-memory data engine that quickly processes large data sets. It can also distribute data processing tasks across multiple computers, either on its own or in tandem with other distributed computing tools. Spark’s in-memory processing cannot share data between applications, and hence RAM will be insufficient for storing petabytes of data. Alluxio is a virtual distributed storage system that leverages memory for data storage and provides faster access to data in different storage systems. Alluxio helps to speed up data-intensive Spark applications across various storage systems. In this work, the performance of applications on Spark alone and on Spark running over Alluxio has been studied with respect to several storage formats (Parquet, ORC, CSV, and JSON) and four types of queries from the Star Schema Benchmark (SSB). A benchmark is developed to indicate the suitability of the Spark and Alluxio combination for big data applications. Alluxio is found suitable for applications that use databases larger than 2.6 GB storing data in JSON and CSV formats. Spark alone is found suitable for applications that use storage formats such as Parquet and ORC with database sizes below 2.6 GB.
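
The reported findings can be read as a simple decision rule. The following is a minimal sketch, assuming only the two cases stated in the abstract; the function name `suggest_engine`, the return labels, and the `"undetermined"` fallback are hypothetical and are not part of the paper's benchmark.

```python
# Threshold and format groups come from the findings stated in the abstract.
SIZE_THRESHOLD_GB = 2.6
TEXT_FORMATS = {"json", "csv"}         # formats where Alluxio was found suitable
COLUMNAR_FORMATS = {"parquet", "orc"}  # formats where Spark alone was found suitable


def suggest_engine(storage_format: str, size_gb: float) -> str:
    """Return a suggested setup for a storage format and database size.

    Hypothetical helper: it covers only the two cases the abstract reports
    and returns "undetermined" for every other combination.
    """
    fmt = storage_format.strip().lower()
    if fmt in TEXT_FORMATS and size_gb > SIZE_THRESHOLD_GB:
        return "spark+alluxio"
    if fmt in COLUMNAR_FORMATS and size_gb < SIZE_THRESHOLD_GB:
        return "spark"
    return "undetermined"
```

For example, `suggest_engine("JSON", 5.0)` falls in the first case, while a combination the abstract does not cover, such as a large Parquet database, is deliberately left undetermined rather than guessed.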

Using Hadoop Distributed and Deduplicated File System (HD2FS) in Astronomy

Proceedings of the International Astronomical Union ◽

10.1017/s1743921321000387 ◽

2019 ◽

Vol 15 (S367) ◽

pp. 464-466

Author(s):

Paul Bartus

Keyword(s):

Data Storage ◽

Storage Capacity ◽

File System ◽

Storage Systems ◽

Storage System ◽

File Systems ◽

Distributed File System ◽

Distributed File Systems ◽

Output Performance ◽

Hadoop Distributed File System

In recent years, the amount of data has skyrocketed. As a consequence, data has become more expensive to store than to generate. The storage needs for astronomical data are following this trend. Storage systems in astronomy contain redundant copies of data, such as identical files or identical regions within files. We propose the use of the Hadoop Distributed and Deduplicated File System (HD2FS) in astronomy. HD2FS is a deduplication storage system that was created to improve data storage capacity and efficiency in distributed file systems without compromising input/output performance. HD2FS can be developed by modifying existing storage system environments such as the Hadoop Distributed File System. By taking advantage of deduplication technology, we can better manage the underlying redundancy of data in astronomy and reduce the space needed to store these files in the file systems, thus allowing for more capacity per volume.
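
To illustrate the general idea of deduplication, the sketch below stores each distinct chunk of data once and keeps a per-file list of chunk digests. It is a generic, in-memory example using fixed-size chunks and SHA-256 and is not the HD2FS design; the class name `DedupStore`, its methods, and the chunk size are assumptions made for illustration.

```python
import hashlib


class DedupStore:
    """Toy in-memory store that keeps each distinct chunk only once.

    Hypothetical illustration of fixed-size chunk deduplication;
    it does not reflect how HD2FS or HDFS is implemented.
    """

    def __init__(self, chunk_size: int = 4096):
        self.chunk_size = chunk_size
        self.chunks = {}  # SHA-256 hex digest -> chunk bytes
        self.files = {}   # file name -> ordered list of chunk digests

    def put(self, name: str, data: bytes) -> None:
        digests = []
        for start in range(0, len(data), self.chunk_size):
            chunk = data[start:start + self.chunk_size]
            digest = hashlib.sha256(chunk).hexdigest()
            # Store the chunk only if this content has not been seen before.
            self.chunks.setdefault(digest, chunk)
            digests.append(digest)
        self.files[name] = digests

    def get(self, name: str) -> bytes:
        # Rebuild the file by concatenating its chunks in order.
        return b"".join(self.chunks[d] for d in self.files[name])

    def stored_bytes(self) -> int:
        # Physical bytes held after deduplication.
        return sum(len(chunk) for chunk in self.chunks.values())
```

With a chunk size of 4, storing `b"AAAABBBBAAAA"` and `b"AAAABBBB"` under two names keeps only two distinct chunks, so 20 logical bytes occupy 8 stored bytes, and both files can still be read back unchanged.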