Distributed Storage Strategy and Visual Analysis for Economic Big Data

2021 ◽  
Vol 2021 ◽  
pp. 1-13
Author(s):  
Xiangli Chang ◽  
Hailang Cui

With the increasing popularity of Internet-based services, a large number of which are hosted on cloud platforms, more powerful back-end storage systems are needed to support them. At present, it is very difficult or impossible for a single distributed storage design to satisfy every requirement at once; the focus of research is therefore to trade off different characteristics and design different distributed storage solutions for different usage scenarios. Economic big data imposes the basic requirements of high storage efficiency and fast retrieval speed, while the large number of small files and the diversity of file types pose severe challenges for its storage and retrieval. This paper is oriented to the application requirements of cross-modal analysis of economic big data. According to the sources and characteristics of economic big data, the data types are analyzed, and the database storage architecture and data storage structure of economic big data are designed. Taking into account the spatial, temporal, and semantic characteristics of economic big data, this paper proposes a unified coding method based on a multilevel spatiotemporal division strategy that combines Geohash and Hilbert curves under spatiotemporal semantic constraints. A prototype system was constructed on MongoDB, and with the data storage management functions realized, the prototype was used to verify the performance of the proposed multilevel partition algorithm. For workload management, a Wiener predictor based on the Wiener filter principle is applied: according to its periodicity, the workload is divided into distributed storage windows of a specific duration, and at the beginning of each window the storage resources for the next window are provisioned. Experiments and tests verify the proposed distributed storage strategy and show that the Wiener-based scheme can save platform resources and configuration costs while ensuring the Service Level Agreement (SLA).
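The unified coding idea can be pictured with a small sketch: a time-bucket prefix keyed together with a Geohash cell, stored as an indexed field in MongoDB. This is a minimal illustration under assumed parameters (key layout, 8-character precision, one-hour buckets), not the authors' exact multilevel Geohash/Hilbert scheme.

```python
# Minimal sketch of a unified spatiotemporal key: time bucket + Geohash cell.
# Key layout, bucket size, and precision are illustrative assumptions.
from datetime import datetime, timezone

_BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"

def geohash(lat: float, lon: float, precision: int = 8) -> str:
    """Encode latitude/longitude into a standard Geohash string."""
    lat_range, lon_range = [-90.0, 90.0], [-180.0, 180.0]
    code, even, bit_count, ch = [], True, 0, 0  # even bit -> longitude
    while len(code) < precision:
        rng = lon_range if even else lat_range
        val = lon if even else lat
        mid = (rng[0] + rng[1]) / 2
        ch <<= 1
        if val >= mid:
            ch |= 1
            rng[0] = mid
        else:
            rng[1] = mid
        even = not even
        bit_count += 1
        if bit_count == 5:
            code.append(_BASE32[ch])
            bit_count, ch = 0, 0
    return "".join(code)

def spatiotemporal_key(lat, lon, ts: datetime, bucket_s: int = 3600) -> str:
    """Unified key: zero-padded time bucket + Geohash cell (illustrative layout)."""
    bucket = int(ts.replace(tzinfo=timezone.utc).timestamp()) // bucket_s
    return f"{bucket:010d}-{geohash(lat, lon)}"

print(spatiotemporal_key(39.9, 116.4, datetime(2021, 5, 1, 12, 0)))
```

In a MongoDB prototype, such keys would be stored in an indexed field so that spatiotemporal range queries reduce to prefix scans over the index.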

Electronics ◽  
2021 ◽  
Vol 10 (7) ◽  
pp. 847
Author(s):  
Sopanhapich Chum ◽  
Heekwon Park ◽  
Jongmoo Choi

This paper proposes a new resource management scheme that supports SLAs (Service-Level Agreements) in a big data distributed storage system. It makes use of two mapping modes, an isolated mode and a shared mode, in an adaptive manner. Specifically, to ensure different QoS (Quality of Service) requirements among clients, it isolates storage devices so that urgent clients are not interfered with by normal clients. When there is no urgent client, it switches to the shared mode so that normal clients can access all storage devices, thus achieving full performance. To provide this adaptability effectively, it devises two techniques, called logical cluster and normal inclusion. In addition, the paper explores how to exploit heterogeneous storage devices, HDDs (Hard Disk Drives) and SSDs (Solid State Drives), to support SLAs. It examines two use cases and observes that separating data and metadata onto different devices has a positive impact on the performance-per-cost ratio. Evaluation results from a real implementation show that the proposal can satisfy the requirements of diverse clients and provides better performance than a fixed mapping-based scheme.
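A toy sketch of the adaptive isolated/shared mapping idea follows. The device names, client model, and switching rule are assumptions for illustration; the paper's actual logical cluster and normal inclusion mechanisms operate inside a real distributed storage system.

```python
# Toy sketch: map clients to device sets, isolating urgent traffic only
# when an urgent client is active. All names are illustrative assumptions.
class AdaptiveMapper:
    def __init__(self, devices, reserved_for_urgent):
        self.devices = list(devices)              # all storage devices
        self.reserved = set(reserved_for_urgent)  # devices set aside for urgent clients

    def devices_for(self, client_is_urgent: bool, any_urgent_active: bool):
        if not any_urgent_active:
            return self.devices                   # shared mode: everyone sees all devices
        if client_is_urgent:
            return [d for d in self.devices if d in self.reserved]      # isolated mode
        return [d for d in self.devices if d not in self.reserved]      # normal clients kept apart

mapper = AdaptiveMapper(["ssd0", "ssd1", "hdd0", "hdd1"], {"ssd0", "ssd1"})
print(mapper.devices_for(client_is_urgent=False, any_urgent_active=True))   # ['hdd0', 'hdd1']
print(mapper.devices_for(client_is_urgent=False, any_urgent_active=False))  # all four devices
```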


Big data applications play an important role in real-time data processing. Apache Spark is a data processing framework with an in-memory data engine that quickly processes large data sets; it can also distribute data processing tasks across multiple computers, either on its own or in tandem with other distributed computing tools. However, Spark's in-memory processing cannot share data between applications, and RAM is insufficient for storing petabytes of data. Alluxio is a virtual distributed storage system that leverages memory for data storage and provides faster access to data in different storage systems, helping to speed up data-intensive Spark applications across various storage back ends. In this work, the performance of applications on Spark alone and on Spark running over Alluxio is studied with respect to several storage formats, namely Parquet, ORC, CSV, and JSON, and four types of queries from the Star Schema Benchmark (SSB). A benchmark is developed to suggest the suitability of the Spark-Alluxio combination for big data applications. It is found that Alluxio is suitable for applications whose databases are larger than 2.6 GB and store data in JSON or CSV formats, whereas Spark alone is suitable for applications using formats such as Parquet and ORC with database sizes below 2.6 GB.
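For context, a hedged sketch of how the same Spark job targets either direct storage or Alluxio: only the URI scheme changes. Host names, ports, and paths here are placeholders, and the Alluxio client jar must be on Spark's classpath for the alluxio:// scheme to resolve.

```python
# Sketch: same Spark job reading SSB data directly vs. through Alluxio.
# Hosts, ports, and paths are placeholder assumptions.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("ssb-format-benchmark").getOrCreate()

# Direct read (format chosen per experiment: Parquet, ORC, CSV, or JSON)
lineorder = spark.read.parquet("hdfs:///ssb/lineorder.parquet")

# Same data served through Alluxio's memory tier
lineorder_alluxio = spark.read.parquet("alluxio://alluxio-master:19998/ssb/lineorder.parquet")

# An SSB Q1.1-style aggregate, identical against either source
lineorder_alluxio.createOrReplaceTempView("lineorder")
spark.sql("""
    SELECT SUM(lo_extendedprice * lo_discount) AS revenue
    FROM lineorder
    WHERE lo_discount BETWEEN 1 AND 3 AND lo_quantity < 25
""").show()
```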


Author(s):  
Jun Tian ◽  
Lirong Huang

<span lang="EN-US">Aiming at the perception data acquired by the widely used, fast-developing but still not perfect wireless sensor network system, a relatively complete and universal system for the collection, transmission, storage and cluster analysis of perception data is designed. P</span><span lang="EN-US">erception data is spliced and compressed at the node and reconstructed at the base station, the problem of the acquisition of </span><span lang="EN-US">perception data</span><span lang="EN-US"> and energy consumption of transmission is optimized, the distributed storage system is established, and the data reading mechanism and data storage architecture are designed accordingly.</span><span lang="EN-US">The data acquisition protocol and the traditional protocol, the storage system itself and the Oracle database system, and <a name="_Hlk527548018"></a>Standard Deviation and Eigensystem Realization Algorithm are respectively adopted for comparison test.</span><span lang="EN-US">Based on Standard Deviation algorithm, the operation of suffix tree clustering is carried out, and the general steps of suffix tree clustering are studied and the structure of perception data and the characteristics of storage are adapted, and the data classification operation based on suffix tree clustering is completed.</span><span lang="EN-US"> The results show that </span><span lang="EN-US">proposed Standard Deviationalgorithm algorithm not only inherits the efficiency of the classical algorithm for processing big data, but also has obvious effect on large-scale discrete data processing, and the efficiency is obviously improved compared with the traditional method.</span>


2013 ◽  
Vol 5 (1) ◽  
pp. 53-69
Author(s):  
Jacques Jorda ◽  
Aurélien Ortiz ◽  
Abdelaziz M’zoughi ◽  
Salam Traboulsi

Grid computing is commonly used for large-scale applications requiring huge computation capabilities. In such distributed architectures, data storage on the distributed storage resources must be handled by a dedicated storage system to ensure the required quality of service. In order to simplify data placement on nodes and to increase application performance, a storage virtualization layer can be used. This layer can be a single parallel filesystem (such as GPFS) or a more complex middleware; the latter is preferred, as it allows data placement on the nodes to be tuned to increase both the reliability and the performance of data access. In such a middleware, a dedicated monitoring system must be used to ensure optimal performance. In this paper, the authors briefly introduce Visage, a middleware for storage virtualization. They present the most broadly used grid monitoring systems and explain why these are not adequate for virtualized storage monitoring. They then present the architecture of their monitoring system dedicated to storage virtualization, introduce the workload prediction model used to select the best node for data placement, and demonstrate its accuracy in a simple experiment.
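The abstract leaves the prediction model unspecified, so the sketch below substitutes an exponentially weighted moving average (EWMA) per node to illustrate the role such a model plays: the monitoring layer feeds load samples, and placement picks the node with the lowest predicted load. The smoothing factor and node names are assumptions.

```python
# EWMA stand-in for a workload prediction model driving data placement.
# Alpha and the node set are illustrative assumptions.
class NodeLoadPredictor:
    def __init__(self, nodes, alpha=0.3):
        self.alpha = alpha
        self.predicted = {n: 0.0 for n in nodes}  # EWMA of observed load per node

    def observe(self, node, load):
        """Fold a new monitoring sample into the node's predicted load."""
        self.predicted[node] = self.alpha * load + (1 - self.alpha) * self.predicted[node]

    def best_node(self):
        """Placement decision: node with the lowest predicted load."""
        return min(self.predicted, key=self.predicted.get)

pred = NodeLoadPredictor(["node-a", "node-b"])
pred.observe("node-a", 0.9)
pred.observe("node-b", 0.2)
print(pred.best_node())  # node-b
```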


2020 ◽  
Vol 12 (5) ◽  
pp. 1-9
Author(s):  
Telesphore Tiendrebeogo ◽  
Mamadou Diarra

Big data is unavoidable, given that digital media has become the predominant form of communication in consumers' daily lives. Controlling its stakes and the quality of its data must be a priority so as not to distort the strategies derived from its processing for profit. To this end, a great deal of research work has been carried out by companies and several platforms have been created. MapReduce, one of the enabling technologies, has proven applicable to a wide range of fields. However, despite its importance, recent work has shown its limitations, and Distributed Hash Tables (DHTs) have been used to remedy them. Thus, this paper not only analyses MapReduce implementations and Top-Level Domains (TLDs) in general, but also provides a description of a DHT model as well as some guidelines for planning future research.
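As background for the DHT discussion, a minimal consistent-hashing ring shows how keys (for example, MapReduce task outputs) map to peers; production DHTs such as Chord add virtual nodes, replication, and routing. Node names and the hash choice are illustrative.

```python
# Minimal consistent-hashing ring, the core key-to-peer mapping behind DHTs.
# Peer names and SHA-1 hashing are illustrative choices.
import bisect
import hashlib

class HashRing:
    def __init__(self, nodes):
        self._ring = sorted((self._h(n), n) for n in nodes)  # (position, peer) pairs
        self._keys = [h for h, _ in self._ring]

    @staticmethod
    def _h(s: str) -> int:
        return int(hashlib.sha1(s.encode()).hexdigest(), 16)

    def node_for(self, key: str) -> str:
        """First peer clockwise from the key's position (wrapping around)."""
        i = bisect.bisect(self._keys, self._h(key)) % len(self._ring)
        return self._ring[i][1]

ring = HashRing(["peer-1", "peer-2", "peer-3"])
print(ring.node_for("map-task-42"))  # deterministic owner for this key
```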

