Distributed Storage Strategy and Visual Analysis for Economic Big Data

2021 ◽  
Vol 2021 ◽  
pp. 1-13
Author(s):  
Xiangli Chang ◽  
Hailang Cui

With the increasing popularity of Internet-based services, a large number of which are hosted on cloud platforms, more powerful back-end storage systems are needed to support them. At present, it is very difficult or impossible for a single distributed storage design to satisfy every requirement at once; the focus of research is therefore to trade off different characteristics and design different distributed storage solutions for different usage scenarios. Economic big data imposes the basic requirements of high storage efficiency and fast retrieval speed, while the large number of small files and the diversity of file types pose severe challenges for its storage and retrieval. This paper is oriented to the application requirements of cross-modal analysis of economic big data. According to the sources and characteristics of economic big data, the data types are analyzed, and the database storage architecture and data storage structure of economic big data are designed. Taking into account the spatial, temporal, and semantic characteristics of economic big data, this paper proposes a unified coding method based on a multilevel spatiotemporal division strategy that combines Geohash and Hilbert curves under spatiotemporal semantic constraints. A prototype system was constructed on MongoDB, and with the data storage management functions realized, the prototype was used to verify the performance of the proposed multilevel partition algorithm. For workload management, a Wiener predictor based on the Wiener filter principle is applied: according to its periodicity, the workload is divided into distributed storage windows of a specific duration, and at the beginning of each window the storage resources for the next window are provisioned. Experiments and tests verify the proposed distributed storage strategy and show that the Wiener-based scheme can save platform resources and configuration costs while ensuring the Service Level Agreement (SLA).
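The unified coding idea can be pictured with a small sketch: a time-bucket prefix keyed together with a Geohash cell, stored as an indexed field in MongoDB. This is a minimal illustration under assumed parameters (key layout, 8-character precision, one-hour buckets), not the authors' exact multilevel Geohash/Hilbert scheme.

```python
# Minimal sketch of a unified spatiotemporal key: time bucket + Geohash cell.
# Key layout, bucket size, and precision are illustrative assumptions.
from datetime import datetime, timezone

_BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"

def geohash(lat: float, lon: float, precision: int = 8) -> str:
    """Encode latitude/longitude into a standard Geohash string."""
    lat_range, lon_range = [-90.0, 90.0], [-180.0, 180.0]
    code, even, bit_count, ch = [], True, 0, 0  # even bit -> longitude
    while len(code) < precision:
        rng = lon_range if even else lat_range
        val = lon if even else lat
        mid = (rng[0] + rng[1]) / 2
        ch <<= 1
        if val >= mid:
            ch |= 1
            rng[0] = mid
        else:
            rng[1] = mid
        even = not even
        bit_count += 1
        if bit_count == 5:
            code.append(_BASE32[ch])
            bit_count, ch = 0, 0
    return "".join(code)

def spatiotemporal_key(lat, lon, ts: datetime, bucket_s: int = 3600) -> str:
    """Unified key: zero-padded time bucket + Geohash cell (illustrative layout)."""
    bucket = int(ts.replace(tzinfo=timezone.utc).timestamp()) // bucket_s
    return f"{bucket:010d}-{geohash(lat, lon)}"

print(spatiotemporal_key(39.9, 116.4, datetime(2021, 5, 1, 12, 0)))
```

In a MongoDB prototype, such keys would be stored in an indexed field so that spatiotemporal range queries reduce to prefix scans over the index.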

Electronics ◽  
2021 ◽  
Vol 10 (7) ◽  
pp. 847
Author(s):  
Sopanhapich Chum ◽  
Heekwon Park ◽  
Jongmoo Choi

This paper proposes a new resource management scheme that supports SLAs (Service-Level Agreements) in a big data distributed storage system. It makes use of two mapping modes, an isolated mode and a shared mode, in an adaptive manner. Specifically, to ensure different QoS (Quality of Service) requirements among clients, it isolates storage devices so that urgent clients are not interfered with by normal clients. When there is no urgent client, it switches to the shared mode so that normal clients can access all storage devices, thus achieving full performance. To provide this adaptability effectively, it devises two techniques, called logical cluster and normal inclusion. In addition, the paper explores how to exploit heterogeneous storage devices, HDDs (Hard Disk Drives) and SSDs (Solid State Drives), to support SLAs. It examines two use cases and observes that separating data and metadata onto different devices has a positive impact on the performance-per-cost ratio. Evaluation results from a real implementation show that the proposal can satisfy the requirements of diverse clients and provides better performance than a fixed mapping-based scheme.
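A toy sketch of the adaptive isolated/shared mapping idea follows. The device names, client model, and switching rule are assumptions for illustration; the paper's actual logical cluster and normal inclusion mechanisms operate inside a real distributed storage system.

```python
# Toy sketch: map clients to device sets, isolating urgent traffic only
# when an urgent client is active. All names are illustrative assumptions.
class AdaptiveMapper:
    def __init__(self, devices, reserved_for_urgent):
        self.devices = list(devices)              # all storage devices
        self.reserved = set(reserved_for_urgent)  # devices set aside for urgent clients

    def devices_for(self, client_is_urgent: bool, any_urgent_active: bool):
        if not any_urgent_active:
            return self.devices                   # shared mode: everyone sees all devices
        if client_is_urgent:
            return [d for d in self.devices if d in self.reserved]      # isolated mode
        return [d for d in self.devices if d not in self.reserved]      # normal clients kept apart

mapper = AdaptiveMapper(["ssd0", "ssd1", "hdd0", "hdd1"], {"ssd0", "ssd1"})
print(mapper.devices_for(client_is_urgent=False, any_urgent_active=True))   # ['hdd0', 'hdd1']
print(mapper.devices_for(client_is_urgent=False, any_urgent_active=False))  # all four devices
```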


Big data applications play an important role in real-time data processing. Apache Spark is a data processing framework with an in-memory data engine that quickly processes large data sets; it can also distribute data processing tasks across multiple computers, either on its own or in tandem with other distributed computing tools. However, Spark's in-memory processing cannot share data between applications, and RAM is insufficient for storing petabytes of data. Alluxio is a virtual distributed storage system that leverages memory for data storage and provides faster access to data in different storage systems, helping to speed up data-intensive Spark applications across various storage back ends. In this work, the performance of applications on Spark alone and on Spark running over Alluxio is studied with respect to several storage formats, namely Parquet, ORC, CSV, and JSON, and four types of queries from the Star Schema Benchmark (SSB). A benchmark is developed to suggest the suitability of the Spark-Alluxio combination for big data applications. It is found that Alluxio is suitable for applications whose databases are larger than 2.6 GB and store data in JSON or CSV formats, whereas Spark alone is suitable for applications using formats such as Parquet and ORC with database sizes below 2.6 GB.
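For context, a hedged sketch of how the same Spark job targets either direct storage or Alluxio: only the URI scheme changes. Host names, ports, and paths here are placeholders, and the Alluxio client jar must be on Spark's classpath for the alluxio:// scheme to resolve.

```python
# Sketch: same Spark job reading SSB data directly vs. through Alluxio.
# Hosts, ports, and paths are placeholder assumptions.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("ssb-format-benchmark").getOrCreate()

# Direct read (format chosen per experiment: Parquet, ORC, CSV, or JSON)
lineorder = spark.read.parquet("hdfs:///ssb/lineorder.parquet")

# Same data served through Alluxio's memory tier
lineorder_alluxio = spark.read.parquet("alluxio://alluxio-master:19998/ssb/lineorder.parquet")

# An SSB Q1.1-style aggregate, identical against either source
lineorder_alluxio.createOrReplaceTempView("lineorder")
spark.sql("""
    SELECT SUM(lo_extendedprice * lo_discount) AS revenue
    FROM lineorder
    WHERE lo_discount BETWEEN 1 AND 3 AND lo_quantity < 25
""").show()
```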


Author(s):  
Jun Tian ◽  
Lirong Huang

<span lang="EN-US">Aiming at the perception data acquired by the widely used, fast-developing but still not perfect wireless sensor network system, a relatively complete and universal system for the collection, transmission, storage and cluster analysis of perception data is designed. P</span><span lang="EN-US">erception data is spliced and compressed at the node and reconstructed at the base station, the problem of the acquisition of </span><span lang="EN-US">perception data</span><span lang="EN-US"> and energy consumption of transmission is optimized, the distributed storage system is established, and the data reading mechanism and data storage architecture are designed accordingly.</span><span lang="EN-US">The data acquisition protocol and the traditional protocol, the storage system itself and the Oracle database system, and <a name="_Hlk527548018"></a>Standard Deviation and Eigensystem Realization Algorithm are respectively adopted for comparison test.</span><span lang="EN-US">Based on Standard Deviation algorithm, the operation of suffix tree clustering is carried out, and the general steps of suffix tree clustering are studied and the structure of perception data and the characteristics of storage are adapted, and the data classification operation based on suffix tree clustering is completed.</span><span lang="EN-US"> The results show that </span><span lang="EN-US">proposed Standard Deviationalgorithm algorithm not only inherits the efficiency of the classical algorithm for processing big data, but also has obvious effect on large-scale discrete data processing, and the efficiency is obviously improved compared with the traditional method.</span>


2013 ◽  
Vol 5 (1) ◽  
pp. 53-69
Author(s):  
Jacques Jorda ◽  
Aurélien Ortiz ◽  
Abdelaziz M’zoughi ◽  
Salam Traboulsi

Grid computing is commonly used for large-scale applications requiring huge computation capabilities. In such distributed architectures, data storage on the distributed storage resources must be handled by a dedicated storage system to ensure the required quality of service. In order to simplify data placement on nodes and to increase application performance, a storage virtualization layer can be used. This layer can be a single parallel filesystem (such as GPFS) or a more complex middleware; the latter is preferred, as it allows data placement on the nodes to be tuned to increase both the reliability and the performance of data access. In such a middleware, a dedicated monitoring system must be used to ensure optimal performance. In this paper, the authors briefly introduce Visage, a middleware for storage virtualization. They present the most broadly used grid monitoring systems and explain why these are not adequate for virtualized storage monitoring. They then present the architecture of their monitoring system dedicated to storage virtualization, introduce the workload prediction model used to select the best node for data placement, and demonstrate its accuracy in a simple experiment.
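The abstract leaves the prediction model unspecified, so the sketch below substitutes an exponentially weighted moving average (EWMA) per node to illustrate the role such a model plays: the monitoring layer feeds load samples, and placement picks the node with the lowest predicted load. The smoothing factor and node names are assumptions.

```python
# EWMA stand-in for a workload prediction model driving data placement.
# Alpha and the node set are illustrative assumptions.
class NodeLoadPredictor:
    def __init__(self, nodes, alpha=0.3):
        self.alpha = alpha
        self.predicted = {n: 0.0 for n in nodes}  # EWMA of observed load per node

    def observe(self, node, load):
        """Fold a new monitoring sample into the node's predicted load."""
        self.predicted[node] = self.alpha * load + (1 - self.alpha) * self.predicted[node]

    def best_node(self):
        """Placement decision: node with the lowest predicted load."""
        return min(self.predicted, key=self.predicted.get)

pred = NodeLoadPredictor(["node-a", "node-b"])
pred.observe("node-a", 0.9)
pred.observe("node-b", 0.2)
print(pred.best_node())  # node-b
```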


2020 ◽  
Vol 12 (5) ◽  
pp. 1-9
Author(s):  
Telesphore Tiendrebeogo ◽  
Mamadou Diarra

Big data is unavoidable, given that digital media has become the predominant form of communication in consumers' daily lives. Controlling its stakes and the quality of its data must be a priority so as not to distort the strategies derived from its processing for profit. To this end, a great deal of research work has been carried out by companies and several platforms have been created. MapReduce, one of the enabling technologies, has proven applicable to a wide range of fields. However, despite its importance, recent work has shown its limitations, and Distributed Hash Tables (DHTs) have been used to remedy them. Thus, this paper not only analyses MapReduce implementations and Top-Level Domains (TLDs) in general, but also provides a description of a DHT model as well as some guidelines for planning future research.
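As background for the DHT discussion, a minimal consistent-hashing ring shows how keys (for example, MapReduce task outputs) map to peers; production DHTs such as Chord add virtual nodes, replication, and routing. Node names and the hash choice are illustrative.

```python
# Minimal consistent-hashing ring, the core key-to-peer mapping behind DHTs.
# Peer names and SHA-1 hashing are illustrative choices.
import bisect
import hashlib

class HashRing:
    def __init__(self, nodes):
        self._ring = sorted((self._h(n), n) for n in nodes)  # (position, peer) pairs
        self._keys = [h for h, _ in self._ring]

    @staticmethod
    def _h(s: str) -> int:
        return int(hashlib.sha1(s.encode()).hexdigest(), 16)

    def node_for(self, key: str) -> str:
        """First peer clockwise from the key's position (wrapping around)."""
        i = bisect.bisect(self._keys, self._h(key)) % len(self._ring)
        return self._ring[i][1]

ring = HashRing(["peer-1", "peer-2", "peer-3"])
print(ring.node_for("map-task-42"))  # deterministic owner for this key
```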

