Detailed black-box monitoring of distributed systems

Modern containerized distributed systems, such as big data storage and processing stacks or microservice-based applications, are inherently hard to monitor and optimize, as resource usage does not directly match hardware resources due to multiple virtualization layers. For instance, although interapplication traffic is an important factor, as it directly indicates how components interact, it has not been possible to monitor it accurately in an application-independent way and without severe overhead, thus putting it out of reach of cloud platforms. In this paper we present an efficient black-box monitoring approach for gathering detailed structural information of collaborating processes in a distributed system that can be queried for various purposes, as it includes information about processes, containers, and hosts, as well as resource usage and the amount of data exchanged. The key to achieving high detail and low overhead without custom application instrumentation is to use a kernel-aided, event-driven strategy. We validate a prototype implementation by applying it to multi-platform microservice deployments, evaluate its performance with micro-benchmarks, and demonstrate its usefulness for container placement in a distributed data storage and processing stack (i.e., Cassandra and Spark).
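
As a rough illustration of how such structural information might be queried, the sketch below aggregates per-connection traffic records into a container-to-container (or host-to-host) traffic map, which is the kind of input a placement decision could use. The `Endpoint` and `TrafficEvent` records, their field names, and the sample values are assumptions made for this example; the abstract does not describe the paper's actual event format or query interface.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Endpoint:
    # Hypothetical identification of one side of a connection.
    host: str
    container: str
    process: str


@dataclass(frozen=True)
class TrafficEvent:
    # Hypothetical record of data sent from src to dst.
    src: Endpoint
    dst: Endpoint
    bytes_sent: int


def aggregate(events, key=lambda e: e.container):
    """Sum bytes exchanged between each directed pair of groups."""
    totals = defaultdict(int)
    for ev in events:
        a, b = key(ev.src), key(ev.dst)
        if a != b:  # skip traffic that stays inside one group
            totals[(a, b)] += ev.bytes_sent
    return dict(totals)


def heaviest_pairs(totals, n=1):
    """Return the n directed pairs that exchange the most data."""
    return sorted(totals.items(), key=lambda kv: kv[1], reverse=True)[:n]


# Illustrative sample data, not measurements from the paper.
events = [
    TrafficEvent(Endpoint("h1", "spark-1", "executor"),
                 Endpoint("h2", "cassandra-1", "java"), 500),
    TrafficEvent(Endpoint("h1", "spark-1", "executor"),
                 Endpoint("h2", "cassandra-1", "java"), 700),
    TrafficEvent(Endpoint("h1", "spark-1", "executor"),
                 Endpoint("h3", "cassandra-2", "java"), 100),
]
by_container = aggregate(events)
by_host = aggregate(events, key=lambda e: e.host)
```

Grouping by a different key (process, container, or host) reuses the same records, so one set of events can serve several queries. Pairs returned by `heaviest_pairs` would be natural candidates for co-location.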

An Architecture for Distributed Electronic Documents Storage in Decentralized Blockchain B2B Applications

Computers ◽

10.3390/computers10110142 ◽

2021 ◽

Vol 10 (11) ◽

pp. 142

Author(s):

Obadah Hammoud ◽

Ivan Tarkhanov ◽

Artyom Kosmarski

Keyword(s):

Distributed Systems ◽

Data Storage ◽

Distributed Storage ◽

Distributed Data ◽

Erasure Coding ◽

Distributed Data Storage ◽

Electronic Documents ◽

File Storage ◽

Load Balancer ◽

The Cost

This paper investigates the problem of distributed storage of electronic documents (both metadata and files) in decentralized blockchain-based B2B systems (DApps). The need to reduce the cost of implementing such systems and the insufficient elaboration of the issue of storing big data in DLT are considered. An approach for building such systems is proposed, which allows optimizing the size of the required storage (by using erasure coding) and simultaneously providing secure data storage in geographically distributed systems of a company, or within a consortium of companies. The novelty of this solution is that we are the first to combine enterprise DLT with distributed file storage, in which the availability of files is controlled. The results of our experiment demonstrate that the speed of the described DApp is comparable to known B2C torrent projects, and subsequently justify the choice of Hyperledger Fabric and Ethereum Enterprise for its use. The obtained test results show that public blockchain networks are not suitable for creating such a B2B system. The proposed system solves the main challenges of distributed data storage by grouping data into clusters and managing them with a load balancer, while preventing data tampering using a blockchain network. The considered DApp storage methodology easily scales horizontally in terms of distributed file storage and can be deployed on cloud computing technologies, while minimizing the required storage space. We compare this approach with known methods of file storage in distributed systems, including central storage, torrents, IPFS, and Storj. The reliability of this approach is calculated and the result is compared to traditional solutions based on full backup.
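
To make the trade-off between erasure coding and full backup concrete, the following sketch compares storage overhead and availability for a generic code that splits a file into k data fragments plus m parity fragments, any k of which suffice to rebuild it. This is the standard binomial model, not the paper's own reliability calculation; the parameters (k = 4, m = 2, node availability 0.9) and the assumption that nodes fail independently are chosen only for illustration.

```python
from math import comb


def storage_overhead(k, m):
    """Raw bytes stored per byte of user data for k data + m parity fragments."""
    return (k + m) / k


def availability(k, m, p):
    """Probability that at least k of the k + m fragments are reachable,
    assuming each node is up independently with probability p."""
    n = k + m
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


# Full backup with three copies is the special case k = 1, m = 2.
replication = (storage_overhead(1, 2), availability(1, 2, 0.9))

# Illustrative erasure code: 4 data fragments + 2 parity fragments.
erasure = (storage_overhead(4, 2), availability(4, 2, 0.9))
```

Under these assumed numbers, the erasure-coded layout stores 1.5 bytes per byte of data instead of 3, at the cost of somewhat lower availability for the same per-node reliability. Raising m moves the balance back toward availability.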

Blockchain-Based Data Market (BCBDM) Framework for Security and Privacy

Applications of Big Data in Large- and Small-Scale Systems - Advances in Data Mining and Database Management ◽

10.4018/978-1-7998-6673-2.ch012 ◽

2021 ◽

pp. 186-205

Author(s):

Shailesh Pancham Khapre ◽

Chandramohan Dhasarathan ◽

Puviyarasi T. ◽

Sam Goundar

Keyword(s):

Big Data ◽

Data Sharing ◽

Data Storage ◽

Data Privacy ◽

Security And Privacy ◽

Research Progress ◽

Distributed Data ◽

Distributed Data Storage ◽

Data Market ◽

Privacy Issues

In the internet era, incalculable amounts of data are generated every day. In the process of data sharing, complex issues such as data privacy and ownership are emerging. Blockchain is a decentralized distributed data storage technology. The introduction of blockchain can eliminate the disadvantages of the centralized data market, but at the same time, distributed data markets have created security and privacy issues. This chapter summarizes the industry status and research progress of the domestic and foreign big data trading markets and refines the nature of the blockchain-based big data sharing and circulation platform. Based on these properties, a blockchain-based data market (BCBDM) framework is proposed, and the security and privacy issues as well as corresponding solutions in this framework are analyzed and discussed. Based on this framework, a data market testing system was implemented, and the feasibility and security of the framework were confirmed.

Distributed Data Storage Technique for Big Data using Hadoop

Bonfring International Journal of Software Engineering and Soft Computing ◽

10.9756/bijsesc.8240 ◽

2016 ◽

Vol 6 (Special Issue) ◽

pp. 43-48

Author(s):

M.M. Kodabagi ◽

Savita Rathod ◽

Vilas Naik

Keyword(s):

Big Data ◽

Data Storage ◽

Distributed Data ◽

Distributed Data Storage ◽

Storage Technique

IoT Big Data provenance scheme using blockchain on Hadoop ecosystem

Journal Of Big Data ◽

10.1186/s40537-021-00505-y ◽

2021 ◽

Vol 8 (1) ◽

Author(s):

Houshyar Honar Pajooh ◽

Mohammed A. Rashid ◽

Fakhrul Alam ◽

Serge Demidenko

Keyword(s):

Big Data ◽

Data Storage ◽

Large Scale ◽

Data System ◽

Third Party ◽

Data Provenance ◽

Distributed Data ◽

Distributed Data Storage ◽

Blockchain Technology ◽

Iot Devices

The diversity and sheer increase in the number of connected Internet of Things (IoT) devices have brought significant concerns associated with storing and protecting a large volume of IoT data. Storage volume requirements and computational costs are continuously rising in the conventional cloud-centric IoT structures. Besides, dependencies of the centralized server solution impose significant trust issues and make it vulnerable to security risks. In this paper, a layer-based distributed data storage design and implementation of a blockchain-enabled large-scale IoT system are proposed. It has been developed to mitigate the above-mentioned challenges by using the Hyperledger Fabric (HLF) platform for distributed ledger solutions. The need for a centralized server and a third-party auditor was eliminated by leveraging HLF peers performing transaction verifications and records audits in a big data system with the help of blockchain technology. The HLF blockchain facilitates storing the lightweight verification tags on the blockchain ledger. In contrast, the actual metadata are stored in the off-chain big data system to reduce the communication overheads and enhance data integrity. Additionally, a prototype has been implemented on embedded hardware showing the feasibility of deploying the proposed solution in IoT edge computing and big data ecosystems. Finally, experiments have been conducted to evaluate the performance of the proposed scheme in terms of its throughput, latency, communication, and computation costs. The obtained results have indicated the feasibility of the proposed solution to retrieve and store the provenance of large-scale IoT data within the Big Data ecosystem using the HLF blockchain. The experimental results show a throughput of about 600 transactions, a 500 ms average response time, about 2–3% CPU consumption at the peer process, and approximately 10–20% at the client node. The minimum latency remained below 1 s; however, there is an increase in the maximum latency when the sending rate reached around 200 transactions per second (TPS).
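
The split between lightweight on-chain verification tags and off-chain data can be sketched as follows. This is a minimal illustration and not the paper's implementation: the two dictionaries are hypothetical stand-ins for the ledger and the off-chain big data store, SHA-256 is an assumed choice of tag, and no Hyperledger Fabric API is used.

```python
import hashlib

ledger = {}     # hypothetical stand-in for the blockchain ledger (tags only)
off_chain = {}  # hypothetical stand-in for the off-chain big data store


def store(record_id, payload: bytes):
    """Keep the payload off-chain and record only its digest on the ledger."""
    off_chain[record_id] = payload
    ledger[record_id] = hashlib.sha256(payload).hexdigest()


def verify(record_id) -> bool:
    """Recompute the digest of the off-chain payload and compare it to the tag."""
    payload = off_chain.get(record_id)
    tag = ledger.get(record_id)
    if payload is None or tag is None:
        return False
    return hashlib.sha256(payload).hexdigest() == tag


store("sensor-1", b"temp=21.5")
ok_before = verify("sensor-1")

# Simulate tampering with the off-chain copy; the ledger tag is unchanged.
off_chain["sensor-1"] = b"temp=99.0"
ok_after = verify("sensor-1")
```

Because only a fixed-size digest goes to the ledger, the on-chain cost does not grow with the size of the stored record, while any change to the off-chain copy is detectable at verification time.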

Research on Optimization of Big data Storage Structure in Distributed System

Proceedings of the 2017 7th International Conference on Advanced Design and Manufacturing Engineering (ICADME 2017) ◽

10.2991/icadme-17.2017.63 ◽

2017 ◽

Author(s):

Zheng-Wu Lu

Keyword(s):

Big Data ◽

Data Storage ◽

Distributed System ◽

Storage Structure ◽

Big Data Storage

A Case Study on Effective Technique of Distributed Data Storage for Big Data Processing in the Wireless Internet Environment

Wireless Personal Communications ◽

10.1007/s11277-015-2794-3 ◽

2015 ◽

Vol 86 (1) ◽

pp. 239-253 ◽

Cited By ~ 9

Author(s):

Seong-Taek Park ◽

Yeong-Real Kim ◽

Seon-Phil Jeong ◽

Chang-Ick Hong ◽

Tae-Gu Kang

Keyword(s):

Big Data ◽

Data Processing ◽

Data Storage ◽

Distributed Data ◽

Wireless Internet ◽

Effective Technique ◽

Big Data Processing ◽

Distributed Data Storage ◽

Internet Environment

Big Data Framework for storage Extraction and Identification of Data using Hadoop Distributed File system

International Journal of Innovative Technology and Exploring Engineering - Special Issue ◽

10.35940/ijitee.b1002.1292s319 ◽

2019 ◽

Vol 9 (2S3) ◽

pp. 392-394

Keyword(s):

Big Data ◽

Data Storage ◽

Open Source Software ◽

Large Scale ◽

Mixed Data ◽

Software Framework ◽

Distributed Data ◽

Distributed Data Storage ◽

Data Framework ◽

Hadoop Distributed File System

Big data is all about the growing challenge that organizations face in today’s world as they manage enormous and quickly growing sources of information or data. With the complex range of analysis, the problem includes computing infrastructure and accessing mixed data, both structured and unstructured, from various sources such as networking, recordings, and stored images. Hadoop is an open source software framework that includes a number of components specifically designed for solving large-scale distributed data storage. MapReduce is a parallel programming design for processing
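
The MapReduce model mentioned in this abstract can be illustrated with a single-process word count. The sketch below only mimics the map, shuffle, and reduce phases in plain Python; it does not use Hadoop or any of its APIs, and the function names are chosen for this example.

```python
from collections import defaultdict
from itertools import chain


def map_phase(line):
    """Emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Group all emitted values by key, as the framework would between phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Combine all values for one key into a single count."""
    return key, sum(values)


def word_count(lines):
    mapped = chain.from_iterable(map_phase(line) for line in lines)
    return dict(reduce_phase(k, v) for k, v in shuffle(mapped).items())


counts = word_count(["big data", "Big storage"])
```

In a real cluster the map and reduce calls would run in parallel on different nodes over data blocks; here they run sequentially to show only the data flow.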

Activity of public control entities and development of distributed computing and distributed data storage systems

Journal of Law and Administration ◽

10.24833/2073-8420-2018-1-46-14-22 ◽

2018 ◽

pp. 14-22

Author(s):

D. V. Gribanov

Keyword(s):

Distributed Computing ◽

Data Storage ◽

Storage Systems ◽

Legal Regulation ◽

Distributed Data ◽

Distributed Data Storage ◽

Public Control ◽

Blockchain Technology ◽

Legal Method ◽

Digital Assets

Introduction. This article is devoted to the legal regulation of digital asset turnover and the possibilities of using distributed computing and distributed data storage systems in the activities of public authorities and entities of public control. The author notes that some national and foreign scientists who study “blockchain” technology (distributed computing and distributed data storage systems) emphasize its usefulness in different activities. The data validation procedure for digital transactions and the legal regulation of the creation, issuance and turnover of digital assets need further attention. Materials and methods. The research is based on common scientific methods (analysis, analogy, comparison) and particular methods of cognition of legal phenomena and processes (a method of interpretation of legal rules, a technical legal method, a formal legal method and a formal logical one). Results of the study. The author conducted an analysis that identified several advantages of using “blockchain” technology in the sphere of public control: a particular validation system; data once entered in the distributed data storage system cannot be erased or forged; absolute transparency of the succession of actions while exercising governing powers; and automatic repetition of recurring actions. The need for fivefold validation of the exercise of governing powers is substantiated. The author stresses that fivefold validation shall ensure complex control over the exercise of powers by civil society, the entities of public control and the Russian Federation as a federal state holding sovereignty over its territory. The author has also conducted a brief analysis of judicial decisions concerning digital transactions. Discussion and conclusion. The use of a distributed data storage system makes it easier to exercise control due to the decreased risk of forgery, replacement or termination of data. The author suggests defining a digital transaction not only as actions with digital assets, but also as actions toward the modification and addition of information about legal facts with the purpose of establishing it in distributed data storage systems. The author suggests using distributed data storage systems for independent validation of information about the activities of bodies of state authority. In the author’s opinion, application of “blockchain” technology may result not only in an increase in the efficiency of public control, but also in the creation of a new form of public control: automatic control. It is concluded that there is currently no legislative basis for regulating legal relations concerning distributed data storage.
