Big Data Framework for storage Extraction and Identification of Data using Hadoop Distributed File system

Abstract: The diversity and sheer increase in the number of connected Internet of Things (IoT) devices have brought significant concerns associated with storing and protecting a large volume of IoT data. Storage volume requirements and computational costs are continuously rising in the conventional cloud-centric IoT structures. Besides, dependence on a centralized server imposes significant trust issues and makes the system vulnerable to security risks. In this paper, a layer-based distributed data storage design and implementation of a blockchain-enabled large-scale IoT system are proposed. It has been developed to mitigate the above-mentioned challenges by using the Hyperledger Fabric (HLF) platform for distributed ledger solutions. The need for a centralized server and a third-party auditor is eliminated by leveraging HLF peers to perform transaction verification and record audits in a big data system with the help of blockchain technology. The HLF blockchain facilitates storing the lightweight verification tags on the blockchain ledger. In contrast, the actual metadata are stored in the off-chain big data system to reduce the communication overheads and enhance data integrity. Additionally, a prototype has been implemented on embedded hardware, showing the feasibility of deploying the proposed solution in IoT edge computing and big data ecosystems. Finally, experiments have been conducted to evaluate the performance of the proposed scheme in terms of its throughput, latency, communication, and computation costs. The results indicate that the proposed solution can feasibly retrieve and store the provenance of large-scale IoT data within the Big Data ecosystem using the HLF blockchain. The experimental results show a throughput of about 600 transactions, a 500 ms average response time, about 2–3% CPU consumption at the peer process, and approximately 10–20% at the client node. The minimum latency remained below 1 s; however, the maximum latency increased when the sending rate reached around 200 transactions per second (TPS).
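
The split described in this abstract, a lightweight verification tag on the ledger and the actual data off-chain, can be illustrated with a minimal sketch. It is not the paper's implementation: the abstract does not say how tags are computed, so SHA-256 is an assumption, and the `ledger` and `off_chain` dictionaries, the `store` and `verify` helpers, and the record id are hypothetical stand-ins for an HLF ledger and a big data store.

```python
import hashlib
import hmac

def make_tag(record: bytes) -> str:
    """Return a SHA-256 hex digest used here as a lightweight verification tag.
    (Assumption: the abstract does not name the hash function.)"""
    return hashlib.sha256(record).hexdigest()

# Hypothetical stand-ins: plain dicts instead of an HLF ledger and a big data system.
ledger = {}     # "on-chain": record id -> verification tag
off_chain = {}  # "off-chain": record id -> full payload

def store(record_id: str, record: bytes) -> None:
    """Keep the payload off-chain and only its small tag on the ledger."""
    off_chain[record_id] = record
    ledger[record_id] = make_tag(record)

def verify(record_id: str) -> bool:
    """Recompute the tag from the off-chain payload and compare it with the ledger."""
    expected = ledger[record_id]
    actual = make_tag(off_chain[record_id])
    return hmac.compare_digest(expected, actual)

store("sensor-42/0001", b'{"temp_c": 21.5}')
ok_before = verify("sensor-42/0001")

# Simulate tampering with the off-chain copy; the ledger tag no longer matches.
off_chain["sensor-42/0001"] = b'{"temp_c": 99.9}'
ok_after = verify("sensor-42/0001")
```

Because only a fixed-size digest goes to the ledger, the on-chain footprint stays constant regardless of payload size, which is one plausible reading of how such a design reduces communication overhead.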

Grid Data Handling

IT Policy and Ethics ◽ 10.4018/978-1-4666-2919-6.ch014 ◽ 2013 ◽ pp. 294-321

Author(s): Alexandru Costan

Keyword(s): Fault Tolerance ◽ Data Storage ◽ Large Scale ◽ File Systems ◽ Future Research ◽ Distributed Data ◽ Data Handling ◽ Grid Data ◽ Distributed Data Storage ◽ Grid Environments

To accommodate the needs of large-scale distributed systems, scalable data storage and management strategies are required, allowing applications to efficiently cope with continuously growing, highly distributed data. This chapter addresses the key issues of data handling in grid environments, focusing on storing, accessing, managing and processing data. We start by providing the background for the data storage issue in grid environments. We outline the main challenges addressed by distributed storage systems: high availability, which translates into high resilience and consistency, corruption handling regarding arbitrary faults, fault tolerance, asynchrony, fairness, access control and transparency. The core part of the chapter presents how existing solutions cope with these demanding requirements. The most important research results are organized along several themes: grid data storage, distributed file systems, data transfer and retrieval, and data management. Important characteristics such as performance, efficient use of resources, fault tolerance, security, and others are strongly determined by the adopted system architectures and the technologies behind them. For each topic, we briefly present previous work, describe the most recent achievements, highlight their advantages and limitations, and indicate future research trends in distributed data storage and management.

Towards Distributed Data Management in Fog Computing

Wireless Communications and Mobile Computing ◽ 10.1155/2018/7597686 ◽ 2018 ◽ Vol 2018 ◽ pp. 1-14 ◽ Cited By ~ 25

Author(s): Vasileios Moysiadis ◽ Panagiotis Sarigiannidis ◽ Ioannis Moscholios

Keyword(s): Cloud Computing ◽ Data Storage ◽ Large Scale ◽ Fog Computing ◽ Cloud Services ◽ Smart Devices ◽ Distributed Data ◽ Storage Allocation ◽ Huge Amount ◽ Distributed Data Storage

In the emerging area of the Internet of Things (IoT), the exponential growth of the number of smart devices leads to a growing need for efficient data storage mechanisms. Cloud Computing has so far been an efficient solution for storing and manipulating such huge amounts of data. However, in the coming years it is expected that Cloud Computing will be unable to handle the huge number of IoT devices efficiently due to bandwidth limitations. An emerging technology that promises to overcome many drawbacks of large-scale IoT networks is Fog Computing. Fog Computing provides high-quality Cloud services in the physical proximity of mobile users. Computational power and storage capacity can be offered from the Fog, with low latency and high bandwidth. This survey discusses the main features of Fog Computing, introduces representative simulators and tools, highlights the benefits of Fog Computing in line with the applications of large-scale IoT networks, and identifies various issues that may be encountered when designing and implementing social IoT systems in the context of the Fog Computing paradigm. The rationale behind this work lies in the data storage discussion, which takes into account the importance of storage capabilities in modern Fog Computing systems. In addition, we provide a comprehensive comparison of previously developed distributed data storage systems, which constitute a promising solution for data storage allocation in Fog Computing.

A Framework for Secure Data Storage and Retrieval in Cloud Environment

International Journal of Engineering and Advanced Technology - Regular Issue ◽ 10.35940/ijeat.b3794.129219 ◽ 2019 ◽ Vol 9 (2) ◽ pp. 2511-2520

Keyword(s): Big Data ◽ Data Storage ◽ Large Volume ◽ Service Providers ◽ Research Work ◽ Software Framework ◽ Software System ◽ Efficient Computation ◽ Data Framework ◽ Symmetric Key

Considerable research is under way on the efficient storage, processing, and analysis of large volumes of data that are generated in real time and vary in nature and quality. The most common open-source framework for efficient computation on such large volumes of data is Hadoop, which processes big data sets by employing clusters of networked computers. Cloud computing, on the other hand, refers to storing data and applications on cloud servers and accessing them over the Internet on demand. For organizations that want to reduce the costs and complexities associated with a big data framework, the most suitable option is therefore to use cloud infrastructure. The biggest concern in this regard is the security of data and applications in the cloud. Although Hadoop provides a built-in encryption scheme and secured HTTP protocol, once data and applications are stored in a public cloud they become vulnerable to various security breaches that remain uncontrolled by cloud service providers, giving rise to a feeling of distrust. In this scenario, encrypting sensitive business data before uploading it to the cloud may help prevent access by intruders. In this paper, an extension to Hadoop security for the shared cloud is proposed by designing a software framework in which files are encrypted before being uploaded to the cloud. The framework secures data both in storage and in transit, such that retrieval of the data is not possible without using the framework. An extra layer of security based on symmetric key cryptography is proposed, which enhances the security of customers’ resources alongside the standard security measures of a cloud system. A software system performs symmetric encryption before transmitting a file of any format to the cloud. To access this encrypted file, the same software system has to be used to download and decrypt the file. This paper also investigates the performance of the most common symmetric key techniques, AES, DES, and triple DES, with respect to the successful encryption of customer data. This software framework can be applied to provide an extra security layer at the client’s end for users of the cloud platform.

Client-centric consistency formalization and verification for system with large-scale distributed data storage

Future Generation Computer Systems ◽ 10.1016/j.future.2010.06.006 ◽ 2010 ◽ Vol 26 (8) ◽ pp. 1180-1188 ◽ Cited By ~ 7

Author(s): Yuqing Zhu ◽ Jianmin Wang

Keyword(s): Data Storage ◽ Large Scale ◽ Distributed Data ◽ Distributed Data Storage

Blockchain-Based Data Market (BCBDM) Framework for Security and Privacy

Applications of Big Data in Large- and Small-Scale Systems - Advances in Data Mining and Database Management ◽ 10.4018/978-1-7998-6673-2.ch012 ◽ 2021 ◽ pp. 186-205

Author(s): Shailesh Pancham Khapre ◽ Chandramohan Dhasarathan ◽ Puviyarasi T. ◽ Sam Goundar

Keyword(s): Big Data ◽ Data Sharing ◽ Data Storage ◽ Data Privacy ◽ Security And Privacy ◽ Research Progress ◽ Distributed Data ◽ Distributed Data Storage ◽ Data Market ◽ Privacy Issues

In the internet era, incalculable amounts of data are generated every day. In the process of data sharing, complex issues such as data privacy and ownership are emerging. Blockchain is a decentralized distributed data storage technology. The introduction of blockchain can eliminate the disadvantages of the centralized data market, but at the same time, distributed data markets create their own security and privacy issues. This chapter summarizes the industry status and research progress of the domestic and foreign big data trading markets and identifies the properties of a blockchain-based big data sharing and circulation platform. Based on these properties, a blockchain-based data market (BCBDM) framework is proposed, and the security and privacy issues in this framework, as well as corresponding solutions, are analyzed and discussed. Based on this framework, a data market testing system was implemented, and the feasibility and security of the framework were confirmed.

Distributed Data Storage Technique for Big Data using Hadoop

Bonfring International Journal of Software Engineering and Soft Computing ◽ 10.9756/bijsesc.8240 ◽ 2016 ◽ Vol 6 (Special Issue) ◽ pp. 43-48

Author(s): M.M. Kodabagi ◽ Savita Rathod ◽ Vilas Naik

Keyword(s): Big Data ◽ Data Storage ◽ Distributed Data ◽ Distributed Data Storage ◽ Storage Technique

Grid Data Handling

Computational and Data Grids ◽ 10.4018/978-1-61350-113-9.ch005 ◽ 2011 ◽ pp. 112-139

Author(s): Alexandru Costan

Keyword(s): Fault Tolerance ◽ Data Storage ◽ Large Scale ◽ File Systems ◽ Future Research ◽ Distributed Data ◽ Data Handling ◽ Grid Data ◽ Distributed Data Storage ◽ Grid Environments

To accommodate the needs of large-scale distributed systems, scalable data storage and management strategies are required, allowing applications to efficiently cope with continuously growing, highly distributed data. This chapter addresses the key issues of data handling in grid environments, focusing on storing, accessing, managing and processing data. We start by providing the background for the data storage issue in grid environments. We outline the main challenges addressed by distributed storage systems: high availability, which translates into high resilience and consistency, corruption handling regarding arbitrary faults, fault tolerance, asynchrony, fairness, access control and transparency. The core part of the chapter presents how existing solutions cope with these demanding requirements. The most important research results are organized along several themes: grid data storage, distributed file systems, data transfer and retrieval, and data management. Important characteristics such as performance, efficient use of resources, fault tolerance, security, and others are strongly determined by the adopted system architectures and the technologies behind them. For each topic, we briefly present previous work, describe the most recent achievements, highlight their advantages and limitations, and indicate future research trends in distributed data storage and management.

Detailed black-box monitoring of distributed systems

ACM SIGAPP Applied Computing Review ◽ 10.1145/3477133.3477135 ◽ 2021 ◽ Vol 21 (1) ◽ pp. 24-36

Author(s): Francisco Neves ◽ Ricardo Vilaça ◽ José Pereira

Keyword(s): Big Data ◽ Distributed Systems ◽ Data Storage ◽ Distributed System ◽ Structural Information ◽ Black Box ◽ Distributed Data ◽ Resource Usage ◽ Distributed Data Storage ◽ Big Data Storage

Modern containerized distributed systems, such as big data storage and processing stacks or micro-service based applications, are inherently hard to monitor and optimize, as resource usage does not directly match hardware resources due to multiple virtualization layers. For instance, although inter-application traffic is an important factor, as it directly indicates how components interact, it has not been possible to monitor it accurately in an application-independent way and without severe overhead, thus putting it out of reach of cloud platforms. In this paper we present an efficient black-box monitoring approach for gathering detailed structural information about collaborating processes in a distributed system. This information can be queried for various purposes, as it covers processes, containers, and hosts, as well as resource usage and the amount of data exchanged. The key to achieving high detail and low overhead without custom application instrumentation is to use a kernel-aided, event-driven strategy. We validate a prototype implementation by applying it to multi-platform microservice deployments, evaluate its performance with micro-benchmarks, and demonstrate its usefulness for container placement in a distributed data storage and processing stack (i.e., Cassandra and Spark).
