Hadoop Performance Analysis Model with Deep Data Locality

Information ◽  
2019 ◽  
Vol 10 (7) ◽  
pp. 222 ◽  
Author(s):  
Sungchul Lee ◽  
Ju-Yeon Jo ◽  
Yoohwan Kim

Background: Hadoop has become the base framework for big data systems, built on the simple principle that moving computation is cheaper than moving data. Hadoop exploits data locality in the Hadoop Distributed File System (HDFS) to improve system performance: network traffic among nodes is reduced by increasing the number of data-local tasks, i.e., tasks that run on the machine holding their input data. Previous research improved Hadoop performance by increasing data locality in only one of the MapReduce stages, and no mathematical performance model for data locality in Hadoop currently exists. Methods: This study develops a Hadoop performance analysis model with data locality that covers the entire MapReduce process. The paper explains the data locality concept in both the map stage and the shuffle stage, and shows how the analysis model can be applied to increase the performance of a Hadoop system through deep data locality. Results: The benefit of deep data locality was demonstrated in three tests: a simulation-based test, a cloud test, and a physical test. Across these tests, the authors improved Hadoop performance by over 34% using deep data locality. Conclusions: Deep data locality improves Hadoop performance by reducing data movement in HDFS.
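The map-stage locality that the model analyzes can be observed directly through Hadoop's standard job counters. The sketch below is only a minimal illustration using the plain Hadoop MapReduce API (the class name and report format are our own, not the authors'); it prints how many map tasks ran data-local versus rack-local:

    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.JobCounter;

    public class LocalityReport {
        // Prints how many map tasks ran data-local (on a node holding an HDFS
        // replica of their input split) versus rack-local, using the standard
        // counters Hadoop publishes for every completed job.
        public static void report(Job job) throws Exception {
            long dataLocal = job.getCounters().findCounter(JobCounter.DATA_LOCAL_MAPS).getValue();
            long rackLocal = job.getCounters().findCounter(JobCounter.RACK_LOCAL_MAPS).getValue();
            long launched  = job.getCounters().findCounter(JobCounter.TOTAL_LAUNCHED_MAPS).getValue();
            if (launched == 0) return;  // nothing to report for map-less jobs
            System.out.printf("data-local maps: %d/%d (%.1f%%), rack-local maps: %d%n",
                    dataLocal, launched, 100.0 * dataLocal / launched, rackLocal);
        }
    }

A higher data-local fraction means fewer input blocks crossed the network, which is the effect deep data locality aims to maximize across both the map and shuffle stages.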

2017 ◽  
Vol 8 (4) ◽  
pp. 58-71
Author(s):  
P. Victer Paul ◽  
D. Veeraiah

In this article, a novel security model for the Hadoop environment is developed to enhance the security credentials of handheld systems. The proposed system enables Hadoop security both for a dataset and for a user who wishes to access content inside the Hadoop system, addressing three distinct features: encryption, confidentiality, and authentication. The significance of the proposed model is threefold: it protects against malicious intent by allowing only valid content into the big data system; it admits only authenticated users, making the dataset more secure; and, once authentication is enhanced, authorization can easily be granted in the Hadoop system, providing access control and access rights to the resources on which the user wishes to operate. The model is implemented, and its performance is validated against existing security variants.
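The abstract does not name a cipher or API, so the following is only a minimal sketch of the encryption feature, assuming AES from the standard Java crypto libraries and the ordinary Hadoop FileSystem client; the class name and key handling are illustrative, not the authors' design:

    import javax.crypto.Cipher;
    import javax.crypto.CipherOutputStream;
    import javax.crypto.KeyGenerator;
    import javax.crypto.SecretKey;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class EncryptedUpload {
        // Encrypts a byte stream with AES before it ever reaches HDFS, so the
        // stored dataset is unreadable without the key. Key generation here is
        // ad hoc for the sketch; a real deployment would fetch the key from a
        // KMS tied to the authenticated user and use an authenticated mode
        // such as AES/GCM instead of the provider default.
        public static void upload(byte[] data, String hdfsPath) throws Exception {
            SecretKey key = KeyGenerator.getInstance("AES").generateKey();
            Cipher cipher = Cipher.getInstance("AES");
            cipher.init(Cipher.ENCRYPT_MODE, key);

            FileSystem fs = FileSystem.get(new Configuration());
            try (CipherOutputStream out =
                     new CipherOutputStream(fs.create(new Path(hdfsPath)), cipher)) {
                out.write(data);
            }
        }
    }

Encrypting client-side in this way keeps the plaintext out of HDFS entirely, so a compromised DataNode exposes only ciphertext.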


Author(s):  
Pankaj Dadheech ◽  
Dinesh Goyal ◽  
Sumit Srivastava ◽  
Ankit Kumar

Spatial queries are frequently used in Hadoop for large-scale data processing. However, the vast size of spatial information makes it difficult to process spatial queries efficiently, so the Hadoop system is used to process the big data. We use Boolean queries and geometric Boolean spatial data for query optimization on the Hadoop system. In this paper, we present a lightweight and adaptable spatial data index for big data processed in Hadoop frameworks. Results demonstrate the efficiency and effectiveness of our spatial indexing system for various spatial queries.
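The abstract does not specify the index structure, so the sketch below is only a generic illustration of lightweight spatial indexing: a uniform grid over point data answering a Boolean range (bounding-box) query, the simplest form of the spatial queries mentioned above. All names are hypothetical:

    import java.util.ArrayList;
    import java.util.Collections;
    import java.util.HashMap;
    import java.util.List;
    import java.util.Map;

    public class GridIndex {
        private final double cell;                        // grid cell size
        private final Map<Long, List<double[]>> grid = new HashMap<>();

        public GridIndex(double cellSize) { this.cell = cellSize; }

        // Pack the (column, row) cell coordinates of a point into one map key.
        private long key(long ix, long iy) { return (ix << 32) ^ (iy & 0xffffffffL); }

        public void insert(double x, double y) {
            long k = key((long) Math.floor(x / cell), (long) Math.floor(y / cell));
            grid.computeIfAbsent(k, unused -> new ArrayList<>()).add(new double[]{x, y});
        }

        // Boolean range query: all points inside the box [x1,x2] x [y1,y2].
        // Only grid cells overlapping the box are visited, not the whole dataset.
        public List<double[]> range(double x1, double y1, double x2, double y2) {
            List<double[]> hits = new ArrayList<>();
            for (long ix = (long) Math.floor(x1 / cell); ix <= (long) Math.floor(x2 / cell); ix++) {
                for (long iy = (long) Math.floor(y1 / cell); iy <= (long) Math.floor(y2 / cell); iy++) {
                    for (double[] p : grid.getOrDefault(key(ix, iy), Collections.emptyList())) {
                        if (p[0] >= x1 && p[0] <= x2 && p[1] >= y1 && p[1] <= y2) {
                            hits.add(p);
                        }
                    }
                }
            }
            return hits;
        }
    }

In a Hadoop setting, each task could hold such an index for its data partition, so a range query touches only the grid cells overlapping the query box instead of scanning every record.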


2020 ◽  
Vol 13 (4) ◽  
pp. 790-797
Author(s):  
Gurjit Singh Bhathal ◽  
Amardeep Singh Dhiman

Background: In the current internet scenario, large amounts of data are generated and processed. The Hadoop framework is widely used to store and process big data in a highly distributed manner, yet it is argued that the framework is not mature enough to deal with current cyberattacks on the data. Objective: The main objective of the proposed work is to provide a complete security approach comprising authorisation and authentication for the user and the Hadoop cluster nodes, and to secure the data at rest as well as in transit. Methods: The proposed algorithm uses the Kerberos network authentication protocol to authorise, authenticate, and validate the users and the cluster nodes. Ciphertext-Policy Attribute-Based Encryption (CP-ABE) is used for data at rest and data in transit: a user encrypts a file with their own set of attributes and stores it on the Hadoop Distributed File System, and only intended users with matching parameters can decrypt that file. Results: The proposed algorithm was implemented with datasets of different sizes, processed both with and without encryption. The results show little difference in processing time: performance was affected in the range of 0.8% to 3.1%, which also includes the impact of other factors such as system configuration, the number of parallel jobs running, and the virtual environment. Conclusion: The solutions available for handling the big data security problems faced in the Hadoop framework are inefficient or incomplete. A complete security framework is proposed for the Hadoop environment, and the solution is experimentally shown to have little effect on system performance for datasets of different sizes.
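On the authentication side, a client joins a Kerberized Hadoop cluster through the standard UserGroupInformation API. The sketch below shows a keytab login before any HDFS access; the principal name, keytab path, and file path are placeholders, and the CP-ABE layer, which requires a dedicated library, is not shown:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.security.UserGroupInformation;

    public class KerberosHdfsAccess {
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            conf.set("hadoop.security.authentication", "kerberos");
            UserGroupInformation.setConfiguration(conf);

            // Authenticate this client against the KDC before any HDFS call;
            // the principal and keytab path below are placeholders.
            UserGroupInformation.loginUserFromKeytab(
                    "analyst@EXAMPLE.COM", "/etc/security/keytabs/analyst.keytab");

            FileSystem fs = FileSystem.get(conf);
            System.out.println(fs.exists(new Path("/secure/dataset.enc")));
        }
    }

With Kerberos enabled cluster-wide, every RPC between the client and the NameNode/DataNodes is made under this authenticated identity, which is what lets the cluster enforce the access rights described above.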

