Hadoop Performance Analysis Model with Deep Data Locality

Information ◽  
2019 ◽  
Vol 10 (7) ◽  
pp. 222 ◽  
Author(s):  
Sungchul Lee ◽  
Ju-Yeon Jo ◽  
Yoohwan Kim

Background: Hadoop has become the base framework for big data systems, built on the simple principle that moving computation is cheaper than moving data. Hadoop exploits data locality in the Hadoop Distributed File System (HDFS) to improve system performance: network traffic among nodes is reduced by increasing the number of data-local tasks, i.e., tasks that run on the machine holding their input data. Previous research improved Hadoop performance by increasing data locality in only one of the MapReduce stages, and no mathematical performance model for data locality in Hadoop currently exists. Methods: This study develops a Hadoop performance analysis model with data locality that covers the entire MapReduce process. The paper explains the data locality concept in both the map stage and the shuffle stage, and shows how the analysis model can be applied to increase the performance of a Hadoop system through deep data locality. Results: The benefit of deep data locality was demonstrated in three tests: a simulation-based test, a cloud test, and a physical test. Across these tests, the authors improved Hadoop performance by over 34% using deep data locality. Conclusions: Deep data locality improves Hadoop performance by reducing data movement in HDFS.
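The map-stage locality that the model analyzes can be observed directly through Hadoop's standard job counters. The sketch below is only a minimal illustration using the plain Hadoop MapReduce API (the class name and report format are our own, not the authors'); it prints how many map tasks ran data-local versus rack-local:

    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.JobCounter;

    public class LocalityReport {
        // Prints how many map tasks ran data-local (on a node holding an HDFS
        // replica of their input split) versus rack-local, using the standard
        // counters Hadoop publishes for every completed job.
        public static void report(Job job) throws Exception {
            long dataLocal = job.getCounters().findCounter(JobCounter.DATA_LOCAL_MAPS).getValue();
            long rackLocal = job.getCounters().findCounter(JobCounter.RACK_LOCAL_MAPS).getValue();
            long launched  = job.getCounters().findCounter(JobCounter.TOTAL_LAUNCHED_MAPS).getValue();
            if (launched == 0) return;  // nothing to report for map-less jobs
            System.out.printf("data-local maps: %d/%d (%.1f%%), rack-local maps: %d%n",
                    dataLocal, launched, 100.0 * dataLocal / launched, rackLocal);
        }
    }

A higher data-local fraction means fewer input blocks crossed the network, which is the effect deep data locality aims to maximize across both the map and shuffle stages.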

2017 ◽  
Vol 8 (4) ◽  
pp. 58-71
Author(s):  
P. Victer Paul ◽  
D. Veeraiah

In this article, a novel security model for the Hadoop environment is developed to enhance the security credentials of handheld systems. The proposed system enables Hadoop security both for a dataset and for a user who wishes to access content inside the Hadoop system, addressing three distinct features: encryption, confidentiality, and authentication. The significance of the proposed model is threefold: it protects against malicious intent by allowing only valid content into the big data system; it admits only authenticated users, making the dataset more secure; and, once authentication is enhanced, authorization can easily be granted in the Hadoop system, providing access control and access rights to the resources on which the user wishes to operate. The model is implemented, and its performance is validated against existing security variants.
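The abstract does not name a cipher or API, so the following is only a minimal sketch of the encryption feature, assuming AES from the standard Java crypto libraries and the ordinary Hadoop FileSystem client; the class name and key handling are illustrative, not the authors' design:

    import javax.crypto.Cipher;
    import javax.crypto.CipherOutputStream;
    import javax.crypto.KeyGenerator;
    import javax.crypto.SecretKey;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class EncryptedUpload {
        // Encrypts a byte stream with AES before it ever reaches HDFS, so the
        // stored dataset is unreadable without the key. Key generation here is
        // ad hoc for the sketch; a real deployment would fetch the key from a
        // KMS tied to the authenticated user and use an authenticated mode
        // such as AES/GCM instead of the provider default.
        public static void upload(byte[] data, String hdfsPath) throws Exception {
            SecretKey key = KeyGenerator.getInstance("AES").generateKey();
            Cipher cipher = Cipher.getInstance("AES");
            cipher.init(Cipher.ENCRYPT_MODE, key);

            FileSystem fs = FileSystem.get(new Configuration());
            try (CipherOutputStream out =
                     new CipherOutputStream(fs.create(new Path(hdfsPath)), cipher)) {
                out.write(data);
            }
        }
    }

Encrypting client-side in this way keeps the plaintext out of HDFS entirely, so a compromised DataNode exposes only ciphertext.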


Author(s):  
Pankaj Dadheech ◽  
Dinesh Goyal ◽  
Sumit Srivastava ◽  
Ankit Kumar

Spatial queries are frequently used in Hadoop for large-scale data processing. However, the vast size of spatial information makes it difficult to process spatial queries efficiently, so the Hadoop system is used to process the big data. We use Boolean queries and geometric Boolean spatial data for query optimization on the Hadoop system. In this paper, we present a lightweight and adaptable spatial data index for big data processed in Hadoop frameworks. Results demonstrate the efficiency and effectiveness of our spatial indexing system for various spatial queries.
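The abstract does not specify the index structure, so the sketch below is only a generic illustration of lightweight spatial indexing: a uniform grid over point data answering a Boolean range (bounding-box) query, the simplest form of the spatial queries mentioned above. All names are hypothetical:

    import java.util.ArrayList;
    import java.util.Collections;
    import java.util.HashMap;
    import java.util.List;
    import java.util.Map;

    public class GridIndex {
        private final double cell;                        // grid cell size
        private final Map<Long, List<double[]>> grid = new HashMap<>();

        public GridIndex(double cellSize) { this.cell = cellSize; }

        // Pack the (column, row) cell coordinates of a point into one map key.
        private long key(long ix, long iy) { return (ix << 32) ^ (iy & 0xffffffffL); }

        public void insert(double x, double y) {
            long k = key((long) Math.floor(x / cell), (long) Math.floor(y / cell));
            grid.computeIfAbsent(k, unused -> new ArrayList<>()).add(new double[]{x, y});
        }

        // Boolean range query: all points inside the box [x1,x2] x [y1,y2].
        // Only grid cells overlapping the box are visited, not the whole dataset.
        public List<double[]> range(double x1, double y1, double x2, double y2) {
            List<double[]> hits = new ArrayList<>();
            for (long ix = (long) Math.floor(x1 / cell); ix <= (long) Math.floor(x2 / cell); ix++) {
                for (long iy = (long) Math.floor(y1 / cell); iy <= (long) Math.floor(y2 / cell); iy++) {
                    for (double[] p : grid.getOrDefault(key(ix, iy), Collections.emptyList())) {
                        if (p[0] >= x1 && p[0] <= x2 && p[1] >= y1 && p[1] <= y2) {
                            hits.add(p);
                        }
                    }
                }
            }
            return hits;
        }
    }

In a Hadoop setting, each task could hold such an index for its data partition, so a range query touches only the grid cells overlapping the query box instead of scanning every record.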


2020 ◽  
Vol 13 (4) ◽  
pp. 790-797
Author(s):  
Gurjit Singh Bhathal ◽  
Amardeep Singh Dhiman

Background: In the current internet scenario, large amounts of data are generated and processed. The Hadoop framework is widely used to store and process big data in a highly distributed manner, yet it is argued that the framework is not mature enough to deal with current cyberattacks on the data. Objective: The main objective of the proposed work is to provide a complete security approach comprising authorisation and authentication for the user and the Hadoop cluster nodes, and to secure the data at rest as well as in transit. Methods: The proposed algorithm uses the Kerberos network authentication protocol to authorise, authenticate, and validate the users and the cluster nodes. Ciphertext-Policy Attribute-Based Encryption (CP-ABE) is used for data at rest and data in transit: a user encrypts a file with their own set of attributes and stores it on the Hadoop Distributed File System, and only intended users with matching parameters can decrypt that file. Results: The proposed algorithm was implemented with datasets of different sizes, processed both with and without encryption. The results show little difference in processing time: performance was affected in the range of 0.8% to 3.1%, which also includes the impact of other factors such as system configuration, the number of parallel jobs running, and the virtual environment. Conclusion: The solutions available for handling the big data security problems faced in the Hadoop framework are inefficient or incomplete. A complete security framework is proposed for the Hadoop environment, and the solution is experimentally shown to have little effect on system performance for datasets of different sizes.
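On the authentication side, a client joins a Kerberized Hadoop cluster through the standard UserGroupInformation API. The sketch below shows a keytab login before any HDFS access; the principal name, keytab path, and file path are placeholders, and the CP-ABE layer, which requires a dedicated library, is not shown:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.security.UserGroupInformation;

    public class KerberosHdfsAccess {
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            conf.set("hadoop.security.authentication", "kerberos");
            UserGroupInformation.setConfiguration(conf);

            // Authenticate this client against the KDC before any HDFS call;
            // the principal and keytab path below are placeholders.
            UserGroupInformation.loginUserFromKeytab(
                    "analyst@EXAMPLE.COM", "/etc/security/keytabs/analyst.keytab");

            FileSystem fs = FileSystem.get(conf);
            System.out.println(fs.exists(new Path("/secure/dataset.enc")));
        }
    }

With Kerberos enabled cluster-wide, every RPC between the client and the NameNode/DataNodes is made under this authenticated identity, which is what lets the cluster enforce the access rights described above.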

