Background:
In the current Internet scenario, large amounts of data are generated and
processed. The Hadoop framework is widely used to store and process big data in a highly distributed
manner. It is argued, however, that the Hadoop framework is not mature enough to deal with current
cyberattacks on that data.
Objective:
The main objective of the proposed work is to provide a complete security approach
comprising authorisation and authentication for users and Hadoop cluster nodes, and to secure
data both at rest and in transit.
Methods:
The proposed algorithm uses the Kerberos network authentication protocol to authenticate
and authorise users and cluster nodes. Ciphertext-Policy Attribute-Based Encryption (CP-ABE)
protects data at rest and in transit: a user encrypts a file under their own set of attributes
and stores it on the Hadoop Distributed File System (HDFS), and only intended users whose
attributes match can decrypt that file.
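The attribute-matching idea behind CP-ABE can be sketched as follows. This is a minimal conceptual illustration only, not the authors' implementation and not real CP-ABE (which relies on pairing-based cryptography); the attribute names, the key derivation, and the toy XOR cipher are all assumptions made for the sketch.

```python
import hashlib

def attribute_key(attributes):
    """Derive a symmetric key from a canonically ordered attribute set
    (illustrative stand-in for a CP-ABE policy/key pair)."""
    material = ",".join(sorted(attributes)).encode()
    return hashlib.sha256(material).digest()

def xor_cipher(data, key):
    """Toy XOR stream cipher keyed by the attribute-derived key."""
    return bytes(b ^ key[i % len(key)] for i, b in enumerate(data))

def encrypt(plaintext, policy_attributes):
    # The owner encrypts the file under their attribute set before
    # storing it on HDFS.
    return xor_cipher(plaintext, attribute_key(policy_attributes))

def decrypt(ciphertext, user_attributes):
    # Decryption succeeds only when the user's attributes match.
    return xor_cipher(ciphertext, attribute_key(user_attributes))

ct = encrypt(b"payroll records", {"dept:finance", "role:manager"})
# Matching attributes (order-independent) recover the plaintext...
assert decrypt(ct, {"role:manager", "dept:finance"}) == b"payroll records"
# ...while a mismatched attribute set yields garbage.
assert decrypt(ct, {"dept:hr", "role:manager"}) != b"payroll records"
```

The sketch captures only the access-control semantics (decryption works iff attributes match); real CP-ABE additionally lets a policy express boolean combinations of attributes and does not reveal anything to non-matching keys.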
Results:
The proposed algorithm was implemented with datasets of different sizes, and the data was
processed both with and without encryption. The results show little difference in processing
time: performance was affected in the range of 0.8% to 3.1%, a figure that also includes the
impact of other factors such as system configuration, the number of parallel jobs running, and
the virtual environment.
Conclusion:
The solutions currently available for the big data security problems faced in the Hadoop
framework are inefficient or incomplete. A complete security framework is therefore proposed
for the Hadoop environment. The solution is experimentally shown to have little effect on
system performance across datasets of different sizes.