Using HDFS to Load, Search, and Retrieve Data from Local Data Nodes

Shubh Goyal

doi:10.22214/ijraset.2021.38877


International Journal for Research in Applied Science and Engineering Technology


2021

Vol 9 (11)

pp. 656-659

Author(s):

Shubh Goyal

Keyword(s):

File System

Distributed File System

Distributed Environment

Local Data

Major Purpose

Data Redundancy

Hadoop Distributed File System

Abstract: Using the Hadoop environment, data can be loaded into, and searched from, local data nodes. Because datasets can be very large, loading data and locating it with a query is often difficult. We propose a method for handling data on local nodes that does not overlap with data already acquired by the script. The main purpose of the query is to store information in a distributed environment and retrieve it quickly. In this paper, we define a script that eliminates redundant duplicate data when searching and loading data dynamically. In addition, the Hadoop file system is available across the distributed environment.

Keywords: HDFS; Hadoop distributed file system; replica; local; distributed; capacity; SQL; redundancy
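The abstract does not say how the script detects duplicate data while loading. As an illustration only, here is a minimal Python sketch of one common approach, content hashing: each record's SHA-256 digest is checked before the record is placed on a node, so a record that is already stored is skipped. `LocalNodeStore`, its `load` and `search` methods, and the hash-modulo node placement are hypothetical; the nodes are simulated in memory, the HDFS API is not used, and this is not the author's script.

```python
import hashlib
from collections import defaultdict


class LocalNodeStore:
    """Hypothetical in-memory stand-in for a set of local data nodes.

    A record is assigned to a node from its content hash, and any record
    whose SHA-256 digest has already been stored is skipped, so repeated
    loads do not create duplicate copies.
    """

    def __init__(self, num_nodes: int) -> None:
        self.num_nodes = num_nodes
        self.nodes: dict[int, list[str]] = defaultdict(list)
        self.seen: set[str] = set()  # digests of every record stored so far

    def load(self, records: list[str]) -> int:
        """Load records, skipping duplicates; return how many were stored."""
        stored = 0
        for record in records:
            digest = hashlib.sha256(record.encode("utf-8")).hexdigest()
            if digest in self.seen:
                continue  # already stored on some node: skip the duplicate
            self.seen.add(digest)
            node_id = int(digest, 16) % self.num_nodes  # deterministic placement
            self.nodes[node_id].append(record)
            stored += 1
        return stored

    def search(self, term: str) -> list[str]:
        """Return every stored record containing term, scanning all nodes."""
        return [
            record
            for node_id in sorted(self.nodes)
            for record in self.nodes[node_id]
            if term in record
        ]
```

For example, loading `["alice,42", "bob,17", "alice,42"]` into `LocalNodeStore(num_nodes=3)` stores two records, and a later load of `["bob,17", "carol,99"]` stores only `carol,99`. Hashing the content catches identical records that arrive from different sources, at the cost of keeping one digest per stored record. This record-level deduplication is separate from HDFS block replication, which deliberately keeps several copies of each block for fault tolerance.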


Improving downloading performance in hadoop distributed file system

Journal of Computer Applications

10.3724/sp.j.1087.2010.02060

2010

Vol 30 (8)

pp. 2060-2065

Cited By ~ 4

Author(s):

Ning CAO

Zhong-hai WU

Hong-zhi LIU

Qi-xun ZHANG

Keyword(s):

File System

Distributed File System

Hadoop Distributed File System


A Technique For Big Statistics Security Based on Hadoop Distributed File System

SSRN Electronic Journal

10.2139/ssrn.3508526

2019

Author(s):

Sindhu D M

Dr. Ravikumar G.K

Manu Y.M

Keyword(s):

File System

Distributed File System

Hadoop Distributed File System


Data protection on hadoop distributed file system by using encryption algorithms: a systematic literature review

Journal of Physics: Conference Series

10.1088/1742-6596/1444/1/012012

2020

Vol 1444

pp. 012012

Author(s):

Meisuchi Naisuty

Achmad Nizar Hidayanto

Nabila Clydea Harahap

Ahmad Rosyiq

Agus Suhanto

...

Keyword(s):

Literature Review

Systematic Literature Review

Data Protection

File System

Distributed File System

Hadoop Distributed File System

Encryption Algorithms


A Study on Security Approaches for Big Data Hadoop Distributed File System

Journal of Engineering and Applied Sciences

10.36478/jeasci.2019.8266.8272

2019

Vol 14 (22)

pp. 8266-8272

Author(s):

Leelavathi

M. Elshayeb

Keyword(s):

Big Data

File System

Distributed File System

Hadoop Distributed File System


The Evolution of the Hadoop Distributed File System

2018 32nd International Conference on Advanced Information Networking and Applications Workshops (WAINA)

10.1109/waina.2018.00065

2018

Cited By ~ 1

Author(s):

Stathis Maneas

Bianca Schroeder

Keyword(s):

File System

Distributed File System

Hadoop Distributed File System


Applying the K-Means Algorithm in Big Raw Data Sets with Hadoop and MapReduce

Business Intelligence

10.4018/978-1-4666-9562-7.ch062

2016

pp. 1220-1243

Author(s):

Ilias K. Savvas

Georgia N. Sofianidou

M-Tahar Kechadi

Keyword(s):

Big Data

Clustering Algorithm

File System

Large Data

Large Data Sets

Distributed File System

Data Sets

Raw Data

Hadoop Distributed File System

Access To Data

Big data refers to data sets whose size is beyond the capabilities of most current hardware and software technologies. The Apache Hadoop software library is a framework for the distributed processing of large data sets; within it, HDFS is a distributed file system that provides data-driven applications with high-throughput access to data, and MapReduce is a software framework for distributed computing over large data sets. Huge collections of raw data require fast and accurate mining processes to extract useful knowledge, and one of the most popular data-mining techniques is the K-means clustering algorithm. In this study, the authors develop a distributed version of the K-means algorithm using the MapReduce framework on the Hadoop Distributed File System. Their theoretical and experimental results demonstrate the technique's efficiency, indicating that HDFS and MapReduce can be applied to big data with very promising results.
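The abstract does not give the authors' implementation. The following minimal Python sketch shows the usual way one K-means iteration maps onto MapReduce: the map step assigns each point to its nearest centroid and emits a (centroid index, point) pair, and the reduce step groups the pairs by key and averages each group into a new centroid. It runs in a single process to illustrate the data flow only; the function names are hypothetical, no Hadoop API is used, and it is not the authors' code.

```python
import math
from collections import defaultdict

Point = tuple[float, ...]


def nearest(point: Point, centroids: list[Point]) -> int:
    """Index of the centroid closest to point (Euclidean distance)."""
    return min(range(len(centroids)), key=lambda i: math.dist(point, centroids[i]))


def map_phase(points: list[Point], centroids: list[Point]) -> list[tuple[int, Point]]:
    """Mapper: emit (nearest centroid index, point) for every input point."""
    return [(nearest(p, centroids), p) for p in points]


def reduce_phase(pairs: list[tuple[int, Point]], centroids: list[Point]) -> list[Point]:
    """Shuffle pairs by key, then reduce each group to the mean of its points."""
    groups: dict[int, list[Point]] = defaultdict(list)
    for key, point in pairs:
        groups[key].append(point)
    new_centroids = list(centroids)  # a cluster with no points keeps its old centroid
    for key, members in groups.items():
        dim = len(members[0])
        new_centroids[key] = tuple(
            sum(m[d] for m in members) / len(members) for d in range(dim)
        )
    return new_centroids


def kmeans_mapreduce(
    points: list[Point], centroids: list[Point], max_iter: int = 20, tol: float = 1e-9
) -> list[Point]:
    """Repeat map + reduce until no centroid moves more than tol."""
    for _ in range(max_iter):
        updated = reduce_phase(map_phase(points, centroids), centroids)
        shift = max(math.dist(a, b) for a, b in zip(centroids, updated))
        centroids = updated
        if shift <= tol:
            break
    return centroids


points = [(1.0, 1.0), (1.5, 2.0), (1.0, 1.5), (8.0, 8.0), (9.0, 8.5), (8.5, 9.0)]
centers = kmeans_mapreduce(points, [(1.0, 1.0), (9.0, 9.0)])
```

With these six sample points and initial centroids (1, 1) and (9, 9), the sketch converges after two iterations to roughly (1.17, 1.5) and (8.5, 8.5). In a Hadoop deployment, each iteration would typically run as its own MapReduce job with the current centroids made available to every mapper, and a combiner could pre-aggregate per-key partial sums on each node to reduce shuffle traffic.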
