Using HDFS to Load, Search, and Retrieve Data from Local Data Nodes

Author(s): Shubh Goyal

Abstract: Using the Hadoop environment, data can be loaded into and searched from local data nodes. Because datasets can be very large, loading data and locating it with a query is often difficult. We propose a method for handling data in local nodes so that it does not overlap with data acquired by the script. The main purpose of the query is to store information in a distributed environment and retrieve it quickly. In this paper, we define a script that eliminates duplicate, redundant data when loading and searching data dynamically. In addition, the Hadoop Distributed File System (HDFS) provides the distributed environment in which the data is stored.

Keywords: HDFS; Hadoop distributed file system; replica; local; distributed; capacity; SQL; redundancy
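The load-and-search idea in the abstract can be sketched as follows. This is a minimal, hypothetical illustration, not the authors' script: it assumes records are key/value pairs, that a record is redundant when an identical key/value pair has already been loaded, and that keys are placed on simulated local data nodes by hashing the key. The class `LocalNodeStore` and its methods are invented for illustration and do not use the HDFS API.

```python
"""Sketch: duplicate-aware loading and lookup across simulated local data nodes.

Hypothetical assumptions (not the paper's script): records are (key, value)
pairs, a record is a duplicate if the same (key, value) pair was already
loaded, and each key is placed on a node by hashing the key.
"""
import hashlib


class LocalNodeStore:
    """In-memory stand-in for a set of local data nodes (hypothetical)."""

    def __init__(self, num_nodes: int) -> None:
        if num_nodes < 1:
            raise ValueError("num_nodes must be >= 1")
        self.nodes = [{} for _ in range(num_nodes)]  # one key -> value map per node
        self.seen = set()  # digests of (key, value) records already loaded

    def _node_for(self, key: str) -> int:
        # Stable placement: the same key always maps to the same node.
        digest = hashlib.sha256(key.encode("utf-8")).digest()
        return int.from_bytes(digest[:8], "big") % len(self.nodes)

    def load(self, records) -> int:
        """Load records, skipping exact duplicates; return how many were stored."""
        stored = 0
        for key, value in records:
            digest = hashlib.sha256(f"{key}\x00{value}".encode("utf-8")).hexdigest()
            if digest in self.seen:
                continue  # redundant record: it is already stored, so skip it
            self.seen.add(digest)
            # A new value for an existing key replaces the old one on its node.
            self.nodes[self._node_for(key)][key] = value
            stored += 1
        return stored

    def search(self, key: str):
        """Look up a key on the single node that owns it; None if absent."""
        return self.nodes[self._node_for(key)].get(key)


if __name__ == "__main__":
    store = LocalNodeStore(num_nodes=3)
    print(store.load([("a", 1), ("b", 2), ("a", 1)]))  # 2: the repeated ("a", 1) is skipped
    print(store.search("a"))  # 1
    print(store.search("z"))  # None
```

Hashing the key for placement means a search touches only one node instead of scanning all of them; a real deployment would rely on HDFS for block placement and replication rather than this in-memory model.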

2010, Vol 30 (8), pp. 2060-2065
Author(s): Ning CAO, Zhong-hai WU, Hong-zhi LIU, Qi-xun ZHANG

2020, Vol 1444, pp. 012012
Author(s): Meisuchi Naisuty, Achmad Nizar Hidayanto, Nabila Clydea Harahap, Ahmad Rosyiq, Agus Suhanto, ...

2016, pp. 1220-1243
Author(s): Ilias K. Savvas, Georgia N. Sofianidou, M-Tahar Kechadi

Big data refers to data sets whose size is beyond the capabilities of most current hardware and software technologies. The Apache Hadoop software library is a framework for distributed processing of large data sets; HDFS is a distributed file system that provides high-throughput data access for data-driven applications; and MapReduce is a software framework for distributed computing on large data sets. Huge collections of raw data require fast and accurate mining processes to extract useful knowledge. One of the most popular data mining techniques is the K-means clustering algorithm. In this study, the authors develop a distributed version of the K-means algorithm using the MapReduce framework on the Hadoop Distributed File System. The theoretical and experimental results demonstrate the technique's efficiency, indicating that HDFS and MapReduce can be applied to big data with very promising results.
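The general pattern of K-means on MapReduce can be sketched in plain Python. This simulates the map, shuffle, and reduce phases in a single process and does not use Hadoop's API; it follows the common decomposition (mappers assign points to the nearest centroid, reducers average each cluster), which may differ from the authors' exact design. All function names here are hypothetical.

```python
"""Sketch: K-means with each iteration expressed as map / shuffle / reduce.

In-process simulation only (no Hadoop API); the decomposition is the common
textbook one, not necessarily the one used in the study.
"""
import math
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple

Point = Tuple[float, ...]


def nearest(point: Point, centroids: Sequence[Point]) -> int:
    """Index of the centroid closest to `point` (Euclidean distance)."""
    return min(range(len(centroids)), key=lambda i: math.dist(point, centroids[i]))


def map_phase(points, centroids):
    """Mapper: emit (centroid_id, (partial_sum, count)); for one point that is (point, 1)."""
    for p in points:
        yield nearest(p, centroids), (p, 1)


def shuffle(pairs) -> Dict[int, List[Tuple[Point, int]]]:
    """Group mapper output by key, as the framework does between the two phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key: int, values) -> Tuple[int, Point]:
    """Reducer: add the partial sums and counts, then divide to get the new centroid."""
    count = sum(n for _, n in values)
    dims = len(values[0][0])
    sums = [sum(p[d] for p, _ in values) for d in range(dims)]
    return key, tuple(s / count for s in sums)


def kmeans_iteration(points, centroids):
    """One map/shuffle/reduce round; an empty cluster keeps its old centroid."""
    new = list(centroids)
    for key, values in shuffle(map_phase(points, centroids)).items():
        _, centroid = reduce_phase(key, values)
        new[key] = centroid
    return new


def kmeans(points, centroids, max_iter: int = 20, tol: float = 1e-9):
    """Repeat iterations until no centroid moves more than `tol`, or max_iter is hit."""
    for _ in range(max_iter):
        updated = kmeans_iteration(points, centroids)
        if all(math.dist(a, b) <= tol for a, b in zip(centroids, updated)):
            return updated
        centroids = updated
    return centroids


if __name__ == "__main__":
    pts = [(0, 0), (0, 1), (10, 10), (10, 11)]
    print(kmeans(pts, [(0, 0), (10, 10)]))  # [(0.0, 0.5), (10.0, 10.5)]
```

Emitting `(partial_sum, count)` pairs rather than bare points is a deliberate choice: a combiner on each node could pre-add its local points into one pair per cluster, and the reducer above would still compute the correct mean while less data crosses the network.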
