Sqoop usage in Hadoop Distributed File System and Observations to Handle Common Errors

The Hadoop framework provides a way to store and process very large amounts of data. Companies such as Facebook, Twitter, and Amazon use Hadoop ecosystem tools to store data in the Hadoop Distributed File System (HDFS) and to process it with MapReduce (MR). This work describes how Sqoop is used to import data into and export data from HDFS, and covers the import/export commands that Sqoop supports within the Hadoop ecosystem. Its main aim is to highlight the errors commonly encountered while installing and working with Sqoop. Many developers and researchers use Sqoop to run import/export jobs and to handle source data held in relational form. The work presents the connectivity between MySQL and Sqoop and demonstrates the usage of various commands along with their results. For each command, it lists the errors that may be encountered and the corresponding solutions. It also describes the common configuration settings needed to run Sqoop without errors.
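As a rough illustration of the import/export commands the abstract refers to, the sketch below assembles a Sqoop import and a Sqoop export command line for a MySQL source. It is not taken from the paper: the host, database, table, user, and paths are hypothetical placeholders, and the helper names `sqoop_import_args` and `sqoop_export_args` are invented for this example. Only standard Sqoop 1 options are used (`--connect`, `--username`, `--password-file`, `--table`, `--target-dir`, `--export-dir`, `--num-mappers`), and the sketch assumes the MySQL JDBC driver is already available to Sqoop.

```python
import shlex
from typing import List


def sqoop_import_args(jdbc_url: str, username: str, password_file: str,
                      table: str, target_dir: str,
                      num_mappers: int = 1) -> List[str]:
    """Build the argument list for importing one relational table into HDFS."""
    return [
        "sqoop", "import",
        "--connect", jdbc_url,
        "--username", username,
        "--password-file", password_file,  # avoids a plain-text password on the command line
        "--table", table,
        "--target-dir", target_dir,        # HDFS directory that receives the rows
        "--num-mappers", str(num_mappers),
    ]


def sqoop_export_args(jdbc_url: str, username: str, password_file: str,
                      table: str, export_dir: str) -> List[str]:
    """Build the argument list for exporting an HDFS directory into a table."""
    return [
        "sqoop", "export",
        "--connect", jdbc_url,
        "--username", username,
        "--password-file", password_file,
        "--table", table,                  # target table must already exist
        "--export-dir", export_dir,        # HDFS directory holding the data
    ]


if __name__ == "__main__":
    # All values below are hypothetical placeholders.
    url = "jdbc:mysql://localhost:3306/salesdb"
    pw_file = "/user/hadoop/.mysql-password"
    print(shlex.join(sqoop_import_args(url, "hadoop", pw_file,
                                       "orders", "/user/hadoop/orders")))
    print(shlex.join(sqoop_export_args(url, "hadoop", pw_file,
                                       "orders_copy", "/user/hadoop/orders")))
```

The script only prints the commands rather than running them, so it can be checked on a machine without Hadoop installed. The printed lines can then be pasted into a shell on a node where Sqoop is configured.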

The File System Recommendations to Reduce the Space and Time Parameters in Hadoop File Storage and Map Reduce Processing of Big Data Applications

International Journal of Innovative Technology and Exploring Engineering - Special Issue ◽

10.35940/ijitee.j7579.0891020 ◽

2020 ◽

Vol 9 (10) ◽

pp. 353-356

Keyword(s):

Big Data ◽

Data Processing ◽

Data Storage ◽

File System ◽

Distributed File System ◽

Map Reduce ◽

Space And Time ◽

File Storage ◽

Hadoop Distributed File System ◽

Hadoop Framework

The Hadoop Distributed File System (HDFS) and MapReduce (MR) are the key components of the Hadoop framework. Big data scenarios such as Facebook data processing or Twitter analytics, where tweets are stored and then processed, depend on the Hadoop framework for storage and processing, on which further analytics can be built. Processing such large volumes of data consumes a great deal of space and time in the Hadoop framework, and both need to be reduced to get faster responses from the framework. The effort matters because the other ecosystem tools also depend on HDFS and MR for data storage and processing; an alternative architecture that uses space more efficiently and utilizes resources more effectively would reduce the framework's time requirements. The outcome of the work is faster data processing and lower space utilization in MR processing, including with other ecosystem tools such as Hive, Flume, Sqoop, and Pig Latin. The work proposes an alternative framework for HDFS and MR, named Unified Space Allocation and Data Processing with Metadata based Distributed File System (USAMDFS).

Improving downloading performance in hadoop distributed file system

Journal of Computer Applications ◽

10.3724/sp.j.1087.2010.02060 ◽

2010 ◽

Vol 30 (8) ◽

pp. 2060-2065 ◽

Cited By ~ 4

Author(s):

Ning CAO ◽

Zhong-hai WU ◽

Hong-zhi LIU ◽

Qi-xun ZHANG

Keyword(s):

File System ◽

Distributed File System ◽

Hadoop Distributed File System

A Technique For Big Statistics Security Based on Hadoop Distributed File System

SSRN Electronic Journal ◽

10.2139/ssrn.3508526 ◽

2019 ◽

Author(s):

Sindhu D M ◽

DR.Ravikumar G.K ◽

Manu Y.M

Keyword(s):

File System ◽

Distributed File System ◽

Hadoop Distributed File System

Data protection on hadoop distributed file system by using encryption algorithms: a systematic literature review

Journal of Physics Conference Series ◽

10.1088/1742-6596/1444/1/012012 ◽

2020 ◽

Vol 1444 ◽

pp. 012012

Author(s):

Meisuchi Naisuty ◽

Achmad Nizar Hidayanto ◽

Nabila Clydea Harahap ◽

Ahmad Rosyiq ◽

Agus Suhanto ◽

...

Keyword(s):

Literature Review ◽

Systematic Literature Review ◽

Data Protection ◽

File System ◽

Distributed File System ◽

Hadoop Distributed File System ◽

Encryption Algorithms

A Study on Security Approaches for Big Data Hadoop Distributed File System

Journal of Engineering and Applied Sciences ◽

10.36478/jeasci.2019.8266.8272 ◽

2019 ◽

Vol 14 (22) ◽

pp. 8266-8272

Author(s):

Leelavathi . ◽

M. Elshayeb

Keyword(s):

Big Data ◽

File System ◽

Distributed File System ◽

Hadoop Distributed File System

The Evolution of the Hadoop Distributed File System

2018 32nd International Conference on Advanced Information Networking and Applications Workshops (WAINA) ◽

10.1109/waina.2018.00065 ◽

2018 ◽

Cited By ~ 1

Author(s):

Stathis Maneas ◽

Bianca Schroeder

Keyword(s):

File System ◽

Distributed File System ◽

Hadoop Distributed File System

Applying the K-Means Algorithm in Big Raw Data Sets with Hadoop and MapReduce

Business Intelligence ◽

10.4018/978-1-4666-9562-7.ch062 ◽

2016 ◽

pp. 1220-1243

Author(s):

Ilias K. Savvas ◽

Georgia N. Sofianidou ◽

M-Tahar Kechadi

Keyword(s):

Big Data ◽

Clustering Algorithm ◽

File System ◽

Large Data ◽

Large Data Sets ◽

Distributed File System ◽

Data Sets ◽

Raw Data ◽

Hadoop Distributed File System ◽

Access To Data

Big data refers to data sets whose size is beyond the capabilities of most current hardware and software technologies. The Apache Hadoop software library is a framework for distributed processing of large data sets, HDFS is a distributed file system that provides data-driven applications with high-throughput access to data, and MapReduce is a software framework for distributed computation over large data sets. Huge collections of raw data require fast and accurate mining processes in order to extract useful knowledge. One of the most popular data mining techniques is the K-means clustering algorithm. In this study, the authors develop a distributed version of the K-means algorithm using the MapReduce framework on the Hadoop Distributed File System. The theoretical and experimental results demonstrate the technique's efficiency; thus, HDFS and MapReduce can be applied to big data with very promising results.
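The way K-means is commonly split into map and reduce steps can be sketched as follows. This is a single-process illustration, not the authors' implementation and not Hadoop code: the map step assigns each point to its nearest centroid, and the reduce step averages the points collected for each centroid. The function names and the sample data in the usage note are assumptions made for this example.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Sequence, Tuple

Point = Tuple[float, ...]


def squared_distance(a: Point, b: Point) -> float:
    """Squared Euclidean distance between two points of equal dimension."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def map_phase(points: Iterable[Point],
              centroids: Sequence[Point]) -> Iterator[Tuple[int, Point]]:
    """Map step: emit (index of nearest centroid, point) for every point."""
    for p in points:
        nearest = min(range(len(centroids)),
                      key=lambda i: squared_distance(p, centroids[i]))
        yield nearest, p


def reduce_phase(pairs: Iterable[Tuple[int, Point]],
                 centroids: Sequence[Point]) -> List[Point]:
    """Reduce step: replace each centroid by the mean of its assigned points."""
    groups: Dict[int, List[Point]] = defaultdict(list)
    for idx, p in pairs:
        groups[idx].append(p)
    updated = list(centroids)  # centroids with no assigned points stay unchanged
    for idx, members in groups.items():
        dim = len(members[0])
        updated[idx] = tuple(sum(m[d] for m in members) / len(members)
                             for d in range(dim))
    return updated


def kmeans(points: Sequence[Point], centroids: Sequence[Point],
           max_iters: int = 20) -> List[Point]:
    """Repeat map and reduce until the centroids stop changing."""
    current = list(centroids)
    for _ in range(max_iters):
        updated = reduce_phase(map_phase(points, current), current)
        if updated == current:
            break
        current = updated
    return current
```

For example, `kmeans([(0, 0), (0, 2), (10, 10), (10, 12)], [(0, 0), (10, 10)])` returns `[(0.0, 1.0), (10.0, 11.0)]`. On a real cluster each iteration would be one MapReduce job, with the current centroids distributed to every mapper and the points read from HDFS.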