The File System Recommendations to Reduce the Space and Time Parameters in Hadoop File Storage and Map Reduce Processing of Big Data Applications

The study of Hadoop Distributed File System (HDFS) and Map Reduce (MR) are the key aspects of the Hadoop framework. The big data scenarios like Face Book (FB) data processing or the twitter analytics such as storing the tweets and processing the tweets is other scenario of big data which can depends on Hadoop framework to perform the storage and processing through which further analytics can be done. The point here is the usage of space and time in the processing of the above-mentioned huge amounts of the data definitely leads to higher amounts of space and time consumption of the Hadoop framework. The problem here is usage of huge amounts of the space and at the same time the processing time is also high which need to be reduced so as to get the fastest response from the framework. The attempt is important as all the other eco system tools also depends on HDFS and MR so as to perform the data storage and processing of the data and alternative architecture so as to improve the usage of the space and effective utilization of the resources so as to reduce the time requirements of the framework. The outcome of the work is faster data processing and less space utilization of the framework in the processing of MR along with other eco system tools like Hive, Flume, Sqoop and Pig Latin. The work is proposing an alternative framework of the HDFS and MR and the name we are assigning is Unified Space Allocation and Data Processing with Metadata based Distributed File System (USAMDFS).

Download Full-text

Research of Cloud Storage Based on Hadoop Distributed File System

Applied Mechanics and Materials ◽

10.4028/www.scientific.net/amm.513-517.2472 ◽

2014 ◽

Vol 513-517 ◽

pp. 2472-2475

Author(s):

Yong Qi Han ◽

Yun Zhang ◽

Shui Yu

Keyword(s):

Data Storage ◽

File System ◽

Multimedia Data ◽

Distributed File System ◽

Cloud Platform ◽

Computing Technology ◽

Training Video ◽

File Storage ◽

Remote Training ◽

Hadoop Distributed File System

This paper discusses the application of cloud computing technology to store large amount of data in agricultural remote training video and other multimedia data, using four computers to build a Hadoop cloud platform, focus on the Hadoop Distributed File System (HDFS) principle and file storage, to achieve massive agricultural multimedia data storage.

Download Full-text

HDFS Security Approaches and Visualization Tracking

Journal of Engineering & Technological Advances ◽

10.35934/segi.v3i1.49 ◽

2018 ◽

Vol 3 (1) ◽

pp. 49-60

Author(s):

M. Elshayeb ◽

◽

Leelavathi Rajamanickam ◽

Keyword(s):

Big Data ◽

Data Processing ◽

Large Scale ◽

File System ◽

Leading Edge ◽

Distributed File System ◽

Complex Data ◽

Processing Technologies ◽

Big Data Visualization ◽

Hadoop Distributed File System

Big Data refers to large-scale information management and analysis technologies that exceed the capability of traditional data processing technologies. In order to analyse complex data and to identify patterns it is very important to securely store, manage, and share large amounts of complex data. In recent years an increasing of database size according to the various forms (text, images and videos), in huge volumes and with high velocity, the services issues that use internet and desires big data come to leading edge (data-intensive services), (HDFS) Apache’s Hadoop distributed file system is in progress as outstanding software component for cloud computing joint with integrated pieces such as MapReduce. GoogleMapReduce implemented an open source which is Hadoop, having a distributed file system, present to software programmers the perception of the map and reduce. The research shows the security approaches for Big Data Hadoop distributed file system and the best security solution, also this research will help business by big data visualization which will help in better data analysis. In today’s data-centric world, big-data processing and analytics have become critical to most enterprise and government applications.

Download Full-text

Sqoop usage in Hadoop Distributed File System and Observations to Handle Common Errors

International Journal of Recent Technology and Engineering - 2 ◽

10.35940/ijrte.d4980.119420 ◽

2020 ◽

Vol 9 (4) ◽

pp. 452-454

Keyword(s):

File System ◽

Distributed File System ◽

Current Work ◽

Map Reduce ◽

The Social ◽

Source Data ◽

Hadoop Distributed File System ◽

Import And Export ◽

The Common ◽

Hadoop Framework

The Hadoop framework provides a way of storing and processing the huge amounts of the data. The social media like Facebook, twitter and amazon uses Hadoop eco system tools so as to store the data in Hadoop distributed file system and to process the data Map Reduce (MR). The current work describes the usage of Sqoop in the process of import and export with HDFS. The work involves various possible import/export commands supported by the tool Sqoop in the eco system of Hadoop. The importance of the work is to highlight the common errors while installing Sqoop and working with Sqoop. Many developers and researchers were using Sqoop so as to perform the import/export process and to handle the source data in the relational format. In the current work the connectivity between mysql and sqoop were presented and various commands usage along with the results were presented. The outcome of the work is for each command the possible errors encountered and the corresponding solution is mentioned. The common configuration settings we have to follow so as to handle the Sqoop without any errors is also mentioned

Download Full-text

A Distribution of Nodes in Big Data using Hadoop Open Source System

International Journal of Innovative Technology and Exploring Engineering - Special Issue ◽

10.35940/ijitee.c8459.019320 ◽

2020 ◽

Vol 9 (3) ◽

pp. 106-110

Keyword(s):

Big Data ◽

Open Source ◽

Data Storage ◽

High Speed ◽

File System ◽

Fault Tolerant ◽

Heart Beat ◽

Distributed File System ◽

Process Data ◽

Hadoop Distributed File System

Apache Hadoop is an free open source Java framework under Apache Software Foundation. It provides storage of large amount of data efficiently with low costing. Hadoop has two main core components one is HDFS (Hadoop Distributed File System) and second Map Reduce. It is basically a file system and has capability of high fault-tolerant and while deploying supports less cost hardware. It. provides the high speed admittance to the relevance data. The Hadoop architecture is based on cluster, which consist of two nodes named as Data -Node and Name-Node which perform the internal activity known as heart beat to process data storage on distributed file system and Map reducing is performed internally to show the clustering of distributed data on localhost of ssh serverwebsite. Large quantity of data is needed to store in distributed file structure, for this Hadoop has played important role. Maintaining the large volume storage, making data duplicity for providing security and recovery of big data for its analysis and prediction.

Download Full-text

A Study on Security Approaches for Big Data Hadoop Distributed File System

Journal of Engineering and Applied Sciences ◽

10.36478/jeasci.2019.8266.8272 ◽

2019 ◽

Vol 14 (22) ◽

pp. 8266-8272

Author(s):

Leelavathi . ◽

M. Elshayeb

Keyword(s):

Big Data ◽

File System ◽

Distributed File System ◽

Hadoop Distributed File System

Download Full-text

Applying the K-Means Algorithm in Big Raw Data Sets with Hadoop and MapReduce

Business Intelligence ◽

10.4018/978-1-4666-9562-7.ch062 ◽

2016 ◽

pp. 1220-1243

Author(s):

Ilias K. Savvas ◽

Georgia N. Sofianidou ◽

M-Tahar Kechadi

Keyword(s):

Big Data ◽

Clustering Algorithm ◽

File System ◽

Large Data ◽

Large Data Sets ◽

Distributed File System ◽

Data Sets ◽

Raw Data ◽

Hadoop Distributed File System ◽

Access To Data

Big data refers to data sets whose size is beyond the capabilities of most current hardware and software technologies. The Apache Hadoop software library is a framework for distributed processing of large data sets, while HDFS is a distributed file system that provides high-throughput access to data-driven applications, and MapReduce is software framework for distributed computing of large data sets. Huge collections of raw data require fast and accurate mining processes in order to extract useful knowledge. One of the most popular techniques of data mining is the K-means clustering algorithm. In this study, the authors develop a distributed version of the K-means algorithm using the MapReduce framework on the Hadoop Distributed File System. The theoretical and experimental results of the technique prove its efficiency; thus, HDFS and MapReduce can apply to big data with very promising results.

Download Full-text

Small Sized File Storage Problems in Hadoop Distributed File System

2019 International Conference on Smart Systems and Inventive Technology (ICSSIT) ◽

10.1109/icssit46314.2019.8987739 ◽

2019 ◽

Author(s):

Neeta Alange ◽

Anjali Mathur

Keyword(s):

File System ◽

Distributed File System ◽

File Storage ◽

Hadoop Distributed File System

Download Full-text

Big Data Clustering and Hadoop Distributed File System Architecture

Journal of Computational and Theoretical Nanoscience ◽

10.1166/jctn.2019.8256 ◽

2019 ◽

Vol 16 (9) ◽

pp. 3824-3829

Author(s):

Deepak Ahlawat ◽

Deepali Gupta

Keyword(s):

Big Data ◽

Clustering Algorithm ◽

File System ◽

Early Stage ◽

Large Data ◽

Data File ◽

Distributed File System ◽

Hadoop Distributed File System ◽

Data Files ◽

Technological World

Due to advancement in the technological world, there is a great surge in data. The main sources of generating such a large amount of data are social websites, internet sites etc. The large data files are combined together to create a big data architecture. Managing the data file in such a large volume is not easy. Therefore, modern techniques are developed to manage bulk data. To arrange and utilize such big data, Hadoop Distributed File System (HDFS) architecture from Hadoop was presented in the early stage of 2015. This architecture is used when traditional methods are insufficient to manage the data. In this paper, a novel clustering algorithm is implemented to manage a large amount of data. The concepts and frames of Big Data are studied. A novel algorithm is developed using the K means and cosine-based similarity clustering in this paper. The developed clustering algorithm is evaluated using the precision and recall parameters. The prominent results are obtained which successfully manages the big data issue.

Download Full-text

Ensure Security for Mapreduce-Hadoop Distributed File System using Encryption Method Over Big Data

International Journal of Innovative Technology and Exploring Engineering - Special Issue ◽

10.35940/ijitee.j8883.0881019 ◽

2019 ◽

Vol 8 (10) ◽

pp. 733-740

Keyword(s):

Big Data ◽

File System ◽

Data Encryption ◽

Previous Method ◽

Distributed File System ◽

Sensitive Data ◽

Encryption And Decryption ◽

Hadoop Distributed File System ◽

Encryption Method ◽

Mapreduce Paradigm

Big data security is the most focused research issue nowadays due to their increased size and the complexity involved in handling of large volume of data. It is more difficult to ensure security on big data handling due to its characteristics 4V’s. With the aim of ensuring security and flexible encryption computation on big data with reduced computation overhead in this work, framework with encryption (MRS) is presented with Hadoop Distributed file System (HDFS). Development of the MapReduce paradigm needs networked attached storage in addition to parallel processing. For storing as well as handling big data, HDFS are extensively utilized. This proposed method creates a framework for obtaining data from client and after that examining the received data, excerpt privacy policy and after that find the sensitive data. The security is guaranteed in this framework using key rotation algorithm which is an efficient encryption and decryption technique for safeguarding the data over big data. Data encryption is a means to protect data in storage with containing a key encryption saved and accessible to reuse the data while required. The outcome shows that the research method guarantees greater security for enormous amount of data and gives beneficial info to related clients. Therefore the outcome concluded that the proposed method is superior to the previous method. Finally, this research can be applied effectively on the various domains such as health care domains, educational domains, social networking domains, etc which require more security and increased volume of data.

Download Full-text

Peer Review #1 of "Attribute based honey encryption algorithm for securing big data: Hadoop distributed file system perspective (v0.3)"

10.7287/peerj-cs.259v0.3/reviews/1 ◽

2020 ◽

Author(s):

P Derbeko

Keyword(s):

Big Data ◽

Peer Review ◽

File System ◽

Encryption Algorithm ◽

Distributed File System ◽

System Perspective ◽

Hadoop Distributed File System

Download Full-text