Effectiveness of the Block Placement Policy on HDFS Replica Balancing

2019
Author(s): Rhauani W. Fazul, Patricia Pitthan Barcelos

The Hadoop Distributed File System (HDFS) is designed to store and transfer data at large scale. To ensure availability and reliability, it uses data replication as a fault tolerance mechanism. However, this strategy can significantly affect replica balancing across the cluster. This paper analyzes the default data replication policy used by HDFS and measures its impact on system behavior, while presenting different strategies for cluster balancing and rebalancing. To highlight the requirements for efficient replica placement, a comparative study of HDFS performance was conducted considering a variety of factors that may lead to cluster imbalance.
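To make the mechanism concrete, the minimal Scala sketch below uses the standard Hadoop FileSystem API to inspect a file's replication factor and where its block replicas actually landed, which is the raw signal behind the imbalance the paper measures. The file path and target factor are illustrative assumptions, not taken from the paper.

```scala
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.{FileSystem, Path}

object ReplicaInspector {
  def main(args: Array[String]): Unit = {
    val conf = new Configuration()
    // Assumes fs.defaultFS points at the NameNode, e.g. hdfs://namenode:9000
    val fs = FileSystem.get(conf)

    val file = new Path("/data/sample.txt") // hypothetical file
    val status = fs.getFileStatus(file)

    // The replication factor the placement policy must satisfy.
    println(s"Replication factor: ${status.getReplication}")

    // Where each block's replicas actually landed: an uneven spread of
    // hosts across these locations is what manifests as cluster imbalance.
    val locations = fs.getFileBlockLocations(status, 0, status.getLen)
    locations.foreach { block =>
      println(s"Block at offset ${block.getOffset}: hosts = ${block.getHosts.mkString(", ")}")
    }

    // Raising the factor forces the NameNode to place additional replicas.
    fs.setReplication(file, 4.toShort)

    fs.close()
  }
}
```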

2019
Author(s): Rhauani W. Fazul, Patricia Pitthan Barcelos

Data replication is a fundamental mechanism of the Hadoop Distributed File System (HDFS). However, the way data is spread across the cluster directly affects replication balancing. The HDFS Balancer is a tool integrated into Hadoop that balances the storage load on each machine by moving data between nodes, although its operation does not address the specific needs of applications while performing block rearrangement. This paper proposes a customized balancing policy for the HDFS Balancer based on a system of priorities, which can be adapted and configured according to usage demands. The priorities define whether HDFS parameters or the cluster topology should be considered during the operation, thus making the balancing more flexible.
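The paper's policy internals are not reproduced here; the sketch below is only a conceptual Scala illustration of what priority-weighted balancing could look like. DataNodeInfo, BalancingPriorities, and the scoring rule are all hypothetical stand-ins, not the paper's implementation or any Hadoop API.

```scala
// Conceptual sketch only: the paper's customized policy is not a public API.
// DataNodeInfo and the priority weights below are hypothetical stand-ins.
final case class DataNodeInfo(host: String, rack: String, usedPct: Double)

final case class BalancingPriorities(
  utilization: Double, // weight on evening out disk usage
  topology: Double     // weight on preferring cheap intra-rack moves
)

object PriorityBalancer {
  /** Score a candidate block move from src to dst; higher is better.
    * Only over-utilized sources and under-utilized targets qualify. */
  def score(src: DataNodeInfo, dst: DataNodeInfo,
            clusterMeanPct: Double, p: BalancingPriorities): Option[Double] = {
    if (src.usedPct <= clusterMeanPct || dst.usedPct >= clusterMeanPct) None
    else {
      val utilizationGain = src.usedPct - dst.usedPct // how much the move evens usage
      val topologyBonus = if (src.rack == dst.rack) 1.0 else 0.0 // intra-rack is cheaper
      Some(p.utilization * utilizationGain + p.topology * topologyBonus)
    }
  }
}
```

Tuning the two weights plays the role the abstract assigns to the configurable priorities: a high topology weight keeps moves rack-local, while a high utilization weight equalizes storage fastest.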


2021
Vol 30 (1), pp. 479-486
Author(s): Lingrui Bu, Hui Zhang, Haiyan Xing, Lijun Wu

Abstract The efficient processing of large-scale data has very important practical value. In this study, a data mining platform based on the Hadoop Distributed File System was designed, and the K-means algorithm was improved with the max-min distance idea. On the Hadoop platform, the algorithm was parallelized with MapReduce. Finally, its data processing performance was analyzed on the Iris data set. The results showed that the parallel algorithm clustered more samples correctly than the traditional algorithm; in a single-machine environment the parallel algorithm ran longer; when facing large data sets the traditional algorithm ran out of memory while the parallel algorithm completed the computation; and the speedup of the parallel algorithm grew with cluster size and data set size, showing a good parallel effect. The experimental results verify the reliability of the parallel algorithm in big data processing, contributing to further improvements in the efficiency of data mining.
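The abstract does not spell out the exact variant, so the following standalone Scala sketch (local and non-MapReduce, with a hypothetical chooseCenters helper) only illustrates the general max-min distance idea for seeding K-means: each new center is the point farthest from its nearest already-chosen center.

```scala
object MaxMinInit {
  type Point = Array[Double]

  // Euclidean distance between two points of equal dimension.
  def dist(a: Point, b: Point): Double =
    math.sqrt(a.zip(b).map { case (x, y) => (x - y) * (x - y) }.sum)

  /** Choose k initial centers by the max-min distance rule:
    * each new center maximizes the distance to its nearest chosen center. */
  def chooseCenters(points: Seq[Point], k: Int): Seq[Point] = {
    var centers = Vector(points.head) // first center: any point (assumption)
    while (centers.size < k) {
      val next = points.maxBy(p => centers.map(c => dist(p, c)).min)
      centers :+= next
    }
    centers
  }
}
```

Spreading the initial centers apart this way avoids the poor random seeds that standard K-means can suffer from, which is plausibly the source of the accuracy gain the study reports.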


2018
Vol 3 (1), pp. 49-60
Author(s): M. Elshayeb, Leelavathi Rajamanickam
Big Data refers to large-scale information management and analysis technologies that exceed the capability of traditional data processing technologies. To analyse complex data and identify patterns, it is very important to securely store, manage, and share large amounts of complex data. In recent years, databases have grown in various forms (text, images, and videos), in huge volumes and at high velocity, and internet-based services that demand big data (data-intensive services) have come to the leading edge. Apache's Hadoop Distributed File System (HDFS) has emerged as an outstanding software component for cloud computing, combined with integrated pieces such as MapReduce. Hadoop is an open-source implementation of Google's MapReduce that provides a distributed file system and presents software programmers with the map and reduce abstractions. This research surveys security approaches for the Hadoop Distributed File System and identifies the best security solution; it also helps businesses through big data visualization, which supports better data analysis. In today's data-centric world, big-data processing and analytics have become critical to most enterprise and government applications.
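As a rough illustration of the map and reduce abstractions mentioned above, the sketch below mimics a word count using plain Scala collections rather than the actual Hadoop Mapper/Reducer classes; the input lines are made up.

```scala
object WordCountSketch {
  def main(args: Array[String]): Unit = {
    val lines = Seq("big data on hdfs", "hdfs stores big data") // toy input

    // Map phase: emit (word, 1) pairs, as a Hadoop Mapper would.
    val mapped = lines.flatMap(_.split("\\s+")).map(word => (word, 1))

    // Shuffle + reduce phase: group by key and sum the counts,
    // mirroring what a Hadoop Reducer receives for each key.
    val reduced = mapped.groupBy(_._1).map { case (word, pairs) =>
      (word, pairs.map(_._2).sum)
    }

    reduced.foreach { case (word, count) => println(s"$word\t$count") }
  }
}
```

In real Hadoop, the grouping step is the framework's distributed shuffle; the programmer supplies only the two functions, which is the "perception of map and reduce" the abstract refers to.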


Author(s): Ahmad Askarian, Rupei Xu, Andras Farago

The rapidly emerging area of Social Network Analysis is typically based on graph models. These include directed and undirected graphs, as well as a multitude of random graph representations that reflect the inherent randomness of social networks. A large number of parameters and metrics are derived from these graphs. Overall, this gives rise to two fundamental research and development directions: (1) advancing models and algorithms, and (2) implementing the algorithms for huge real-life systems. The model and algorithm development part deals with finding the right graph models for various applications, along with algorithms to treat the associated tasks and to compute the appropriate parameters and metrics. In this chapter we focus on the second area: implementing the algorithms for very large graphs. The approach is based on the Spark framework and the GraphX API, which run on top of the Hadoop Distributed File System.
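A minimal Scala sketch of that setup, assuming a Spark cluster and an edge-list file on HDFS (the path is hypothetical): it loads the graph with GraphX's GraphLoader and computes PageRank, a typical social-network metric.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.graphx.GraphLoader

object SocialGraphJob {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("SocialGraphJob").getOrCreate()
    val sc = spark.sparkContext

    // Edge list on HDFS, one "srcId dstId" pair per line (path is hypothetical).
    val graph = GraphLoader.edgeListFile(sc, "hdfs:///graphs/social-edges.txt")

    // PageRank iterated until scores converge within a 0.001 tolerance.
    val ranks = graph.pageRank(0.001).vertices

    // The ten most influential vertices by rank.
    ranks.sortBy(_._2, ascending = false).take(10)
      .foreach { case (id, rank) => println(s"vertex $id: $rank") }

    spark.stop()
  }
}
```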

