Effectiveness of the Block Placement Policy in HDFS Replica Balancing
The Hadoop Distributed File System (HDFS) is designed to store and transfer data at large scale. To ensure availability and reliability, it uses data replication as a fault-tolerance mechanism. However, this strategy can significantly affect how replicas are balanced across the cluster. This paper analyzes the default data replication policy used by HDFS and measures its impact on system behavior, while presenting different strategies for cluster balancing and rebalancing. To identify the requirements for efficient replica placement, a comparative study of HDFS performance was conducted, considering a variety of factors that may lead to cluster imbalance.
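For a replication factor of three, the HDFS documentation describes the default placement as follows. One replica goes on the writer's node if the writer runs on a DataNode. A second goes on a node in a different (remote) rack. The third goes on a different node in that same remote rack. The Python sketch below is a simplified model of that rule, not the HDFS implementation (which is the Java class `BlockPlacementPolicyDefault`). The `topology` dictionary, the node and rack names, and the functions `place_replicas` and `out_of_threshold` are all hypothetical. The sketch ignores free space, load and excluded nodes. It also simplifies the off-cluster writer case to a random node instead of a node on the writer's rack. Imbalance is judged with a criterion modeled on the HDFS Balancer threshold: a node is flagged when its utilization differs from the cluster-wide utilization by more than the threshold.

```python
import random
from collections import Counter


def place_replicas(writer, topology, rng):
    """Simplified sketch of HDFS's default placement for replication factor 3.

    topology maps node name -> rack name (hypothetical structure). Assumes at
    least two racks, each with two or more nodes. Real HDFS also checks free
    space, load and excluded nodes, which this sketch ignores.
    """
    nodes = sorted(topology)
    # 1st replica: the writer's own node if it is a DataNode, else a random node
    # (simplification: real HDFS prefers a node on the writer's rack)
    first = writer if writer in topology else rng.choice(nodes)
    # 2nd replica: any node on a rack other than the first replica's rack
    second = rng.choice([n for n in nodes if topology[n] != topology[first]])
    # 3rd replica: a different node on the same (remote) rack as the 2nd
    third = rng.choice([n for n in nodes
                        if topology[n] == topology[second] and n != second])
    return [first, second, third]


def out_of_threshold(block_counts, capacity, threshold):
    """Nodes whose utilization deviates from the cluster-wide utilization by
    more than `threshold` (a fraction), in the spirit of the Balancer check."""
    cluster_util = sum(block_counts.values()) / (capacity * len(block_counts))
    return sorted(n for n, used in block_counts.items()
                  if abs(used / capacity - cluster_util) > threshold)


# Hypothetical cluster: two racks with three DataNodes each
topology = {"dn1": "rack1", "dn2": "rack1", "dn3": "rack1",
            "dn4": "rack2", "dn5": "rack2", "dn6": "rack2"}
rng = random.Random(42)
counts = Counter({n: 0 for n in topology})  # keep nodes with zero replicas

# Every block is written by a client running on DataNode dn1
for _ in range(100):
    for node in place_replicas("dn1", topology, rng):
        counts[node] += 1

# Capacity measured in blocks per node; 10% threshold as a fraction
flagged = out_of_threshold(counts, capacity=100, threshold=0.10)
print(dict(counts))
print(flagged)
```

When every block originates from a client on `dn1`, the model gives `dn1` a replica of all 100 blocks and gives `dn2` and `dn3`, its rack neighbors, none. The nodes of `rack2` share the remaining 200 replicas. Cluster utilization is 300 / 600 = 0.5, so `dn1` (1.0) and `dn2`/`dn3` (0.0) all fall well outside the 10% band. This shows how the placement rule alone, under a skewed write pattern, can drive a cluster into imbalance.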