Hadoop History and Architecture

As the name indicates, this chapter explains the evolution of Hadoop. Doug Cutting started a text search library called Lucene; after it joined the Apache Software Foundation, he built a web crawler called Apache Nutch on top of it. The Google File System was then taken as the reference for the Nutch Distributed File System, Google's MapReduce model was integrated, and the result became Hadoop. The whole path from Lucene to Apache Hadoop is illustrated in this chapter. The different versions of Hadoop are also explained, along with the procedure to download the software and the mechanism to verify the downloaded packages. The architecture of Hadoop is then detailed: a Hadoop cluster is a set of commodity machines grouped together, and the arrangement of these machines across racks is shown. After reading this chapter, the reader will understand how Hadoop evolved and its overall architecture.

2019 ◽  
Author(s):  
Rhauani W. Fazul ◽  
Patricia Pitthan Barcelos

Data replication is a fundamental mechanism of the Hadoop Distributed File System (HDFS). However, the way data is spread across the cluster directly affects replication balancing. The HDFS Balancer is a tool integrated into Hadoop that balances the storage load across machines by moving data between nodes, although its operation does not address the specific needs of applications while performing block rearrangement. This paper proposes a customized balancing policy for the HDFS Balancer based on a system of priorities, which can be adapted and configured according to usage demands. The priorities define whether HDFS parameters or the cluster topology should be considered during the operation, making the balancing more flexible.
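The paper's policy implementation is not reproduced here; the following is only a minimal Java sketch of the general idea of priority-driven balancing, in which over-utilized DataNodes are ordered for rebalancing either purely by storage utilization or with cluster topology (rack placement) weighed first. All class and field names (NodeUsage, BalancingPriority, the placeholder hosts and racks) are hypothetical and are not part of Hadoop's Balancer API.

```java
import java.util.Comparator;
import java.util.List;

// Hypothetical sketch of priority-based selection of over-utilized nodes.
// Names are illustrative only and do not correspond to the paper's code
// or to Hadoop's Balancer internals.
public class PriorityBalancingSketch {

    // Usage summary for a single DataNode.
    record NodeUsage(String host, String rack, double utilization) {}

    // Priorities a policy could weigh when ordering move candidates.
    enum BalancingPriority { STORAGE_UTILIZATION, RACK_LOCALITY }

    // Orders nodes so the most over-utilized are rebalanced first; when
    // topology is prioritized, nodes on the target rack are preferred.
    static Comparator<NodeUsage> comparatorFor(BalancingPriority priority, String targetRack) {
        Comparator<NodeUsage> byUtilization =
                Comparator.comparingDouble(NodeUsage::utilization).reversed();
        if (priority == BalancingPriority.RACK_LOCALITY) {
            return Comparator.<NodeUsage, Boolean>comparing(n -> !n.rack().equals(targetRack))
                    .thenComparing(byUtilization);
        }
        return byUtilization;
    }

    public static void main(String[] args) {
        List<NodeUsage> nodes = List.of(
                new NodeUsage("dn1", "/rack1", 0.92),
                new NodeUsage("dn2", "/rack2", 0.75),
                new NodeUsage("dn3", "/rack1", 0.60));
        nodes.stream()
                .sorted(comparatorFor(BalancingPriority.RACK_LOCALITY, "/rack1"))
                .forEach(n -> System.out.println(n.host() + " " + n.utilization()));
    }
}
```

A configurable policy of this kind would sit in front of the block-move step, deciding which source nodes (and which blocks) to rearrange first according to the configured priority.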


Author(s):  
Mariam J. AlKandari ◽  
Huda F. Al Rasheedi ◽  
Ayed A. Salman

Abstract—Cloud computing has been the trending model for storing, accessing, and modifying data over the Internet in recent years. The rising use of the cloud has generated a related concept: cloud forensics. Cloud forensics can be defined as investigating for evidence in the cloud, so it can be viewed as a combination of cloud computing and digital forensics. Many issues of applying forensics in the cloud have been addressed. Isolating the location of the incident has become an essential part of the forensic process, done to ensure that evidence is not modified or tampered with. Isolating an instance in cloud computing is even more challenging because of the nature of the cloud environment: the same storage or virtual machine may be used by many users, so evidence is likely to be overwritten and lost. The solution proposed in this paper is to isolate a cloud instance by marking the instance residing on the servers as "Under Investigation". To do so, the cloud file system must be studied. One of the well-known file systems used in the cloud is the Apache Hadoop Distributed File System (HDFS). Thus, in this paper the methodology for isolating a cloud instance is based on the HDFS architecture.
Keywords: cloud computing; digital forensics; cloud forensics
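The paper does not publish code for the "Under Investigation" marker; the sketch below only illustrates one way such a flag could be attached to instance data in HDFS, using Hadoop's standard FileSystem extended-attribute API (setXAttr/getXAttr). The NameNode URI, the instance path, and the attribute name are placeholders assumed for illustration.

```java
import java.net.URI;
import java.nio.charset.StandardCharsets;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

// Illustrative sketch: mark an HDFS directory holding a cloud instance's data
// as "Under Investigation" via an extended attribute in the user namespace,
// so other tooling can check the flag before touching the data.
public class IsolationMarkerSketch {

    private static final String MARKER = "user.forensics.status";  // placeholder name

    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(URI.create("hdfs://namenode:8020"), conf);
        Path suspectInstance = new Path("/cloud/instances/vm-1234");  // placeholder path

        // Tag the instance data so it is treated as evidence.
        fs.setXAttr(suspectInstance, MARKER,
                "Under Investigation".getBytes(StandardCharsets.UTF_8));

        // Later, an investigator or a policy hook can read the flag back.
        byte[] status = fs.getXAttr(suspectInstance, MARKER);
        System.out.println(new String(status, StandardCharsets.UTF_8));

        fs.close();
    }
}
```

In practice such a flag would be combined with access controls (for example, read-only permissions on the marked path) so the isolated data cannot be altered while the investigation is in progress.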


2014 ◽  
Vol 36 (5) ◽  
pp. 1047-1064 ◽  
Author(s):  
Bin LIAO ◽  
Jiong YU ◽  
Tao ZHANG ◽  
Xing-Yao YANG

2010 ◽  
Vol 33 (10) ◽  
pp. 1873-1880 ◽  
Author(s):  
Chun-Cong XU ◽  
Xiao-Meng HUANG ◽  
Nuo WU ◽  
Ning-Wei SUN ◽  
Guang-Wen YANG

2010 ◽  
Vol 30 (8) ◽  
pp. 2060-2065 ◽  
Author(s):  
Ning CAO ◽  
Zhong-hai WU ◽  
Hong-zhi LIU ◽  
Qi-xun ZHANG

2020 ◽  
Vol 1444 ◽  
pp. 012012
Author(s):  
Meisuchi Naisuty ◽  
Achmad Nizar Hidayanto ◽  
Nabila Clydea Harahap ◽  
Ahmad Rosyiq ◽  
Agus Suhanto ◽  
...  
