Improving I/O Efficiency in Hadoop-Based Massive Data Analysis Programs

2018, Vol. 2018, pp. 1-9
Author(s):  
Kyong-Ha Lee, Woo Lam Kang, Young-Kyoon Suh

Apache Hadoop has been a popular parallel processing tool in the era of big data. While practitioners have rewritten many conventional analysis algorithms to tailor them to Hadoop, the issue of inefficient I/O in Hadoop-based programs has been repeatedly reported in the literature. In this article, we address the problem of I/O inefficiency in Hadoop-based massive data analysis by introducing an efficient modification of Hadoop. We first incorporate a columnar data layout into the conventional Hadoop framework, without any modification of the Hadoop internals. We also provide Hadoop with an indexing capability that saves a huge amount of I/O when processing not only selection predicates but also the star-join queries that are common in many analysis tasks.
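The abstract gives no implementation details, but the kind of I/O saving it describes can be illustrated with an off-the-shelf columnar format. The Java sketch below uses Apache Parquet (parquet-mr) rather than the authors' own layout, pushing a selection predicate down to the columnar reader so that row groups whose min/max statistics cannot match are never read from disk; the input path and the price column are hypothetical.

import org.apache.avro.generic.GenericRecord;
import org.apache.hadoop.fs.Path;
import org.apache.parquet.avro.AvroParquetReader;
import org.apache.parquet.filter2.compat.FilterCompat;
import org.apache.parquet.filter2.predicate.FilterApi;
import org.apache.parquet.filter2.predicate.FilterPredicate;
import org.apache.parquet.hadoop.ParquetReader;

public class ColumnarScan {
    public static void main(String[] args) throws Exception {
        // Hypothetical selection predicate: price > 100. Parquet evaluates it
        // against per-row-group statistics, so row groups that cannot contain
        // a match are skipped entirely.
        FilterPredicate pred = FilterApi.gt(FilterApi.intColumn("price"), 100);

        try (ParquetReader<GenericRecord> reader =
                AvroParquetReader.<GenericRecord>builder(new Path(args[0]))
                    .withFilter(FilterCompat.get(pred))
                    .build()) {
            GenericRecord record;
            while ((record = reader.read()) != null) {
                System.out.println(record);  // only rows surviving the filter
            }
        }
    }
}

A columnar layout pays off here because only the columns a query actually references are fetched, which is precisely the saving the abstract targets for selection-heavy analysis workloads.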

Author(s):  
Anushree Raj, Rio D’Souza

In this information age, a huge amount of data is generated every moment from various sources. This enormous volume is beyond the capability of traditional data management systems to manage and analyse within a specified time span; such data is referred to as Big Data. Big Data poses numerous challenges for operations such as data capture, analysis, search, sharing, and filtering. Hadoop has shown many enterprises a way forward for big data management, and Hadoop-based systems underpin the implementation of a wide range of industry use cases. To master Apache Hadoop, one needs to understand the Hadoop ecosystem and the Hadoop architecture. In this paper we give a brief overview of both.
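As a concrete anchor for the architecture discussion, the canonical WordCount job below (adapted from the standard Hadoop MapReduce tutorial) shows how the core pieces fit together: input is read from HDFS, map tasks run in parallel on the file's blocks, and reduce tasks aggregate the shuffled results.

import java.io.IOException;
import java.util.StringTokenizer;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCount {
    // Map phase: runs in parallel on HDFS blocks, emits (word, 1) pairs.
    public static class TokenizerMapper extends Mapper<Object, Text, Text, IntWritable> {
        private final static IntWritable ONE = new IntWritable(1);
        private final Text word = new Text();
        public void map(Object key, Text value, Context context)
                throws IOException, InterruptedException {
            StringTokenizer itr = new StringTokenizer(value.toString());
            while (itr.hasMoreTokens()) {
                word.set(itr.nextToken());
                context.write(word, ONE);
            }
        }
    }

    // Reduce phase: receives all counts for one word and sums them.
    public static class IntSumReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
        public void reduce(Text key, Iterable<IntWritable> values, Context context)
                throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable v : values) sum += v.get();
            context.write(key, new IntWritable(sum));
        }
    }

    public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration(), "word count");
        job.setJarByClass(WordCount.class);
        job.setMapperClass(TokenizerMapper.class);
        job.setCombinerClass(IntSumReducer.class);  // local pre-aggregation
        job.setReducerClass(IntSumReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}

A typical invocation on a cluster would be: hadoop jar wordcount.jar WordCount /user/alice/input /user/alice/output (paths hypothetical).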


Author(s):  
Jaimin N. Undavia, Atul Patel, Sheenal Patel

The availability of huge amounts of data has opened up a new area and a new challenge: analyzing these data. Such analysis has become essential for every organization, and it may yield useful information about the organization's future prospects. Traditional database systems are neither adequate nor capable of storing, managing, and analyzing such huge amounts of data, so a new term has been introduced: "Big Data". The term refers to huge amounts of data used for analytics and for future prediction or forecasting. Big Data may consist of a combination of structured, semi-structured, and unstructured data, and managing such data is a major challenge today. This heterogeneous data must be maintained in a very secure and well-defined way. In this chapter, we identify these challenges and issues and show how to address them with specific tools.
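One standard way to cope with the mix of structured and semi-structured data mentioned above is schema-on-read: store raw records (for example JSON) as-is and interpret them only at processing time. Below is a minimal, hypothetical Java sketch using the Jackson library; the user and text field names are invented for illustration.

import com.fasterxml.jackson.databind.JsonNode;
import com.fasterxml.jackson.databind.ObjectMapper;

public class SchemaOnRead {
    private static final ObjectMapper MAPPER = new ObjectMapper();

    // Each line is one raw record; fields may or may not be present.
    public static void handle(String line) throws Exception {
        JsonNode node = MAPPER.readTree(line);
        // "user" and "text" are hypothetical field names. Missing fields are
        // tolerated with defaults instead of rejected, unlike a fixed
        // relational schema.
        String user = node.path("user").asText("unknown");
        String text = node.path("text").asText("");
        System.out.println(user + ": " + text);
    }

    public static void main(String[] args) throws Exception {
        handle("{\"user\":\"alice\",\"text\":\"hello\"}");
        handle("{\"text\":\"no user field here\"}");  // still processed
    }
}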


Author(s):  
Rupali Ahuja

The data generated today has outgrown the storage and computing capabilities of traditional software frameworks. Large volumes of data, if aggregated and analyzed properly, may provide useful insights to predict human behavior, increase revenues, gain or retain customers, improve operations, combat crime, cure diseases, and so on. The results of effective Big Data analysis can thus provide actionable intelligence for humans as well as input for machine consumption. New tools, techniques, technologies, and methods are being developed to store, retrieve, manage, aggregate, correlate, and analyze Big Data. Hadoop is a popular software framework for handling Big Data needs: it provides a distributed framework for the processing and storage of large datasets. This chapter discusses in detail the Hadoop framework, its features, applications, and popular distributions, and its storage and visualization tools.
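To make the storage side concrete, the following minimal Java sketch writes and reads a file through Hadoop's FileSystem API; the path is hypothetical, and the cluster address is taken from the standard core-site.xml configuration. HDFS transparently splits the file into blocks and replicates them across DataNodes, none of which is visible to the client code.

import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HdfsRoundTrip {
    public static void main(String[] args) throws Exception {
        // Connects to the cluster named by fs.defaultFS in core-site.xml.
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(conf);
        Path path = new Path("/tmp/demo.txt");  // hypothetical path

        // Write: blocks are placed and replicated by the NameNode/DataNodes.
        try (FSDataOutputStream out = fs.create(path, true)) {
            out.write("hello hdfs\n".getBytes(StandardCharsets.UTF_8));
        }

        // Read the data back through the same abstraction.
        try (BufferedReader in = new BufferedReader(
                new InputStreamReader(fs.open(path), StandardCharsets.UTF_8))) {
            System.out.println(in.readLine());
        }
    }
}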

