Improving I/O Efficiency in Hadoop-Based Massive Data Analysis Programs

2018, Vol. 2018, pp. 1-9
Author(s):  
Kyong-Ha Lee, Woo Lam Kang, Young-Kyoon Suh

Apache Hadoop has been a popular parallel processing tool in the era of big data. While practitioners have rewritten many conventional analysis algorithms to tailor them to Hadoop, the issue of inefficient I/O in Hadoop-based programs has been repeatedly reported in the literature. In this article, we address the problem of I/O inefficiency in Hadoop-based massive data analysis by introducing an efficient modification of Hadoop. We first incorporate a columnar data layout into the conventional Hadoop framework, without any modification of the Hadoop internals. We also provide Hadoop with an indexing capability that saves a huge amount of I/O when processing not only selection predicates but also the star-join queries that are common in many analysis tasks.
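The abstract gives no implementation details, but the kind of I/O saving it describes can be illustrated with an off-the-shelf columnar format. The Java sketch below uses Apache Parquet (parquet-mr) rather than the authors' own layout, pushing a selection predicate down to the columnar reader so that row groups whose min/max statistics cannot match are never read from disk; the input path and the price column are hypothetical.

import org.apache.avro.generic.GenericRecord;
import org.apache.hadoop.fs.Path;
import org.apache.parquet.avro.AvroParquetReader;
import org.apache.parquet.filter2.compat.FilterCompat;
import org.apache.parquet.filter2.predicate.FilterApi;
import org.apache.parquet.filter2.predicate.FilterPredicate;
import org.apache.parquet.hadoop.ParquetReader;

public class ColumnarScan {
    public static void main(String[] args) throws Exception {
        // Hypothetical selection predicate: price > 100. Parquet evaluates it
        // against per-row-group statistics, so row groups that cannot contain
        // a match are skipped entirely.
        FilterPredicate pred = FilterApi.gt(FilterApi.intColumn("price"), 100);

        try (ParquetReader<GenericRecord> reader =
                AvroParquetReader.<GenericRecord>builder(new Path(args[0]))
                    .withFilter(FilterCompat.get(pred))
                    .build()) {
            GenericRecord record;
            while ((record = reader.read()) != null) {
                System.out.println(record);  // only rows surviving the filter
            }
        }
    }
}

A columnar layout pays off here because only the columns a query actually references are fetched, which is precisely the saving the abstract targets for selection-heavy analysis workloads.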

Author(s):  
Anushree Raj, Rio D’Souza

In this information age, a huge amount of data is generated every moment from various sources. This enormous volume is beyond the capability of traditional data management systems to manage and analyse within a specified time span; such data is referred to as Big Data. Big Data poses numerous challenges for operations such as data capture, analysis, search, sharing, and filtering. Hadoop has shown many enterprises a way forward for big data management, and Hadoop-based systems underpin the implementation of a wide range of industry use cases. To master Apache Hadoop, one needs to understand the Hadoop ecosystem and the Hadoop architecture. In this paper we give a brief overview of both.
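As a concrete anchor for the architecture discussion, the canonical WordCount job below (adapted from the standard Hadoop MapReduce tutorial) shows how the core pieces fit together: input is read from HDFS, map tasks run in parallel on the file's blocks, and reduce tasks aggregate the shuffled results.

import java.io.IOException;
import java.util.StringTokenizer;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCount {
    // Map phase: runs in parallel on HDFS blocks, emits (word, 1) pairs.
    public static class TokenizerMapper extends Mapper<Object, Text, Text, IntWritable> {
        private final static IntWritable ONE = new IntWritable(1);
        private final Text word = new Text();
        public void map(Object key, Text value, Context context)
                throws IOException, InterruptedException {
            StringTokenizer itr = new StringTokenizer(value.toString());
            while (itr.hasMoreTokens()) {
                word.set(itr.nextToken());
                context.write(word, ONE);
            }
        }
    }

    // Reduce phase: receives all counts for one word and sums them.
    public static class IntSumReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
        public void reduce(Text key, Iterable<IntWritable> values, Context context)
                throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable v : values) sum += v.get();
            context.write(key, new IntWritable(sum));
        }
    }

    public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration(), "word count");
        job.setJarByClass(WordCount.class);
        job.setMapperClass(TokenizerMapper.class);
        job.setCombinerClass(IntSumReducer.class);  // local pre-aggregation
        job.setReducerClass(IntSumReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}

A typical invocation on a cluster would be: hadoop jar wordcount.jar WordCount /user/alice/input /user/alice/output (paths hypothetical).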


Author(s):  
Jaimin N. Undavia, Atul Patel, Sheenal Patel

The availability of huge amounts of data has opened up a new area and a new challenge: analyzing these data. Such analysis has become essential for every organization, and it may yield useful information about the organization's future prospects. Traditional database systems are neither adequate nor capable of storing, managing, and analyzing such huge amounts of data, so a new term has been introduced: "Big Data". The term refers to huge amounts of data used for analytics and for future prediction or forecasting. Big Data may consist of a combination of structured, semi-structured, and unstructured data, and managing such data is a major challenge today. This heterogeneous data must be maintained in a very secure and well-defined way. In this chapter, we identify these challenges and issues and show how to address them with specific tools.
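One standard way to cope with the mix of structured and semi-structured data mentioned above is schema-on-read: store raw records (for example JSON) as-is and interpret them only at processing time. Below is a minimal, hypothetical Java sketch using the Jackson library; the user and text field names are invented for illustration.

import com.fasterxml.jackson.databind.JsonNode;
import com.fasterxml.jackson.databind.ObjectMapper;

public class SchemaOnRead {
    private static final ObjectMapper MAPPER = new ObjectMapper();

    // Each line is one raw record; fields may or may not be present.
    public static void handle(String line) throws Exception {
        JsonNode node = MAPPER.readTree(line);
        // "user" and "text" are hypothetical field names. Missing fields are
        // tolerated with defaults instead of rejected, unlike a fixed
        // relational schema.
        String user = node.path("user").asText("unknown");
        String text = node.path("text").asText("");
        System.out.println(user + ": " + text);
    }

    public static void main(String[] args) throws Exception {
        handle("{\"user\":\"alice\",\"text\":\"hello\"}");
        handle("{\"text\":\"no user field here\"}");  // still processed
    }
}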


Author(s):  
Rupali Ahuja

The data generated today has outgrown the storage and computing capabilities of traditional software frameworks. Large volumes of data, if aggregated and analyzed properly, may provide useful insights to predict human behavior, increase revenues, gain or retain customers, improve operations, combat crime, cure diseases, and so on. The results of effective Big Data analysis can thus provide actionable intelligence for humans as well as input for machine consumption. New tools, techniques, technologies, and methods are being developed to store, retrieve, manage, aggregate, correlate, and analyze Big Data. Hadoop is a popular software framework for handling Big Data needs: it provides a distributed framework for the processing and storage of large datasets. This chapter discusses in detail the Hadoop framework, its features, applications, and popular distributions, and its storage and visualization tools.
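To make the storage side concrete, the following minimal Java sketch writes and reads a file through Hadoop's FileSystem API; the path is hypothetical, and the cluster address is taken from the standard core-site.xml configuration. HDFS transparently splits the file into blocks and replicates them across DataNodes, none of which is visible to the client code.

import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HdfsRoundTrip {
    public static void main(String[] args) throws Exception {
        // Connects to the cluster named by fs.defaultFS in core-site.xml.
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(conf);
        Path path = new Path("/tmp/demo.txt");  // hypothetical path

        // Write: blocks are placed and replicated by the NameNode/DataNodes.
        try (FSDataOutputStream out = fs.create(path, true)) {
            out.write("hello hdfs\n".getBytes(StandardCharsets.UTF_8));
        }

        // Read the data back through the same abstraction.
        try (BufferedReader in = new BufferedReader(
                new InputStreamReader(fs.open(path), StandardCharsets.UTF_8))) {
            System.out.println(in.readLine());
        }
    }
}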

