A Survey on Accelerated Mapreduce for Hadoop

2017 ◽  
Vol 10 (3) ◽  
pp. 597-602
Author(s):  
Jyotindra Tiwari ◽  
Dr. Mahesh Pawar ◽  
Dr. Anjajana Pandey

Big Data is defined by the 3Vs, which stand for variety, volume, and velocity: the volume of data is very large, data exists in a variety of file types, and data grows very rapidly. Big data storage and processing have always been a major issue, and big data has become even more challenging to handle these days. To handle big data, high-performance techniques have been introduced. Several frameworks, such as Apache Hadoop, have been introduced to process big data. Apache Hadoop provides map/reduce to process big data, but this map/reduce can be further accelerated. This paper surveys techniques for map/reduce acceleration and for energy-efficient, fast computation.

2021 ◽  
Vol 11 (18) ◽  
pp. 8651
Author(s):  
Vladimir Belov ◽  
Alexander N. Kosenkov ◽  
Evgeny Nikulchev

One of the most popular methods for building analytical platforms involves the use of the concept of data lakes. A data lake is a storage system in which the data are presented in their original format, making it difficult to conduct analytics or present aggregated data. To solve this issue, data marts are used: environments that store highly specialized information, focused on the requests of employees of a certain department or a particular direction of an organization’s work. This article presents a study of big data storage formats in the Apache Hadoop platform when used to build data marts.


Author(s):  
Santosh Jankatti ◽  
Raghavendra B. K. ◽  
Raghavendra S. ◽  
Meenakshi Meenakshi

Big data is one of the biggest challenges, as we need systems with huge processing power and good algorithms to make decisions. We need a Hadoop environment with Pig, Hive, machine learning, and Hadoop ecosystem components. The data comes from industries, from the many devices and sensors around us, and from social media sites. According to McKinsey, there will be a shortage of 15000000 big data professionals by the end of 2020. There are many technologies that address the problem of big data storage and processing, such as Apache Hadoop, Apache Spark, Apache Kafka, and many more. Here we analyse the processing speed for 4 GB of data on CloudxLab using Hadoop MapReduce with varying numbers of mappers and reducers, Pig scripts, Hive queries, and a Spark environment, along with machine learning technology. From the results we can say that machine learning with Hadoop will enhance processing performance, along with Spark; that Spark is better than Hadoop MapReduce, Pig, and Hive; and that Spark with Hive and machine learning will give the best performance compared with Pig, Hive, and the Hadoop MapReduce jar.


Author(s):  
Jayshree Ghorpade-Aher ◽  
Reena Pagare ◽  
Anita Thengade ◽  
Santaji Ghorpade ◽  
Manik Kadam

Today is the computer era, in which data is increasing exponentially. Managing such huge volumes of data is a challenging job. With the explosive increase of global data, the term big data is mainly used to describe enormous datasets. The state of the art of big data is discussed here. The discussion aims to provide readers with a comprehensive overview and big picture of this existing research area. This chapter discusses the different models and technologies for big data; it also introduces big data storage. Big data has been a potential topic in various research fields and areas such as healthcare, the public sector, retail, manufacturing, personal data, etc.


2015 ◽  
Vol 12 (6) ◽  
pp. 106-115
Author(s):  
Hongbing Cheng ◽  
Chunming Rong ◽  
Kai Hwang ◽  
Weihong Wang ◽  
Yanyan Li
