Simplified MapReduce Mechanism for Large Scale Data Processing

MapReduce has become a popular programming model for processing large-scale data sets in parallel across a distributed cluster. Hadoop MapReduce is particularly suited to large-scale workloads such as big data processing. In this paper, we modify the Hadoop MapReduce algorithm and implement the modified version to reduce processing time.

Recent Developments on Security and Reliability in Large-Scale Data Processing with MapReduce

International Journal of Data Warehousing and Mining ◽ 10.4018/ijdwm.2016010104 ◽ 2016 ◽ Vol 12 (1) ◽ pp. 49-68 ◽ Cited By ~ 7

Author(s): Christian Esposito ◽ Massimo Ficco

Keyword(s): Data Processing ◽ Large Scale ◽ Programming Model ◽ Big Data Analytics ◽ Large Scale Data ◽ Recent Developments ◽ Security And Reliability ◽ Large Scale Data Processing ◽ The Right ◽ Scale Data

The demand for access to large volumes of data, distributed across hundreds or thousands of machines, has opened new opportunities in commerce, science, and computing applications. MapReduce is a paradigm that offers a programming model and an associated implementation for processing massive datasets in parallel on non-dedicated distributed computing hardware. It has been successfully adopted in several academic and industrial projects for Big Data Analytics. However, since such analytics are increasingly demanded in mission-critical applications, security and reliability in MapReduce frameworks are strongly required in order to manage sensitive information and to obtain the right answer at the right time. In this paper, the authors present the main implementation of the MapReduce programming paradigm, provided by Apache under the name Hadoop. They illustrate the security and reliability concerns in the context of a large-scale data processing infrastructure. They review the available solutions and their limitations in supporting security and reliability within MapReduce frameworks. The authors conclude by describing the ongoing evolution of such solutions and the open areas for improvement, which could be challenging research opportunities for academic researchers.

A Survey on MapReduce Implementations

International Journal of Cloud Applications and Computing ◽ 10.4018/ijcac.2016010104 ◽ 2016 ◽ Vol 6 (1) ◽ pp. 59-87 ◽ Cited By ~ 2

Author(s): Amer Al-Badarneh ◽ Amr Mohammad ◽ Salah Harb

Keyword(s): Large Scale ◽ Programming Model ◽ Data Sets ◽ Mapreduce Framework ◽ Large Scale Data ◽ Parallel Data ◽ Efficiency Performance ◽ Scale Data ◽ Large Scale Data Sets ◽ Open Issues

MapReduce, a successful platform for parallel data processing, is gaining significant momentum in both academia and industry as the volume of data to capture, transform, and analyse grows rapidly. Although MapReduce is used in many applications to analyse large-scale data sets, there is still considerable debate among scientists and researchers about its efficiency, performance, and usability for supporting more classes of applications. This survey presents a comprehensive review of various implementations of the MapReduce framework. The authors first give an overview of the MapReduce programming model. They then present a broad description of the technical aspects of the most successful implementations of the MapReduce framework reported in the literature and discuss their main strengths and weaknesses. Finally, the authors compare the MapReduce implementations and discuss open issues and challenges in enhancing MapReduce.

Study of Map-Reduce over Hadoop Based Cloud Computing Environment

Applied Mechanics and Materials ◽ 10.4028/www.scientific.net/amm.509.175 ◽ 2014 ◽ Vol 509 ◽ pp. 175-181

Author(s): Wu Min Pan ◽ Li Bai Ha

Keyword(s): Cloud Computing ◽ Data Processing ◽ Large Scale ◽ Programming Model ◽ Sql Server ◽ Map Reduce ◽ Data Set ◽ Large Scale Data ◽ Large Scale Data Processing ◽ Scale Data

The term cloud computing has grown in popularity in recent years. Alongside SQL, Map-Reduce, a programming model for large-scale data processing, has become a widely studied topic. Many real-world tasks, such as data processing for search engines, can be parallelized through a simple interface with two functions called Map and Reduce. We compare the performance of the Hadoop implementation of Map-Reduce with SQL Server through simulations. In these simulations, Hadoop completes the same query faster than SQL Server. We also test several other factors to see whether they affect Hadoop's performance. In particular, adding more machines for data processing improves Hadoop's performance, especially for a large-scale data set.
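The two-function interface mentioned in this abstract can be illustrated with a minimal, single-process sketch. It is not taken from the paper and does not use Hadoop; the function names (map_phase, shuffle, reduce_phase, run_job) and the word-count task are assumptions chosen only for illustration.

```python
from collections import defaultdict

# Hypothetical helper names; a real framework such as Hadoop would run
# these phases on many machines and handle the grouping step itself.

def map_phase(document):
    """Map: emit a (word, 1) pair for every word in one input record."""
    return [(word.lower(), 1) for word in document.split()]

def shuffle(pairs):
    """Group intermediate values by key, as happens between Map and Reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(key, values):
    """Reduce: combine all values collected for one key into one result."""
    return key, sum(values)

def run_job(documents):
    """Run Map over every record, group by key, then Reduce each group."""
    intermediate = []
    for doc in documents:
        intermediate.extend(map_phase(doc))
    return dict(reduce_phase(k, v) for k, v in shuffle(intermediate).items())

counts = run_job(["map reduce map", "reduce shuffle"])
```

Because each call to map_phase depends only on its own record, and each call to reduce_phase only on its own key, both phases could in principle be distributed across machines, which is the property the abstract relies on.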

Extracting Functional Dependencies in Large Datasets Using MapReduce Model

International Journal of Intelligent Information Technologies ◽ 10.4018/ijiit.2014070102 ◽ 2014 ◽ Vol 10 (3) ◽ pp. 19-35 ◽ Cited By ~ 8

Author(s): K. Amshakala ◽ R. Nedunchezhian ◽ M. Rajalakshmi

Keyword(s): Data Processing ◽ Data Quality ◽ Large Scale ◽ Programming Model ◽ Large Data ◽ Large Datasets ◽ Functional Dependencies ◽ Large Scale Data ◽ Large Scale Data Processing ◽ Scale Data

Over the last few years, data have been generated in larger volumes and at faster rates, and the need for large-scale data processing systems has grown remarkably. As data sets grow larger, data quality is compromised. Functional dependencies, which represent semantic constraints in data, are important for data quality assessment. Executing functional dependency discovery algorithms on a single computer is slow and laborious for large data sets. MapReduce provides an enabling technology for large-scale data processing. The open-source Hadoop implementation of MapReduce has given researchers a powerful tool for tackling large-data problems in a distributed manner. The objective of this study is to extract functional dependencies between attributes from large datasets using the MapReduce programming model. Attribute entropy is used to measure inter-attribute correlations and is exploited to discover functional dependencies hidden in the data.
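One common way to connect entropy and functional dependencies is the identity that X -> Y holds exactly when H(X, Y) = H(X), that is, when the conditional entropy H(Y | X) is zero. The sketch below applies that identity on a single machine. It is an illustration under stated assumptions, not the authors' algorithm: the helper names (entropy, holds), the tolerance parameter, and the sample rows are all hypothetical.

```python
import math
from collections import Counter

def entropy(rows, attrs):
    """Shannon entropy (in bits) of the joint distribution of `attrs`.

    Counting value combinations is the step that a MapReduce job could
    distribute (emit (values, 1) in Map, sum in Reduce); here it is done
    locally with a Counter. This mapping is an assumption, not the paper's.
    """
    counts = Counter(tuple(row[a] for a in attrs) for row in rows)
    total = len(rows)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

def holds(rows, lhs, rhs, tol=1e-9):
    """Return True when lhs -> rhs holds, i.e. H(lhs, rhs) equals H(lhs)."""
    return abs(entropy(rows, lhs + rhs) - entropy(rows, lhs)) <= tol

# Hypothetical sample data, made up for this example.
rows = [
    {"zip": "10001", "city": "New York", "name": "Ann"},
    {"zip": "10001", "city": "New York", "name": "Bob"},
    {"zip": "94105", "city": "San Francisco", "name": "Ann"},
    {"zip": "94105", "city": "San Francisco", "name": "Cy"},
]

zip_determines_city = holds(rows, ["zip"], ["city"])
name_determines_zip = holds(rows, ["name"], ["zip"])
```

In this sample, every zip value maps to a single city, so the first check succeeds, while the name "Ann" appears with two different zip values, so the second check fails.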