Simplified Mapreduce Mechanism for Large Scale Data Processing

2018 ◽  
Vol 7 (3.8) ◽  
pp. 16
Author(s):  
Md Tahsir Ahmed Munna ◽  
Shaikh Muhammad Allayear ◽  
Mirza Mohtashim Alam ◽  
Sheikh Shah Mohammad Motiur Rahman ◽  
Md Samadur Rahman ◽  
...  

MapReduce has become a popular programming model for processing large-scale data sets in a parallel, distributed fashion on a cluster. Hadoop MapReduce is especially needed for large-scale workloads such as big data processing. In this paper, we modify the Hadoop MapReduce algorithm and implement the modified version to reduce processing time.

2016 ◽  
Vol 12 (1) ◽  
pp. 49-68 ◽  
Author(s):  
Christian Esposito ◽  
Massimo Ficco

The demand for access to large volumes of data, distributed across hundreds or thousands of machines, has opened new opportunities in commerce, science, and computing applications. MapReduce is a paradigm that offers a programming model and an associated implementation for processing massive datasets in parallel on non-dedicated distributed computing hardware. It has been successfully adopted in several academic and industrial projects for Big Data Analytics. However, since such analytics is increasingly demanded within the context of mission-critical applications, security and reliability in MapReduce frameworks are strongly required in order to manage sensitive information and to obtain the right answer at the right time. In this paper, the authors present the main implementation of the MapReduce programming paradigm, provided by Apache under the name Hadoop. They illustrate the security and reliability concerns in the context of a large-scale data processing infrastructure. They review the available solutions and their limitations in supporting security and reliability within the context of MapReduce frameworks. The authors conclude by describing the ongoing evolution of such solutions and the possible areas for improvement, which could be challenging research opportunities for academic researchers.


2016 ◽  
Vol 6 (1) ◽  
pp. 59-87 ◽  
Author(s):  
Amer Al-Badarneh ◽  
Amr Mohammad ◽  
Salah Harb

MapReduce, a distinguished and successful platform for parallel data processing, is gaining significant momentum in both academia and industry as the volume of data to capture, transform, and analyse grows rapidly. Although MapReduce is used in many applications to analyse large-scale data sets, scientists and researchers still debate its efficiency, performance, and usability for supporting more classes of applications. This survey presents a comprehensive review of various implementations of the MapReduce framework. The authors first give an overview of the MapReduce programming model. They then describe the technical aspects of the most successful implementations of the MapReduce framework reported in the literature and discuss their main strengths and weaknesses. Finally, the authors compare MapReduce implementations and discuss open issues and challenges in enhancing MapReduce.


2014 ◽  
Vol 509 ◽  
pp. 175-181
Author(s):  
Wu Min Pan ◽  
Li Bai Ha

The term cloud computing has grown in popularity in recent years. Alongside SQL, Map-Reduce, a programming model for implementing large-scale data processing, has become a hot topic that is widely discussed in many studies. Many real-world tasks, such as data processing for search engines, can be parallelized through a simple interface with two functions called Map and Reduce. We focus on comparing the performance of the Hadoop implementation of Map-Reduce with SQL Server through simulations. In these simulations, Hadoop completes the same query faster than SQL Server. We also test several factors to see whether they affect Hadoop's performance. In particular, adding more machines for data processing improves Hadoop's performance, especially for large-scale data sets.
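
The two-function interface mentioned in this abstract can be illustrated with a minimal word-count sketch. It is not taken from the paper: `map_fn`, `reduce_fn`, and `run_mapreduce` are hypothetical names, and `run_mapreduce` is only a single-process stand-in for what a framework such as Hadoop would do across many machines.

```python
from collections import defaultdict


def map_fn(_key, line):
    """Map: emit a (word, 1) pair for every word in one input line."""
    for word in line.lower().split():
        yield word, 1


def reduce_fn(word, counts):
    """Reduce: sum all counts that were emitted for one word."""
    return word, sum(counts)


def run_mapreduce(records, mapper, reducer):
    """Hypothetical single-process driver: map, group by key, then reduce.

    A real framework would run mappers and reducers on different machines
    and perform the grouping (shuffle) over the network.
    """
    groups = defaultdict(list)
    for key, value in records:
        for out_key, out_value in mapper(key, value):
            groups[out_key].append(out_value)
    return dict(reducer(k, vs) for k, vs in sorted(groups.items()))


# Illustrative input: (line number, line text) pairs.
lines = ["big data on a cluster", "data on many machines"]
word_counts = run_mapreduce(enumerate(lines), map_fn, reduce_fn)
print(word_counts)
```

Because each call to `map_fn` depends on a single record and each call to `reduce_fn` on a single key, both phases can be spread over more machines without changing the user code, which is the property the abstract's scaling claim relies on.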


2014 ◽  
Vol 10 (3) ◽  
pp. 19-35 ◽  
Author(s):  
K. Amshakala ◽  
R. Nedunchezhian ◽  
M. Rajalakshmi

Over the last few years, data have been generated in large volumes and at ever faster rates, and the need for large-scale data processing systems has grown remarkably. As data grow in size, data quality is compromised. Functional dependencies, which represent semantic constraints in data, are important for data quality assessment. Executing functional dependency discovery algorithms on a single computer is difficult and laborious for large data sets. MapReduce provides an enabling technology for large-scale data processing. The open-source Hadoop implementation of MapReduce has given researchers a powerful tool for tackling large-data problems in a distributed manner. The objective of this study is to extract functional dependencies between attributes from large datasets using the MapReduce programming model. Attribute entropy is used to measure inter-attribute correlations and is exploited to discover functional dependencies hidden in the data.
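
One common way to connect entropy with functional dependencies is the identity that X -> Y holds in a table exactly when H(X) = H(X, Y), that is, when the conditional entropy H(Y | X) is zero. The sketch below uses that identity; the abstract does not give the authors' exact algorithm, so this is an assumption about the general idea and not their method. The names `map_fn`, `reduce_fn`, `entropy`, and `holds`, and the sample rows, are hypothetical, and the grouping step runs in one process where a MapReduce job would shuffle across machines.

```python
import math
from collections import defaultdict


def map_fn(row, attrs):
    """Map: emit the row's values for the chosen attributes with a count of 1."""
    yield tuple(row[a] for a in attrs), 1


def reduce_fn(_key, counts):
    """Reduce: total number of rows sharing one combination of values."""
    return sum(counts)


def entropy(rows, attrs):
    """Shannon entropy (in bits) of the empirical distribution over attrs."""
    groups = defaultdict(list)
    for row in rows:
        for key, one in map_fn(row, attrs):
            groups[key].append(one)
    totals = [reduce_fn(key, ones) for key, ones in groups.items()]
    n = sum(totals)
    return -sum(c / n * math.log2(c / n) for c in totals)


def holds(rows, lhs, rhs, tol=1e-9):
    """Check lhs -> rhs by testing H(lhs) == H(lhs + rhs) within a tolerance."""
    return abs(entropy(rows, lhs + rhs) - entropy(rows, lhs)) <= tol


# Illustrative data only; not from the paper.
rows = [
    {"zip": "10001", "city": "New York", "name": "Ann"},
    {"zip": "10001", "city": "New York", "name": "Bob"},
    {"zip": "94105", "city": "San Francisco", "name": "Ann"},
    {"zip": "94105", "city": "San Francisco", "name": "Cy"},
]
zip_determines_city = holds(rows, ["zip"], ["city"])
name_determines_zip = holds(rows, ["name"], ["zip"])
print(zip_determines_city, name_determines_zip)
```

Only per-group counts are needed to compute each entropy, so the counting step fits the map and reduce pattern directly; the tolerance `tol` is an assumed guard against floating-point rounding.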


2008 ◽  
Vol 25 (5) ◽  
pp. 287-300 ◽  
Author(s):  
B. Martin ◽  
A. Al‐Shabibi ◽  
S.M. Batraneanu ◽  
Ciobotaru ◽  
G.L. Darlea ◽  
...  

2014 ◽  
Vol 26 (6) ◽  
pp. 1316-1331 ◽  
Author(s):  
Gang Chen ◽  
Tianlei Hu ◽  
Dawei Jiang ◽  
Peng Lu ◽  
Kian-Lee Tan ◽  
...  
