Analysis of Hadoop MapReduce scheduling in heterogeneous environment

The advent of social networking and the Internet of Things (IoT) has resulted in exponential growth of data in the last few years. This, in turn, has increased the need to process and analyze such data for optimal decision making. To achieve better results, new architectures for parallel processing have emerged. Hadoop MapReduce (MR) is a programming model that is considered one of the most powerful computation tools for processing data on a cluster of commodity nodes. However, managing clusters while meeting various quality requirements necessitates efficient MR scheduling. The chapter classifies MR scheduling algorithms by their applicability to the required quality of service (QoS) parameters. After the classification, it presents a detailed study of MR schedulers and compares them on various parameters.

A MapReduce scheduling algorithm for time constraints in heterogeneous environment

2014 10th International Conference on Natural Computation (ICNC) ◽

10.1109/icnc.2014.6975992 ◽

2014 ◽

Author(s):

Tan Deng ◽

Kenli Li

Keyword(s):

Scheduling Algorithm ◽

Time Constraints ◽

Jargon of Hadoop MapReduce scheduling techniques: a scientific categorization

The Knowledge Engineering Review ◽

10.1017/s0269888918000371 ◽

2019 ◽

Vol 34 ◽

Author(s):

Muhammad Hanif ◽

Choonhwa Lee

Keyword(s):

Critical Role ◽

Data Locality ◽

Research Community ◽

Process Data ◽

Data Set ◽

Apache Hadoop ◽

Hadoop Mapreduce ◽

Operating Environments ◽

Commodity Clusters ◽

Recently, the valuable knowledge that can be retrieved from huge volumes of data (called Big Data) has set in motion the development of frameworks for processing data based on parallel and distributed computing, including Apache Hadoop, Facebook Corona, and Microsoft Dryad. Apache Hadoop is an open source implementation of Google MapReduce that has attracted strong attention from the research community in both academia and industry. Hadoop MapReduce scheduling algorithms play a critical role in the management of large commodity clusters, controlling QoS requirements by supervising the execution of users, jobs, and tasks. Hadoop MapReduce comprises three schedulers: FIFO, Fair, and Capacity. However, the research community has developed new optimizations to account for advances and dynamic changes in hardware and operating environments. Numerous efforts have been made in the literature to address network congestion, straggling, data locality, heterogeneity, resource under-utilization, and skew mitigation in Hadoop scheduling. The volume of research published in journals and conferences about Hadoop scheduling has increased consistently, which makes it difficult for researchers to grasp the overall state of the research and the areas that require further investigation. This study conducts a scientific literature review to assess preceding research contributions to the Apache Hadoop scheduling mechanism. We classify and quantify the main issues addressed in the literature based on their jargon and the areas they address. Moreover, we explain and discuss the various challenges and open issues in Hadoop scheduling optimizations.

Hadoop MapReduce scheduling paradigms

2017 IEEE 2nd International Conference on Cloud Computing and Big Data Analysis (ICCCBDA) ◽

10.1109/icccbda.2017.7951906 ◽

2017 ◽

Cited By ~ 3

Author(s):

Roger Johannessen ◽

Anis Yazidi ◽

Boning Feng

Keyword(s):

Hadoop Mapreduce ◽

SAMR: A Self-adaptive MapReduce Scheduling Algorithm in Heterogeneous Environment

2010 10th IEEE International Conference on Computer and Information Technology ◽

10.1109/cit.2010.458 ◽

2010 ◽

Cited By ~ 97

Author(s):

Quan Chen ◽

Daqiang Zhang ◽

Minyi Guo ◽

Qianni Deng ◽

Song Guo

Keyword(s):

Scheduling Algorithm ◽

Mapreduce Scheduling ◽

Self Adaptive

A Comprehensive View of Hadoop MapReduce Scheduling Algorithms

International Journal of Computer Networks and Communications Security ◽

10.47277/ijcncs/2(9)5 ◽

2014 ◽

Vol 1 (2013) ◽

Keyword(s):

Scheduling Algorithms ◽

Hadoop Mapreduce ◽

Comprehensive View ◽

Chaotization and Synchronization of Rotifer Dynamics in a Heterogeneous Environment

PsycEXTRA Dataset ◽

10.1037/e667262007-001 ◽

2006 ◽

Author(s):

Maria Gonik ◽

Alexander Medvinsky

Keyword(s):

Heterogeneous Environment

DISTRIBUTED PROCESSING OF LARGE VOLUMES OF TRANSACTIONAL DATA

Naukovyi visnyk Donetskoho natsionalnoho tekhnichnoho universytetu ◽

10.31474/2415-7902-2020-1(4)-2(5)-27-36 ◽

2020 ◽

pp. 27-36

Author(s):

O. Dmytriieva ◽

D. Nikulin

Keyword(s):

Distributed Processing ◽

Apache Spark ◽

Hadoop Mapreduce ◽

Transactional Data

The paper addresses distributed transaction processing in the analysis of large volumes of data for the purpose of finding association rules. Based on AIS and Apriori, well-known data mining algorithms for finding frequent itemsets, possible parallelization options were identified that avoid iterative scanning of the database and high memory consumption. The possibility of moving the computations to different platforms that support parallel data processing was investigated. The computing platforms chosen were MapReduce, a powerful framework for processing large, distributed data sets on a Hadoop cluster, and Apache Spark, a software tool for processing extremely large amounts of data. A comparative analysis of the performance of the considered methods was carried out, recommendations for the effective use of parallel computing platforms were obtained, and modifications of the association rule mining algorithms were proposed. The main tasks carried out in the work are: a study of modern tools for distributed processing of structured and unstructured data; deployment of a test cluster in a cloud service; development of scripts to automate cluster deployment; modification of distributed algorithms to adapt them to the required distributed computing frameworks; measurement of data processing performance in sequential and distributed modes using Hadoop MapReduce and Apache Spark; a comparative analysis of the performance test results; derivation and justification of the relationship between the amount of data processed and the processing time; optimization of distributed association rule mining algorithms for processing large volumes of transactional data; and measurement of the performance of distributed processing with existing software tools. Keywords: distributed processing, transactional data, association rules, computing cluster, Hadoop, MapReduce, Apache Spark
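
To illustrate what counting frequent itemsets in a single pass over the transactions can look like in a map/reduce style, here is a minimal Python sketch. It is not the authors' algorithm: the function names, the `max_size` cap on itemset length, and the sample transactions are all assumptions made for illustration, and the code runs in one process rather than on a Hadoop or Spark cluster.

```python
from collections import Counter
from itertools import combinations


def map_transaction(transaction, max_size=2):
    """Map step (hypothetical): emit (itemset, 1) for each itemset of up to max_size items."""
    items = sorted(set(transaction))
    for size in range(1, max_size + 1):
        for itemset in combinations(items, size):
            yield itemset, 1


def reduce_counts(pairs):
    """Reduce step (hypothetical): sum the counts emitted for each itemset."""
    totals = Counter()
    for itemset, count in pairs:
        totals[itemset] += count
    return totals


def frequent_itemsets(transactions, min_support, max_size=2):
    """Return itemsets that occur in at least min_support transactions, in one pass."""
    pairs = (pair for t in transactions for pair in map_transaction(t, max_size))
    totals = reduce_counts(pairs)
    return {itemset: n for itemset, n in totals.items() if n >= min_support}


# Illustrative data only; not taken from the paper.
transactions = [
    ["bread", "milk"],
    ["bread", "butter", "milk"],
    ["butter", "milk"],
    ["bread", "butter"],
]
result = frequent_itemsets(transactions, min_support=3)
```

Capping the itemset size keeps the number of emitted keys bounded, at the cost of not discovering larger itemsets. A real deployment would have to weigh that trade-off against the memory concerns the abstract mentions.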

Network Selection in Wireless Heterogeneous Environment Based on Available Bandwidth Estimation

Recent Patents on Computer Science ◽

10.2174/2213275912666191018112959 ◽

2019 ◽

Vol 12 ◽

Author(s):

Kiran Ahuja ◽

Brahmjit Singh ◽

Rajesh Khanna

Keyword(s):

Real Time ◽

Estimation Error ◽

Network Selection ◽

Bandwidth Estimation ◽

Bootstrap Approximation ◽

Available Bandwidth ◽

Internet Connection ◽

Available Bandwidth Estimation ◽

Selection Of

Background: With multiple wireless network options available simultaneously, Always Best Connected (ABC) requires dynamic selection of the best network and access technology. Objective: In this paper, a novel real-time dynamic access network selection algorithm is proposed. The available bandwidth (ABW) of each network must be estimated to solve the network selection problem. Method: The proposed algorithm estimates available bandwidth by taking averages, peaks, low points and a bootstrap approximation for network selection. It monitors the internet connection in real time and resolves the connection selection issue. The proposed algorithm is capable of adapting to prevailing network conditions in a heterogeneous environment of 2G, 3G and WLAN networks without user intervention. It is implemented in the temporal and spatial domains to check its robustness. Estimation error, overhead, estimation time with varying traffic size, and reliability are used as the performance metrics. Results: Numerical results show that the proposed algorithm's ABW estimation based on bootstrap approximation gives improved performance in terms of estimation error (less than 20%), overhead (varies from 0.03% to 83%) and reliability (approx. 99%) with respect to existing techniques. Conclusion: The proposed network selection criterion estimates the available bandwidth by taking averages, peaks, low points and a bootstrap approximation (standard deviation) to select a network in the wireless heterogeneous environment. It monitors the internet connection in real time and resolves the connection selection issue. The real-time usage and test results demonstrate the productivity and adequacy of available bandwidth estimation with bootstrap approximation as a practical solution for consistent correspondence among heterogeneous wireless networks through precise network selection for multimedia services.
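
The abstract names averages, peaks, low points and a bootstrap approximation of the standard deviation without giving formulas, so the following Python sketch shows only one plausible reading. The function names, the sample measurements (assumed to be in Mbit/s), and the selection score (mean minus bootstrap standard error) are assumptions for illustration, not the authors' criterion.

```python
import random
import statistics


def summarize_bandwidth(samples, resamples=1000, seed=0):
    """Summarize bandwidth samples: mean, peak, low point, bootstrap standard error of the mean."""
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    n = len(samples)
    # Bootstrap: resample with replacement and record the mean of each resample.
    boot_means = [
        statistics.fmean(rng.choices(samples, k=n)) for _ in range(resamples)
    ]
    return {
        "mean": statistics.fmean(samples),
        "peak": max(samples),
        "low": min(samples),
        "bootstrap_se": statistics.stdev(boot_means),
    }


def select_network(measurements):
    """Pick the network with the largest (mean - bootstrap_se); a hypothetical criterion."""
    def score(name):
        summary = summarize_bandwidth(measurements[name])
        return summary["mean"] - summary["bootstrap_se"]
    return max(measurements, key=score)


# Illustrative measurements only; not taken from the paper.
measurements = {
    "WLAN": [5.1, 4.8, 5.4, 5.0, 4.9, 5.2],
    "3G": [2.0, 2.4, 1.8, 2.1, 2.2, 1.9],
    "2G": [0.2, 0.3, 0.2, 0.25, 0.2, 0.3],
}
best = select_network(measurements)
```

Subtracting the bootstrap standard error penalizes networks whose measured bandwidth fluctuates. Other weightings of the average, peak and low point would be equally consistent with the abstract.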

Probing of mortality rate by staying alive: The growth‐reproduction trade‐off in a spatially heterogeneous environment

Functional Ecology ◽

10.1111/1365-2435.13442 ◽

2019 ◽

Vol 33 (12) ◽

pp. 2327-2337 ◽

Cited By ~ 2

Author(s):

Anna Ejsmond ◽

Jan Kozłowski ◽

Maciej J. Ejsmond

Keyword(s):

Mortality Rate ◽

Trade Off