Dynamic Load Aware Scheduler of Map Reduce Tasks for Cloud Environments

Most present-day applications are data and compute intensive, which has led to the invention of technologies such as Hadoop. Hadoop uses the Map Reduce framework to process big data applications in parallel across the computing resources of multiple nodes. Hadoop is designed for cluster environments and has a few limitations when executed in cloud environments. Running Hadoop on the cloud has nevertheless become a common choice because the infrastructure is easy to establish and is billed on a pay-as-you-use model. Hadoop performance on cloud infrastructures is affected by the virtualization overhead of the cloud environment. Execution times of Hadoop on the cloud can be improved if the virtual resources are used effectively when scheduling tasks, by studying the resource usage characteristics of the tasks and the resource availability of the nodes. The proposed work builds a dynamic scheduler for the Hadoop framework that makes scheduling decisions at run time based on job resource usage and node load. The results indicate an improvement of up to 23% in the execution time of Hadoop Map Reduce applications.
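The abstract does not spell out the scheduling algorithm, so the following is only a minimal sketch of the general idea of load-aware placement: each task goes to the node whose busiest resource would be least loaded after placement. The `Node` and `Task` classes, the two tracked resources (CPU and I/O), and the capacity threshold are illustrative assumptions, not the authors' design or any Hadoop API.

```python
from dataclasses import dataclass


@dataclass
class Node:
    name: str
    cpu_load: float  # fraction of CPU in use, 0.0 to 1.0 (hypothetical metric)
    io_load: float   # fraction of I/O bandwidth in use, 0.0 to 1.0 (hypothetical metric)


@dataclass
class Task:
    name: str
    cpu_demand: float  # estimated CPU share the task needs
    io_demand: float   # estimated I/O share the task needs


def projected_load(node, task):
    """Load of the busiest resource on the node if the task were placed there."""
    return max(node.cpu_load + task.cpu_demand, node.io_load + task.io_demand)


def pick_node(nodes, task, capacity=1.0):
    """Return the node with the lowest projected load, or None if no node fits."""
    candidates = [n for n in nodes if projected_load(n, task) <= capacity]
    if not candidates:
        return None  # the task would have to wait until a node frees up
    return min(candidates, key=lambda n: projected_load(n, task))


def schedule(nodes, tasks):
    """Place tasks one by one, updating node load after every placement."""
    placement = {}
    for task in tasks:
        node = pick_node(nodes, task)
        if node is not None:
            node.cpu_load += task.cpu_demand
            node.io_load += task.io_demand
        placement[task.name] = node.name if node else None
    return placement


nodes = [Node("A", cpu_load=0.7, io_load=0.2), Node("B", cpu_load=0.2, io_load=0.6)]
tasks = [Task("cpu_heavy", 0.5, 0.1), Task("io_heavy", 0.1, 0.5), Task("huge", 0.9, 0.9)]
placement = schedule(nodes, tasks)
print(placement)
```

In this sketch the CPU-heavy task avoids the CPU-bound node A, the I/O-heavy task avoids the I/O-bound node B, and the oversized task is left unplaced. A real scheduler would refresh the load figures from live monitoring rather than from static estimates.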

Cloud Monitoring

Achieving Federated and Self-Manageable Cloud Infrastructures ◽ 10.4018/978-1-4666-1631-8.ch006 ◽ 2012 ◽ pp. 97-116

Author(s): Peer Hasselmeyer ◽ Gregory Katsaros ◽ Bastian Koller ◽ Philipp Wieder

Keyword(s): State Of The Art ◽ The State ◽ Cloud Environment ◽ State Health ◽ Cloud Monitoring ◽ Cloud Environments ◽ And Performance ◽ Cloud Infrastructures ◽ Different Levels ◽ The Way

The management of the entire service landscape comprising a Cloud environment is a complex and challenging venture. One task of utmost importance is the generation and processing of information about the state, health, and performance of the various services and IT components, which is generally referred to as monitoring. Such information is the foundation for proper assessment and management of the whole Cloud. This chapter pursues two objectives: first, to provide an overview of monitoring in Cloud environments and, second, to propose a solution for interoperable and vendor-independent Cloud monitoring. Along the way, the authors motivate the necessity of monitoring at the different levels of Cloud infrastructures, introduce selected state-of-the-art approaches, and extract requirements for Cloud monitoring. Based on these requirements, the subsequent sections depict a Cloud monitoring solution and describe current developments towards interoperable, open, and extensible Cloud monitoring frameworks.

CAOS: a tool for OpenStack accounting management

EPJ Web of Conferences ◽ 10.1051/epjconf/201921407006 ◽ 2019 ◽ Vol 214 ◽ pp. 07006

Author(s): Paolo Andreetto ◽ Fabrizio Chiarello ◽ Sergio Traldi

Keyword(s): Capacity Planning ◽ Accounting Information ◽ Resource Usage ◽ Data Retention ◽ Accounting Management ◽ Cloud Environments ◽ Cloud Infrastructures ◽ Gathering Data ◽ Different Levels ◽ The Way

The analysis and understanding of resource utilization in shared infrastructures, such as cloud environments, is crucial in order to provide better performance, administration and capacity planning. The management of resource usage of the OpenStack-based cloud infrastructures hosted at INFN-Padova, the Cloud Area Padovana and the INFN-PADOVA-STACK instance of the EGI Federated cloud, started with the deployment of Ceilometer, the OpenStack component responsible for collecting and managing accounting information. However, by using Ceilometer alone we found some limiting problems related to the way it handles information: among others, the imbalance between storage and data retention requirements, and the complexity in computing custom metrics. In this contribution we present a tool, called CAOS, which we have been implementing to overcome the aforementioned issues. CAOS collects, manages and presents the data concerning resource usage of our OpenStack-based cloud infrastructures. By gathering data from both the Ceilometer service and the OpenStack API, CAOS enables us to track resource usage at different levels (e.g. per project), in such a way that both current and past consumption of resources can be easily determined, stored and presented.
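To illustrate what tracking resource usage per project over current and past periods can look like, here is a minimal sketch that sums CPU-hours per project inside a time window. The sample tuple layout, the project names, and the function name are assumptions made for illustration; they do not reflect the CAOS data model, Ceilometer, or the OpenStack API.

```python
from collections import defaultdict


def cpu_hours_per_project(samples, window_start, window_end):
    """Sum vCPU-hours per project, counting only the part of each sample in the window.

    Each sample is a hypothetical tuple: (project, start_hour, end_hour, vcpus).
    """
    totals = defaultdict(float)
    for project, start, end, vcpus in samples:
        # Clip the sample's running interval to the requested window.
        overlap = min(end, window_end) - max(start, window_start)
        if overlap > 0:
            totals[project] += overlap * vcpus
    return dict(totals)


samples = [
    ("project-a", 0, 10, 2),  # 2 vCPUs running from hour 0 to hour 10
    ("project-a", 8, 12, 4),  # 4 vCPUs running from hour 8 to hour 12
    ("project-b", 5, 6, 1),   # 1 vCPU running from hour 5 to hour 6
]

past = cpu_hours_per_project(samples, 0, 10)
recent = cpu_hours_per_project(samples, 10, 24)
print(past, recent)
```

Changing the window bounds is enough to answer both "what was consumed earlier" and "what is being consumed now" from the same stored samples.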

A Survey on Implementation of Word-Count with Map Reduce Programming Oriented Model using Hadoop Framework

SSRN Electronic Journal ◽ 10.2139/ssrn.3351074 ◽ 2019

Author(s): Santosh Yadav ◽ Jay Prakash

Keyword(s): Map Reduce ◽ Word Count ◽ Hadoop Framework
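No abstract is listed for this entry. As a reminder of what the word-count pattern looks like, the sketch below imitates the map, shuffle, and reduce phases in plain Python on a single machine. It is an illustration of the programming model only and makes no use of Hadoop; the function names are chosen here for clarity.

```python
from collections import defaultdict


def map_phase(line):
    """Emit a (word, 1) pair for every word in one input line."""
    for word in line.lower().split():
        yield (word, 1)


def shuffle(pairs):
    """Group all emitted values by key, as the framework does between phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(word, counts):
    """Collapse the list of counts for one word into a single total."""
    return (word, sum(counts))


def word_count(lines):
    pairs = [pair for line in lines for pair in map_phase(line)]
    return dict(reduce_phase(word, counts) for word, counts in shuffle(pairs).items())


counts = word_count(["the cat", "The dog the"])
print(counts)
```

In a real cluster the map and reduce calls run on different nodes and the shuffle moves data across the network, but the per-record logic has the same shape.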

Performance Improvisation through Resource Usage Optimization in Cloud Environment

Proceedings of the 2014 International Conference on Information and Communication Technology for Competitive Strategies - ICTCS '14 ◽ 10.1145/2677855.2677900 ◽ 2014 ◽ Cited By ~ 1

Author(s): Hardik A. Chaudhari ◽ Chirag S. Thaker ◽ Jay Patel

Keyword(s): Resource Usage ◽ Cloud Environment

Big Data

Security, Privacy, and Forensics Issues in Big Data - Advances in Information Security, Privacy, and Ethics ◽ 10.4018/978-1-5225-9742-1.ch002 ◽ 2020 ◽ pp. 24-65

Author(s): P. Lalitha Surya Kumari

Keyword(s): Big Data ◽ Life Cycle ◽ Data Collection ◽ Operating Systems ◽ Data Security ◽ Map Reduce ◽ Security Issues ◽ Big Data Applications

This chapter describes how computing infrastructures should be configured and intelligently managed to fulfill the most notable security requirements of big data applications. Big data is an area concerned with storing, extracting, and processing large amounts of data, which are very often unstructured. In big data settings, security functions are required to work over a heterogeneous composition of diverse hardware, operating systems, and network domains. Conventional security solutions built around a clearly defined security boundary, such as firewalls and demilitarized zones (DMZs), are not effective for big data as it expands with the help of public clouds. The chapter discusses the characteristics, risks, life cycle, and data collection of big data; Map Reduce components; issues and challenges in big data; the Cloud Security Alliance; approaches to solving security issues; an introduction to cybercrime; YARN; and Hadoop components.

An API for Development of User-Defined Scheduling Algorithms in Aneka PaaS Cloud Software

Handbook of Research on Cloud Computing and Big Data Applications in IoT - Advances in Computer and Electrical Engineering ◽ 10.4018/978-1-5225-8407-0.ch009 ◽ 2019 ◽ pp. 170-187 ◽ Cited By ~ 1

Author(s): Rajinder Sandhu ◽ Adel Nadjaran Toosi ◽ Rajkumar Buyya

Keyword(s): Cloud Computing ◽ Real Time ◽ Scheduling Algorithms ◽ Research Area ◽ Programming Models ◽ Hybrid Cloud ◽ Main Research ◽ Cloud Application ◽ Cloud Environments ◽ Cloud Infrastructures

Cloud computing provides resources using a multitenant architecture in which infrastructure is created from one or more distributed datacenters. Scheduling of applications in cloud infrastructures is one of the main research areas in cloud computing. Researchers have developed many scheduling algorithms and evaluated them using simulators such as CloudSim. Their performance needs to be validated in real-time cloud environments to improve their usefulness. Aneka is a prominent PaaS software platform that allows users to develop cloud applications using various programming models and underlying infrastructure. This chapter presents a scheduling API developed for the Aneka software platform. Users can develop their own scheduling algorithms using this API and integrate them with Aneka to test them in real cloud environments. The proposed API provides all the functionality required to integrate and schedule private, public, or hybrid cloud resources with the Aneka software.
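The abstract does not show the API itself, so the following is only a generic sketch of how a platform can accept user-defined scheduling algorithms through a small plug-in interface. Every class and method name here (`SchedulingAlgorithm`, `select`, `Platform`, `submit`) is hypothetical and is not taken from Aneka, which is a .NET platform with its own API.

```python
from abc import ABC, abstractmethod


class SchedulingAlgorithm(ABC):
    """Hypothetical plug-in contract: choose a resource for a task."""

    @abstractmethod
    def select(self, task, resources):
        ...


class RoundRobin(SchedulingAlgorithm):
    """Cycle through the resources in order, ignoring their load."""

    def __init__(self):
        self._next = 0

    def select(self, task, resources):
        resource = resources[self._next % len(resources)]
        self._next += 1
        return resource


class LeastLoaded(SchedulingAlgorithm):
    """Pick the resource with the fewest queued tasks."""

    def select(self, task, resources):
        return min(resources, key=lambda r: r["queued"])


class Platform:
    """Stand-in for the hosting platform: it only calls the plugged-in algorithm."""

    def __init__(self, algorithm, resources):
        self.algorithm = algorithm
        self.resources = resources

    def submit(self, task):
        resource = self.algorithm.select(task, self.resources)
        resource["queued"] += 1
        return resource["name"]


rr = Platform(RoundRobin(), [{"name": "private-1", "queued": 0},
                             {"name": "public-1", "queued": 0}])
rr_result = [rr.submit(t) for t in ("t1", "t2", "t3")]

ll = Platform(LeastLoaded(), [{"name": "private-1", "queued": 2},
                              {"name": "public-1", "queued": 0}])
ll_result = [ll.submit(t) for t in ("t1", "t2", "t3")]
print(rr_result, ll_result)
```

Keeping the platform unaware of the concrete algorithm is what lets the same deployment be used to compare several scheduling policies.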

Securely Communicating with an Optimal Cloud for Intelligently Enhancing a Cloud's Elasticity

International Journal of Intelligent Information Technologies ◽ 10.4018/ijiit.2018040103 ◽ 2018 ◽ Vol 14 (2) ◽ pp. 43-58 ◽ Cited By ~ 2

Author(s): S. Kirthica ◽ Rajeswari Sridhar

Keyword(s): Success Rate ◽ Real Time ◽ Scaling Up ◽ Turnaround Time ◽ Third Parties ◽ Security Threats ◽ Cloud Environment ◽ Physical Resources ◽ Additional Costs ◽ Cloud Environments

One of the principal features on which cloud environments operate is elasticity: the scaling up and down of resources based on users' needs. This feature is limited by the cloud's physical resources. This article proposes to enhance the elasticity of a cloud in an intelligent manner by communicating with an optimal external cloud (EC) and borrowing additional resources from it when the cloud runs out of resources. This inter-cloud communication is secured by a model whose structure is similar to the Kerberos protocol. To choose the optimal EC for a particular user request, a list of parameters, collectively termed RePVoCRaD, is enumerated. Once an EC is chosen, trust is established with it and inter-cloud communication begins. While existing works rely on third parties to establish or secure inter-cloud communication, this work is novel in that no third parties are involved in the entire process, thereby reducing the security threats and additional costs involved. Evaluated on turnaround time and transaction success rate in a real-time cloud environment, the approach enhances the cloud's elasticity enough that it successfully accommodates its users' additional demands by the fastest means possible.
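The borrowing step described above can be sketched as follows, under stated assumptions. The abstract does not list the RePVoCRaD parameters, so the two parameters used here (`reliability`, `cost_efficiency`), the weighted-sum score, and all function names are placeholders for illustration, not the article's selection model. The Kerberos-like security exchange is omitted.

```python
def choose_external_cloud(clouds, needed, weights):
    """Return the best-scoring external cloud that can cover the shortfall, or None."""
    eligible = [c for c in clouds if c["free_units"] >= needed]
    if not eligible:
        return None

    def score(cloud):
        # Placeholder scoring: weighted sum over illustrative parameters.
        return sum(weights[k] * cloud["params"][k] for k in weights)

    return max(eligible, key=score)


def allocate(local_free, needed, clouds, weights):
    """Serve locally when possible; otherwise borrow the shortfall from an external cloud."""
    if needed <= local_free:
        return ("local", needed)
    shortfall = needed - local_free
    external = choose_external_cloud(clouds, shortfall, weights)
    if external is None:
        return ("rejected", 0)
    return (external["name"], shortfall)


clouds = [
    {"name": "ec-1", "free_units": 10, "params": {"reliability": 0.9, "cost_efficiency": 0.2}},
    {"name": "ec-2", "free_units": 4, "params": {"reliability": 0.6, "cost_efficiency": 0.9}},
]
weights = {"reliability": 0.5, "cost_efficiency": 0.5}

print(allocate(5, 3, clouds, weights))
print(allocate(5, 8, clouds, weights))
print(allocate(5, 12, clouds, weights))
print(allocate(5, 20, clouds, weights))
```

The four calls cover the cases of a request that fits locally, a small shortfall that the higher-scoring external cloud can cover, a larger shortfall that only one external cloud can cover, and a request that no cloud can satisfy.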
