SLA-Based Adaptation Schemes in Distributed Stream Processing Engines

With the growth in data volume, online information, and large-scale cloud applications, big data analytics has become mainstream in both industrial and academic research communities. This has prompted the emergence and development of real-time distributed stream processing frameworks such as Flink, Storm, Spark, and Samza. These frameworks allow complex queries on streaming data to be distributed across multiple worker nodes in a cluster. Few of these stream processing frameworks provide fundamental support for controlling the latency and throughput of the system as well as the correctness of the results. However, none can adjust them on the fly at runtime. We present an informed and efficient adaptive watermarking and dynamic buffering timeout mechanism for distributed streaming frameworks. It is designed to increase the overall throughput of the system by adapting the watermarks to the stream of incoming workload and by scaling the buffering timeout dynamically for each task tracker on the fly, while maintaining the Service Level Agreement (SLA)-based end-to-end latency of the system. This work focuses on tuning system parameters (such as window correctness and buffering timeout) based on predictions of incoming workloads, and assesses whether a given workload will breach an SLA using output metrics including latency, throughput, and the correctness of both intermediate and final results. We used Apache Flink as our testbed distributed processing engine; however, the proposed mechanism can be applied to other streaming frameworks as well. Our results on the testbed indicate that the proposed system outperforms the status quo of stream processing. With the inclusion of learning models such as naïve Bayes, multilayer perceptron (MLP), and sequential minimal optimization (SMO), the system shows further progress in keeping the SLA intact and in maintaining quality of service (QoS).
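The trade-off this abstract describes (longer buffering raises throughput, shorter buffering lowers latency) can be illustrated with a small feedback rule. The sketch below is not the authors' mechanism, which relies on workload prediction and learning models; it is a simple additive-increase, multiplicative-decrease controller. The function name, parameters, and default values are all hypothetical.

```python
def next_buffer_timeout(current_ms: float,
                        observed_latency_ms: float,
                        sla_latency_ms: float,
                        step_ms: float = 5.0,
                        min_ms: float = 1.0,
                        max_ms: float = 100.0) -> float:
    """Return the buffer timeout to use in the next control interval.

    Hypothetical rule (an assumption, not taken from the paper):
    halve the timeout when the SLA latency is exceeded, otherwise
    grow it by a fixed step so more records are batched per flush.
    """
    if observed_latency_ms > sla_latency_ms:
        # SLA at risk: flush buffers sooner, trading throughput for latency.
        proposed = current_ms / 2.0
    else:
        # Latency headroom available: buffer longer to improve throughput.
        proposed = current_ms + step_ms
    # Keep the value inside the configured bounds.
    return max(min_ms, min(max_ms, proposed))
```

A controller of this kind would be called once per measurement interval for each task, with the returned value applied through whatever buffer-timeout setting the engine exposes.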

Cost-efficient enactment of stream processing topologies

PeerJ Computer Science ◽ 10.7717/peerj-cs.141 ◽ 2017 ◽ Vol 3 ◽ pp. e141 ◽ Cited By ~ 6

Author(s): Christoph Hochreiner ◽ Michael Vögler ◽ Stefan Schulte ◽ Schahram Dustdar

Keyword(s): Virtual Machines ◽ Service Level Agreement ◽ Stream Processing ◽ Service Level ◽ Resource Provisioning ◽ Streaming Data ◽ Software Systems ◽ Continuous Increase ◽ Data Volume ◽ Cost Efficient

The continuous increase of unbounded streaming data poses several challenges to established data stream processing engines. One of the most important challenges is the cost-efficient enactment of stream processing topologies under changing data volume. These changing data volumes place varying loads on stream processing systems, whose resource provisioning needs to be continuously updated at runtime. Early approaches already allow for resource provisioning at the level of virtual machines (VMs), but this only permits coarse-grained provisioning strategies. Based on current advances in and benefits of containerized software systems, we have designed a cost-efficient resource provisioning approach and integrated it into the runtime of the Vienna ecosystem for elastic stream processing. Our resource provisioning approach aims to maximize the resource usage of VMs obtained from cloud providers. This strategy releases processing capabilities only at the end of a VM's minimal leasing duration, instead of releasing them eagerly as soon as possible, as is the case for threshold-based approaches. This strategy allows us to improve service level agreement compliance by up to 25% and to reduce operational cost by up to 36%.
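The release rule described in this abstract (hold an already-paid VM until its leasing period is about to end) can be sketched as a single predicate. This is a minimal illustration under stated assumptions, not the authors' implementation: the function name, the notion of a fixed repeating leasing period, and the `release_window_s` parameter are hypothetical.

```python
def should_release(lease_start_s: float,
                   now_s: float,
                   lease_period_s: float,
                   is_idle: bool,
                   release_window_s: float = 60.0) -> bool:
    """Decide whether an idle VM should be released now.

    Assumption: the provider bills in fixed periods of `lease_period_s`
    seconds starting at `lease_start_s`, so capacity that is already
    paid for is kept until shortly before the current period ends.
    """
    if not is_idle:
        # A VM that still hosts work is never released by this rule.
        return False
    # Time already consumed within the current paid period.
    elapsed_in_period = (now_s - lease_start_s) % lease_period_s
    remaining_s = lease_period_s - elapsed_in_period
    # Release only inside the final window of the paid period.
    return remaining_s <= release_window_s
```

A threshold-based policy would, by contrast, release the VM as soon as `is_idle` becomes true, discarding the remainder of the period that has already been paid for.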

Towards autoscaling of Apache Flink jobs

Acta Universitatis Sapientiae Informatica ◽ 10.2478/ausi-2021-0003 ◽ 2021 ◽ Vol 13 (1) ◽ pp. 39-59

Author(s): Balázs Varga ◽ Márton Balassi ◽ Attila Kiss

Keyword(s): Data Stream ◽ Service Level Agreement ◽ Stream Processing ◽ Service Level ◽ Automatic Scaling ◽ The Past ◽ Simple Scaling ◽ State Size ◽ Distributed Stream Processing ◽ Processing Engine

Data stream processing has been gaining attention in the past decade. Apache Flink is an open-source distributed stream processing engine that is able to process a large amount of data in real time with low latency. Computations are distributed among a cluster of nodes. Currently, provisioning the appropriate amount of cloud resources must be done manually ahead of time. A dynamically varying workload may exceed the capacity of the cluster, or leave resources underutilized. In our paper, we describe an architecture that enables the automatic scaling of Flink jobs on Kubernetes based on custom metrics, and describe a simple scaling policy. We also measure the effects of state size and target parallelism on the duration of the scaling operation, which must be considered when designing an autoscaling policy, so that the Flink job respects a Service Level Agreement.
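The abstract does not spell out the scaling policy, so the following is only one plausible shape for a simple policy, written under stated assumptions: parallelism is chosen so that each parallel task runs at a target fraction of its measured capacity. The function name, the metrics it takes, and the default bounds are hypothetical and are not taken from the paper.

```python
import math


def target_parallelism(input_rate: float,
                       per_task_capacity: float,
                       target_utilization: float = 0.8,
                       min_parallelism: int = 1,
                       max_parallelism: int = 32) -> int:
    """Pick a job parallelism from observed throughput metrics.

    Assumptions: `input_rate` is the incoming records per second and
    `per_task_capacity` is the records per second that one parallel
    task can sustain. Both would come from custom metrics.
    """
    if per_task_capacity <= 0:
        raise ValueError("per_task_capacity must be positive")
    if not 0 < target_utilization <= 1:
        raise ValueError("target_utilization must be in (0, 1]")
    # Number of tasks needed so that each runs at the target utilization.
    needed = math.ceil(input_rate / (per_task_capacity * target_utilization))
    # Clamp to the allowed range for the cluster.
    return max(min_parallelism, min(max_parallelism, needed))
```

Because the abstract reports that state size and target parallelism affect how long a scaling operation takes, a practical policy would also limit how often this value is allowed to change.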

Green Cloud Software Engineering for Big Data Processing

Sustainability ◽ 10.3390/su12219255 ◽ 2020 ◽ Vol 12 (21) ◽ pp. 9255

Author(s): Madhubala Ganesan ◽ Ah-Lian Kor ◽ Colin Pattinson ◽ Eric Rondeau

Keyword(s): Big Data ◽ High Performance ◽ Data Centers ◽ Research Work ◽ Service Level Agreement ◽ Big Data Analytics ◽ Service Level ◽ Cloud Infrastructure ◽ Communication Performance ◽ Vm Consolidation

The Internet of Things (IoT) coupled with big data analytics is emerging as the core of smart and sustainable systems that bolster economic, environmental, and social sustainability. Cloud-based data centers provide high-performance computing power to analyze voluminous IoT data and deliver insights that support decision making. However, the many servers in data centers appear to be a black hole of superfluous energy consumption, contributing 23% of the global carbon dioxide (CO2) emissions of the ICT (Information and Communication Technology) industry. IoT-related energy research focuses on low-power sensors and enhanced machine-to-machine communication performance. To date, cloud-based data centers still face energy-related challenges that are detrimental to the environment. Virtual machine (VM) consolidation is a well-known approach to achieving energy-efficient cloud infrastructures. Although several research works demonstrate positive results for VM consolidation in simulated environments, investigations on real, physical cloud infrastructure for big data workloads are lacking. This research work addresses that gap by conducting experiments on real physical cloud infrastructure. The primary goal of setting up this infrastructure is to evaluate dynamic VM consolidation approaches, which include integrated algorithms from existing relevant research. An open-source VM consolidation framework, OpenStack Neat, is adopted, and experiments are conducted on a multi-node OpenStack cloud with Apache Spark as the big data platform. Open-source OpenStack has been deployed because it enables rapid innovation and boosts scalability as well as resource utilization. Additionally, this research work investigates performance based on service level agreement (SLA) metrics and the energy usage of compute hosts. Relevant results concerning the best-performing combination of algorithms are presented and discussed.

Information technology. Cloud computing. Service level agreement (SLA) framework

10.3403/30316174 ◽ 2016

Keyword(s): Information Technology ◽ Cloud Computing ◽ Service Level Agreement ◽ Service Level ◽ Cloud Computing Service

Distance Aware VM allocation process to minimize energy consumption in cloud computing

Recent Patents on Computer Science ◽ 10.2174/2213275912666191023143709 ◽ 2019 ◽ Vol 12

Author(s): Gurpreet Singh ◽ Manish Mahajan ◽ Rajni Mohana

Keyword(s): Resource Allocation ◽ Cloud Computing ◽ Energy Consumption ◽ Virtual Machine ◽ Virtual Machines ◽ Research Work ◽ Service Level Agreement ◽ Service Level ◽ Physical Machine ◽ Allocation Process

BACKGROUND: Cloud computing is regarded as an on-demand service resource that delivers data center capacity to applications on a pay-per-use basis. Satisfying user needs requires an effective and reliable resource allocation method. Because of growing user demand, resource allocation has become a complex and challenging task: when a physical machine is overloaded, Virtual Machines share its load by utilizing the physical machine's resources. Previous studies fall short on energy consumption and time management while keeping Virtual Machines on different servers in a turned-on state. AIM AND OBJECTIVE: The main aim of this research work is to propose an effective resource allocation scheme for allocating Virtual Machines from an ad hoc sub-server with Virtual Machines. EXECUTION MODEL: The research was carried out in two stages: first, the Virtual Machines and Physical Machines are located with respect to the server; subsequently, the allocation is cross-validated. The Modified Best Fit Decreasing algorithm is used to sort the Virtual Machines, and Multi-Machine Job Scheduling is used to place jobs on an appropriate host. An Artificial Neural Network, acting as a classifier, allocates jobs to the hosts. Service Level Agreement violation and energy consumption are used as measures, and the results show a reduction of 37.7 in energy consumption and a 15% improvement in Service Level Agreement violation.
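For readers unfamiliar with the family of heuristics named in this abstract, the sketch below shows classic Best Fit Decreasing applied to VM placement. It is the textbook bin-packing variant, not the modified algorithm the authors use, and it models a single resource dimension only. The function name and the dictionary-based inputs are hypothetical.

```python
from typing import Dict, Optional


def best_fit_decreasing(vm_demands: Dict[str, float],
                        host_capacities: Dict[str, float]) -> Dict[str, Optional[str]]:
    """Place VMs on hosts using classic Best Fit Decreasing.

    Assumption: each VM and host is described by one number (for
    example CPU units). A VM that fits nowhere is mapped to None.
    """
    free = dict(host_capacities)  # remaining capacity per host
    placement: Dict[str, Optional[str]] = {}
    # "Decreasing": consider the largest VMs first.
    ordered = sorted(vm_demands.items(), key=lambda kv: kv[1], reverse=True)
    for vm, demand in ordered:
        candidates = [host for host, cap in free.items() if cap >= demand]
        if not candidates:
            placement[vm] = None
            continue
        # "Best fit": choose the host left with the least spare capacity.
        best = min(candidates, key=lambda host: free[host] - demand)
        free[best] -= demand
        placement[vm] = best
    return placement
```

Packing VMs tightly in this way leaves more hosts empty, which is what allows idle physical machines to be powered down to save energy.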

An integrated heuristic and mathematical modelling method to optimize vehicle maintenance schedule under single dead-end track parking and service level agreement

Computers & Operations Research ◽ 10.1016/j.cor.2021.105261 ◽ 2021 ◽ pp. 105261

Author(s): Murat Elhüseyni ◽ Ali Tamer Ünal

Keyword(s): Mathematical Modelling ◽ Service Level Agreement ◽ Service Level ◽ Maintenance Schedule ◽ Modelling Method ◽ Dead End

Internet of Things for Mental Health: Open Issues in Data Acquisition, Self-Organization, Service Level Agreement, and Identity Management

International Journal of Environmental Research and Public Health ◽ 10.3390/ijerph18031327 ◽ 2021 ◽ Vol 18 (3) ◽ pp. 1327

Author(s): Leonardo J. Gutierrez ◽ Kashif Rabbani ◽ Oluwashina Joseph Ajayi ◽ Samson Kahsay Gebresilassie ◽ Joseph Rafferty ◽ ...

Keyword(s): Mental Health ◽ Internet Of Things ◽ Data Acquisition ◽ Identity Management ◽ Service Level Agreement ◽ Service Level ◽ Self Organization ◽ Future Research ◽ Study Results ◽ Open Issues

The increase in mental illness cases around the world can be described as an urgent and serious global health threat. Around 500 million people suffer from mental disorders, among which depression, schizophrenia, and dementia are the most prevalent. Revolutionary technological paradigms such as the Internet of Things (IoT) provide us with new capabilities to detect, assess, and care for patients early. This paper comprehensively surveys work done at the intersection of IoT and mental health disorders. We evaluate multiple computational platforms, methods, and devices, as well as study results and potential open issues for the effective use of IoT systems in mental health. We particularly elaborate on open challenges in the use of existing IoT solutions for mental health care, which can be relevant given the potential impairments of some mental health patients. These challenges include data acquisition issues, lack of self-organization of devices and of service level agreements, and security, privacy, and consent issues, among others. We aim to open the conversation for future research in this emerging area by outlining possible new paths based on the results and conclusions of this work.
