Locality/Fairness-Aware Job Scheduling in Distributed Stream Processing Engines

Siwoon Son; Yang-Sae Moon

doi:10.3390/electronics9111857

Electronics ◽

10.3390/electronics9111857 ◽

2020 ◽

Vol 9 (11) ◽

pp. 1857

Author(s):

Siwoon Son ◽

Yang-Sae Moon

Keyword(s):

Real World ◽

Data Streams ◽

Job Scheduling ◽

Stream Processing ◽

Process Data ◽

Parallel Tasks ◽

Stream Processing Engines ◽

Job Scheduler ◽

Distributed Stream Processing ◽

Apache Storm

Distributed stream processing engines (DSPEs) deploy multiple tasks on distributed servers to process data streams in real time. Many DSPEs provide locality-aware stream partitioning (LSP) methods to reduce network communication costs. However, the even job scheduler provided by DSPEs deploys tasks far away from each other on the distributed servers, so the LSP cannot be used effectively. In this paper, we propose a Locality/Fairness-aware job scheduler (L/F job scheduler) that considers locality in addition to fairness, addressing the problems of the even job scheduler, which considers fairness only. First, for locality, the L/F job scheduler increases the cohesion of contiguous tasks, which require message transmissions. At the same time, for fairness, it reduces the coupling of parallel tasks, which do not require message transmissions. Next, we connect the contiguous tasks into a stream pipeline and evenly deploy the stream pipelines to the distributed servers so that the L/F job scheduler achieves high cohesion and low coupling. Finally, we implement the proposed L/F job scheduler in Apache Storm, a representative DSPE, and evaluate it on both synthetic and real-world workloads. Experimental results show that the L/F job scheduler achieves throughput similar to that of the even job scheduler, while improving latency by up to 139.2% for LSP applications and by up to 140.7% even for non-LSP applications. The L/F job scheduler also improves latency by 19.58% and 12.13%, respectively, on two real-world workloads. These results indicate that our L/F job scheduler provides superior processing performance for DSPE applications.
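
The contrast between an even (round-robin) placement and a pipeline-based placement can be illustrated with a small sketch. This is not the authors' implementation and does not use the Apache Storm scheduler API; the function names, the toy topology, and the assumption that every component has the same parallelism are all hypothetical, chosen only to show how grouping contiguous tasks into pipelines keeps message hops on one server.

```python
from collections import Counter

def even_schedule(components, parallelism, servers):
    """Fairness only: spread all tasks round-robin over the servers."""
    tasks = [(c, i) for c in components for i in range(parallelism)]
    return {task: servers[k % len(servers)] for k, task in enumerate(tasks)}

def pipeline_schedule(components, parallelism, servers):
    """Group the i-th task of every component into pipeline i (assumed
    simplification), then spread whole pipelines round-robin."""
    placement = {}
    for i in range(parallelism):
        server = servers[i % len(servers)]
        for c in components:
            placement[(c, i)] = server
    return placement

def colocated_hops(placement, components, parallelism):
    """Count contiguous task pairs (c, i) -> (next c, i) on the same server."""
    return sum(
        placement[(a, i)] == placement[(b, i)]
        for a, b in zip(components, components[1:])
        for i in range(parallelism)
    )

# Hypothetical topology: three components, four parallel tasks each, three servers.
components = ["spout", "split", "count"]
servers = ["s0", "s1", "s2"]

even = even_schedule(components, 4, servers)
pipelined = pipeline_schedule(components, 4, servers)

print("even hops kept local:     ", colocated_hops(even, components, 4))
print("pipelined hops kept local:", colocated_hops(pipelined, components, 4))
print("pipelined load per server:", dict(Counter(pipelined.values())))
```

In this toy setup, fairness is applied at the granularity of whole pipelines rather than individual tasks, so the per-server task counts can differ slightly when the number of pipelines is not a multiple of the number of servers.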

A Containerized Approach for Allocating Distributed Stream Queries to Fog Nodes

10.36227/techrxiv.14151650.v1 ◽

2021 ◽

Author(s):

Hamed Hasibi ◽

Saeed Sedighian Kashi

Keyword(s):

Fog Computing ◽

Stream Processing ◽

Stream Data ◽

Process Data ◽

Stream Query Processing ◽

Tremendous Amount ◽

Stream Processing Engines ◽

Iot Devices ◽

Distributed Stream Processing

Fog computing brings cloud capabilities closer to Internet of Things (IoT) devices. IoT devices generate a tremendous amount of stream data that flows toward the cloud via hierarchical fog nodes. To process data streams, many Stream Processing Engines (SPEs) have been developed. Without the fog layer, stream query processing executes in the cloud, which forwards a large volume of traffic toward the cloud. When a hierarchical fog layer is available, a complex query can be divided into simple queries that run on fog nodes by using distributed stream processing. In this paper, we propose an approach to assign stream queries to fog nodes using container technology. We name this approach Stream Queries Placement in Fog (SQPF). Our goal is to minimize end-to-end delay to achieve a better quality of service. First, in the emulation step, we create Docker container instances from SPEs and evaluate their processing delay and throughput under different resource configurations and queries with varying input rates. Then, in the placement step, we assign queries among fog nodes by using a genetic algorithm. The practical approach used in SQPF achieves a near-the-best assignment based on the lowest application deadline in real scenarios, and the evaluation results support this.
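
The placement step can be pictured with a minimal genetic-algorithm sketch. It is not the SQPF implementation: the delay tables, node capacities, penalty weight, and the objective (total delay plus an overload penalty) are hypothetical stand-ins for the values the emulation step would measure, and the crossover and mutation operators are generic choices.

```python
import random

# Hypothetical measurements (ms): PROC_DELAY[q][n] is the delay of query q on node n.
PROC_DELAY = [[5, 9, 7], [8, 4, 6], [6, 7, 3], [9, 5, 8]]
LINK_DELAY = [2, 3, 6]   # assumed network delay of each fog node
CAPACITY = [2, 2, 2]     # assumed maximum number of queries per node
PENALTY = 100            # assumed cost of each query above capacity

def cost(assignment):
    """Total delay of an assignment (query index -> node) plus overload penalty."""
    delay = sum(PROC_DELAY[q][n] + LINK_DELAY[n] for q, n in enumerate(assignment))
    overload = sum(max(0, assignment.count(n) - CAPACITY[n])
                   for n in range(len(CAPACITY)))
    return delay + PENALTY * overload

def evolve(n_queries, n_nodes, cost, pop_size=30, generations=50,
           mutation_rate=0.1, seed=0):
    """Elitist GA: keep the better half, refill with mutated one-point crossovers."""
    rng = random.Random(seed)
    pop = [[rng.randrange(n_nodes) for _ in range(n_queries)]
           for _ in range(pop_size)]
    history = []
    for _ in range(generations):
        pop.sort(key=cost)
        history.append(cost(pop[0]))
        elite = pop[: pop_size // 2]
        children = []
        while len(elite) + len(children) < pop_size:
            a, b = rng.sample(elite, 2)
            cut = rng.randrange(1, n_queries)   # one-point crossover
            child = a[:cut] + b[cut:]
            for g in range(n_queries):          # per-gene mutation
                if rng.random() < mutation_rate:
                    child[g] = rng.randrange(n_nodes)
            children.append(child)
        pop = elite + children
    best = min(pop, key=cost)
    history.append(cost(best))
    return best, history

best, history = evolve(n_queries=4, n_nodes=3, cost=cost)
print("assignment:", best, "cost:", history[-1])
```

Because the better half of each generation is carried over unchanged, the best cost recorded in `history` never gets worse from one generation to the next.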

A Containerized Approach for Allocating Distributed Stream Queries to Fog Nodes

10.36227/techrxiv.14151650 ◽

2021 ◽

Author(s):

Hamed Hasibi ◽

Saeed Sedighian Kashi

Keyword(s):

Fog Computing ◽

Stream Processing ◽

Stream Data ◽

Process Data ◽

Stream Query Processing ◽

Tremendous Amount ◽

Stream Processing Engines ◽

Iot Devices ◽

Distributed Stream Processing

Benchmarking Tool for Modern Distributed Stream Processing Engines

2019 International Conference on Information Networking (ICOIN) ◽

10.1109/icoin.2019.8718106 ◽

2019 ◽

Author(s):

Muhammad Hanif ◽

Hyeongdeok Yoon ◽

Choonhwa Lee

Keyword(s):

Stream Processing ◽

Stream Processing Engines ◽

Distributed Stream Processing

An Efficient Approach for Storage of Big Data Streams in Distributed Stream Processing Systems

International Journal of Advanced Computer Science and Applications ◽

10.14569/ijacsa.2020.0110514 ◽

2020 ◽

Vol 11 (5) ◽

Author(s):

Sultan Alshamrani ◽

Quadri Waseem ◽

Abdullah Alharbi ◽

Wael Alosaimi ◽

Hamza Turabieh ◽

...

Keyword(s):

Big Data ◽

Data Streams ◽

Stream Processing ◽

Efficient Approach ◽

Big Data Streams ◽

Distributed Stream Processing

Stream Processing Tools for Analyzing Objects in Motion Sending High-Volume Location Data

Business Information Systems ◽

10.52825/bis.v1i.41 ◽

2021 ◽

pp. 257-268

Author(s):

Krzysztof Wecel ◽

Marcin Szmydt ◽

Milena Stróżyna

Keyword(s):

Real World ◽

Data Streams ◽

Stream Processing ◽

High Volume ◽

Time Analysis ◽

Processing Technologies ◽

Location Data ◽

Real Time Analysis ◽

Velocity Magnitude ◽

Selection Of

Recently, we have observed a significant increase in the amount of easily accessible data on transport and mobility. This data consists mostly of massive streams of high velocity, magnitude, and heterogeneity, which represent flows of goods, shipments, and fleet movements. It is therefore necessary to develop a scalable framework and apply tools capable of handling these streams. In this paper, we propose an approach for selecting software for stream processing solutions that may be used in the transportation domain. We provide an overview of potential stream processing technologies, followed by a method for choosing software for real-time analysis of data streams coming from objects in motion. We selected two solutions, Apache Spark Streaming and Apache Flink, and benchmarked them on a real-world task. We identified the caveats and challenges of implementing the solution in practice.

Property-Based Testing for Spark Streaming

Theory and Practice of Logic Programming ◽

10.1017/s1471068419000012 ◽

2019 ◽

Vol 19 (04) ◽

pp. 574-602 ◽

Cited By ~ 1

Author(s):

A. RIESCO ◽

J. RODRÍGUEZ-HORTALÁ

Keyword(s):

Temporal Logic ◽

Data Streams ◽

Programming Model ◽

Stream Processing ◽

High Volume ◽

Functional Language ◽

Velocity Data ◽

Commodity Hardware ◽

Distributed Stream Processing ◽

New Generation

Stream processing has reached the mainstream in recent years, as a new generation of open-source distributed stream processing systems, designed for scaling horizontally on commodity hardware, has brought the capability for processing high-volume and high-velocity data streams to companies of all sizes. In this work, we propose a combination of temporal logic and property-based testing (PBT) for dealing with the challenges of testing programs that employ this programming model. We formalize our approach in a discrete-time temporal logic for finite words, with some additions to improve the expressiveness of properties, which include timeouts for temporal operators and a binding operator for letters. In particular, we focus on testing Spark Streaming programs written with the Spark API for the functional language Scala, using the PBT library ScalaCheck. To that end, we add temporal logic operators to a set of new ScalaCheck generators and properties, as part of our testing library sscheck.
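
The idea of temporal operators with timeouts over finite words can be sketched in a few lines. This is a Python analogue for illustration only, not the sscheck or ScalaCheck API: the names `always`, `eventually`, and `check` are hypothetical, a "letter" is assumed to be an (input batch, output batch) pair, and the semantics are simplified to a plain true/false check on the first `timeout` letters.

```python
import random

def always(pred, timeout):
    """Formula: pred holds for every one of the first `timeout` letters."""
    return lambda word: all(pred(letter) for letter in word[:timeout])

def eventually(pred, timeout):
    """Formula: pred holds for at least one of the first `timeout` letters."""
    return lambda word: any(pred(letter) for letter in word[:timeout])

def random_batches(rng, length, max_size=5):
    """Generator: a finite stream of small batches of digits."""
    return [[rng.randint(0, 9) for _ in range(rng.randint(0, max_size))]
            for _ in range(length)]

def check(program, formula, trials=100, length=6, seed=0):
    """Run the program batch by batch on random streams.
    Returns a counterexample word, or None if every trial satisfies the formula."""
    rng = random.Random(seed)
    for _ in range(trials):
        inputs = random_batches(rng, length)
        word = [(batch, program(batch)) for batch in inputs]
        if not formula(word):
            return word
    return None

def keep_even(batch):
    """Program under test: a per-batch filter."""
    return [x for x in batch if x % 2 == 0]

# Property: for the first 6 batches, the output contains only even numbers.
only_even_output = always(
    lambda letter: all(x % 2 == 0 for x in letter[1]), timeout=6)

ok = check(keep_even, only_even_output)
bug = check(lambda batch: list(batch), only_even_output)  # identity is "buggy"
print("correct program counterexample:", ok)
print("buggy program counterexample found:", bug is not None)
```

A fuller treatment would also need to say what a formula means when the word is shorter than the timeout; this sketch simply evaluates whatever prefix is available.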
