scholarly journals Locality/Fairness-Aware Job Scheduling in Distributed Stream Processing Engines

Electronics ◽  
2020 ◽  
Vol 9 (11) ◽  
pp. 1857
Author(s):  
Siwoon Son ◽  
Yang-Sae Moon

Distributed stream processing engines (DSPEs) deploy multiple tasks on distributed servers to process data streams in real time. Many DSPEs have provided locality-aware stream partitioning (LSP) methods to reduce network communication costs. However, an even job scheduler provided by DSPEs deploys tasks far away from each other on the distributed servers, which cannot use the LSP properly. In this paper, we propose a Locality/Fairness-aware job scheduler (L/F job scheduler) that considers locality together to solve problems of the even job scheduler that only considers fairness. First, the L/F job scheduler increases cohesion of contiguous tasks that require message transmissions for the locality. At the same time, it reduces coupling of parallel tasks that do not require message transmissions for the fairness. Next, we connect the contiguous tasks into a stream pipeline and evenly deploy stream pipelines to the distributed servers so that the L/F job scheduler achieves high cohesion and low coupling. Finally, we implement the proposed L/F job scheduler in Apache Storm, a representative DSPE, and evaluate it in both synthetic and real-world workloads. Experimental results show that the L/F job scheduler is similar in throughput compared to the even job scheduler, but latency is significantly improved by up to 139.2% for the LSP applications and by up to 140.7% even for the non-LSP applications. The L/F job scheduler also improves latency by 19.58% and 12.13%, respectively, in two real-world workloads. These results indicate that our L/F job scheduler provides superior processing performance for the DSPE applications.

2021 ◽  
Author(s):  
Hamed Hasibi ◽  
Saeed Sedighian Kashi

Fog computing brings cloud capabilities closer to the Internet of Things (IoT) devices. IoT devices generate a tremendous amount of stream data towards the cloud via hierarchical fog nodes. To process data streams, many Stream Processing Engines (SPEs) have been developed. Without the fog layer, the stream query processing executes on the cloud, which forwards much traffic toward the cloud. When a hierarchical fog layer is available, a complex query can be divided into simple queries to run on fog nodes by using distributed stream processing. In this paper, we propose an approach to assign stream queries to fog nodes using container technology. We name this approach Stream Queries Placement in Fog (SQPF). Our goal is to minimize end-to-end delay to achieve a better quality of service. At first, in the emulation step, we make docker container instances from SPEs and evaluate their processing delay and throughput under different resource configurations and queries with varying input rates. Then in the placement step, we assign queries among fog nodes by using a genetic algorithm. The practical approach used in SQPF achieves a near-the-best assignment based on the lowest application deadline in real scenarios, and evaluation results are evidence of this goal.


2021 ◽  
Author(s):  
Hamed Hasibi ◽  
Saeed Sedighian Kashi

Fog computing brings cloud capabilities closer to the Internet of Things (IoT) devices. IoT devices generate a tremendous amount of stream data towards the cloud via hierarchical fog nodes. To process data streams, many Stream Processing Engines (SPEs) have been developed. Without the fog layer, the stream query processing executes on the cloud, which forwards much traffic toward the cloud. When a hierarchical fog layer is available, a complex query can be divided into simple queries to run on fog nodes by using distributed stream processing. In this paper, we propose an approach to assign stream queries to fog nodes using container technology. We name this approach Stream Queries Placement in Fog (SQPF). Our goal is to minimize end-to-end delay to achieve a better quality of service. At first, in the emulation step, we make docker container instances from SPEs and evaluate their processing delay and throughput under different resource configurations and queries with varying input rates. Then in the placement step, we assign queries among fog nodes by using a genetic algorithm. The practical approach used in SQPF achieves a near-the-best assignment based on the lowest application deadline in real scenarios, and evaluation results are evidence of this goal.


Author(s):  
Sultan Alshamrani ◽  
Quadri Waseem ◽  
Abdullah Alharbi ◽  
Wael Alosaimi ◽  
Hamza Turabieh ◽  
...  

2021 ◽  
pp. 257-268
Author(s):  
Krzysztof Wecel ◽  
Marcin Szmydt ◽  
Milena Stróżyna

Recently we observe a significant increase in the amount of easily accessible data on transport and mobility. This data is mostly massive streams of high velocity, magnitude, and heterogeneity, which represent a flow of goods, shipments and the movements of fleet. It is therefore necessary to develop a scalable framework and apply tools capable of handling these streams. In the paper we propose an approach for the selection of software for stream processing solutions that may be used in the transportation domain. We provide an overview of potential stream processing technologies, followed by the method for choosing the selected software for real-time analysis of data streams coming from objects in motion. We have selected two solutions: Apache Spark Streaming and Apache Flink, and benchmarked them on a real-world task. We identified the caveats and challenges when it comes to implementation of the solution in practice.


2019 ◽  
Vol 19 (04) ◽  
pp. 574-602 ◽  
Author(s):  
A. RIESCO ◽  
J. RODRÍGUEZ-HORTALÁ

AbstractStream processing has reached the mainstream in the last years, as a new generation of open-source distributed stream processing systems, designed for scaling horizontally on commodity hardware, has brought the capability for processing high-volume and high-velocity data streams to companies of all sizes. In this work, we propose a combination of temporal logic and property-based testing (PBT) for dealing with the challenges of testing programs that employ this programming model. We formalize our approach in a discrete time temporal logic for finite words, with some additions to improve the expressiveness of properties, which includes timeouts for temporal operators and a binding operator for letters. In particular, we focus on testing Spark Streaming programs written with the Spark API for the functional language Scala, using the PBT library ScalaCheck. For that we add temporal logic operators to a set of new ScalaCheck generators and properties, as part of our testing library sscheck.


Sign in / Sign up

Export Citation Format

Share Document