Big Data Workflows: Locality-Aware Orchestration Using Software Containers

Sensors, 2021, Vol 21 (24), pp. 8212
Author(s): Andrei-Alin Corodescu, Nikolay Nikolov, Akif Quddus Khan, Ahmet Soylu, Mihhail Matskin, ...

The emergence of the edge computing paradigm has shifted data processing from centralised infrastructures to heterogeneous and geographically distributed infrastructures. Data processing solutions must therefore consider data locality to reduce the performance penalties of data transfers between remote data centres. Existing big data processing solutions provide limited support for handling data locality and are inefficient at processing the small, frequent events typical of edge environments. This article proposes a novel architecture and a proof-of-concept implementation for software container-centric big data workflow orchestration that puts data locality at the forefront. The proposed solution takes available data locality information into account, leverages long-lived containers to execute workflow steps, and handles interaction with different data sources through containers. We compare the proposed solution with Argo Workflows and demonstrate a significant improvement in execution speed when processing the same data units. Finally, we carry out experiments with the proposed solution under different configurations and analyse the individual aspects affecting the performance of the overall solution.
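
The locality-first placement idea can be summarised in a few lines. The following is a minimal Python sketch, not the authors' implementation: the names (`Worker`, `WorkflowStep`, `pick_worker`) and the transfer-cost estimate are illustrative assumptions. It greedily assigns each workflow step to the worker whose local store already holds the step's input, falling back to the cheapest transfer otherwise.

```python
# Illustrative sketch of locality-aware step placement (not the paper's code).
# Assumption: each worker reports which datasets it already stores locally.

from dataclasses import dataclass, field

@dataclass
class Worker:
    name: str
    local_datasets: set = field(default_factory=set)
    bandwidth_mbps: float = 1000.0  # link to remote data centres

@dataclass
class WorkflowStep:
    name: str
    input_dataset: str
    input_size_mb: float

def transfer_cost(step: WorkflowStep, worker: Worker) -> float:
    """Estimated seconds to stage the step's input on this worker."""
    if step.input_dataset in worker.local_datasets:
        return 0.0  # data-local: no network transfer needed
    return step.input_size_mb * 8 / worker.bandwidth_mbps

def pick_worker(step: WorkflowStep, workers: list) -> Worker:
    """Prefer the worker that already holds the input (locality first)."""
    return min(workers, key=lambda w: transfer_cost(step, w))

workers = [Worker("dc1-node", {"sensor-logs"}), Worker("dc2-node", set(), 100.0)]
step = WorkflowStep("aggregate", "sensor-logs", 4096)
chosen = pick_worker(step, workers)
chosen.local_datasets.add(step.input_dataset)  # outputs stay local for reuse
print(chosen.name)  # -> dc1-node
```

Keeping step outputs registered as local datasets on the chosen worker mirrors the role the long-lived containers play: subsequent steps can be placed where intermediate data already resides.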

2018, Vol 8 (11), pp. 2216
Author(s): Jiahui Jin, Qi An, Wei Zhou, Jiakai Tang, Runqun Xiong

Network bandwidth is a scarce resource in big data environments, so data locality is a fundamental problem for data-parallel frameworks such as Hadoop and Spark. The problem is exacerbated in clusters of multicore servers, where multiple tasks running on the same server compete for that server's network bandwidth. Existing approaches address it by scheduling computational tasks near their input data, taking into account the server's free time, data placements, and data transfer costs. However, such approaches usually assign identical values to data transfer costs, even though a multicore server's data transfer cost grows with the number of data-remote tasks it runs; as a result, they minimize data-processing time ineffectively. As a solution, we propose DynDL (Dynamic Data Locality), a novel data-locality-aware task-scheduling model that handles dynamic data transfer costs for multicore servers. DynDL offers greater flexibility than existing approaches by using a set of non-decreasing functions to evaluate dynamic data transfer costs. We also propose online and offline algorithms (based on DynDL) that minimize data-processing time and adaptively adjust data locality. Although the scheduling problem underlying DynDL is NP-complete (nondeterministic polynomial-complete), we prove that the offline algorithm runs in quadratic time and produces optimal results for DynDL's specific uses. Using a series of simulations and real-world executions, we show that our algorithms reduce data-processing time by 30% compared with algorithms that ignore dynamic data transfer costs. Moreover, they can adaptively adjust data locality based on the server's free time, data placement, and network bandwidth, and can schedule tens of thousands of tasks within seconds or even subseconds.
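
The key modelling choice, a per-server non-decreasing transfer-cost function, can be illustrated with a short sketch. This is a simplified Python outline under assumed cost shapes, not the DynDL algorithm itself: `remote_cost`, its linear growth, and the greedy placement loop are illustrative stand-ins.

```python
# Sketch: dynamic (non-decreasing) data transfer costs on multicore servers.
# Each extra data-remote task on a server raises that server's transfer cost,
# because all of its tasks share the server's network bandwidth.

def remote_cost(n_remote_tasks: int, base: float = 1.0) -> float:
    """Cost of adding one more data-remote task to a server.
    Illustrative linear shape; the model only requires non-decreasing cost."""
    return base * (1 + n_remote_tasks)

def schedule(tasks, servers):
    """Greedy sketch: place each task where its marginal cost is lowest.
    tasks: list of (task_id, server_holding_the_task's_input_data)
    servers: dict server -> count of data-remote tasks already assigned."""
    assignment = {}
    for task_id, data_home in tasks:
        best, best_cost = None, float("inf")
        for server, n_remote in servers.items():
            # Local execution pays no transfer; remote pays the dynamic cost.
            cost = 0.0 if server == data_home else remote_cost(n_remote)
            if cost < best_cost:
                best, best_cost = server, cost
        if best != data_home:
            servers[best] += 1  # one more remote task now shares the link
        assignment[task_id] = best
    return assignment

tasks = [("t1", "s1"), ("t2", "s1"), ("t3", "s2")]
print(schedule(tasks, {"s1": 0, "s2": 0}))
# -> {'t1': 's1', 't2': 's1', 't3': 's2'}
```

A flat-cost model would treat every remote placement identically; charging the marginal cost per remote task is what lets the scheduler spread data-remote work away from already congested servers.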


2021, Vol 12 (2), pp. 53-72
Author(s): Rojalina Priyadarshini, Rabindra Kumar Barik, Harish Chandra Dubey, Brojo Kishore Mishra

The growing use of wearables within the Internet of Things (IoT) creates ever-increasing multi-modal data from various smart health applications. The enormous volume of generated data creates new challenges in transmission, storage, and processing, and processing medical big data in a cloud backend raises further challenges such as communication latency and data security. Fog computing (FC) is an emerging distributed computing paradigm that addresses these problems by leveraging local data processing, storage, filtering, and machine intelligence within an intermediate fog layer that resides between the cloud and the wearable devices. This paper surveys two major aspects of deploying fog computing for smart and connected health. First, it investigates the role of machine-learning-based edge intelligence in the fog layer for data processing. The survey provides a comprehensive analysis, highlighting the strengths of, and possible improvements to, the existing literature. The paper ends with open challenges and future research areas in the domain of fog-based healthcare.
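
A common pattern in this line of work, filtering wearable data in the fog layer so that only salient events reach the cloud, might look like the following sketch. The threshold rule and function names are illustrative assumptions, standing in for the machine-learning models a real fog node would run.

```python
# Sketch of fog-layer filtering for wearable health data (illustrative only).
# The fog node processes readings locally and forwards only anomalies,
# cutting cloud-bound traffic and round-trip latency.

def is_anomalous(heart_rate_bpm: float) -> bool:
    """Stand-in for an edge ML model; here, a simple range check."""
    return not (40.0 <= heart_rate_bpm <= 120.0)

def fog_filter(readings):
    """Yield only the readings that warrant escalation to the cloud."""
    for timestamp, bpm in readings:
        if is_anomalous(bpm):
            yield timestamp, bpm  # would be sent upstream, e.g., over MQTT

stream = [(0, 72.0), (1, 160.0), (2, 75.0), (3, 38.0)]
print(list(fog_filter(stream)))  # -> [(1, 160.0), (3, 38.0)]
```

The design point is that normal readings never leave the fog layer, which is where the latency and bandwidth savings over a cloud-only pipeline come from.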


2019, Vol 5 (1), pp. 60-80
Author(s): Shlomi Dolev, Patricia Florissi, Ehud Gudes, Shantanu Sharma, Ido Singer

Author(s): Roman Čerešňák, Karol Matiaško, Adam Dudáš

The growth of the big data processing market has led to increasing overload of computation data centers and to changes in the methods used to store data, in the communication between computing units, and in the computational time needed to process or edit the data. Methods of distributed and parallel data processing have brought new problems related to data computation that need to be examined. Unlike conventional cloud services, a tight connection between the data and the computations is one of the main characteristics of big data services: computational tasks can be carried out only if the relevant data are available. Three factors influence the speed and efficiency of data processing: data duplication, data integrity, and data security. We are motivated to study the problems related to the growing time needed for data processing by optimizing these three factors in geographically distributed data centers.
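
Two of the three factors can be made concrete in one sketch: content hashing provides deduplication (identical blocks are stored once) and an integrity check at the same time. This is an illustrative Python outline, not the authors' method; the `BlockStore` class, the block granularity, and the replication targets are assumptions, and encryption (the security factor) is omitted.

```python
# Sketch: content-addressed blocks give deduplication and integrity checks
# before replication across geographically distributed data centers.
import hashlib

class BlockStore:
    def __init__(self):
        self.blocks = {}  # sha256 digest -> block bytes

    def put(self, data: bytes) -> str:
        digest = hashlib.sha256(data).hexdigest()
        self.blocks.setdefault(digest, data)  # duplicate blocks stored once
        return digest

    def verify(self, digest: str) -> bool:
        """Integrity check: stored bytes must still match their digest."""
        data = self.blocks.get(digest)
        return data is not None and hashlib.sha256(data).hexdigest() == digest

def replicate(digest: str, source: BlockStore, targets: list) -> None:
    """Copy a block to remote data centers only if it passes verification."""
    if not source.verify(digest):
        raise ValueError("corrupt block, refusing to replicate")
    for dc in targets:
        dc.put(source.blocks[digest])

dc_eu, dc_us = BlockStore(), BlockStore()
d = dc_eu.put(b"sensor batch 42")
dc_eu.put(b"sensor batch 42")        # deduplicated: stored only once
replicate(d, dc_eu, [dc_us])
print(dc_us.verify(d))  # -> True
```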


2019, Vol 12 (1), pp. 42
Author(s): Andrey I. Vlasov, Konstantin A. Muraviev, Alexandra A. Prudius, Demid A. Uzenkov
