Exact and Heuristic Data Workflow Placement Algorithms for Big Data Computing in Cloud Datacenters

2018 ◽  
Vol 19 (3) ◽  
pp. 223-244
Author(s):  
Sonia Ikken ◽  
Eric Renault ◽  
Abdelkamel Tari ◽  
Tahar Kechadi

Several big data-driven applications are currently carried out collaboratively on distributed infrastructure. These applications usually deal with experiments at massive scale. The data generated by such experiments are huge and are stored at multiple geographic locations for reuse. Workflow systems, composed of jobs using collaborative task-based models, present new dependency and data exchange needs. This gives rise to new issues when selecting distributed data and storage resources so that applications execute on time and resource usage is cost-efficient. In this paper, we present an efficient data placement approach to improve the performance of workflow processing in distributed data centres. The proposed approach involves two types of data: splittable and unsplittable intermediate data. Moreover, we place intermediate data by considering not only their source location but also their dependencies. The main objective is to minimise the total storage cost, including the effort for transferring, storing, and moving the data according to the applications' needs. We first propose an exact algorithm which takes into account the intra-job dependencies, and we show that the optimal fractional intermediate data placement problem is NP-hard. To solve the problem of unsplittable intermediate data placement, we propose a greedy heuristic algorithm based on a network flow optimisation framework. The experimental results show that the performance of our approach is very promising. We also show that, even under divergent conditions, the cost ratio of the heuristic approach is close to the optimal solution.

Sensors ◽  
2019 ◽  
Vol 19 (12) ◽  
pp. 2772 ◽  
Author(s):  
Aguinaldo Bezerra ◽  
Ivanovitch Silva ◽  
Luiz Affonso Guedes ◽  
Diego Silva ◽  
Gustavo Leitão ◽  
...  

Alarm and event logs are an immense but latent source of knowledge commonly undervalued in industry. However, the current landscape of massive data exchange, high efficiency and strong competitiveness, boosted by the Industry 4.0 and IIoT (Industrial Internet of Things) paradigms, does not accommodate such misuse of data and demands more incisive approaches to analyzing industrial data. Advances in Data Science and Big Data (or more precisely, Industrial Big Data) have been enabling novel approaches in data analysis which can be great allies in extracting hitherto hidden information from plant operation data. In response, this work proposes the use of Exploratory Data Analysis (EDA) as a promising data-driven approach to pave the way for industrial alarm and event analysis. This approach proved able to increase industrial perception by extracting insights and valuable information from real-world industrial data without making prior assumptions.


2018 ◽  
Vol 8 (1) ◽  
pp. 16-35 ◽  
Author(s):  
Mohammadhossein Barkhordari ◽  
Mahdi Niamanesh

When working with a high volume of information that grows exponentially, the authors confront big data. This huge amount of information makes big data retrieval and analytics important issues. There have been many attempts to solve data analytics problems using distributed platforms, but the main problem with the proposed methods is that they do not observe data locality. In this article, a MapReduce-based method called Hengam is proposed. In this method, data format unification helps nodes to have data independence. The unified format increases information retrieval speed and prevents data exchange between nodes. The proposed method was evaluated using data items from an ICT company, and the information retrieval time was much better than that of other open-source distributed data warehouse software.


2017 ◽  
Vol 3 (1) ◽  
Author(s):  
Luca M. Ghiringhelli ◽  
Christian Carbogno ◽  
Sergey Levchenko ◽  
Fawzi Mohamed ◽  
Georg Huhs ◽  
...  

Energies ◽  
2020 ◽  
Vol 13 (7) ◽  
pp. 1555 ◽  
Author(s):  
Vangelis Marinakis

European buildings are producing a massive amount of data from a wide spectrum of energy-related sources, such as smart meters, sensors and other Internet of Things devices, creating new research challenges. In this context, the aim of this paper is to present a high-level data-driven architecture for building data exchange, management and real-time processing. This multi-disciplinary big data environment enables the integration of cross-domain data, combined with emerging artificial intelligence algorithms and distributed ledger technology. Semantically enhanced, interlinked and multilingual repositories of heterogeneous types of data are coupled with a set of visualization, querying and exploration tools, suitable application programming interfaces (APIs) for data exchange, and a suite of configurable and ready-to-use analytical components that implement a series of advanced machine learning and deep learning algorithms. The results from the pilot application of the proposed framework are presented and discussed. The data-driven architecture enables reliable and effective policymaking and supports the creation and exploitation of innovative energy efficiency services through the use of a wide variety of data, for the effective operation of buildings.

