Execution Time Reduction in Function Oriented Scientific Workflows

2021 ◽  
Author(s):  
Ali Al-Haboobi ◽  
Gabor Kecskemeti

Scientific workflows have become an increasingly important research area in distributed systems (such as cloud computing), and researchers have shown growing interest in the automated processing of scientific applications such as workflows. Recently, Function as a Service (FaaS) has emerged as a novel distributed systems platform for processing non-interactive applications. FaaS has limitations in resource use (e.g., CPU and RAM) as well as in state management. Despite these limitations, initial studies have already demonstrated the use of FaaS for processing scientific workflows. DEWE v3 executes workflows in this fashion, but it often suffers from duplicate data transfers while using FaaS. This behaviour stems from the handling of intermediate data dependencies before and after each function invocation. These data dependencies can also fill the temporary storage of the function environment. Our approach alters the job dispatch algorithm of DEWE v3 to reduce data dependency transfers. The proposed algorithm schedules jobs with precedence requirements to run primarily in the same function invocation. We evaluate the proposed algorithm and the original algorithm with small- and large-scale Montage workflows. Our results show that the improved system reduces the total workflow execution time over DEWE v3 by about 10% when using AWS Lambda.
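The idea of keeping jobs with precedence requirements inside one function invocation, so that their intermediate files need not be transferred between invocations, can be illustrated with a minimal sketch. This is not the DEWE v3 dispatch algorithm or the authors' implementation: the function name `group_jobs`, the chain-merging rule (a job joins its parent's group only when it is that parent's sole child and has no other parent), and the job names are all assumptions made for illustration.

```python
from collections import defaultdict


def group_jobs(parents):
    """Group workflow jobs into hypothetical function invocations.

    `parents` maps every job name to the list of jobs it depends on.
    A job is placed in its parent's group when it has exactly one parent
    and that parent has exactly one child, so linear chains share a group.
    Returns a list of groups, each a list of jobs in execution order.
    """
    # Build the reverse mapping: job -> jobs that depend on it.
    children = defaultdict(list)
    for job, deps in parents.items():
        for dep in deps:
            children[dep].append(job)

    # Kahn's algorithm gives an order in which parents precede children.
    indegree = {job: len(deps) for job, deps in parents.items()}
    ready = sorted(job for job, degree in indegree.items() if degree == 0)
    order = []
    while ready:
        job = ready.pop(0)
        order.append(job)
        for child in sorted(children[job]):
            indegree[child] -= 1
            if indegree[child] == 0:
                ready.append(child)

    # Merge each linear chain into the group of its first job.
    group_of = {}
    groups = []
    for job in order:
        deps = parents[job]
        if len(deps) == 1 and len(children[deps[0]]) == 1:
            index = group_of[deps[0]]
        else:
            index = len(groups)
            groups.append([])
        groups[index].append(job)
        group_of[job] = index
    return groups


# Hypothetical five-job workflow; the job names are illustrative only.
workflow = {
    "project1": [],
    "project2": [],
    "diff": ["project1", "project2"],
    "fit": ["diff"],
    "concat": ["fit"],
}
print(group_jobs(workflow))
# [['project1'], ['project2'], ['diff', 'fit', 'concat']]
```

In this toy workflow the chain `diff`, `fit`, `concat` shares one group, so only the two edges into `diff` would cross invocation boundaries instead of all four dependency edges.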

Complexity ◽  
2020 ◽  
Vol 2020 ◽  
pp. 1-25
Author(s):  
Xiao-Yan Gao ◽  
Radhya Sahal ◽  
Gui-Xiu Chen ◽  
Mohammed H. Khafagy ◽  
Fatma A. Omara

Multiway join queries incur high-cost I/O operations over large-scale data. Exploiting sharing opportunities among multiple multiway joins can reduce query execution time and the volume of shuffled intermediate data. Although multiway join optimization has been carried out in MapReduce, platforms with different design principles (i.e., in-memory Big Data platforms such as Flink) have not been considered. To bridge this gap, an end-to-end multiway join system over Flink, called Join-MOTH (J-MOTH), is proposed to exploit sharing data granularity, sharing join granularity, and sharing implicit sorts within multiple join queries. For sharing data, our previous work, the Multiquery Optimization using Tuple Size and Histogram (MOTH) system, considers the granularity of data-sharing opportunities among multiple queries. For sharing sorts, our previous work, the Sort-Based Optimizer for Big Data Multiquery (SOOM), considers the implicit sorts among join queries. For sharing joins, additional modules have been added to the J-MOTH optimizer to exploit shared pipelined multiway joins among multiple multiway join queries. The experimental evaluation demonstrates that the J-MOTH system outperforms the naive and state-of-the-art techniques by 44% in query execution time using TPC-H queries. In addition, the proposed J-MOTH system achieves a maximal intermediate data size reduction of 30% on average over Hadoop-like infrastructures.


Electronics ◽  
2021 ◽  
Vol 10 (4) ◽  
pp. 423
Author(s):  
Márk Szalay ◽  
Péter Mátray ◽  
László Toka

The stateless cloud-native design improves the elasticity and reliability of applications running in the cloud. The design decouples the life-cycle of application states from that of application instances; states are written to and read from cloud databases, and deployed close to the application code to ensure low latency bounds on state access. However, scaling such applications exposes the well-known limitations of the distributed databases in which the states are stored. In this paper, we propose a full-fledged state layer that supports the stateless cloud application design. To minimize the inter-host communication caused by state externalization, we propose, on the one hand, a system design together with a data placement algorithm that places functions’ states across the hosts of a data center. On the other hand, we design a dynamic replication module that decides the proper number of copies for each state to strike a balance between short state-access time and low network traffic. We evaluate the proposed methods across realistic scenarios. We show that our solution yields state-access delays close to the optimal, and ensures fast replica placement decisions in large-scale settings.


2007 ◽  
Vol 41 (2) ◽  
pp. 83-88
Author(s):  
Flavio P. Junqueira ◽  
Vassilis Plachouras ◽  
Fabrizio Silvestri ◽  
Ivana Podnar

2019 ◽  
Vol 17 (06) ◽  
pp. 947-975 ◽  
Author(s):  
Lei Shi

We investigate distributed learning with a coefficient-based regularization scheme under the framework of kernel regression methods. Compared with classical kernel ridge regression (KRR), the algorithm under consideration does not require the kernel function to be positive semi-definite and hence provides a simple paradigm for designing indefinite kernel methods. The distributed learning approach partitions a massive data set into several disjoint data subsets, and then produces a global estimator by averaging the local estimators computed on each data subset. Because the partitions are easy to construct and the algorithm runs on each subset in parallel, computation time is substantially reduced compared with the standard approach of running the original algorithm on the entire sample. We establish the first minimax optimal rates of convergence for the distributed coefficient-based regularization scheme with indefinite kernels. We thus demonstrate that, compared with distributed KRR, the algorithm is more flexible and effective for regression problems on large-scale data sets.
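The partition-and-average scheme described above can be sketched in a few lines. The sketch assumes an ℓ2 penalty on the coefficients, so each local estimator solves the normal equations (KᵀK + λnI)a = Kᵀy, which are well-posed for any kernel matrix K, positive semi-definite or not. The function names, the Gaussian kernel, the strided partitioning and the parameter values are illustrative assumptions, not details taken from the paper.

```python
import math


def gaussian_kernel(x, z, width=0.5):
    """Illustrative kernel; an indefinite kernel could be passed instead."""
    return math.exp(-((x - z) ** 2) / (2 * width ** 2))


def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x


def fit_local(xs, ys, lam, kernel):
    """Coefficient-regularized least squares on one data subset.

    Minimizes (1/n) * sum_j (sum_i a_i K(x_j, x_i) - y_j)^2 + lam * sum_i a_i^2
    by solving (K^T K + lam * n * I) a = K^T y.
    """
    n = len(xs)
    k = [[kernel(xj, xi) for xi in xs] for xj in xs]
    lhs = [
        [sum(k[j][p] * k[j][q] for j in range(n)) + (lam * n if p == q else 0.0)
         for q in range(n)]
        for p in range(n)
    ]
    rhs = [sum(k[j][p] * ys[j] for j in range(n)) for p in range(n)]
    alpha = solve(lhs, rhs)
    return lambda x: sum(a * kernel(x, xi) for a, xi in zip(alpha, xs))


def fit_distributed(xs, ys, parts, lam, kernel=gaussian_kernel):
    """Split the data into `parts` disjoint subsets and average the local estimators."""
    local = [fit_local(xs[p::parts], ys[p::parts], lam, kernel) for p in range(parts)]
    return lambda x: sum(f(x) for f in local) / len(local)


# Toy usage on synthetic data; the values are illustrative only.
xs = [i / 39 for i in range(40)]
ys = [math.sin(2 * math.pi * x) for x in xs]
estimator = fit_distributed(xs, ys, parts=4, lam=1e-4)
print(estimator(0.25))
```

Each call to `fit_local` is independent of the others, so in a real deployment the local fits could run in parallel on separate workers, with only the local predictions (or coefficients) sent back for averaging.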


2013 ◽  
Vol 710 ◽  
pp. 217-220 ◽  
Author(s):  
Fei Wang ◽  
Lei Feng ◽  
Meng Ran Tang ◽  
Ji Yuan Li ◽  
Qing Guo Tang

Synthetic nanomaterials have the disadvantages of large-scale investment, high energy consumption, complex production processes and heavy environmental load. Mineral nanomaterials such as sepiolite group mineral nanomaterials are characterized by the small size effect, the quantum size effect and the surface effect. The application of sepiolite group mineral nanomaterials to water treatment has become an active research area and shows good development and application prospects. For these reasons, this paper systematically summarizes the water treatment applications of sepiolite group mineral nanomaterials and proposes development trends for these applications.


Author(s):  
Wenfeng Zheng ◽  
Xiaolu Li ◽  
Lirong Yin ◽  
Zhengtong Yin ◽  
Bo Yang ◽  
...  

Due to the growing frequency of earthquakes, the safety of human lives and property faces serious threats. However, research on the spatial-temporal distribution of earthquakes is scarce. In this paper, we use a wavelet model to analyze the spatial-temporal distribution of earthquakes. Because the spatial-temporal distribution of earthquake activity is closely related to the distribution of earthquake fault zones, we analyze large-scale earthquake clusters by selecting the Eurasia seismic belt and the surrounding region as the research area. From the perspective of the time domain, the results show that the seismic energy of the earthquake fault zone exhibits a compact-support or near-compact-support distribution, suggesting that the seismic zone has relatively quiet periods and active stages. This indicates that the seismic zone is periodic. The periods of above-normal and below-normal strong earthquake activity vary over time. The cycles of earthquakes also differ between regions and between geological and geographical environments.

