Execution Time Reduction in Function Oriented Scientific Workflows

Acta Cybernetica ◽

10.14232/actacyb.288489 ◽

2021 ◽

Author(s):

Ali Al-Haboobi ◽

Gabor Kecskemeti

Keyword(s):

Distributed Systems ◽

Execution Time ◽

Large Scale ◽

Research Area ◽

Scientific Workflows ◽

Data Dependencies ◽

Original Algorithm ◽

State Management ◽

Intermediate Data ◽

Aws Lambda

Scientific workflows have been an increasingly important research area of distributed systems (such as cloud computing). Researchers have shown an increased interest in the automated processing of scientific applications such as workflows. Recently, Function as a Service (FaaS) has emerged as a novel distributed systems platform for processing non-interactive applications. FaaS has limitations in resource use (e.g., CPU and RAM) as well as state management. Despite these limitations, initial studies have already demonstrated using FaaS for processing scientific workflows. DEWE v3 executes workflows in this fashion, but it often suffers from duplicate data transfers while using FaaS. This behaviour is due to the handling of intermediate data dependencies before and after each function invocation. These data dependencies could fill the temporary storage of the function environment. Our approach alters the job dispatch algorithm of DEWE v3 to reduce data dependency transfers. The proposed algorithm schedules jobs with precedence requirements to primarily run in the same function invocation. We evaluate our proposed algorithm and the original algorithm with small- and large-scale Montage workflows. Our results show that the improved system can reduce the total workflow execution time of scientific workflows over DEWE v3 by about 10% when using AWS Lambda.
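The idea of running jobs with precedence requirements in the same function invocation can be illustrated with a minimal sketch. This is not the DEWE v3 dispatch algorithm; it only shows one simple way to batch linear parent-to-child chains of a workflow graph so that intermediate data could stay local to one invocation. The `parents` dictionary format and the function name `group_chains` are hypothetical, chosen for illustration.

```python
from collections import defaultdict


def group_chains(parents):
    """Batch jobs so that linear parent->child chains share one batch.

    parents: dict mapping each job name to a list of its parent job names
             (hypothetical input format; every job must appear as a key).
    Returns a list of batches, each a list of jobs in execution order.
    """
    # Invert the dependency map to find each job's children.
    children = defaultdict(list)
    for job, job_parents in parents.items():
        for parent in job_parents:
            children[parent].append(job)

    def is_chain_link(parent, child):
        # A link is part of a chain when the parent has exactly one child
        # and that child has exactly one parent.
        return len(children[parent]) == 1 and len(parents[child]) == 1

    batches = []
    for job, job_parents in parents.items():
        # Skip jobs that continue a chain; they are added by the chain head.
        if len(job_parents) == 1 and is_chain_link(job_parents[0], job):
            continue
        batch = [job]
        current = job
        # Follow the chain downwards while it stays linear.
        while len(children[current]) == 1 and is_chain_link(current, children[current][0]):
            current = children[current][0]
            batch.append(current)
        batches.append(batch)
    return batches


# Hypothetical workflow: a -> b, then b fans out to c and d, which join in e.
workflow = {"a": [], "b": ["a"], "c": ["b"], "d": ["b"], "e": ["c", "d"]}
batches = group_chains(workflow)
```

In this example only `a` and `b` form a linear chain, so they would share a batch; the fan-out and join jobs stay in batches of their own.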

Exploiting Sharing Join Opportunities in Big Data Multiquery Optimization with Flink

Complexity ◽

10.1155/2020/6617149 ◽

2020 ◽

Vol 2020 ◽

pp. 1-25

Author(s):

Xiao-Yan Gao ◽

Radhya Sahal ◽

Gui-Xiu Chen ◽

Mohammed H. Khafagy ◽

Fatma A. Omara

Keyword(s):

Big Data ◽

Execution Time ◽

Large Scale ◽

Query Execution ◽

Multiple Queries ◽

Intermediate Data ◽

Large Scale Data ◽

Join Queries ◽

Multiquery Optimization ◽

Data Granularity

Multiway join queries incur high-cost I/O operations over large-scale data. Exploiting sharing join opportunities among multiple multiway joins could be beneficial to reduce query execution time and shuffled intermediate data. Although multiway join optimization has been carried out in MapReduce, different design principles (i.e., in-memory Big Data platforms, Flink) are not considered. To bridge this gap, an end-to-end multiway join over Flink, which is called Join-MOTH system (J-MOTH), is proposed to exploit sharing data granularity, sharing join granularity, and sharing implicit sorts within multiple join queries. For sharing data, our previous work, Multiquery Optimization using Tuple Size and Histogram (MOTH) system, has been introduced to consider the granularity of sharing data opportunities among multiple queries. For sharing sort, our previous work, Sort-Based Optimizer for Big Data Multiquery (SOOM), has been introduced to consider the implicit sorts among join queries. For sharing join, additional modules have been tailored to the J-MOTH optimizer to optimize sharing work by exploiting shared pipelined multiway join among multiple multiway join queries. The experimental evaluation has demonstrated that the J-MOTH system outperforms the naive and the state-of-the-art techniques by 44% for query execution time using TPC-H queries. Also, the proposed J-MOTH system introduces maximal intermediate data size reduction by 30% on average over Hadoop-like infrastructures.

An Integrated Specification and Verification Environment for Component-Based Architectures of Large-Scale Distributed Systems

10.21236/ada501823 ◽

2009 ◽

Cited By ~ 1

Author(s):

John Hatcliff ◽

Torben Amtoft ◽

Anindya Banerjee

Keyword(s):

Distributed Systems ◽

Large Scale ◽

Specification And Verification

State Management for Cloud-Native Applications

Electronics ◽

10.3390/electronics10040423 ◽

2021 ◽

Vol 10 (4) ◽

pp. 423

Author(s):

Márk Szalay ◽

Péter Mátray ◽

László Toka

Keyword(s):

Large Scale ◽

Distributed Databases ◽

Access Time ◽

Replica Placement ◽

State Management ◽

Placement Decisions ◽

Dynamic Replication ◽

Cloud Databases ◽

Placement Algorithm ◽

The One

The stateless cloud-native design improves the elasticity and reliability of applications running in the cloud. The design decouples the life-cycle of application states from that of application instances; states are written to and read from cloud databases, and deployed close to the application code to ensure low latency bounds on state access. However, the scalability of applications brings the well-known limitations of distributed databases, in which the states are stored. In this paper, we propose a full-fledged state layer that supports the stateless cloud application design. In order to minimize the inter-host communication due to state externalization, we propose, on the one hand, a system design jointly with a data placement algorithm that places functions’ states across the hosts of a data center. On the other hand, we design a dynamic replication module that decides the proper number of copies for each state to ensure a sweet spot in short state-access time and low network traffic. We evaluate the proposed methods across realistic scenarios. We show that our solution yields state-access delays close to the optimal, and ensures fast replica placement decisions in large-scale settings.

Workshop on large-scale distributed systems for information retrieval

ACM SIGIR Forum ◽

10.1145/1328964.1328979 ◽

2007 ◽

Vol 41 (2) ◽

pp. 83-88

Author(s):

Flavio P. Junqueira ◽

Vassilis Plachouras ◽

Fabrizio Silvestri ◽

Ivana Podnar

Keyword(s):

Information Retrieval ◽

Distributed Systems ◽

Large Scale

Distributed learning with indefinite kernels

Analysis and Applications ◽

10.1142/s021953051850032x ◽

2019 ◽

Vol 17 (06) ◽

pp. 947-975 ◽

Cited By ~ 2

Author(s):

Lei Shi

Keyword(s):

Large Scale ◽

Substantial Reduction ◽

Computation Time ◽

Distributed Learning ◽

Rates Of Convergence ◽

Regression Problem ◽

Data Set ◽

Regularization Scheme ◽

Original Algorithm ◽

Indefinite Kernel

We investigate distributed learning with a coefficient-based regularization scheme under the framework of kernel regression methods. Compared with the classical kernel ridge regression (KRR), the algorithm under consideration does not require the kernel function to be positive semi-definite and hence provides a simple paradigm for designing indefinite kernel methods. The distributed learning approach partitions a massive data set into several disjoint data subsets, and then produces a global estimator by taking an average of the local estimator on each data subset. Easily exercisable partitions and performing the algorithm on each subset in parallel lead to a substantial reduction in computation time versus the standard approach of performing the original algorithm on the entire sample. We establish the first mini-max optimal rates of convergence for the distributed coefficient-based regularization scheme with indefinite kernels. We thus demonstrate that compared with distributed KRR, the concerned algorithm is more flexible and effective in regression problems for large-scale data sets.
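The partition-and-average step described in this abstract can be sketched in a few lines. As a loud assumption, the local estimator below is a plain one-dimensional ridge regression without intercept, swapped in for brevity; it is not the coefficient-based regularization scheme with indefinite kernels that the paper analyses. The function names and the striding partition are hypothetical, and the sketch assumes the number of subsets `m` does not exceed the number of samples.

```python
def local_ridge(xs, ys, lam):
    """One-dimensional ridge estimate without intercept: sum(x*y) / (sum(x*x) + lam)."""
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    return sxy / (sxx + lam)


def partition(items, m):
    """Split items into m disjoint subsets by striding (illustrative choice)."""
    return [items[i::m] for i in range(m)]


def distributed_estimate(xs, ys, m, lam=0.0):
    """Average the local estimators computed on m disjoint data subsets."""
    pairs = list(zip(xs, ys))
    local_estimates = []
    for subset in partition(pairs, m):
        sub_x = [x for x, _ in subset]
        sub_y = [y for _, y in subset]
        # Each local fit only sees its own subset, so the fits are
        # independent and could run in parallel.
        local_estimates.append(local_ridge(sub_x, sub_y, lam))
    return sum(local_estimates) / len(local_estimates)


# Noise-free toy data with slope 2, split into three subsets.
xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
ys = [2.0 * x for x in xs]
slope = distributed_estimate(xs, ys, m=3)
```

Because each local fit touches only its own subset, the per-subset cost shrinks with the subset size, which is the source of the computation-time reduction the abstract mentions.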

Advances in Water Treatment Application of Sepiolite Mineral Materials

Advanced Materials Research ◽

10.4028/www.scientific.net/amr.710.217 ◽

2013 ◽

Vol 710 ◽

pp. 217-220 ◽

Cited By ~ 1

Author(s):

Fei Wang ◽

Lei Feng ◽

Meng Ran Tang ◽

Ji Yuan Li ◽

Qing Guo Tang

Keyword(s):

Water Treatment ◽

Size Effect ◽

Large Scale ◽

Surface Effect ◽

Development Trend ◽

High Energy ◽

Research Area ◽

Group Mineral ◽

Active Research ◽

Treatment Application

Synthetic nanomaterials have the disadvantages of large-scale investment, high energy consumption, complex production processes and heavy environmental load. Mineral nanomaterials such as sepiolite group mineral nanomaterials are characterized by the small size effect, quantum size effect and surface effect. Water treatment application of sepiolite group mineral nanomaterials has become an active research area and has shown good development and application prospects. For these reasons, this paper systematically summarizes the water treatment application of sepiolite group mineral nanomaterials and also proposes development trends for this application.

Measuring large-scale distributed systems: case of BitTorrent Mainline DHT

IEEE P2P 2013 Proceedings ◽

10.1109/p2p.2013.6688697 ◽

2013 ◽

Cited By ~ 23

Author(s):

Liang Wang ◽

Jussi Kangasharju

Keyword(s):

Distributed Systems ◽

Large Scale

Towards Scalable Simulation of Large Scale Distributed Systems

2009 International Conference on Network-Based Information Systems ◽

10.1109/nbis.2009.46 ◽

2009 ◽

Cited By ~ 1

Author(s):

Ciprian Dobre ◽

Florin Pop ◽

Valentin Cristea

Keyword(s):

Distributed Systems ◽

Large Scale

Wavelet analysis of the temporal-spatial distribution in the Eurasia seismic belt

International Journal of Wavelets Multiresolution and Information Processing ◽

10.1142/s0219691317500187 ◽

2017 ◽

Vol 15 (03) ◽

pp. 1750018 ◽

Cited By ~ 4

Author(s):

Wenfeng Zheng ◽

Xiaolu Li ◽

Lirong Yin ◽

Zhengtong Yin ◽

Bo Yang ◽

...

Keyword(s):

Fault Zone ◽

Compact Support ◽

Large Scale ◽

Temporal Distribution ◽

Seismic Energy ◽

Research Area ◽

Earthquake Activity ◽

Seismic Zone ◽

Seismic Belt ◽

Earthquake Fault

Due to the growing frequency of earthquakes, the safety of human lives and property is facing serious threats. However, there is little research on the spatial-temporal distribution of earthquakes. In this paper, we use a wavelet model to analyze the spatial-temporal distribution of earthquakes. Because the spatial-temporal distribution of earthquake activity is closely related to the distribution of the earthquake fault zone, we analyze large-scale earthquake clusters by selecting the Eurasia seismic belt and the surrounding region as the research area. From the perspective of the time domain, the results show that the seismic energy of the earthquake fault zone presents a compact support or similar compact support distribution, suggesting that the seismic zone has a relatively quiet period and an active stage. This indicates that the seismic zone is periodic. The periods of above-normal and below-normal strong earthquake activity vary over time. Earthquake cycles also differ across regions and across geological and geographical environments.

Decentralized adaptive replica location mechanism in large-scale distributed systems

Proceedings of the 8th International Scientific and Practical Conference of Students, Post-graduates and Young Scientists. Modern Technique and Technologies. MTT'2002 (Cat. No.02EX550) ◽

10.1109/pdcat.2003.1236402 ◽

2004 ◽

Author(s):

Dongsheng Li ◽

Xicheng Lu ◽

Yijie Wang ◽

Kai Lu ◽

Nong Xiao

Keyword(s):

Distributed Systems ◽

Large Scale ◽

Replica Location
