Intra-Tile Parallelization for Two-Level Perfectly Nested Loops With Non-Uniform Dependences

2020
Author(s): Zahra Abdi Reyhan, Shahriar Lotfi, Ayaz Isazadeh, Jaber Karimpour

Abstract: Most important scientific and engineering applications involve complex computations or large data sets, and a huge share of their running time is spent in nested loops. Loops are therefore the main target when parallelizing scientific and engineering programs. Many parallelizing compilers focus on nested loops with uniform dependences, while the parallelization of nested loops with non-uniform dependences has not been investigated as extensively. This paper addresses the problem of parallelizing two-level nested loops with non-uniform dependences. The aim is to minimize execution time by improving load balancing and reducing inter-processor communication. We propose a new tiling algorithm, k-StepIntraTiling, which uses the bin-packing problem to minimize execution time. We demonstrate the effectiveness of the proposed method in several experiments. Simulation and experimental results show that the algorithm reduces the total execution time of several benchmarks compared with other tiling methods.
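
The abstract does not spell out the steps of k-StepIntraTiling, so the following Python sketch only illustrates the general idea it builds on: treat the tiles of a two-level loop nest as items whose weights must be balanced across processors. The tile sizes, the iteration-count cost model, and the greedy longest-processing-time packing are assumptions made for this example, not the authors' method.

```python
# Illustrative sketch only; not the k-StepIntraTiling algorithm from the paper.
# Tiles of a two-level iteration space are treated as weighted items and
# distributed over processors with a greedy longest-processing-time packing,
# so that per-processor work (and hence load imbalance) stays small.
# The tile sizes and the cost model are hypothetical.

from typing import List, Tuple

Tile = Tuple[int, int, int, int]  # (lo1, hi1, lo2, hi2)

def tile_iteration_space(n1: int, n2: int, t1: int, t2: int) -> List[Tile]:
    """Split an n1 x n2 iteration space into t1 x t2 rectangular tiles."""
    tiles = []
    for i in range(0, n1, t1):
        for j in range(0, n2, t2):
            tiles.append((i, min(i + t1, n1), j, min(j + t2, n2)))
    return tiles

def workload(tile: Tile) -> int:
    """Hypothetical cost model: number of iterations inside the tile."""
    lo1, hi1, lo2, hi2 = tile
    return (hi1 - lo1) * (hi2 - lo2)

def pack_tiles(tiles: List[Tile], num_procs: int):
    """Greedy longest-processing-time packing: the heaviest remaining tile
    always goes to the currently least-loaded processor."""
    bins = [{"load": 0, "tiles": []} for _ in range(num_procs)]
    for tile in sorted(tiles, key=workload, reverse=True):
        target = min(bins, key=lambda b: b["load"])
        target["tiles"].append(tile)
        target["load"] += workload(tile)
    return bins

if __name__ == "__main__":
    tiles = tile_iteration_space(100, 100, 16, 16)
    for p, b in enumerate(pack_tiles(tiles, 4)):
        print(f"processor {p}: {len(b['tiles'])} tiles, load {b['load']}")
```

With uniform tiles the loads come out trivially equal; the packing step matters when non-uniform dependences force tiles of different sizes or costs, which is the situation the paper targets.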

2021 · Vol 11 (3) · pp. 72-91
Author(s): Priyanka H., Mary Cherian

Cloud computing has become increasingly prominent and is widely deployed in large data centers, where the efficient distribution of resources (bandwidth, CPU, and memory) is a major problem. A genetically enhanced shuffling frog leaping algorithm (GESFLA) framework is proposed to select optimal virtual machines (VMs) for scheduling tasks and to place them on physical machines (PMs). The proposed GESFLA-based resource allocation technique minimizes wasted resources and reduces the power consumption of the data center. GESFLA is compared with task-based particle swarm optimization (TBPSO) for efficiency. The experimental results show the superiority of GESFLA over TBPSO in terms of resource usage ratio, migration time, and total execution time. For PlanetLab workload traces, the proposed framework reduces the energy consumption of the data center by up to 79%, reduces migration time by 67%, and improves CPU utilization by 9%. For a random workload, execution time is reduced by 71%, transfer time is reduced by up to 99%, and CPU utilization is improved by 17% compared with TBPSO.
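
As an illustration only (not the authors' GESFLA implementation, which the abstract does not detail), the Python sketch below shows the shuffled frog leaping structure (rank the population, partition it into memeplexes, improve each memeplex's worst member toward its best) combined with a genetic-style crossover and mutation step, applied to VM-to-PM placement. The PM capacities, VM demands, and fitness weights are all hypothetical.

```python
# Minimal sketch, not the authors' GESFLA. It combines the shuffled frog
# leaping structure (ranked population split into memeplexes, worst frog
# improved toward the memeplex leader) with a genetic-style crossover and
# mutation step. A "frog" is a VM-to-PM placement; capacities, demands, and
# fitness weights are hypothetical.

import random

PM_CPU, PM_MEM = 100, 100                       # hypothetical PM capacity
VM_DEMANDS = [(random.randint(5, 30), random.randint(5, 30)) for _ in range(20)]
NUM_PMS, POP_SIZE, NUM_MEMEPLEXES, ITERATIONS = 6, 30, 3, 50

def fitness(placement):
    """Lower is better: penalize overloaded PMs, count active PMs as a power proxy."""
    cpu, mem = [0] * NUM_PMS, [0] * NUM_PMS
    for vm, pm in enumerate(placement):
        cpu[pm] += VM_DEMANDS[vm][0]
        mem[pm] += VM_DEMANDS[vm][1]
    overload = sum(max(0, c - PM_CPU) + max(0, m - PM_MEM) for c, m in zip(cpu, mem))
    active_pms = sum(1 for c in cpu if c > 0)
    return 10 * overload + active_pms

def random_frog():
    return [random.randrange(NUM_PMS) for _ in VM_DEMANDS]

population = [random_frog() for _ in range(POP_SIZE)]
for _ in range(ITERATIONS):
    population.sort(key=fitness)                            # rank frogs, best first
    memeplexes = [population[i::NUM_MEMEPLEXES] for i in range(NUM_MEMEPLEXES)]
    for plex in memeplexes:
        leader, worst = plex[0], plex[-1]
        # Genetic-style move: uniform crossover with the leader, small mutation.
        child = [l if random.random() < 0.5 else w for l, w in zip(leader, worst)]
        if random.random() < 0.2:
            child[random.randrange(len(child))] = random.randrange(NUM_PMS)
        if fitness(child) < fitness(worst):                 # keep only improvements
            plex[-1] = child
    population = [frog for plex in memeplexes for frog in plex]   # shuffle back

best = min(population, key=fitness)
print("best fitness:", fitness(best))
```

A real framework would also have to reflect the metrics reported above (migration time, energy, CPU utilization) in its fitness function; here they are collapsed into a crude overload-plus-active-PM score for brevity.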


2008 · Vol 45 (4) · pp. 922-939
Author(s): Devavrat Shah, John N. Tsitsiklis

We study the best achievable performance (in terms of the average queue size and delay) in a stochastic and dynamic version of the bin-packing problem. Items arrive to a queue according to a Poisson process with rate 2ρ, where ρ ∈ (0, 1). The item sizes are independent and identically distributed (i.i.d.) with a uniform distribution in [0, 1]. At each time unit, a single unit-size bin is available and can receive any of the queued items, as long as their total size does not exceed 1. Coffman and Stolyar (1999) and Gamarnik (2004) have established that there exist packing policies under which the average queue size is finite for every ρ ∈ (0, 1). In this paper we study the precise scaling of the average queue size, as a function of ρ, with emphasis on the critical regime where ρ approaches 1. Standard results on the probabilistic (but static) bin-packing problem can be readily applied to produce policies under which the queue size scales as O(h^2), where h = 1 / (1 - ρ), which raises the question of whether this is the best possible. We establish that the average queue size scales as Ω(h log h) under any policy. Furthermore, we provide an easily implementable policy, which packs at most two items per bin. Under that policy, the average queue size scales as O(h log^{3/2} h), which is nearly optimal. On the other hand, if we impose the additional requirement that any two items packed together must have near-complementary sizes (in a sense to be made precise), we show that the average queue size must scale as Θ(h^2).
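
A small discrete-time simulation helps make the model concrete. The Python sketch below generates Poisson(2ρ) arrivals with Uniform(0, 1) sizes in each time unit and fills the one available bin with at most two items using a simple "largest item plus best-fitting partner" rule. It mirrors the flavour of the two-item policies discussed above, but it is not the paper's nearly optimal construction, and the printed averages are only simulation estimates.

```python
# Illustrative simulation of the queueing model, not the paper's analysis.
# Each time unit, Poisson(2*rho) items with Uniform(0, 1) sizes join the queue
# and one unit-size bin departs, filled with at most two items: the largest
# queued item plus the largest remaining item that still fits with it. This is
# a simple stand-in policy, not the nearly optimal one from the paper.

import math
import random

def poisson(rng: random.Random, lam: float) -> int:
    """Knuth's method for sampling a Poisson(lam) random variable."""
    threshold, k, p = math.exp(-lam), 0, 1.0
    while True:
        p *= rng.random()
        if p <= threshold:
            return k
        k += 1

def average_queue_size(rho: float, horizon: int = 50_000, seed: int = 0) -> float:
    rng = random.Random(seed)
    queue = []                                  # sizes of waiting items
    total = 0
    for _ in range(horizon):
        for _ in range(poisson(rng, 2 * rho)):  # arrivals in this time unit
            queue.append(rng.random())
        if queue:                               # pack the single bin for this time unit
            queue.sort(reverse=True)
            first = queue.pop(0)                # largest queued item
            for i, size in enumerate(queue):
                if first + size <= 1.0:         # largest partner that still fits
                    queue.pop(i)
                    break
        total += len(queue)
    return total / horizon

if __name__ == "__main__":
    for rho in (0.8, 0.9, 0.95):
        print(f"rho = {rho}: average queue size ~ {average_queue_size(rho):.1f}")
```

Comparing the measured averages against h = 1 / (1 - ρ) for a few values of ρ gives a rough feel for how far such a simple rule sits from the Ω(h log h) lower bound.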


1988 · Vol 11 (1) · pp. 1-19
Author(s): Andrzej Rowicki

The purpose of the paper is to present an algorithm for preemptive scheduling on two-processor systems with identical processors. Computations submitted to the system consist of dependent tasks with arbitrary execution times; the task graph contains no loops and has exactly one output. We assume that preemption times are completely unconstrained and that preemptions consume no time. The algorithm also determines the total execution time of the computation. It has been proved that the algorithm is optimal, that is, the total execution time of the computation (the schedule length) is minimized.
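
The abstract does not describe the algorithm itself, so the sketch below is only a discretized version of the classical level rule (in the style of Muntz and Coffman) for two identical processors: at each small time step, run the two ready tasks whose remaining longest path to the single output task is largest. The example task graph and execution times are hypothetical, and the fixed time step only approximates the processor sharing that the exact rule uses.

```python
# Hedged sketch: a discretized level-scheduling rule for two identical
# processors, not the algorithm from the paper. At every step of length dt,
# the two ready tasks with the largest remaining longest path to the single
# output task are executed. The DAG and execution times are hypothetical.

from collections import defaultdict

EXEC_TIME = {"a": 3.0, "b": 2.0, "c": 2.5, "d": 1.0, "out": 2.0}
SUCC = {"a": ["c", "d"], "b": ["d"], "c": ["out"], "d": ["out"], "out": []}
PRED = defaultdict(list)
for task, successors in SUCC.items():
    for s in successors:
        PRED[s].append(task)

def level(task, remaining):
    """Remaining work on the longest path from `task` to the output, inclusive."""
    tail = max((level(s, remaining) for s in SUCC[task]), default=0.0)
    return remaining[task] + tail

def schedule_length(dt: float = 0.01) -> float:
    remaining = dict(EXEC_TIME)
    t = 0.0
    while any(r > 1e-9 for r in remaining.values()):
        ready = [u for u, r in remaining.items()
                 if r > 1e-9 and all(remaining[p] <= 1e-9 for p in PRED[u])]
        ready.sort(key=lambda u: level(u, remaining), reverse=True)
        for u in ready[:2]:                       # two identical processors
            remaining[u] = max(0.0, remaining[u] - dt)
        t += dt
    return t

print("approximate schedule length:", round(schedule_length(), 2))
```

For this small graph the rule attains the critical-path lower bound of 7.5 time units, but in general the discretization is only an approximation, and the optimality claim above belongs to the paper's exact algorithm, not to this sketch.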

