A Formal Model of Parallel Execution on Multicore Architectures with Multilevel Caches

Energy Efficiency Evaluation of Parallel Execution of DEVS Models in Multicore Architectures

2020 Winter Simulation Conference (WSC) ◽

10.1109/wsc48552.2020.9384117 ◽

2020 ◽

Author(s):

Guillermo G. Trabes ◽

Veronica Gil Costa ◽

Gabriel A. Wainer

Keyword(s):

Energy Efficiency ◽

Parallel Execution ◽

Efficiency Evaluation ◽

Multicore Architectures ◽

Energy Efficiency Evaluation

Parallelization of a Commercial Streamline Simulator and Performance on Practical Models

SPE Reservoir Evaluation & Engineering ◽

10.2118/118684-pa ◽

2010 ◽

Vol 13 (03) ◽

pp. 383-390 ◽

Cited By ~ 5

Author(s):

R.P. Batycky ◽

M. Förster ◽

M.R. Thiele ◽

K. Stüben

Keyword(s):

Large Scale ◽

Programming Model ◽

Scaling Law ◽

Independent Solution ◽

Parallel Execution ◽

Water Model ◽

Test Machine ◽

Multicore Architectures ◽

Streamline Simulation ◽

Run Time

Summary: We present the parallelization of a commercial streamline simulator for multicore architectures, based on the OpenMP programming model, and its performance on various field examples. This work continues recent work by Gerritsen et al. (2009), in which a research streamline simulator was extended to parallel execution. We identified that the streamline-transport step represents approximately 40 to 80% of the total run time. It is exactly this step that is straightforward to parallelize, owing to the independent solution of each streamline that is at the heart of streamline simulation. Because we are working with an existing large serial code, we used specialty software to quickly identify variables that required particular handling in the parallel extension. Only minimal rewriting of the existing code was required to extend the streamline-transport step to OpenMP. As part of this work, we also parallelized additional run-time code, including the gravity-line solver and some simple routines required for constructing the pressure matrix. Overall, the run-time fraction of code parallelized ranged from 0.50 to 0.83, depending on the transport physics being considered. We tested our parallel simulator on a variety of large models, including SPE 10; Forties, a UK oil/water model; Judy Creek, a Canadian waterflood/water-alternating-gas (WAG) model; and a South American black-oil model. We noted overall speedup factors from 1.8 to 3.3x for eight threads. In terms of real time, this implies that large-scale streamline simulation models such as those tested here can be simulated in less than 4 hours. We found the speedup results to be reasonable when compared with Amdahl's ideal scaling law. Beyond eight threads, we observed minimal speedups because of memory bandwidth limits on our test machine.
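
The comparison with Amdahl's law mentioned in this summary can be reproduced from the figures it quotes. The following is a minimal sketch, not taken from the paper: the function name `amdahl_speedup` is hypothetical, and it assumes the standard form of Amdahl's law, S = 1 / ((1 - p) + p / n), with the parallelized run-time fraction p (0.50 and 0.83) and n = 8 threads.

```python
def amdahl_speedup(parallel_fraction: float, threads: int) -> float:
    """Ideal speedup under Amdahl's law: 1 / ((1 - p) + p / n)."""
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must lie in [0, 1]")
    if threads < 1:
        raise ValueError("threads must be at least 1")
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / threads)


# Parallelized run-time fractions quoted in the summary, evaluated for 8 threads.
for p in (0.50, 0.83):
    print(f"p = {p:.2f}: ideal speedup on 8 threads = {amdahl_speedup(p, 8):.2f}x")
# p = 0.50: ideal speedup on 8 threads = 1.78x
# p = 0.83: ideal speedup on 8 threads = 3.65x
```

Under these assumptions the ideal range is roughly 1.78x to 3.65x, which sits close to the reported 1.8x to 3.3x.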

Parallel Execution of DEVS in Shared-memory Multicore Architectures

Spring Simulation Conference (SpringSim 2020) ◽

10.22360/springsim.2020.hpc.005 ◽

2020 ◽

Keyword(s):

Shared Memory ◽

Parallel Execution ◽

Multicore Architectures

A formal model of data access for multicore architectures with multilevel caches

Science of Computer Programming ◽

10.1016/j.scico.2019.04.003 ◽

2019 ◽

Vol 179 ◽

pp. 24-53 ◽

Cited By ~ 1

Author(s):

Shiji Bijo ◽

Einar Broch Johnsen ◽

Ka I Pun ◽

S. Lizeth Tapia Tarifa

Keyword(s):

Formal Model ◽

Data Access ◽

Multicore Architectures

Parallel Gaussian elimination of symmetric positive definite band matrices for shared-memory multicore architectures

RAIRO - Operations Research ◽

10.1051/ro/2020013 ◽

2020 ◽

Author(s):

Sirine Marrakchi ◽

Mohamed Jemni

Keyword(s):

Shared Memory ◽

Gaussian Elimination ◽

Positive Definite ◽

Parallel Execution ◽

Optimal Time ◽

Multicore Architectures ◽

Start Time ◽

Band Matrices ◽

Symmetric Positive Definite ◽

High Degree

This study presents a new parallel Gaussian elimination approach for symmetric positive definite band systems. For each task, the appropriate start time and processor are determined, and unnecessary dependencies between tasks are eliminated. All processors then execute their assigned tasks in parallel while respecting the precedence constraints. Our main goal is to obtain a high degree of parallelism by balancing the load across processors and reducing both the total idle time and the parallel execution time. The theoretical lower bounds for the parallel execution time and for the number of processors required to execute the precedence graph in optimal time are also computed. The validity of our investigation is confirmed by several experiments on a shared-memory multicore architecture using OpenMP. The experimental results confirm the efficiency of the proposed method.
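
The abstract does not state its bounds, so the sketch below illustrates only the generic, textbook lower bounds for any task graph with precedence constraints, not the authors' formulas. It assumes the execution time on p processors is at least max(critical path, total work / p), and that finishing in critical-path time needs at least ceil(total work / critical path) processors. The function names and the four-task example graph are hypothetical.

```python
from math import ceil


def critical_path_length(durations, predecessors):
    """Length of the longest duration-weighted path through the task DAG."""
    finish = {}

    def earliest_finish(task):
        # Memoized recursion; adequate for small illustrative graphs.
        if task not in finish:
            start = max(
                (earliest_finish(p) for p in predecessors.get(task, ())),
                default=0,
            )
            finish[task] = start + durations[task]
        return finish[task]

    return max(earliest_finish(task) for task in durations)


def lower_bounds(durations, predecessors, processors):
    """Return (time lower bound on `processors`, processors needed for span time)."""
    total_work = sum(durations.values())
    span = critical_path_length(durations, predecessors)
    time_bound = max(span, total_work / processors)
    min_processors = ceil(total_work / span)
    return time_bound, min_processors


# Hypothetical graph: tasks a and b are independent, c needs both, d needs c.
durations = {"a": 2, "b": 3, "c": 1, "d": 2}
predecessors = {"c": ["a", "b"], "d": ["c"]}

print(lower_bounds(durations, predecessors, processors=2))  # (6, 2)
```

Here the critical path is b, c, d with length 6 and the total work is 8, so two processors cannot finish in under 6 time units, and at least two are needed to reach that time.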

Static Scheduling with Load Balancing for Solving Triangular Band Linear Systems on Multicore Processors

Fundamenta Informaticae ◽

10.3233/fi-2021-2012 ◽

2021 ◽

Vol 179 (1) ◽

pp. 35-58

Author(s):

Sirine Marrakchi ◽

Mohamed Jemni

Keyword(s):

Linear Systems ◽

Multicore Processors ◽

Parallel Execution ◽

Task Graph ◽

Multicore Architectures ◽

Multicore Processor ◽

Start Time ◽

Static Scheduling ◽

Mathematical Formulas ◽

High Degree

This study establishes a new approach for solving triangular band linear systems that balances the load and obtains a high degree of parallelism. Our approach assigns an adequate start time and processor to each task and eliminates the dependencies that are not needed in the parallel solve stage. Processors thereby execute their assigned tasks in parallel while respecting the remaining precedence constraints. The theoretical lower bounds for the parallel execution time and for the number of processors required to execute the task graph in the shortest time are determined. Experiments are carried out on a shared-memory multicore processor, and the experimental results are consistent with the values derived from the mathematical formulas. A comparison of our results with those of the triangular system solver routine from the PLASMA library (Parallel Linear Algebra Software for Multicore Architectures) confirms the efficiency of the proposed approach.
