parallel speedup
Recently Published Documents


TOTAL DOCUMENTS: 26 (five years: 4)

H-INDEX: 5 (five years: 0)

2021
Author(s): Jakob P. Pettersen, Eivind Almaas

Abstract

Background: Differential co-expression network analysis has become an important tool for understanding biological phenotypes and diseases. The CSD algorithm generates differential co-expression networks by comparing gene co-expression between two conditions. Each gene pair is assigned conserved (C), specific (S), and differentiated (D) scores based on its co-expression in the two conditions. The result of the procedure is a network whose nodes are genes and whose links are the gene pairs with the highest C-, S-, and D-scores. However, the existing CSD implementations suffer from poor computational performance, cumbersome user procedures, and a lack of documentation.

Results: We created the R package csdR, aiming for good performance together with ease of use, sufficient documentation, and the ability to work well with other data-analysis tools. csdR was benchmarked on a realistic dataset with 20,645 genes. After verifying that the chosen number of iterations gave sufficient robustness, we tested its performance against the two existing CSD implementations. csdR was superior in performance to one of them, whereas the other did not run. Our implementation can utilize multiple processing cores; however, we were unable to achieve more than a ∼2.7× parallel speedup, with saturation reached at about 10 cores.

Conclusions: The results suggest that csdR is a useful tool for differential co-expression analysis, able to generate robust results within a workday on datasets of realistic size when run on a workstation or compute server.
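The reported saturation is consistent with Amdahl's law. As a rough, hypothetical check (the abstract does not state a serial fraction), one can solve Amdahl's model for the parallel fraction implied by a ∼2.7× speedup on 10 cores:

```python
# Hypothetical Amdahl's-law check (not from the paper): infer the parallel
# fraction p from an observed speedup S on n cores, where
#   S(n) = 1 / ((1 - p) + p / n)
def parallel_fraction(speedup: float, cores: int) -> float:
    # Rearranging S = 1 / ((1 - p) + p/n) gives p = (1 - 1/S) / (1 - 1/n).
    return (1.0 - 1.0 / speedup) / (1.0 - 1.0 / cores)

p = parallel_fraction(speedup=2.7, cores=10)
print(f"implied parallel fraction: {p:.2f}")                 # ~0.70
print(f"asymptotic speedup limit:  {1.0 / (1.0 - p):.2f}x")  # ~3.33x
```

Under this assumed model, roughly 70% of the work parallelizes, capping the attainable speedup near 3.3× on any number of cores, which matches the observed saturation.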


Electronics · 2021 · Vol 10 (11) · pp. 1330
Author(s): Junjie Zhang, Lukas Razik, Sigurd Hofsmo Jakobsen, Salvatore D’Arco, Andrea Benigni

In this paper we introduce an approach to accelerate many-scenario (i.e., hundreds to thousands) power system simulations, based on a highly scalable and flexible open-source software environment. In this approach, the parallel execution of simulations follows the single program, multiple data (SPMD) paradigm: the dynamic simulation program is executed in parallel and takes different inputs to generate different scenarios. The power system is modeled using an existing Modelica library and compiled to a simulation executable with the OpenModelica Compiler. The parallel simulation is performed using the Message Passing Interface (MPI), and the approach includes dynamic workload balancing. Finally, the simulation environment is benchmarked on high-performance computing (HPC) clusters with four test cases. The results show that the proposed approach achieves high scalability and considerable parallel speedup in the simulation of all scenarios.
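A minimal sketch of this SPMD pattern with pull-based dynamic workload balancing, using mpi4py. The scenario list and `run_scenario` are hypothetical placeholders; in the paper, each task invokes an OpenModelica-compiled simulation executable with scenario-specific inputs.

```python
# SPMD many-scenario sketch with dynamic load balancing (mpi4py).
from mpi4py import MPI

def run_scenario(scenario_id: int) -> float:
    # Hypothetical stand-in for running the compiled simulation executable.
    return float(scenario_id)

comm = MPI.COMM_WORLD
rank, size = comm.Get_rank(), comm.Get_size()
TAG_WORK, TAG_DONE, TAG_STOP = 0, 1, 2

if rank == 0:
    # Coordinator: hands out a new scenario whenever a worker becomes free.
    scenarios = list(range(1000))
    status = MPI.Status()
    active = size - 1
    for w in range(1, size):                  # prime each worker once
        comm.send(scenarios.pop(), dest=w, tag=TAG_WORK)
    while active > 0:
        comm.recv(source=MPI.ANY_SOURCE, tag=TAG_DONE, status=status)
        worker = status.Get_source()
        if scenarios:
            comm.send(scenarios.pop(), dest=worker, tag=TAG_WORK)
        else:
            comm.send(None, dest=worker, tag=TAG_STOP)
            active -= 1
else:
    # Worker: same program, different data (SPMD).
    status = MPI.Status()
    while True:
        task = comm.recv(source=0, status=status)
        if status.Get_tag() == TAG_STOP:
            break
        comm.send(run_scenario(task), dest=0, tag=TAG_DONE)
```

Launched as, e.g., `mpiexec -n 16 python scenarios.py`, the pull-based dispatch keeps cores busy even when scenario runtimes differ widely.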


2021 · Vol 5 (2) · pp. 62-77
Author(s): Sesha Kalyur, Nagaraja G.S.

Although several automated parallel-conversion solutions are available, very few have attempted to provide proper estimates of the available inherent parallelism and the expected parallel speedup. CALIPER, the outcome of this research work, is a parallel performance estimation technology that can fill this void. High-level language structures such as functions, loops, and conditionals, which ease program development, can hinder effective performance analysis. We refer to these program structures as the Program Shape. As a preparatory step, CALIPER removes these shape-related hindrances, an activity we refer to as Program Shape Flattening. Programs are also characterized by dependences between instructions, which impose an upper limit on the parallel conversion gains. For parallel estimation, we first group instructions that share dependences into a class we refer to as a Dependence Class or Parallel Class. While the instructions within a class run sequentially, the classes themselves run in parallel; the parallel runtime is therefore the runtime of the longest-running class. We report performance estimates of parallel conversion as two metrics: the inherent parallelism in the program, reported as Maximum Available Parallelism (MAP), and the speedup after conversion, reported as Speedup After Parallelization (SAP).
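A hypothetical toy illustration of this grouping (not CALIPER's actual implementation): instructions linked by dependences are merged into classes with a union-find structure, and the estimated parallel runtime is the cost of the heaviest class.

```python
# Toy dependence-class grouping (hypothetical, not CALIPER itself):
# instructions that share a dependence end up in one class; classes are
# assumed to run in parallel, so parallel time = runtime of the longest class.
from collections import defaultdict

def dependence_classes(n_instr, deps):
    parent = list(range(n_instr))          # union-find over instructions
    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x
    for a, b in deps:                      # each dependence merges two classes
        parent[find(a)] = find(b)
    classes = defaultdict(list)
    for i in range(n_instr):
        classes[find(i)].append(i)
    return list(classes.values())

cost = [3, 1, 4, 1, 5, 9, 2, 6]            # per-instruction runtimes
deps = [(0, 1), (1, 2), (4, 5)]            # (producer, consumer) pairs
classes = dependence_classes(len(cost), deps)
seq_time = sum(cost)
par_time = max(sum(cost[i] for i in c) for c in classes)
print(f"MAP (max available parallelism) estimate: {seq_time / par_time:.2f}")
```

In this sketch MAP is the total sequential cost divided by the heaviest class; a real estimator such as CALIPER would additionally model conversion overheads to arrive at SAP.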


2017 · Vol 59 · pp. 351-435
Author(s): Lars Otten, Rina Dechter

We present a parallel AND/OR Branch-and-Bound scheme that uses the power of a computational grid to push the boundaries of feasibility for combinatorial optimization. Two variants of the scheme are described, one of which aims to use machine learning techniques for parallel load balancing. In-depth analysis identifies two inherent sources of parallel search-space redundancy that, together with general parallel execution overhead, can impede parallelization and render the problem far from embarrassingly parallel. We conduct extensive empirical evaluation on hundreds of CPUs, the first of its kind, with overall positive results. In a significant number of cases, parallel speedup is close to the theoretical maximum, and we are able to solve many very complex problem instances orders of magnitude faster than before; yet analysis of certain results also demonstrates the inherent limitations of the approach due to the aforementioned redundancies.
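A hypothetical toy sketch of one such redundancy source (far simpler than the paper's AND/OR scheme over a grid): when subtrees are searched independently in parallel, workers cannot see each other's improved incumbent bounds and therefore expand nodes that a search with one shared bound would have pruned.

```python
# Toy comparison (hypothetical, not the paper's scheme): node expansions in
# branch-and-bound with one shared incumbent vs. independent "parallel"
# workers that each keep a private incumbent.
import random

random.seed(0)
DEPTH, BRANCH = 12, 2
COSTS = {}                                  # lazily generated edge costs

def edge_cost(path, child):
    key = (path, child)
    if key not in COSTS:
        COSTS[key] = random.uniform(0.0, 1.0)
    return COSTS[key]

def bnb(path, cost_so_far, incumbent, stats):
    # Depth-first branch-and-bound minimizing total path cost.
    stats[0] += 1                           # count this node expansion
    if cost_so_far >= incumbent:            # prune: cost only grows with depth
        return incumbent
    if len(path) == DEPTH:
        return cost_so_far                  # leaf improves the incumbent
    for c in range(BRANCH):
        incumbent = bnb(path + (c,), cost_so_far + edge_cost(path, c),
                        incumbent, stats)
    return incumbent

shared = [0]                                # one search, one shared incumbent
bnb((), 0.0, float("inf"), shared)

private = [0]                               # root subtrees searched "in parallel",
for c in range(BRANCH):                     # each with its own private incumbent
    bnb((c,), edge_cost((), c), float("inf"), private)

print(f"expansions with shared bound:   {shared[0]}")
print(f"expansions with private bounds: {private[0] + 1}")  # +1 for the root
```

The private-bound total is never smaller than the shared-bound one; the gap is the redundant work introduced when improved bounds are not propagated between workers.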


2017 · Vol 21 (4) · pp. 1039-1064
Author(s): Tony W. H. Sheu, S. Z. Wang, J. H. Li, Matthew R. Smith

Abstract

In this study, an explicit Finite Difference Method (FDM) based scheme is developed to solve Maxwell's equations in the time domain for a lossless medium. This manuscript focuses on two unique aspects: the three-dimensional, time-accurate discretization of the hyperbolic system of Maxwell's equations on a three-point non-staggered grid stencil, and its application to parallel computing through the use of Graphics Processing Units (GPUs). The proposed temporal scheme is symplectic, thus permitting conservation of all Hamiltonians of the Maxwell equations. Moreover, to enable accurate predictions over large time frames, a phase-velocity-preserving scheme is developed for the treatment of the spatial derivative terms. As a result, the chosen time increment and grid spacing can be optimally coupled. An additional theoretical investigation into this pairing is also shown. Finally, the application of the proposed scheme to parallel computing using one Nvidia Tesla K20 GPU card is demonstrated. For the benchmarks performed, the parallel speedup compared to a single core of an Intel i7-4820K CPU is approximately 190×.
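For illustration only, here is a minimal explicit leapfrog update for the 1D lossless Maxwell equations on a collocated (non-staggered) grid with three-point central differences, in normalized units (ε = μ = 1). This is a generic sketch, not the authors' symplectic, phase-velocity-preserving scheme; the two vectorized update lines are the kind of kernel one would offload to a GPU.

```python
# Generic 1D lossless Maxwell leapfrog sketch on a collocated grid
# (illustrative only; NOT the paper's symplectic phase-preserving scheme).
# Normalized units: dE/dt = -dH/dx, dH/dt = -dE/dx.
import numpy as np

nx, nt = 400, 500
r = 0.5                                   # r = dt/dx; leapfrog needs r <= 1
x = np.arange(nx)
E = np.exp(-((x - nx // 2) ** 2) / 50.0)  # initial Gaussian pulse in E
H = np.zeros(nx)
E_prev, H_prev = E.copy(), H.copy()       # leapfrog keeps two time levels

for n in range(nt):
    # Three-point central differences in space, leapfrog in time:
    # E^{n+1}_i = E^{n-1}_i - r * (H^n_{i+1} - H^n_{i-1}), likewise for H.
    E_next, H_next = E_prev.copy(), H_prev.copy()
    E_next[1:-1] = E_prev[1:-1] - r * (H[2:] - H[:-2])
    H_next[1:-1] = H_prev[1:-1] - r * (E[2:] - E[:-2])
    E_prev, E = E, E_next
    H_prev, H = H, H_next

print(f"energy proxy after {nt} steps: {np.sum(E**2 + H**2):.4f}")
```

Within a time step, every interior grid point updates independently, so the two update lines map directly onto one GPU thread per point; only the time loop remains sequential.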

