speedup factor
Recently Published Documents


TOTAL DOCUMENTS: 34 (five years: 9)

H-INDEX: 6 (five years: 1)

2021 ◽ Author(s): Xingwu Liu, Zizhao Chen, Xin Han, Zhenyu Sun, Zhishan Guo

2021 ◽ Vol 7 ◽ pp. e769 ◽ Author(s): Bérenger Bramas

The way developers implement their algorithms, and how these implementations behave on modern CPUs, is governed by the design and organization of the underlying hardware. The vectorization units (SIMD) are among the few CPU components that can and must be explicitly controlled. In the HPC community, x86 CPUs and their vectorization instruction sets were the de facto standard for decades. Each new release of an instruction set usually doubled the vector length and added new operations, and each generation pushed developers to adapt and improve previous implementations. The release of the ARM Scalable Vector Extension (SVE) changed things radically for several reasons. First, we expect ARM processors to equip many supercomputers in the coming years. Second, SVE’s interface differs from the x86 extensions in several aspects: it provides different instructions, uses a predicate to control most operations, and has a vector size that is only known at execution time. Therefore, using SVE raises new challenges in adapting algorithms, including those that are already well optimized on x86. In this paper, we port a hybrid sort based on the well-known Quicksort and Bitonic-sort algorithms. We use a Bitonic sort to process small partitions/arrays and a vectorized partitioning implementation to divide the partitions. We explain how we use the predicates, how we manage the non-static vector size, and how we efficiently implement the sorting kernels. Our approach only needs an auxiliary array of size O(log N) for the recursive calls in the partitioning phase, both in the sequential and in the parallel case. We test the performance of our approach on a modern ARMv8.2 (A64FX) CPU and assess the different layers of our implementation by sorting/partitioning integers, double-precision floating-point numbers, and key/value pairs of integers. Our results show that our approach is faster than the GNU C++ sort algorithm by a speedup factor of 4 on average.
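The hybrid structure the abstract describes — recurse with a partitioning step, switch to a bitonic sorting network once partitions are small — can be sketched in scalar Python. This is only an illustration of the control flow under assumed details (threshold value, pivot choice); the paper's actual kernels are vectorized with SVE predicates and partition in place.

```python
import math

SMALL = 16  # assumed threshold below which the bitonic network is used

def bitonic_small_sort(chunk):
    """Sort a small list with a bitonic sorting network, padding to a
    power-of-two length with +inf sentinels (the network requires 2^k items)."""
    if len(chunk) < 2:
        return list(chunk)
    n = 1 << math.ceil(math.log2(len(chunk)))
    data = list(chunk) + [float("inf")] * (n - len(chunk))
    k = 2
    while k <= n:                      # stage: build/merge bitonic runs
        j = k // 2
        while j > 0:                   # compare-exchange distance
            for i in range(n):
                partner = i ^ j
                if partner > i:
                    ascending = (i & k) == 0
                    if (data[i] > data[partner]) == ascending:
                        data[i], data[partner] = data[partner], data[i]
            j //= 2
        k *= 2
    return data[: len(chunk)]

def hybrid_sort(values):
    """Hybrid Quicksort/Bitonic sort.  Partitions live on an explicit stack;
    pushing the larger side first keeps the stack depth O(log N), as in the
    paper.  (The paper partitions in place with SVE; this scalar sketch
    uses throwaway lists for clarity.)"""
    a = list(values)
    stack = [(0, len(a))]
    while stack:
        lo, hi = stack.pop()
        if hi - lo <= SMALL:
            a[lo:hi] = bitonic_small_sort(a[lo:hi])
            continue
        pivot = a[(lo + hi) // 2]
        seg = a[lo:hi]
        left = [x for x in seg if x < pivot]
        mid = [x for x in seg if x == pivot]
        right = [x for x in seg if x > pivot]
        a[lo:hi] = left + mid + right
        p0, p1 = lo + len(left), lo + len(left) + len(mid)
        big, small = (lo, p0), (p1, hi)
        if p0 - lo < hi - p1:
            big, small = small, big
        stack.append(big)    # larger side deferred
        stack.append(small)  # smaller side processed next
    return a
```

In the SVE version, the compare-exchange steps of the network and the partitioning loop are expressed with predicated vector instructions, which is what makes the vector-length-agnostic design possible.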


2021 ◽ Vol 8 ◽ Author(s): Federico Errica, Marco Giulini, Davide Bacciu, Roberto Menichetti, Alessio Micheli, ...

The limits of molecular dynamics (MD) simulations of macromolecules are steadily pushed forward by the relentless development of computer architectures and algorithms. The consequent explosion in the number and extent of MD trajectories induces the need for automated methods to rationalize the raw data and make quantitative sense of them. Recently, an algorithmic approach was introduced by some of us to identify the subset of a protein’s atoms, or mapping, that enables the most informative description of the system. This method relies on the computation, for a given reduced representation, of the associated mapping entropy, that is, a measure of the information loss due to such a simplification; albeit relatively straightforward, this calculation can be time-consuming. Here, we describe the implementation of a deep learning approach aimed at accelerating the calculation of the mapping entropy. We rely on Deep Graph Networks, which provide extreme flexibility in handling structured input data and whose predictions prove to be accurate and remarkably efficient. The trained network produces a speedup factor as large as 10⁵ with respect to the algorithmic computation of the mapping entropy, enabling the reconstruction of its landscape by means of the Wang–Landau sampling scheme. Applications of this method reach far beyond this one, as the proposed pipeline is easily transferable to the computation of arbitrary properties of a molecular structure.
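Wang–Landau sampling, which the authors use to reconstruct the mapping-entropy landscape once the network makes each evaluation cheap, can be sketched on a toy system. This is a hypothetical one-dimensional example (the density of states of independent up/down spins, where the exact answer is a binomial coefficient), not the mapping-entropy setting; sweep length and flatness threshold are assumed values.

```python
import math
import random

def wang_landau(n_spins=10, f_final=1e-3, flat=0.8, seed=1):
    """Wang-Landau estimate of ln g(E) for n_spins independent up/down
    spins, with E = number of 'up' spins, so the exact g(E) = C(n, E).
    The modification factor ln f is halved whenever the visit histogram
    is sufficiently flat, until it drops below f_final."""
    random.seed(seed)
    n_E = n_spins + 1
    ln_g = [0.0] * n_E          # running estimate of ln g(E)
    hist = [0] * n_E            # visit histogram at the current ln f level
    state = [0] * n_spins       # all spins down
    E = 0
    ln_f = 1.0
    while ln_f > f_final:
        for _ in range(10000):  # one sweep of single-spin-flip proposals
            i = random.randrange(n_spins)
            E_new = E + (1 if state[i] == 0 else -1)
            # accept with probability min(1, g(E)/g(E_new))
            if ln_g[E] >= ln_g[E_new] or \
               random.random() < math.exp(ln_g[E] - ln_g[E_new]):
                state[i] ^= 1
                E = E_new
            ln_g[E] += ln_f
            hist[E] += 1
        mean = sum(hist) / n_E
        if min(hist) > flat * mean:  # histogram flat enough: refine f
            hist = [0] * n_E
            ln_f /= 2.0
    # fix the arbitrary additive constant so that ln g(0) = ln C(n,0) = 0
    return [x - ln_g[0] for x in ln_g]
```

In the paper's pipeline the role of `E` is played by the mapping entropy of a candidate reduced representation, evaluated by the trained network instead of the expensive algorithmic computation.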


2021 ◽ pp. 104743 ◽ Author(s): Xingwu Liu, Xin Han, Liang Zhao, Zhishan Guo

2021 ◽ Vol 247 ◽ pp. 04003 ◽ Author(s): Andrew Cox, Albrecht Kyrieleis, Sam Powell-Gill, Simon Richards, Francesco Tantillo

The primary goal of this paper is to increase the efficiency of criticality and burnup calculations in the ANSWERS MONK® Monte Carlo code [1]. Two ways of achieving this goal are investigated as part of the H2020 McSAFE Project: creating a unified energy grid for all materials in the model, and reducing the spread in the variances of the fluxes for depletable materials using a generated optimised importance map. The average tracking speedup factor across all cycles of all burnup calculations run using the unified energy grid, at base temperature, was found to be 1.96. For criticality calculations at 400 K with runtime Doppler broadening, the unified grid approach gave a total speedup factor of 7.32, demonstrating the potential of this method to reduce calculation time for models with runtime Doppler broadening. The use of the generated optimised importance map has been demonstrated to significantly reduce the spread in the standard deviations of the fluxes in the fuel pins across two different test cases. If a solution is required in which the standard deviation in no fuel pin exceeds 5%, the number of scoring stages required was more than halved, highlighting the potential of the outlined methodology to speed up burnup credit calculations.
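The unified-grid idea can be sketched as follows: each material's cross sections live on their own energy grid, so locating the interval for a particle's energy normally costs one binary search per material. A unified grid precomputes, for every merged energy point, the corresponding index into each material's grid, reducing tracking-time lookup to a single binary search. This is a hypothetical Python sketch of that data structure, not MONK's implementation.

```python
import bisect

def build_unified_grid(material_grids):
    """Merge per-material energy grids into one sorted unified grid and,
    for each unified point, store the interval index into every material's
    own grid.  index_map[m][k] is material m's interval for unified bin k."""
    unified = sorted({e for g in material_grids for e in g})
    index_map = []
    for grid in material_grids:
        row = [max(0, min(bisect.bisect_right(grid, e) - 1, len(grid) - 2))
               for e in unified]
        index_map.append(row)
    return unified, index_map

def material_index(unified, index_map, m, energy):
    """One binary search on the unified grid resolves the interval
    for material m (instead of one search per material grid)."""
    k = bisect.bisect_right(unified, energy) - 1
    k = min(max(k, 0), len(unified) - 2)
    return index_map[m][k]
```

The trade-off is memory: the index map stores one integer per material per unified point, which is why a unified grid pays off most when lookups dominate, e.g. with runtime Doppler broadening.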


10.29007/hb5r ◽ 2019 ◽ Author(s): Mohammad Alkhamis, Amirali Baniasadi

cn.MOPS is a frequently cited model-based algorithm used to quantitatively detect copy-number variations in next-generation DNA-sequencing data. Previous work implemented the algorithm as an R package and achieved considerable yet limited performance improvement by employing multi-CPU parallelism (the maximum achievable speedup was experimentally determined to be 9.24). In this paper, we propose an alternative acceleration mechanism. Using one CPU core and a GPU device, our solution, gcn.MOPS, achieves a speedup factor of 159 and reduces memory usage by more than half compared to cn.MOPS running on one CPU core.


2019 ◽ Vol 24 (1) ◽ pp. 131-142 ◽ Author(s): E. Tengs, F. Charrassier, M. Holst, Pål-Tore Storli

Abstract As part of an ongoing study into hydropower runner failure, a submerged, vibrating blade is investigated both experimentally and numerically. The numerical simulations performed are fully coupled acoustic-structural simulations in ANSYS Mechanical. In order to speed up the simulations, a model order reduction technique based on Krylov subspaces is implemented. This paper presents a comparison between the full ANSYS harmonic response and the reduced order model, and shows excellent agreement. The speedup factor obtained by using the reduced order model is shown to be between one and two orders of magnitude. The number of dimensions in the reduced subspace needed for accurate results is investigated, and confirms what is found in other studies on similar model order reduction applications. In addition, experimental results are available for validation and show a good match when not too far from the resonance peak.
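The Krylov projection idea can be sketched on an undamped harmonic response problem (K − ω²M)x = f: build an orthonormal basis V of the Krylov space spanned by K⁻¹f, (K⁻¹M)K⁻¹f, …, then solve the tiny projected system Vᵀ(K − ω²M)V y = Vᵀf at each frequency. This is a toy pure-Python illustration under assumed matrices (a spring chain); the paper's systems are damped acoustic-structural models in ANSYS.

```python
def solve(A, b):
    """Dense Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            fac = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= fac * M[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][k] * x[k] for k in range(r + 1, n))) / M[r][r]
    return x

def matvec(A, x):
    return [sum(a * b for a, b in zip(row, x)) for row in A]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def krylov_basis(K, M, f, r):
    """Orthonormal basis of span{K^-1 f, (K^-1 M) K^-1 f, ...} (r vectors),
    built with modified Gram-Schmidt."""
    V, v = [], solve(K, f)
    for _ in range(r):
        for u in V:
            c = dot(u, v)
            v = [a - c * b for a, b in zip(v, u)]
        nrm = dot(v, v) ** 0.5
        if nrm < 1e-12:
            break
        v = [a / nrm for a in v]
        V.append(v)
        v = solve(K, matvec(M, v))
    return V

def reduced_response(K, M, f, V, w2):
    """Galerkin-project (K - w2*M) x = f onto span(V); return x ~= V y.
    The reduced system has dimension len(V), not len(f)."""
    r, n = len(V), len(f)
    Kr = [[dot(V[i], matvec(K, V[j])) - w2 * dot(V[i], matvec(M, V[j]))
           for j in range(r)] for i in range(r)]
    fr = [dot(V[i], f) for i in range(r)]
    y = solve(Kr, fr)
    return [sum(V[i][j] * y[i] for i in range(r)) for j in range(n)]
```

The speedup comes from assembling V once and then solving an r×r system per frequency instead of an n×n one; the reduced model matches the leading moments of the transfer function at ω = 0, which is why accuracy is best at low frequency.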


2019 ◽ Author(s): Adrián Pousa

Asymmetric multicore processors (AMPs) are a low-power alternative to conventional multicore processors built from identical cores, but they also pose major challenges for system software. AMPs combine complex high-performance cores with simple low-power cores. Most existing scheduling algorithms for AMPs try to optimize overall throughput; however, these algorithms degrade other aspects such as fairness or energy efficiency. The main goal of this doctoral thesis is to overcome these limitations by designing more flexible scheduling strategies for AMPs. We also show the impact that optimizing one metric has on the others. To improve overall throughput, fairness, or energy efficiency on AMPs, the scheduler must take into account the benefit each application obtains from the different cores of an AMP. Since not all running threads in a workload derive the same relative benefit (speedup factor, SF) from using a high-performance core, this diversity of SFs must be taken into account when optimizing the various objectives. The operating system (OS) must determine the SF of each running thread effectively. In this thesis we propose a general methodology for building accurate SF-estimation models based on hardware performance counters. Most existing scheduling algorithms for AMPs have been evaluated using simulators, emulated asymmetric platforms, or user-mode scheduler prototypes. By contrast, in this doctoral thesis we evaluate the proposed algorithms in a realistic setting: using implementations of the algorithms in the kernel of real OSs, running on real asymmetric multicore hardware.
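The throughput-oriented baseline the thesis argues against can be sketched in a few lines: given each thread's speedup factor, grant the big cores to the threads that benefit most from them. This is a minimal hypothetical sketch of SF-driven placement, not the thesis's algorithms.

```python
def assign_threads(sf, n_big):
    """Throughput-only SF-driven placement: the n_big threads with the
    highest speedup factor (relative benefit from a big core) run on big
    cores; the rest run on little cores.  sf[t] is thread t's SF."""
    order = sorted(range(len(sf)), key=lambda t: sf[t], reverse=True)
    return order[:n_big], order[n_big:]
```

As the abstract notes, a policy like this degrades fairness: low-SF threads never see a big core, so their slowdown relative to running alone is much larger than that of high-SF threads. The fairness- and energy-aware schedulers proposed in the thesis trade some aggregate speedup to balance these objectives.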


2019 ◽ Vol 11 (1) ◽ pp. 49-70 ◽ Author(s): Mohsin Altaf Wani, Manzoor Ahmad

Modern GPUs perform computation at a very high rate compared to CPUs; as a result, they are increasingly used for general-purpose parallel computation. Constructing a statically optimal binary search tree is an optimization problem: find the arrangement of nodes in a binary search tree that minimizes the average search time. Knuth's modification to the dynamic programming algorithm improves the time complexity to O(n²). We develop a multi-GPU implementation of this algorithm using different approaches. Choosing the GPU implementation best suited to a given workload provides a speedup of up to four times over the other GPU-based implementations. We achieve a speedup factor of 409 on an older GTX 570 and of 745 on a more modern GTX 1060 when compared to a conventional single-threaded CPU-based implementation.
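The underlying CPU algorithm is the classic optimal-BST dynamic program with Knuth's speedup: the optimal root of the subproblem over keys i..j is monotone, root[i][j−1] ≤ root[i][j] ≤ root[i+1][j], which shrinks the total work from O(n³) to O(n²). A sequential Python sketch (the papers' GPU versions parallelize the cells within each diagonal):

```python
def optimal_bst_cost(freq):
    """Knuth's O(n^2) DP for the optimal static BST.
    freq[k] is the access frequency of key k; the returned value is the
    minimal weighted search cost (root at depth 1).  cost[i][j] covers
    keys i..j inclusive; root[i][j] is the chosen root, searched only in
    the Knuth window [root[i][j-1], root[i+1][j]]."""
    n = len(freq)
    cost = [[0.0] * n for _ in range(n)]
    root = [[0] * n for _ in range(n)]
    pref = [0.0]                       # prefix sums for subtree weights
    for f in freq:
        pref.append(pref[-1] + f)
    w = lambda i, j: pref[j + 1] - pref[i]
    for i in range(n):
        cost[i][i] = freq[i]
        root[i][i] = i
    for length in range(2, n + 1):     # fill by increasing interval length
        for i in range(n - length + 1):
            j = i + length - 1
            best, arg = float("inf"), i
            for r in range(root[i][j - 1], root[i + 1][j] + 1):
                left = cost[i][r - 1] if r > i else 0.0
                right = cost[r + 1][j] if r < j else 0.0
                if left + right < best:
                    best, arg = left + right, r
            cost[i][j] = best + w(i, j)  # every key's depth grows by 1
            root[i][j] = arg
    return cost[0][n - 1]
```

On a GPU, all cells of one anti-diagonal (one `length`) are independent and can be computed concurrently, which is the parallelism the multi-GPU implementations exploit.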

