OpenCNN: A Winograd Minimal Filtering Algorithm Implementation in CUDA

Roberto Castro; Diego Andrade; Basilio Fraguela

doi:10.3390/math9172033

Mathematics ◽

10.3390/math9172033 ◽

2021 ◽

Vol 9 (17) ◽

pp. 2033

Author(s):

Roberto Castro ◽

Diego Andrade ◽

Basilio Fraguela

Keyword(s):

Open Source Software ◽

Video Processing ◽

High Performance ◽

State Of The Art ◽

Matrix Multiplication ◽

Filtering Algorithm ◽

Convolution Operation ◽

Convolution Algorithm ◽

Algorithm Implementation ◽

Performance Computing

Improving the performance of the convolution operation has become a key target for High Performance Computing (HPC) developers because of its prevalence in deep learning, applied mainly to video processing. The improvement is being driven by algorithmic and implementation innovations. Algorithmically, the convolution can be computed directly from its mathematical definition, but other methods transform it into a Fast Fourier Transform (FFT) or a GEneral Matrix Multiplication (GEMM). In this latter group, the Winograd algorithm is a state-of-the-art variant that is especially suitable for smaller convolutions. In this paper, we present openCNN, an optimized CUDA C++ implementation of the Winograd convolution algorithm. Our approach achieves speedups of up to 1.76× on Turing RTX 2080Ti and up to 1.85× on Ampere RTX 3090 with respect to Winograd convolution in cuDNN 8.2.0. OpenCNN is released as open-source software.
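To make the idea of Winograd minimal filtering concrete, the following is a minimal host-side C++ sketch of the smallest one-dimensional case, F(2,3), which produces two outputs of a 3-tap filter with four multiplications instead of six. It is an illustration under stated assumptions, not code from openCNN: the function names `direct_f23`, `winograd_f23`, and `self_check` are hypothetical, and an optimized CUDA implementation would use 2D tiles and batch the element-wise products.

```cpp
#include <array>
#include <cassert>
#include <cmath>

// Direct evaluation of two outputs of a 3-tap filter (the correlation form
// used in CNNs): 6 multiplications.
std::array<float, 2> direct_f23(const std::array<float, 4>& d,
                                const std::array<float, 3>& g) {
    return {d[0] * g[0] + d[1] * g[1] + d[2] * g[2],
            d[1] * g[0] + d[2] * g[1] + d[3] * g[2]};
}

// Winograd F(2,3): the same two outputs with only 4 multiplications.
std::array<float, 2> winograd_f23(const std::array<float, 4>& d,
                                  const std::array<float, 3>& g) {
    // Filter transform (can be precomputed once per filter).
    const float u0 = g[0];
    const float u1 = 0.5f * (g[0] + g[1] + g[2]);
    const float u2 = 0.5f * (g[0] - g[1] + g[2]);
    const float u3 = g[2];

    // Input transform (additions and subtractions only).
    const float v0 = d[0] - d[2];
    const float v1 = d[1] + d[2];
    const float v2 = d[2] - d[1];
    const float v3 = d[1] - d[3];

    // Element-wise products: the only 4 multiplications on the data path.
    const float m0 = u0 * v0;
    const float m1 = u1 * v1;
    const float m2 = u2 * v2;
    const float m3 = u3 * v3;

    // Output transform.
    return {m0 + m1 + m2, m1 - m2 - m3};
}

// Self-check on a hand-verifiable case: d = (1,2,3,4), g = (1,0,-1)
// gives y0 = 1 - 3 = -2 and y1 = 2 - 4 = -2.
void self_check() {
    const std::array<float, 4> d{1.0f, 2.0f, 3.0f, 4.0f};
    const std::array<float, 3> g{1.0f, 0.0f, -1.0f};
    const auto y = winograd_f23(d, g);
    assert(std::fabs(y[0] + 2.0f) < 1e-6f);
    assert(std::fabs(y[1] + 2.0f) < 1e-6f);
}
```

The saving comes from moving work into the input and filter transforms, which use only additions and constant scalings; the filter transform can be reused across all tiles that share a filter.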

Architecture for the Integration of High Performance Computing Applications in PLM

Volume 2: 27th Computers and Information in Engineering Conference, Parts A and B ◽

10.1115/detc2007-35185 ◽

2007 ◽

Author(s):

Reiner Anderl ◽

Orkun Yaman

Keyword(s):

Data Management ◽

High Performance Computing ◽

High Performance ◽

State Of The Art ◽

Reference Information ◽

Simulation Domain ◽

Architectural Framework ◽

Industrial Context ◽

Performance Computing ◽

Integrate Data

High Performance Computing (HPC) has become ubiquitous for simulations in the industrial context. To identify the requirements for integrating HPC-relevant data and processes, a survey was conducted among German car manufacturers and service and component suppliers. This contribution presents the results of the evaluation and suggests an architecture concept to integrate data and workflows related to CAE and HPC facilities in PLM. It describes the state of the art of HPC applications within the simulation domain. Intensive efforts are currently invested in CAE data management. However, an approach to systematic data management of HPC does not exist. This study states the importance of an integrating approach for data management of HPC applications and develops an architectural framework to implement HPC data management into the existing PLM landscape. Requirements on key functionalities and interfaces are defined, and a framework for a reference information model is conceptualized.

PAEAN: Portable and scalable runtime support for parallel Haskell dialects

Journal of Functional Programming ◽

10.1017/s0956796816000010 ◽

2016 ◽

Vol 26 ◽

Cited By ~ 1

Author(s):

JOST BERTHOLD ◽

HANS-WOLFGANG LOIDL ◽

KEVIN HAMMOND

Keyword(s):

Shared Memory ◽

High Performance ◽

Parallel Machines ◽

State Of The Art ◽

Computing Systems ◽

Programming Abstraction ◽

Work Distribution ◽

High Level ◽

Parallelism Model ◽

Performance Computing

Over time, several competing approaches to parallel Haskell programming have emerged. Different approaches support parallelism at different scales, ranging from small multicores to massively parallel high-performance computing systems. They also provide varying degrees of control, ranging from completely implicit approaches to ones providing full programmer control. Most current designs assume a shared memory model at the programmer, implementation and hardware levels. This is, however, becoming increasingly divorced from the reality at the hardware level. It also imposes significant unwanted runtime overheads in the form of garbage collection synchronisation etc. What is needed is an easy way to abstract over the implementation and hardware levels, while presenting a simple parallelism model to the programmer. The PArallEl shAred Nothing runtime system design aims to provide a portable and high-level shared-nothing implementation platform for parallel Haskell dialects. It abstracts over major issues such as work distribution and data serialisation, consolidating existing, successful designs into a single framework. It also provides an optional virtual shared-memory programming abstraction for (possibly) shared-nothing parallel machines, such as modern multicore/manycore architectures or cluster/cloud computing systems. It builds on, unifies and extends existing well-developed support for shared-memory parallelism that is provided by the widely used GHC Haskell compiler. This paper summarises the state of the art in shared-nothing parallel Haskell implementations, introduces the PArallEl shAred Nothing abstractions, shows how they can be used to implement three distinct parallel Haskell dialects, and demonstrates that good scalability can be obtained on recent parallel machines.

High-performance computing systems: Status and outlook

Acta Numerica ◽

10.1017/s0962492912000050 ◽

2012 ◽

Vol 21 ◽

pp. 379-474 ◽

Cited By ~ 36

Author(s):

J. J. Dongarra ◽

A. J. van der Steen

Keyword(s):

High Performance Computing ◽

High Performance ◽

State Of The Art ◽

Computing Systems ◽

Future Developments ◽

Steady Growth ◽

Current State ◽

Near Future ◽

Performance Computing ◽

Shed Light

This article describes the current state of the art of high-performance computing systems, and attempts to shed light on near-future developments that might prolong the steady growth in speed of such systems, which has been one of their most remarkable characteristics. We review the different ways devised to speed them up, both with regard to components and their architecture. In addition, we discuss the requirements for software that can take advantage of existing and future architectures.

Resilient gossip-inspired all-reduce algorithms for high-performance computing: Potential, limitations, and open questions

The International Journal of High Performance Computing Applications ◽

10.1177/1094342018762531 ◽

2018 ◽

Vol 33 (2) ◽

pp. 366-383

Author(s):

Marc Casas ◽

Wilfried N Gansterer ◽

Elias Wimmer

Keyword(s):

Fault Tolerance ◽

High Performance Computing ◽

High Performance ◽

State Of The Art ◽

The State ◽

Reduction Algorithm ◽

Data Corruption ◽

Parallel Reduction ◽

Open Questions ◽

Performance Computing

We investigate the usefulness of gossip-based reduction algorithms in a high-performance computing (HPC) context. We compare them to state-of-the-art deterministic parallel reduction algorithms in terms of fault tolerance and resilience against silent data corruption (SDC) as well as in terms of performance and scalability. New gossip-based reduction algorithms are proposed, which significantly improve on the state of the art in terms of resilience against SDC. Moreover, a new gossip-inspired reduction algorithm is proposed, which promises a much more competitive runtime performance in an HPC context than classical gossip-based algorithms, in particular for low accuracy requirements.
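For readers unfamiliar with gossip-based reduction, the sketch below simulates classical push-sum averaging in plain C++. This is a textbook scheme chosen for illustration, not one of the algorithms proposed in the paper, and it models no faults or data corruption. The function names `push_sum_average` and `max_abs_error`, the synchronous round structure, and the seeded peer selection are all assumptions of this sketch.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <random>
#include <vector>

// Classical push-sum gossip averaging, simulated in one process.
// Node i holds a pair (sum[i], weight[i]); its local estimate of the global
// average is sum[i] / weight[i]. In every round each node keeps half of its
// pair and pushes the other half to one randomly chosen peer.
std::vector<double> push_sum_average(const std::vector<double>& values,
                                     int rounds, unsigned seed) {
    const std::size_t n = values.size();
    assert(n >= 2 && "gossip needs at least two nodes");

    std::vector<double> sum(values);
    std::vector<double> weight(n, 1.0);
    std::mt19937 rng(seed);  // seeded so that runs are reproducible

    for (int r = 0; r < rounds; ++r) {
        std::vector<double> next_sum(n, 0.0);
        std::vector<double> next_weight(n, 0.0);
        for (std::size_t i = 0; i < n; ++i) {
            // Offset in [1, n-1] guarantees the peer differs from i.
            const std::size_t peer = (i + 1 + rng() % (n - 1)) % n;
            // Keep one half locally, push the other half to the peer.
            next_sum[i] += 0.5 * sum[i];
            next_weight[i] += 0.5 * weight[i];
            next_sum[peer] += 0.5 * sum[i];
            next_weight[peer] += 0.5 * weight[i];
        }
        sum.swap(next_sum);
        weight.swap(next_weight);
    }

    std::vector<double> estimate(n);
    for (std::size_t i = 0; i < n; ++i) {
        estimate[i] = sum[i] / weight[i];
    }
    return estimate;
}

// Largest deviation of any node's estimate from a known target value.
double max_abs_error(const std::vector<double>& estimate, double target) {
    double worst = 0.0;
    for (double e : estimate) {
        worst = std::fmax(worst, std::fabs(e - target));
    }
    return worst;
}
```

Because every round conserves the totals of `sum` and `weight`, all local estimates approach the global average; multiplying by the node count yields a sum all-reduce. Stopping after fewer rounds trades accuracy for runtime, which is the kind of low-accuracy setting the abstract refers to.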

State of the Art and Future Trends in Data Reduction for High-Performance Computing

Supercomputing Frontiers and Innovations ◽

10.14529/jsfi200101 ◽

2020 ◽

Vol 7 (1) ◽

Keyword(s):

High Performance Computing ◽

Data Reduction ◽

High Performance ◽

State Of The Art ◽

Future Trends ◽

Performance Computing

Energy-Aware High-Performance Computing: Survey of State-of-the-Art Tools, Techniques, and Environments

Scientific Programming ◽

10.1155/2019/8348791 ◽

2019 ◽

Vol 2019 ◽

pp. 1-19 ◽

Cited By ~ 4

Author(s):

Pawel Czarnul ◽

Jerzy Proficz ◽

Adam Krzywaniak

Keyword(s):

High Performance Computing ◽

High Performance ◽

Hybrid Methods ◽

State Of The Art ◽

Control Methods ◽

Energy Aware ◽

Power Capping ◽

Power Limits ◽

Performance Computing

The paper presents the state of the art of energy-aware high-performance computing (HPC), in particular the identification and classification of approaches by system and device types, optimization metrics, and energy/power control methods. System types include single devices, clusters, grids, and clouds, while the device types considered include CPUs, GPUs, multiprocessors, and hybrid systems. Optimization goals include various combinations of metrics such as execution time, energy consumption, and temperature, with consideration of imposed power limits. Control methods include scheduling, DVFS/DFS/DCT, power capping with programmatic APIs such as Intel RAPL and NVIDIA NVML, as well as application optimizations and hybrid methods. We discuss tools and APIs for energy/power management as well as tools and environments for prediction and/or simulation of energy/power consumption in modern HPC systems. Finally, programming examples, i.e., applications and benchmarks used in particular works, are discussed. Based on our review, we identified a set of open areas and important up-to-date problems concerning methods and tools for modern HPC systems allowing energy-aware processing.

The State-of-the-Art Trends in Education Strategy for Sustainable Development of the High Performance Computing Ecosystem

Communications in Computer and Information Science - Supercomputing ◽

10.1007/978-3-319-71255-0_40 ◽

2017 ◽

pp. 494-504 ◽

Cited By ~ 1

Author(s):

Sergey Mosin

Keyword(s):

Sustainable Development ◽

High Performance Computing ◽

High Performance ◽

State Of The Art ◽

The State ◽

Education Strategy ◽

Performance Computing

Power measurement for high performance computing: State of the art

2011 International Green Computing Conference and Workshops ◽

10.1109/igcc.2011.6008596 ◽

2011 ◽

Author(s):

Chung-Hsing Hsu ◽

Stephen W. Poole

Keyword(s):

High Performance Computing ◽

High Performance ◽

State Of The Art ◽

Power Measurement ◽

Performance Computing

A Door to State-of-the-Art High-Performance Computing

IEEE Distributed Systems Online ◽

10.1109/mdso.2007.35 ◽

2007 ◽

Vol 8 (6) ◽

pp. 4-4 ◽

Cited By ~ 2

Author(s):

Arturo Ortiz-Tapia

Keyword(s):

High Performance Computing ◽

High Performance ◽

State Of The Art ◽

Performance Computing

A survey of high-performance computing scaling challenges

The International Journal of High Performance Computing Applications ◽

10.1177/1094342015597083 ◽

2016 ◽

Vol 31 (1) ◽

pp. 104-113 ◽

Cited By ~ 26

Author(s):

Al Geist ◽

Daniel A Reed

Keyword(s):

High Performance Computing ◽

High Performance ◽

State Of The Art ◽

Operational Experience ◽

Computing Systems ◽

Software Complexity ◽

Petascale Computing ◽

New Challenges ◽

Commodity Clusters ◽

Performance Computing

Commodity clusters revolutionized high-performance computing when they first appeared two decades ago. As scale and complexity have grown, new challenges in reliability and systemic resilience, energy efficiency and optimization and software complexity have emerged that suggest the need for re-evaluation of current approaches. This paper reviews the state of the art and reflects on some of the challenges likely to be faced when building trans-petascale computing systems, using insights and perspectives drawn from operational experience and community debates.