An Implicit Harmonic Balance Method in Graphics Processing Units for Oscillating Blades

An implicit harmonic balance (HB) method for modeling the unsteady nonlinear periodic flow about vibrating airfoils in turbomachinery is presented. An implicit edge-based three-dimensional Reynolds-averaged Navier–Stokes (RANS) solver for unstructured grids, which runs on both central processing units (CPUs) and graphics processing units (GPUs), is used. The HB method performs a spectral discretization of the time derivatives and marches in pseudotime a new system of equations in which the unknowns are the variables at different time samples. The application of the method to vibrating airfoils is discussed. It is shown that a time-spectral scheme may achieve the same temporal accuracy as a backward finite-difference method at a much lower computational cost, at the expense of using more memory. The performance of the implicit solver has been assessed with several application examples. A speed-up factor of 10 is obtained between the spectral and finite-difference versions of the code, and an additional speed-up factor of 10 is obtained when the code is ported to GPUs, for a total speed-up factor of 100. The performance of the solver on GPUs has been assessed using the tenth standard aeroelastic configuration and a transonic compressor.
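The spectral discretization of the time derivative mentioned in the abstract can be illustrated with a minimal Python sketch. It builds the standard time-spectral differentiation matrix for an odd number of equally spaced time samples over one period, as commonly given in the time-spectral literature. This is an assumption for illustration, not necessarily the formulation used in the paper, and the function names `time_spectral_matrix` and `apply_operator` are hypothetical.

```python
import math


def time_spectral_matrix(num_samples, period):
    """Return the dense time-spectral differentiation matrix D.

    Assumes an odd number of equally spaced samples t_j = j * period / N.
    D[i][j] couples the solution at sample j to the time derivative at
    sample i, which is why all time samples must be stored and solved
    together (the memory cost mentioned in the abstract).
    """
    if num_samples % 2 == 0:
        raise ValueError("this sketch only covers an odd number of samples")
    omega = 2.0 * math.pi / period
    matrix = [[0.0] * num_samples for _ in range(num_samples)]
    for i in range(num_samples):
        for j in range(num_samples):
            if i != j:
                k = i - j
                matrix[i][j] = (
                    0.5 * omega * (-1) ** k / math.sin(math.pi * k / num_samples)
                )
    return matrix


def apply_operator(matrix, samples):
    """Multiply the matrix by a vector of time samples."""
    return [sum(d * u for d, u in zip(row, samples)) for row in matrix]
```

With N = 2K + 1 samples, the operator differentiates exactly any signal containing harmonics up to order K, so a flow dominated by a few harmonics of the vibration frequency needs only a few time samples. In a harmonic balance solver, the term obtained from `apply_operator` would be added to the steady residual at each time sample before marching in pseudotime.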

An Implicit Harmonic Balance Method in Graphics Processing Units for Vibrating Blades

Volume 2C: Turbomachinery ◽

10.1115/gt2015-42275 ◽

2015 ◽

Author(s):

Javier Crespo ◽

Roque Corral ◽

Jesus Pueblas

Keyword(s):

Finite Difference ◽

Harmonic Balance ◽

Graphics Processing Units ◽

Stokes Equations ◽

Computational Cost ◽

Harmonic Balance Method ◽

Balance Method ◽

Transonic Compressor ◽

Speed Up ◽

Graphics Processing

An implicit harmonic balance method for modeling the unsteady nonlinear periodic flow about vibrating airfoils in turbomachinery is presented. As a starting point, an implicit edge-based three-dimensional Reynolds-averaged Navier-Stokes solver for unstructured grids that runs on both central processing units (CPUs) and graphics processing units (GPUs) is used. The harmonic balance method performs a spectral discretization of the time derivatives and marches in pseudo-time a new system of equations in which the unknowns are the variables at different time samples. The application of the method to vibrating airfoils is discussed. It is shown that a time-spectral scheme may achieve the same temporal accuracy as a backward finite-difference method at a much lower computational cost, at the expense of using more memory. The performance of the implicit solver has been assessed with several application examples. A speed-up factor of 10 is obtained between the spectral and finite-difference versions of the code, and an additional speed-up factor of 10 is obtained when the code is ported to GPUs, for a total speed-up factor of 100. The performance of the solver on GPUs has been assessed using the 10th standard aeroelastic configuration and a transonic compressor.

Graphics processing unit implementation of the F-statistic for continuous gravitational wave searches

Classical and Quantum Gravity ◽

10.1088/1361-6382/ac4616 ◽

2021 ◽

Author(s):

Liam Dunn ◽

Patrick Clearwater ◽

Andrew Melatos ◽

Karl Wette

Keyword(s):

Gravitational Wave ◽

Graphics Processing Units ◽

Graphics Processing Unit ◽

Computational Cost ◽

Processing Unit ◽

Central Processing ◽

Long Baseline ◽

Using Data ◽

Graphics Processing ◽

Gpu Implementation

The F-statistic is a detection statistic used widely in searches for continuous gravitational waves with terrestrial, long-baseline interferometers. A new implementation of the F-statistic is presented which accelerates the existing "resampling" algorithm using graphics processing units (GPUs). The new implementation runs between 10 and 100 times faster than the existing implementation on central processing units without sacrificing numerical accuracy. The utility of the GPU implementation is demonstrated on a pilot narrowband search for four newly discovered millisecond pulsars in the globular cluster Omega Centauri using data from the second Laser Interferometer Gravitational-Wave Observatory observing run. The computational cost is 17.2 GPU-hours using the new implementation, compared to 1092 core-hours with the existing implementation.

2012 Freeman Scholar Lecture: Computational Fluid Dynamics on Graphics Processing Units

Journal of Fluids Engineering ◽

10.1115/1.4023858 ◽

2013 ◽

Vol 135 (6) ◽

Cited By ~ 18

Author(s):

S. P. Vanka

Keyword(s):

Fluid Dynamics ◽

Computational Fluid Dynamics ◽

Graphics Processing Units ◽

Stokes Equations ◽

Multicore Processors ◽

Fluid Flows ◽

Data Access ◽

Central Processing ◽

Data Parallel ◽

Graphics Processing

This paper discusses the various issues of using graphics processing units (GPUs) for computing fluid flows. GPUs, used primarily for processing graphics functions in a computer, are massively parallel multicore processors, which can also perform scientific computations in a data-parallel mode. In the past ten years, GPUs have become quite powerful and have challenged the central processing units (CPUs) in their price and performance characteristics. However, in order to fully benefit from the GPUs' performance, the numerical algorithms must be made data parallel and converge rapidly. In addition, the hardware features of the GPUs require that memory access be managed carefully to avoid suffering from high latency. Fully explicit algorithms for the Euler and Navier–Stokes equations and the lattice Boltzmann method for mesoscopic flows have been widely implemented on GPUs, with significant speed-up over a scalar algorithm. However, more complex algorithms with implicit formulations and unstructured grids require innovative thinking in data access and management. This article reviews the literature on linear solvers and computational fluid dynamics (CFD) algorithms on GPUs, including the author's own research on simulations of fluid flows using GPUs.

Industry-scale finite-difference elastic wave modeling on graphics processing units using the out-of-core technique

Geophysics ◽

10.1190/geo2015-0267.1 ◽

2016 ◽

Vol 81 (2) ◽

pp. T35-T43 ◽

Cited By ~ 3

Author(s):

Jon Marius Venstad

Keyword(s):

Finite Difference ◽

Graphics Processing Units ◽

Finite Difference Methods ◽

Main Memory ◽

Multicore Architectures ◽

Wave Modeling ◽

Central Processing ◽

Computational Overhead ◽

The Difference ◽

Graphics Processing

The difference in computational power between the few- and multicore architectures represented by central processing units (CPUs) and graphics processing units (GPUs) is significant today, and this difference is likely to increase in the years ahead. GPUs are, therefore, ever more popular for applications in computational physics, such as wave modeling. Finite-difference methods are popular for wave modeling and are well suited for the GPU architecture, but developing an efficient and capable GPU implementation is hindered by the limited size of the GPU memory. I revealed how the out-of-core technique can be used to circumvent the memory limit on the GPU, increasing the available memory to that of the CPU (the main memory) instead, with no significant computational overhead. This approach has several advantages over a parallel scheme in terms of applicability, flexibility, and hardware requirements. Choices in the numerical scheme — the numerical differentiators in particular — also greatly affect computational efficiency. These factors are considered explicitly for GPU implementations of wave modeling because GPUs are special purpose with a visible architecture.
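The out-of-core idea described above can be sketched in a few lines of Python. The full array stays in main memory, and only one block plus a one-cell halo is handed to the compute step at a time, standing in for a host-to-device copy. This is a simplified illustration under assumed names (`second_difference`, `second_difference_blocked`) with a 1D three-point stencil and zero values outside the domain. It is not the scheme from the paper, which concerns elastic wave propagation and must also organize the transfers across time steps.

```python
def second_difference(u):
    """Reference: three-point second difference on the whole array at once."""
    n = len(u)
    out = [0.0] * n
    for i in range(n):
        left = u[i - 1] if i > 0 else 0.0
        right = u[i + 1] if i < n - 1 else 0.0
        out[i] = left - 2.0 * u[i] + right
    return out


def second_difference_blocked(u, block_size):
    """Same stencil, but processing one block (plus halo) at a time.

    Each slice of `u` stands in for a transfer from main memory to the
    limited device memory; only block_size + 2 values are needed at once.
    """
    n = len(u)
    out = [0.0] * n
    for start in range(0, n, block_size):
        stop = min(start + block_size, n)
        chunk = u[max(start - 1, 0):min(stop + 1, n)]
        # Pad with zeros where the halo falls outside the domain.
        if start == 0:
            chunk = [0.0] + chunk
        if stop == n:
            chunk = chunk + [0.0]
        for k in range(stop - start):
            out[start + k] = chunk[k] - 2.0 * chunk[k + 1] + chunk[k + 2]
    return out
```

Both functions produce the same result for any block size, which is the property that lets the block size be chosen to fit the available device memory.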

Development of an Edge-Based Harmonic Balance Method for Turbomachinery Flows

Volume 7: Turbomachinery, Parts A, B, and C ◽

10.1115/gt2011-45170 ◽

2011 ◽

Cited By ~ 2

Author(s):

Roque Corral ◽

Javier Crespo

Keyword(s):

Harmonic Balance ◽

Stokes Equations ◽

Computational Cost ◽

Harmonic Balance Method ◽

Convergence Acceleration ◽

Balance Method ◽

Navier Stokes ◽

Acceleration Techniques ◽

Edge Based ◽

The Time Domain

A harmonic balance method for modeling unsteady nonlinear periodic flows in turbomachinery is presented. The method solves the Reynolds-averaged Navier-Stokes equations in the time domain and may be implemented in a relatively simple way into an existing code, including all the standard convergence acceleration techniques used for steady problems. The application of the method to vibrating airfoils and rotor-stator interaction is discussed. It is demonstrated that the time-spectral scheme may achieve the same temporal accuracy at a lower computational cost, at the expense of using more memory.

A Compact Convolutional Neural Network for Surface Defect Inspection

Sensors ◽

10.3390/s20071974 ◽

2020 ◽

Vol 20 (7) ◽

pp. 1974 ◽

Cited By ~ 1

Author(s):

Yibin Huang ◽

Congying Qiu ◽

Xiaonan Wang ◽

Shijun Wang ◽

Kui Yuan

Keyword(s):

Graphics Processing Units ◽

High Performance ◽

Surface Defects ◽

Computational Cost ◽

Low Frequency ◽

Defect Inspection ◽

Central Processing ◽

Spatial Pyramid Pooling ◽

Similar Accuracy ◽

Graphics Processing

The advent of convolutional neural networks (CNNs) has accelerated the progress of computer vision in many respects. However, the majority of existing CNNs rely heavily on expensive GPUs (graphics processing units) to support large computations. Therefore, CNNs have not yet been widely used to inspect surface defects in manufacturing. In this paper, we develop a compact CNN-based model that not only achieves high performance on tiny defect inspection but can also be run on low-frequency CPUs (central processing units). Our model consists of a lightweight (LW) bottleneck and a decoder. Using a pyramid of lightweight kernels, the LW bottleneck provides rich features at lower computational cost. The decoder is also built in a lightweight way and consists of an atrous spatial pyramid pooling (ASPP) module and depthwise separable convolution layers. These lightweight designs greatly reduce redundant weights and computation. We train our models on groups of surface datasets. The model can successfully classify or segment surface defects within 30 ms on an Intel i3-4010U CPU. Our model obtains accuracy similar to that of MobileNetV2 with less than 1/3 of its FLOPs (floating-point operations) and 1/8 of its weights. Our experiments indicate that CNNs can be compact and hardware-friendly for future applications in automated surface inspection (ASI).
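As a rough illustration of why depthwise separable convolutions reduce the weight count, the Python sketch below compares the number of weights in a standard convolution with the number in its depthwise separable counterpart. The function names and the layer size in the usage note are assumptions chosen for illustration, bias terms are ignored, and none of the numbers come from the paper.

```python
def standard_conv_weights(kernel_size, in_channels, out_channels):
    """Weights in a standard k x k convolution (bias terms ignored)."""
    return kernel_size * kernel_size * in_channels * out_channels


def separable_conv_weights(kernel_size, in_channels, out_channels):
    """Weights in a depthwise separable convolution (bias terms ignored).

    A k x k depthwise filter per input channel is followed by a 1 x 1
    pointwise convolution that mixes the channels.
    """
    depthwise = kernel_size * kernel_size * in_channels
    pointwise = in_channels * out_channels
    return depthwise + pointwise
```

For a hypothetical 3 x 3 layer mapping 64 channels to 128, `standard_conv_weights(3, 64, 128)` gives 73728 weights, whereas `separable_conv_weights(3, 64, 128)` gives 8768, roughly an eightfold reduction for that single layer.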

Numerical simulation of transonic compressor under circumferential inlet distortion and rotor/stator interference using harmonic balance method

Modern Physics Letters B ◽

10.1142/s0217984918400213 ◽

2018 ◽

Vol 32 (12n13) ◽

pp. 1840021

Author(s):

Ziwei Wang ◽

Xiong Jiang ◽

Ti Chen ◽

Yan Hao ◽

Min Qiu

Keyword(s):

Numerical Simulation ◽

Unsteady Flow ◽

Harmonic Balance ◽

Harmonic Balance Method ◽

Balance Method ◽

Transonic Compressor ◽

Inlet Distortion ◽

Speed Up ◽

Computational Resources ◽

Optimizing Algorithm

Simulating the unsteady flow in a compressor under circumferential inlet distortion and rotor/stator interference normally requires a full-annulus grid with a dual time method. This process is time consuming and needs a large amount of computational resources. The harmonic balance method simulates the unsteady flow in a compressor on a single-passage grid with a series of steady simulations, which greatly increases the computational efficiency in comparison with the dual time method. However, most simulations with the harmonic balance method consider flow under either circumferential inlet distortion or rotor/stator interference, not both. Based on an in-house CFD code, the harmonic balance method is applied to the simulation of flow in NASA Stage 35 under both circumferential inlet distortion and rotor/stator interference. Because the unsteady flow is influenced by two different unsteady disturbances, the computation becomes unstable. The instability can be avoided by coupling the harmonic balance method with an optimizing algorithm. The result of the harmonic balance method is compared with that of a full-annulus simulation. The comparison shows that the harmonic balance method simulates the flow under circumferential inlet distortion and rotor/stator interference as precisely as the full-annulus simulation, with a speed-up of about 8 times.

Finite element method completely implemented for graphic processor units using parallel algorithm libraries

The International Journal of High Performance Computing Applications ◽

10.1177/1094342017694703 ◽

2017 ◽

Vol 33 (1) ◽

pp. 53-66 ◽

Cited By ~ 1

Author(s):

Franz Pichler ◽

Gundolf Haase

Keyword(s):

Finite Element ◽

Graphics Processing Unit ◽

Computational Cost ◽

Processing Unit ◽

Time Step ◽

Device Architecture ◽

Transient Problems ◽

Speed Up ◽

Automotive Batteries ◽

Graphics Processing

A finite element code is developed in which all of the computationally expensive steps are performed on a graphics processing unit via the THRUST and the PARALUTION libraries. The code focuses on the simulation of transient problems where the repeated computations per time-step create the computational cost. It is used to solve partial and ordinary differential equations as they arise in thermal-runaway simulations of automotive batteries. The speed-up obtained by utilizing the graphics processing unit for every critical step is compared against the single core and the multi-threading solutions which are also supported by the chosen libraries. This way a high total speed-up on the graphics processing unit is achieved without the need for programming a single classical Compute Unified Device Architecture kernel.

Implementation of a Semi-Implicit Pressure-Based Multigrid Fluid Flow Algorithm on a Graphics Processing Unit

Volume 13: New Developments in Simulation Methods and Software for Engineering Applications; Safety Engineering, Risk Analysis and Reliability Methods; Transportation Systems ◽

10.1115/imece2009-11587 ◽

2009 ◽

Cited By ~ 5

Author(s):

Aaron F. Shinn ◽

S. P. Vanka

Keyword(s):

Stokes Equations ◽

Graphics Processing Unit ◽

Navier Stokes ◽

Processing Unit ◽

Navier Stokes Equations ◽

Driven Cavity ◽

Multigrid Algorithm ◽

Computational Speed ◽

Speed Up ◽

Graphics Processing

A semi-implicit pressure-based multigrid algorithm for solving the incompressible Navier-Stokes equations was implemented on a Graphics Processing Unit (GPU) using CUDA (Compute Unified Device Architecture). The multigrid method employed was the Full Approximation Scheme (FAS), which is used for solving nonlinear equations. The algorithm is applied to the 2D driven cavity problem and compared with the CPU version of the code (written in Fortran) to assess the computational speed-up.

CLIJ: GPU-accelerated image processing for everyone

10.1101/660704 ◽

2019 ◽

Cited By ~ 1

Author(s):

Robert Haase ◽

Loic A. Royer ◽

Peter Steinbach ◽

Deborah Schmidt ◽

Alexandr Dibrov ◽

...

Keyword(s):

Image Processing ◽

Graphics Processing Units ◽

End Users ◽

Entry Level ◽

Speed Up ◽

Mobile Computers ◽

Graphics Processing

Graphics processing units (GPUs) allow image processing at unprecedented speed. We present CLIJ, a Fiji plugin enabling end users with entry-level programming experience to benefit from GPU-accelerated image processing. Freely programmable workflows can speed up image processing in Fiji by a factor of 10 or more, both on high-end GPU hardware and on affordable mobile computers with built-in GPUs.
