A Distributed GPU-Based Framework for Real-Time 3D Volume Rendering of Large Astronomical Data Cubes

2012 ◽  
Vol 29 (3) ◽  
pp. 340-351 ◽  
Author(s):  
A. H. Hassan ◽  
C. J. Fluke ◽  
D. G. Barnes

We present a framework to volume-render three-dimensional data cubes interactively using distributed ray-casting and volume-bricking over a cluster of workstations powered by one or more graphics processing units (GPUs) and a multi-core central processing unit (CPU). The main design target for this framework is an in-core visualization solution capable of delivering three-dimensional interactive views of terabyte-sized data cubes. We tested the presented framework using a computing cluster comprising 64 nodes with a total of 128 GPUs. The framework proved scalable, rendering a 204 GB data cube at an average of 30 frames per second. Our performance analyses also compare the NVIDIA Tesla 1060 and 2050 GPU architectures and the effect of increasing the visualization output resolution on rendering performance. Although our initial focus, as shown in the examples presented in this work, is volume rendering of spectral data cubes from radio astronomy, we contend that our approach has applicability to other disciplines where close to real-time volume rendering of terabyte-order three-dimensional data sets is a requirement.
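
The abstract does not reproduce the ray-casting kernel itself, but the per-brick compositing it describes can be sketched in a few lines. Below is a minimal, single-node NumPy sketch of front-to-back alpha compositing along one axis of a density brick; the linear transfer function and the step_opacity parameter are illustrative assumptions, not the paper's actual settings, and the distributed version would composite per-brick images in depth order across GPUs.

```python
import numpy as np

def raycast_brick(volume, step_opacity=0.02):
    """Front-to-back alpha compositing along the z axis of one brick.

    volume : 3D array of scalar samples in [0, 1] (e.g. one data-cube brick).
    Returns a 2D image; in the distributed setting each GPU renders its
    brick like this and the partial images are blended in depth order.
    """
    h, w, depth = volume.shape
    color = np.zeros((h, w))          # accumulated intensity per ray
    alpha = np.zeros((h, w))          # accumulated opacity per ray
    for z in range(depth):            # march every ray one slice at a time
        sample = volume[:, :, z]
        a = np.clip(sample * step_opacity, 0.0, 1.0)  # simple linear transfer
        color += (1.0 - alpha) * a * sample            # front-to-back blend
        alpha += (1.0 - alpha) * a
    return color

# Toy brick: a bright Gaussian blob standing in for an emission-line source.
z, y, x = np.mgrid[-1:1:64j, -1:1:64j, -1:1:64j]
brick = np.exp(-8 * (x**2 + y**2 + z**2))
image = raycast_brick(brick)
print(image.shape, image.max())
```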

2010 ◽  
Vol 133 (2) ◽  
Author(s):  
Tobias Brandvik ◽  
Graham Pullan

A new three-dimensional Navier–Stokes solver for flows in turbomachines has been developed. The new solver is based on the latest version of the Denton codes but has been implemented to run on graphics processing units (GPUs) instead of the traditional central processing unit (CPU). The change in processor enables an order-of-magnitude reduction in run-time due to the higher performance of the GPU. Scaling results for a 16-node GPU cluster are also presented, showing almost linear scaling for typical turbomachinery cases. For validation purposes, a test case consisting of a three-stage turbine with complete hub and casing leakage paths is described. Good agreement is obtained with previously published experimental results. The simulation runs in less than 10 min on a cluster with four GPUs.


2021 ◽  
Vol 7 (2) ◽  
pp. 35
Author(s):  
Boris Shirokikh ◽  
Alexey Shevtsov ◽  
Alexandra Dalechina ◽  
Egor Krivov ◽  
Valery Kostjuchenko ◽  
...  

The prevailing approach to three-dimensional (3D) medical image segmentation is to use convolutional networks. Recently, deep learning methods have achieved human-level performance in several important applied problems, such as volumetry for lung-cancer diagnosis or delineation for radiation therapy planning. However, state-of-the-art architectures, such as U-Net and DeepMedic, are computationally heavy and require workstations accelerated with graphics processing units for fast inference, and scarce research has been conducted on enabling fast central processing unit computations for such networks. Our paper fills this gap. We propose a new segmentation method with a human-like technique for segmenting a 3D study: first, we analyze the image at a small scale to identify areas of interest, and then we process only the relevant feature-map patches. Our method not only reduces the inference time from 10 min to 15 s but also preserves state-of-the-art segmentation quality, as we illustrate in experiments with two large datasets.
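
The coarse-to-fine idea described above (a cheap low-resolution pass to find candidate regions, followed by the expensive network only on the selected patches) can be sketched as follows. The coarse_score and fine_segment callables are hypothetical stand-ins for the paper's actual networks, which are not reproduced here:

```python
import numpy as np

def two_stage_segment(volume, coarse_score, fine_segment,
                      patch=32, threshold=0.5):
    """Coarse-to-fine 3D segmentation in the spirit of the paper.

    coarse_score(volume)  -> low-cost per-voxel interest map in [0, 1]
    fine_segment(patch3d) -> expensive per-voxel mask for one patch
    Only patches whose coarse score exceeds `threshold` are sent to the
    expensive model, which is where the CPU inference time is saved.
    """
    mask = np.zeros(volume.shape, dtype=np.uint8)
    interest = coarse_score(volume)
    for i in range(0, volume.shape[0], patch):
        for j in range(0, volume.shape[1], patch):
            for k in range(0, volume.shape[2], patch):
                sl = (slice(i, i + patch),
                      slice(j, j + patch),
                      slice(k, k + patch))
                if interest[sl].max() >= threshold:   # skip empty regions
                    mask[sl] = fine_segment(volume[sl])
    return mask

# Stand-in models: intensity as "interest", simple thresholding as "network".
vol = np.random.rand(64, 64, 64)
out = two_stage_segment(vol,
                        coarse_score=lambda v: v,
                        fine_segment=lambda p: (p > 0.9).astype(np.uint8))
print(out.sum(), "voxels flagged")
```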


Owing to extensive growth across sectors such as software, telecom, healthcare and defence, there has been a marked increase in both the number and the duration of meetings, conference calls, reconnaissance stakeouts and financial reviews. The reports obtained from these play a significant role in defining plans of action. The proposed model converts real-time speech to the corresponding text, then to a summary using Natural Language Grammar (NLG) and Abstract Meaning Representation (AMR) graphs, and finally converts the obtained summary back to speech. The proposed model achieves this task using two major algorithms: 1) Deep Speech 2 and 2) AMR graphs. The recommended speech-recognition model achieves a 4x speedup when the algorithm runs on a central processing unit (CPU), and running the deep learning algorithms on dedicated graphics processing units (GPUs) can give a 21x speedup. The performance of the summarizer is close to that of the Lead-3-AMR-Baseline model, a strong baseline for the CNN/DailyMail dataset. The summarizer achieves a ROUGE score close to the Lead-3-AMR-Baseline model, with an accuracy of 99.37%.
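
A hedged sketch of the three-stage pipeline the abstract outlines (speech recognition, then summarization, then text-to-speech). The stage functions below are placeholders: the paper uses Deep Speech 2 for recognition and AMR-graph summarization, neither of which is reproduced here.

```python
from typing import Callable

def speech_summary_pipeline(audio: bytes,
                            asr: Callable[[bytes], str],
                            summarize: Callable[[str], str],
                            tts: Callable[[str], bytes]) -> bytes:
    """Wire the three stages described in the abstract.

    asr       : speech recognizer (the paper uses Deep Speech 2)
    summarize : text summarizer (the paper builds AMR graphs of the text)
    tts       : text-to-speech engine for the final spoken summary
    Each stage is a plain function, so GPU-accelerated models can be
    swapped in without changing the pipeline itself.
    """
    transcript = asr(audio)
    summary = summarize(transcript)
    return tts(summary)

# Trivial stand-ins so the pipeline runs end to end.
spoken = speech_summary_pipeline(
    b"raw-pcm-audio",
    asr=lambda a: "the quarterly review covered budget and hiring",
    summarize=lambda t: t.split(" covered ")[0] + " summary",
    tts=lambda s: s.encode("utf-8"),
)
print(spoken)
```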


2010 ◽  
Vol 43 (6) ◽  
pp. 1535-1539 ◽  
Author(s):  
Filipe R. N. C. Maia ◽  
Tomas Ekeberg ◽  
David van der Spoel ◽  
Janos Hajdu

The past few years have seen a tremendous growth in the field of coherent X-ray diffractive imaging, in large part due to X-ray free-electron lasers, which provide a peak brilliance billions of times higher than that of synchrotrons. However, this rapid development in terms of hardware has not been matched on the software side. The release of Hawk is intended to close this gap. To the authors' knowledge, Hawk is the first publicly available and fully open-source software program for reconstructing images from continuous diffraction patterns. The software handles all steps leading from a raw diffraction pattern to a reconstructed two-dimensional image, including geometry determination, background correction, masking and phasing. It also includes preliminary three-dimensional support and support for graphics processing units using the Compute Unified Device Architecture, which speeds up processing by orders of magnitude compared to a single central processing unit. Hawk implements numerous algorithms and is easily extended. This, in combination with its open-source licence, provides a platform for other groups to test, develop and distribute their own algorithms. Hawk is available under the GNU General Public License from http://xray.bmc.uu.se/hawk.
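
Hawk's own phasing code is not shown in the abstract; as an illustration of the reconstruction step it automates, here is a minimal error-reduction loop (the classic Gerchberg-Saxton-style alternation between measured Fourier amplitudes and a real-space support constraint). This is one of the standard algorithms in this family, not Hawk's specific implementation:

```python
import numpy as np

def error_reduction(magnitudes, support, iters=200, seed=0):
    """Minimal error-reduction phasing loop.

    magnitudes : measured Fourier amplitudes (sqrt of the diffraction pattern)
    support    : boolean mask where the object is allowed to be nonzero
    Alternates between enforcing the measured amplitudes in Fourier space
    and the support constraint in real space.
    """
    rng = np.random.default_rng(seed)
    phase = rng.uniform(0, 2 * np.pi, magnitudes.shape)
    g = np.fft.ifft2(magnitudes * np.exp(1j * phase)).real
    for _ in range(iters):
        G = np.fft.fft2(g)
        G = magnitudes * np.exp(1j * np.angle(G))  # keep measured amplitudes
        g = np.fft.ifft2(G).real
        g[~support] = 0.0                          # enforce the support
        g[g < 0] = 0.0                             # and non-negativity
    return g

# Toy object: recover a square from its noise-free diffraction amplitudes.
obj = np.zeros((64, 64))
obj[24:40, 24:40] = 1.0
mags = np.abs(np.fft.fft2(obj))
sup = np.zeros_like(obj, dtype=bool)
sup[16:48, 16:48] = True
rec = error_reduction(mags, sup)
print(float(np.abs(rec - obj).mean()))  # residual; twin-image ambiguity may remain
```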


2015 ◽  
Vol 2015 ◽  
pp. 1-13 ◽  
Author(s):  
Marwan Abdellah ◽  
Ayman Eldeib ◽  
Amr Sharawi

Fourier volume rendering (FVR) is a significant visualization technique that has been used widely in digital radiography. As a result of its O(N² log N) time complexity, it provides a faster alternative to spatial-domain volume rendering algorithms, which are O(N³) in computational complexity. Relying on the Fourier projection-slice theorem, this technique operates on the spectral representation of a 3D volume instead of processing its spatial representation, generating attenuation-only projections that look like X-ray radiographs. Due to the rapid evolution of its underlying architecture, the graphics processing unit (GPU) has become an attractive, capable platform that can deliver enormous raw computational power compared to the central processing unit (CPU) on a per-dollar basis. The introduction of the compute unified device architecture (CUDA) enables embarrassingly parallel algorithms to run efficiently on CUDA-capable GPU architectures. In this work, a high-performance GPU-accelerated implementation of the FVR pipeline on CUDA-enabled GPUs is presented. The proposed implementation achieves a speed-up of 117x compared to a single-threaded hybrid implementation that uses the CPU and GPU together, by executing the rendering pipeline entirely on recent GPU architectures.
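
The projection-slice theorem the abstract relies on is easy to demonstrate on a CPU with NumPy: after one 3D FFT of the volume, each axis-aligned projection costs only a 2D slice extraction and an inverse 2D FFT, which is where the O(N² log N) per-view complexity comes from. This sketch uses the k_z = 0 plane for simplicity; arbitrary view angles would require resampling the spectrum, and the paper's CUDA acceleration is not shown.

```python
import numpy as np

def fourier_projection(volume):
    """X-ray-style projection via the Fourier projection-slice theorem.

    A 2D slice through the centre of the 3D spectrum, inverse-transformed,
    equals the line integral of the volume along the orthogonal axis.
    """
    spectrum = np.fft.fftn(volume)                 # O(N^3 log N), done once
    central_slice = spectrum[:, :, 0]              # k_z = 0 plane
    projection = np.fft.ifft2(central_slice).real  # O(N^2 log N) per view
    return projection

vol = np.zeros((32, 32, 32))
vol[8:24, 8:24, 8:24] = 1.0
ps = fourier_projection(vol)
direct = vol.sum(axis=2)              # brute-force line integral for comparison
print(np.allclose(ps, direct))        # True up to FFT rounding
```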


2021 ◽  
Vol 13 (5) ◽  
pp. 2950
Author(s):  
Su-Kyung Sung ◽  
Eun-Seok Lee ◽  
Byeong-Seok Shin

Climate change increases the frequency of localized heavy rain and typhoons. As a result, mountain disasters, such as landslides and earthworks, continue to occur, causing damage to roads and residential areas downstream. Moreover, large-scale civil engineering works, including dam construction, cause rapid changes in the terrain, which harm the stability of residential areas. Disasters such as landslides and earthworks occur extensively, and field investigation has its limitations; thus, many studies are being conducted to model terrain geometrically and to observe changes in terrain according to external factors. However, conventional topographic methods present results in a form that only people with specialized knowledge can interpret, with little consideration for the three-dimensional visualization that helps non-experts understand. We need a way to express changes in terrain in real time that is intuitive for non-experts. In conventional height-based terrain modeling and simulation, some of the sampled data are irregularly distorted and do not show the exact terrain shape. The proposed method utilizes a hierarchical vertex cohesion map to correct terrain modeled inaccurately owing to uniform height sampling, and it compensates for geometric errors using Hausdorff distances rather than considering only the elevation differences of the terrain. Mesh reconstruction, which triangulates the three vertices placed at each location into the smallest unit of 3D model data, can be done at high speed on graphics processing units (GPUs). Our experiments confirm that changes in terrain can be expressed accurately and quickly compared with existing methods. These functions can improve the sustainability of residential spaces by predicting the damage caused by mountain disasters or civil engineering works around the city, and they make the results easy for non-experts to understand.
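
As one concrete piece of the pipeline described above, the geometric-error test based on Hausdorff distances can be sketched directly. The point sets, the tolerance value and the needs_refinement helper below are illustrative assumptions, not the paper's actual data structures:

```python
import numpy as np

def hausdorff(points_a, points_b):
    """Symmetric Hausdorff distance between two 3D point sets.

    Used here the way the abstract suggests: measure how far a uniformly
    height-sampled terrain mesh strays from a reference point cloud, so
    badly distorted regions can be flagged for refinement.
    """
    d = np.linalg.norm(points_a[:, None, :] - points_b[None, :, :], axis=2)
    return max(d.min(axis=1).max(), d.min(axis=0).max())

def needs_refinement(mesh_vertices, reference, tolerance=0.5):
    """Flag a terrain patch whose geometric error exceeds `tolerance`."""
    return hausdorff(mesh_vertices, reference) > tolerance

# Reference terrain vs. a coarse sampling of it with one distorted vertex.
ref = np.array([[x, y, np.sin(x) * np.cos(y)]
                for x in np.linspace(0, 3, 20)
                for y in np.linspace(0, 3, 20)])
coarse = ref[::4].copy()
coarse[10, 2] += 2.0                  # simulated sampling distortion
print(needs_refinement(coarse, ref))  # True: this patch gets rebuilt
```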


2021 ◽  
Vol 87 (5) ◽  
pp. 363-373
Author(s):  
Long Chen ◽  
Bo Wu ◽  
Yao Zhao ◽  
Yuan Li

Real-time acquisition and analysis of three-dimensional (3D) human body kinematics are essential in many applications. In this paper, we present a real-time photogrammetric system consisting of a stereo pair of red-green-blue (RGB) cameras. The system incorporates a multi-threaded and graphics processing unit (GPU)-accelerated solution for real-time extraction of 3D human kinematics. A deep learning approach is adopted to automatically extract two-dimensional (2D) human body features, which are then converted to 3D features based on photogrammetric processing, including dense image matching and triangulation. The multi-threading scheme and GPU-acceleration enable real-time acquisition and monitoring of 3D human body kinematics. Experimental analysis verified that the system processing rate reached ∼18 frames per second. The effective detection distance reached 15 m, with a geometric accuracy of better than 1% of the distance within a range of 12 m. The real-time measurement accuracy for human body kinematics ranged from 0.8% to 7.5%. The results suggest that the proposed system is capable of real-time acquisition and monitoring of 3D human kinematics with favorable performance, showing great potential for various applications.
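
The conversion from matched 2D features to 3D points rests on standard stereo triangulation; for a rectified pair, depth follows from Z = f * B / disparity. The sketch below uses hypothetical calibration values (focal length, baseline, principal point) rather than the system's actual parameters, and omits the dense-matching and deep-learning stages:

```python
import numpy as np

def triangulate_rectified(xl, xr, y, focal_px, baseline_m, cx, cy):
    """Depth from a matched feature in a rectified stereo pair.

    xl, xr : horizontal pixel coordinates of the same body joint in the
             left and right images (e.g. from a 2D pose network)
    Returns the 3D point in the left-camera frame, using the standard
    rectified-stereo result Z = f * B / disparity.
    """
    disparity = xl - xr
    z = focal_px * baseline_m / disparity
    x3 = (xl - cx) * z / focal_px
    y3 = (y - cy) * z / focal_px
    return np.array([x3, y3, z])

# Hypothetical calibration: 1200 px focal length, 0.3 m baseline, 640x480.
joint = triangulate_rectified(xl=352.0, xr=322.0, y=210.0,
                              focal_px=1200.0, baseline_m=0.3,
                              cx=320.0, cy=240.0)
print(joint)   # metres in the left-camera frame; Z = 1200*0.3/30 = 12 m
```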


Author(s):  
Baptiste Ristagno ◽  
Dominique Giraud ◽  
Julien Fontchastagner ◽  
Denis Netter ◽  
Noureddine Takorabet ◽  
...  

Purpose: Optimization processes and movement modeling usually require a high number of simulations. The purpose of this paper is to reduce global central processing unit (CPU) time by decreasing the cost of each evaluation.
Design/methodology/approach: The proposed method avoids remeshing the geometry at each iteration. The idea is to use a fixed mesh onto which functions are projected to represent the geometry and the supply.
Findings: Results are very promising. CPU time is reduced for three-dimensional problems by almost a factor of two, while keeping a low relative deviation from usual methods. The CPU time saving comes from avoiding the meshing step and from a better initialization of the iterative resolution. Optimization, movement modeling and transient-state simulation are very efficient and give the same results as the usual finite element method.
Research limitations/implications: The method is restricted to simple geometries owing to the difficulty of finding spatial mathematical functions that describe the geometry. Moreover, a compromise must be found between the imprecision caused by the boundary evaluation and the time saving.
Originality/value: The method can be applied to optimize rotating machine designs. Moreover, movement modeling is performed by shifting the functions corresponding to the moving parts.
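
A minimal sketch of the fixed-mesh idea: the moving part is described by a mathematical indicator function, so a movement step only shifts or rotates that function and re-evaluates it on the same nodes, with no remeshing. The disc-shaped "rotor" and the permeability values below are made-up stand-ins, not the paper's geometry:

```python
import numpy as np

def assign_materials(grid_x, grid_y, rotor_angle, mu_iron=1000.0, mu_air=1.0):
    """Project a moving geometry onto a fixed mesh.

    Instead of remeshing when the rotor turns, the rotor is described by a
    mathematical indicator function; moving it means rotating the function
    and re-evaluating it on the same fixed nodes.
    """
    # Rotate the evaluation points instead of the mesh.
    c, s = np.cos(-rotor_angle), np.sin(-rotor_angle)
    xr = c * grid_x - s * grid_y
    yr = s * grid_x + c * grid_y
    inside = (xr - 0.3) ** 2 + yr ** 2 < 0.1 ** 2   # indicator of the part
    return np.where(inside, mu_iron, mu_air)        # per-node permeability

# Fixed mesh built once; only the material map changes with the angle.
x, y = np.meshgrid(np.linspace(-1, 1, 200), np.linspace(-1, 1, 200))
for angle in (0.0, np.pi / 6, np.pi / 3):
    mu = assign_materials(x, y, angle)
    print(f"angle={angle:.2f} rad, iron nodes={int((mu > 1).sum())}")
```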


2021 ◽  
Vol 119 ◽  
pp. 07002
Author(s):  
Youness Rtal ◽  
Abdelkader Hadjoudja

Graphics processing units (GPUs) are microprocessors attached to graphics cards and dedicated to displaying and manipulating graphics data. Such microprocessors now power all modern graphics cards, and within a few years they have become potent tools for massively parallel computing. They are practical instruments in several fields, such as image processing, video and audio encoding and decoding, and the solution of physical systems with one or more unknowns. Their advantages are faster processing and lower energy consumption than the central processing unit (CPU). In this paper, we define and implement the Lagrange polynomial interpolation method on the GPU and the CPU to calculate the sodium density at different temperatures Ti, using the NVIDIA CUDA C parallel programming model, which can increase computational performance by harnessing the power of the GPU. The objective of this study is to compare the performance of the Lagrange interpolation method implemented on CPU and GPU processors and to assess the efficiency of GPUs for parallel computing.
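
For reference, the method under comparison is the classic Lagrange form P(x) = sum_i y_i * prod_{j != i} (x - x_j) / (x_i - x_j). A vectorized NumPy version (a stand-in for the paper's CPU baseline; the sodium-density table below is illustrative, not the paper's data) makes the structure clear: every evaluation point is independent, which is why the method maps naturally onto one GPU thread per point.

```python
import numpy as np

def lagrange_interpolate(x_nodes, y_nodes, x_eval):
    """Evaluate the Lagrange interpolating polynomial at many points.

    P(x) = sum_i y_i * prod_{j != i} (x - x_j) / (x_i - x_j)
    Every x_eval point is independent of the others, which is exactly the
    structure that makes the method map well onto one GPU thread per point.
    """
    n = len(x_nodes)
    result = np.zeros_like(x_eval, dtype=float)
    for i in range(n):
        basis = np.ones_like(x_eval, dtype=float)
        for j in range(n):
            if j != i:
                basis *= (x_eval - x_nodes[j]) / (x_nodes[i] - x_nodes[j])
        result += y_nodes[i] * basis
    return result

# Hypothetical sodium-density table: density at a few temperatures.
temps = np.array([100.0, 200.0, 300.0, 400.0, 500.0])    # degrees C
density = np.array([927.0, 903.0, 880.0, 856.0, 832.0])  # kg/m^3 (illustrative)
query = np.linspace(120.0, 480.0, 8)
print(lagrange_interpolate(temps, density, query))
```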

