High Performance 3D PET Reconstruction Using Spherical Basis Functions on a Polar Grid

Statistical iterative methods are widely used for image reconstruction in emission tomography. Traditionally, the image space is modelled as a combination of cubic voxels for simplicity. After reconstruction, images are routinely filtered to reduce statistical noise, at the cost of degraded spatial resolution. An alternative that produces lower noise during reconstruction is to model the image space with spherical basis functions. Because these basis functions overlap in space, the system response matrix (SRM) contains a much larger number of non-zero elements to store, which also leads to long reconstruction times. Both problems are partly overcome by exploiting spherical symmetries, although computation is still slower than with non-overlapping basis functions. In this work, we implemented the reconstruction algorithm on a Graphics Processing Unit (GPU) for speed, using a precomputed, Monte Carlo-calculated SRM for accuracy. Reconstruction with spherical basis functions on the GPU was 4.3 times faster than on the Central Processing Unit (CPU) and 2.5 times faster than a multi-core CPU implementation using eight cores. Overwriting hazards between concurrent GPU threads are minimized by combining a random line-of-response (LOR) ordering with constrained atomic writes. Small differences in image quality were observed between the implementations.
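The abstract does not give the update formula, so the sketch below assumes MLEM, the usual statistical iterative method in PET, with the SRM stored as one sparse row per LOR. It illustrates where the overwriting hazard comes from: during backprojection every LOR scatters contributions into all basis functions it touches, and with overlapping spherical basis functions many LORs touch the same elements. In a GPU port each LOR would presumably be one thread and `back[j] += ...` an atomic add; shuffling the LOR order makes it less likely that concurrently running threads write to the same element. This Python version is serial, so the shuffle only changes the summation order. The function name `mlem` and the row layout are illustrative assumptions, not the authors' code.

```python
import random


def mlem(lors, counts, n_vox, n_iter=50, seed=0):
    """MLEM with a sparse system response matrix (SRM) stored row by row.

    lors[i]   -- list of (j, a_ij) pairs: the non-zero SRM elements of LOR i
    counts[i] -- measured counts y_i on LOR i
    Returns the basis-function coefficients x_j after n_iter iterations.
    """
    rng = random.Random(seed)

    # Sensitivity s_j = sum_i a_ij (column sums of the SRM).
    sens = [0.0] * n_vox
    for row in lors:
        for j, a in row:
            sens[j] += a

    x = [1.0] * n_vox                 # strictly positive starting image
    order = list(range(len(lors)))
    for _ in range(n_iter):
        rng.shuffle(order)            # random LOR ordering for this pass
        back = [0.0] * n_vox
        for i in order:               # one GPU thread per LOR in a CUDA port
            row = lors[i]
            proj = sum(a * x[j] for j, a in row)    # forward projection
            if proj <= 0.0:
                continue
            ratio = counts[i] / proj
            for j, a in row:
                back[j] += a * ratio  # would be an atomicAdd on the GPU
        # Multiplicative MLEM update: x_j <- x_j / s_j * sum_i a_ij * y_i / p_i
        x = [x[j] * back[j] / sens[j] if sens[j] > 0.0 else 0.0
             for j in range(n_vox)]
    return x
```

On a small consistent system (for example four LORs over three overlapping basis functions with noise-free counts), the estimate converges to the true coefficients, and different shuffle seeds give the same result up to rounding.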

Efficient parallelization of SPH algorithm on modern multi-core CPUs and massively parallel GPUs

International Journal of Modeling Simulation and Scientific Computing ◽

10.1142/s1793962321500549 ◽

2021 ◽

pp. 2150054

Author(s):

Pravin Jagtap ◽

Rupesh Nasre ◽

V. S. Sanapala ◽

B. S. V. Patnaik

Keyword(s):

High Performance ◽

Performance Metrics ◽

Computational Simulation ◽

Massively Parallel ◽

Benchmark Problems ◽

Processing Unit ◽

Central Processing ◽

Neighbor Search ◽

Computational Performance ◽

SPH Algorithm

Smoothed Particle Hydrodynamics (SPH) is fast emerging as a practically useful computational simulation tool for a wide variety of engineering problems. SPH is also gaining popularity as the backbone of fast and realistic animations in graphics and video games. The Lagrangian, mesh-free nature of the method facilitates fast and accurate simulation of material deformation, interface capture, etc. Particle-based methods require efficient particle search-and-locate algorithms, because repeatedly rebuilding neighbor particle lists is computationally expensive. Hence, it is advantageous to implement SPH on modern multi-core platforms with the help of High-Performance Computing (HPC) tools. In this work, the computational performance of an SPH algorithm is assessed on a multi-core Central Processing Unit (CPU) as well as on massively parallel General-Purpose Graphics Processing Units (GP-GPUs). Parallelizing SPH faces several challenges, such as the scalability of the neighbor search process, force calculations, minimizing thread divergence, achieving coalesced memory access patterns, balancing the workload, and ensuring optimal use of computational resources. While addressing some of these challenges, performance metrics such as speedup, global load efficiency, global store efficiency, warp execution efficiency, and occupancy are analyzed in detail. The OpenMP and Compute Unified Device Architecture[Formula: see text] parallel programming models have been used for parallel computing on an Intel Xeon[Formula: see text] E5-[Formula: see text] multi-core CPU and on NVIDIA Quadro M[Formula: see text] and NVIDIA Tesla p[Formula: see text] massively parallel GPU architectures. Standard benchmark problems from the Computational Fluid Dynamics (CFD) literature are used for validation.
The work also addresses a key question: how to identify a suitable architecture for mesh-less methods, whose workload is dominated by neighbor search and by the evaluation of local force fields from neighbor interactions.
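Neighbor search is the step singled out above as expensive. A common approach on both CPUs and GPUs is a uniform grid (cell list) whose cell size equals the kernel support radius h, so each particle only checks the 3x3 block of cells around its own (3x3x3 in 3D). GPU codes typically sort particles by cell index so that particles in the same cell are contiguous in memory, which helps coalesced access. The sketch below is a serial 2D Python illustration of that sort-based cell list under those assumptions; it is not the authors' implementation, and `neighbor_lists` is a hypothetical name.

```python
import math


def cell_key(p, h):
    """Integer cell coordinates of point p on a grid of cell size h."""
    return (math.floor(p[0] / h), math.floor(p[1] / h))


def neighbor_lists(pos, h):
    """For each particle, list the indices of particles closer than h."""
    keys = [cell_key(p, h) for p in pos]
    # Sort particle indices by cell so each cell occupies a contiguous range
    # (on a GPU this is the sort that enables coalesced memory access).
    order = sorted(range(len(pos)), key=lambda i: keys[i])
    cell_range = {}                      # cell -> [start, end) in `order`
    for rank, i in enumerate(order):
        k = keys[i]
        if k in cell_range:
            cell_range[k][1] = rank + 1
        else:
            cell_range[k] = [rank, rank + 1]

    h2 = h * h
    nbrs = [[] for _ in pos]
    for i, (xi, yi) in enumerate(pos):
        cx, cy = keys[i]
        for dx in (-1, 0, 1):            # scan the 3x3 block of cells
            for dy in (-1, 0, 1):
                r = cell_range.get((cx + dx, cy + dy))
                if r is None:
                    continue
                for j in order[r[0]:r[1]]:
                    if j != i and (pos[j][0] - xi) ** 2 + (pos[j][1] - yi) ** 2 < h2:
                        nbrs[i].append(j)
    return nbrs
```

Because the cell size equals h, any two particles closer than h lie in the same or adjacent cells, so the result matches a brute-force O(N^2) search while only visiting nearby cells.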

SeisNoise.jl: Ambient Seismic Noise Cross Correlation on the CPU and GPU in Julia

Seismological Research Letters ◽

10.1785/0220200192 ◽

2020 ◽

Vol 92 (1) ◽

pp. 517-527

Author(s):

Timothy Clements ◽

Marine A. Denolle

Keyword(s):

Seismic Noise ◽

High Performance ◽

Cross Correlation ◽

Graphic Processing Unit ◽

Ambient Seismic Noise ◽

Processing Unit ◽

Central Processing ◽

And Performance ◽

Noise Cross Correlation ◽

Performance Computing

We introduce SeisNoise.jl, a library for high-performance ambient seismic noise cross correlation, written entirely in the Julia programming language. Julia is a new language with syntax and a learning curve similar to those of MATLAB (see Data and Resources), R, or Python, and performance close to that of Fortran or C. SeisNoise.jl supports high-performance computing resources, running on both the central processing unit and the graphics processing unit. SeisNoise.jl is a modular toolbox that gives researchers common tools and data structures for designing custom ambient seismic cross-correlation workflows in Julia.
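The core operation in such workflows is correlating the noise records of two stations; the lag of the correlation peak relates to the travel time between them. The sketch below applies one-bit normalization (one common preprocessing step) and computes the correlation directly in the time domain for clarity. Production tools such as SeisNoise.jl typically work in the frequency domain with FFTs; this Python code does not use or mirror the SeisNoise.jl API, and the function names are hypothetical.

```python
def one_bit(x):
    """One-bit normalization: keep only the sign of each sample."""
    return [1.0 if v > 0 else (-1.0 if v < 0 else 0.0) for v in x]


def cross_correlate(a, b, max_lag):
    """Return c[k] = sum_n a[n] * b[n + k] for lags k = -max_lag..max_lag."""
    out = []
    for k in range(-max_lag, max_lag + 1):
        s = 0.0
        for n, av in enumerate(a):
            j = n + k
            if 0 <= j < len(b):       # ignore samples outside the record
                s += av * b[j]
        out.append(s)
    return out


def peak_lag(a, b, max_lag):
    """Lag (in samples) at which the cross-correlation of a and b peaks.

    A positive lag means b is delayed relative to a.
    """
    c = cross_correlate(a, b, max_lag)
    best = max(range(len(c)), key=lambda idx: c[idx])
    return best - max_lag
```

If `b` is a copy of `a` delayed by 7 samples, `peak_lag(one_bit(a), one_bit(b), 20)` returns 7; dividing the lag by the sampling rate converts it to a time delay.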

Fast and Accurate Finite Transducer Analysis Method for Wireless Passive Impedance-Loaded SAW Sensors

Sensors ◽

10.3390/s18113988 ◽

2018 ◽

Vol 18 (11) ◽

pp. 3988

Author(s):

Wei Luo ◽

Yang Yuan ◽

Yi Wang ◽

Qiuyun Fu ◽

Hui Xia ◽

...

Keyword(s):

Computation Time ◽

Processing Unit ◽

Bulk Wave ◽

Analysis Method ◽

Accurate Analysis ◽

Central Processing ◽

Finite Transducer ◽

Long Time ◽

High Degree ◽

Element Method

An accurate and fast simulation tool plays an important role in the design of wireless passive impedance-loaded surface acoustic wave (SAW) sensors, which have received much attention recently. This paper presents a finite transducer analysis method for these sensors. The method uses a numerically combined finite element method-boundary element method (FEM/BEM) model to analyze non-periodic transducers. For non-periodic transducers, FEM/BEM has so far been the most accurate analysis method; however, it consumes a large amount of central processing unit (CPU) time. This paper presents a faster algorithm for calculating the bulk-wave part of the equation coefficients, which usually takes a long time to compute. A complete non-periodic FEM/BEM model of the impedance sensors was constructed, and the final equations of the model were modified to account for the impedance variation of the sensors. Compared with the conventional method, the proposed method reduces computation time while maintaining the same high accuracy. Simulations are compared with experimental results for test devices to demonstrate the effectiveness of the analysis method.

Using spherical basis functions on a polar grid for iterative image reconstruction in small animal PET

10.1117/12.876495 ◽

2011 ◽

Cited By ~ 1

Author(s):

Jorge Cabello ◽

Josep F. Oliver ◽

Magdalena Rafecas

Keyword(s):

Image Reconstruction ◽

Small Animal ◽

Basis Functions ◽

Small Animal PET ◽

Iterative Image Reconstruction ◽

Animal PET ◽

Spherical Basis ◽

Polar Grid ◽

Spherical Basis Functions

Spherical basis functions and uniform distribution of points on spheres

Journal of Approximation Theory ◽

10.1016/j.jat.2007.09.009 ◽

2008 ◽

Vol 151 (2) ◽

pp. 186-207 ◽

Cited By ~ 12

Author(s):

Xingping Sun ◽

Zhenzhong Chen

Keyword(s):

Uniform Distribution ◽

Basis Functions ◽

Distribution Of Points ◽

Spherical Basis ◽

Spherical Basis Functions

A Parallel Implementation of Unscheduled Flow Control in Interconnected Power Systems

Mathematical Problems in Engineering ◽

10.1155/2012/376291 ◽

2012 ◽

Vol 2012 ◽

pp. 1-19

Author(s):

G. Ozdemir Dag ◽

Mustafa Bagriyanik

Keyword(s):

Power Systems ◽

Power System ◽

Parallel Computation ◽

High Performance ◽

Power Flow ◽

Parallel Implementation ◽

Computation Time ◽

Test System ◽

Fuzzy Decision Making ◽

Optimization Approach

In a deregulated power system, unscheduled power flows need to be minimized or controlled as quickly as possible, since transmission systems are mostly operated at or very close to their power-carrying limits. The time spent on simulations to determine the current states of all system and control variables of an interconnected power system is therefore important, because taking the necessary action after an equipment failure or any other undesired event can be critical. Using supercomputing facilities together with parallel computing techniques greatly reduces computation time. In this study, a parallel implementation of a multiobjective optimization approach based on genetic algorithms (GA) and fuzzy decision making is presented for managing unscheduled flows. Parallel computation techniques are applied on supercomputers (high-performance computers). The proposed method is applied to the IEEE 300-bus test system. Two cases with different settings for some GA parameters are considered to assess the benefit of parallel computation, and the simulation results are then presented.
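The abstract does not spell out the fuzzy decision rule. A widely used choice in multiobjective power-system optimization is to give each objective a linear membership value (1 at the best value found, 0 at the worst), sum the memberships per solution, normalize, and pick the solution with the highest score as the best compromise. The sketch below assumes that rule and that all objectives are minimized; it is not necessarily the authors' formulation, and `fuzzy_best_compromise` is a hypothetical name.

```python
def fuzzy_best_compromise(front):
    """Pick a compromise solution from a set of non-dominated solutions.

    front -- list of objective vectors; every objective is minimized.
    Returns (index of best solution, normalized membership scores).
    """
    m = len(front[0])
    f_min = [min(f[k] for f in front) for k in range(m)]
    f_max = [max(f[k] for f in front) for k in range(m)]

    def membership(value, k):
        # Linear membership: 1 at the best value seen, 0 at the worst.
        if f_max[k] == f_min[k]:
            return 1.0
        return (f_max[k] - value) / (f_max[k] - f_min[k])

    raw = [sum(membership(f[k], k) for k in range(m)) for f in front]
    total = sum(raw)
    scores = [r / total for r in raw]          # scores sum to 1
    best = max(range(len(front)), key=lambda i: scores[i])
    return best, scores
```

For example, with three trade-off solutions (1, 10), (4, 4) and (10, 1), the balanced middle solution (index 1) receives the highest score. In a parallel GA, the expensive part is evaluating the objectives of each candidate, which is what would be distributed across processors; the decision step above is cheap.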

Direct and Inverse Sobolev Error Estimates for Scattered Data Interpolation via Spherical Basis Functions

Foundations of Computational Mathematics ◽

10.1007/s10208-005-0197-7 ◽

2006 ◽

Vol 7 (3) ◽

pp. 369-390 ◽

Cited By ~ 39

Author(s):

Francis J. Narcowich ◽

Xingping Sun ◽

Joseph D. Ward ◽

Holger Wendland

Keyword(s):

Error Estimates ◽

Scattered Data ◽

Basis Functions ◽

Data Interpolation ◽

Scattered Data Interpolation ◽

Spherical Basis ◽

Spherical Basis Functions

Approximation of parabolic PDEs on spheres using spherical basis functions

Advances in Computational Mathematics ◽

10.1007/s10444-003-3960-9 ◽

2005 ◽

Vol 22 (4) ◽

pp. 377-397 ◽

Cited By ~ 19

Author(s):

Q. T. Le Gia

Keyword(s):

Basis Functions ◽

Parabolic PDEs ◽

Spherical Basis ◽

Spherical Basis Functions

Controllers: An abstraction to ease the use of hardware accelerators

The International Journal of High Performance Computing Applications ◽

10.1177/1094342017702962 ◽

2017 ◽

Vol 32 (6) ◽

pp. 838-853 ◽

Cited By ~ 4

Author(s):

Ana Moreton–Fernandez ◽

Hector Ortega–Arranz ◽

Arturo Gonzalez–Escribano

Keyword(s):

Graphics Processing Units ◽

High Performance ◽

Abstract Entity ◽

Hardware Accelerators ◽

Processing Unit ◽

Central Processing ◽

Computing Platforms ◽

Graphics Processing ◽

Performance Computing ◽

Selection Of

Nowadays, hardware accelerators such as graphics processing units or Xeon Phi coprocessors are key to solving computationally costly problems that require high-performance computing. However, programming efficient solutions for these kinds of devices is a very complex task that relies on manual management of memory transfers and configuration parameters. The programmer has to study in depth the particular data that needs to be computed at each moment, across different computing platforms, while also considering architectural details. We introduce the controller concept, an abstract entity that allows the programmer to easily manage communications and kernel-launching details on hardware accelerators in a transparent way. The model also makes it possible to define and launch central processing unit kernels on multi-core processors with the same abstraction and methodology used for the accelerators. Internally, it combines different native programming models and technologies to exploit the potential of each kind of device. Additionally, the model simplifies the selection of values for several configuration parameters that can be chosen when a kernel is launched. This is done through a qualitative characterization of the kernel code to be executed. Finally, we present an implementation of the controller model in a prototype library, together with its application to several case studies. Its use has reduced development and porting costs, with low execution-time overheads compared with manually programmed and optimized solutions that directly use CUDA and OpenMP.

Grid-Enabled Non-Rigid Registration of Medical Images

Parallel Processing Letters ◽

10.1142/s0129626404001830 ◽

2004 ◽

Vol 14 (02) ◽

pp. 197-216 ◽

Cited By ~ 3

Author(s):

Radu Stefanescu ◽

Xavier Pennec ◽

Nicholas Ayache

Keyword(s):

High Performance ◽

Cluster Computing ◽

Parallel Implementation ◽

Low Cost ◽

Computation Time ◽

Clinical Environment ◽

Healthcare Organizations ◽

Rigid Registration ◽

Demons Algorithm ◽

Point To Point

Over recent years, non-rigid registration has become a major issue in medical imaging. It consists of recovering a dense point-to-point correspondence field between two images, and it usually takes a long time. This conflicts with the needs of a clinical environment, where usability and speed are major constraints, hence the need to reduce computation time from slightly less than an hour to just a few minutes. As financial pressure makes it hard for healthcare organizations to invest in expensive high-performance computing (HPC) solutions, cluster computing proves to be a convenient solution to our computation needs, offering large processing power at low cost. Among fast and efficient non-rigid registration methods, we chose the demons algorithm for its simplicity and good performance. The parallel implementation decomposes the correspondence field into spatial blocks, each block being assigned to a node of the cluster. We obtained a speedup of 11 using fifteen 2 GHz PCs connected through a 1 GB/s Ethernet network, reducing computation time from 40 min to 3 min 30 s. To further reduce costs and maintenance load, the second part investigates the transparent use of shared computing resources through either a graphical client or a Web client.
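The abstract assigns spatial blocks of the correspondence field to cluster nodes. A minimal way to do this, assuming slabs along one image axis (here the slice index), is to split the slices into contiguous blocks whose sizes differ by at most one, and to give each block a halo of extra slices, because the demons algorithm smooths the displacement field with a Gaussian filter whose support crosses block borders. The slab layout, the halo width, and the function names are assumptions for illustration; the abstract does not describe the exact decomposition.

```python
def block_ranges(n, p):
    """Split indices 0..n-1 into p contiguous [start, end) blocks.

    Block sizes differ by at most one, which balances the work per node.
    """
    base, extra = divmod(n, p)
    ranges, start = [], 0
    for r in range(p):
        size = base + (1 if r < extra else 0)
        ranges.append((start, start + size))
        start += size
    return ranges


def with_halo(block, n, halo):
    """Extend a block by `halo` slices on each side, clipped to the volume."""
    lo, hi = block
    return (max(0, lo - halo), min(n, hi + halo))


def decompose_slices(n_slices, n_nodes, halo):
    """Return (owned range, range including halo) for each node."""
    return [(own, with_halo(own, n_slices, halo))
            for own in block_ranges(n_slices, n_nodes)]
```

With 100 slices on 15 nodes, ten nodes own 7 slices and five own 6. Each node updates only its owned slices and exchanges halo slices with its neighbors after every smoothing step. For reference, the reported speedup of 11 on 15 PCs corresponds to a parallel efficiency of about 11/15, roughly 0.73.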