High Performance 3D PET Reconstruction Using Spherical Basis Functions on a Polar Grid

2012 ◽  
Vol 2012 ◽  
pp. 1-11 ◽  
Author(s):  
J. Cabello ◽  
J. E. Gillam ◽  
M. Rafecas

Statistical iterative methods are widely used for image reconstruction in emission tomography. Traditionally, the image space is modelled as a combination of cubic voxels for simplicity. After reconstruction, images are routinely filtered to reduce statistical noise at the cost of degraded spatial resolution. An alternative that produces lower noise during reconstruction is to model the image space with spherical basis functions. These basis functions overlap in space, producing a large number of non-zero elements in the system response matrix (SRM) that must be stored, which in turn leads to long reconstruction times. These two problems are partly overcome by exploiting spherical symmetries, although reconstruction is still slower than with non-overlapping basis functions. In this work, we have implemented the reconstruction algorithm using Graphics Processing Unit (GPU) technology for speed and a precomputed Monte-Carlo-calculated SRM for accuracy. Reconstruction using spherical basis functions on a GPU was 4.3 times faster than on the Central Processing Unit (CPU) and 2.5 times faster than a multi-core CPU parallel implementation using eight cores. Overwriting hazards are minimized by combining a random line of response ordering and constrained atomic writing. Small differences in image quality were observed between implementations.
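The overwriting hazard mentioned in this abstract arises when several lines of response contribute to the same image element at the same time. The following is a minimal NumPy sketch of that hazard, not the authors' GPU code: the function names and the toy index arrays are hypothetical, and `np.add.at` stands in here as a sequential analogue of an atomic add.

```python
import numpy as np


def backproject_buffered(n_elements, element_idx, contrib):
    """Naive scatter: repeated indices overwrite each other (the hazard)."""
    image = np.zeros(n_elements)
    # Fancy-index "+=" is buffered, so duplicate indices keep only one update.
    image[element_idx] += contrib
    return image


def backproject_accumulating(n_elements, element_idx, contrib):
    """Unbuffered scatter: every contribution is accumulated."""
    image = np.zeros(n_elements)
    # np.add.at applies each update in turn, like an atomic add would.
    np.add.at(image, element_idx, contrib)
    return image


# Hypothetical toy data: two contributions land on image element 1.
element_idx = np.array([1, 1, 2])
contrib = np.array([1.0, 1.0, 1.0])

lost = backproject_buffered(4, element_idx, contrib)
kept = backproject_accumulating(4, element_idx, contrib)
```

In this toy case the buffered version drops one of the two updates to element 1, while the accumulating version keeps both.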

Author(s):  
Pravin Jagtap ◽  
Rupesh Nasre ◽  
V. S. Sanapala ◽  
B. S. V. Patnaik

Smoothed Particle Hydrodynamics (SPH) is fast emerging as a practically useful computational simulation tool for a wide variety of engineering problems. SPH is also gaining popularity as the backbone for fast and realistic animations in graphics and video games. The Lagrangian and mesh-free nature of the method facilitates fast and accurate simulation of material deformation, interface capture, etc. Typically, particle-based methods require particle search-and-locate algorithms to be implemented efficiently, as the continuous creation of neighbor particle lists is a computationally expensive step. Hence, it is advantageous to implement SPH on modern multi-core platforms with the help of High-Performance Computing (HPC) tools. In this work, the computational performance of an SPH algorithm is assessed on a multi-core Central Processing Unit (CPU) as well as on massively parallel General Purpose Graphical Processing Units (GP-GPU). Parallelizing SPH faces several challenges, such as scalability of the neighbor search process, force calculations, minimizing thread divergence, achieving coalesced memory access patterns, balancing workload, and ensuring optimum use of computational resources. While addressing some of these challenges, a detailed analysis of performance metrics such as speedup, global load efficiency, global store efficiency, warp execution efficiency, and occupancy is carried out. The OpenMP and Compute Unified Device Architecture[Formula: see text] parallel programming models have been used for parallel computing on Intel Xeon[Formula: see text] E5-[Formula: see text] multi-core CPU and NVIDIA Quadro M[Formula: see text] and NVIDIA Tesla p[Formula: see text] massively parallel GPU architectures. Standard benchmark problems from the Computational Fluid Dynamics (CFD) literature are chosen for the validation. The key concern of how to identify a suitable architecture for mesh-less methods, which essentially require a heavy workload of neighbor search and evaluation of local force fields from neighbor interactions, is addressed.
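The neighbor search described in this abstract is commonly organized with a uniform grid of cells, so that each particle is compared only with particles in adjacent cells. The sketch below is a serial 2D illustration under that assumption. It is not the authors' OpenMP or CUDA implementation, and the function names and the smoothing length `h` are hypothetical.

```python
import math
from collections import defaultdict
from itertools import product


def build_cells(points, h):
    """Bin particle indices into square cells of side h."""
    cells = defaultdict(list)
    for i, (x, y) in enumerate(points):
        cells[(math.floor(x / h), math.floor(y / h))].append(i)
    return cells


def find_neighbors(points, h):
    """Return, for each particle, the indices of particles within distance h."""
    cells = build_cells(points, h)
    result = {i: [] for i in range(len(points))}
    for (cx, cy), members in cells.items():
        # Only the 3 x 3 block of cells around (cx, cy) can hold neighbors.
        for dx, dy in product((-1, 0, 1), repeat=2):
            other = cells.get((cx + dx, cy + dy))
            if not other:
                continue
            for i in members:
                for j in other:
                    if i != j and math.dist(points[i], points[j]) <= h:
                        result[i].append(j)
    return result


# Hypothetical toy particle set: two close particles and one far away.
points = [(0.0, 0.0), (0.5, 0.0), (2.0, 2.0)]
neighbor_lists = find_neighbors(points, h=1.0)
```

Because the cell side equals the search radius, checking the surrounding 3 x 3 block is sufficient, and the cost per particle depends on local density rather than on the total particle count.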


2020 ◽  
Vol 92 (1) ◽  
pp. 517-527
Author(s):  
Timothy Clements ◽  
Marine A. Denolle

We introduce SeisNoise.jl, a library for high-performance ambient seismic noise cross correlation, written entirely in the computing language Julia. Julia is a new language, with syntax and a learning curve similar to MATLAB (see Data and Resources), R, or Python and performance close to Fortran or C. SeisNoise.jl is compatible with high-performance computing resources, using both the central processing unit and the graphics processing unit. SeisNoise.jl is a modular toolbox, giving researchers common tools and data structures to design custom ambient seismic cross-correlation workflows in Julia.
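For readers unfamiliar with the core operation, cross correlation of two noise records is often computed in the frequency domain. The sketch below illustrates that general technique in Python with NumPy. It does not use or reproduce the SeisNoise.jl API, and the function name and synthetic signals are hypothetical.

```python
import numpy as np


def cross_correlate(a, b):
    """Linear cross correlation of two equal-length records (n >= 2) via FFT.

    Returns lags from -(n - 1) to n - 1 and cc[k] = sum_t a[t] * b[t + k].
    """
    n = len(a)
    nfft = 2 * n  # zero-pad so the circular correlation does not wrap around
    fa = np.fft.rfft(a, nfft)
    fb = np.fft.rfft(b, nfft)
    cc = np.fft.irfft(np.conj(fa) * fb, nfft)
    # Reorder: negative lags are stored at the end of the inverse FFT output.
    cc = np.concatenate((cc[-(n - 1):], cc[:n]))
    lags = np.arange(-(n - 1), n)
    return lags, cc


# Synthetic example: the second record is the first delayed by 5 samples.
rng = np.random.default_rng(0)
a = rng.standard_normal(256)
b = np.concatenate((np.zeros(5), a[:-5]))

lags, cc = cross_correlate(a, b)
peak_lag = int(lags[np.argmax(cc)])
```

The lag at which the correlation peaks recovers the delay between the two records, which is the quantity ambient-noise workflows typically build on.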


Sensors ◽  
2018 ◽  
Vol 18 (11) ◽  
pp. 3988
Author(s):  
Wei Luo ◽  
Yang Yuan ◽  
Yi Wang ◽  
Qiuyun Fu ◽  
Hui Xia ◽  
...  

An accurate and fast simulation tool plays an important role in the design of wireless passive impedance-loaded surface acoustic wave (SAW) sensors, which have received much attention recently. This paper presents a finite transducer analysis method for wireless passive impedance-loaded SAW sensors. The finite transducer analysis method uses a numerically combined finite element method-boundary element method (FEM/BEM) model to analyze non-periodic transducers. For non-periodic transducers, FEM/BEM has so far been the most accurate analysis method; however, it consumes considerable central processing unit (CPU) time. This paper presents a faster algorithm to calculate the bulk wave part of the equation coefficient, which usually requires a long time. A complete non-periodic FEM/BEM model of the impedance sensors was constructed. Modifications were made to the final equations in the FEM/BEM model to adjust for the impedance variation of the sensors. Compared with the conventional method, the proposed method reduces the computation time while maintaining the same high degree of accuracy. Simulations and their comparisons with experimental results for test devices are shown to demonstrate the effectiveness of the analysis method.


2012 ◽  
Vol 2012 ◽  
pp. 1-19
Author(s):  
G. Ozdemir Dag ◽  
Mustafa Bagriyanik

The unscheduled power flow problem needs to be minimized or controlled as soon as possible in a deregulated power system, since the transmission systems are mostly operated at their power-carrying limits or very close to them. The time spent on simulations to determine the current states of all the system and control variables of the interconnected power system is therefore important. Taking the necessary action in case of equipment failure or any other undesired situation could be critical. Using supercomputing facilities and parallel computing techniques together decreases the computation time greatly. In this study, a parallel implementation of a multiobjective optimization approach based on both genetic algorithms (GA) and fuzzy decision making to manage unscheduled flows is presented. Parallel computation techniques are applied using supercomputers (high-performance computers). The proposed method is applied to the IEEE 300 bus test system. Two different cases for some GA parameters are considered to show the power of the parallel computation technique. The simulation results are then presented.


Author(s):  
Ana Moreton–Fernandez ◽  
Hector Ortega–Arranz ◽  
Arturo Gonzalez–Escribano

Nowadays the use of hardware accelerators, such as graphics processing units or Xeon Phi coprocessors, is key in solving computationally costly problems that require high performance computing. However, programming solutions for efficient deployment on these kinds of devices is a very complex task that relies on the manual management of memory transfers and configuration parameters. The programmer has to carry out a deep study of the particular data that needs to be computed at each moment, across different computing platforms, also considering architectural details. We introduce the controller concept as an abstract entity that allows the programmer to easily manage the communications and kernel launching details on hardware accelerators in a transparent way. This model also provides the possibility of defining and launching central processing unit kernels in multi-core processors with the same abstraction and methodology used for the accelerators. It internally combines different native programming models and technologies to exploit the potential of each kind of device. Additionally, the model simplifies the selection of proper values for several configuration parameters that can be set when a kernel is launched. This is done through a qualitative characterization process of the kernel code to be executed. Finally, we present the implementation of the controller model in a prototype library, together with its application in several case studies. Its use has led to reductions in the development and porting costs, with significantly low overheads in the execution times when compared to manually programmed and optimized solutions which directly use CUDA and OpenMP.


2004 ◽  
Vol 14 (02) ◽  
pp. 197-216 ◽  
Author(s):  
RADU STEFANESCU ◽  
XAVIER PENNEC ◽  
NICHOLAS AYACHE

Over recent years, non-rigid registration has become a major issue in medical imaging. It consists of recovering a dense point-to-point correspondence field between two images, and usually takes a long time. This is in contrast to the needs of a clinical environment, where usability and speed are major constraints, leading to the necessity of reducing the computation time from slightly less than an hour to just a few minutes. As financial pressure makes it hard for healthcare organizations to invest in expensive high-performance computing (HPC) solutions, cluster computing proves to be a convenient solution to our computation needs, offering large processing power at a low cost. Among the fast and efficient non-rigid registration methods, we chose the demons algorithm for its simplicity and good performance. The parallel implementation decomposes the correspondence field into spatial blocks, each block being assigned to a node of the cluster. We obtained an acceleration of 11 by using 15 2 GHz PCs connected through a 1 GB/s Ethernet network and reduced the computation time from 40 min to 3 min 30 s. In order to further optimize the costs and the maintenance load, we investigate in the second part the transparent use of shared computing resources, either through a graphic client or a Web one.
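The timing figures quoted in this abstract can be checked with a short calculation of speedup and parallel efficiency. The sketch below assumes that the 40 min figure is the single-machine time and the 3 min 30 s figure is the 15-node time; the function names are hypothetical.

```python
def speedup(t_serial_s, t_parallel_s):
    """Ratio of serial to parallel run time."""
    return t_serial_s / t_parallel_s


def parallel_efficiency(t_serial_s, t_parallel_s, n_nodes):
    """Speedup divided by the number of nodes (1.0 would be ideal scaling)."""
    return speedup(t_serial_s, t_parallel_s) / n_nodes


# Figures taken from the abstract, converted to seconds.
t_serial = 40 * 60          # 40 min
t_parallel = 3 * 60 + 30    # 3 min 30 s
n_nodes = 15

s = speedup(t_serial, t_parallel)
e = parallel_efficiency(t_serial, t_parallel, n_nodes)
```

Under these assumptions the speedup is about 11.4, in line with the reported acceleration of 11, and the efficiency is about 0.76 of ideal linear scaling on 15 nodes.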

