Performance-aware programming for intraoperative intensity-based image registration on graphics processing units

Author(s):  
Martin C. W. Leong ◽  
Kit-Hang Lee ◽  
Bowen P. Y. Kwan ◽  
Yui-Lun Ng ◽  
Zhiyu Liu ◽  
...  

Abstract
Purpose: Intensity-based image registration has proven essential in many applications owing to its unparalleled ability to resolve image misalignment. However, the long registration time needed for image realignment prohibits its use in intra-operative navigation systems. There has been much work on accelerating the registration process by improving the algorithm's robustness, but the computation inherent to the registration algorithm itself has remained unaddressed.
Methods: Intensity-based registration methods involve operations with high arithmetic load and memory access demand, which can be offloaded to graphics processing units (GPUs). Although GPUs are widespread and affordable, there is a lack of open-source GPU implementations optimized for non-rigid image registration. This paper demonstrates performance-aware programming techniques, which involve the systematic exploitation of GPU features, by implementing the diffeomorphic log-demons algorithm.
Results: By resolving the pinpointed computation bottlenecks on the GPU, our implementation of diffeomorphic log-demons on an Nvidia GTX Titan X GPU achieved a ~95 times speed-up compared to the CPU and registered a 1.3-M voxel image in 286 ms. Even for large 37-M voxel images, our implementation completes registration in 8.56 s, a ~258 times speed-up. Our solution employs GPU computation units, memory, and data bandwidth effectively to resolve the computation bottlenecks.
Conclusion: The computation bottlenecks in diffeomorphic log-demons are pinpointed, analyzed, and resolved using various GPU performance-aware programming techniques. The proposed fast computation of basic image operations not only enhances the computation of diffeomorphic log-demons, but can also potentially be extended to speed up many other intensity-based approaches. Our implementation is open-source on GitHub at https://bit.ly/2PYZxQz.
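As an illustration of the kind of memory-bound inner loop such an implementation must optimize, here is a minimal CUDA sketch (not the authors' open-source code) of the classical per-voxel demons force u = (m - f) * grad(f) / (|grad(f)|^2 + (m - f)^2). The image size, launch geometry, and border handling are illustrative assumptions, and the full log-demons pipeline adds Gaussian smoothing and field-exponentiation steps that are omitted here.

```cuda
// Sketch of the per-voxel demons force, one of the bandwidth-bound kernels
// a performance-aware registration implementation must optimize. Indexing
// along x keeps global-memory reads coalesced; a tuned version would also
// stage neighbourhoods in shared memory.
#include <cuda_runtime.h>
#include <cstdio>

__global__ void demonsForce(const float* f, const float* m,
                            float* ux, float* uy, float* uz,
                            int nx, int ny, int nz)
{
    int x = blockIdx.x * blockDim.x + threadIdx.x;
    int y = blockIdx.y * blockDim.y + threadIdx.y;
    int z = blockIdx.z * blockDim.z + threadIdx.z;
    if (x < 1 || y < 1 || z < 1 || x >= nx - 1 || y >= ny - 1 || z >= nz - 1)
        return;                                   // skip the volume border
    long s = (long)nx * ny;                       // z-slice stride
    long i = (long)z * s + (long)y * nx + x;
    float gx = 0.5f * (f[i + 1]  - f[i - 1]);     // central differences
    float gy = 0.5f * (f[i + nx] - f[i - nx]);
    float gz = 0.5f * (f[i + s]  - f[i - s]);
    float d  = m[i] - f[i];
    float den = gx * gx + gy * gy + gz * gz + d * d;
    float k = (den > 1e-9f) ? d / den : 0.0f;     // guarded normalization
    ux[i] = k * gx; uy[i] = k * gy; uz[i] = k * gz;
}

int main() {
    const int nx = 64, ny = 64, nz = 64;
    size_t n = (size_t)nx * ny * nz, bytes = n * sizeof(float);
    float *f, *m, *ux, *uy, *uz;
    cudaMallocManaged(&f, bytes);  cudaMallocManaged(&m, bytes);
    cudaMallocManaged(&ux, bytes); cudaMallocManaged(&uy, bytes);
    cudaMallocManaged(&uz, bytes);
    for (size_t i = 0; i < n; ++i) {              // synthetic fixed/moving images
        f[i] = (float)(i % 97); m[i] = (float)((i + 3) % 97);
    }
    dim3 block(32, 4, 2);
    dim3 grid((nx + 31) / 32, (ny + 3) / 4, (nz + 1) / 2);
    demonsForce<<<grid, block>>>(f, m, ux, uy, uz, nx, ny, nz);
    cudaDeviceSynchronize();
    printf("update at centre voxel: %g\n", ux[n / 2 + nx / 2]);
    return 0;
}
```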

2019 ◽  
Author(s):  
Robert Haase ◽  
Loic A. Royer ◽  
Peter Steinbach ◽  
Deborah Schmidt ◽  
Alexandr Dibrov ◽  
...  

Abstract: Graphics processing units (GPUs) allow image processing at unprecedented speed. We present CLIJ, a Fiji plugin enabling end-users with entry-level programming experience to benefit from GPU-accelerated image processing. Freely programmable workflows can speed up image processing in Fiji by a factor of 10 or more, both on high-end GPU hardware and on affordable mobile computers with built-in GPUs.


2020 ◽  
Vol 2 (1) ◽  
pp. 29-36
Author(s):  
M. I. Zghoba ◽  
Yu. I. Hrytsiuk

The peculiarities of training a neural network to forecast taxi passenger demand using graphics processing units are considered, which made it possible to speed up the training procedure for different input datasets, hardware configurations, and levels of computational power. Taxi services are becoming accessible to an ever wider range of people. The most important task for any transportation company and taxi driver is to minimize the waiting time for new orders and to minimize the distance between drivers and passengers when an order is received. Understanding and assessing geographical passenger demand, which depends on many factors, is crucial to achieving this goal. This paper describes an example of training a neural network to predict taxi passenger demand and shows the importance of a large input dataset for the accuracy of the neural network. Since training a neural network is a lengthy process, parallel training was used to speed it up. The neural network for forecasting taxi passenger demand was trained on different hardware configurations: one CPU, one GPU, and two GPUs. The training time of one epoch was compared across these configurations, and the impact of each hardware configuration on training time was analyzed. The network was trained on a dataset containing 4.5 million trips within one city. The results of this study show that training with GPU accelerators does not necessarily improve training time. Training time depends on many factors, such as the input dataset size, the splitting of the entire dataset into smaller subsets, and the hardware and power characteristics.
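For readers unfamiliar with why batch size and dataset splitting dominate GPU training time, the toy CUDA sketch below is our illustration, not the authors' network: the one-feature linear model, batch size, and learning rate are all assumptions. It runs full-batch gradient descent with one thread per sample; large batches saturate the device, while tiny batches leave most threads idle and kernel-launch overhead dominates, consistent with the paper's finding that GPUs do not automatically shorten training.

```cuda
// A toy data-parallel training step: one thread per sample accumulates the
// gradient of 0.5 * (w*x + b - y)^2 for a one-feature linear model. The
// batch size n is the throughput knob a real training pipeline tunes.
#include <cuda_runtime.h>
#include <cstdio>

__global__ void mseGrad(const float* x, const float* y, int n,
                        float w, float b, float* gw, float* gb)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n) return;
    float err = (w * x[i] + b) - y[i];
    atomicAdd(gw, err * x[i] / n);    // d/dw of the mean squared error
    atomicAdd(gb, err / n);           // d/db of the mean squared error
}

int main() {
    const int n = 1 << 18;            // full-batch size
    float *x, *y, *gw, *gb;
    cudaMallocManaged(&x, n * sizeof(float));
    cudaMallocManaged(&y, n * sizeof(float));
    cudaMallocManaged(&gw, sizeof(float));
    cudaMallocManaged(&gb, sizeof(float));
    for (int i = 0; i < n; ++i) {     // synthetic data: y = 3x + 1
        x[i] = i / (float)n; y[i] = 3.f * x[i] + 1.f;
    }
    float w = 0.f, b = 0.f, lr = 0.5f;
    for (int epoch = 0; epoch < 200; ++epoch) {
        *gw = 0.f; *gb = 0.f;
        mseGrad<<<(n + 255) / 256, 256>>>(x, y, n, w, b, gw, gb);
        cudaDeviceSynchronize();
        w -= lr * *gw;  b -= lr * *gb;  // SGD update on the host
    }
    printf("learned w = %.3f, b = %.3f (target 3, 1)\n", w, b);
    return 0;
}
```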


Author(s):  
B. Kumar ◽  
O. Dikshit

The extended morphological profile (EMP) is an effective technique for extracting spectral-spatial information from images, but the large size of hyperspectral images is an important concern when creating EMPs. However, with the availability of modern multi-core processors and commodity parallel processing systems such as graphics processing units (GPUs) at the desktop level, parallel computing provides a viable option for significantly accelerating such computations. In this paper, a parallel implementation of an EMP-based spectral-spatial classification method for hyperspectral imagery is presented. The parallel implementation is carried out on both a multi-core CPU and a GPU, and the impact of parallelization on speed-up and classification accuracy is analyzed. For the GPU, the implementation is written in compute unified device architecture (CUDA) C. The experiments are carried out on two well-known hyperspectral images. The experimental results show that the GPU implementation provides a speed-up of about 7 times, while the parallel implementation on the multi-core CPU results in a speed-up of about 3 times. It is also observed that parallelization has no adverse impact on classification accuracy.
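As a sketch of the per-pixel parallelism such an implementation exploits (our illustration, not the paper's CUDA C code), the kernel below computes one grayscale erosion or dilation pass with a square structuring element; chaining an erosion and a dilation gives a plain opening. A full EMP instead stacks openings and closings by reconstruction of increasing size, computed on a few principal components of the hyperspectral cube, which this sketch omits.

```cuda
// One grayscale morphology pass with a (2r+1) x (2r+1) square structuring
// element; every output pixel is independent, so the pass maps directly
// onto one GPU thread per pixel.
#include <cuda_runtime.h>
#include <cfloat>
#include <cstdio>

__global__ void greyMorph(const float* in, float* out, int w, int h,
                          int r, bool erode)
{
    int x = blockIdx.x * blockDim.x + threadIdx.x;
    int y = blockIdx.y * blockDim.y + threadIdx.y;
    if (x >= w || y >= h) return;
    float v = erode ? FLT_MAX : -FLT_MAX;
    for (int dy = -r; dy <= r; ++dy)          // clamp the window at borders
        for (int dx = -r; dx <= r; ++dx) {
            int xx = min(max(x + dx, 0), w - 1);
            int yy = min(max(y + dy, 0), h - 1);
            float p = in[yy * w + xx];
            v = erode ? fminf(v, p) : fmaxf(v, p);
        }
    out[y * w + x] = v;
}

int main() {
    const int w = 256, h = 256, r = 2;
    size_t bytes = (size_t)w * h * sizeof(float);
    float *in, *tmp, *opened;
    cudaMallocManaged(&in, bytes); cudaMallocManaged(&tmp, bytes);
    cudaMallocManaged(&opened, bytes);
    for (int i = 0; i < w * h; ++i) in[i] = (float)(i % 251);  // synthetic band
    dim3 block(16, 16), grid((w + 15) / 16, (h + 15) / 16);
    greyMorph<<<grid, block>>>(in, tmp, w, h, r, true);     // erosion...
    greyMorph<<<grid, block>>>(tmp, opened, w, h, r, false); // ...then dilation = opening
    cudaDeviceSynchronize();
    printf("opened[0] = %g\n", opened[0]);
    return 0;
}
```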


2012 ◽  
Vol 39 (6Part30) ◽  
pp. 4001-4001
Author(s):  
G Warmerdam ◽  
P Steininger ◽  
M Neuner ◽  
G Sharp ◽  
B Winey

Information ◽  
2019 ◽  
Vol 11 (1) ◽  
pp. 1 ◽  
Author(s):  
Yu Chen ◽  
Dongxiang Lu ◽  
Guy Courbebaisse

Image registration is a key pre-processing step for high-level image processing. However, given the complexity and accuracy required of the algorithms, image registration typically has high time complexity. Parallel computation is a relevant strategy for speeding up registration, and the lattice Boltzmann method (LBM) is a good candidate for such parallelization. Consequently, this paper proposes a novel parallel LBM-based model (LB model) for image registration. The main idea of the method consists of simulating the convection-diffusion equation through an LB model with an ad hoc collision term. Applied to computed tomography angiography (CTA) images, magnetic resonance (MR) images, natural scene images, and artificial images, the model proves to be faster than classical methods while achieving accurate registration. Building on the 2D image registration model, the LB model is extended to 3D volume registration, providing excellent results in domains such as medical imaging. The method can run on massively parallel architectures, ranging from embedded field programmable gate arrays (FPGAs) and digital signal processors (DSPs) up to graphics processing units (GPUs).
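To make the LB model concrete, here is a minimal CUDA sketch of one D2Q5 stream-and-collide step for pure diffusion; it is our illustration, with assumed lattice size, relaxation time, and periodic boundaries, and the paper's registration model adds an ad hoc collision term to recover the full convection-diffusion behaviour. Each lattice node updates independently from its neighbours' previous state, which is what makes the scheme portable across FPGAs, DSPs, and GPUs.

```cuda
// One D2Q5 stream-and-collide step: pull streaming from the neighbours,
// then BGK relaxation toward the diffusive equilibrium f_i_eq = w_i * rho.
#include <cuda_runtime.h>
#include <cstdio>

#define W 128
#define H 128

__constant__ int   cx[5]  = { 0, 1, -1, 0, 0 };   // D2Q5 directions
__constant__ int   cy[5]  = { 0, 0, 0, 1, -1 };
__constant__ float wgt[5] = { 1.f/3, 1.f/6, 1.f/6, 1.f/6, 1.f/6 };

__global__ void streamCollide(const float* fin, float* fout, float tau)
{
    int x = blockIdx.x * blockDim.x + threadIdx.x;
    int y = blockIdx.y * blockDim.y + threadIdx.y;
    if (x >= W || y >= H) return;
    float f[5], rho = 0.f;
    for (int i = 0; i < 5; ++i) {                 // pull streaming, periodic
        int xs = (x - cx[i] + W) % W, ys = (y - cy[i] + H) % H;
        f[i] = fin[i * W * H + ys * W + xs];
        rho += f[i];
    }
    for (int i = 0; i < 5; ++i)                   // BGK relaxation
        fout[i * W * H + y * W + x] = f[i] - (f[i] - wgt[i] * rho) / tau;
}

int main() {
    size_t bytes = 5 * W * H * sizeof(float);
    float *fa, *fb;
    cudaMallocManaged(&fa, bytes); cudaMallocManaged(&fb, bytes);
    for (size_t i = 0; i < 5 * W * H; ++i) fa[i] = 0.f;
    for (int i = 0; i < 5; ++i)                   // unit mass at the centre
        fa[i * W * H + (H / 2) * W + W / 2] = 0.2f;
    dim3 block(16, 16), grid(W / 16, H / 16);
    for (int step = 0; step < 100; ++step) {      // ping-pong the two buffers
        streamCollide<<<grid, block>>>(fa, fb, 1.0f);
        float* t = fa; fa = fb; fb = t;
    }
    cudaDeviceSynchronize();
    float rho = 0.f;
    for (int i = 0; i < 5; ++i) rho += fa[i * W * H + (H / 2) * W + W / 2];
    printf("density at centre after 100 steps: %g\n", rho);
    return 0;
}
```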


2020 ◽  
Author(s):  
Nairit Sur ◽  
Leonardo Cristella ◽  
Adriano Di Florio ◽  
Vincenzo Mastrapasqua

Abstract: The demand for computational resources is steadily increasing in experimental high energy physics as the current collider experiments continue to accumulate huge amounts of data while physicists pursue more complex and ambitious analysis strategies. This is especially true in the fields of hadron spectroscopy and flavour physics, where analyses often depend on complex multidimensional unbinned maximum-likelihood fits with several dozen free parameters, with the aim of studying the quark structure of hadrons. Graphics processing units (GPUs) represent one of the most sophisticated and versatile parallel computing architectures and are becoming a popular toolkit for high energy physicists to meet their computational demands. GooFit is an upcoming open-source tool interfacing ROOT/RooFit to the CUDA platform on NVIDIA GPUs that acts as a bridge between the MINUIT minimization algorithm and a parallel processor, allowing probability density functions to be estimated on multiple cores simultaneously. In this article, a full-fledged amplitude analysis framework developed using GooFit is tested for its speed and reliability. The four-dimensional fitter framework, one of the first of its kind to be built on GooFit, is geared towards the search for exotic tetraquark states in [[EQUATION]] decays and can be seamlessly adapted for other similar analyses. The GooFit fitter running on GPUs shows a remarkable speed-up in computing performance compared to a ROOT/RooFit implementation of the same analysis running on multi-core CPU clusters. Furthermore, it shows sensitivity to components with small contributions to the overall fit. It has the potential to be a powerful tool for sensitive and computationally intensive physics analyses.
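The principle behind such a fitter is easy to sketch: each GPU thread evaluates the negative log-likelihood term of one event, and a reduction produces the total NLL that MINUIT minimizes. The CUDA sketch below is our illustration, not GooFit's API; a one-dimensional Gaussian stands in for the multidimensional amplitude model, and the atomic accumulation (which needs compute capability 6.0 or newer for double precision) replaces the tree reductions a real fitter would use.

```cuda
// Each thread adds one event's -log N(x | mu, sigma) into the total NLL.
// Build with: nvcc -arch=sm_60 nll.cu
#include <cuda_runtime.h>
#include <cmath>
#include <cstdio>

__global__ void gaussNLL(const float* x, int n, float mu, float sigma,
                         double* nll)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n) return;
    float z = (x[i] - mu) / sigma;
    // 0.9189385f = 0.5 * log(2 * pi)
    atomicAdd(nll, (double)(0.5f * z * z + logf(sigma) + 0.9189385f));
}

int main() {
    const int n = 1 << 20;
    float* x; double* nll;
    cudaMallocManaged(&x, n * sizeof(float));
    cudaMallocManaged(&nll, sizeof(double));
    for (int i = 0; i < n; ++i)                   // crude stand-in for event data
        x[i] = -3.f + 6.f * (float)i / (float)n;
    *nll = 0.0;
    gaussNLL<<<(n + 255) / 256, 256>>>(x, n, 0.f, 1.f, nll);
    cudaDeviceSynchronize();                      // a minimizer would call this per step
    printf("NLL(mu = 0, sigma = 1) = %.3f\n", *nll);
    return 0;
}
```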


2019 ◽  
Vol 35 (17) ◽  
pp. 3181-3183 ◽  
Author(s):  
Patryk Orzechowski ◽  
Jason H Moore

Abstract
Motivation: In this paper, we present an open-source package with the latest release of Evolutionary-based BIClustering (EBIC), a next-generation biclustering algorithm for mining genetic data. The major contribution of this paper is the addition of full support for multiple graphics processing units (GPUs), which makes it possible to efficiently run large genomic data mining analyses. Enhancements over the first release of the algorithm include integration with R and Bioconductor and an option to exclude missing values from the analysis.
Results: Evolutionary-based BIClustering was applied to datasets of different sizes, including a large DNA methylation dataset with 436 444 rows. For the largest dataset we observed an over 6.6-fold speed-up in computation time on a cluster of eight GPUs compared to running the method on a single GPU, demonstrating the high scalability of the method.
Availability and implementation: The latest version of EBIC can be downloaded from http://github.com/EpistasisLab/ebic. Installation and usage instructions are also available online.
Supplementary information: Supplementary data are available at Bioinformatics online.
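The multi-GPU pattern behind such scaling can be sketched in a few lines of CUDA (our illustration, not EBIC's code; the row-scoring kernel and sizes are placeholders): the rows of the data matrix are split into one chunk per visible device, each device scores its chunk concurrently, and the host synchronizes at the end.

```cuda
// Split a row-scoring workload across all visible GPUs. Kernel launches are
// asynchronous, so the per-device loops overlap; the final loop waits for
// every device. Device buffers are reclaimed at process exit for brevity.
#include <cuda_runtime.h>
#include <cstdio>

__global__ void scoreRows(const float* rows, float* score, int nRows, int nCols)
{
    int r = blockIdx.x * blockDim.x + threadIdx.x;
    if (r >= nRows) return;
    float s = 0.0f;                         // toy per-row statistic: mean value
    for (int c = 0; c < nCols; ++c) s += rows[(size_t)r * nCols + c];
    score[r] = s / nCols;
}

int main() {
    const int nRows = 100000, nCols = 64;
    int nDev = 0;
    cudaGetDeviceCount(&nDev);
    if (nDev == 0) { printf("no CUDA device found\n"); return 1; }
    int chunk = (nRows + nDev - 1) / nDev;  // rows per device, rounded up
    for (int d = 0; d < nDev; ++d) {
        int lo = d * chunk;
        int n = (lo + chunk <= nRows) ? chunk : nRows - lo;
        if (n <= 0) break;
        cudaSetDevice(d);                   // each GPU gets one row chunk
        float *rows, *score;
        cudaMalloc(&rows, (size_t)n * nCols * sizeof(float));
        cudaMalloc(&score, (size_t)n * sizeof(float));
        cudaMemset(rows, 0, (size_t)n * nCols * sizeof(float));
        scoreRows<<<(n + 255) / 256, 256>>>(rows, score, n, nCols);
    }
    for (int d = 0; d < nDev; ++d) {        // wait for every device to finish
        cudaSetDevice(d);
        cudaDeviceSynchronize();
    }
    printf("scored %d rows across %d GPU(s)\n", nRows, nDev);
    return 0;
}
```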

