Performance-aware programming for intraoperative intensity-based image registration on graphics processing units

Author(s):  
Martin C. W. Leong ◽  
Kit-Hang Lee ◽  
Bowen P. Y. Kwan ◽  
Yui-Lun Ng ◽  
Zhiyu Liu ◽  
...  

Abstract
Purpose: Intensity-based image registration has proven essential in many applications owing to its unparalleled ability to resolve image misalignment. However, the long registration time needed for image realignment prohibits its use in intra-operative navigation systems. There has been much work on accelerating the registration process by improving the algorithm's robustness, but the computation inherent to the registration algorithm itself has remained unaddressed.
Methods: Intensity-based registration methods involve operations with high arithmetic load and memory access demand, which can be offloaded to graphics processing units (GPUs). Although GPUs are widespread and affordable, there is a lack of open-source GPU implementations optimized for non-rigid image registration. This paper demonstrates performance-aware programming techniques, which involve the systematic exploitation of GPU features, by implementing the diffeomorphic log-demons algorithm.
Results: By resolving the pinpointed computation bottlenecks on the GPU, our implementation of diffeomorphic log-demons on an Nvidia GTX Titan X GPU achieved a ~95 times speed-up compared to the CPU and registered a 1.3-M voxel image in 286 ms. Even for large 37-M voxel images, our implementation completes registration in 8.56 s, a ~258 times speed-up. Our solution employs GPU computation units, memory, and data bandwidth effectively to resolve the computation bottlenecks.
Conclusion: The computation bottlenecks in diffeomorphic log-demons are pinpointed, analyzed, and resolved using various GPU performance-aware programming techniques. The proposed fast computation of basic image operations not only enhances the computation of diffeomorphic log-demons, but can also potentially be extended to speed up many other intensity-based approaches. Our implementation is open-source on GitHub at https://bit.ly/2PYZxQz.
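As an illustration of the kind of memory-bound inner loop such an implementation must optimize, here is a minimal CUDA sketch (not the authors' open-source code) of the classical per-voxel demons force u = (m - f) * grad(f) / (|grad(f)|^2 + (m - f)^2). The image size, launch geometry, and border handling are illustrative assumptions, and the full log-demons pipeline adds Gaussian smoothing and field-exponentiation steps that are omitted here.

```cuda
// Sketch of the per-voxel demons force, one of the bandwidth-bound kernels
// a performance-aware registration implementation must optimize. Indexing
// along x keeps global-memory reads coalesced; a tuned version would also
// stage neighbourhoods in shared memory.
#include <cuda_runtime.h>
#include <cstdio>

__global__ void demonsForce(const float* f, const float* m,
                            float* ux, float* uy, float* uz,
                            int nx, int ny, int nz)
{
    int x = blockIdx.x * blockDim.x + threadIdx.x;
    int y = blockIdx.y * blockDim.y + threadIdx.y;
    int z = blockIdx.z * blockDim.z + threadIdx.z;
    if (x < 1 || y < 1 || z < 1 || x >= nx - 1 || y >= ny - 1 || z >= nz - 1)
        return;                                   // skip the volume border
    long s = (long)nx * ny;                       // z-slice stride
    long i = (long)z * s + (long)y * nx + x;
    float gx = 0.5f * (f[i + 1]  - f[i - 1]);     // central differences
    float gy = 0.5f * (f[i + nx] - f[i - nx]);
    float gz = 0.5f * (f[i + s]  - f[i - s]);
    float d  = m[i] - f[i];
    float den = gx * gx + gy * gy + gz * gz + d * d;
    float k = (den > 1e-9f) ? d / den : 0.0f;     // guarded normalization
    ux[i] = k * gx; uy[i] = k * gy; uz[i] = k * gz;
}

int main() {
    const int nx = 64, ny = 64, nz = 64;
    size_t n = (size_t)nx * ny * nz, bytes = n * sizeof(float);
    float *f, *m, *ux, *uy, *uz;
    cudaMallocManaged(&f, bytes);  cudaMallocManaged(&m, bytes);
    cudaMallocManaged(&ux, bytes); cudaMallocManaged(&uy, bytes);
    cudaMallocManaged(&uz, bytes);
    for (size_t i = 0; i < n; ++i) {              // synthetic fixed/moving images
        f[i] = (float)(i % 97); m[i] = (float)((i + 3) % 97);
    }
    dim3 block(32, 4, 2);
    dim3 grid((nx + 31) / 32, (ny + 3) / 4, (nz + 1) / 2);
    demonsForce<<<grid, block>>>(f, m, ux, uy, uz, nx, ny, nz);
    cudaDeviceSynchronize();
    printf("update at centre voxel: %g\n", ux[n / 2 + nx / 2]);
    return 0;
}
```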

2019 ◽  
Author(s):  
Robert Haase ◽  
Loic A. Royer ◽  
Peter Steinbach ◽  
Deborah Schmidt ◽  
Alexandr Dibrov ◽  
...  

Abstract: Graphics processing units (GPUs) allow image processing at unprecedented speed. We present CLIJ, a Fiji plugin enabling end-users with entry-level programming experience to benefit from GPU-accelerated image processing. Freely programmable workflows can speed up image processing in Fiji by a factor of 10 or more, both on high-end GPU hardware and on affordable mobile computers with built-in GPUs.


2020 ◽  
Vol 2 (1) ◽  
pp. 29-36
Author(s):  
M. I. Zghoba ◽  
Yu. I. Hrytsiuk

The peculiarities of training a neural network to forecast taxi passenger demand using graphics processing units are considered, which made it possible to speed up the training procedure for different input datasets, hardware configurations, and levels of computational power. Taxi services are becoming accessible to an ever wider range of people. The most important task for any transportation company and taxi driver is to minimize the waiting time for new orders and to minimize the distance between drivers and passengers when an order is received. Understanding and assessing geographical passenger demand, which depends on many factors, is crucial to achieving this goal. This paper describes an example of training a neural network to predict taxi passenger demand and shows the importance of a large input dataset for the accuracy of the neural network. Since training a neural network is a lengthy process, parallel training was used to speed it up. The neural network for forecasting taxi passenger demand was trained on different hardware configurations: one CPU, one GPU, and two GPUs. The training time of one epoch was compared across these configurations, and the impact of each hardware configuration on training time was analyzed. The network was trained on a dataset containing 4.5 million trips within one city. The results of this study show that training with GPU accelerators does not necessarily improve training time. Training time depends on many factors, such as the input dataset size, the splitting of the entire dataset into smaller subsets, and the hardware and power characteristics.
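For readers unfamiliar with why batch size and dataset splitting dominate GPU training time, the toy CUDA sketch below is our illustration, not the authors' network: the one-feature linear model, batch size, and learning rate are all assumptions. It runs full-batch gradient descent with one thread per sample; large batches saturate the device, while tiny batches leave most threads idle and kernel-launch overhead dominates, consistent with the paper's finding that GPUs do not automatically shorten training.

```cuda
// A toy data-parallel training step: one thread per sample accumulates the
// gradient of 0.5 * (w*x + b - y)^2 for a one-feature linear model. The
// batch size n is the throughput knob a real training pipeline tunes.
#include <cuda_runtime.h>
#include <cstdio>

__global__ void mseGrad(const float* x, const float* y, int n,
                        float w, float b, float* gw, float* gb)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n) return;
    float err = (w * x[i] + b) - y[i];
    atomicAdd(gw, err * x[i] / n);    // d/dw of the mean squared error
    atomicAdd(gb, err / n);           // d/db of the mean squared error
}

int main() {
    const int n = 1 << 18;            // full-batch size
    float *x, *y, *gw, *gb;
    cudaMallocManaged(&x, n * sizeof(float));
    cudaMallocManaged(&y, n * sizeof(float));
    cudaMallocManaged(&gw, sizeof(float));
    cudaMallocManaged(&gb, sizeof(float));
    for (int i = 0; i < n; ++i) {     // synthetic data: y = 3x + 1
        x[i] = i / (float)n; y[i] = 3.f * x[i] + 1.f;
    }
    float w = 0.f, b = 0.f, lr = 0.5f;
    for (int epoch = 0; epoch < 200; ++epoch) {
        *gw = 0.f; *gb = 0.f;
        mseGrad<<<(n + 255) / 256, 256>>>(x, y, n, w, b, gw, gb);
        cudaDeviceSynchronize();
        w -= lr * *gw;  b -= lr * *gb;  // SGD update on the host
    }
    printf("learned w = %.3f, b = %.3f (target 3, 1)\n", w, b);
    return 0;
}
```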


Author(s):  
B. Kumar ◽  
O. Dikshit

The extended morphological profile (EMP) is an effective technique for extracting spectral-spatial information from images, but the large size of hyperspectral images is an important concern when creating EMPs. However, with the availability of modern multi-core processors and commodity parallel processing systems such as graphics processing units (GPUs) at the desktop level, parallel computing provides a viable option for significantly accelerating such computations. In this paper, a parallel implementation of an EMP-based spectral-spatial classification method for hyperspectral imagery is presented. The parallel implementation is carried out on both a multi-core CPU and a GPU, and the impact of parallelization on speed-up and classification accuracy is analyzed. For the GPU, the implementation is written in compute unified device architecture (CUDA) C. The experiments are carried out on two well-known hyperspectral images. The experimental results show that the GPU implementation provides a speed-up of about 7 times, while the parallel implementation on the multi-core CPU results in a speed-up of about 3 times. It is also observed that parallelization has no adverse impact on classification accuracy.
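As a sketch of the per-pixel parallelism such an implementation exploits (our illustration, not the paper's CUDA C code), the kernel below computes one grayscale erosion or dilation pass with a square structuring element; chaining an erosion and a dilation gives a plain opening. A full EMP instead stacks openings and closings by reconstruction of increasing size, computed on a few principal components of the hyperspectral cube, which this sketch omits.

```cuda
// One grayscale morphology pass with a (2r+1) x (2r+1) square structuring
// element; every output pixel is independent, so the pass maps directly
// onto one GPU thread per pixel.
#include <cuda_runtime.h>
#include <cfloat>
#include <cstdio>

__global__ void greyMorph(const float* in, float* out, int w, int h,
                          int r, bool erode)
{
    int x = blockIdx.x * blockDim.x + threadIdx.x;
    int y = blockIdx.y * blockDim.y + threadIdx.y;
    if (x >= w || y >= h) return;
    float v = erode ? FLT_MAX : -FLT_MAX;
    for (int dy = -r; dy <= r; ++dy)          // clamp the window at borders
        for (int dx = -r; dx <= r; ++dx) {
            int xx = min(max(x + dx, 0), w - 1);
            int yy = min(max(y + dy, 0), h - 1);
            float p = in[yy * w + xx];
            v = erode ? fminf(v, p) : fmaxf(v, p);
        }
    out[y * w + x] = v;
}

int main() {
    const int w = 256, h = 256, r = 2;
    size_t bytes = (size_t)w * h * sizeof(float);
    float *in, *tmp, *opened;
    cudaMallocManaged(&in, bytes); cudaMallocManaged(&tmp, bytes);
    cudaMallocManaged(&opened, bytes);
    for (int i = 0; i < w * h; ++i) in[i] = (float)(i % 251);  // synthetic band
    dim3 block(16, 16), grid((w + 15) / 16, (h + 15) / 16);
    greyMorph<<<grid, block>>>(in, tmp, w, h, r, true);     // erosion...
    greyMorph<<<grid, block>>>(tmp, opened, w, h, r, false); // ...then dilation = opening
    cudaDeviceSynchronize();
    printf("opened[0] = %g\n", opened[0]);
    return 0;
}
```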


2012 ◽  
Vol 39 (6Part30) ◽  
pp. 4001-4001
Author(s):  
G Warmerdam ◽  
P Steininger ◽  
M Neuner ◽  
G Sharp ◽  
B Winey

Information ◽  
2019 ◽  
Vol 11 (1) ◽  
pp. 1 ◽  
Author(s):  
Yu Chen ◽  
Dongxiang Lu ◽  
Guy Courbebaisse

Image registration is a key pre-processing step for high-level image processing. However, given the complexity and accuracy required of the algorithms, image registration typically has high time complexity. Parallel computation is a relevant strategy for speeding up registration, and the lattice Boltzmann method (LBM) is a good candidate for such parallelization. Consequently, this paper proposes a novel parallel LBM-based model (LB model) for image registration. The main idea of the method consists of simulating the convection-diffusion equation through an LB model with an ad hoc collision term. Applied to computed tomography angiography (CTA) images, magnetic resonance (MR) images, natural scene images, and artificial images, the model proves to be faster than classical methods while achieving accurate registration. Building on the 2D image registration model, the LB model is extended to 3D volume registration, providing excellent results in domains such as medical imaging. The method can run on massively parallel architectures, ranging from embedded field programmable gate arrays (FPGAs) and digital signal processors (DSPs) up to graphics processing units (GPUs).
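To make the LB model concrete, here is a minimal CUDA sketch of one D2Q5 stream-and-collide step for pure diffusion; it is our illustration, with assumed lattice size, relaxation time, and periodic boundaries, and the paper's registration model adds an ad hoc collision term to recover the full convection-diffusion behaviour. Each lattice node updates independently from its neighbours' previous state, which is what makes the scheme portable across FPGAs, DSPs, and GPUs.

```cuda
// One D2Q5 stream-and-collide step: pull streaming from the neighbours,
// then BGK relaxation toward the diffusive equilibrium f_i_eq = w_i * rho.
#include <cuda_runtime.h>
#include <cstdio>

#define W 128
#define H 128

__constant__ int   cx[5]  = { 0, 1, -1, 0, 0 };   // D2Q5 directions
__constant__ int   cy[5]  = { 0, 0, 0, 1, -1 };
__constant__ float wgt[5] = { 1.f/3, 1.f/6, 1.f/6, 1.f/6, 1.f/6 };

__global__ void streamCollide(const float* fin, float* fout, float tau)
{
    int x = blockIdx.x * blockDim.x + threadIdx.x;
    int y = blockIdx.y * blockDim.y + threadIdx.y;
    if (x >= W || y >= H) return;
    float f[5], rho = 0.f;
    for (int i = 0; i < 5; ++i) {                 // pull streaming, periodic
        int xs = (x - cx[i] + W) % W, ys = (y - cy[i] + H) % H;
        f[i] = fin[i * W * H + ys * W + xs];
        rho += f[i];
    }
    for (int i = 0; i < 5; ++i)                   // BGK relaxation
        fout[i * W * H + y * W + x] = f[i] - (f[i] - wgt[i] * rho) / tau;
}

int main() {
    size_t bytes = 5 * W * H * sizeof(float);
    float *fa, *fb;
    cudaMallocManaged(&fa, bytes); cudaMallocManaged(&fb, bytes);
    for (size_t i = 0; i < 5 * W * H; ++i) fa[i] = 0.f;
    for (int i = 0; i < 5; ++i)                   // unit mass at the centre
        fa[i * W * H + (H / 2) * W + W / 2] = 0.2f;
    dim3 block(16, 16), grid(W / 16, H / 16);
    for (int step = 0; step < 100; ++step) {      // ping-pong the two buffers
        streamCollide<<<grid, block>>>(fa, fb, 1.0f);
        float* t = fa; fa = fb; fb = t;
    }
    cudaDeviceSynchronize();
    float rho = 0.f;
    for (int i = 0; i < 5; ++i) rho += fa[i * W * H + (H / 2) * W + W / 2];
    printf("density at centre after 100 steps: %g\n", rho);
    return 0;
}
```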


2020 ◽  
Author(s):  
Nairit Sur ◽  
Leonardo Cristella ◽  
Adriano Di Florio ◽  
Vincenzo Mastrapasqua

Abstract: The demand for computational resources is steadily increasing in experimental high energy physics as the current collider experiments continue to accumulate huge amounts of data while physicists pursue more complex and ambitious analysis strategies. This is especially true in the fields of hadron spectroscopy and flavour physics, where analyses often depend on complex multidimensional unbinned maximum-likelihood fits with several dozen free parameters, with the aim of studying the quark structure of hadrons. Graphics processing units (GPUs) represent one of the most sophisticated and versatile parallel computing architectures and are becoming a popular toolkit for high energy physicists to meet their computational demands. GooFit is an upcoming open-source tool interfacing ROOT/RooFit to the CUDA platform on NVIDIA GPUs that acts as a bridge between the MINUIT minimization algorithm and a parallel processor, allowing probability density functions to be estimated on multiple cores simultaneously. In this article, a full-fledged amplitude analysis framework developed using GooFit is tested for its speed and reliability. The four-dimensional fitter framework, one of the first of its kind to be built on GooFit, is geared towards the search for exotic tetraquark states in [[EQUATION]] decays and can be seamlessly adapted for other similar analyses. The GooFit fitter running on GPUs shows a remarkable speed-up in computing performance compared to a ROOT/RooFit implementation of the same analysis running on multi-core CPU clusters. Furthermore, it shows sensitivity to components with small contributions to the overall fit. It has the potential to be a powerful tool for sensitive and computationally intensive physics analyses.
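The principle behind such a fitter is easy to sketch: each GPU thread evaluates the negative log-likelihood term of one event, and a reduction produces the total NLL that MINUIT minimizes. The CUDA sketch below is our illustration, not GooFit's API; a one-dimensional Gaussian stands in for the multidimensional amplitude model, and the atomic accumulation (which needs compute capability 6.0 or newer for double precision) replaces the tree reductions a real fitter would use.

```cuda
// Each thread adds one event's -log N(x | mu, sigma) into the total NLL.
// Build with: nvcc -arch=sm_60 nll.cu
#include <cuda_runtime.h>
#include <cmath>
#include <cstdio>

__global__ void gaussNLL(const float* x, int n, float mu, float sigma,
                         double* nll)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n) return;
    float z = (x[i] - mu) / sigma;
    // 0.9189385f = 0.5 * log(2 * pi)
    atomicAdd(nll, (double)(0.5f * z * z + logf(sigma) + 0.9189385f));
}

int main() {
    const int n = 1 << 20;
    float* x; double* nll;
    cudaMallocManaged(&x, n * sizeof(float));
    cudaMallocManaged(&nll, sizeof(double));
    for (int i = 0; i < n; ++i)                   // crude stand-in for event data
        x[i] = -3.f + 6.f * (float)i / (float)n;
    *nll = 0.0;
    gaussNLL<<<(n + 255) / 256, 256>>>(x, n, 0.f, 1.f, nll);
    cudaDeviceSynchronize();                      // a minimizer would call this per step
    printf("NLL(mu = 0, sigma = 1) = %.3f\n", *nll);
    return 0;
}
```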


2019 ◽  
Vol 35 (17) ◽  
pp. 3181-3183 ◽  
Author(s):  
Patryk Orzechowski ◽  
Jason H Moore

Abstract
Motivation: In this paper, we present an open-source package with the latest release of Evolutionary-based BIClustering (EBIC), a next-generation biclustering algorithm for mining genetic data. The major contribution of this paper is the addition of full support for multiple graphics processing units (GPUs), which makes it possible to efficiently run large genomic data mining analyses. Enhancements over the first release of the algorithm include integration with R and Bioconductor and an option to exclude missing values from the analysis.
Results: Evolutionary-based BIClustering was applied to datasets of different sizes, including a large DNA methylation dataset with 436 444 rows. For the largest dataset we observed an over 6.6-fold speed-up in computation time on a cluster of eight GPUs compared to running the method on a single GPU, demonstrating the high scalability of the method.
Availability and implementation: The latest version of EBIC can be downloaded from http://github.com/EpistasisLab/ebic. Installation and usage instructions are also available online.
Supplementary information: Supplementary data are available at Bioinformatics online.
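The multi-GPU pattern behind such scaling can be sketched in a few lines of CUDA (our illustration, not EBIC's code; the row-scoring kernel and sizes are placeholders): the rows of the data matrix are split into one chunk per visible device, each device scores its chunk concurrently, and the host synchronizes at the end.

```cuda
// Split a row-scoring workload across all visible GPUs. Kernel launches are
// asynchronous, so the per-device loops overlap; the final loop waits for
// every device. Device buffers are reclaimed at process exit for brevity.
#include <cuda_runtime.h>
#include <cstdio>

__global__ void scoreRows(const float* rows, float* score, int nRows, int nCols)
{
    int r = blockIdx.x * blockDim.x + threadIdx.x;
    if (r >= nRows) return;
    float s = 0.0f;                         // toy per-row statistic: mean value
    for (int c = 0; c < nCols; ++c) s += rows[(size_t)r * nCols + c];
    score[r] = s / nCols;
}

int main() {
    const int nRows = 100000, nCols = 64;
    int nDev = 0;
    cudaGetDeviceCount(&nDev);
    if (nDev == 0) { printf("no CUDA device found\n"); return 1; }
    int chunk = (nRows + nDev - 1) / nDev;  // rows per device, rounded up
    for (int d = 0; d < nDev; ++d) {
        int lo = d * chunk;
        int n = (lo + chunk <= nRows) ? chunk : nRows - lo;
        if (n <= 0) break;
        cudaSetDevice(d);                   // each GPU gets one row chunk
        float *rows, *score;
        cudaMalloc(&rows, (size_t)n * nCols * sizeof(float));
        cudaMalloc(&score, (size_t)n * sizeof(float));
        cudaMemset(rows, 0, (size_t)n * nCols * sizeof(float));
        scoreRows<<<(n + 255) / 256, 256>>>(rows, score, n, nCols);
    }
    for (int d = 0; d < nDev; ++d) {        // wait for every device to finish
        cudaSetDevice(d);
        cudaDeviceSynchronize();
    }
    printf("scored %d rows across %d GPU(s)\n", nRows, nDev);
    return 0;
}
```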

