CLIJ: GPU-accelerated image processing for everyone

2019
Author(s):
Robert Haase
Loic A. Royer
Peter Steinbach
Deborah Schmidt
Alexandr Dibrov
...  

Abstract: Graphics processing units (GPUs) allow image processing at unprecedented speed. We present CLIJ, a Fiji plugin that enables end-users with entry-level programming experience to benefit from GPU-accelerated image processing. Freely programmable workflows can speed up image processing in Fiji by a factor of 10 or more, both on high-end GPU hardware and on affordable mobile computers with built-in GPUs.
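
CLIJ itself is driven from Fiji, typically through ImageJ macros. As an illustration only, the sketch below shows a CLIJ-style push/process/pull workflow using pyclesperanto_prototype, the Python sibling of CLIJ by the same authors; the package and its exact function names should be treated as assumptions rather than the plugin's own API.

```python
# Hedged sketch of a GPU-accelerated segmentation workflow in the spirit of
# CLIJ, using pyclesperanto_prototype (assumed installed via
# `pip install pyclesperanto-prototype`); function names are believed to
# match that library's API but are assumptions.
import numpy as np
import pyclesperanto_prototype as cle

image = np.random.random((512, 512)).astype(np.float32)  # stand-in for a microscopy image

gpu_image = cle.push(image)                                   # copy the image to GPU memory
blurred = cle.gaussian_blur(gpu_image, sigma_x=2, sigma_y=2)  # denoise on the GPU
binary = cle.threshold_otsu(blurred)                          # Otsu threshold, still on the GPU
labels = cle.connected_components_labeling_box(binary)        # label connected objects

result = cle.pull(labels)                                     # copy the label image back to the CPU
print("objects found:", int(result.max()))
```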

2020
Vol 2 (1)
pp. 29-36
Author(s):
M. I. Zghoba
Yu. I. Hrytsiuk

The peculiarities of training a neural network on graphics processing units to forecast taxi passenger demand are considered; this made it possible to speed up training for different input datasets and hardware configurations of varying power. Taxi services are becoming accessible to an ever wider range of people. The most important task for any transportation company and taxi driver is to minimize the waiting time for new orders and the distance between driver and passenger when an order is received. Understanding and assessing geographical passenger demand, which depends on many factors, is crucial to achieving this goal. This paper describes an example of training a neural network to predict taxi passenger demand and shows the importance of a large input dataset for the accuracy of the network. Since training a neural network is a lengthy process, parallel training was used to speed it up. The network was trained on different hardware configurations: one CPU, one GPU, and two GPUs. The training time of one epoch was compared across these configurations, and the impact of the hardware configuration on training time was analyzed. The network was trained on a dataset containing 4.5 million trips within one city. The results show that training with GPU accelerators does not necessarily reduce training time: training time depends on many factors, such as the size of the input dataset, how the dataset is split into smaller subsets, and the characteristics of the hardware.
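
The abstract does not name the training framework, so the following is only a hedged sketch of how one-GPU versus multi-GPU training is commonly set up, using TensorFlow's tf.distribute.MirroredStrategy; the model and data are placeholders, not the authors' demand-forecasting network.

```python
# Hedged sketch: comparing training on one or several GPUs with TensorFlow's
# MirroredStrategy, which replicates the model across all visible devices.
# The network and data below are placeholders, not the paper's actual model.
import numpy as np
import tensorflow as tf

# toy stand-in for features aggregated from millions of trip records
x = np.random.random((10_000, 32)).astype("float32")
y = np.random.random((10_000, 1)).astype("float32")

strategy = tf.distribute.MirroredStrategy()          # uses every visible GPU
print("replicas in sync:", strategy.num_replicas_in_sync)

with strategy.scope():                               # variables are mirrored per GPU
    model = tf.keras.Sequential([
        tf.keras.Input(shape=(32,)),
        tf.keras.layers.Dense(128, activation="relu"),
        tf.keras.layers.Dense(1),
    ])
    model.compile(optimizer="adam", loss="mse")

# the study's comparison metric is the wall-clock time of one epoch
model.fit(x, y, batch_size=256, epochs=1)
```

Restricting the visible devices (for example via the CUDA_VISIBLE_DEVICES environment variable) is one way to reproduce one-CPU, one-GPU, and two-GPU configurations like those compared in the paper.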


Author(s):  
B. Kumar
O. Dikshit

The extended morphological profile (EMP) is an effective technique for extracting spectral-spatial information from images, but the large size of hyperspectral images is a major concern when creating EMPs. With the availability of modern multi-core processors and commodity parallel processing systems such as graphics processing units (GPUs) at the desktop level, parallel computing provides a viable option to significantly accelerate such computations. In this paper, a parallel implementation of an EMP-based spectral-spatial classification method for hyperspectral imagery is presented. The parallel implementation is done both on a multi-core CPU and on a GPU, and the impact of parallelization on speed-up and classification accuracy is analyzed. For the GPU, the implementation is written in compute unified device architecture (CUDA) C. Experiments are carried out on two well-known hyperspectral images. The experimental results show that the GPU implementation provides a speed-up of about 7 times, while the parallel implementation on the multi-core CPU results in a speed-up of about 3 times. It is also observed that parallelization has no adverse impact on classification accuracy.
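
As a rough illustration of what a morphological profile computes, the sketch below stacks grey-scale openings and closings of one band at increasing structuring-element sizes. It is a simplification: true EMPs use opening/closing by reconstruction on a few principal components, and the paper's GPU version is written in CUDA C, whereas this sketch uses SciPy; swapping in cupyx.scipy.ndimage from CuPy is one assumed route to a GPU version.

```python
# Simplified sketch of a morphological profile for one band of a
# hyperspectral image. Real EMPs use opening/closing *by reconstruction*;
# plain openings/closings are used here for brevity. Replacing scipy.ndimage
# with cupyx.scipy.ndimage (CuPy) is one assumed way to run the same
# filtering on a GPU.
import numpy as np
from scipy import ndimage

band = np.random.random((145, 145)).astype(np.float32)  # stand-in for one PCA band

profile = [band]                       # the profile keeps the original band
for s in (3, 5, 7):                    # increasing structuring-element sizes
    profile.append(ndimage.grey_opening(band, size=(s, s)))
    profile.append(ndimage.grey_closing(band, size=(s, s)))

emp = np.stack(profile, axis=-1)       # per-pixel spectral-spatial features
print(emp.shape)                       # (145, 145, 7): original + 3 openings + 3 closings
```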


Author(s):  
Masafumi Niwano
Katsuhiro L Murata
Ryo Adachi
Sili Wang
Yutaro Tachibana
...  

Abstract: We developed a high-speed image-reduction pipeline that uses graphics processing units (GPUs) as hardware accelerators. Astronomers want to detect the electromagnetic counterparts of gravitational-wave sources as soon as possible and to share them for systematic follow-up observations, so high-speed image processing is important. We developed a new image-reduction pipeline for our robotic telescope system that uses a GPU via the Python package CuPy. As a result, the new pipeline is more than 40 times faster than the current one while providing the same functions.
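
The abstract names CuPy, so the sketch below shows the flavour of GPU-side image reduction with it: the standard bias/dark/flat calibration expressed as NumPy-style array arithmetic that runs entirely on the GPU. The frames here are random placeholders; a real pipeline would load FITS images and add astrometry and source extraction.

```python
# Hedged sketch of CCD frame calibration on a GPU with CuPy (the package
# named in the abstract). All frames are random placeholders; a real
# pipeline would read FITS files and do much more (astrometry, photometry).
import cupy as cp

shape = (2048, 2048)
raw = cp.random.random(shape).astype(cp.float32)                 # raw science frame
bias = cp.random.random(shape).astype(cp.float32) * 0.01         # bias frame
dark = cp.random.random(shape).astype(cp.float32) * 0.01         # dark frame
flat = cp.random.random(shape).astype(cp.float32) * 0.1 + 0.95   # flat field

# CuPy mirrors the NumPy API, so the whole correction stays on the GPU
reduced = (raw - bias - dark) / flat

result = cp.asnumpy(reduced)   # copy back to host memory only at the end
print(float(result.mean()))
```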


Author(s):  
Martin C. W. Leong
Kit-Hang Lee
Bowen P. Y. Kwan
Yui-Lun Ng
Zhiyu Liu
...  

Abstract: Purpose: Intensity-based image registration has proven essential in many applications owing to its unparalleled ability to resolve image misalignments. However, long registration times prohibit its use in intra-operative navigation systems. Much work has gone into accelerating registration by improving the algorithm's robustness, but the heavy computation inherent in the registration algorithm itself has remained a bottleneck. Methods: Intensity-based registration methods involve operations with high arithmetic load and memory-access demand, which can be offloaded to graphics processing units (GPUs). Although GPUs are widespread and affordable, there is a lack of open-source GPU implementations optimized for non-rigid image registration. This paper demonstrates performance-aware programming techniques, which involve systematic exploitation of GPU features, by implementing the diffeomorphic log-demons algorithm. Results: By resolving the pinpointed computation bottlenecks on the GPU, our implementation of diffeomorphic log-demons on an Nvidia GTX Titan X achieved a ~95-fold speed-up over the CPU and registered a 1.3-M voxel image in 286 ms. Even for large 37-M voxel images, our implementation registers them in 8.56 s, a ~258-fold speed-up. Our solution employs the GPU's computation units, memory, and data bandwidth effectively to resolve the computation bottlenecks. Conclusion: The computation bottlenecks in diffeomorphic log-demons are pinpointed, analyzed, and resolved using various GPU performance-aware programming techniques. The proposed fast computation of basic image operations not only speeds up diffeomorphic log-demons but can potentially be extended to many other intensity-based approaches. Our implementation is open-source on GitHub at https://bit.ly/2PYZxQz.
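
For readers unfamiliar with demons-type registration, the sketch below shows the classic per-voxel demons force and the Gaussian regularisation of the update field, which are the kinds of elementwise and filtering operations such implementations optimise. It is a hedged illustration in CuPy, not the authors' CUDA code, and it omits the diffeomorphic machinery (velocity-field composition and the scaling-and-squaring exponential).

```python
# Hedged sketch of the core computation in demons-type registration: the
# update force u = (m - f) * grad(f) / (|grad(f)|^2 + (m - f)^2), followed
# by Gaussian smoothing of the field. The paper's diffeomorphic log-demons
# adds velocity-field composition and a scaling-and-squaring exponential,
# omitted here; CuPy stands in for the authors' hand-tuned CUDA kernels.
import cupy as cp
from cupyx.scipy import ndimage as ndi

fixed = cp.random.random((64, 64, 64)).astype(cp.float32)
moving = cp.random.random((64, 64, 64)).astype(cp.float32)

diff = moving - fixed
gx, gy, gz = cp.gradient(fixed)                   # image gradient along each axis
denom = gx**2 + gy**2 + gz**2 + diff**2 + 1e-9    # epsilon guards against division by zero

# demons force per axis; each line runs as elementwise kernels on the GPU
force = [diff * g / denom for g in (gx, gy, gz)]

# diffusion-like regularisation: smooth each component of the update field
smooth = [ndi.gaussian_filter(f, sigma=2.0) for f in force]
print(smooth[0].shape)
```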


2020
Author(s):
Nairit Sur
Leonardo Cristella
Adriano Di Florio
Vincenzo Mastrapasqua

Abstract: The demand for computational resources is steadily increasing in experimental high-energy physics as the current collider experiments continue to accumulate huge amounts of data and physicists pursue ever more complex and ambitious analysis strategies. This is especially true in hadron spectroscopy and flavour physics, where analyses often depend on complex multidimensional unbinned maximum-likelihood fits with several dozen free parameters, aimed at studying the quark structure of hadrons. Graphics processing units (GPUs) represent one of the most sophisticated and versatile parallel computing architectures and are becoming a popular toolkit for high-energy physicists to meet their computational demands. GooFit is an open-source tool interfacing ROOT/RooFit to the CUDA platform on NVIDIA GPUs; it acts as a bridge between the MINUIT minimization algorithm and a parallel processor, allowing probability density functions to be evaluated on multiple cores simultaneously. In this article, a full-fledged amplitude-analysis framework developed using GooFit is tested for its speed and reliability. The four-dimensional fitter framework, one of the first of its kind to be built on GooFit, is geared towards the search for exotic tetraquark states in [[EQUATION]] decays and can be seamlessly adapted for other similar analyses. The GooFit fitter running on GPUs shows a remarkable speed-up in computing performance compared with a ROOT/RooFit implementation of the same fit running on multicore CPU clusters. Furthermore, it is sensitive to components with small contributions to the overall fit, and it has the potential to be a powerful tool for sensitive and computationally intensive physics analyses.
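
GooFit's own API is not reproduced here; instead, the sketch below illustrates the general idea of a GPU-parallel unbinned maximum-likelihood fit: per-event likelihoods are evaluated in one pass on the GPU with CuPy, and a generic SciPy minimiser stands in for MINUIT. The Gaussian-plus-flat toy model is an assumption for illustration only.

```python
# Generic illustration (not GooFit's API): an unbinned maximum-likelihood
# fit in which per-event likelihoods are evaluated in parallel on the GPU
# with CuPy, and scipy.optimize.minimize stands in for MINUIT. Toy model:
# Gaussian signal plus a flat background on [0, 10].
import cupy as cp
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)
data = cp.asarray(np.concatenate([
    rng.normal(5.0, 0.5, 80_000),       # "signal" events
    rng.uniform(0.0, 10.0, 20_000),     # "background" events
]))

def nll(params):
    mu, sigma, frac = params
    gauss = cp.exp(-0.5 * ((data - mu) / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))
    pdf = frac * gauss + (1.0 - frac) * 0.1   # flat pdf on [0, 10] is 1/10
    return float(-cp.sum(cp.log(pdf)))        # one GPU pass + reduction per call

fit = minimize(nll, x0=[4.0, 1.0, 0.5], method="Nelder-Mead",
               bounds=[(0.0, 10.0), (0.05, 5.0), (0.0, 1.0)])
print(fit.x)   # should recover mu ~ 5, sigma ~ 0.5, frac ~ 0.8
```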


2015
Vol 138 (3)
Author(s):
Javier Crespo
Roque Corral
Jesus Pueblas

An implicit harmonic balance (HB) method for modeling the unsteady nonlinear periodic flow about vibrating airfoils in turbomachinery is presented. As the starting point, an implicit edge-based three-dimensional Reynolds-averaged Navier–Stokes (RANS) solver for unstructured grids, which runs both on central processing units (CPUs) and graphics processing units (GPUs), is used. The HB method performs a spectral discretization of the time derivatives and marches in pseudotime a new system of equations in which the unknowns are the flow variables at the different time samples. The application of the method to vibrating airfoils is discussed. It is shown that a time-spectral scheme may achieve the same temporal accuracy as a backward finite-difference method at a much lower computational cost, at the expense of using more memory. The performance of the implicit solver has been assessed with several application examples. A speed-up factor of 10 is obtained between the spectral and finite-difference versions of the code, and an additional speed-up factor of 10 is obtained when the code is ported to GPUs, giving a total speed-up factor of 100. The performance of the solver on GPUs has been assessed using the tenth standard aeroelastic configuration and a transonic compressor.
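
The key ingredient of the time-spectral approach described above is that the time derivative of a periodic signal can be evaluated spectrally from a handful of time samples. The sketch below demonstrates this in NumPy via the FFT; the sample signal is an assumption chosen so the result can be checked against its analytic derivative.

```python
# Sketch of the building block behind harmonic balance / time-spectral
# methods: the time derivative of a periodic signal, sampled at N equispaced
# points over one period T, computed spectrally via the FFT. For smooth
# periodic flows this is why a few samples can match the accuracy of many
# backward-difference time steps.
import numpy as np

N, T = 16, 2.0                          # samples per period, period length
omega = 2 * np.pi / T
t = np.arange(N) * T / N                # equispaced sample times
u = np.sin(omega * t) + 0.3 * np.cos(3 * omega * t)   # toy periodic "flow variable"

ik = 2j * np.pi * np.fft.fftfreq(N, d=T / N)   # spectral derivative factors
dudt = np.fft.ifft(ik * np.fft.fft(u)).real    # spectral time derivative

exact = omega * np.cos(omega * t) - 0.9 * omega * np.sin(3 * omega * t)
print(np.max(np.abs(dudt - exact)))     # ~1e-14: exact to machine precision
```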

