Accelerating 3D Non-Rigid Registration Using Graphics Hardware

There is an increasing need for real-time implementations of 3D image analysis, especially in the context of image-guided surgery. Among the various image analysis tasks, non-rigid image registration is particularly needed and is also computationally prohibitive. This paper presents a GPU (Graphical Processing Unit) implementation of the popular Demons algorithm using recursive Gaussian filtering. The acceleration over the classical method comes mainly from a new GPU filtering scheme, which could be reused in or extended to other applications and represents a significant contribution to GPU-based image processing. The implementation performed a non-rigid registration of 3D MR volumes in less than one minute, an acceleration factor of 10 compared to the corresponding CPU implementation. This demonstrates the usefulness of such a method in an intra-operative context.
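
To make the structure of a Demons iteration concrete, here is a minimal 1D sketch in Python. It is not the paper's GPU implementation. It assumes the commonly cited Thirion update, force = (m - f) * grad(f) / (|grad(f)|^2 + (m - f)^2), and it smooths the displacement field with a direct Gaussian convolution on the CPU, whereas the abstract describes a recursive Gaussian filter on the GPU. The function names (`demons_force`, `smooth`, `demons_step`) and the default `sigma` are hypothetical, and sign conventions for the force vary between implementations.

```python
import math

def demons_force(fixed, moving):
    """Demons force per sample; `moving` is assumed already resampled
    with the current displacement."""
    n = len(fixed)
    force = []
    for i in range(n):
        # Central-difference gradient of the fixed signal, one-sided at borders.
        lo, hi = max(i - 1, 0), min(i + 1, n - 1)
        grad = (fixed[hi] - fixed[lo]) / (hi - lo) if hi > lo else 0.0
        diff = moving[i] - fixed[i]
        denom = grad * grad + diff * diff
        # Guard against division by zero where gradient and difference vanish.
        force.append(diff * grad / denom if denom > 1e-12 else 0.0)
    return force

def gaussian_kernel(sigma):
    """Normalized Gaussian weights truncated at about three sigma."""
    radius = max(1, int(math.ceil(3 * sigma)))
    weights = [math.exp(-0.5 * (k / sigma) ** 2) for k in range(-radius, radius + 1)]
    total = sum(weights)
    return [w / total for w in weights]

def smooth(field, sigma):
    """Direct Gaussian convolution with replicated borders (stand-in for
    the recursive filter mentioned in the abstract)."""
    kernel = gaussian_kernel(sigma)
    radius = len(kernel) // 2
    n = len(field)
    out = []
    for i in range(n):
        acc = 0.0
        for k, w in enumerate(kernel):
            j = min(max(i + k - radius, 0), n - 1)
            acc += w * field[j]
        out.append(acc)
    return out

def demons_step(fixed, moving, displacement, sigma=1.0):
    """One iteration: add the demons force, then regularize by smoothing."""
    force = demons_force(fixed, moving)
    updated = [u + f for u, f in zip(displacement, force)]
    return smooth(updated, sigma)

fixed = [0.0, 1.0, 2.0, 3.0, 4.0]
moving = [1.0, 2.0, 3.0, 4.0, 5.0]
displacement = demons_step(fixed, moving, [0.0] * len(fixed))
```

The cost of the direct convolution grows with the kernel radius, which is one reason a recursive formulation, whose per-sample cost does not depend on sigma, is attractive for large 3D volumes.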

Performance of a Parallel Multi-Agent Simulation using Graphics Hardware

International Journal of Agent Technologies and Systems ◽

10.4018/ijats.2014100104 ◽

2014 ◽

Vol 6 (4) ◽

pp. 72-91

Author(s):

Timothy W. C. Johnson ◽

John R. Rankin

Keyword(s):

Large Scale ◽

Autonomous Agents ◽

Graphics Hardware ◽

Processing Unit ◽

Agent Based ◽

Agent Based Modelling ◽

Processing Power ◽

Agent Simulation ◽

Multi Agent ◽

Graphical Processing

Large-scale Agent-Based Modelling and Simulation (ABMS) is a field of research that is becoming increasingly popular as researchers work to construct simulations at a higher level of complexity and realism than before. These systems can be difficult and time-consuming to implement, and their scope can also be constrained by a shortage of available processing power. This work addresses both problems by demonstrating a model for ABMS that allows developers to design their own simulation, which is then automatically converted into code capable of running on a mainstream Graphical Processing Unit (GPU). By harnessing the extra processing power afforded by the GPU, this paper creates simulations that run in real time with more autonomous agents than systems using traditional x86 processors allow.

Graphical processing unit implementation of an integrated shape-based active contour: Application to digital pathology

Journal of Pathology Informatics ◽

10.4103/2153-3539.92029 ◽

2011 ◽

Vol 2 (2) ◽

pp. 13 ◽

Cited By ~ 3

Author(s):

Sahirzeeshan Ali ◽

Anant Madabhushi

Keyword(s):

Active Contour ◽

Digital Pathology ◽

Graphical Processing Unit ◽

Processing Unit ◽

Graphical Processing Unit Implementation ◽

Graphical Processing

Mixed-mode database miner classifier: Parallel computation of graphical processing unit mining

International Journal of Electrical Engineering Education ◽

10.1177/0020720920988494 ◽

2021 ◽

pp. 002072092098849

Author(s):

Soumya Ranjan Nayak ◽

S Sivakumar ◽

Akash Kumar Bhoi ◽

Gyoo-Soo Chae ◽

Pradeep Kumar Mallick

Keyword(s):

Credit Card ◽

Mixed Mode ◽

Processing Time ◽

Gpu Computing ◽

Graphical Processing Unit ◽

Computational Time ◽

Processing Unit ◽

Large Set ◽

Minimal Processing ◽

Graphical Processing

The graphical processing unit (GPU) has gained popularity among researchers in the field of decision making and knowledge discovery systems. However, most earlier studies are limited in GPU memory utilization, computational time, and accuracy. The main contribution of this paper is a novel algorithm called the Mixed Mode Database Miner (MMDBM) classifier, which applies multithreading concepts to a large number of attributes. The proposed method uses the quick sort algorithm in GPU parallel computing to overcome the state-of-the-art limitations. It applies a dynamic rule generation approach to construct the decision tree based on the predicted rules. Moreover, the implementation results are compared for both SLIQ and MMDBM using Java and GPU, with the acceleration ratio computed on the BP dataset. The primary objective of this work is to improve performance with less processing time. The results are also analyzed using various numbers of threads in GPU mining on eight different datasets from the UCI Machine Learning Repository. The proposed MMDBM algorithm has been validated on these eight datasets with an accuracy of 91.3% in diabetes, 89.1% in breast cancer, 96.6% in iris, 89.9% in labor, 95.4% in vote, 89.5% in credit card, 78.7% in supermarket and 78.7% in BP, and it also takes less computational time for the given datasets. The outcome of this work will help the research community develop more effective multithread-based GPU solutions in GPU mining to handle large sets of data in minimal processing time. Therefore, this can be considered a more reliable and precise method for GPU computing.

A graphical processing unit‐based parallel hybrid genetic algorithm for resource‐constrained multi‐project scheduling problem

Concurrency and Computation Practice and Experience ◽

10.1002/cpe.6266 ◽

2021 ◽

Author(s):

Furkan Uysal ◽

Rifat Sonmez ◽

Selcuk Kursat Isleyen

Keyword(s):

Genetic Algorithm ◽

Project Scheduling ◽

Hybrid Genetic Algorithm ◽

Graphical Processing Unit ◽

Processing Unit ◽

Scheduling Problem ◽

Resource Constrained ◽

Parallel Hybrid ◽

Project Scheduling Problem ◽

Graphical Processing

A computationally efficient approach to segmentation of the aorta and coronary arteries using deep learning

10.1101/2021.02.18.21252005 ◽

2021 ◽

Author(s):

Wing Keung Cheung ◽

Robert Bell ◽

Arjun Nair ◽

Leon Menezies ◽

Riyaz Patel ◽

...

Keyword(s):

Deep Learning ◽

Coronary Arteries ◽

Automatic Segmentation ◽

Three Dimensional ◽

Regions Of Interest ◽

Dice Similarity Coefficient ◽

Processing Unit ◽

Two Dimensional ◽

Computed Tomography Images ◽

Graphical Processing

A fully automatic two-dimensional Unet model is proposed to segment the aorta and coronary arteries in computed tomography images. Two models are trained to segment two regions of interest: (1) the aorta and the coronary arteries, or (2) the coronary arteries alone. Our method achieves Dice similarity coefficients of 91.20% and 88.80% on regions of interest 1 and 2, respectively. Compared with a semi-automatic segmentation method, our model performs better when segmenting the coronary arteries alone. The performance of the proposed method is comparable to existing published two-dimensional or three-dimensional deep learning models. Furthermore, the algorithmic and graphical processing unit memory efficiencies are maintained such that the model can be deployed within hospital computer networks where graphical processing units are typically not available.
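
For readers unfamiliar with the metric, the Dice similarity coefficient between a predicted mask P and a reference mask T is 2|P ∩ T| / (|P| + |T|). The short Python sketch below computes it for flattened binary masks. The function name and the toy masks are illustrative assumptions, not data or code from the paper.

```python
def dice_coefficient(pred, truth):
    """Dice similarity coefficient between two flattened binary masks."""
    if len(pred) != len(truth):
        raise ValueError("masks must have the same number of pixels")
    # Pixels labelled foreground in both masks.
    intersection = sum(1 for p, t in zip(pred, truth) if p and t)
    # Total foreground pixels across both masks.
    total = sum(1 for p in pred if p) + sum(1 for t in truth if t)
    # Two empty masks are treated as a perfect match by convention.
    return 1.0 if total == 0 else 2.0 * intersection / total

pred = [0, 1, 1, 1, 0, 0]   # hypothetical model output
truth = [0, 0, 1, 1, 1, 0]  # hypothetical reference segmentation
score = dice_coefficient(pred, truth)  # 2 * 2 / (3 + 3)
```

A score of 1.0 means the masks overlap exactly, and 0.0 means they share no foreground pixels.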

Learning-Based 2D/3D Rigid Registration Using Jensen-Shannon Divergence for Image-Guided Surgery

Lecture Notes in Computer Science - Medical Imaging and Augmented Reality ◽

10.1007/11812715_29 ◽

2006 ◽

pp. 228-235 ◽

Cited By ~ 4

Author(s):

Rui Liao ◽

Christoph Guetter ◽

Chenyang Xu ◽

Yiyong Sun ◽

Ali Khamene ◽

...

Keyword(s):

Image Guided Surgery ◽

Rigid Registration ◽

Guided Surgery ◽

Image Guided ◽

Jensen Shannon Divergence

Graphical processing unit (GPU) acceleration for numerical solution of population balance models using high resolution finite volume algorithm

Computers & Chemical Engineering ◽

10.1016/j.compchemeng.2016.03.023 ◽

2016 ◽

Vol 91 ◽

pp. 167-181 ◽

Cited By ~ 25

Author(s):

Botond Szilágyi ◽

Zoltán K. Nagy

Keyword(s):

High Resolution ◽

Numerical Solution ◽

Finite Volume ◽

Population Balance ◽

Graphical Processing Unit ◽

Gpu Acceleration ◽

Processing Unit ◽

Graphical Processing ◽

Volume Algorithm

Low latency iterative reconstruction of first pass stress cardiac perfusion with physiological stress using graphical processing unit

Journal of Cardiovascular Magnetic Resonance ◽

10.1186/1532-429x-15-s1-e10 ◽

2013 ◽

Vol 15 (S1) ◽

Author(s):

Sébastien Roujol ◽

Tamer A Basha ◽

Christophe Schülke ◽

Martin Buehrer ◽

Warren J Manning ◽

...

Keyword(s):

Iterative Reconstruction ◽

Physiological Stress ◽

Graphical Processing Unit ◽

Low Latency ◽

Processing Unit ◽

Cardiac Perfusion ◽

First Pass ◽

Graphical Processing

A Representation of Membrane Computing with a Clustering Algorithm on the Graphical Processing Unit

Processes ◽

10.3390/pr8091199 ◽

2020 ◽

Vol 8 (9) ◽

pp. 1199

Author(s):

Ravie Chandren Muniyandi ◽

Ali Maroosi

Keyword(s):

Graphics Processing Units ◽

Clustering Algorithm ◽

Hamiltonian Path ◽

Fold Increase ◽

General Purpose ◽

Processing Unit ◽

Thread Block ◽

Hard Problems ◽

Graphical Processing ◽

Graphics Processing

Long-timescale simulations of biological processes such as photosynthesis, or attempts to solve NP-hard problems such as traveling salesman, knapsack, Hamiltonian path, and satisfiability using membrane systems, can take hours or days without appropriate parallelization. Graphics processing units (GPUs) provide a massively parallel mechanism for general-purpose computation. Previous studies mapped one membrane to one thread block on the GPU. This is disadvantageous because when the number of objects in each membrane is small, the number of active threads is also small, which decreases performance. Moreover, when each membrane is assigned to one thread block, communication between membranes requires communication between thread blocks, which is time-consuming. Previous approaches have also not addressed the issue of GPU occupancy. This study presents a classification algorithm to manage dependent objects and membranes based on the communication rate associated with the defined weighted network and assign them to sub-matrices. Thus, dependent objects and membranes are allocated to the same threads and thread blocks, thereby decreasing communication between threads and thread blocks and allowing GPUs to maintain the highest occupancy possible. The experimental results indicate that for 48 objects per membrane, the algorithm facilitates a 93-fold increase in processing speed compared to a 1.6-fold increase with previous algorithms.
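
The idea of placing heavily communicating objects in the same thread block can be illustrated with a small greedy grouping over a weighted communication network. The Python sketch below is a hypothetical stand-in, not the algorithm from the paper: it merges objects along the heaviest edges first, subject to an assumed maximum group size (playing the role of a thread block's capacity), and then measures how much communication still crosses group boundaries. All names and the toy network are assumptions made for illustration.

```python
def group_by_communication(num_objects, edges, max_group_size):
    """Greedily merge objects joined by the heaviest communication edges.

    edges: list of (weight, a, b) tuples. Returns one group id per object.
    """
    parent = list(range(num_objects))
    size = [1] * num_objects

    def find(x):
        # Union-find lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    # Visit edges from heaviest to lightest communication rate.
    for weight, a, b in sorted(edges, reverse=True):
        ra, rb = find(a), find(b)
        # Merge only if the combined group still fits the capacity.
        if ra != rb and size[ra] + size[rb] <= max_group_size:
            parent[rb] = ra
            size[ra] += size[rb]
    return [find(x) for x in range(num_objects)]

def cross_group_weight(groups, edges):
    """Total communication weight that crosses group boundaries."""
    return sum(w for w, a, b in edges if groups[a] != groups[b])

# Toy network: objects 0-1 and 2-3 communicate heavily, other links are light.
edges = [(9, 0, 1), (8, 2, 3), (1, 1, 2), (1, 0, 3)]
groups = group_by_communication(4, edges, max_group_size=2)
naive = [0, 1, 0, 1]  # arbitrary assignment that ignores communication
```

In this toy case the greedy grouping leaves only the two light links crossing group boundaries, while the naive assignment splits every link.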
