ACCELERATING 3D NON-RIGID REGISTRATION USING GRAPHICS HARDWARE

2008 ◽  
Vol 08 (01) ◽  
pp. 81-98 ◽  
Author(s):  
NICOLAS COURTY ◽  
PIERRE HELLIER

There is an increasing need for real-time implementations of 3D image analysis processes, especially in the context of image-guided surgery. Among the various image analysis tasks, non-rigid image registration is particularly needed but is also computationally prohibitive. This paper presents a GPU (Graphical Processing Unit) implementation of the popular Demons algorithm using recursive Gaussian filtering. Acceleration of the classical method is mainly achieved by a new filtering scheme on the GPU, which could be reused in or extended to other applications and represents a significant contribution to the GPU-based image processing domain. This implementation was able to perform a non-rigid registration of 3D MR volumes in less than one minute, which corresponds to an acceleration factor of 10 compared to the corresponding CPU implementation. This demonstrates the usefulness of such a method in an intra-operative context.
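The two ingredients named in this abstract, a Demons update and recursive smoothing of the displacement field, can be illustrated with a minimal 1D sketch in plain Python. This is not the authors' GPU implementation: the function names are hypothetical, the update uses one common form of the Demons force, and the smoother is a first-order forward-backward recursive filter used here as a simplified stand-in for a true recursive Gaussian filter. It shows why recursive filtering is attractive: the cost per sample does not grow with the amount of smoothing.

```python
def recursive_smooth(signal, alpha):
    """Forward-backward first-order recursive (IIR) smoothing.

    Assumption: a simplified stand-in for a recursive Gaussian filter.
    alpha lies in (0, 1]; smaller values smooth more strongly.
    Each sample costs a fixed number of operations in both passes.
    """
    n = len(signal)
    if n == 0:
        return []
    # Causal pass (left to right).
    forward = [0.0] * n
    forward[0] = signal[0]
    for i in range(1, n):
        forward[i] = alpha * signal[i] + (1.0 - alpha) * forward[i - 1]
    # Anti-causal pass (right to left) makes the response symmetric.
    out = [0.0] * n
    out[-1] = forward[-1]
    for i in range(n - 2, -1, -1):
        out[i] = alpha * forward[i] + (1.0 - alpha) * out[i + 1]
    return out


def demons_update(fixed, moving):
    """One 1D Demons force per sample, in one common form:

        u = (m - f) * grad(f) / (grad(f)^2 + (m - f)^2)

    where f is the fixed signal and m the moving signal.
    """
    n = len(fixed)
    update = [0.0] * n
    for i in range(n):
        lo, hi = max(i - 1, 0), min(i + 1, n - 1)
        # Central difference inside, one-sided difference at the borders.
        grad = (fixed[hi] - fixed[lo]) / (hi - lo) if hi > lo else 0.0
        diff = moving[i] - fixed[i]
        denom = grad * grad + diff * diff
        update[i] = diff * grad / denom if denom > 1e-12 else 0.0
    return update


if __name__ == "__main__":
    fixed = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
    moving = [0.5, 1.5, 2.5, 3.5, 4.5, 5.5]
    raw = demons_update(fixed, moving)
    print(recursive_smooth(raw, alpha=0.5))
```

In a full registration loop, the smoothed update would be accumulated into the displacement field and the moving image resampled before the next iteration; those steps are omitted here.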

2014 ◽  
Vol 6 (4) ◽  
pp. 72-91
Author(s):  
Timothy W. C. Johnson ◽  
John R. Rankin

Large-scale Agent-Based Modelling and Simulation (ABMS) is a field of research that is becoming increasingly popular as researchers work to construct simulations at a higher level of complexity and realism than previously achieved. These systems can not only be difficult and time-consuming to implement, but can also be constrained in their scope by a shortage of available processing power. This work addresses both problems by demonstrating a model for ABMS that allows developers to design their own simulation, which is then automatically converted into code capable of running on a mainstream Graphical Processing Unit (GPU). By harnessing the extra processing power afforded by the GPU, this paper creates simulations that are capable of running in real time with more autonomous agents than is possible on systems using traditional x86 processors.


Author(s):  
Soumya Ranjan Nayak ◽  
S Sivakumar ◽  
Akash Kumar Bhoi ◽  
Gyoo-Soo Chae ◽  
Pradeep Kumar Mallick

The graphical processing unit (GPU) has gained popularity among researchers in the field of decision making and knowledge discovery systems. However, most earlier studies have limitations in GPU memory utilization, computational time, and accuracy. The main contribution of this paper is a novel algorithm called the Mixed Mode Database Miner (MMDBM) classifier, which implements multithreading concepts on a large number of attributes. The proposed method uses the quick sort algorithm in GPU parallel computing to overcome the limitations of the state of the art. This method applies a dynamic rule generation approach for constructing the decision tree based on the predicted rules. Moreover, the implementation results are compared for both SLIQ and MMDBM using Java and GPU, with the acceleration ratio computed on the BP dataset. The primary objective of this work is to improve performance with less processing time. The results are also analyzed using various numbers of threads in GPU mining on eight different datasets from the UCI Machine Learning Repository. The proposed MMDBM algorithm has been validated on these eight datasets with accuracies of 91.3% in diabetes, 89.1% in breast cancer, 96.6% in iris, 89.9% in labor, 95.4% in vote, 89.5% in credit card, 78.7% in supermarket and 78.7% in BP, and it also takes less computational time for the given datasets. The outcome of this work will help the research community develop more effective multithread-based GPU solutions in GPU mining to handle large data sets in minimal processing time. Therefore, this can be considered a more reliable and precise method for GPU computing.


2021 ◽  
Author(s):  
Wing Keung Cheung ◽  
Robert Bell ◽  
Arjun Nair ◽  
Leon Menezies ◽  
Riyaz Patel ◽  
...  

A fully automatic two-dimensional Unet model is proposed to segment the aorta and coronary arteries in computed tomography images. Two models are trained to segment two regions of interest: (1) the aorta and the coronary arteries, or (2) the coronary arteries alone. Our method achieves Dice similarity coefficients of 91.20% and 88.80% on regions of interest 1 and 2, respectively. Compared with a semi-automatic segmentation method, our model performs better when segmenting the coronary arteries alone. The performance of the proposed method is comparable to existing published two-dimensional or three-dimensional deep learning models. Furthermore, the algorithmic and graphical processing unit memory efficiencies are maintained such that the model can be deployed within hospital computer networks where graphical processing units are typically not available.
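For readers unfamiliar with the metric reported above, the Dice similarity coefficient between a predicted mask A and a reference mask B is 2|A ∩ B| / (|A| + |B|). The sketch below computes it for flattened binary masks in plain Python. The function name and the convention of returning 1.0 when both masks are empty are assumptions made for illustration; the abstract does not say how the authors compute or aggregate the score.

```python
def dice_coefficient(pred, truth):
    """Dice similarity coefficient for two flattened binary masks.

    pred and truth are equal-length sequences whose truthy entries
    mark foreground pixels.
    """
    if len(pred) != len(truth):
        raise ValueError("masks must have the same number of pixels")
    # Pixels marked foreground in both masks.
    intersection = sum(1 for p, t in zip(pred, truth) if p and t)
    # Sum of the two foreground sizes.
    total = sum(1 for p in pred if p) + sum(1 for t in truth if t)
    if total == 0:
        # Assumed convention: two empty masks agree perfectly.
        return 1.0
    return 2.0 * intersection / total


if __name__ == "__main__":
    predicted = [1, 1, 0, 0]
    reference = [1, 0, 0, 0]
    print(dice_coefficient(predicted, reference))
```

A score of 1.0 means the masks coincide exactly and 0.0 means they share no foreground pixels.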


Processes ◽  
2020 ◽  
Vol 8 (9) ◽  
pp. 1199
Author(s):  
Ravie Chandren Muniyandi ◽  
Ali Maroosi

Long-timescale simulations of biological processes such as photosynthesis, or attempts to solve NP-hard problems such as traveling salesman, knapsack, Hamiltonian path, and satisfiability using membrane systems, can take hours or days without appropriate parallelization. Graphics processing units (GPUs) provide a massively parallel mechanism for general-purpose computation. Previous studies mapped one membrane to one thread block on the GPU. This is disadvantageous because when the number of objects in each membrane is small, the number of active threads is also small, which decreases performance. In addition, when each membrane is assigned to one thread block, communication between membranes requires communication between thread blocks, which is a time-consuming process. Previous approaches have also not addressed the issue of GPU occupancy. This study presents a classification algorithm that manages dependent objects and membranes based on the communication rate associated with the defined weighted network and assigns them to sub-matrices. Thus, dependent objects and membranes are allocated to the same threads and thread blocks, thereby decreasing communication between threads and thread blocks and allowing GPUs to maintain the highest occupancy possible. The experimental results indicate that for 48 objects per membrane, the algorithm yields a 93-fold increase in processing speed, compared to a 1.6-fold increase with previous algorithms.
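The underutilization problem described above can be made concrete with a small arithmetic sketch. It assumes one thread per object and a block size of 256 threads; neither figure comes from the abstract, and the function name is hypothetical. The sketch only illustrates why a one-membrane-per-block mapping leaves threads idle. It is not the paper's classification algorithm, which groups objects and membranes by communication rate.

```python
def active_thread_fraction(objects_per_membrane, threads_per_block,
                           membranes_per_block=1):
    """Fraction of a block's threads that have an object to process.

    Assumption: one thread handles one object, so a block that hosts
    k membranes keeps k * objects_per_membrane threads busy.
    """
    used = objects_per_membrane * membranes_per_block
    if used > threads_per_block:
        raise ValueError("membranes do not fit in one thread block")
    return used / threads_per_block


if __name__ == "__main__":
    # One membrane of 48 objects in a 256-thread block: most threads idle.
    print(active_thread_fraction(48, 256))
    # Five such membranes sharing the block: far fewer idle threads.
    print(active_thread_fraction(48, 256, membranes_per_block=5))
```

Placing several membranes in one block also turns some inter-block communication into intra-block communication, which is the second effect the abstract points to.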


SIMULATION ◽  
2011 ◽  
Vol 88 (6) ◽  
pp. 746-761 ◽  
Author(s):  
Kalyan S Perumalla ◽  
Brandon G Aaby ◽  
Srikanth B Yoginath ◽  
Sudip K Seal
