GPU Collision Detection Using Spatial Subdivision With Applications in Contact Dynamics

Author(s):  
Hammad Mazhar

This work concentrates on the issue of rigid body collision detection, a critical component of any software package employed to approximate the dynamics of multibody systems with frictional contact. This paper presents a scalable collision detection algorithm designed for massively parallel computing architectures. The proposed approach is implemented on a ubiquitous Graphics Processing Unit (GPU) card and shown to achieve a 40x speedup over state-of-the-art Central Processing Unit (CPU) implementations when handling multi-million-object collision detection. GPUs are composed of many (on the order of hundreds of) scalar processors that can simultaneously execute an operation; the proposed algorithm leverages this strength. The approach can detect collisions between five million objects in less than two seconds; with newer GPUs, the capability of detecting collisions between eighty million objects in less than thirty seconds is expected. The proposed methodology is expected to have an impact on a wide range of granular flow dynamics and smoothed particle hydrodynamics applications, e.g., sand, gravel, and fluid simulations, where the number of contacts can reach into the hundreds of millions.
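The spatial-subdivision idea behind such a broad phase can be illustrated with a short sequential sketch. This is not the paper's GPU implementation: the uniform grid, equal-radius spheres, function names, and cell size below are simplifying assumptions for illustration only.

```python
from collections import defaultdict
from itertools import combinations

def broad_phase(centers, radius, cell_size):
    """Uniform-grid broad phase: hash each sphere into every cell its
    bounding box overlaps, then run the exact narrow-phase test only on
    pairs that share at least one cell."""
    grid = defaultdict(list)
    for i, (x, y, z) in enumerate(centers):
        # Range of cells covered by the sphere's axis-aligned bounding box.
        lo = [int((c - radius) // cell_size) for c in (x, y, z)]
        hi = [int((c + radius) // cell_size) for c in (x, y, z)]
        for cx in range(lo[0], hi[0] + 1):
            for cy in range(lo[1], hi[1] + 1):
                for cz in range(lo[2], hi[2] + 1):
                    grid[(cx, cy, cz)].append(i)
    # Narrow phase: exact sphere-sphere test on candidate pairs.
    contacts = set()
    for cell in grid.values():
        for i, j in combinations(cell, 2):
            d2 = sum((a - b) ** 2 for a, b in zip(centers[i], centers[j]))
            if d2 <= (2 * radius) ** 2:
                contacts.add((min(i, j), max(i, j)))
    return contacts
```

On a GPU, the per-cell pair tests are what parallelize across the scalar processors; this sequential version only shows how subdivision prunes the O(n²) pair space.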

Author(s):  
Arman Pazouki ◽  
Hammad Mazhar ◽  
Dan Negrut

This work concentrates on the contact detection of ellipsoids, an enhancement to collision detection that can be used to study the dynamics of multibody systems with frictional contact. A first method for contact detection is posed as an unconstrained optimization problem. This method, while computationally demanding, can find the contact parameters as well as determine the contact state of two ellipsoids. Next, a method is presented that is approximately two orders of magnitude more efficient at finding the contact state of two ellipsoids. However, it cannot find the contact parameters, such as the contact normal, depth of penetration, etc. Finally, a parallel algorithm for the ellipsoid contact detection problem is presented. The algorithm is implemented on a ubiquitous Graphics Processing Unit (GPU) card and shown to achieve a speedup of up to 70× over a Central Processing Unit (CPU) based method. The proposed methodology is expected to have an impact on granular flow dynamics applications.
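One way to pose proximity as an unconstrained problem, in the spirit of the first method, is to parameterize a point on each (here axis-aligned) ellipsoid surface by two spherical angles and minimize the squared point-to-point distance over the four angles. The parameterization, the grid-seeded numerical gradient descent, and all names below are illustrative assumptions, not the authors' formulation.

```python
import itertools
import math

def surface_point(center, semi, ang):
    """Point on an axis-aligned ellipsoid given spherical angles (theta, phi)."""
    t, p = ang
    return (center[0] + semi[0] * math.cos(t) * math.cos(p),
            center[1] + semi[1] * math.cos(t) * math.sin(p),
            center[2] + semi[2] * math.sin(t))

def min_surface_distance(c1, s1, c2, s2, steps=1500, lr=0.01):
    """Minimize squared distance between the two surfaces over four angles.
    A positive minimum indicates separation of the two ellipsoids."""
    def f(a):
        p = surface_point(c1, s1, a[:2])
        q = surface_point(c2, s2, a[2:])
        return sum((u - v) ** 2 for u, v in zip(p, q))
    # Coarse 8^4 grid search for a starting point (avoids saddle points),
    # then plain gradient descent with central-difference gradients.
    grid = [i * math.pi / 4 for i in range(8)]
    ang = list(min(itertools.product(grid, repeat=4), key=f))
    h = 1e-6
    for _ in range(steps):
        grad = []
        for k in range(4):
            ap, am = list(ang), list(ang)
            ap[k] += h
            am[k] -= h
            grad.append((f(ap) - f(am)) / (2 * h))
        ang = [a - lr * g for a, g in zip(ang, grad)]
    return math.sqrt(f(ang))
```

For two unit spheres (a degenerate ellipsoid case) with centers three units apart, the routine recovers the expected surface-to-surface gap of 1.0. It reports distance only, not the contact normal or penetration depth the paper's full method provides.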


A graphics processing unit (GPU) is a programmable chip that performs rapid mathematical operations and accelerates them through massive parallelism. In the early days, the central processing unit (CPU) was responsible for all computations, regardless of whether they were amenable to parallel execution. In recent years, however, GPUs have increasingly been used for massively parallel computing applications, such as training deep neural networks. GPU performance monitoring plays a key role in this new era, since GPUs are indispensable for increasing the speed of analysis of the developed system. GPU administration comes into the picture when multiple workloads must share the same hardware efficiently. In this study, various GPU parameters are monitored to keep them at safe levels and to maintain the performance of the system.


Author(s):  
Wisoot Sanhan ◽  
Kambiz Vafai ◽  
Niti Kammuang-Lue ◽  
Pradit Terdtoon ◽  
Phrut Sakulchangsatjatai

Abstract An investigation of the effect of flattening on the thermal performance of a heat pipe with double heat sources, acting as the central processing unit and graphics processing unit in laptop computers, is presented in this work. A finite element method is used for predicting the effect of flattening the heat pipe. The cylindrical heat pipe, with a diameter of 6 mm and a total length of 200 mm, is flattened into three final thicknesses of 2, 3, and 4 mm. The heat pipe is placed in a horizontal configuration and heated by heaters 1 and 2 with a combined power of 40 W. The numerical model shows good agreement with the experimental data, with a standard deviation of 1.85%. The results also show that flattening the cylindrical heat pipe to 66.7% and 41.7% of its original diameter could reduce its normalized thermal resistance by 5.2%. The optimized final thickness, or best-design final thickness, for the heat pipe is found to be 2.5 mm.


2018 ◽  
Vol 7 (12) ◽  
pp. 472 ◽  
Author(s):  
Bo Wan ◽  
Lin Yang ◽  
Shunping Zhou ◽  
Run Wang ◽  
Dezhi Wang ◽  
...  

The road-network matching method is an effective tool for map integration, fusion, and update. Due to the complexity of road networks in the real world, matching methods often involve a series of complicated processes to identify homonymous roads and deal with their intricate relationships. However, traditional road-network matching algorithms, which are mainly central processing unit (CPU)-based approaches, may hit performance bottlenecks when facing big data. We developed a particle-swarm optimization (PSO)-based parallel road-network matching method on the graphics processing unit (GPU). Based on the characteristics of the two main stages (similarity computation and matching-relationship identification), data-partition and task-partition strategies were utilized, respectively, to fully use the GPU threads. Experiments were conducted on datasets at 14 different scales. Results indicate that the parallel PSO-based matching algorithm (PSOM) could correctly identify most matching relationships with an average accuracy of 84.44%, on par with the accuracy of a benchmark, the probability-relaxation-matching (PRM) method. The PSOM approach significantly reduced the road-network matching time when dealing with large amounts of data in comparison with the PRM method. This paper provides a common parallel algorithm framework for road-network matching algorithms and contributes to the integration and updating of large-scale road networks.
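The PSO core that such a matcher parallelizes can be sketched generically: each particle tracks its personal best position, and the swarm shares a global best that attracts all particles. This is a minimal, sequential sketch with illustrative default parameters, not the paper's GPU-parallel matching objective.

```python
import random

def pso_minimize(f, dim, bounds, n_particles=30, iters=200,
                 w=0.7, c1=1.5, c2=1.5, seed=0):
    """Minimal particle-swarm optimization of f over a box [lo, hi]^dim."""
    rng = random.Random(seed)
    lo, hi = bounds
    pos = [[rng.uniform(lo, hi) for _ in range(dim)] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]              # each particle's best position
    pbest_val = [f(p) for p in pos]
    g = min(range(n_particles), key=lambda i: pbest_val[i])
    gbest, gbest_val = pbest[g][:], pbest_val[g]   # swarm-wide best
    for _ in range(iters):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                # Inertia + cognitive pull (own best) + social pull (global best).
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])
                             + c2 * r2 * (gbest[d] - pos[i][d]))
                pos[i][d] = min(hi, max(lo, pos[i][d] + vel[i][d]))
            val = f(pos[i])
            if val < pbest_val[i]:
                pbest[i], pbest_val[i] = pos[i][:], val
                if val < gbest_val:
                    gbest, gbest_val = pos[i][:], val
    return gbest, gbest_val
```

The inner per-particle loop is embarrassingly parallel, which is what makes a GPU data-partition strategy over particles natural.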


Sensors ◽  
2021 ◽  
Vol 21 (13) ◽  
pp. 4582
Author(s):  
Changjie Cai ◽  
Tomoki Nishimura ◽  
Jooyeon Hwang ◽  
Xiao-Ming Hu ◽  
Akio Kuroda

Fluorescent probes can be used to detect various types of asbestos (serpentine and amphibole groups); however, fiber counting using our previously developed software was not accurate for samples with low fiber concentrations. Machine learning-based techniques (e.g., deep learning) for image analysis, particularly Convolutional Neural Networks (CNN), have been widely applied to many areas. The objectives of this study were to (1) create a database of fluorescence microscopy (FM) images covering a wide range of asbestos concentrations (0–50 fibers/liter) in the laboratory; and (2) determine the applicability of the state-of-the-art object detection CNN model, YOLOv4, for accurately detecting asbestos. We captured fluorescence microscopy images containing asbestos and labeled the individual fibers in the images. We trained the YOLOv4 model with the labeled images using one GTX 1660 Ti Graphics Processing Unit (GPU). Our results demonstrated the exceptional capacity of the YOLOv4 model to learn fluorescent asbestos morphologies. The mean average precision at a threshold of 0.5 (mAP@0.5) was 96.1% ± 0.4%, using the National Institute for Occupational Safety and Health (NIOSH) fiber counting Method 7400 as a reference method. Compared to our previous counting software (Intec/HU), YOLOv4 achieved higher accuracy (0.997 vs. 0.979) and, in particular, much higher precision (0.898 vs. 0.418), recall (0.898 vs. 0.780), and F-1 score (0.898 vs. 0.544). In addition, YOLOv4 performed much better on low-fiber-concentration samples (<15 fibers/liter) than Intec/HU. Therefore, the FM method coupled with YOLOv4 is highly effective at detecting asbestos fibers and differentiating them from other, non-asbestos particles.
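The precision, recall, F-1, and accuracy figures reported follow the standard count-based definitions. The sketch below shows the arithmetic with hypothetical detection counts (chosen only to reproduce a precision and recall of 0.898; they are not the study's actual counts). Note that when precision equals recall, the F-1 score equals both, which is why all three figures coincide at 0.898.

```python
def detection_metrics(tp, fp, fn, tn=0):
    """Precision, recall, F-1, and accuracy from detection counts:
    tp = true positives, fp = false positives, fn = false negatives,
    tn = true negatives (often 0 or undefined for object detection)."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    return precision, recall, f1, accuracy
```

For example, 898 true positives with 102 false positives and 102 false negatives yields precision = recall = F-1 = 0.898.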


2018 ◽  
Vol 14 (4) ◽  
Author(s):  
G.B. Praveen ◽  
Anita Agrawal ◽  
Shrey Pareek ◽  
Amalin Prince

Abstract Magnetic resonance imaging (MRI) is a widely used imaging modality for evaluating brain disorders. MRI generates huge volumes of data, consisting of sequences of scans taken at different instances in time. As the presence of brain disorders has to be evaluated on all magnetic resonance (MR) sequences, manual brain disorder detection becomes a tedious process and is prone to inter- and intra-rater errors. A technique for detecting abnormalities in brain MRI using template matching is proposed. Bias field correction is performed on volumetric scans using the N4ITK filter, followed by volumetric registration. Normalized cross-correlation template matching is used for image registration, taking into account rotation and scaling operations. A template of the abnormality is selected and matched against the volumetric scans; if found, the corresponding image is retrieved. Post-processing of the retrieved images is performed by a thresholding operation, and the coordinates and area of the abnormality are reported. The experiments are carried out on the glioma dataset from the Brain Tumor Segmentation Challenge 2013 database (BRATS 2013), which consists of MR scans of 30 real glioma patients and 50 simulated glioma patients. The NVIDIA Compute Unified Device Architecture framework is employed in this paper, and detection using the graphics processing unit is found to be almost four times faster than using only the central processing unit. The average Dice and Jaccard coefficients over a wide range of trials are found to be 0.91 and 0.83, respectively.
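Normalized cross-correlation template matching can be sketched in a few lines. This is a plain 2D CPU reference without the rotation/scaling handling or the CUDA acceleration described above; the function names and the zero-variance guard are illustrative choices.

```python
import math

def ncc(patch, template):
    """Zero-mean normalized cross-correlation of two equal-size 2D patches,
    in [-1, 1]; returns 0.0 for constant (zero-variance) patches."""
    n = len(patch) * len(patch[0])
    pm = sum(map(sum, patch)) / n
    tm = sum(map(sum, template)) / n
    num = dp = dt = 0.0
    for prow, trow in zip(patch, template):
        for p, t in zip(prow, trow):
            num += (p - pm) * (t - tm)
            dp += (p - pm) ** 2
            dt += (t - tm) ** 2
    if dp == 0.0 or dt == 0.0:
        return 0.0
    return num / math.sqrt(dp * dt)

def match_template(image, template):
    """Slide the template over every offset of the image; return the
    top-left (row, col) with the highest NCC score, and the score."""
    th, tw = len(template), len(template[0])
    best, best_pos = -2.0, (0, 0)
    for y in range(len(image) - th + 1):
        for x in range(len(image[0]) - tw + 1):
            patch = [row[x:x + tw] for row in image[y:y + th]]
            score = ncc(patch, template)
            if score > best:
                best, best_pos = score, (y, x)
    return best_pos, best
```

Because every offset is scored independently, this sliding-window loop maps directly onto GPU threads, which is the source of the roughly fourfold speedup noted above.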


Symmetry ◽  
2019 ◽  
Vol 11 (4) ◽  
pp. 585
Author(s):  
Yufei Wu ◽  
Xiaofei Ruan ◽  
Yu Zhang ◽  
Huang Zhou ◽  
Shengyu Du ◽  
...  

The high demand for computational resources severely hinders the deployment of deep learning applications on resource-limited devices. In this work, we investigate the under-studied but practically important problem of network efficiency and present a new, lightweight architecture for hand pose estimation. Our architecture is essentially a deeply-supervised pruned network in which less important layers and branches are removed to reach a real-time inference target on resource-constrained devices without much compromise in accuracy. We further perform deployment optimization to exploit the parallel execution capability of central processing units (CPUs). We conduct experiments on the NYU and ICVL datasets and develop a demo using a RealSense camera. Experimental results show that our lightweight network achieves an average running time of 32 ms (31.3 FPS, versus 22.7 FPS for the original) before deployment optimization. Meanwhile, the model has only about half the parameters of the original, with a mean joint error of 11.9 mm. After further optimization with OpenVINO, the optimized model runs at 56 FPS on CPUs, in contrast to 44 FPS on a graphics processing unit (GPU) (TensorFlow), and achieves the real-time goal.


2020 ◽  
Vol 22 (5) ◽  
pp. 1182-1197
Author(s):  
Geovanny Gordillo ◽  
Mario Morales-Hernández ◽  
I. Echeverribar ◽  
Javier Fernández-Pato ◽  
Pilar García-Navarro

Abstract In this study, a 2D shallow water flow solver integrated with a water quality model is presented. The interaction between the main water quality constituents included is based on the Water Quality Analysis Simulation Program. Efficiency is achieved by computing with a combination of a Central Processing Unit (CPU) and a Graphics Processing Unit (GPU) device. This technique is intended to provide robust and accurate simulations with high computation speedups with respect to a single-core CPU in real events. The proposed numerical model is evaluated in cases that include the transport and reaction of water quality components over irregular bed topography and dry–wet fronts, verifying that the numerical solution in these situations conserves the required properties (C-property and positivity). The model can operate in any steady or unsteady form allowing an efficient assessment of the environmental impact of water flows. The field data from an unsteady river reach test case are used to show that the model is capable of predicting the measured temporal distribution of dissolved oxygen and water temperature, proving the robustness and computational efficiency of the model, even in the presence of noisy signals such as wind speed.


2016 ◽  
Vol 6 (1) ◽  
pp. 79-90
Author(s):  
Łukasz Syrocki ◽  
Grzegorz Pestka

Abstract A ready-to-use set of functions is provided to facilitate solving the generalized eigenvalue problem for symmetric matrices, in order to efficiently calculate eigenvalues and eigenvectors using the Compute Unified Device Architecture (CUDA) technology from NVIDIA. An integral part of CUDA is a high-level programming environment that enables tracking of code executed on both the Central Processing Unit and the Graphics Processing Unit. The presented matrix structures allow for an analysis of the advantages of using graphics processors in such calculations.
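The standard route for the symmetric generalized eigenvalue problem A x = λ B x (B symmetric positive definite) is the Cholesky reduction B = L Lᵀ, C = L⁻¹ A L⁻ᵀ, after which the symmetric matrix C has the same eigenvalues. A closed-form 2×2 sketch in plain Python follows; this illustrates the reduction only and is not the paper's CUDA routines.

```python
import math

def gen_eig_2x2(A, B):
    """Eigenvalues of the 2x2 symmetric generalized problem A x = lam B x,
    with B symmetric positive definite, via Cholesky reduction
    B = L L^T, C = L^{-1} A L^{-T}, then a closed-form 2x2 eigensolve."""
    # Cholesky factor L of B (lower triangular).
    l11 = math.sqrt(B[0][0])
    l21 = B[1][0] / l11
    l22 = math.sqrt(B[1][1] - l21 * l21)
    # W = L^{-1} A by forward substitution on each column of A.
    w11 = A[0][0] / l11
    w12 = A[0][1] / l11
    w21 = (A[1][0] - l21 * w11) / l22
    w22 = (A[1][1] - l21 * w12) / l22
    # C = W L^{-T}: solve C L^T = W column by column.
    c11 = w11 / l11
    c12 = (w12 - c11 * l21) / l22
    c21 = w21 / l11
    c22 = (w22 - c21 * l21) / l22
    # C is symmetric; eigenvalues follow from its trace and determinant.
    tr = c11 + c22
    det = c11 * c22 - c12 * c21
    disc = math.sqrt(tr * tr - 4.0 * det)
    return [(tr - disc) / 2.0, (tr + disc) / 2.0]
```

For instance, A = [[2, 1], [1, 2]] with B = 2I gives eigenvalues 0.5 and 1.5, half those of A alone, as expected. For n×n problems the same reduction is followed by an iterative symmetric eigensolver, which is where GPU parallelism pays off.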

