GPU Collision Detection Using Spatial Subdivision With Applications in Contact Dynamics

Author(s):  
Hammad Mazhar

This work concentrates on the issue of rigid body collision detection, a critical component of any software package employed to approximate the dynamics of multibody systems with frictional contact. This paper presents a scalable collision detection algorithm designed for massively parallel computing architectures. The proposed approach is implemented on a ubiquitous Graphics Processing Unit (GPU) card and shown to achieve a 40x speedup over state-of-the-art Central Processing Unit (CPU) implementations when handling multi-million-object collision detection. GPUs are composed of many (on the order of hundreds of) scalar processors that can simultaneously execute an operation; the proposed algorithm leverages this strength. The approach can detect collisions between five million objects in less than two seconds; with newer GPUs, the capability of detecting collisions between eighty million objects in less than thirty seconds is expected. The proposed methodology is expected to have an impact on a wide range of granular flow dynamics and smoothed particle hydrodynamics applications, e.g., sand, gravel, and fluid simulations, where the number of contacts can reach into the hundreds of millions.
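The spatial-subdivision idea behind such a broad phase can be illustrated with a short sequential sketch. This is not the paper's GPU implementation: the uniform grid, equal-radius spheres, function names, and cell size below are simplifying assumptions for illustration only.

```python
from collections import defaultdict
from itertools import combinations

def broad_phase(centers, radius, cell_size):
    """Uniform-grid broad phase: hash each sphere into every cell its
    bounding box overlaps, then run the exact narrow-phase test only on
    pairs that share at least one cell."""
    grid = defaultdict(list)
    for i, (x, y, z) in enumerate(centers):
        # Range of cells covered by the sphere's axis-aligned bounding box.
        lo = [int((c - radius) // cell_size) for c in (x, y, z)]
        hi = [int((c + radius) // cell_size) for c in (x, y, z)]
        for cx in range(lo[0], hi[0] + 1):
            for cy in range(lo[1], hi[1] + 1):
                for cz in range(lo[2], hi[2] + 1):
                    grid[(cx, cy, cz)].append(i)
    # Narrow phase: exact sphere-sphere test on candidate pairs.
    contacts = set()
    for cell in grid.values():
        for i, j in combinations(cell, 2):
            d2 = sum((a - b) ** 2 for a, b in zip(centers[i], centers[j]))
            if d2 <= (2 * radius) ** 2:
                contacts.add((min(i, j), max(i, j)))
    return contacts
```

On a GPU, the per-cell pair tests are what parallelize across the scalar processors; this sequential version only shows how subdivision prunes the O(n²) pair space.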

Author(s):  
Arman Pazouki ◽  
Hammad Mazhar ◽  
Dan Negrut

This work concentrates on the contact detection of ellipsoids, an enhancement to collision detection that can be used to study the dynamics of multibody systems with frictional contact. A first method for contact detection is posed as an unconstrained optimization problem. This method, while computationally demanding, can find the contact parameters as well as determine the contact state of two ellipsoids. Next, a method is presented that is approximately two orders of magnitude more efficient at finding the contact state of two ellipsoids. However, it cannot find the contact parameters, such as the contact normal, depth of penetration, etc. Finally, a parallel algorithm for the ellipsoid contact detection problem is presented. The algorithm is implemented on a ubiquitous Graphics Processing Unit (GPU) card and shown to achieve a speedup of up to 70× over a Central Processing Unit (CPU) based method. The proposed methodology is expected to have an impact on granular flow dynamics applications.
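One way to pose proximity as an unconstrained problem, in the spirit of the first method, is to parameterize a point on each (here axis-aligned) ellipsoid surface by two spherical angles and minimize the squared point-to-point distance over the four angles. The parameterization, the grid-seeded numerical gradient descent, and all names below are illustrative assumptions, not the authors' formulation.

```python
import itertools
import math

def surface_point(center, semi, ang):
    """Point on an axis-aligned ellipsoid given spherical angles (theta, phi)."""
    t, p = ang
    return (center[0] + semi[0] * math.cos(t) * math.cos(p),
            center[1] + semi[1] * math.cos(t) * math.sin(p),
            center[2] + semi[2] * math.sin(t))

def min_surface_distance(c1, s1, c2, s2, steps=1500, lr=0.01):
    """Minimize squared distance between the two surfaces over four angles.
    A positive minimum indicates separation of the two ellipsoids."""
    def f(a):
        p = surface_point(c1, s1, a[:2])
        q = surface_point(c2, s2, a[2:])
        return sum((u - v) ** 2 for u, v in zip(p, q))
    # Coarse 8^4 grid search for a starting point (avoids saddle points),
    # then plain gradient descent with central-difference gradients.
    grid = [i * math.pi / 4 for i in range(8)]
    ang = list(min(itertools.product(grid, repeat=4), key=f))
    h = 1e-6
    for _ in range(steps):
        grad = []
        for k in range(4):
            ap, am = list(ang), list(ang)
            ap[k] += h
            am[k] -= h
            grad.append((f(ap) - f(am)) / (2 * h))
        ang = [a - lr * g for a, g in zip(ang, grad)]
    return math.sqrt(f(ang))
```

For two unit spheres (a degenerate ellipsoid case) with centers three units apart, the routine recovers the expected surface-to-surface gap of 1.0. It reports distance only, not the contact normal or penetration depth the paper's full method provides.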


A graphics processing unit (GPU) is a programmable chip that performs rapid mathematical operations and accelerates them through massive parallelism. In the early days, the central processing unit (CPU) was responsible for all computations, regardless of whether they were amenable to parallel execution. In recent years, however, GPUs have increasingly been used for massively parallel computing applications, such as training deep neural networks. GPU performance monitoring plays a key role in this new era, since GPUs are indispensable for increasing the speed of analysis of the developed system. GPU administration comes into the picture when multiple workloads must share the same hardware efficiently. In this study, various GPU parameters are monitored to keep them at safe levels and to maintain the performance of the system.


Author(s):  
Wisoot Sanhan ◽  
Kambiz Vafai ◽  
Niti Kammuang-Lue ◽  
Pradit Terdtoon ◽  
Phrut Sakulchangsatjatai

Abstract An investigation of the effect of flattening on the thermal performance of a heat pipe with double heat sources, acting as the central processing unit and graphics processing unit in laptop computers, is presented in this work. A finite element method is used for predicting the effect of flattening the heat pipe. The cylindrical heat pipe, with a diameter of 6 mm and a total length of 200 mm, is flattened into three final thicknesses of 2, 3, and 4 mm. The heat pipe is placed in a horizontal configuration and heated by heaters 1 and 2 with a combined power of 40 W. The numerical model shows good agreement with the experimental data, with a standard deviation of 1.85%. The results also show that flattening the cylindrical heat pipe to 66.7% and 41.7% of its original diameter could reduce its normalized thermal resistance by 5.2%. The optimized final thickness, or best-design final thickness, for the heat pipe is found to be 2.5 mm.


2018 ◽  
Vol 7 (12) ◽  
pp. 472 ◽  
Author(s):  
Bo Wan ◽  
Lin Yang ◽  
Shunping Zhou ◽  
Run Wang ◽  
Dezhi Wang ◽  
...  

The road-network matching method is an effective tool for map integration, fusion, and update. Due to the complexity of road networks in the real world, matching methods often involve a series of complicated processes to identify homonymous roads and deal with their intricate relationships. However, traditional road-network matching algorithms, which are mainly central processing unit (CPU)-based approaches, may hit performance bottlenecks when facing big data. We developed a particle-swarm optimization (PSO)-based parallel road-network matching method on the graphics processing unit (GPU). Based on the characteristics of the two main stages (similarity computation and matching-relationship identification), data-partition and task-partition strategies were utilized, respectively, to fully use the GPU threads. Experiments were conducted on datasets at 14 different scales. Results indicate that the parallel PSO-based matching algorithm (PSOM) could correctly identify most matching relationships with an average accuracy of 84.44%, on par with the accuracy of a benchmark, the probability-relaxation-matching (PRM) method. The PSOM approach significantly reduced the road-network matching time when dealing with large amounts of data in comparison with the PRM method. This paper provides a common parallel algorithm framework for road-network matching algorithms and contributes to the integration and updating of large-scale road networks.
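The PSO core that such a matcher parallelizes can be sketched generically: each particle tracks its personal best position, and the swarm shares a global best that attracts all particles. This is a minimal, sequential sketch with illustrative default parameters, not the paper's GPU-parallel matching objective.

```python
import random

def pso_minimize(f, dim, bounds, n_particles=30, iters=200,
                 w=0.7, c1=1.5, c2=1.5, seed=0):
    """Minimal particle-swarm optimization of f over a box [lo, hi]^dim."""
    rng = random.Random(seed)
    lo, hi = bounds
    pos = [[rng.uniform(lo, hi) for _ in range(dim)] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]              # each particle's best position
    pbest_val = [f(p) for p in pos]
    g = min(range(n_particles), key=lambda i: pbest_val[i])
    gbest, gbest_val = pbest[g][:], pbest_val[g]   # swarm-wide best
    for _ in range(iters):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                # Inertia + cognitive pull (own best) + social pull (global best).
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])
                             + c2 * r2 * (gbest[d] - pos[i][d]))
                pos[i][d] = min(hi, max(lo, pos[i][d] + vel[i][d]))
            val = f(pos[i])
            if val < pbest_val[i]:
                pbest[i], pbest_val[i] = pos[i][:], val
                if val < gbest_val:
                    gbest, gbest_val = pos[i][:], val
    return gbest, gbest_val
```

The inner per-particle loop is embarrassingly parallel, which is what makes a GPU data-partition strategy over particles natural.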


Sensors ◽  
2021 ◽  
Vol 21 (13) ◽  
pp. 4582
Author(s):  
Changjie Cai ◽  
Tomoki Nishimura ◽  
Jooyeon Hwang ◽  
Xiao-Ming Hu ◽  
Akio Kuroda

Fluorescent probes can be used to detect various types of asbestos (serpentine and amphibole groups); however, fiber counting using our previously developed software was not accurate for samples with low fiber concentrations. Machine learning-based techniques (e.g., deep learning) for image analysis, particularly Convolutional Neural Networks (CNN), have been widely applied to many areas. The objectives of this study were to (1) create a database of fluorescence microscopy (FM) images covering a wide range of asbestos concentrations (0–50 fibers/liter) in the laboratory; and (2) determine the applicability of the state-of-the-art object detection CNN model, YOLOv4, for accurately detecting asbestos. We captured fluorescence microscopy images containing asbestos and labeled the individual fibers in the images. We trained the YOLOv4 model with the labeled images using one GTX 1660 Ti Graphics Processing Unit (GPU). Our results demonstrated the exceptional capacity of the YOLOv4 model to learn fluorescent asbestos morphologies. The mean average precision at a threshold of 0.5 (mAP@0.5) was 96.1% ± 0.4%, using the National Institute for Occupational Safety and Health (NIOSH) fiber counting Method 7400 as a reference method. Compared to our previous counting software (Intec/HU), YOLOv4 achieved higher accuracy (0.997 vs. 0.979) and, in particular, much higher precision (0.898 vs. 0.418), recall (0.898 vs. 0.780), and F-1 score (0.898 vs. 0.544). In addition, YOLOv4 performed much better on low-fiber-concentration samples (<15 fibers/liter) than Intec/HU. Therefore, the FM method coupled with YOLOv4 is highly effective at detecting asbestos fibers and differentiating them from other, non-asbestos particles.
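The precision, recall, F-1, and accuracy figures reported follow the standard count-based definitions. The sketch below shows the arithmetic with hypothetical detection counts (chosen only to reproduce a precision and recall of 0.898; they are not the study's actual counts). Note that when precision equals recall, the F-1 score equals both, which is why all three figures coincide at 0.898.

```python
def detection_metrics(tp, fp, fn, tn=0):
    """Precision, recall, F-1, and accuracy from detection counts:
    tp = true positives, fp = false positives, fn = false negatives,
    tn = true negatives (often 0 or undefined for object detection)."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    return precision, recall, f1, accuracy
```

For example, 898 true positives with 102 false positives and 102 false negatives yields precision = recall = F-1 = 0.898.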


2018 ◽  
Vol 14 (4) ◽  
Author(s):  
G.B. Praveen ◽  
Anita Agrawal ◽  
Shrey Pareek ◽  
Amalin Prince

Abstract Magnetic resonance imaging (MRI) is a widely used imaging modality for evaluating brain disorders. MRI generates huge volumes of data, consisting of sequences of scans taken at different instances in time. As the presence of brain disorders has to be evaluated on all magnetic resonance (MR) sequences, manual brain disorder detection becomes a tedious process and is prone to inter- and intra-rater errors. A technique for detecting abnormalities in brain MRI using template matching is proposed. Bias field correction is performed on volumetric scans using the N4ITK filter, followed by volumetric registration. Normalized cross-correlation template matching is used for image registration, taking into account rotation and scaling operations. A template of the abnormality is selected and matched against the volumetric scans; if found, the corresponding image is retrieved. Post-processing of the retrieved images is performed by a thresholding operation, and the coordinates and area of the abnormality are reported. The experiments are carried out on the glioma dataset from the Brain Tumor Segmentation Challenge 2013 database (BRATS 2013), which consists of MR scans of 30 real glioma patients and 50 simulated glioma patients. The NVIDIA Compute Unified Device Architecture framework is employed in this paper, and detection using the graphics processing unit is found to be almost four times faster than using only the central processing unit. The average Dice and Jaccard coefficients over a wide range of trials are found to be 0.91 and 0.83, respectively.
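Normalized cross-correlation template matching can be sketched in a few lines. This is a plain 2D CPU reference without the rotation/scaling handling or the CUDA acceleration described above; the function names and the zero-variance guard are illustrative choices.

```python
import math

def ncc(patch, template):
    """Zero-mean normalized cross-correlation of two equal-size 2D patches,
    in [-1, 1]; returns 0.0 for constant (zero-variance) patches."""
    n = len(patch) * len(patch[0])
    pm = sum(map(sum, patch)) / n
    tm = sum(map(sum, template)) / n
    num = dp = dt = 0.0
    for prow, trow in zip(patch, template):
        for p, t in zip(prow, trow):
            num += (p - pm) * (t - tm)
            dp += (p - pm) ** 2
            dt += (t - tm) ** 2
    if dp == 0.0 or dt == 0.0:
        return 0.0
    return num / math.sqrt(dp * dt)

def match_template(image, template):
    """Slide the template over every offset of the image; return the
    top-left (row, col) with the highest NCC score, and the score."""
    th, tw = len(template), len(template[0])
    best, best_pos = -2.0, (0, 0)
    for y in range(len(image) - th + 1):
        for x in range(len(image[0]) - tw + 1):
            patch = [row[x:x + tw] for row in image[y:y + th]]
            score = ncc(patch, template)
            if score > best:
                best, best_pos = score, (y, x)
    return best_pos, best
```

Because every offset is scored independently, this sliding-window loop maps directly onto GPU threads, which is the source of the roughly fourfold speedup noted above.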


Symmetry ◽  
2019 ◽  
Vol 11 (4) ◽  
pp. 585
Author(s):  
Yufei Wu ◽  
Xiaofei Ruan ◽  
Yu Zhang ◽  
Huang Zhou ◽  
Shengyu Du ◽  
...  

The high demand for computational resources severely hinders the deployment of deep learning applications on resource-limited devices. In this work, we investigate the under-studied but practically important problem of network efficiency and present a new, lightweight architecture for hand pose estimation. Our architecture is essentially a deeply-supervised pruned network in which less important layers and branches are removed to reach a real-time inference target on resource-constrained devices without much compromise in accuracy. We further perform deployment optimization to exploit the parallel execution capability of central processing units (CPUs). We conduct experiments on the NYU and ICVL datasets and develop a demo using a RealSense camera. Experimental results show that our lightweight network achieves an average running time of 32 ms (31.3 FPS, versus 22.7 FPS for the original) before deployment optimization. Meanwhile, the model has only about half the parameters of the original, with a mean joint error of 11.9 mm. After further optimization with OpenVINO, the optimized model runs at 56 FPS on CPUs, in contrast to 44 FPS on a graphics processing unit (GPU) (TensorFlow), and achieves the real-time goal.


2020 ◽  
Vol 22 (5) ◽  
pp. 1182-1197
Author(s):  
Geovanny Gordillo ◽  
Mario Morales-Hernández ◽  
I. Echeverribar ◽  
Javier Fernández-Pato ◽  
Pilar García-Navarro

Abstract In this study, a 2D shallow water flow solver integrated with a water quality model is presented. The interaction between the main water quality constituents included is based on the Water Quality Analysis Simulation Program. Efficiency is achieved by computing with a combination of a Central Processing Unit (CPU) and a Graphics Processing Unit (GPU) device. This technique is intended to provide robust and accurate simulations with high computation speedups with respect to a single-core CPU in real events. The proposed numerical model is evaluated in cases that include the transport and reaction of water quality components over irregular bed topography and dry–wet fronts, verifying that the numerical solution in these situations conserves the required properties (C-property and positivity). The model can operate in any steady or unsteady form allowing an efficient assessment of the environmental impact of water flows. The field data from an unsteady river reach test case are used to show that the model is capable of predicting the measured temporal distribution of dissolved oxygen and water temperature, proving the robustness and computational efficiency of the model, even in the presence of noisy signals such as wind speed.


2016 ◽  
Vol 6 (1) ◽  
pp. 79-90
Author(s):  
Łukasz Syrocki ◽  
Grzegorz Pestka

Abstract A ready-to-use set of functions is provided to facilitate solving the generalized eigenvalue problem for symmetric matrices, in order to efficiently calculate eigenvalues and eigenvectors using the Compute Unified Device Architecture (CUDA) technology from NVIDIA. An integral part of CUDA is a high-level programming environment that enables tracking of code executed on both the Central Processing Unit and the Graphics Processing Unit. The presented matrix structures allow for an analysis of the advantages of using graphics processors in such calculations.
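The standard route for the symmetric generalized eigenvalue problem A x = λ B x (B symmetric positive definite) is the Cholesky reduction B = L Lᵀ, C = L⁻¹ A L⁻ᵀ, after which the symmetric matrix C has the same eigenvalues. A closed-form 2×2 sketch in plain Python follows; this illustrates the reduction only and is not the paper's CUDA routines.

```python
import math

def gen_eig_2x2(A, B):
    """Eigenvalues of the 2x2 symmetric generalized problem A x = lam B x,
    with B symmetric positive definite, via Cholesky reduction
    B = L L^T, C = L^{-1} A L^{-T}, then a closed-form 2x2 eigensolve."""
    # Cholesky factor L of B (lower triangular).
    l11 = math.sqrt(B[0][0])
    l21 = B[1][0] / l11
    l22 = math.sqrt(B[1][1] - l21 * l21)
    # W = L^{-1} A by forward substitution on each column of A.
    w11 = A[0][0] / l11
    w12 = A[0][1] / l11
    w21 = (A[1][0] - l21 * w11) / l22
    w22 = (A[1][1] - l21 * w12) / l22
    # C = W L^{-T}: solve C L^T = W column by column.
    c11 = w11 / l11
    c12 = (w12 - c11 * l21) / l22
    c21 = w21 / l11
    c22 = (w22 - c21 * l21) / l22
    # C is symmetric; eigenvalues follow from its trace and determinant.
    tr = c11 + c22
    det = c11 * c22 - c12 * c21
    disc = math.sqrt(tr * tr - 4.0 * det)
    return [(tr - disc) / 2.0, (tr + disc) / 2.0]
```

For instance, A = [[2, 1], [1, 2]] with B = 2I gives eigenvalues 0.5 and 1.5, half those of A alone, as expected. For n×n problems the same reduction is followed by an iterative symmetric eigensolver, which is where GPU parallelism pays off.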

