An alternative approach for collaborative simulation execution on a CPU+GPU hybrid system

SIMULATION ◽  
2019 ◽  
Vol 96 (3) ◽  
pp. 347-361
Author(s):  
Wenjie Tang ◽  
Wentong Cai ◽  
Yiping Yao ◽  
Xiao Song ◽  
Feng Zhu

In the past few years, the graphics processing unit (GPU) has been widely used to accelerate time-consuming models in simulations. Since both model computation and simulation management are main factors that affect the performance of large-scale simulations, only accelerating model computation will limit the potential speedup. Moreover, models that can be well accelerated by a GPU could be insufficient, especially for simulations with many lightweight models. Traditionally, the parallel discrete event simulation (PDES) method is used to solve this class of simulation, but most PDES simulators only utilize the central processing unit (CPU) even though the GPU is commonly available now. Hence, we propose an alternative approach for collaborative simulation execution on a CPU+GPU hybrid system. The GPU supports both simulation management and model computation as CPUs. A concurrency-oriented scheduling algorithm was proposed to enable cooperation between the CPU and the GPU, so that multiple computation and communication resources can be efficiently utilized. In addition, GPU functions have also been carefully designed to adapt the algorithm. The combination of those efforts allows the proposed approach to achieve significant speedup compared to the traditional PDES on a CPU.

2018 ◽  
Vol 7 (12) ◽  
pp. 472 ◽  
Author(s):  
Bo Wan ◽  
Lin Yang ◽  
Shunping Zhou ◽  
Run Wang ◽  
Dezhi Wang ◽  
...  

The road-network matching method is an effective tool for map integration, fusion, and update. Due to the complexity of road networks in the real world, matching methods often contain a series of complicated processes to identify homonymous roads and deal with their intricate relationship. However, traditional road-network matching algorithms, which are mainly central processing unit (CPU)-based approaches, may have performance bottleneck problems when facing big data. We developed a particle-swarm optimization (PSO)-based parallel road-network matching method on graphics-processing unit (GPU). Based on the characteristics of the two main stages (similarity computation and matching-relationship identification), data-partition and task-partition strategies were utilized, respectively, to fully use GPU threads. Experiments were conducted on datasets with 14 different scales. Results indicate that the parallel PSO-based matching algorithm (PSOM) could correctly identify most matching relationships with an average accuracy of 84.44%, which was at the same level as the accuracy of a benchmark—the probability-relaxation-matching (PRM) method. The PSOM approach significantly reduced the road-network matching time in dealing with large amounts of data in comparison with the PRM method. This paper provides a common parallel algorithm framework for road-network matching algorithms and contributes to integration and update of large-scale road-networks.


Author(s):  
Zhengkai Wu ◽  
Thomas M. Tucker ◽  
Chandra Nath ◽  
Thomas R. Kurfess ◽  
Richard W. Vuduc

In this paper, both software model visualization with path simulation and associated machining product are produced based on the step ring based 3-axis path planning to demo model-driven graphics processing unit (GPU) feature in tool path planning and 3D image model classification by GPU simulation. Subtractive 3D printing (i.e., 3D machining) is represented as integration between 3D printing modeling and CNC machining via GPU simulated software. Path planning is applied through material surface removal visualization in high resolution and 3D path simulation via ring selective path planning based on accessibility of path through pattern selection. First, the step ring selects critical features to reconstruct computer aided design (CAD) design model as STL (stereolithography) voxel, and then local optimization is attained within interested ring area for time and energy saving of GPU volume generation as compared to global all automatic path planning with longer latency. The reconstructed CAD model comes from an original sample (GATech buzz) with 2D image information. CAD model for optimization and validation is adopted to sustain manufacturing reproduction based on system simulation feedback. To avoid collision with the produced path from retraction path, we pick adaptive ring path generation and prediction in each planning iteration, which may also minimize material removal. Moreover, we did partition analysis and g-code optimization for large scale model and high density volume data. Image classification and grid analysis based on adaptive 3D tree depth are proposed for multi-level set partition of the model to define no cutting zones. After that, accessibility map is computed based on accessibility space for rotational angular space of path orientation to compare step ring based pass planning verses global all path planning. Feature analysis via central processing unit (CPU) or GPU processor for GPU map computation contributes to high performance computing and cloud computing potential through parallel computing application of subtractive 3D printing in the future.


2012 ◽  
Vol 1 (4) ◽  
pp. 88-131 ◽  
Author(s):  
Hamza Gharsellaoui ◽  
Mohamed Khalgui ◽  
Samir Ben Ahmed

Scheduling tasks is an essential requirement in most real-time and embedded systems, but leads to unwanted central processing unit (CPU) overheads. The authors present a real-time schedulability algorithm for preemptable, asynchronous and periodic reconfigurable task systems with arbitrary relative deadlines, scheduled on a uniprocessor by an optimal scheduling algorithm based on the earliest deadline first (EDF) principles and on the dynamic reconfiguration. A reconfiguration scenario is assumed to be a dynamic automatic operation allowing addition, removal or update of operating system’s (OS) functional asynchronous tasks. When such a scenario is applied to save the system at the occurrence of hardware-software faults, or to improve its performance, some real-time properties can be violated. The authors propose an intelligent agent-based architecture where a software agent is used to satisfy the user requirements and to respect time constraints. The agent dynamically provides precious technical solutions for users when these constraints are not verified, by removing tasks according to predefined heuristic, or by modifying the worst case execution times (WCETs), periods, and deadlines of tasks in order to meet deadlines and to minimize their response time. They implement the agent to support these services which are applied to a Blackberry Bold 9700 and to a Volvo system and present and discuss the results of experiments.


2019 ◽  
Vol 9 (16) ◽  
pp. 3305 ◽  
Author(s):  
Claudio Zanzi ◽  
Pablo Gómez ◽  
Joaquín López ◽  
Julio Hernández

One question that often arises is whether a specialized code or a more general code may be equally suitable for fire modeling. This paper investigates the performance and capabilities of a specialized code (FDS) and a general-purpose code (FLUENT) to simulate a fire in the commercial area of an underground intermodal transportation station. In order to facilitate a more precise comparison between the two codes, especially with regard to ventilation issues, the number of factors that may affect the fire evolution is reduced by simplifying the scenario and the fire model. The codes are applied to the same fire scenario using a simplified fire model, which considers a source of mass, heat and species to characterize the fire focus, and whose results are also compared with those obtained using FDS and a combustion model. An oscillating behavior of the fire-induced convective heat and mass fluxes through the natural vents is predicted, whose frequency compares well with experimental results for the ranges of compartment heights and heat release rates considered. The results obtained with the two codes for the smoke and heat propagation patterns and convective fluxes through the forced and natural ventilation systems are discussed and compared to each other. The agreement is very good for the temperature and species concentration distributions and the overall flow pattern, whereas appreciable discrepancies are only found in the oscillatory behavior of the fire-induced convective heat and mass fluxes through the natural vents. The relative performance of the codes in terms of central processing unit (CPU) time consumption is also discussed.


2018 ◽  
Vol 28 (4) ◽  
pp. 1-25 ◽  
Author(s):  
Noah Wolfe ◽  
Misbah Mubarak ◽  
Christopher D. Carothers ◽  
Robert B. Ross ◽  
Philip H. Carns

2019 ◽  
Vol 31 (3) ◽  
pp. 67-82
Author(s):  
Yu Huang ◽  
Wanxing Sheng ◽  
Peipei Jin ◽  
Baicuan Nie ◽  
Meikang Qiu ◽  
...  

Discrete event simulation is the most important and essential part in network simulation. The node-oriented model of discrete event scheduling is a model that allocates computing resources as nodes and makes the discrete event simulation as a simulation task on nodes. In this article the reason of low performance in large-scale network simulation is analyzed, and an ideal node-oriented model of discrete event scheduling is presented and a resource-limited node-oriented model of discrete event scheduling by adding some restrictions on network resources is proposed. Then, the authors complete contrast experiments of the resource-limited node-oriented model of discrete event scheduling and NS2. Finally, packet loss in resource-limited node-oriented model of discrete event scheduling is examined. Also, NS2 is discussed in this article and the authors have proposed an improved method for the packet loss algorithm in a resource-limited node-oriented model of discrete event scheduling.


2020 ◽  
Vol 22 (5) ◽  
pp. 1217-1235 ◽  
Author(s):  
M. Morales-Hernández ◽  
M. B. Sharif ◽  
S. Gangrade ◽  
T. T. Dullo ◽  
S.-C. Kao ◽  
...  

Abstract This work presents a vision of future water resources hydrodynamics codes that can fully utilize the strengths of modern high-performance computing (HPC). The advances to computing power, formerly driven by the improvement of central processing unit processors, now focus on parallel computing and, in particular, the use of graphics processing units (GPUs). However, this shift to a parallel framework requires refactoring the code to make efficient use of the data as well as changing even the nature of the algorithm that solves the system of equations. These concepts along with other features such as the precision for the computations, dry regions management, and input/output data are analyzed in this paper. A 2D multi-GPU flood code applied to a large-scale test case is used to corroborate our statements and ascertain the new challenges for the next-generation parallel water resources codes.


Sign in / Sign up

Export Citation Format

Share Document