A Deflated Assembly Free Approach to Large-Scale Implicit Structural Dynamics

Author(s):  
Amir M. Mirzendehdel ◽  
Krishnan Suresh

The primary computational bottleneck in implicit structural dynamics is the repeated inversion of the underlying stiffness matrix. In this paper, a fast inversion technique is proposed by merging four distinct but complementary concepts: (1) voxelization with adaptive local refinement, (2) assembly-free (a.k.a. matrix-free or element-by-element) finite element analysis (FEA), (3) assembly-free deflated conjugate gradient (AF-DCG), and (4) multicore parallelization. In particular, we apply these concepts to the well-known Newmark-beta method, and the resulting AF-DCG implementation is well suited for large-scale problems. It can be easily ported to multicore central processing unit (CPU) and many-core graphics processing unit (GPU) architectures, as demonstrated through numerical experiments.
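
To make the role of the matrix-free solver concrete, here is a minimal sketch (not the authors' implementation) of one undamped Newmark-beta step in which the effective stiffness is applied through callbacks: `apply_K` and `apply_M` are hypothetical element-by-element product routines, and a plain conjugate gradient stands in for the AF-DCG solver described in the paper.

```python
import numpy as np
from scipy.sparse.linalg import LinearOperator, cg

def newmark_step(apply_K, apply_M, u, v, a, f_next, dt, beta=0.25, gamma=0.5):
    """One undamped Newmark-beta step with a matrix-free effective stiffness.

    apply_K(x), apply_M(x): callbacks returning K @ x and M @ x without
    ever assembling K or M (element-by-element in a real assembly-free code).
    """
    n = u.size
    c = 1.0 / (beta * dt**2)

    # Effective operator K_eff = M/(beta*dt^2) + K, applied matrix-free.
    Keff = LinearOperator((n, n), matvec=lambda x: c * apply_M(x) + apply_K(x))

    # Newmark predictor terms folded into the right-hand side.
    rhs = f_next + apply_M(c * u + v / (beta * dt) + (0.5 / beta - 1.0) * a)

    # Plain CG here; the paper replaces this with an assembly-free deflated CG.
    u_next, _ = cg(Keff, rhs, x0=u)

    a_next = c * (u_next - u) - v / (beta * dt) - (0.5 / beta - 1.0) * a
    v_next = v + dt * ((1.0 - gamma) * a + gamma * a_next)
    return u_next, v_next, a_next
```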

Author(s):  
Praveen Yadav ◽  
Krishnan Suresh

Large-scale finite element analysis (FEA) with millions of degrees of freedom (DOF) is becoming commonplace in solid mechanics. The primary computational bottleneck in such problems is the solution of large linear systems of equations. In this paper, we propose an assembly-free version of the deflated conjugate gradient (DCG) method for solving such equations, where neither the stiffness matrix nor the deflation matrix is assembled. While assembly-free FEA is a well-known concept, the novelty pursued in this paper is the use of assembly-free deflation. The resulting implementation is particularly well suited for large-scale problems and can be easily ported to multicore central processing unit (CPU) and graphics processing unit (GPU) architectures. For demonstration, we show that one can solve a 50 × 10⁶ degree-of-freedom system on a single GPU card equipped with 3 GB of memory. The second contribution is an extension of the “rigid-body agglomeration” concept used in DCG to a “curvature-sensitive agglomeration.” The latter exploits classic plate and beam theories for efficient deflation of highly ill-conditioned problems arising from thin structures.
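
As a rough sketch of how deflation combines with a matrix-free operator, the routine below follows the standard deflated-CG formulation (not necessarily the paper's exact algorithm): `apply_A` is a hypothetical assembly-free matrix-vector product, and the columns of `W` could hold, for example, rigid-body modes of agglomerated element groups; only the small coarse matrix WᵀAW is ever formed explicitly.

```python
import numpy as np

def deflated_cg(apply_A, b, W, tol=1e-8, max_iter=1000):
    """Deflated CG with a matrix-free symmetric positive-definite operator.

    apply_A(x): callback returning A @ x without assembling A.
    W: (n, m) deflation basis, e.g. rigid-body modes of element groups.
    """
    AW = np.column_stack([apply_A(W[:, j]) for j in range(W.shape[1])])
    E = W.T @ AW                            # small m x m coarse matrix
    coarse = lambda y: np.linalg.solve(E, y)

    x = W @ coarse(W.T @ b)                 # starting guess with W^T r = 0
    r = b - apply_A(x)
    p = r - W @ coarse(AW.T @ r)            # (AW)^T r == W^T A r for symmetric A
    rho = r @ r
    for _ in range(max_iter):
        Ap = apply_A(p)
        alpha = rho / (p @ Ap)
        x = x + alpha * p
        r = r - alpha * Ap
        rho_new = r @ r
        if np.sqrt(rho_new) <= tol * np.linalg.norm(b):
            break
        p = r + (rho_new / rho) * p - W @ coarse(AW.T @ r)
        rho = rho_new
    return x
```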


2018 ◽  
Vol 7 (12) ◽  
pp. 472 ◽  
Author(s):  
Bo Wan ◽  
Lin Yang ◽  
Shunping Zhou ◽  
Run Wang ◽  
Dezhi Wang ◽  
...  

The road-network matching method is an effective tool for map integration, fusion, and update. Due to the complexity of road networks in the real world, matching methods often contain a series of complicated processes to identify homonymous roads and deal with their intricate relationships. However, traditional road-network matching algorithms, which are mainly central processing unit (CPU)-based approaches, can become a performance bottleneck when facing big data. We developed a particle-swarm optimization (PSO)-based parallel road-network matching method on the graphics processing unit (GPU). Based on the characteristics of the two main stages (similarity computation and matching-relationship identification), data-partition and task-partition strategies were utilized, respectively, to fully use GPU threads. Experiments were conducted on datasets with 14 different scales. Results indicate that the parallel PSO-based matching algorithm (PSOM) could correctly identify most matching relationships with an average accuracy of 84.44%, comparable to that of the benchmark probability-relaxation-matching (PRM) method. The PSOM approach significantly reduced the road-network matching time when dealing with large amounts of data in comparison with the PRM method. This paper provides a common parallel algorithm framework for road-network matching algorithms and contributes to the integration and updating of large-scale road networks.
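
The similarity-computation stage that the paper offloads to GPU threads can be pictured with a small CPU-side sketch; the score below, which blends midpoint distance, length ratio, and orientation, is an illustrative stand-in rather than the paper's actual similarity measure, and all weights are made up.

```python
import numpy as np

def segment_similarity(a_start, a_end, b_start, b_end,
                       w_dist=0.5, w_len=0.25, w_ang=0.25, d_max=50.0):
    """Illustrative similarity score in [0, 1] for two road segments.

    Each segment is given by its start/end coordinates (metres). The
    weights and the distance cutoff d_max are made-up illustrative values.
    """
    a_start, a_end = np.asarray(a_start, float), np.asarray(a_end, float)
    b_start, b_end = np.asarray(b_start, float), np.asarray(b_end, float)

    # Distance similarity: based on midpoint separation.
    d = np.linalg.norm((a_start + a_end) / 2 - (b_start + b_end) / 2)
    s_dist = max(0.0, 1.0 - d / d_max)

    # Length similarity: ratio of shorter to longer length.
    la, lb = np.linalg.norm(a_end - a_start), np.linalg.norm(b_end - b_start)
    s_len = min(la, lb) / max(la, lb) if max(la, lb) > 0 else 0.0

    # Orientation similarity: cosine of the angle between direction vectors.
    va, vb = a_end - a_start, b_end - b_start
    s_ang = abs(va @ vb) / (la * lb) if la > 0 and lb > 0 else 0.0

    return w_dist * s_dist + w_len * s_len + w_ang * s_ang

# Example: two nearly parallel segments a few metres apart.
print(segment_similarity((0, 0), (100, 0), (2, 5), (98, 6)))
```

In the data-partition spirit described above, each GPU thread would evaluate one such candidate pair.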


Author(s):  
Zhengkai Wu ◽  
Thomas M. Tucker ◽  
Chandra Nath ◽  
Thomas R. Kurfess ◽  
Richard W. Vuduc

In this paper, software model visualization with path simulation and the associated machined product are both produced using step-ring-based 3-axis path planning, to demonstrate model-driven graphics processing unit (GPU) features in tool-path planning and 3D image model classification via GPU simulation. Subtractive 3D printing (i.e., 3D machining) is presented as an integration of 3D printing modeling and CNC machining through GPU-simulated software. Path planning is applied through high-resolution visualization of material surface removal and 3D path simulation, using ring-selective path planning based on path accessibility through pattern selection. First, the step ring selects critical features to reconstruct the computer-aided design (CAD) model as STL (stereolithography) voxels; local optimization is then performed within the ring area of interest, saving time and energy in GPU volume generation compared with global, fully automatic path planning and its longer latency. The reconstructed CAD model originates from a sample (GATech buzz) with 2D image information. The CAD model for optimization and validation is adopted to sustain manufacturing reproduction based on system simulation feedback. To avoid collision between the produced path and the retraction path, adaptive ring path generation and prediction are applied in each planning iteration, which may also minimize material removal. Moreover, partition analysis and G-code optimization are performed for large-scale models and high-density volume data. Image classification and grid analysis based on adaptive 3D tree depth are proposed for multi-level set partitioning of the model to define no-cutting zones. An accessibility map is then computed, based on the accessibility space over the rotational angular space of path orientations, to compare step-ring-based path planning against global all-path planning. Feature analysis on the central processing unit (CPU) or GPU for GPU map computation points toward future high-performance computing and cloud computing through parallel-computing applications of subtractive 3D printing.
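
The material-removal visualization step can be pictured with a deliberately simple voxel sketch (nothing like the GPU implementation described above): voxels within the tool radius of each path point are cleared from a solid grid; the grid size, tool radius, and path used here are purely illustrative.

```python
import numpy as np

def remove_material(solid, path, tool_radius, voxel_size=1.0):
    """Clear voxels within tool_radius of each (x, y, z) path point.

    solid: 3D boolean array (True = material present).
    path: iterable of tool-centre coordinates in the same units as voxel_size.
    """
    nx, ny, nz = solid.shape
    X, Y, Z = np.meshgrid(np.arange(nx), np.arange(ny), np.arange(nz),
                          indexing="ij")
    centers = np.stack([X, Y, Z], axis=-1) * voxel_size
    for p in path:
        d2 = np.sum((centers - np.asarray(p, float)) ** 2, axis=-1)
        solid[d2 <= tool_radius**2] = False
    return solid

# Tiny example: a 20^3 block with one straight pass near the top surface.
block = np.ones((20, 20, 20), dtype=bool)
pass_points = [(x, 10.0, 18.0) for x in range(0, 20, 2)]
block = remove_material(block, pass_points, tool_radius=3.0)
print(block.sum(), "voxels of material remain")
```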


1998 ◽  
Vol 33 (1) ◽  
pp. 55-65 ◽  
Author(s):  
J Lin ◽  
F P E Dunne ◽  
D R Hayhurst

An approximate method has been presented for the design analysis of engineering components subjected to combined cyclic thermal and mechanical loading. The method is based on the discretization of components using multibar modelling, which enables the effects of stress redistribution to be included as creep and cyclic plasticity damage evolves. Cycle jumping methods have also been presented which extend previous methods to handle problems in which incremental plastic straining (ratchetting) occurs. Cycle jumping leads to considerable reductions in central processing unit (CPU) resources, and this has been shown for a range of loading conditions. The cycle jumping technique has been utilized to analyse the ratchetting behaviour of a multibar structure selected to model geometrical and thermomechanical effects typically encountered in practical design situations. The method has been used to predict the behaviour of a component when subjected to cyclic thermal loading, and the results compared with those obtained from detailed finite element analysis. The method is also used to analyse the same component when subjected to constant mechanical loading, in addition to cyclic thermal loading leading to ratchetting. The important features of the two analyses are then compared. In this way, the multibar modelling is shown to enable the computationally efficient analysis of engineering components.
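
The cycle-jumping idea can be illustrated with a deliberately simplified damage-accumulation loop (not the multibar constitutive model of the paper): a short block of cycles is integrated explicitly, and the last per-cycle damage increment is then extrapolated over a block of skipped cycles; the damage-rate law below is a made-up placeholder.

```python
def simulate_with_cycle_jumping(damage_rate, n_cycles, explicit_block=5, jump=50):
    """Accumulate damage over n_cycles using a simple cycle-jumping scheme.

    damage_rate(D, cycle): per-cycle damage increment (placeholder law).
    explicit_block: cycles integrated explicitly before each jump.
    jump: number of cycles skipped by extrapolating the last increment.
    """
    D, cycle = 0.0, 0
    while cycle < n_cycles and D < 1.0:
        # Integrate a short block of cycles explicitly.
        dD = 0.0
        for _ in range(min(explicit_block, n_cycles - cycle)):
            dD = damage_rate(D, cycle)
            D += dD
            cycle += 1
        # Jump ahead, extrapolating the last per-cycle increment.
        n_jump = min(jump, n_cycles - cycle)
        D += dD * n_jump
        cycle += n_jump
    return D, cycle

# Placeholder damage law: growth accelerates as damage accumulates.
law = lambda D, cycle: 1e-4 * (1.0 + 5.0 * D)
print(simulate_with_cycle_jumping(law, n_cycles=20000))
```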


2019 ◽  
Vol 9 (16) ◽  
pp. 3305 ◽  
Author(s):  
Claudio Zanzi ◽  
Pablo Gómez ◽  
Joaquín López ◽  
Julio Hernández

A question that often arises is whether a specialized code and a more general-purpose code are equally suitable for fire modeling. This paper investigates the performance and capabilities of a specialized code (FDS) and a general-purpose code (FLUENT) in simulating a fire in the commercial area of an underground intermodal transportation station. In order to facilitate a more precise comparison between the two codes, especially with regard to ventilation issues, the number of factors that may affect the fire evolution is reduced by simplifying the scenario and the fire model. The codes are applied to the same fire scenario using a simplified fire model, which represents the fire source as a source of mass, heat, and species, and whose results are also compared with those obtained using FDS and a combustion model. An oscillating behavior of the fire-induced convective heat and mass fluxes through the natural vents is predicted, whose frequency compares well with experimental results for the ranges of compartment heights and heat release rates considered. The results obtained with the two codes for the smoke and heat propagation patterns and the convective fluxes through the forced and natural ventilation systems are discussed and compared with each other. The agreement is very good for the temperature and species concentration distributions and the overall flow pattern, whereas appreciable discrepancies are found only in the oscillatory behavior of the fire-induced convective heat and mass fluxes through the natural vents. The relative performance of the codes in terms of central processing unit (CPU) time consumption is also discussed.


Author(s):  
Timothy Dykes ◽  
Claudio Gheller ◽  
Marzia Rivi ◽  
Mel Krokos

With the increasing size and complexity of data produced by large-scale numerical simulations, it is of primary importance for scientists to be able to exploit all available hardware in heterogeneous high-performance computing environments for increased throughput and efficiency. We focus on the porting and optimization of Splotch, a scalable visualization algorithm, to utilize the Xeon Phi, Intel's coprocessor based upon the Many Integrated Core (MIC) architecture. We discuss the steps taken to offload data to the coprocessor and the algorithmic modifications made to aid faster processing on the many-core architecture and to exploit the device's uniquely wide vector capabilities, with accompanying performance results using multiple Xeon Phi coprocessors. Finally, we compare performance against results achieved with the graphics processing unit (GPU)-based implementation of Splotch.


2020 ◽  
Vol 22 (5) ◽  
pp. 1217-1235 ◽  
Author(s):  
M. Morales-Hernández ◽  
M. B. Sharif ◽  
S. Gangrade ◽  
T. T. Dullo ◽  
S.-C. Kao ◽  
...  

This work presents a vision of future water resources hydrodynamics codes that can fully utilize the strengths of modern high-performance computing (HPC). Advances in computing power, formerly driven by improvements in central processing unit (CPU) performance, now come primarily from parallel computing and, in particular, the use of graphics processing units (GPUs). However, this shift to a parallel framework requires refactoring the code to make efficient use of the data, and even changing the nature of the algorithm that solves the system of equations. These concepts, along with other features such as the precision of the computations, dry-region management, and input/output of data, are analyzed in this paper. A 2D multi-GPU flood code applied to a large-scale test case is used to corroborate our statements and to ascertain the new challenges for the next generation of parallel water resources codes.


SIMULATION ◽  
2019 ◽  
Vol 96 (3) ◽  
pp. 347-361
Author(s):  
Wenjie Tang ◽  
Wentong Cai ◽  
Yiping Yao ◽  
Xiao Song ◽  
Feng Zhu

In the past few years, the graphics processing unit (GPU) has been widely used to accelerate time-consuming models in simulations. Since both model computation and simulation management are main factors affecting the performance of large-scale simulations, accelerating only model computation limits the potential speedup. Moreover, the number of models that a GPU can effectively accelerate may be insufficient, especially for simulations with many lightweight models. Traditionally, the parallel discrete event simulation (PDES) method is used for this class of simulation, but most PDES simulators utilize only the central processing unit (CPU), even though GPUs are now commonly available. Hence, we propose an alternative approach for collaborative simulation execution on a CPU+GPU hybrid system, in which the GPU supports both simulation management and model computation, as the CPU does. A concurrency-oriented scheduling algorithm is proposed to enable cooperation between the CPU and the GPU, so that multiple computation and communication resources can be efficiently utilized. In addition, the GPU functions have been carefully designed to support the algorithm. The combination of these efforts allows the proposed approach to achieve significant speedup compared to traditional PDES on a CPU.
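
A very rough event-loop sketch illustrates the dispatch decision at the heart of such a hybrid approach; it is only a schematic, not the paper's concurrency-oriented scheduler, and the `gpu_threshold` rule and workload numbers are invented for illustration.

```python
import heapq

def simulate(events, horizon, gpu_threshold=1000):
    """Minimal sequential event loop with a CPU/GPU dispatch decision.

    events: list of (timestamp, model_name, workload) tuples.
    Models whose workload exceeds gpu_threshold are routed to a stand-in
    GPU worker; everything else stays on the CPU. Real PDES engines and
    the paper's scheduling algorithm are far more involved.
    """
    queue = list(events)
    heapq.heapify(queue)
    log = []
    while queue and queue[0][0] <= horizon:
        t, model, workload = heapq.heappop(queue)
        target = "GPU" if workload > gpu_threshold else "CPU"
        log.append((t, model, target))
        # A real model would schedule follow-on events here, e.g.:
        # heapq.heappush(queue, (t + delay, next_model, next_workload))
    return log

print(simulate([(0.0, "radar", 50_000), (1.0, "logger", 10)], horizon=10.0))
```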


Author(s):  
Po Ting Lin ◽  
Yu-Cheng Chou ◽  
Yung Ting ◽  
Shian-Shing Shyu ◽  
Chang-Kuo Chen

This paper presents a robust reliability analysis method for systems of multimodular redundant (MMR) controllers using the method of partitioning and parallel processing of a Markov chain (PPMC). A Markov chain is formulated to represent the N distinct states of the MMR controllers. Such a Markov chain has N² directed edges, and each edge corresponds to a transition probability between a pair of start and end states. Because N can easily become very large, the system reliability analysis may require substantial computational resources, such as central processing unit (CPU) usage and memory occupation. With the PPMC, the Markov chain's transition probability matrix can be partitioned and reordered such that the system reliability can be evaluated through only the diagonal submatrices of the transition probability matrix. In addition, calculations involving the submatrices are independent of each other and can thus be conducted in parallel to ensure efficiency. The simulation results show that, compared with the sequential method applied to an intact Markov chain, the proposed PPMC improves performance and produces acceptable accuracy for reliability analysis of large-scale systems of MMR controllers.
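
The underlying Markov reliability computation can be sketched with a toy example; this sketch does not reproduce the PPMC partitioning, reordering, or parallel evaluation of the diagonal submatrices, it only shows the quantity being computed: the probability of remaining in an operational state after a number of transitions.

```python
import numpy as np

def reliability(P, p0, operational, n_steps):
    """Probability that the system is in an operational state after each step.

    P: (N, N) row-stochastic transition probability matrix.
    p0: initial state distribution.
    operational: boolean mask of operational (non-failed) states.
    """
    p = np.asarray(p0, float)
    out = []
    for _ in range(n_steps):
        p = p @ P
        out.append(p[operational].sum())
    return out

# Toy 3-state example: fully working, degraded, failed (absorbing).
P = np.array([[0.990, 0.009, 0.001],
              [0.000, 0.980, 0.020],
              [0.000, 0.000, 1.000]])
p0 = [1.0, 0.0, 0.0]
print(reliability(P, p0, operational=np.array([True, True, False]), n_steps=3))
```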


Author(s):  
Athanasios P. Iliopoulos ◽  
John G. Michopoulos ◽  
Samuel G. Lambrakos ◽  
Noam Bernstein

The recent growth of general-purpose graphics processing unit (GPGPU) technologies, as well as the ongoing need to link usability performance with structural materials processing and design across many length and time scales, has motivated the present work. The inverse problem of determining the Lennard-Jones potential governing the fracture dynamics of atoms comprising a sheet of metal under tension is used to examine the feasibility of efficiently utilizing GPGPU architectures. The implementation of this inverse problem within a molecular dynamics framework verifies the ability of the methodology to deliver the intended results. Subsequently, a sensitivity analysis is performed on GPGPU-enabled hardware to examine the effect of problem size on the efficiency of various combinations of GPGPU and central processing unit (CPU) cores. Speedup factors are determined relative to a single core of a quad-core CPU.
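
The forward ingredient of this inverse problem is the Lennard-Jones pair potential; the sketch below evaluates it and recovers (epsilon, sigma) from a handful of synthetic reference energies by least squares, as a serial stand-in for the GPGPU-accelerated molecular-dynamics search described above. The reference data are synthetic and purely illustrative.

```python
import numpy as np
from scipy.optimize import curve_fit

def lennard_jones(r, epsilon, sigma):
    """Lennard-Jones pair potential V(r) = 4*eps*[(sigma/r)^12 - (sigma/r)^6]."""
    sr6 = (sigma / r) ** 6
    return 4.0 * epsilon * (sr6**2 - sr6)

# Synthetic "reference" energies generated from known parameters plus noise;
# the inverse problem is to recover epsilon and sigma from such data.
rng = np.random.default_rng(0)
r_data = np.linspace(0.95, 2.5, 30)
v_data = lennard_jones(r_data, epsilon=0.8, sigma=1.0) + 0.01 * rng.standard_normal(30)

(eps_fit, sigma_fit), _ = curve_fit(lennard_jones, r_data, v_data, p0=(1.0, 1.1))
print(f"fitted epsilon = {eps_fit:.3f}, sigma = {sigma_fit:.3f}")
```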

