Numerical Uncertainty in Parallel Processing Using Computational Fluid Dynamics as Example

2021 ◽  
Vol 8 (2) ◽  
pp. 169-180
Author(s):  
Mark Lin ◽  
Periklis Papadopoulos

Computational methods such as Computational Fluid Dynamics (CFD) traditionally yield a single output – a single number that is much like the result one would get from a theoretical hand calculation. However, this paper shows that computational methods have inherent uncertainty, which can also be reported statistically. In numerical computation, because many factors affect the data collected, the data can be quoted as a mean value with standard deviations (error bars) to make data comparison meaningful. In cases where two data sets are obscured by uncertainty, the two data sets are said to be indistinguishable. A sample CFD problem pertaining to external aerodynamics is copied and run on 29 identical computers in a university computer lab. The expectation is that all 29 runs should return exactly the same result; unfortunately, in a few cases the result turns out to be different. This is attributed to the parallelization scheme, which partitions the mesh to run in parallel on multiple cores of the computer. The distribution of the computational load is hardware-driven, depending on the resources available on each computer at the time. Details such as load balancing among multiple Central Processing Unit (CPU) cores using the Message Passing Interface (MPI) are transparent to the user. Software packages such as METIS or JOSTLE are used to automatically divide the load between the processors. As such, the user has no control over the outcome of the CFD calculation even when the same problem is computed. Because of this, numerical uncertainty arises from parallel (multicore) computing. One way to resolve this issue is to compute problems on a single core, without mesh repartitioning. However, as this paper demonstrates, even this is not straightforward. Keywords: numerical uncertainty, parallelization, load-balancing, automotive aerodynamics
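
As a minimal illustration of the mechanism behind such run-to-run differences (this sketch is not from the paper; data and sizes are arbitrary): floating-point addition is not associative, so a different mesh partitioning changes the order in which per-cell contributions are summed and can change the final digits of a residual or force coefficient. The C++ sketch below sums the same numbers sequentially and in two "partitions" and prints the small discrepancy.

```cpp
#include <cstdio>
#include <random>
#include <vector>

int main() {
    // Arbitrary data standing in for per-cell contributions to a global residual.
    std::mt19937 gen(42);
    std::uniform_real_distribution<double> dist(1e-8, 1.0);
    std::vector<double> cell(1'000'000);
    for (double& c : cell) c = dist(gen);

    // Order 1: straight sequential sum (as a single-core run would do).
    double forward = 0.0;
    for (double c : cell) forward += c;

    // Order 2: sum two "partitions" separately, then combine them
    // (as an MPI reduction over two ranks might do).
    double part0 = 0.0, part1 = 0.0;
    for (std::size_t i = 0; i < cell.size() / 2; ++i) part0 += cell[i];
    for (std::size_t i = cell.size() / 2; i < cell.size(); ++i) part1 += cell[i];
    double partitioned = part0 + part1;

    // The two totals typically differ in the last bits even though
    // the input data are identical.
    std::printf("sequential  : %.17g\n", forward);
    std::printf("partitioned : %.17g\n", partitioned);
    std::printf("difference  : %.3g\n", forward - partitioned);
    return 0;
}
```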

2006 ◽  
Vol 129 (2) ◽  
pp. 221-231 ◽  
Author(s):  
André Burdet ◽  
Reza S. Abhari ◽  
Martin G. Rose

Computational fluid dynamics (CFD) has recently been used for the simulation of the aerothermodynamics of film cooling. The direct calculation of a single cooling hole requires substantial computational resources. A parametric study, for the optimization of the cooling system in real engines, is much too time consuming due to the large number of grid nodes required to cover all injection holes and plenum chambers. For these reasons, a hybrid approach is proposed, based on the modeling of the near film-cooling hole flow, tuned using experimental data, while computing directly the flow field in the blade-to-blade passage. A new injection film-cooling model is established, which can be embedded in a CFD code, to lower the central processing unit (CPU) cost and to reduce the simulation turnover time. The goal is to be able to simulate film-cooled turbine blades without having to explicitly mesh inside the holes and the plenum chamber. The stability, low CPU overhead level (1%) and accuracy of the proposed CFD-embedded film-cooling model are demonstrated in the ETHZ steady film-cooled flat-plate experiment presented in Part I (Bernsdorf, Rose, and Abhari, 2006, ASME J. Turbomach., 128, pp. 141–149) of this two-part paper. The prediction of film-cooling effectiveness using the CFD-embedded model is evaluated.


Author(s):  
M Franchetta ◽  
K O Suen ◽  
T G Bancroft

Underbonnet simulations are proving to be crucially important within a vehicle development programme, reducing test work and time-to-market. While computational fluid dynamics (CFD) simulations of steady forced flows have been demonstrated to be reliable, studies of transient convective flows in engine compartments are not yet carried out owing to high computing demands and a lack of validated work. The present work assesses the practical feasibility of applying the CFD tool at the initial stage of a vehicle development programme for investigating the thermally driven flow in an engine bay under thermal soak. A computation procedure that enables pseudo time-marching CFD simulations to be performed with significantly reduced central processing unit (CPU) time usage is proposed. The methodology was initially tested on simple geometries and then implemented for investigating a simplified half-scale underbonnet compartment. The numerical results are compared with experimental data taken with thermocouples and with particle image velocimetry (PIV). The novel computation methodology is successful in efficiently providing detailed, time-accurate, time-dependent thermal and flow predictions. Its application will extend the use of the CFD tool for transient investigations, enabling improvements to the component packaging of engine bays and the refinement of thermal management strategies with reduced need for in-territory testing.
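
The abstract does not detail the proposed procedure; as a rough, hypothetical illustration of pseudo time-marching in general (not the authors' method; grid size, diffusivity, and time steps are made up), the C++ sketch below advances a 1D heat-conduction problem with a large implicit physical time step, converging each step by explicit iterations in a fictitious pseudo-time.

```cpp
#include <algorithm>
#include <cmath>
#include <cstdio>
#include <vector>

int main() {
    const int n = 50;                   // grid cells (illustrative)
    const double alpha = 1.0e-5;        // thermal diffusivity [m^2/s]
    const double dx = 0.01;             // cell size [m]
    const double dt = 5.0;              // physical step, far above the explicit limit
    // Explicit pseudo-time step chosen for stability of the inner iteration.
    const double dtau = 1.0 / (2.0 * alpha / (dx * dx) + 1.0 / dt);

    std::vector<double> T(n, 300.0), Told(n, 300.0);
    T.front() = Told.front() = 400.0;   // fixed hot wall (e.g., exhaust side)

    for (int step = 0; step < 100; ++step) {        // physical time loop
        Told = T;
        for (int it = 0; it < 2000; ++it) {         // pseudo-time iterations
            std::vector<double> Tnew = T;
            double maxres = 0.0;
            for (int i = 1; i < n - 1; ++i) {
                // Residual of the backward-Euler heat equation at this cell.
                double res = alpha * (T[i - 1] - 2.0 * T[i] + T[i + 1]) / (dx * dx)
                           - (T[i] - Told[i]) / dt;
                Tnew[i] = T[i] + dtau * res;        // explicit pseudo-time update
                maxres = std::max(maxres, std::fabs(res));
            }
            T = Tnew;
            if (maxres < 1e-6) break;               // this physical step has converged
        }
    }
    std::printf("mid-plate temperature after 100 steps: %.2f K\n", T[n / 2]);
    return 0;
}
```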


2014 ◽  
Vol 2014 ◽  
pp. 1-8 ◽  
Author(s):  
Ronglin Jiang ◽  
Shugang Jiang ◽  
Yu Zhang ◽  
Ying Xu ◽  
Lei Xu ◽  
...  

This paper introduces a finite-difference time-domain (FDTD) code written in Fortran and CUDA for realistic electromagnetic calculations, with parallelization via the Message Passing Interface (MPI) and Open Multiprocessing (OpenMP). Since both Central Processing Unit (CPU) and Graphics Processing Unit (GPU) resources are utilized, a faster execution speed can be reached compared to a traditional pure-GPU code. In our experiments, 64 NVIDIA TESLA K20m GPUs and 64 INTEL XEON E5-2670 CPUs are used to carry out the pure-CPU, pure-GPU, and CPU + GPU tests. Relative to the pure-CPU calculations for the same problems, the speedup ratio achieved by the CPU + GPU calculations is around 14. Compared to the pure-GPU calculations for the same problems, the CPU + GPU calculations show a 7.6%–13.2% performance improvement. Because of the small memory size of GPUs, the FDTD problem size is usually very limited; however, this code can enlarge the maximum problem size by 25% without reducing the performance of a traditional pure-GPU code. Finally, using this code, a microstrip antenna array with 16 × 18 elements is calculated and the radiation patterns are compared with those obtained from the Method of Moments (MoM). The results show good agreement between the two.
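
The paper's Fortran/CUDA implementation is not reproduced in the abstract; the simplified C++ sketch below (1D Yee updates only, no GPU part, made-up sizes) illustrates the MPI domain decomposition plus OpenMP threading pattern that such a hybrid FDTD solver builds on: ghost-cell exchange between ranks, then thread-parallel field updates within each rank.

```cpp
// Simplified 1D FDTD with MPI domain decomposition and OpenMP threading.
// Illustrative only: free-space update in normalized units, crude hard source.
#include <mpi.h>
#include <cstdio>
#include <vector>

int main(int argc, char** argv) {
    MPI_Init(&argc, &argv);
    int rank, nranks;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &nranks);

    const int n_local = 1000;                     // cells owned by this rank
    std::vector<double> ez(n_local + 2, 0.0);     // +2 ghost cells
    std::vector<double> hy(n_local + 2, 0.0);

    for (int step = 0; step < 500; ++step) {
        // Exchange the Ez value needed by the neighbouring rank's Hy update.
        MPI_Request reqs[2];
        int nreq = 0;
        if (rank > 0)
            MPI_Isend(&ez[1], 1, MPI_DOUBLE, rank - 1, 0, MPI_COMM_WORLD, &reqs[nreq++]);
        if (rank < nranks - 1)
            MPI_Irecv(&ez[n_local + 1], 1, MPI_DOUBLE, rank + 1, 0, MPI_COMM_WORLD, &reqs[nreq++]);
        MPI_Waitall(nreq, reqs, MPI_STATUSES_IGNORE);

        // Hy update, threaded with OpenMP on the CPU cores of this rank.
        #pragma omp parallel for
        for (int i = 1; i <= n_local; ++i)
            hy[i] += 0.5 * (ez[i + 1] - ez[i]);

        // Exchange the Hy value needed by the neighbouring rank's Ez update.
        nreq = 0;
        if (rank < nranks - 1)
            MPI_Isend(&hy[n_local], 1, MPI_DOUBLE, rank + 1, 1, MPI_COMM_WORLD, &reqs[nreq++]);
        if (rank > 0)
            MPI_Irecv(&hy[0], 1, MPI_DOUBLE, rank - 1, 1, MPI_COMM_WORLD, &reqs[nreq++]);
        MPI_Waitall(nreq, reqs, MPI_STATUSES_IGNORE);

        // Ez update.
        #pragma omp parallel for
        for (int i = 1; i <= n_local; ++i)
            ez[i] += 0.5 * (hy[i] - hy[i - 1]);

        if (rank == 0 && step < 50) ez[1] = 1.0;  // crude hard source
    }

    if (rank == 0) std::printf("done, ez[1] = %g\n", ez[1]);
    MPI_Finalize();
    return 0;
}
```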


2015 ◽  
Vol 138 (1) ◽  
Author(s):  
Amit Amritkar ◽  
Danesh Tafti

Graphics processing unit (GPU) computation has seen extensive growth in recent years due to advancements in both the hardware and the software stack. This has led to an increase in the use of GPUs as accelerators across a broad spectrum of applications. This work deals with the use of general-purpose GPUs for performing computational fluid dynamics (CFD) computations. The paper discusses strategies and findings on porting a large multifunctional CFD code to the GPU architecture. Within this framework, the most compute-intensive segment of the software, the BiCGStab linear solver using additive Schwarz block preconditioners with point Jacobi iterative smoothing, is optimized for the GPU platform using various techniques in CUDA Fortran. Representative turbulent channel and pipe flows are investigated for validation and benchmarking purposes. Both single- and double-precision calculations are highlighted. For a modest single-block grid of 64 × 64 × 64, the turbulent channel flow computations showed a speedup of about eightfold in double precision and more than 13-fold in single precision on the NVIDIA Tesla GPU over a serial run on an Intel central processing unit (CPU). For the pipe flow, consisting of 1.78 × 10⁶ grid cells distributed over 36 mesh blocks, the gains were more modest at 4.5 and 6.5 for double and single precision, respectively.
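
The CUDA Fortran solver itself is not shown in the abstract; as a rough C++ sketch of the point Jacobi smoothing step named above (applied here to a hypothetical 1D Poisson-type system with made-up sizes, not the paper's preconditioner):

```cpp
#include <cstdio>
#include <vector>

// A few passes of point Jacobi smoothing on a 1D Poisson-type system
// A x = b with stencil (-1, 2, -1). Illustrative only.
void jacobi_smooth(std::vector<double>& x, const std::vector<double>& b, int sweeps) {
    const int n = static_cast<int>(x.size());
    std::vector<double> xnew(n);
    for (int s = 0; s < sweeps; ++s) {
        #pragma omp parallel for
        for (int i = 1; i < n - 1; ++i)
            xnew[i] = 0.5 * (x[i - 1] + x[i + 1] + b[i]);  // diagonal entry = 2
        xnew[0] = x[0];
        xnew[n - 1] = x[n - 1];       // keep boundary values fixed
        x.swap(xnew);
    }
}

int main() {
    std::vector<double> x(64, 0.0), b(64, 1.0);   // made-up sizes
    jacobi_smooth(x, b, 8);                        // a few smoothing sweeps
    std::printf("x[32] after smoothing: %f\n", x[32]);
    return 0;
}
```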


Author(s):  
Alan Gray ◽  
Kevin Stratford

Leading high-performance computing systems achieve their status through the use of highly parallel devices such as NVIDIA graphics processing units or Intel Xeon Phi many-core CPUs. The concept of performance portability across such architectures, as well as traditional CPUs, is vital for the application programmer. In this paper we describe targetDP, a lightweight abstraction layer which allows grid-based applications to target data-parallel hardware in a platform-agnostic manner. We demonstrate the effectiveness of our pragmatic approach by presenting performance results for a complex fluid application (with which the model was co-designed), plus a separate lattice quantum chromodynamics particle physics code. For each application, a single source code base is seen to achieve portable performance, as assessed within the context of the Roofline model. TargetDP can be combined with the Message Passing Interface (MPI) to allow use on systems containing multiple nodes: we demonstrate this through scaling results on traditional and graphics processing unit-accelerated large-scale supercomputers.
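
The actual targetDP macros live in the library's own headers and are not reproduced here; the sketch below is a hypothetical, much-reduced C++ illustration of the general idea: the kernel body is written once against a thin macro layer, which here expands to an OpenMP-threaded loop on the CPU and could equally be mapped to a CUDA launch by a different header.

```cpp
#include <cstdio>
#include <vector>

// Hypothetical, much-reduced stand-in for a targetDP-style abstraction:
// the kernel is written once against this macro; here it expands to an
// OpenMP loop, but an alternative header could map it to a CUDA launch.
#define TARGET_LOOP(index, extent) \
    _Pragma("omp parallel for")    \
    for (int index = 0; index < (extent); ++index)

// Example grid kernel: scale-and-add over all lattice sites.
void saxpy_sites(int nsites, double a, const double* x, double* y) {
    TARGET_LOOP(i, nsites) {
        y[i] = a * x[i] + y[i];
    }
}

int main() {
    const int nsites = 1 << 20;                    // made-up lattice size
    std::vector<double> x(nsites, 1.0), y(nsites, 2.0);
    saxpy_sites(nsites, 0.5, x.data(), y.data());
    std::printf("y[0] = %f\n", y[0]);              // expect 2.5
    return 0;
}
```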


Water ◽  
2021 ◽  
Vol 13 (21) ◽  
pp. 3122
Author(s):  
Leonardo Primavera ◽  
Emilia Florio

The possibility to create a flood wave in a river network depends on the geometric properties of the river basin. Among the models that try to forecast the Instantaneous Unit Hydrograph (IUH) of rainfall precipitation, the so-called Multifractal Instantaneous Unit Hydrograph (MIUH) by De Bartolo et al. (2003) rather successfully connects the multifractal properties of the river basin to the observed IUH. Such properties can be assessed through different types of analysis (fixed-size algorithm, correlation integral, fixed-mass algorithm, sandbox algorithm, and so on). The fixed-mass algorithm is the one that produces the most precise estimate of the properties of the multifractal spectrum that are relevant for the MIUH model. However, a disadvantage of this method is that it requires very long computational times to produce the best possible results. In a previous work, we proposed a parallel version of the fixed-mass algorithm, which drastically reduced the computational times almost proportionally to the number of Central Processing Unit (CPU) cores available on the computational machine by using the Message Passing Interface (MPI), which is a standard for distributed memory clusters. In the present work, we further improved the code in order to include the use of the Open Multi-Processing (OpenMP) paradigm to facilitate the execution and improve the computational speed-up on single processor, multi-core workstations, which are much more common than multi-node clusters. Moreover, the assessment of the multifractal spectrum has also been improved through a direct computation method. Currently, to the best of our knowledge, this code represents the state-of-the-art for a fast evaluation of the multifractal properties of a river basin, and it opens up a new scenario for an effective flood forecast in reasonable computational times.
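
The paper's Fortran/MPI/OpenMP code is not shown in the abstract; the loose C++ sketch below (random 2D points, made-up parameters, OpenMP only) illustrates why the fixed-mass algorithm parallelizes so well: the fixed-mass radius of each reference point can be computed independently, so the outer loop can be split across MPI ranks and/or OpenMP threads.

```cpp
#include <algorithm>
#include <cmath>
#include <cstdio>
#include <random>
#include <vector>

int main() {
    // Made-up 2D point set standing in for the river-network support.
    const int npts = 5000, kmass = 50;         // fixed "mass" = 50 neighbours
    std::mt19937 gen(1);
    std::uniform_real_distribution<double> u(0.0, 1.0);
    std::vector<double> x(npts), y(npts);
    for (int i = 0; i < npts; ++i) { x[i] = u(gen); y[i] = u(gen); }

    std::vector<double> radius(npts);          // fixed-mass radius per point

    // Each reference point is independent: this loop is what gets split
    // across MPI ranks and OpenMP threads in a hybrid implementation.
    #pragma omp parallel for
    for (int i = 0; i < npts; ++i) {
        std::vector<double> d2(npts);
        for (int j = 0; j < npts; ++j) {
            double dx = x[i] - x[j], dy = y[i] - y[j];
            d2[j] = dx * dx + dy * dy;
        }
        // Radius enclosing kmass neighbours of point i (k-th smallest distance,
        // not counting the point itself at distance zero).
        std::nth_element(d2.begin(), d2.begin() + kmass, d2.end());
        radius[i] = std::sqrt(d2[kmass]);
    }

    double mean = 0.0;
    for (double r : radius) mean += r / npts;
    std::printf("mean fixed-mass radius for k=%d: %f\n", kmass, mean);
    return 0;
}
```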


2010 ◽  
Vol 18 (3-4) ◽  
pp. 193-201 ◽  
Author(s):  
Dennis C. Jespersen

The Computational Fluid Dynamics code OVERFLOW includes as one of its solver options an algorithm which is a fairly small piece of code but which accounts for a significant portion of the total computational time. This paper studies some of the issues in accelerating this piece of code by using a Graphics Processing Unit (GPU). The algorithm needs to be modified to be suitable for a GPU and attention needs to be given to 64-bit and 32-bit arithmetic. Interestingly, the work done for the GPU produced ideas for accelerating the CPU code and led to significant speedup on the CPU.
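
The abstract does not spell out the 32-bit/64-bit issues it mentions; the minimal C++ example below (arbitrary numbers, unrelated to OVERFLOW) shows the kind of accumulation error that has to be watched when moving parts of a solver to single precision.

```cpp
#include <cstdio>

int main() {
    // Accumulating many small contributions, as a residual sum might:
    // in single precision the small terms are eventually absorbed by rounding.
    float  sum32 = 0.0f;
    double sum64 = 0.0;
    for (int i = 0; i < 10'000'000; ++i) {
        sum32 += 1.0e-4f;
        sum64 += 1.0e-4;
    }
    std::printf("float  accumulation: %.7f\n", sum32);   // drifts away from 1000
    std::printf("double accumulation: %.7f\n", sum64);   // close to 1000
    return 0;
}
```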


Author(s):  
Roberto Porcù ◽  
Edie Miglio ◽  
Nicola Parolini ◽  
Mattia Penati ◽  
Noemi Vergopolan

Helicopters can experience brownout when flying close to a dusty surface. The dust lifted into the air can severely restrict the pilot's visibility. Consequently, a brownout can disorient the pilot and lead to the helicopter colliding with the ground. Given its risks, brownout has become a high-priority problem for civil and military operations. Proper helicopter design is thus critical, as it has a strong influence on the shape and density of the cloud of dust that forms when brownout occurs. A way forward to improve aircraft design against brownout is the use of particle simulations. For simulations to be accurate and comparable to the real phenomenon, billions of particles are required. However, with such a large number of particles, serial simulations are too slow and computationally expensive to be performed. In this work, we investigate a Message Passing Interface (MPI) + multi-graphics processing unit (multi-GPU) approach to simulate brownout. Specifically, we use a semi-implicit Euler method to treat the particle dynamics in a Lagrangian way, and we adopt a precomputed aerodynamic field. We do not include particle–particle collisions in the model; this allows for independent trajectories and effective model parallelization. To support our methodology, we provide a speedup analysis of the parallelization relative to the serial and pure-MPI simulations. The results show (i) very high speedups of the MPI + multi-GPU implementation with respect to the serial and pure-MPI ones, (ii) excellent weak and strong scalability properties of the implemented time-integration algorithm, and (iii) the possibility to run realistic simulations of brownout with billions of particles at a relatively small computational cost. This work paves the way toward more realistic brownout simulations, and it highlights the potential of high-performance computing for aiding and advancing aircraft design for brownout mitigation.
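
The MPI + multi-GPU implementation is not reproduced here; the reduced C++ sketch below (OpenMP only, with a made-up drag model and flow field) shows a semi-implicit Euler update of independent particle trajectories in a precomputed velocity field, the same independence that makes the problem parallelize so effectively.

```cpp
#include <cmath>
#include <cstdio>
#include <vector>

struct Vec3 { double x, y, z; };

// Stand-in for the precomputed aerodynamic field (a simple swirl here; the
// real field would be interpolated from a stored rotor-wake solution).
Vec3 air_velocity(const Vec3& p) {
    return { -p.y, p.x, 0.2 * std::exp(-p.z) };
}

int main() {
    const int nparticles = 100000;        // made-up; real runs use billions
    const double dt = 1e-3, tau = 0.05;   // time step and drag response time
    const int nsteps = 1000;

    std::vector<Vec3> pos(nparticles), vel(nparticles, Vec3{0.0, 0.0, 0.0});
    for (int i = 0; i < nparticles; ++i)
        pos[i] = { 0.001 * i, 0.0, 0.1 };   // crude initial dust layer

    for (int s = 0; s < nsteps; ++s) {
        // Particles do not interact, so the loop parallelizes trivially
        // (across OpenMP threads here; across MPI ranks and GPUs in the paper).
        #pragma omp parallel for
        for (int i = 0; i < nparticles; ++i) {
            Vec3 u = air_velocity(pos[i]);
            // Semi-implicit Euler: update velocity first (linear drag plus
            // gravity), then move the particle with the *new* velocity.
            vel[i].x += dt * ((u.x - vel[i].x) / tau);
            vel[i].y += dt * ((u.y - vel[i].y) / tau);
            vel[i].z += dt * ((u.z - vel[i].z) / tau - 9.81);
            pos[i].x += dt * vel[i].x;
            pos[i].y += dt * vel[i].y;
            pos[i].z += dt * vel[i].z;
        }
    }
    std::printf("particle 0 final height: %f m\n", pos[0].z);
    return 0;
}
```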

