Reverse-mode algorithmic differentiation of an OpenMP-parallel compressible flow solver

Author(s):  
Jan Hückelheim ◽  
Paul Hovland ◽  
Michelle Mills Strout ◽  
Jens-Dominik Müller

Reverse-mode algorithmic differentiation (AD) is an established method for obtaining adjoint derivatives of computer simulation applications. In computational fluid dynamics (CFD), adjoint derivatives of a cost function output such as drag or lift with respect to design parameters such as surface coordinates or geometry control points are a key ingredient for shape optimization, uncertainty quantification and flow control. The computational cost of CFD applications and their derivatives makes it essential to use high-performance computing hardware efficiently, including multi- and many-core architectures. Nevertheless, OpenMP is not supported in most AD tools, and previously shown methods achieve poor scalability of the derivative code. We present the AD of an OpenMP-parallelized finite volume compressible flow solver for unstructured meshes. Our approach enables us to reuse the parallelization of the original code in the computation of adjoint derivatives. The method works by identifying code segments that can be differentiated in reverse mode without changing their memory access pattern. The OpenMP parallelization is integrated into the derivative code during the build process in a way that is robust to modifications of the original code and independent of the OpenMP support of the differentiation tool. We show the scalability of our adjoint CFD solver on test cases ranging from thousands to millions of finite volume mesh cells on CPUs with up to 16 threads as well as on an Intel Xeon Phi card with 236 threads. We demonstrate that our approach is more practical to implement for production-sized CFD codes and produces more efficient adjoint derivative code than previously shown AD methods.
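
As a rough illustration of what "differentiated in reverse mode without changing the memory access pattern" can mean, the sketch below uses a hypothetical edge-based loop, not the authors' solver. The function names `residual` and `residual_adjoint`, the coefficient `k`, and the toy mesh are all assumptions made for this example. Each primal iteration touches only the two end nodes `i` and `j` of an edge, and so does its hand-written adjoint, so any scheme that makes the primal loop safe to run in parallel (for example an edge coloring) would plausibly carry over to the adjoint loop.

```python
import numpy as np

def residual(edges, k, u):
    """Primal edge loop: each edge reads u[i], u[j] and updates r[i], r[j]."""
    r = np.zeros_like(u)
    for i, j in edges:
        f = k * (u[j] - u[i])   # hypothetical linear "flux" across the edge
        r[i] += f
        r[j] -= f
    return r

def residual_adjoint(edges, k, rb):
    """Reverse-mode adjoint: each edge reads rb[i], rb[j] and updates ub[i], ub[j].

    The set of indices touched per edge is the same as in the primal loop.
    """
    ub = np.zeros_like(rb)
    for i, j in edges:
        fb = rb[i] - rb[j]      # adjoint of r[i] += f and r[j] -= f
        ub[j] += k * fb         # adjoint of f = k * (u[j] - u[i])
        ub[i] -= k * fb
    return ub

# Dot-product test on a small made-up mesh: <A u, rb> should equal <u, A^T rb>.
rng = np.random.default_rng(0)
edges = [(0, 1), (1, 2), (2, 3), (0, 3)]
u = rng.standard_normal(4)
rb = rng.standard_normal(4)
lhs = residual(edges, 0.5, u) @ rb
rhs = u @ residual_adjoint(edges, 0.5, rb)
```

If the adjoint is consistent with the primal, `lhs` and `rhs` agree to rounding error. The loops here are serial Python; the point is only the per-edge index pattern, not the OpenMP integration described in the abstract.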

Fluids ◽  
2021 ◽  
Vol 6 (4) ◽  
pp. 149
Author(s):  
Andrea Chierici ◽  
Leonardo Chirco ◽  
Sandro Manservisi

Fluid-structure interaction (FSI) problems are of great interest, due to their applicability in science and engineering. However, the coupling between large fluid domains and small moving solid walls presents numerous numerical difficulties. In some configurations, where the thickness of the solid wall can be neglected, one can consider membrane models, which are derived from the Koiter shell equations and reduce the computational cost of the algorithm. With this assumption, the FSI simulation is reduced to the fluid equations on a moving mesh together with a Robin boundary condition that is imposed on the moving solid surface. In this manuscript, we study inverse FSI problems that aim to achieve an objective by changing some design parameters, such as forces, boundary conditions, or geometrical domain shapes. We study the inverse FSI membrane model by using an optimal control approach that is based on Lagrange multipliers and adjoint variables. In particular, we propose a pressure boundary optimal control that aims to control the solid deformation by changing the pressure on a fluid boundary. We report the results of some numerical tests for two-dimensional domains to demonstrate the feasibility and robustness of our method.


Author(s):  
Umar Ibrahim Minhas ◽  
Roger Woods ◽  
Georgios Karakonstantis

Whilst FPGAs have been used in cloud ecosystems, it is still extremely challenging to achieve high compute density when mapping heterogeneous multi-tasks on shared resources at runtime. This work addresses this by treating the FPGA resource as a service and employing multi-task processing at the high level, design space exploration and static off-line partitioning in order to allow more efficient mapping of heterogeneous tasks onto the FPGA. In addition, a new, comprehensive runtime functional simulator is used to evaluate the effect of various spatial and temporal constraints on both the existing and new approaches when varying system design parameters. A comprehensive suite of real high performance computing tasks was implemented on a Nallatech 385 FPGA card, and the results show that our approach can provide on average 2.9× and 2.3× higher system throughput for compute and mixed intensity tasks, while 0.2× lower for memory intensive tasks due to external memory access latency and bandwidth limitations. The work has been extended by introducing a novel scheduling scheme to enhance temporal utilization of resources when using the proposed approach. Additional results for large queues of mixed intensity tasks (compute and memory) show that the proposed partitioning and scheduling approach can provide more than 3× system speedup over previous schemes.


2019 ◽  
Vol 2019 (1) ◽  
Author(s):  
A. Khalid ◽  
M. N. Naeem ◽  
P. Agarwal ◽  
A. Ghaffar ◽  
Z. Ullah ◽  
...  

In this paper, the authors propose a computational model based on the cubic B-spline method to solve linear 6th-order BVPs arising in astrophysics. The method transforms the boundary value problem into a system of linear equations. The algorithm developed in this paper not only approximates the solution of the 6th-order BVPs using cubic B-splines, but also estimates the 1st- to 6th-order derivatives of the analytic solution at the same time. This novel technique has a lower computational cost than numerous other techniques and is second-order convergent. To show the efficiency of the proposed method, four numerical examples have been tested. The results are described using error tables and graphs and are compared with results existing in the literature.


1984 ◽  
Vol 30 (6) ◽  
pp. 730-738 ◽  
Author(s):  
D. Gauvreau ◽  
M. J. Waring

New antibiotics produced by Streptomyces echinatus A8331 cultured in the presence of heterocyclic aromatic acids can be separated and purified by high-performance liquid chromatography using reversed phase columns. Natural quinoxaline antibiotics and certain quinoline derivatives can also be efficiently separated in normal phase systems. Details of purification procedures are described together with experiments to characterise the new antibiotics by field desorption mass spectrometry and proton magnetic resonance. Mono- and bis-substituted derivatives of echinomycin containing the following replacement chromophores have been isolated: 7-chloroquinoxaline-2-carbonyl, thieno[3,2-b]pyridine-5-carbonyl, and 6-methylquinoline-2-carbonyl. With a 6-methylquinoline-2-carboxylic acid precursor the analogues containing one or two replacement chromophores are each separable into two distinct components. One of the bis-substituted 6-methylquinoline products appears inactive in an antibacterial assay and behaves as a triostin analogue, presumably an immediate precursor of the corresponding echinomycin derivative.


Author(s):  
Xiaohan Tao ◽  
Jianmin Pang ◽  
Jinlong Xu ◽  
Yu Zhu

The heterogeneous many-core architecture plays an important role in the fields of high-performance computing and scientific computing. It uses accelerator cores with on-chip memories to improve performance and reduce energy consumption. Scratchpad memory (SPM) is a kind of fast on-chip memory with lower energy consumption compared with a hardware cache. However, data transfer between SPM and off-chip memory can be managed only by a programmer or compiler. In this paper, we propose a compiler-directed multithreaded SPM data transfer model (MSDTM) to optimize the process of data transfer in a heterogeneous many-core architecture. We use compile-time analysis to classify data accesses, check dependences and determine the allocation of data transfer operations. We further present the data transfer performance model to derive the optimal granularity of data transfer and select the most profitable data transfer strategy. We implement the proposed MSDTM on the GCC compiler and evaluate it on Sunway TaihuLight with selected test cases from benchmarks and scientific computing applications. The experimental results show that the proposed MSDTM improves the application execution time by 5.49× and achieves an energy saving of 5.16× on average.

