Reverse-mode algorithmic differentiation of an OpenMP-parallel compressible flow solver

Reverse-mode algorithmic differentiation (AD) is an established method for obtaining adjoint derivatives of computer simulation applications. In computational fluid dynamics (CFD), adjoint derivatives of a cost function output such as drag or lift with respect to design parameters such as surface coordinates or geometry control points are a key ingredient for shape optimization, uncertainty quantification and flow control. The computational cost of CFD applications and their derivatives makes it essential to use high-performance computing hardware efficiently, including multi- and many-core architectures. Nevertheless, OpenMP is not supported in most AD tools, and previously shown methods achieve poor scalability of the derivative code. We present the AD of an OpenMP-parallelized finite volume compressible flow solver for unstructured meshes. Our approach enables us to reuse the parallelization of the original code in the computation of adjoint derivatives. The method works by identifying code segments that can be differentiated in reverse mode without changing their memory access pattern. The OpenMP parallelization is integrated into the derivative code during the build process in a way that is robust to modifications of the original code and independent of the OpenMP support of the differentiation tool. We show the scalability of our adjoint CFD solver on test cases ranging from thousands to millions of finite volume mesh cells on CPUs with up to 16 threads as well as on an Intel Xeon Phi card with 236 threads. We demonstrate that our approach is more practical to implement for production-sized CFD codes and produces more efficient adjoint derivative code than previously shown AD methods.
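The memory-access point in the abstract above can be illustrated with a minimal Python sketch. This is not the authors' solver or tooling; the function names (`primal_local`, `adjoint_local`, `primal_gather`, `adjoint_gather`) and the toy kernels are assumptions made purely for illustration. The first pair shows a loop whose adjoint touches memory the same way as the primal, so the original loop parallelization could plausibly be reused. The second pair shows a gather whose adjoint becomes a scatter-add, where several iterations may update the same entry and a naive parallel loop would race.

```python
import math

def primal_local(x):
    # Each output y[i] reads only x[i]: iterations are independent.
    return [math.sin(v) for v in x]

def adjoint_local(x, yb):
    # Reverse mode of primal_local: xb[i] = cos(x[i]) * yb[i].
    # Entry i is written using only data at index i, so the access
    # pattern matches the primal loop.
    return [math.cos(v) * b for v, b in zip(x, yb)]

def primal_gather(x, idx):
    # Each output y[i] reads x[idx[i]]; several i may share one index.
    # Shared reads are harmless in a parallel loop.
    return [x[j] * x[j] for j in idx]

def adjoint_gather(x, idx, yb):
    # Reverse mode of primal_gather: the shared read becomes a shared
    # increment (scatter-add). Run in parallel without atomics or
    # reordering, two iterations with the same idx[i] would conflict.
    xb = [0.0] * len(x)
    for i, j in enumerate(idx):
        xb[j] += 2.0 * x[j] * yb[i]
    return xb

if __name__ == "__main__":
    print(adjoint_local([0.0], [3.0]))
    print(adjoint_gather([1.0, 2.0], [0, 0, 1], [1.0, 1.0, 1.0]))
```

In the second case, index 0 is read twice in the primal, so its adjoint entry receives two contributions. That is the kind of change in access pattern the abstract refers to when it speaks of segments that can or cannot be differentiated without altering memory access.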

On evaluating higher-order derivatives of the QR decomposition of tall matrices with full column rank in forward and reverse mode algorithmic differentiation

Optimization Methods and Software ◽

10.1080/10556788.2011.610454 ◽

2012 ◽

Vol 27 (2) ◽

pp. 391-403 ◽

Cited By ~ 3

Author(s):

Sebastian F. Walter ◽

Lutz Lehmann ◽

René Lamour

Keyword(s):

QR Decomposition ◽

Higher Order ◽

Full Column Rank ◽

Algorithmic Differentiation ◽

Reverse Mode ◽

Derivatives Of

Simultaneous determination of eight derivatives of propranolol in cornea perfusate in vitro by high performance liquid chromatography

Chinese Journal of Chromatography ◽

10.3724/sp.j.1123.2012.06039 ◽

2013 ◽

Vol 30 (11) ◽

pp. 1183-1187

Author(s):

Haitao WU ◽

Chuanbing CHEN ◽

Ningsheng WANG ◽

Suiqing MI ◽

Nanying LIAO

Keyword(s):

High Performance Liquid Chromatography ◽

Liquid Chromatography ◽

Simultaneous Determination ◽

High Performance ◽

Derivatives Of

Effect of Design Parameters on the Blast Response of Ultra-High Performance Concrete Columns

10.21838/uhpc.2016.43 ◽

2016 ◽

Cited By ~ 2

Author(s):

Sarah De Carufel ◽

Frederic Dagenais ◽

Christian Melançon ◽

Hassan Aoude

Keyword(s):

High Performance ◽

High Performance Concrete ◽

Concrete Columns ◽

Design Parameters ◽

Ultra High Performance Concrete ◽

Blast Response

Optimal Pressure Boundary Control of Steady Multiscale Fluid-Structure Interaction Shell Model Derived from Koiter Equations

Fluids ◽

10.3390/fluids6040149 ◽

2021 ◽

Vol 6 (4) ◽

pp. 149

Author(s):

Andrea Chierici ◽

Leonardo Chirco ◽

Sandro Manservisi

Keyword(s):

Optimal Control ◽

Solid Wall ◽

Fluid Structure Interaction ◽

Computational Cost ◽

Robin Boundary Condition ◽

Design Parameters ◽

Fluid Structure ◽

Control Approach ◽

Structure Interaction ◽

Fluid Boundary

Fluid-structure interaction (FSI) problems are of great interest due to their applicability in science and engineering. However, the coupling between large fluid domains and small moving solid walls presents numerous numerical difficulties. In configurations where the thickness of the solid wall can be neglected, one can consider membrane models, which are derived from the Koiter shell equations and reduce the computational cost of the algorithm. With this assumption, the FSI simulation is reduced to the fluid equations on a moving mesh together with a Robin boundary condition that is imposed on the moving solid surface. In this manuscript, we study inverse FSI problems that aim to achieve an objective by changing some design parameters, such as forces, boundary conditions, or geometrical domain shapes. We study the inverse FSI membrane model by using an optimal control approach that is based on Lagrange multipliers and adjoint variables. In particular, we propose a pressure boundary optimal control with the aim of controlling the solid deformation by changing the pressure on a fluid boundary. We report the results of some numerical tests for two-dimensional domains to demonstrate the feasibility and robustness of our method.

Evaluation of Static Mapping for Dynamic Space-Shared Multi-task Processing on FPGAs

Journal of Signal Processing Systems ◽

10.1007/s11265-020-01633-z ◽

2021 ◽

Author(s):

Umar Ibrahim Minhas ◽

Roger Woods ◽

Georgios Karakonstantis

Keyword(s):

High Performance ◽

Design Space Exploration ◽

Design Space ◽

System Throughput ◽

Design Parameters ◽

Temporal Constraints ◽

Shared Resources ◽

Task Processing ◽

High Level ◽

Performance Computing

Whilst FPGAs have been used in cloud ecosystems, it is still extremely challenging to achieve high compute density when mapping heterogeneous multi-tasks on shared resources at runtime. This work addresses this by treating the FPGA resource as a service and employing multi-task processing at the high level, design space exploration and static off-line partitioning in order to allow more efficient mapping of heterogeneous tasks onto the FPGA. In addition, a new, comprehensive runtime functional simulator is used to evaluate the effect of various spatial and temporal constraints on both the existing and new approaches when varying system design parameters. A comprehensive suite of real high performance computing tasks was implemented on a Nallatech 385 FPGA card; the results show that our approach can provide on average 2.9× and 2.3× higher system throughput for compute and mixed intensity tasks, while 0.2× lower for memory intensive tasks due to external memory access latency and bandwidth limitations. The work has been extended by introducing a novel scheduling scheme to enhance temporal utilization of resources when using the proposed approach. Additional results for large queues of mixed intensity tasks (compute and memory) show that the proposed partitioning and scheduling approach can provide more than 3× system speedup over previous schemes.

Numerical approximation for the solution of linear sixth order boundary value problems by cubic B-spline

Advances in Difference Equations ◽

10.1186/s13662-019-2385-9 ◽

2019 ◽

Vol 2019 (1) ◽

Cited By ~ 5

Author(s):

A. Khalid ◽

M. N. Naeem ◽

P. Agarwal ◽

A. Ghaffar ◽

Z. Ullah ◽

...

Keyword(s):

Numerical Approximation ◽

Analytic Solution ◽

Linear Equations ◽

Computational Cost ◽

Spline Method ◽

System Of Linear Equations ◽

Approximation Solution ◽

B Spline ◽

Novel Technique ◽

Derivatives Of

In this paper, the authors propose a computational model based on the cubic B-spline method to solve linear 6th-order BVPs arising in astrophysics. The method transforms the boundary value problem into a system of linear equations. The algorithm developed in this paper not only approximates the solution of the 6th-order BVPs using cubic B-splines, but also estimates the 1st- to 6th-order derivatives of the analytic solution at the same time. This novel technique has a lower computational cost than many other techniques and is second-order convergent. To show the efficiency of the proposed method, four numerical examples have been tested. The results are presented using error tables and graphs and are compared with results from the literature.

The Plural Many‐core Architecture – High Performance at Low Power

Multi‐Processor System‐on‐Chip 1 ◽

10.1002/9781119818298.ch3 ◽

2021 ◽

pp. 53-68

Author(s):

Ran Ginosar

Keyword(s):

Low Power ◽

High Performance ◽

Many Core

9-Anthryldiazomethane derivatives of prostaglandins for high performance liquid chromatographic analysis

Journal of Chromatography A ◽

10.1016/s0021-9673(01)88388-6 ◽

1982 ◽

Vol 253 ◽

pp. 271-275 ◽

Cited By ~ 59

Author(s):

Makiko Hatsumi ◽

Shin-Ichi Kimata ◽

Koshichiro Hirosawa

Keyword(s):

Chromatographic Analysis ◽

High Performance ◽

High Performance Liquid Chromatographic ◽

Liquid Chromatographic Analysis ◽

Liquid Chromatographic ◽

Derivatives Of

Directed biosynthesis of novel derivatives of echinomycin. II. Purification and structure elucidation

Canadian Journal of Microbiology ◽

10.1139/m84-112 ◽

1984 ◽

Vol 30 (6) ◽

pp. 730-738 ◽

Cited By ~ 12

Author(s):

D. Gauvreau ◽

M. J. Waring

Keyword(s):

High Performance ◽

Reversed Phase ◽

Aromatic Acids ◽

Antibacterial Assay ◽

New Antibiotics ◽

Field Desorption Mass Spectrometry ◽

Directed Biosynthesis ◽

Phase Systems ◽

Derivatives Of ◽

Acid Precursor

New antibiotics produced by Streptomyces echinatus A8331 cultured in the presence of heterocyclic aromatic acids can be separated and purified by high-performance liquid chromatography using reversed phase columns. Natural quinoxaline antibiotics and certain quinoline derivatives can also be efficiently separated in normal phase systems. Details of purification procedures are described together with experiments to characterise the new antibiotics by field desorption mass spectrometry and proton magnetic resonance. Mono- and bis-substituted derivatives of echinomycin containing the following replacement chromophores have been isolated: 7-chloroquinoxaline-2-carbonyl, thieno[3,2-b]pyridine-5-carbonyl, and 6-methylquinoline-2-carbonyl. With a 6-methylquinoline-2-carboxylic acid precursor the analogues containing one or two replacement chromophores are each separable into two distinct components. One of the bis-substituted 6-methylquinoline products appears inactive in an antibacterial assay and behaves as a triostin analogue, presumably an immediate precursor of the corresponding echinomycin derivative.

Compiler-directed scratchpad memory data transfer optimization for multithreaded applications on a heterogeneous many-core architecture

The Journal of Supercomputing ◽

10.1007/s11227-021-03853-x ◽

2021 ◽

Author(s):

Xiaohan Tao ◽

Jianmin Pang ◽

Jinlong Xu ◽

Yu Zhu

Keyword(s):

Energy Consumption ◽

High Performance ◽

Scientific Computing ◽

Data Transfer ◽

Performance Model ◽

Experimental Result ◽

Transfer Model ◽

Scratchpad Memory ◽

On Chip ◽

Many Core

The heterogeneous many-core architecture plays an important role in the fields of high-performance computing and scientific computing. It uses accelerator cores with on-chip memories to improve performance and reduce energy consumption. Scratchpad memory (SPM) is a kind of fast on-chip memory with lower energy consumption compared with a hardware cache. However, data transfer between SPM and off-chip memory can be managed only by a programmer or compiler. In this paper, we propose a compiler-directed multithreaded SPM data transfer model (MSDTM) to optimize the process of data transfer in a heterogeneous many-core architecture. We use compile-time analysis to classify data accesses, check dependences and determine the allocation of data transfer operations. We further present the data transfer performance model to derive the optimal granularity of data transfer and select the most profitable data transfer strategy. We implement the proposed MSDTM on the GCC compiler and evaluate it on Sunway TaihuLight with selected test cases from benchmarks and scientific computing applications. The experimental result shows that the proposed MSDTM improves the application execution time by 5.49× and achieves an energy saving of 5.16× on average.
