code optimization
Recently Published Documents


TOTAL DOCUMENTS: 293 (FIVE YEARS: 33)
H-INDEX: 17 (FIVE YEARS: 1)

Author(s): Yuta Hirokawa, Atsushi Yamada, Shunsuke Yamada, Masashi Noda, Mitsuharu Uemoto, et al.

In the field of optical science, it is becoming increasingly important to observe and manipulate matter at the atomic scale using ultrashort pulsed light. For the first time, we have performed an ab initio simulation of extended systems that simultaneously solves the Maxwell equations for the light electromagnetic fields, the time-dependent Kohn-Sham equation for the electrons, and the Newton equation for the ions. The most time-consuming parts of the simulation were the stencil and nonlocal pseudopotential operations on the electron orbitals, as well as the fast Fourier transforms of the electron density. The code was thoroughly optimized for the Fujitsu A64FX processor to achieve the highest performance. A simulation of an amorphous SiO2 thin film composed of more than 10,000 atoms was performed using 27,648 nodes of the Fugaku supercomputer. The simulation achieved excellent time-to-solution, with performance close to the maximum possible given the memory-bandwidth bound, as well as excellent weak scalability.
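As an illustration of the stencil workload mentioned above, a second-order finite-difference Laplacian applied to an orbital on a periodic 3-D grid can be sketched as follows. This is a minimal NumPy sketch for exposition only; the actual code runs hand-optimized kernels on the A64FX, and the function name and grid spacing are illustrative assumptions.

```python
import numpy as np

def laplacian_3d(orbital, h):
    """Second-order finite-difference Laplacian on a periodic 3-D grid,
    the kind of stencil applied to electron orbitals in real-space
    grid methods (illustrative sketch, not the optimized kernel)."""
    lap = np.zeros_like(orbital)
    for axis in range(3):
        # central difference along each axis; np.roll gives periodic wrap
        lap += (np.roll(orbital, 1, axis) - 2.0 * orbital
                + np.roll(orbital, -1, axis)) / h**2
    return lap
```

A constant field has zero Laplacian, which gives a quick sanity check of the stencil weights.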


Author(s): Hanting Zhao, Zhuo Wang, Hongrui Zhang, Menglin Wei, Siyuan Jiang, et al.

2021
Author(s): Rudnei Dias da Cunha, Elismar R. Oliveira

Abstract We present algorithms to compute approximations of invariant measures and their attractors for IFS and GIFS, making the deterministic algorithm tractable through code optimization strategies and careful use of data structures and search algorithms. The results show that these algorithms allow these (G)IFS to be used within reasonable running times.
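The deterministic algorithm referred to above iterates the Hutchinson operator on a point set. A minimal sketch follows; the grid-rounding deduplication stands in for the paper's data-structure optimizations, and the Sierpinski-triangle maps are an assumed example, not one taken from the paper.

```python
def ifs_attractor(maps, x0, depth, precision=6):
    """Deterministic algorithm: repeatedly apply every map of the IFS to a
    point set (the Hutchinson operator), deduplicating points on a rounded
    grid so the set size stays tractable (illustrative sketch)."""
    points = {x0}
    for _ in range(depth):
        points = {tuple(round(c, precision) for c in f(p))
                  for f in maps for p in points}
    return points

# Assumed example: Sierpinski triangle, three contractions of ratio 1/2
sierpinski = [
    lambda p: (0.5 * p[0], 0.5 * p[1]),
    lambda p: (0.5 * p[0] + 0.5, 0.5 * p[1]),
    lambda p: (0.5 * p[0] + 0.25, 0.5 * p[1] + 0.5),
]
pts = ifs_attractor(sierpinski, (0.0, 0.0), 8)
```

Without the deduplicating set, the point count would grow as 3^depth; with it, overlapping images collapse and the iteration stays cheap.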


2021
Author(s): Yuxuan Jing, Rami M. Younis

Abstract Automatic differentiation (AD) software libraries augment arithmetic operations with their derivatives, thereby relieving the programmer of deriving, implementing, debugging, and maintaining derivative code. With this encapsulation, however, the responsibility for code optimization falls more heavily on the AD system itself (as opposed to the programmer and the compiler). Moreover, given that there are multiple contexts in reservoir simulation software for which derivatives are required (e.g. property-package and discrete-operator evaluations), the AD infrastructure must also be adaptable. An operator-overloading AD design is proposed and tested to provide scalability and computational efficiency seamlessly across memory- and compute-bound applications. This is achieved by 1) use of portable, standard programming-language constructs (the C++17 and OpenMP 4.5 standards), 2) adopting a vectorized programming interface, 3) lazy evaluation via expression templates, and 4) multiple memory-alignment and layout policies. Empirical analysis is conducted on kernels spanning a range of arithmetic intensities and working-set sizes. Cache-aware roofline analysis shows that the performance and scalability attained are reliably ideal. In terms of floating-point operations executed per second, the performance of the AD system matches optimized hand-written code. Finally, the implementation is benchmarked using the Automatically Differentiable Expression Templates Library (ADETL).
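The paper's design uses C++17 expression templates; to illustrate the operator-overloading idea itself in a language-agnostic way, here is a minimal forward-mode dual-number sketch in Python. The `Dual` class, its scalar-only scope, and the absence of lazy evaluation are all simplifications for exposition, not ADETL's API.

```python
class Dual:
    """Minimal forward-mode AD via operator overloading: every value
    carries its derivative, so ordinary arithmetic expressions compute
    derivative code automatically (illustrative sketch only)."""

    def __init__(self, val, dot=0.0):
        self.val = val   # primal value
        self.dot = dot   # derivative with respect to the seeded input

    def __add__(self, other):
        other = other if isinstance(other, Dual) else Dual(other)
        return Dual(self.val + other.val, self.dot + other.dot)
    __radd__ = __add__

    def __mul__(self, other):
        other = other if isinstance(other, Dual) else Dual(other)
        # product rule: (uv)' = u'v + uv'
        return Dual(self.val * other.val,
                    self.dot * other.val + self.val * other.dot)
    __rmul__ = __mul__

x = Dual(3.0, 1.0)          # seed dx/dx = 1
y = x * x + 2.0 * x + 1.0   # y = x^2 + 2x + 1
# y.val == 16.0 and y.dot == 8.0, since dy/dx = 2x + 2 at x = 3
```

The appeal noted in the abstract is visible even here: the expression for `y` is written once, and the derivative comes along for free without any hand-derived code.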


Author(s): Tatiana Nikolaevna Romanova, Dmitry Igorevich Gorin

A method for optimizing the filling of a machine word with independent instructions is proposed, which increases program performance by packing the maximum number of independent instructions into one bundle. The paper also confirms the hypothesis that switching the compiler to random register allocation increases packing density, which in turn decreases the program's running time.
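The packing idea can be sketched as greedy bundle formation over a hypothetical three-address instruction model. The register names, hazard rules, and bundle width below are illustrative assumptions, not the paper's ISA or algorithm.

```python
def pack_bundles(instrs, width):
    """Greedily pack up to `width` mutually independent instructions per
    machine word (bundle). Each instruction is (name, reads, writes);
    an instruction may join a bundle only if it has no RAW/WAR/WAW
    hazard with anything already seen in this pass, so reordering
    past skipped instructions stays safe (illustrative sketch)."""
    remaining = list(instrs)
    bundles = []
    while remaining:
        bundle = []
        blocked_reads, blocked_writes = set(), set()
        next_remaining = []
        for name, reads, writes in remaining:
            hazard = (reads & blocked_writes        # RAW
                      or writes & blocked_writes    # WAW
                      or writes & blocked_reads)    # WAR
            if not hazard and len(bundle) < width:
                bundle.append((name, reads, writes))
            else:
                next_remaining.append((name, reads, writes))
            blocked_reads |= reads
            blocked_writes |= writes
        bundles.append(bundle)
        remaining = next_remaining
    return bundles

# Hypothetical example: the first two instructions are independent,
# the third reads both of their results and must wait a cycle.
instrs = [
    ("add r1, r2, r3", {"r2", "r3"}, {"r1"}),
    ("mul r4, r5, r6", {"r5", "r6"}, {"r4"}),
    ("sub r7, r1, r4", {"r1", "r4"}, {"r7"}),
]
bundles = pack_bundles(instrs, width=2)
```

Denser packing means fewer bundles, hence the running-time decrease the paper reports; random register allocation helps by spreading registers so fewer accidental name dependences block packing.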


Author(s): Vadim Bulavintsev, Dmitry Zhdanov

We propose a generalized method for adapting and optimizing algorithms for efficient execution on modern graphics processing units (GPUs). The method consists of several steps. First, build a control-flow graph (CFG) of the algorithm. Next, transform the CFG into a tree of loops, merging non-parallelizable loops into parallelizable ones. Finally, map the resulting loop tree onto the tree of GPU computational units, unrolling the algorithm's loops as necessary for the match. The mapping should be performed bottom-up, from the lowest GPU architecture levels to the highest, to minimize off-chip memory accesses and maximize register-file usage. The method provides the programmer with a convenient and robust mental framework and strategy for GPU code optimization. We demonstrate the method by adapting the DPLL backtracking search algorithm for the Boolean satisfiability problem (SAT) to a GPU. The resulting GPU version of DPLL outperforms the CPU version in raw tree-search performance sixfold on regular Boolean satisfiability problems and twofold on irregular ones.
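For reference, the DPLL algorithm named above can be sketched in a few lines. This is a minimal textbook version (unit propagation plus branching), standing in for the sequential CPU baseline; it is not the authors' GPU adaptation.

```python
def dpll(clauses):
    """Textbook DPLL backtracking search. Clauses are lists of nonzero
    ints; a negative int is a negated variable. Returns True iff the
    formula is satisfiable (illustrative sketch)."""
    clauses = [set(c) for c in clauses]
    # Unit propagation: a one-literal clause forces that literal true.
    while True:
        units = [next(iter(c)) for c in clauses if len(c) == 1]
        if not units:
            break
        lit = units[0]
        new = []
        for c in clauses:
            if lit in c:
                continue             # clause satisfied, drop it
            if -lit in c:
                c = c - {-lit}       # literal falsified, shrink clause
                if not c:
                    return False     # empty clause: conflict, backtrack
            new.append(c)
        clauses = new
    if not clauses:
        return True                  # every clause satisfied
    # Branch: try a literal from the first clause, then its negation.
    lit = next(iter(clauses[0]))
    return (dpll([list(c) for c in clauses] + [[lit]])
            or dpll([list(c) for c in clauses] + [[-lit]]))
```

The `or` of the two recursive calls is exactly the tree search whose raw performance the abstract compares between CPU and GPU.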

