Accelerating and tuning small matrix multiplications on Sunway TaihuLight: A case study of spectral element CFD Code Nek5000

Author(s):  
Xianmeng Wang ◽  
Zhifeng Zhou ◽  
Changjun Hu ◽  
Wen Yang ◽  
Minfu Zhao ◽  
...  

Matrix–matrix products of small matrices continue to play an important role in a range of scientific applications. Heterogeneous architectures, which are predicted to be a trend in the exascale supercomputing era, give rise to challenges in porting and optimizing small matrix products. We present a method to accelerate and tune small matrix multiplications on the Sunway TaihuLight supercomputer, which has been ranked the most powerful supercomputer four times on the TOP500 list. Sunway TaihuLight is equipped with Shen-Wei hybrid manycore processors. We use Nek5000 as a case study to demonstrate our methods. Nek5000 is an open-source computational fluid dynamics (CFD) solver based on the spectral element method (SEM) for incompressible flow. The high-order SEM, whose computational kernel is small dense matrix products, is regarded as having the potential to overcome the limitations of standard CFD software. Using vectorization, we gained about a 30% performance improvement on the management processing element (MPE). We then accelerated Nek5000 using the computing processing elements (CPEs). The experimental results suggest that employing 32 CPEs delivers the best performance enhancement. We scaled Nek5000 to 16,384 core groups with 540,672 cores (33 cores per core group), reaching about a 30% performance improvement.
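
The abstract notes that the SEM computational kernel is small dense matrix products. The Python sketch below, which is illustrative and not Nek5000 or Sunway code, shows why: applying a one-dimensional derivative matrix along the first direction of a hexahedral element with n points per direction is a single n × n by n × n² product once the element is stored in column-major (Fortran) order. The names `mxm`, `grad_r` and `grad_r_direct` are hypothetical, and the matrix entries are arbitrary test data rather than real SEM derivative weights.

```python
# Illustrative sketch, not Nek5000 source: small dense products in an SEM
# operator. Matrices are flat lists in column-major (Fortran) order.


def mxm(a, n1, b, n2, n3):
    """Return c = a * b for an n1 x n2 matrix a and an n2 x n3 matrix b."""
    c = [0.0] * (n1 * n3)
    for j in range(n3):
        for k in range(n2):
            bkj = b[k + j * n2]
            # Unit-stride inner loop over a column of a and a column of c;
            # in C or Fortran this is the loop a compiler would vectorize.
            for i in range(n1):
                c[i + j * n1] += a[i + k * n1] * bkj
    return c


def grad_r(d, u, n):
    """Apply the n x n matrix d along the first index of an n x n x n element u.

    Viewed as an n x n^2 matrix, u needs just one small product: ur = d * u.
    """
    return mxm(d, n, u, n, n * n)


def grad_r_direct(d, u, n):
    """Reference version: ur(i, j, k) = sum over l of d(i, l) * u(l, j, k)."""
    ur = [0.0] * n ** 3
    for k in range(n):
        for j in range(n):
            for i in range(n):
                ur[i + j * n + k * n * n] = sum(
                    d[i + l * n] * u[l + j * n + k * n * n] for l in range(n)
                )
    return ur


n = 8  # points per direction in one element (arbitrary small value)
d = [float((7 * i) % 11) - 5.0 for i in range(n * n)]  # arbitrary test data
u = [((13 * i) % 17) / 17.0 for i in range(n ** 3)]
ur = grad_r(d, u, n)
max_err = max(abs(x - y) for x, y in zip(ur, grad_r_direct(d, u, n)))
print(f"max abs difference vs direct loops: {max_err:.1e}")
```

In a sketch like this, all tuning effort can go into `mxm` alone, since every directional derivative of every element funnels through it.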

Author(s):  
H. Clarke Anderson ◽  
Priscilla R. Coulter

Epiphyseal cartilage matrix contains fibrils and particles of at least 5 different types: 1. Banded collagen fibrils, present throughout the matrix but not seen in the lacunae. 2. Non-periodic fine fibrils <100 Å in diameter (Fig. 1), which are most notable in the lacunae and may represent immature collagen. 3. Electron-dense matrix granules (Fig. 1), which are often attached to fine fibrils and collagen fibrils and probably contain protein-polysaccharide, although the possibility of a mineral content has not been excluded. 4. Matrix vesicles (Fig. 2), which show a selective distribution throughout the epiphysis and may play a role in calcification. 5. Needle-like apatite crystals (Fig. 2).

Blocks of formalin-fixed epiphysis from weanling mice were digested with the following agents in 0.1 M phosphate buffer: a) 5% ethylenediaminetetraacetate (EDTA) at pH 8.3, b) 0.015% bovine testicular hyaluronidase (Sigma, type IV, 750 units/mg) at pH 5.5, and c) 0.1% collagenase (Worthington, chromatographically pure, 200 units/mg) at pH 7.4. All digestions were carried out at 37°C overnight. Following digestion, tissues were examined by light and electron microscopy to determine changes in the various fibrils and particles of the matrix.


2012 ◽  
Vol 512-515 ◽  
pp. 2135-2142 ◽  
Author(s):  
Yu Peng Wu ◽  
Zhi Yong Wen ◽  
Yue Liang Shen ◽  
Qing Yan Fang ◽  
Cheng Zhang ◽  
...  

A computational fluid dynamics (CFD) model of a 600 MW opposed swirling coal-fired utility boiler has been established. The chemical percolation devolatilization (CPD) model, rather than an empirical method, has been adopted to predict nitrogen release during devolatilization. The CFD model has been validated by comparing the simulated results with experimental data obtained from the boiler under study. The validated model is then applied to study the effects of the over-fire air (OFA) ratio on combustion and nitrogen oxide (NOx) emission characteristics. As the OFA ratio increases, the carbon content in fly ash increases linearly and NOx emissions decrease substantially. An OFA ratio of 30% is optimal for both high burnout of pulverized coal and low NOx emission. The present study provides helpful information for understanding and optimizing combustion in the studied boiler.


2021 ◽  
Vol 2021 (4) ◽  
Author(s):  
A. de Giorgi ◽  
S. Vogl

The Kaluza-Klein (KK) decomposition of higher-dimensional gravity gives rise to a tower of KK-gravitons in the effective four-dimensional (4D) theory. Such massive spin-2 fields are known to be connected with unitarity issues and easily lead to a breakdown of the effective theory well below the naive scale of the interaction. However, the breakdown of the effective 4D theory is expected to be controlled by the parameters of the 5D theory. Working in a simplified Randall-Sundrum model we study the matrix elements for matter annihilations into massive gravitons. We find that truncating the KK-tower leads to an early breakdown of perturbative unitarity. However, by considering the full tower we obtain a set of sum rules for the couplings between the different KK-fields that restore unitarity up to the scale of the 5D theory. We prove analytically that these are fulfilled in the model under consideration and present numerical tests of their convergence. This work complements earlier studies that focused on graviton self-interactions and yields additional sum rules that are required if matter fields are incorporated into warped extra-dimensions.


2019 ◽  
Vol 809 ◽  
pp. 480-486
Author(s):  
Rohit George Sebastian ◽  
Christof Obertscheider ◽  
Ewald Fauster ◽  
Ralf Schledjewski

The growing use of composite materials has generated interest in improving and optimising composite manufacturing processes such as Liquid Composite Moulding (LCM). In LCM, dry preforms are placed in a mould and impregnated with the matrix material. Mould-filling efficiency can be improved by using Computational Fluid Dynamics (CFD) filling simulations during mould design. As part of an ongoing effort to develop a CFD tool for simulating LCM processes, a volume-averaged energy balance equation has been derived and implemented in a custom OpenFOAM solver, both with and without the pressure terms, for comparison with results from resin transfer moulding (RTM) experiments. The pressure terms are found not to significantly influence the results for LCM processes.


2013 ◽  
Vol 368-370 ◽  
pp. 599-602 ◽  
Author(s):  
Ian Hung ◽  
Hsien Te Lin ◽  
Yu Chung Wang

This study focuses on the performance of the air conditioning design at the Dazhi Cultural Center and uses computational fluid dynamics (CFD) simulation to compare air velocity and indoor ambient temperature between an all-zone air conditioning design and a stratified air conditioning design. The results have strong implications for air conditioning design and can help improve indoor air quality in assembly halls.


1997 ◽  
Vol 6 (1) ◽  
pp. 127-152
Author(s):  
Eric De Sturler ◽  
Volker Strumpen

Recently, the first commercial High Performance Fortran (HPF) subset compilers have appeared. This article reports on our experiences with the xHPF compiler of Applied Parallel Research, version 1.2, for the Intel Paragon. At this stage, we do not expect very high performance from our HPF programs, even though performance will eventually be of paramount importance for the acceptance of HPF. Instead, our primary objective is to study how to convert large Fortran 77 (F77) programs to HPF such that the compiler generates reasonably efficient parallel code. We report on a case study that identifies several problems when parallelizing code with HPF; most of these problems affect current HPF compiler technology in general, although some are specific to the xHPF compiler. We discuss our solutions from the perspective of the scientific programmer, and present timing results on the Intel Paragon. The case study comprises three programs of different complexity with respect to parallelization. We use the dense matrix-matrix product to show that the distribution of arrays and the order of nested loops significantly influence the performance of the parallel program. We use Gaussian elimination with partial pivoting to study the parallelization strategy of the compiler. There are various ways to structure this algorithm for a particular data distribution. This example shows how much effort may be demanded from the programmer to support the compiler in generating an efficient parallel implementation. Finally, we use a small application to show that the more complicated structure of a larger program may introduce problems for the parallelization, even though all subroutines of the application are easy to parallelize by themselves. The application consists of a finite volume discretization on a structured grid and a nested iterative solver.
Our case study shows that it is possible to obtain reasonably efficient parallel programs with xHPF, although the compiler needs substantial support from the programmer.
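
The influence of loop order on the dense matrix product can be made concrete. The Python sketch below (hypothetical function names, not code from the article) runs all six nestings of the classic triple loop on column-major (Fortran-layout) matrices, checks that they compute the same product, and reports the memory stride each array sees in the innermost loop. Orders with `i` innermost (`jki`, `kji`) touch `a` and `c` with unit stride, which is usually the cache-friendly choice for Fortran arrays. The sketch models neither HPF data distribution nor the Paragon timings reported in the article.

```python
from itertools import permutations


def matmul_loop_order(a, b, n, order):
    """C = A * B for n x n column-major matrices; `order` names the loop
    nesting from outermost to innermost, e.g. "ijk" or "jki"."""
    c = [0.0] * (n * n)
    idx = {}
    for x in range(n):
        idx[order[0]] = x
        for y in range(n):
            idx[order[1]] = y
            for z in range(n):
                idx[order[2]] = z
                i, j, k = idx["i"], idx["j"], idx["k"]
                c[i + j * n] += a[i + k * n] * b[k + j * n]
    return c


def inner_strides(order, n):
    """Memory strides of a(i,k), b(k,j) and c(i,j) in column-major storage
    as the innermost loop index advances by one (0 = not touched)."""
    v = order[-1]
    return (
        {"i": 1, "k": n}.get(v, 0),
        {"k": 1, "j": n}.get(v, 0),
        {"i": 1, "j": n}.get(v, 0),
    )


n = 4
a = [float(i + 1) for i in range(n * n)]
b = [float((3 * i) % 5) for i in range(n * n)]
reference = matmul_loop_order(a, b, n, "ijk")
for p in permutations("ijk"):
    order = "".join(p)
    same = matmul_loop_order(a, b, n, order) == reference
    print(order, "same result:", same, "strides (a, b, c):", inner_strides(order, n))
```

All six orders add the terms for each c(i, j) in increasing k, so their results agree exactly; only the memory access pattern differs.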


2015 ◽  
Vol 2015 ◽  
pp. 1-8 ◽  
Author(s):  
J.-C. Cortés ◽  
L. Jódar ◽  
Francisco J. Solís ◽  
Roberto Ku-Carrillo

We introduce infinite matrix products, including some of their main properties and convergence results. We apply them to extend the Weierstrass infinite-product definition of the scalar gamma function to the matrix setting. A limit representation of the matrix gamma function is also provided.
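
For reference, the scalar definition being extended is the Weierstrass product 1/Γ(z) = z e^{γz} ∏_{n≥1} (1 + z/n) e^{−z/n}, where γ is the Euler–Mascheroni constant. The Python sketch below (a hypothetical helper, scalar case only, z > 0) truncates the product and compares it with `math.gamma`; it illustrates how slowly such a product converges, with a relative error that shrinks only in proportion to 1/terms. How the article carries the product over to matrices, and under which conditions it converges, is not reproduced here.

```python
import math

EULER_GAMMA = 0.5772156649015329  # Euler-Mascheroni constant


def gamma_weierstrass(z, terms=100_000):
    """Approximate Gamma(z), z > 0, from the truncated Weierstrass product
    1/Gamma(z) = z * exp(EULER_GAMMA * z) * prod_{n=1..terms} (1 + z/n) * exp(-z/n).

    Logarithms of the factors are accumulated instead of the factors themselves.
    """
    if z <= 0:
        raise ValueError("this sketch only handles z > 0")
    log_recip = math.log(z) + EULER_GAMMA * z
    for n in range(1, terms + 1):
        log_recip += math.log1p(z / n) - z / n
    return math.exp(-log_recip)


for z in (0.5, 1.0, 2.5):
    print(z, gamma_weierstrass(z), math.gamma(z))
```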


Palaios ◽  
2021 ◽  
Vol 36 (3) ◽  
pp. 115-121
Author(s):  
EDUARDO MAYORAL ◽  
JORGE F. GENISE ◽  
FRANCISCO J. RODRÍGUEZ-TOVAR ◽  
ANA SANTOS

Plio?-Pleistocene outcrops located at the southwestern edge of the Guadalquivir Basin in the area of Lepe (Huelva, Spain) provide an interesting example for studying the contemporaneity of traces with the rocks that contain them. Two different types of cells, compatible with the ichnogenera Celliforma (Type 1) and Palmiraichnus (Type 2), were found in these outcrops. Their walls were constructed of the same material as the matrix, and our initial research in the area found no extant bees producing them, suggesting that they were coeval with the trace-bearing rocks. The case of the "Palmiraichnus-like" Type 2 cells was misleading because of their similarity to Palmiraichnus described from the Canary Islands and the Balearic Archipelago (Spain). Two decisive findings clarified this initial impression. In the Palmiraichnus-like cells, we found remains of a larval cocoon in one cell that could be radiocarbon (¹⁴C) dated, giving a modern age. For the Celliforma-like cells, further field research in the area allowed us to observe extant bees nesting in these rocks in autumn. The ichnological literature shows a few cases of asynchrony involving extant traces found mostly in Paleozoic and Mesozoic rocks. In contrast, the case presented herein indicates that the time gap between the bearing rocks and the Lepe traces was shorter (ca. 12 ky–2.6 My), enhancing the similarity of traces and rocks and thus their potential coevalness. This case may serve as a warning about other potential examples in the fossil record in which relatively short asynchronies exist between traces and paleosols.


1999 ◽  
Vol 14 (13) ◽  
pp. 2103-2115 ◽  
Author(s):  
BISWANATH RATH

We study the divergent behavior of the Morse–Feshbach nonlinear perturbation series (MFNS) [P. M. Morse and H. Feshbach, Methods of Theoretical Physics, Part II (McGraw-Hill, New York, 1953)] in producing convergent energy levels, using the ground state of a quartic anharmonic oscillator (AHO) in the strong-coupling limit. Numerical calculations have been carried out up to tenth order. The convergent MFNS result is further compared with that of the matrix diagonalization method.
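
The matrix diagonalization method used as the comparison can be sketched as follows. Assuming the common convention H = p² + x² + λx⁴ (units with ħ = 1 and 2m = 1; the article's own normalization may differ), the Python code below expands H in the lowest eigenstates of p² + x², diagonalizes the truncated matrix with cyclic Jacobi rotations, and returns the lowest eigenvalue. All function names and the basis size are illustrative choices. Because the matrix elements inside the kept subspace are exact, the result is a variational upper bound that decreases toward the true E₀ as the basis grows; for λ = 1 it can be checked against the well-known value E₀ ≈ 1.3923516415. In the strong-coupling regime this fixed-frequency basis converges more slowly unless its frequency is rescaled, which the sketch does not do.

```python
import math


def x_matrix(m):
    """Position operator x = (a + a^dagger) / sqrt(2) in the lowest m
    eigenstates of H0 = p^2 + x^2 (eigenvalues 2n + 1)."""
    x = [[0.0] * m for _ in range(m)]
    for n in range(m - 1):
        x[n][n + 1] = x[n + 1][n] = math.sqrt((n + 1) / 2.0)
    return x


def matmul(a, b):
    """Plain dense product of two lists-of-rows matrices."""
    cols = list(zip(*b))
    return [[sum(p * q for p, q in zip(row, col)) for col in cols] for row in a]


def hamiltonian(lam, size):
    """size x size matrix of H = p^2 + x^2 + lam * x^4 in the H0 basis.
    x^4 is built in a slightly larger basis so the kept block is exact."""
    x = x_matrix(size + 4)
    x2 = matmul(x, x)
    x4 = matmul(x2, x2)
    return [[(2.0 * i + 1.0 if i == j else 0.0) + lam * x4[i][j]
             for j in range(size)] for i in range(size)]


def jacobi_eigenvalues(a, tol=1e-12, max_sweeps=50):
    """Ascending eigenvalues of a real symmetric matrix (cyclic Jacobi)."""
    a = [row[:] for row in a]
    n = len(a)
    total = sum(v * v for row in a for v in row)
    for _ in range(max_sweeps):
        off = sum(a[i][j] * a[i][j] for i in range(n) for j in range(n) if i != j)
        if off <= tol * tol * total:
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if a[p][q] == 0.0:
                    continue
                # Rotation angle chosen so that the new a[p][q] is zero.
                theta = (a[q][q] - a[p][p]) / (2.0 * a[p][q])
                t = math.copysign(1.0, theta) / (abs(theta) + math.sqrt(theta * theta + 1.0))
                c = 1.0 / math.sqrt(t * t + 1.0)
                s = t * c
                for k in range(n):  # A <- A P (columns p and q)
                    akp, akq = a[k][p], a[k][q]
                    a[k][p], a[k][q] = c * akp - s * akq, s * akp + c * akq
                for k in range(n):  # A <- P^T A (rows p and q)
                    apk, aqk = a[p][k], a[q][k]
                    a[p][k], a[q][k] = c * apk - s * aqk, s * apk + c * aqk
    return sorted(a[i][i] for i in range(n))


def ground_state_energy(lam, size=80):
    h = hamiltonian(lam, size)
    # x^4 only couples n to n, n +/- 2 and n +/- 4, so even and odd states
    # decouple; the ground state is even, so only the even block is needed.
    even = [[h[i][j] for j in range(0, size, 2)] for i in range(0, size, 2)]
    return jacobi_eigenvalues(even)[0]


e0 = ground_state_energy(1.0)
print(f"E0(lambda = 1) with 40 even basis states: {e0:.10f}")
```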

