DiVinE 2.0: High-Performance Model Checking

Author(s):  
Jiri Barnat ◽  
Lubos Brim ◽  
Petr Rockai
Author(s):  
Xiaohan Tao ◽  
Jianmin Pang ◽  
Jinlong Xu ◽  
Yu Zhu

Abstract The heterogeneous many-core architecture plays an important role in high-performance computing and scientific computing. It uses accelerator cores with on-chip memories to improve performance and reduce energy consumption. Scratchpad memory (SPM) is a kind of fast on-chip memory with lower energy consumption than a hardware cache. However, data transfers between SPM and off-chip memory can be managed only by the programmer or the compiler. In this paper, we propose a compiler-directed multithreaded SPM data transfer model (MSDTM) to optimize data transfer in a heterogeneous many-core architecture. We use compile-time analysis to classify data accesses, check dependences and determine the allocation of data transfer operations. We further present a data transfer performance model to derive the optimal granularity of data transfer and to select the most profitable data transfer strategy. We implement the proposed MSDTM in the GCC compiler and evaluate it on Sunway TaihuLight with selected test cases from benchmarks and scientific computing applications. The experimental results show that MSDTM improves application execution time by 5.49× and achieves an energy saving of 5.16× on average.
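The granularity trade-off the abstract alludes to can be illustrated with a toy double-buffered pipeline model (a hypothetical sketch, not the paper's actual MSDTM formulation): each chunk of size `g` pays a fixed transfer latency, transfers overlap computation, and the best `g` balances per-chunk startup overhead against pipeline fill/drain time.

```python
def pipelined_time(n, g, latency, bandwidth, compute_per_elem):
    """Toy model of double-buffered SPM execution: n elements are moved
    in k = n/g chunks; transfers overlap compute, so steady-state cost
    per chunk is the max of the two, plus pipeline fill and drain."""
    k = n // g
    t_x = latency + g / bandwidth   # time to stream one chunk into SPM
    t_c = compute_per_elem * g      # time to process one chunk in SPM
    return t_x + (k - 1) * max(t_x, t_c) + t_c

def best_granularity(n, latency, bandwidth, compute_per_elem, candidates):
    # Pick the chunk size (from SPM-feasible candidates) that minimizes
    # the modelled end-to-end time.
    return min(candidates,
               key=lambda g: pipelined_time(n, g, latency, bandwidth,
                                            compute_per_elem))
```

Small chunks amortize compute overlap poorly (latency dominates every chunk); very large chunks lose overlap at the ends, so an interior optimum appears once compute overtakes transfer.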


Author(s):  
Martin Schreiber ◽  
Pedro S Peixoto ◽  
Terry Haut ◽  
Beth Wingate

This paper presents, discusses and analyses a massively parallel-in-time solver for linear oscillatory partial differential equations, a key numerical component of weather, ocean, climate and seismic models. The time parallelization in this solver allows us to significantly exceed the computing resources usable by parallelization-in-space methods, with a correspondingly significant reduction in wall-clock time. One of the major difficulties in achieving Exascale performance for weather prediction is that the strong scaling limit – the parallel performance for a fixed problem size with an increasing number of processors – saturates. A main avenue to circumvent this problem is to introduce new numerical techniques that take advantage of time parallelism. In this paper, we use a time-parallel approximation that retains the frequency information of oscillatory problems. This approximation is based on (a) reformulating the original problem into a large set of independent terms and (b) solving each of these terms independently of the others, which can be accomplished on a large number of high-performance computing resources. Our results were obtained on up to 3586 cores, for problem sizes whose parallelization-in-space scalability is already saturated on a single node. With the parallelization-in-time approach we gain significant reductions in time-to-solution of 118.3× for spectral methods and 1503.0× for finite-difference methods. A developed and calibrated performance model gives the scalability limits of this new approach a priori and allows us to extrapolate its performance towards large-scale systems. This work has the potential to serve as a basic building block of parallelization-in-time approaches, with possible major implications for applied areas modelling oscillation-dominated problems.
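The structure of steps (a) and (b) can be sketched as below. This is a placeholder illustration only: the scalar division stands in for a full shifted linear solve on its own set of compute nodes, and the `(alpha, beta)` coefficients are made up here — in the actual method they would come from a rational approximation of the oscillatory propagator.

```python
from concurrent.futures import ThreadPoolExecutor

def solve_term(args):
    # One independent term: a toy scalar stand-in for solving a shifted
    # system of the form (alpha*I + dt*L) u = beta * u0.
    alpha, beta, u0 = args
    return beta * u0 / alpha

def parallel_in_time_step(terms, u0):
    # The terms are mutually independent, so they can be farmed out to as
    # many compute resources as there are terms and summed afterwards.
    with ThreadPoolExecutor() as pool:
        return sum(pool.map(solve_term, [(a, b, u0) for a, b in terms]))
```

The key property is that the only communication is the final reduction (the sum), which is what lets the method scale far beyond the spatial strong-scaling limit.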


2019 ◽  
Vol 41 (2) ◽  
pp. 100-107 ◽  
Author(s):  
Anthony N. Turner ◽  
Chris Bishop ◽  
Jon Cree ◽  
Paul Carr ◽  
Andy McCann ◽  
...  

2021 ◽  
Author(s):  
Antonina Kriuger ◽  
Alexander Reinbold ◽  
Martina Schubert-Frisius ◽  
Jörg Cortekar

Cities are particularly vulnerable to climate change. At the same time, cities change slowly. Accordingly, preparatory measures to adapt to climate change have to be taken urgently. High-performance urban climate models with various applications can form the basis for prospective planning decisions; however, as of today no such model exists that can be easily applied outside of the scientific community. The funding program Urban Climate Under Change [UC]² therefore aims to further develop the new urban climate model PALM-4U (Parallelized Large-Eddy Simulation Model for Urban Applications) into a practice-oriented and user-friendly product that meets the needs of municipalities and other practical users in addition to scientific research.

Specifically, the high-performance model PALM-4U allows simulation of entire large cities covering areas of over 1,000 km² with grid sizes down to a few meters. One of our goals within the ProPolis project is to design and test the practical implementation of PALM-4U in standard and innovative application fields, including thermal comfort (indices such as PT, PET and UTCI), cold-air balance (source areas, reach and others), local wind comfort (indices derived from mean winds and gusts) and the dispersion of pollutants.

In close cooperation with our practice partners, we explore the potential of PALM-4U to support urban planning processes in each specific application setting. Additionally, by developing a fit-for-purpose graphical user interface, manuals and trainings, we aim to enable practitioners to apply the model to their individual planning questions and adaptation measures.

In our presentation, we will show an application case of PALM-4U in a major German city. We will investigate the effect of a planned development area on the local climate and the impact of different climate change adaptation measures (such as extensive vs. intensive green roofs). The comparative simulations of the current state and of planning scenarios with integrated green and blue infrastructure should provide arguments for municipal decision-making that takes climate change aspects, e.g. urban heat stress, into account in a densely built-up environment.


Author(s):  
Harendra Kumar ◽  
Nutan Kumari Chauhan ◽  
Pradeep Kumar Yadav

Task allocation is an important step for obtaining high performance in a distributed computing system (DCS). This article develops a mathematical model for allocating tasks to processors in order to achieve optimal cost and optimal reliability of the system. The proposed model is divided into two stages. Stage I forms ‘n' clusters from the set of ‘m' tasks using the k-means clustering technique. To apply k-means clustering, the inter-task communication costs are modified in such a way that highly communicating tasks are clustered together, minimizing the communication costs between tasks. Stage II allocates the ‘n' clusters of tasks onto ‘n' processors to minimize the system cost. To design the mathematical model, execution costs and inter-task communication costs are taken in the form of matrices. To test the performance of the proposed model, many examples are considered from different research papers, and the results have been compared with some existing models.
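A minimal sketch of the two stages, under stated assumptions: a greedy agglomerative merge stands in for the paper's modified k-means in Stage I, and a simple greedy cheapest-processor assignment stands in for the Stage II cost minimization.

```python
def cluster_tasks(comm, n):
    # Stage I (stand-in for the paper's modified k-means): repeatedly
    # merge the two clusters with the highest mutual communication cost
    # until only n clusters remain, so heavily communicating tasks end
    # up together and inter-cluster communication is reduced.
    clusters = [{t} for t in range(len(comm))]
    while len(clusters) > n:
        i, j = max(
            ((a, b) for a in range(len(clusters))
                    for b in range(a + 1, len(clusters))),
            key=lambda ab: sum(comm[x][y]
                               for x in clusters[ab[0]]
                               for y in clusters[ab[1]]))
        clusters[i] |= clusters[j]
        del clusters[j]
    return clusters

def allocate(exec_cost):
    # Stage II (greedy stand-in): give each cluster the cheapest remaining
    # processor; exec_cost[c][p] is the total execution cost of cluster c
    # on processor p.
    free = set(range(len(exec_cost)))
    alloc = {}
    for c in range(len(exec_cost)):
        p = min(free, key=lambda q: exec_cost[c][q])
        alloc[c] = p
        free.remove(p)
    return alloc
```

For example, with four tasks where tasks 0–1 and 2–3 communicate heavily, `cluster_tasks` yields the clusters {0, 1} and {2, 3}, which `allocate` then maps to the processors on which they run cheapest.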


Author(s):  
Masaki Iwasawa ◽  
Daisuke Namekata ◽  
Keigo Nitadori ◽  
Kentaro Nomura ◽  
Long Wang ◽  
...  

Abstract We describe algorithms implemented in FDPS (Framework for Developing Particle Simulators) to make efficient use of accelerator hardware such as GPGPUs (general-purpose computing on graphics processing units). We have developed FDPS so that researchers can develop their own high-performance parallel particle-based simulation programs without spending large amounts of time on parallelization and performance tuning. FDPS provides a high-performance implementation of parallel algorithms for particle-based simulations in a “generic” form, so that researchers can define their own particle data structures and interparticle interaction functions. FDPS compiled with user-supplied data types and interaction functions provides all the necessary functions for parallelization, and researchers can thus write their programs as though they were writing simple non-parallel code. It has previously been possible to use accelerators with FDPS by writing an interaction function that uses the accelerator. However, the efficiency was limited by the latency and bandwidth of communication between the CPU and the accelerator, and also by the mismatch between the available degree of parallelism of the interaction function and that of the hardware. We have modified the interface of the user-provided interaction functions so that accelerators are used more efficiently. We have also implemented new techniques which reduce the amount of work on the CPU side and the amount of communication between the CPU and accelerators. We have measured the performance of N-body simulations on a system with an NVIDIA Volta GPGPU using FDPS, and the achieved performance is around 27% of the theoretical peak. We have constructed a detailed performance model and found that the current implementation can achieve good performance on systems with much smaller memory and communication bandwidth. Thus, our implementation will be applicable to future generations of accelerator systems.
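The “generic” pattern — the framework iterates over particle batches while the user supplies only the interaction function — can be sketched as a direct-summation toy. This is an illustrative assumption only: it ignores FDPS's tree algorithm and the actual accelerator offload (on a real system each batch would be shipped to the GPGPU rather than looped over on the CPU).

```python
def calc_force_all(particles, interaction, batch=256):
    # Framework side: walk the particles in fixed-size batches; the batch
    # is the unit of work that would be offloaded to the accelerator.
    # User side: `interaction(p, q)` is the only code the researcher writes.
    forces = [0.0] * len(particles)
    for start in range(0, len(particles), batch):
        for i in range(start, min(start + batch, len(particles))):
            for j, q in enumerate(particles):
                if i != j:
                    forces[i] += interaction(particles[i], q)
    return forces
```

The user's code contains no parallelism or transfer logic; batching lives entirely in the framework, which is what lets the offload strategy change without touching user code.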

