application codes
Recently Published Documents

TOTAL DOCUMENTS: 29 (five years: 8)
H-INDEX: 4 (five years: 0)

Electronics, 2021, Vol 10 (15), pp. 1760
Author(s):  
Xiaochang Li ◽  
Zhengjun Zhai

In recent decades, non-volatile memory (NVM) has been anticipated to scale up main memory size, improve application performance, and narrow the speed gap between main memory and storage devices, while supporting persistent storage to cope with power outages. However, to exploit NVM, existing DRAM-based applications have to be rewritten by developers. The developer must therefore understand the target application code well enough to manually identify and store the data suited to NVM. To intelligently facilitate NVM deployment for existing legacy applications without compulsory code understanding, we propose UHNVM, a universal heterogeneous cache hierarchy that automatically selects and stores the appropriate application data in non-volatile memory. In this article, a program context (PC) technique is proposed in user space to help UHNVM classify data. Compared to the conventional hot/cold file categorization, the PC technique categorizes application data at a fine granularity, enabling data to be stored either in NVM or on SSDs for better performance. Our experimental results on a real Optane dual in-line memory module (DIMM) show that the new heterogeneous architecture reduces elapsed time by about 11% compared to a conventional kernel memory configuration without NVM.
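The program-context idea can be sketched roughly as follows: identify an access site by hashing the current call stack, count accesses per context, and route data from frequently seen contexts to NVM. This is a hypothetical Python illustration only; the function names, the promotion threshold, and the MD5-based context signature are assumptions, not UHNVM's actual user-space implementation.

```python
import hashlib
import traceback
from collections import defaultdict

# Assumed threshold: accesses from one context before its data is
# considered NVM-worthy (illustrative, not a value from the paper).
NVM_THRESHOLD = 100

access_counts = defaultdict(int)

def program_context():
    """Hash the call stack to identify the allocation/access site."""
    frames = traceback.extract_stack()[:-1]
    sig = "".join(f"{f.filename}:{f.name}" for f in frames)
    return hashlib.md5(sig.encode()).hexdigest()

def placement_for(pc):
    """Route data from frequently seen contexts to NVM, the rest to SSD."""
    access_counts[pc] += 1
    return "NVM" if access_counts[pc] >= NVM_THRESHOLD else "SSD"
```

In this sketch, the context hash plays the role of the fine-grained category: two call sites touching the same file get distinct placement decisions, unlike a whole-file hot/cold label.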


Author(s):  
Thomas M Evans ◽  
Andrew Siegel ◽  
Erik W Draeger ◽  
Jack Deslippe ◽  
Marianne M Francois ◽  
...  

The US Department of Energy Office of Science and the National Nuclear Security Administration initiated the Exascale Computing Project (ECP) in 2016 to prepare mission-relevant applications and scientific software for the delivery of exascale computers starting in 2023. The ECP currently supports 24 efforts directed at specific applications and six supporting co-design projects. These 24 application projects contain 62 application codes implemented in three high-level languages (C, C++, and Fortran) and use 22 combinations of graphics processing unit programming models. The most common implementation language is C++, used in 53 of the application codes. The most common programming models across ECP applications are CUDA and Kokkos, employed in 15 and 14 applications, respectively. This article surveys the programming languages and models used in the ECP application codebase that will be used to achieve performance on future exascale hardware platforms.


2020, Vol 7 (1)
Author(s):  
Marek Nowicki

Abstract: Sorting algorithms are among the most commonly used algorithms in computer science and modern software. Efficient sorting implementations are necessary for a wide spectrum of scientific applications. This paper describes a sorting algorithm written using the partitioned global address space (PGAS) model and implemented with the Parallel Computing in Java (PCJ) library. An iterative description of the implementation is used to outline possible performance issues and the means to resolve them. The key idea of the implementation is to provide an efficient building block that can be easily integrated into many application codes. The paper also compares the performance of the PCJ implementation with a MapReduce approach, using the Apache Hadoop TeraSort implementation. The comparison shows that the PCJ implementation achieves efficiency similar to the Hadoop implementation.
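PCJ itself is a Java library; as a language-neutral illustration, here is a Python sketch of the sample-sort pattern that distributed PGAS sorts of this kind typically follow: splitters are sampled globally, elements are exchanged by splitter rank, and each shard sorts locally. The structure, parameter names, and oversampling factor are assumptions for illustration, not the paper's actual algorithm.

```python
import bisect
import random

def sample_sort(data, n_threads=4, oversample=8):
    """Sequential illustration of a parallel sample sort.

    In a PGAS run, each of the n_threads would own one bucket and the
    bucketing step would be an all-to-all exchange; here everything is
    done in one process to show the data flow.
    """
    # 1. Sample candidate splitters from the whole data set.
    sample = sorted(random.sample(data, min(len(data), n_threads * oversample)))
    splitters = sample[oversample - 1::oversample][:n_threads - 1]
    # 2. "Exchange": bucket every element by its splitter rank.
    buckets = [[] for _ in range(n_threads)]
    for x in data:
        buckets[bisect.bisect_right(splitters, x)].append(x)
    # 3. Each thread sorts its bucket; concatenation is globally sorted.
    out = []
    for b in buckets:
        out.extend(sorted(b))
    return out
```

The building-block character the paper emphasizes comes from step 3: any local sort can be dropped in, and the splitter exchange is the only global communication.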


2020, Vol 23 (1-4)
Author(s):  
Matthias Bolten ◽  
Stephanie Friedhoff ◽  
Jens Hahne ◽  
Sebastian Schöps

Abstract: We apply the multigrid-reduction-in-time (MGRIT) algorithm to an eddy-current simulation of a two-dimensional induction machine supplied by a pulse-width-modulation signal. Resolving the fast-switching excitations requires small time steps, so parallelization in time becomes highly relevant for reducing the simulation time. The MGRIT algorithm is an iterative method that calculates multiple time steps simultaneously using a hierarchy of time grids. It is particularly well suited for introducing time parallelism into the simulation of electrical machines with existing application codes, since MGRIT is non-intrusive and essentially reuses the same time integrator as a traditional time-stepping algorithm. The key difficulty when reusing the time-stepping routines of existing application codes, however, is that the time integrator on coarse time grids must be cheaper than on the fine grid to allow speedup over sequential time stepping on the fine grid. To overcome this difficulty, we reduce the cost of the coarse-level problems by adding spatial coarsening. We investigate the effects of spatial coarsening on MGRIT convergence for two numerical models of an induction machine, one with linear material laws and a full nonlinear model. Parallel results demonstrate significant speedup over sequential time stepping, even for moderate numbers of processors.
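Two-level MGRIT with F-relaxation is equivalent to the parareal iteration: a cheap coarse propagator G corrects expensive fine propagations F that can run concurrently over the time slices. The sketch below shows this on the scalar test equation y' = λy with backward Euler; the problem, step counts, and propagators are illustrative assumptions, not the induction-machine model from the paper.

```python
import numpy as np

lam = -1.0
T, n_coarse, m = 2.0, 8, 10        # coarse intervals, fine steps per interval
dT = T / n_coarse
dt = dT / m

def F(y):
    """Fine propagator: m backward-Euler steps across one coarse interval."""
    for _ in range(m):
        y = y / (1.0 - lam * dt)
    return y

def G(y):
    """Coarse propagator: a single backward-Euler step (cheap)."""
    return y / (1.0 - lam * dT)

def parareal(y0, n_iter):
    U = np.empty(n_coarse + 1)
    U[0] = y0
    for n in range(n_coarse):                  # initial serial coarse sweep
        U[n + 1] = G(U[n])
    for _ in range(n_iter):
        Fv = [F(U[n]) for n in range(n_coarse)]       # parallelizable part
        Gv_old = [G(U[n]) for n in range(n_coarse)]
        for n in range(n_coarse):              # serial coarse correction
            U[n + 1] = G(U[n]) + Fv[n] - Gv_old[n]
    return U
```

The non-intrusiveness the abstract stresses is visible here: F is just the existing time stepper called as a black box, and only the cheap G sweep remains sequential.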


2019, Vol 32 (7)
Author(s):  
Kazuhiko Komatsu ◽  
Ayumu Gomi ◽  
Ryusuke Egawa ◽  
Daisuke Takahashi ◽  
Reiji Suda ◽  
...  

Author(s):  
Sajeeb Saha ◽  
Md. Ahsan Habib ◽  
Sujan Sarkar ◽  
Md. Abdur Razzaque ◽  
Md. Mustafizur Rahman

Building a simple data-sharing environment that monitors data and protects it from unauthorized modification is increasingly essential, and blockchain-based data management in the cloud is one approach. In a virtualized environment, resource allocation plays a significant role in the performance and resource utilization of a data center, and accurate allocation of virtual machines (VMs) in cloud data centers is a central optimization problem in cloud computing. Consolidating VMs dynamically allows providers to optimize resource utilization and reduce energy consumption, while the rapid growth in computational demand from modern service applications drives the establishment of ever larger virtualized data centers; the collaboration of smart connected devices with data analytics further enables applications such as predictive maintenance systems. To obtain near-optimal, feasible results, it is desirable to simulate the allocation algorithms and application codes, which also minimizes development time and cost; experimental results show that such simulation techniques can reduce cache misses and improve execution time. This paper addresses the distribution of tasks and the corresponding implementation mechanisms for virtual machines.
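One common consolidation heuristic of the kind this abstract alludes to is first-fit decreasing (FFD) placement, which packs VMs onto as few hosts as possible so that idle hosts can be powered down. The sketch below is a generic illustration with a single resource dimension (e.g. CPU units); the function name and parameters are assumptions, not an algorithm from the paper.

```python
def ffd_place(vm_demands, host_capacity):
    """Return a list of hosts, each a list of the VM demands placed on it."""
    hosts = []   # hosts[i]: demands placed on host i
    free = []    # free[i]: remaining capacity of host i
    for d in sorted(vm_demands, reverse=True):   # largest VMs first
        for i, f in enumerate(free):
            if d <= f:                # first host with room wins
                hosts[i].append(d)
                free[i] -= d
                break
        else:                         # no host fits: power on a new one
            hosts.append([d])
            free.append(host_capacity - d)
    return hosts
```

Sorting demands in decreasing order is what distinguishes FFD from plain first fit; in bin-packing terms it guarantees a placement within a small constant factor of optimal, which is why it is a standard baseline for energy-aware consolidation studies.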


Author(s):  
Jon Calhoun ◽  
Franck Cappello ◽  
Luke N Olson ◽  
Marc Snir ◽  
William D Gropp

Checkpoint restart plays an important role in high-performance computing (HPC) applications, allowing simulation runtime to extend beyond a single job allocation and facilitating recovery from hardware failure. Yet, as machines grow in size and complexity, traditional approaches to checkpoint restart are becoming prohibitive. Current methods store a subset of the application's state and exploit the memory hierarchy in the machine. However, as the energy cost of data movement continues to dominate, further reductions in checkpoint size are needed. Lossy compression, which can significantly reduce checkpoint sizes, offers the potential to reduce the computational cost of checkpoint restart. This article investigates the use of numerical properties of partial differential equation (PDE) simulations, such as bounds on the truncation error, to evaluate the feasibility of using lossy compression in checkpointing PDE simulations. Restart from a lossy compressed checkpoint is considered for a fail-stop error in two time-dependent HPC application codes: PlasComCM and Nek5000. Results show that the error in application variables due to a restart from a lossy compressed checkpoint can be masked by the numerical error in the discretization, leading to increased efficiency in checkpoint restart without influencing overall accuracy in the simulation.
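The masking argument can be sketched numerically: quantize the checkpointed state to a tolerance chosen well below the scheme's truncation error, so the restart perturbation stays inside the numerical error already present. The uniform quantizer, tolerance, and state below are illustrative assumptions; the actual codes would use a real lossy compressor and PDE-specific error bounds.

```python
import numpy as np

def compress(state, tol):
    """Uniform quantization: store integer multiples of step 2*tol,
    guaranteeing a pointwise reconstruction error of at most tol."""
    q = np.round(state / (2 * tol)).astype(np.int64)
    return q, tol

def decompress(q, tol):
    return q.astype(np.float64) * (2 * tol)

truncation_error = 1e-4            # assumed bound from the PDE discretization
tol = truncation_error / 10        # keep compression error well below it

state = np.sin(np.linspace(0.0, 3.0, 1000))   # stand-in for checkpointed field
ckpt = compress(state, tol)
restored = decompress(*ckpt)
```

Because the restart error (at most `tol`) is an order of magnitude below the discretization error, the resumed simulation cannot distinguish the compressed checkpoint from an exact one at the level of overall accuracy, which is the feasibility criterion the article evaluates.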

