A Debugging Standard for High-Performance Computing

2000 ◽  
Vol 8 (2) ◽  
pp. 95-108 ◽  
Author(s):  
Joan M. Francioni ◽  
Cherri M. Pancake

Throughout 1998, the High Performance Debugging Forum worked on defining a base-level standard for high performance debuggers. The standard had to meet the sometimes conflicting constraints of being useful to users, realistically implementable by developers, and architecturally independent across multiple platforms. To meet criteria for timeliness, the standard had to be defined in one year and in such a way that it could be implemented within an additional year. The Forum was successful, and in November 1998 released Version 1 of the HPD Standard. Implementations of the standard are currently underway. This paper presents an overview of Version 1 of the standard and an analysis of the process by which the standard was developed. The status of implementation efforts and plans for follow-on efforts are discussed as well.

2020 ◽  
Vol 245 ◽  
pp. 07006
Author(s):  
Cécile Cavet ◽  
Martin Souchal ◽  
Sébastien Gadrat ◽  
Gilles Grasseau ◽  
Andrea Satirana ◽  
...  

The High Performance Computing (HPC) domain aims to optimize code in order to exploit the latest multicore and parallel technologies, including specific processor instructions. In this computing framework, portability and reproducibility are key concepts. One way to meet these requirements is to use Linux containers. These “light virtual machines” make it possible to encapsulate an application together with its environment in Linux processes. Containers have recently been rediscovered because they provide both a multi-infrastructure environment for developers and system administrators and reproducibility through the image build file. Two container solutions are emerging: Docker for microservices and Singularity for computing applications. We present here the status of the ComputeOps project, whose goal is to study the benefit of containers for HPC applications.
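As a minimal sketch of the encapsulation idea (not taken from the ComputeOps project; the image name, bind path, and application command below are assumptions), an HPC application can be launched inside a Singularity image so the same software environment is reproduced on any host that provides the container runtime:

```python
import subprocess

# Hypothetical image and paths; real names depend on the site setup.
IMAGE = "computeops_app.sif"    # assumed Singularity image built from a definition file
WORKDIR = "/scratch/run01"      # assumed host directory bound into the container

def run_in_container(command):
    """Run `command` inside the container; the host only needs the runtime."""
    return subprocess.run(
        ["singularity", "exec", "--bind", f"{WORKDIR}:/data", IMAGE] + command,
        check=True,
    )

if __name__ == "__main__":
    # The application and all of its libraries come from the image, not the host.
    run_in_container(["mpirun", "-np", "4", "/opt/app/simulate", "/data/input.cfg"])
```

Because the image file fully describes the software stack, the same invocation can be reproduced on another cluster without reinstalling dependencies.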


Author(s):  
Claus-Peter Rückemann

This chapter gives a comprehensive overview of the current status of accounting and billing for up-to-date computing environments. Accounting is the key to managing information system resources. At this stage in the evolution of accounting systems it is appropriate not to separate computing environments into High Performance Computing and Grid Computing environments, which allows a “holistic” view of the different approaches and of the state of the art for integrated accounting and billing in distributed computing environments. Requirements gathered from a public survey across all communities of the German Grid infrastructure, from computing centres and resource providers of High Performance Computing resources such as HLRN and ZIVGrid within the German e-Science framework, and from various information systems and the virtualisation of organisations and resources have been considered. Conceptual, technical, economic, and legal questions also had to be taken into account. Now that the requirements have been consolidated and the implementations were completed over a year ago, the overall results and conclusions are presented in the following sections as a case study based on the GISIG framework and the Grid-GIS framework. The focus is on how an integrated architecture can be built and used in heterogeneous environments. A prototypical implementation is outlined that is able to manage and visualise relevant accounting and billing information, based on suitable monitoring data, in a virtual organisation (VO) specific way while addressing basic business, economic, and security issues.
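As a minimal sketch of the kind of VO-specific aggregation such an architecture must support (the record fields and rates below are illustrative assumptions, not the GISIG or Grid-GIS data model), usage records from heterogeneous resources could be normalised and charged per virtual organisation before billing:

```python
from collections import defaultdict
from dataclasses import dataclass

@dataclass
class UsageRecord:
    """Normalised monitoring record; field names are illustrative assumptions."""
    vo: str            # virtual organisation the job is charged to
    resource: str      # e.g. an HPC centre or Grid site
    cpu_hours: float

# Assumed per-resource price list (currency units per CPU hour).
RATES = {"HLRN": 0.05, "ZIVGrid": 0.03}

def bill_per_vo(records):
    """Aggregate CPU hours per VO and apply the resource-specific rate."""
    totals = defaultdict(float)
    for r in records:
        totals[r.vo] += r.cpu_hours * RATES.get(r.resource, 0.0)
    return dict(totals)

if __name__ == "__main__":
    demo = [UsageRecord("vo-climate", "HLRN", 1200.0),
            UsageRecord("vo-climate", "ZIVGrid", 300.0),
            UsageRecord("vo-physics", "HLRN", 800.0)]
    print(bill_per_vo(demo))  # per-VO charges for the billing period
```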


2020 ◽  
Vol 245 ◽  
pp. 07060
Author(s):  
Ran Du ◽  
Jingyan Shi ◽  
Xiaowei Jiang ◽  
Jiaheng Zou

HTCondor was adopted to manage the High Throughput Computing (HTC) cluster at IHEP in 2016. In 2017 a Slurm cluster was set up to run High Performance Computing (HPC) jobs. To provide accounting services for these two clusters, we implemented a unified accounting system named Cosmos. Multiple workloads bring different accounting requirements. Briefly speaking, there are four types of jobs to account for. First, 30 million single-core jobs run in the HTCondor cluster every year. Second, Virtual Machine (VM) jobs run in the legacy HTCondor VM cluster. Third, parallel jobs run in the Slurm cluster, and some of these jobs run on GPU worker nodes to accelerate computing. Lastly, selected HTC jobs are migrated from the HTCondor cluster to the Slurm cluster for research purposes. To satisfy all of these requirements, Cosmos is implemented with four layers: acquisition, integration, statistics and presentation. Details of the issues and solutions at each layer are presented in the paper. Cosmos has run in production for two years, and its status shows that it is a well-functioning system that meets the requirements of both the HTCondor and Slurm clusters.
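A minimal sketch of the integration idea, using hypothetical field names rather than the actual Cosmos schema: job records collected from the two batch systems are mapped onto one unified accounting record before the statistics layer aggregates them.

```python
from dataclasses import dataclass

@dataclass
class AccountingRecord:
    """Unified record consumed by the statistics layer; field names are assumptions."""
    cluster: str        # "htcondor" or "slurm"
    user: str
    cpu_seconds: float
    gpu_seconds: float = 0.0

def from_htcondor(ad):
    """Map a simplified HTCondor job ClassAd (given as a dict) to the unified record."""
    return AccountingRecord(cluster="htcondor",
                            user=ad["Owner"],
                            cpu_seconds=float(ad["RemoteWallClockTime"]))

def from_slurm(row):
    """Map a simplified Slurm accounting row (given as a dict) to the unified record.
    The GPU field is an assumed, site-specific extension, not a standard sacct column."""
    return AccountingRecord(cluster="slurm",
                            user=row["User"],
                            cpu_seconds=float(row["CPUTimeRAW"]),
                            gpu_seconds=float(row.get("gpu_seconds", 0.0)))

if __name__ == "__main__":
    records = [from_htcondor({"Owner": "alice", "RemoteWallClockTime": 3600}),
               from_slurm({"User": "bob", "CPUTimeRAW": 7200, "gpu_seconds": 1800})]
    print("total CPU seconds accounted:", sum(r.cpu_seconds for r in records))
```

Once every job, whether single-core HTC, VM, parallel or GPU, is expressed in one schema, per-user and per-cluster statistics can be produced by a single presentation layer.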


MRS Bulletin ◽  
1997 ◽  
Vol 22 (10) ◽  
pp. 5-6
Author(s):  
Horst D. Simon

Recent events in the high-performance computing industry have raised concern among scientists and the general public about a crisis or a lack of leadership in the field. That concern is understandable considering the industry's history from 1993 to 1996. Cray Research, the historic leader in supercomputing technology, was unable to survive financially as an independent company and was acquired by Silicon Graphics. Two ambitious new companies that introduced new technologies in the late 1980s and early 1990s—Thinking Machines and Kendall Square Research—were commercial failures and went out of business. And Intel, which introduced its Paragon supercomputer in 1994, discontinued production only two years later. During the same time frame, scientists who had finished the laborious task of writing scientific codes to run on vector parallel supercomputers learned that those codes would have to be rewritten if they were to run on the next-generation, highly parallel architecture. Scientists who are not yet involved in high-performance computing are understandably hesitant about committing their time and energy to such an apparently unstable enterprise. However, beneath the commercial chaos of the last several years, a technological revolution has been occurring. The good news is that the revolution is over, leading to five to ten years of predictable stability, steady improvements in system performance, and increased productivity for scientific applications. It is time for scientists who were sitting on the fence to jump in and reap the benefits of the new technology.

