Low-latency message passing on workstation clusters using SCRAMNet

Author(s):  
Vijay Moorthy ◽  
Matthew G. Jacunski ◽  
Manoj Pillai ◽  
Peter P. Ware ◽  
Dhabaleswar K. Panda ◽  
...  

2019 ◽  
Vol 31 (2) ◽  
Author(s):  
Sean Pennefather ◽  
Karen Bradshaw ◽  
Barry Irwin

We present the design and implementation of an indirect messaging extension for the existing NFComms framework that provides communication between a network flow processor and host CPU. This extension addresses the bulk throughput limitations of the framework and is intended to work in conjunction with existing communication media. Testing of the framework extensions shows a throughput improvement of up to 268x over the current direct message passing framework, at the cost of up to a 2x increase in single-message latency. This trade-off is acceptable because the proposed extensions are intended for bulk data transfer only; the existing message passing functionality of the framework is preserved and can still be used where low latency is required for small messages.
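
The size-based routing between a low-latency direct channel and a high-throughput indirect channel that this abstract describes can be pictured with a short sketch. Everything below is illustrative: the function names (nf_send_direct, nf_send_indirect) and the 4 KiB cutoff are invented placeholders, not the NFComms API.

#include <stddef.h>
#include <stdio.h>

#define BULK_THRESHOLD 4096  /* assumed cutoff; a real system would tune this */

/* Placeholder transports standing in for the framework's two channels. */
static int nf_send_direct(const void *buf, size_t len)
{
    (void)buf;                                  /* low latency, limited throughput */
    printf("direct send: %zu bytes\n", len);
    return 0;
}

static int nf_send_indirect(const void *buf, size_t len)
{
    (void)buf;                                  /* bulk throughput, ~2x latency */
    printf("indirect send: %zu bytes\n", len);
    return 0;
}

/* Route each message to the channel that suits its size. */
static int nf_send(const void *buf, size_t len)
{
    return len <= BULK_THRESHOLD ? nf_send_direct(buf, len)
                                 : nf_send_indirect(buf, len);
}

int main(void)
{
    static char small[64], large[1 << 16];
    nf_send(small, sizeof small);  /* latency-sensitive path */
    nf_send(large, sizeof large);  /* bulk-transfer path */
    return 0;
}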


2020 ◽  
Vol 245 ◽  
pp. 09016
Author(s):  
Maria Alandes Pradillo ◽  
Nils Høimyr ◽  
Pablo Llopis Sanmillan ◽  
Markus Tapani Jylhänkangas

The CERN IT department has been maintaining different High Performance Computing (HPC) services over the past five years. While the bulk of computing facilities at CERN run under Linux, a Windows cluster was dedicated to engineering simulations and analysis related to accelerator technology development. The Windows cluster consisted of machines with powerful CPUs, large memory, and a low-latency interconnect. The Linux cluster resources are accessible through HTCondor and are used for general-purpose, single-node parallel jobs, providing computing power to the CERN experiments and departments for tasks such as physics event reconstruction, data analysis, and simulation. For HPC workloads that require multi-node parallel environments for Message Passing Interface (MPI) based programs, there is another Linux-based HPC service comprising several clusters running under the Slurm batch system, built from powerful hardware with low-latency interconnects. In 2018, it was decided to consolidate compute-intensive jobs on Linux to make better use of the existing resources. This was also in line with the CERN IT strategy of reducing dependencies on Microsoft products. This paper focuses on the migration of Ansys [1], COMSOL [2] and CST [3] users from Windows HPC to Linux clusters. Ansys, COMSOL and CST are three engineering applications used at CERN in different domains, such as multiphysics simulations and electromagnetic field problems. Users of these applications are spread across departments, with different needs and levels of expertise; in most cases they have no prior knowledge of Linux. The paper presents the technical strategy for letting engineering users submit their simulations to the appropriate Linux cluster, depending on their simulation requirements. We also describe the technical solution that integrates their Windows workstations so that they can submit to Linux clusters. Finally, we discuss the challenges and lessons learnt during the migration.
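
For context, the multi-node MPI jobs that the Slurm-based service exists to run are, at their simplest, programs of the following shape. This is a generic C/MPI illustration, not code from the paper, and the srun invocation in the comment is only an example of a Slurm launch.

#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    int rank, size, len;
    char host[MPI_MAX_PROCESSOR_NAME];

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);
    MPI_Get_processor_name(host, &len);

    /* Under Slurm, a launch such as "srun -N 4 -n 16 ./mpi_job"
       spreads these ranks across four nodes connected by the
       low-latency interconnect. */
    printf("rank %d of %d on %s\n", rank, size, host);

    MPI_Finalize();
    return 0;
}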


2019 ◽  
Vol 95 ◽  
pp. 629-638 ◽  
Author(s):  
Jie Guo ◽  
Bin Song ◽  
Yuhao Chi ◽  
Lahiru Jayasinghe ◽  
Chau Yuen ◽  
...  

2001 ◽  
Vol 11 (01) ◽  
pp. 41-56 ◽  
Author(s):  
MARCO DANELUTTO

Beowulf-class clusters are gaining more and more interest as low-cost parallel architectures: they deliver reasonable performance at a very reasonable cost compared to classical MPP machines. Parallel applications are usually developed on clusters using MPI/PVM message passing or HPF programming environments. Here we discuss new implementation strategies to support structured, skeleton-based parallel programming environments on clusters. Adopting structured parallel programming models greatly reduces the time spent developing new parallel applications on clusters, and our implementation techniques based on macro data flow allow very efficient parallel applications to be developed. We discuss experiments that demonstrate the full feasibility of the approach.
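
The macro data flow technique behind these implementation strategies can be sketched briefly: the skeleton program is compiled into a graph of coarse-grained instructions, and each instruction fires once all of its input tokens have arrived. The sketch below is a minimal single-threaded illustration under that reading; a real runtime would hand fireable instructions to a pool of workers across the cluster, and none of the names come from the paper.

#include <stdio.h>
#include <stdlib.h>

/* One macro data flow instruction: a coarse-grained operation that
   fires when all of its input tokens are available. */
typedef struct mdf_instr {
    int missing;               /* input tokens not yet delivered */
    void *inputs[2];
    void *(*fn)(void **in);    /* the "macro" operation */
    struct mdf_instr *dest;    /* instruction receiving the result token */
    int dest_slot;
} mdf_instr;

/* Deliver a token; fire the instruction once it becomes complete. */
static void mdf_deliver(mdf_instr *i, int slot, void *token)
{
    i->inputs[slot] = token;
    if (--i->missing == 0) {
        void *out = i->fn(i->inputs);
        if (i->dest)
            mdf_deliver(i->dest, i->dest_slot, out);
        else
            printf("result token: %d\n", *(int *)out);
    }
}

static void *add(void **in)    /* example macro operation */
{
    int *r = malloc(sizeof *r);
    *r = *(int *)in[0] + *(int *)in[1];
    return r;
}

int main(void)
{
    int a = 1, b = 2, c = 3;
    mdf_instr sum2 = { 2, { NULL, NULL }, add, NULL, 0 };  /* (a+b)+c */
    mdf_instr sum1 = { 2, { NULL, NULL }, add, &sum2, 0 };

    mdf_deliver(&sum1, 0, &a);
    mdf_deliver(&sum1, 1, &b);  /* sum1 fires, feeds sum2 slot 0 */
    mdf_deliver(&sum2, 1, &c);  /* sum2 fires, prints 6 */
    return 0;
}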


1997 ◽  
Vol 15 (10) ◽  
pp. 1369-1377 ◽  
Author(s):  
M. I. Beare ◽  
D. P. Stevens

Abstract. This paper presents the development of a general-purpose parallel ocean circulation model, for use on a wide range of computer platforms, from traditional scalar machines to workstation clusters and massively parallel processors. Parallelism is provided, as a modular option, via high-level message-passing routines, thus hiding the technical intricacies from the user. An initial implementation shows that the parallel efficiency of the model is adversely affected by a number of factors, for which optimisations are discussed and implemented. The resulting ocean code is portable and, in particular, allows science to be achieved on local workstations that could otherwise only be undertaken on state-of-the-art supercomputers.
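
The "modular option" described above, high-level message-passing routines that hide the parallel machinery from the user, can be pictured as follows. This is an assumption-laden C/MPI sketch (the routine name halo_exchange is invented, and the model itself is not written in C): the same call performs a real exchange in a parallel build and compiles to a no-op in a serial one, so the science code is identical on workstations and supercomputers.

#include <stdio.h>
#ifdef PARALLEL
#include <mpi.h>
#endif

/* Exchange halo rows of a 1-D decomposed subdomain with the
   processes above and below. Serial builds do nothing. */
static void halo_exchange(double *field, int nx, int ny)
{
#ifdef PARALLEL
    int rank, size;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);
    int up   = rank + 1 < size ? rank + 1 : MPI_PROC_NULL;
    int down = rank > 0        ? rank - 1 : MPI_PROC_NULL;

    /* top interior row goes up; bottom halo row arrives from below */
    MPI_Sendrecv(&field[(ny - 2) * nx], nx, MPI_DOUBLE, up, 0,
                 &field[0],             nx, MPI_DOUBLE, down, 0,
                 MPI_COMM_WORLD, MPI_STATUS_IGNORE);
    /* bottom interior row goes down; top halo row arrives from above */
    MPI_Sendrecv(&field[nx],            nx, MPI_DOUBLE, down, 1,
                 &field[(ny - 1) * nx], nx, MPI_DOUBLE, up, 1,
                 MPI_COMM_WORLD, MPI_STATUS_IGNORE);
#else
    (void)field; (void)nx; (void)ny;   /* serial build: nothing to do */
#endif
}

int main(void)
{
#ifdef PARALLEL
    MPI_Init(NULL, NULL);
#endif
    static double field[16 * 16];
    halo_exchange(field, 16, 16);
    printf("halo exchange complete\n");
#ifdef PARALLEL
    MPI_Finalize();
#endif
    return 0;
}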

