High performance communication using a commodity network for cluster systems

Author(s):  
S. Sumimoto ◽  
H. Tezuka ◽  
A. Hori ◽  
H. Harada ◽  
T. Takahashi ◽  
...  
2019 ◽  
pp. 39-44
Author(s):  
Mikhail Rovnyagin ◽  
◽  
Iliya Chugunkov ◽  
Natalya Savchenko ◽  
◽  
...  

Author(s):  
Arutyun I. Avetisyan ◽  
Varvara V. Babkova ◽  
Sergey S. Gaissaryan ◽  
Alexander Yu. Gubar

Author(s):  
Masahiro Nakao ◽  
Tetsuya Odajima ◽  
Hitoshi Murai ◽  
Akihiro Tabuchi ◽  
Norihisa Fujita ◽  
...  

Accelerated clusters, which are cluster systems equipped with accelerators, are one of the most common systems in parallel computing. In order to exploit the performance of such systems, it is important to reduce communication latency between accelerator memories. In addition, there is also a need for a programming language that facilitates the maintenance of high performance by such systems. The goal of the present article is to evaluate XcalableACC (XACC), a parallel programming language, with tightly coupled accelerators/InfiniBand (TCAs/IB) hybrid communication on an accelerated cluster. TCA/IB hybrid communication is a combination of low-latency communication with TCA and high bandwidth with IB. The XACC language, which is a directive-based language for accelerated clusters, enables programmers to use TCA/IB hybrid communication with ease. In order to evaluate the performance of XACC with TCA/IB hybrid communication, we implemented the lattice quantum chromodynamics (LQCD) mini-application and evaluated the application on our accelerated cluster using up to 64 compute nodes. We also implemented the LQCD mini-application using a combination of CUDA and MPI (CUDA + MPI) and that of OpenACC and MPI (OpenACC + MPI) for comparison with XACC. Performance evaluation revealed that the performance of XACC with TCA/IB hybrid communication is 9% better than that of CUDA + MPI and 18% better than that of OpenACC + MPI. Furthermore, the performance of XACC was found to further increase by 7% by new expansion to XACC. Productivity evaluation revealed that XACC requires much less change from the serial LQCD code to implement the parallel LQCD code than CUDA + MPI and OpenACC + MPI. Moreover, since XACC can perform parallelization while maintaining the sequential code image, XACC is highly readable and shows excellent portability due to its directive-based approach.


2021 ◽  
Vol 2021 ◽  
pp. 1-10
Author(s):  
Bingzheng Li ◽  
Jinchen Xu ◽  
Zijing Liu

With the development of high-performance computing and big data applications, the scale of data transmitted, stored, and processed by high-performance computing cluster systems is increasing explosively. Efficient compression of large-scale data and reducing the space required for data storage and transmission is one of the keys to improving the performance of high-performance computing cluster systems. In this paper, we present SW-LZMA, a parallel design and optimization of LZMA based on the Sunway 26010 heterogeneous many-core processor. Combined with the characteristics of SW26010 processors, we analyse the storage space requirements, memory access characteristics, and hotspot functions of the LZMA algorithm and implement the thread-level parallelism of the LZMA algorithm based on Athread interface. Furthermore, we make a fine-grained layout of LDM address space to achieve DMA double buffer cyclic sliding window algorithm, which optimizes the performance of SW-LZMA. The experimental results show that compared with the serial baseline implementation of LZMA, the parallel LZMA algorithm obtains a maximum speedup ratio of 4.1 times using the Silesia corpus benchmark, while on the large-scale data set, speedup is 5.3 times.


Author(s):  
Bobby Minola Ginting ◽  
Punit Kumar Bhola ◽  
Christoph Ertl ◽  
Ralf-Peter Mundani ◽  
Markus Disse ◽  
...  

Author(s):  
A. V. Crewe ◽  
M. Isaacson ◽  
D. Johnson

A double focusing magnetic spectrometer has been constructed for use with a field emission electron gun scanning microscope in order to study the electron energy loss mechanism in thin specimens. It is of the uniform field sector type with curved pole pieces. The shape of the pole pieces is determined by requiring that all particles be focused to a point at the image slit (point 1). The resultant shape gives perfect focusing in the median plane (Fig. 1) and first order focusing in the vertical plane (Fig. 2).


Author(s):  
N. Yoshimura ◽  
K. Shirota ◽  
T. Etoh

One of the most important requirements for a high-performance EM, especially an analytical EM using a fine beam probe, is to prevent specimen contamination by providing a clean high vacuum in the vicinity of the specimen. However, in almost all commercial EMs, the pressure in the vicinity of the specimen under observation is usually more than ten times higher than the pressure measured at the punping line. The EM column inevitably requires the use of greased Viton O-rings for fine movement, and specimens and films need to be exchanged frequently and several attachments may also be exchanged. For these reasons, a high speed pumping system, as well as a clean vacuum system, is now required. A newly developed electron microscope, the JEM-100CX features clean high vacuum in the vicinity of the specimen, realized by the use of a CASCADE type diffusion pump system which has been essentially improved over its predeces- sorD employed on the JEM-100C.


Author(s):  
John W. Coleman

In the design engineering of high performance electromagnetic lenses, the direct conversion of electron optical design data into drawings for reliable hardware is oftentimes difficult, especially in terms of how to mount parts to each other, how to tolerance dimensions, and how to specify finishes. An answer to this is in the use of magnetostatic analytics, corresponding to boundary conditions for the optical design. With such models, the magnetostatic force on a test pole along the axis may be examined, and in this way one may obtain priority listings for holding dimensions, relieving stresses, etc..The development of magnetostatic models most easily proceeds from the derivation of scalar potentials of separate geometric elements. These potentials can then be conbined at will because of the superposition characteristic of conservative force fields.


Author(s):  
J W Steeds ◽  
R Vincent

We review the analytical powers which will become more widely available as medium voltage (200-300kV) TEMs with facilities for CBED on a nanometre scale come onto the market. Of course, high performance cold field emission STEMs have now been in operation for about twenty years, but it is only in relatively few laboratories that special modification has permitted the performance of CBED experiments. Most notable amongst these pioneering projects is the work in Arizona by Cowley and Spence and, more recently, that in Cambridge by Rodenburg and McMullan.There are a large number of potential advantages of a high intensity, small diameter, focussed probe. We discuss first the advantages for probes larger than the projected unit cell of the crystal under investigation. In this situation we are able to perform CBED on local regions of good crystallinity. Zone axis patterns often contain information which is very sensitive to thickness changes as small as 5nm. In conventional CBED, with a lOnm source, it is very likely that the information will be degraded by thickness averaging within the illuminated area.


Sign in / Sign up

Export Citation Format

Share Document