Evaluation of XcalableACC with tightly coupled accelerators/InfiniBand hybrid communication on accelerated cluster

Author(s):  
Masahiro Nakao ◽  
Tetsuya Odajima ◽  
Hitoshi Murai ◽  
Akihiro Tabuchi ◽  
Norihisa Fujita ◽  
...  

Accelerated clusters, which are cluster systems equipped with accelerators, are one of the most common systems in parallel computing. In order to exploit the performance of such systems, it is important to reduce communication latency between accelerator memories. There is also a need for a programming language that facilitates the maintenance of high performance by such systems. The goal of the present article is to evaluate XcalableACC (XACC), a parallel programming language, with tightly coupled accelerators/InfiniBand (TCA/IB) hybrid communication on an accelerated cluster. TCA/IB hybrid communication is a combination of low-latency communication with TCA and high bandwidth with IB. The XACC language, which is a directive-based language for accelerated clusters, enables programmers to use TCA/IB hybrid communication with ease. In order to evaluate the performance of XACC with TCA/IB hybrid communication, we implemented the lattice quantum chromodynamics (LQCD) mini-application and evaluated the application on our accelerated cluster using up to 64 compute nodes. We also implemented the LQCD mini-application using a combination of CUDA and MPI (CUDA + MPI) and that of OpenACC and MPI (OpenACC + MPI) for comparison with XACC. Performance evaluation revealed that the performance of XACC with TCA/IB hybrid communication is 9% better than that of CUDA + MPI and 18% better than that of OpenACC + MPI. Furthermore, the performance of XACC was found to increase by a further 7% with a new extension to XACC. Productivity evaluation revealed that XACC requires much less change from the serial LQCD code to implement the parallel LQCD code than CUDA + MPI and OpenACC + MPI. Moreover, since XACC can perform parallelization while maintaining the sequential code image, XACC is highly readable and shows excellent portability due to its directive-based approach.

1995 ◽  
Vol 06 (05) ◽  
pp. 627-638 ◽  
Author(s):  
ANDREAS FROMMER ◽  
BERTOLD NÖCKEL ◽  
STEPHAN GÜSKEN ◽  
THOMAS LIPPERT ◽  
KLAUS SCHILLING

The computational effort in the calculation of Wilson fermion quark propagators in Lattice Quantum Chromodynamics can be considerably reduced by exploiting the Wilson fermion matrix structure in inversion algorithms based on the non-symmetric Lanczos process. We consider two such methods: QMR (quasi minimal residual) and BCG (biconjugate gradients). Based on the decomposition M/κ = 1/κ − D of the Wilson mass matrix, using QMR, one can carry out inversions on a whole trajectory of masses simultaneously, merely at the computational expense of a single propagator computation. In other words, one has to compute the propagator corresponding to the lightest mass only, while all the heavier masses are given for free, at the price of extra storage. Moreover, the symmetry γ5M = M†γ5 can be used to cut the computational effort in QMR and BCG by a factor of two. We show that both methods then become, in the critical regime of small quark masses, competitive with BiCGStab and significantly better than the standard MR method, with optimal relaxation factor, and CG as applied to the normal equations.
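The reason a single run can serve a whole trajectory of masses can be sketched with the standard shift-invariance property of Krylov subspaces. This is an illustrative argument, not necessarily the authors' derivation; the symbols σ (the shift 1/κ), b (the source vector) and K_m (the m-dimensional Krylov subspace) are notation introduced here, and 1 in the abstract's decomposition is read as the identity matrix.

```latex
\begin{align*}
\frac{M(\kappa)}{\kappa} &= \sigma - D, \qquad \sigma = \frac{1}{\kappa}, \\
\mathcal{K}_m(\sigma - D,\, b)
  &= \operatorname{span}\{\, b,\ (\sigma - D)\,b,\ \dots,\ (\sigma - D)^{m-1} b \,\} \\
  &= \operatorname{span}\{\, b,\ D b,\ \dots,\ D^{m-1} b \,\}
   = \mathcal{K}_m(D,\, b).
\end{align*}
```

Each power of (σ − D) applied to b is a linear combination of b, Db, ..., so the subspace does not depend on σ. Under that reading, the matrix-vector products with D are shared across all values of κ, and only the small per-mass recurrences and the extra solution vectors differ, which is consistent with the "extra storage" cost the abstract mentions.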


Author(s):  
R. Levi-Setti ◽  
J. M. Chabala ◽  
R. Espinosa ◽  
M. M. Le Beau

We have shown previously that isotope-labelled nucleotides in human metaphase chromosomes can be detected and mapped by imaging secondary ion mass spectrometry (SIMS), using the University of Chicago high resolution scanning ion microprobe (UC SIM). These early studies, conducted with BrdU- and 14C-thymidine-labelled chromosomes via detection of the Br and 28CN- (14C14N-) label-carrying signals, provided some evidence for the condensation of the label into banding patterns along the chromatids (SIMS bands) reminiscent of the well known Q- and G-bands obtained by conventional staining methods for optical microscopy. The potential of this technique has been greatly enhanced by the recent upgrade of the UC SIM, now coupled to a high performance magnetic sector mass spectrometer in lieu of the previous RF quadrupole mass filter. The high transmission of the new spectrometer improves the SIMS analytical sensitivity of the microprobe better than a hundredfold, overcoming most of the previous imaging limitations resulting from low count statistics.


SLEEP ◽  
2020 ◽  
Author(s):  
Evan D Chinoy ◽  
Joseph A Cuellar ◽  
Kirbie E Huwa ◽  
Jason T Jameson ◽  
Catherine H Watson ◽  
...  

Study Objectives: Consumer sleep-tracking devices are widely used and becoming more technologically advanced, creating strong interest from researchers and clinicians for their possible use as alternatives to standard actigraphy. We therefore tested the performance of many of the latest consumer sleep-tracking devices, alongside actigraphy, versus the gold-standard sleep assessment technique, polysomnography (PSG). Methods: In total, 34 healthy young adults (22 women; 28.1 ± 3.9 years, mean ± SD) were tested on three consecutive nights (including a disrupted sleep condition) in a sleep laboratory with PSG, along with actigraphy (Philips Respironics Actiwatch 2) and a subset of consumer sleep-tracking devices. Altogether, four wearable (Fatigue Science Readiband, Fitbit Alta HR, Garmin Fenix 5S, Garmin Vivosmart 3) and three non-wearable (EarlySense Live, ResMed S+, SleepScore Max) devices were tested. Sleep/wake summary and epoch-by-epoch agreement measures were compared with PSG. Results: Most devices (Fatigue Science Readiband, Fitbit Alta HR, EarlySense Live, ResMed S+, SleepScore Max) performed as well as or better than actigraphy on sleep/wake performance measures, while the Garmin devices performed worse. Overall, epoch-by-epoch sensitivity was high (all ≥0.93), specificity was low-to-medium (0.18-0.54), sleep stage comparisons were mixed, and devices tended to perform worse on nights with poorer/disrupted sleep. Conclusions: Consumer sleep-tracking devices exhibited high performance in detecting sleep, and most performed equivalently to (or better than) actigraphy in detecting wake. Device sleep stage assessments were inconsistent. Findings indicate that many newer sleep-tracking devices demonstrate promising performance for tracking sleep and wake. Devices should be tested in different populations and settings to further examine their wider validity and utility.
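The epoch-by-epoch sensitivity and specificity quoted above can be illustrated with a minimal sketch. It assumes the usual convention in which sleep is the positive class, so sensitivity is the fraction of PSG sleep epochs the device also scores as sleep and specificity is the fraction of PSG wake epochs it also scores as wake. The function name and the two short label strings are hypothetical and are not data from the study.

```python
def sleep_wake_agreement(device, psg):
    """Return (sensitivity, specificity) for epoch labels 'S' (sleep) / 'W' (wake).

    PSG is treated as the reference; sleep is the positive class (assumed convention).
    """
    if len(device) != len(psg):
        raise ValueError("device and PSG must have the same number of epochs")

    pairs = list(zip(device, psg))
    true_sleep = sum(1 for d, p in pairs if p == "S" and d == "S")
    missed_sleep = sum(1 for d, p in pairs if p == "S" and d == "W")
    true_wake = sum(1 for d, p in pairs if p == "W" and d == "W")
    missed_wake = sum(1 for d, p in pairs if p == "W" and d == "S")

    sensitivity = true_sleep / (true_sleep + missed_sleep)
    specificity = true_wake / (true_wake + missed_wake)
    return sensitivity, specificity


# Hypothetical 20-epoch night: the device misses half of the wake epochs.
psg_epochs = "SSSSSSSSWWWWSSSSSSSS"
device_epochs = "SSSSSSSSSSWWSSSSSSSW"
sensitivity, specificity = sleep_wake_agreement(device_epochs, psg_epochs)
print(f"sensitivity={sensitivity:.4f}, specificity={specificity:.2f}")
```

Because most epochs in a night are sleep, a device that rarely scores wake can still reach high sensitivity, which is why the two measures are reported separately.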


Author(s):  
Bálint Joó ◽  
Mike A. Clark

The QUDA library for optimized lattice quantum chromodynamics using GPUs, combined with a high-level application framework such as the Chroma software system, provides a powerful tool for computing quark propagators, a key step in current calculations of hadron spectroscopy, nuclear structure, and nuclear forces. In this contribution we discuss our experiences, including performance and strong scaling of the QUDA library and Chroma on the Edge Cluster at Lawrence Livermore National Laboratory and on various clusters at Jefferson Lab. We highlight some scientific successes and consider future directions for graphics processing units in lattice quantum chromodynamics calculations.


2019 ◽  
Vol 485 (3) ◽  
pp. 3370-3377 ◽  
Author(s):  
Lehman H Garrison ◽  
Daniel J Eisenstein ◽  
Philip A Pinto

We present a high-fidelity realization of the cosmological N-body simulation from the Schneider et al. code comparison project. The simulation was performed with our Abacus N-body code, which offers high force accuracy, high performance, and minimal particle integration errors. The simulation consists of $2048^3$ particles in a $500\ h^{-1}\, \mathrm{Mpc}$ box for a particle mass of $1.2\times 10^9\ h^{-1}\, \mathrm{M}_\odot$ with $10\ h^{-1}\, \mathrm{kpc}$ spline softening. Abacus executed 1052 global time-steps to z = 0 in 107 h on one dual-Xeon, dual-GPU node, for a mean rate of 23 million particles per second per step. We find Abacus is in good agreement with Ramses and Pkdgrav3 and less so with Gadget3. We validate our choice of time-step by halving the step size and find sub-percent differences in the power spectrum and 2PCF at nearly all measured scales, with $<0.3$ per cent errors at $k < 10\ h\, \mathrm{Mpc}^{-1}$. On large scales, Abacus reproduces linear theory better than 0.01 per cent. Simulation snapshots are available at http://nbody.rc.fas.harvard.edu/public/S2016.
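The quoted mean rate can be checked from the other figures in the abstract. The sketch below assumes the rate means total particle updates (particles times time-steps) divided by wall-clock seconds; the function name is hypothetical.

```python
def mean_particle_rate(particles_per_dim, time_steps, wall_hours):
    """Mean particle updates per second: (N^3 particles * steps) / wall-clock seconds."""
    particles = particles_per_dim ** 3
    wall_seconds = wall_hours * 3600
    return particles * time_steps / wall_seconds


# Figures taken from the abstract: 2048^3 particles, 1052 steps, 107 hours.
rate = mean_particle_rate(2048, 1052, 107)
print(f"{rate / 1e6:.1f} million particle updates per second")
```

Under that reading the result rounds to 23 million, in line with the stated figure.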


1998 ◽  
Vol 514 ◽  
Author(s):  
D. Edelstein

Recently IBM announced the first implementation of full copper ULSI wiring in a CMOS technology, to be manufactured on its high-performance 0.22 µm CMOS products this year. Features of this technology will be presented, as well as functional verification on CMOS chips. To reach this level, extensive yield, reliability, and stress testing had to be done on test and product-like chips, including those packaged into product modules. Data will be presented from all aspects of this testing, ranging from experiments designed to promote Cu contamination of the MOS devices, to temperature/humidity/bias stressing of assembled functional modules. The results in all areas are shown to be equal to or better than standards set by our current Al(Cu)/W-stud technology. This demonstrates that the potential problems associated with copper wiring that have long been discussed can be overcome.


Author(s):  
P. Laurent ◽  
F. Acero ◽  
V. Beckmann ◽  
S. Brandt ◽  
F. Cangemi ◽  
...  

Based upon dual focusing techniques, the Polarimetric High-Energy Modular Telescope Observatory (PHEMTO) is designed to have performance several orders of magnitude better than the present hard X-ray instruments, in the 1–600 keV energy range. This, together with its angular resolution of around one arcsecond, and its sensitive polarimetry measurement capability, will give PHEMTO the improvements in scientific performance needed for a mission in the 2050 era in order to study AGN, galactic black holes, neutron stars, and supernovae. In addition, its high performance will enable the study of the non-thermal processes in galaxy clusters with an unprecedented accuracy.


2021 ◽  
Vol 0 (0) ◽  
Author(s):  
Hang Li ◽  
Jiamin Liu ◽  
Zhanzhong Wang ◽  
Xiaodong Liu ◽  
Xichun Yan ◽  
...  

With chili and liquid beef tallow as the main raw materials, the processing conditions of chili flavor beef tallow were explored. Gas chromatography-ion mobility spectrometry (GC-IMS) was used to determine the volatile compounds in chili flavor beef tallow. The capsaicin and dihydrocapsaicin in chili flavor beef tallow were determined by high performance liquid chromatography (HPLC). The optimum technological conditions were determined, and the chromatic aberration index and cholesterol content were also determined. Based on GC-IMS analysis, 102 kinds of volatile compounds were detected, and sample III (solid–liquid ratio of 1:5, frying temperature of 120 °C, and frying time of 15 min) performed better than the other samples. The preparation of chili beef tallow improves its antioxidant activity and makes its aroma more intense and more in line with the taste of Chinese people, which provides a theoretical and practical basis for the development of spice beef tallow in the future.

