HCGrid: A convolution-based gridding framework for radio astronomy in hybrid computing environments

Author(s):  
Hao Wang ◽  
Ce Yu ◽  
Bo Zhang ◽  
Jian Xiao ◽  
Qi Luo

Gridding, which maps non-uniform data samples onto a uniformly distributed grid, is one of the key steps in the radio astronomical data reduction process. One of its main bottlenecks is poor computing performance, and a typical solution is implementation on multi-core CPU platforms. Although such a method usually achieves good results, in many cases the performance of gridding is still restricted by the limitations of the CPU, since the main workload of gridding consists of a large number of single-instruction, multiple-data operations, which are better suited to GPU than to CPU implementations. To meet the challenge of massive data gridding for modern large single-dish radio telescopes, e.g. the Five-hundred-meter Aperture Spherical radio Telescope (FAST), and inspired by existing multi-core CPU gridding algorithms such as Cygrid, we present HCGrid, an easy-to-install, high-performance, and open-source convolutional gridding framework for CPU-GPU heterogeneous platforms. It optimises data search by employing multi-threading on the CPU and accelerates the convolution process through massive parallelisation on the GPU. To make HCGrid a more adaptive solution, we also propose strategies for thread organisation and coarsening, as well as optimal parameter settings under various GPU architectures. A thorough analysis of computing time and performance gain with several GPU parallel optimisation strategies shows that HCGrid achieves excellent performance in hybrid computing environments.
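For readers unfamiliar with the operation, the sketch below illustrates the basic idea of convolution-based gridding with a truncated Gaussian kernel in plain Python. It is a minimal illustration only: the function name, kernel choice, and parameters are assumptions, and it does not reflect HCGrid's actual multi-threaded CPU search or GPU convolution kernels.

```python
# Minimal sketch of convolution-based gridding, assuming a Gaussian
# kernel and simple flat coordinates; not the actual HCGrid code.
import numpy as np

def grid_samples(x, y, values, grid_x, grid_y, kernel_sigma):
    """Map irregular samples (x, y, values) onto a regular grid by
    convolving each sample with a Gaussian kernel and normalising by
    the accumulated kernel weights."""
    grid = np.zeros((grid_y.size, grid_x.size))
    weights = np.zeros_like(grid)
    support = 3.0 * kernel_sigma          # truncate kernel at 3 sigma
    for xs, ys, vs in zip(x, y, values):
        # only grid cells within the kernel support contribute
        ix = np.where(np.abs(grid_x - xs) <= support)[0]
        iy = np.where(np.abs(grid_y - ys) <= support)[0]
        for j in iy:
            for i in ix:
                r2 = (grid_x[i] - xs) ** 2 + (grid_y[j] - ys) ** 2
                w = np.exp(-0.5 * r2 / kernel_sigma ** 2)
                grid[j, i] += w * vs
                weights[j, i] += w
    mask = weights > 0
    grid[mask] /= weights[mask]
    return grid
```

The inner accumulation over grid cells is the part that is naturally data-parallel, which is why the abstract argues it maps better to a GPU than to a CPU.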

2014 ◽  
Vol 22 (2) ◽  
pp. 93-108 ◽  
Author(s):  
Indrani Paul ◽  
Vignesh Ravi ◽  
Srilatha Manne ◽  
Manish Arora ◽  
Sudhakar Yalamanchili

This paper examines energy management in a heterogeneous processor consisting of an integrated CPU–GPU for high-performance computing (HPC) applications. Energy management for HPC applications is challenged by their uncompromising performance requirements and complicated by the need for coordinating energy management across distinct core types – a new and less understood problem. We examine the intra-node CPU–GPU frequency sensitivity of HPC applications on tightly coupled CPU–GPU architectures as the first step in understanding power and performance optimization for a heterogeneous multi-node HPC system. The insights from this analysis form the basis of a coordinated energy management scheme, called DynaCo, for integrated CPU–GPU architectures. We implement DynaCo on a modern heterogeneous processor and compare its performance to a state-of-the-art power- and performance-management algorithm. DynaCo improves measured average energy-delay-squared (ED²) product by up to 30% with less than 2% average performance loss across several exascale and other HPC workloads.
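The ED² figure of merit reported above can be stated compactly. The sketch below shows the metric with placeholder numbers, not measurements from the paper:

```python
# Hedged sketch of the energy-delay-squared (ED^2) metric used to
# compare energy-management policies; the inputs below are
# hypothetical placeholders, not results from the paper.
def ed2(energy_joules, runtime_seconds):
    """Energy-delay-squared product: lower is better."""
    return energy_joules * runtime_seconds ** 2

baseline = ed2(energy_joules=1200.0, runtime_seconds=10.0)   # hypothetical
managed  = ed2(energy_joules=1000.0, runtime_seconds=10.2)   # hypothetical
improvement = 1.0 - managed / baseline
print(f"ED^2 improvement: {improvement:.1%}")
```

Because delay enters squared, the metric penalises policies that save energy at the cost of large slowdowns, which matches the paper's emphasis on keeping performance loss below 2%.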


2021 ◽  
Vol 47 (2) ◽  
pp. 1-4
Author(s):  
Sarah Osborn

The article by Flegar et al. titled “Adaptive Precision Block-Jacobi for High Performance Preconditioning in the Ginkgo Linear Algebra Software” presents a novel, practical implementation of an adaptive precision block-Jacobi preconditioner. Performance results using state-of-the-art GPU architectures for the block-Jacobi preconditioner generation and application demonstrate the practical usability of the method, compared to a traditional full-precision block-Jacobi preconditioner. A production-ready implementation is provided in the Ginkgo numerical linear algebra library. In this report, the Ginkgo library is reinstalled, and performance results are regenerated and compared to the original results, using Ginkgo's Conjugate Gradient solver with either the full-precision or the adaptive-precision block-Jacobi preconditioner on a suite of test problems on an NVIDIA GPU accelerator. After completing this process, the published results are deemed reproducible.
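As a rough illustration of the idea behind adaptive precision block-Jacobi preconditioning, the sketch below inverts each diagonal block and stores it in a precision chosen from its condition number. The thresholds and helper names are assumptions for illustration; Ginkgo's production implementation is different and GPU-resident.

```python
# Minimal sketch of an adaptive-precision block-Jacobi preconditioner:
# each diagonal block is inverted in double precision, then stored in
# a lower precision if its conditioning permits. Illustration only.
import numpy as np

def build_adaptive_block_jacobi(A, block_starts):
    blocks = []
    for s, e in zip(block_starts[:-1], block_starts[1:]):
        D = A[s:e, s:e]
        D_inv = np.linalg.inv(D)
        cond = np.linalg.cond(D)
        # crude thresholds standing in for a proper rounding-error analysis
        if cond < 1e3:
            D_inv = D_inv.astype(np.float16)
        elif cond < 1e7:
            D_inv = D_inv.astype(np.float32)
        blocks.append((s, e, D_inv))
    return blocks

def apply_preconditioner(blocks, r):
    """z = M^{-1} r, applied block by block in working precision."""
    z = np.empty_like(r)
    for s, e, D_inv in blocks:
        z[s:e] = D_inv.astype(np.float64) @ r[s:e]
    return z
```

Storing well-conditioned blocks in reduced precision cuts memory traffic during the preconditioner application, which is where the GPU speed-up in the original article comes from.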


Author(s):  
D. E. Newbury ◽  
R. D. Leapman

Trace constituents, which can be very loosely defined as those present at concentration levels below 1 percent, often exert influence on structure, properties, and performance far greater than what might be estimated from their proportion alone. Defining the role of trace constituents in the microstructure, or indeed even determining their location, makes great demands on the available array of microanalytical tools. These demands become increasingly more challenging as the dimensions of the volume element to be probed become smaller. For example, a cubic volume element of silicon with an edge dimension of 1 micrometer contains approximately 5×10¹⁰ atoms. High performance secondary ion mass spectrometry (SIMS) can be used to measure trace constituents to levels of hundreds of parts per billion from such a volume element (e.g., detection of at least 100 atoms to give 10% reproducibility with an overall detection efficiency of 1%, considering ionization, transmission, and counting).
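The quoted figures can be checked with a short back-of-the-envelope calculation; the silicon density and molar mass below are standard constants, and the detection parameters are taken from the abstract:

```python
# Back-of-the-envelope check of the numbers quoted in the abstract:
# atoms in a 1-um cube of silicon and the resulting SIMS detection limit.
AVOGADRO = 6.022e23          # atoms/mol
RHO_SI = 2.33                # g/cm^3
MOLAR_MASS_SI = 28.09        # g/mol

atoms_per_cm3 = RHO_SI / MOLAR_MASS_SI * AVOGADRO
atoms_per_um3 = atoms_per_cm3 * 1e-12          # 1 um^3 = 1e-12 cm^3
print(f"Si atoms per cubic micrometer: {atoms_per_um3:.1e}")   # ~5e10

detected_atoms = 100          # counts needed for ~10% reproducibility
efficiency = 0.01             # overall detection efficiency (1%)
atoms_needed = detected_atoms / efficiency
concentration = atoms_needed / atoms_per_um3
print(f"Detection limit: {concentration * 1e9:.0f} ppb")        # ~200 ppb
```

With 100 detected atoms at 1% efficiency, about 10⁴ trace atoms must be present in the 5×10¹⁰-atom volume, i.e. roughly 200 parts per billion, consistent with the "hundreds of parts per billion" figure above.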


2020 ◽  
Vol 12 (2) ◽  
pp. 19-50 ◽  
Author(s):  
Muhammad Siddique ◽  
Shandana Shoaib ◽  
Zahoor Jan

A key aspect of work processes in service-sector firms is the interconnection between tasks and performance. Relational coordination can play an important role in coordinating organizational activities, given the high level of interdependence and complexity in service-sector firms. Research has largely supported the view that well-devised high-performance work systems (HPWS) can enhance organizational performance. There is a growing debate, however, about the “mechanism” linking HPWS and performance outcomes. Using relational coordination theory, this study examines a model of the effects of HPWS subsets, such as motivation-, skill-, and opportunity-enhancing HR practices, on relational coordination among employees working in reciprocally interdependent job settings. Data were gathered from multiple sources, including managers and employees at the individual, functional, and unit levels, to capture their understanding of HPWS and relational coordination (RC) in 218 bank branches in Pakistan. Data analysis via structural equation modelling suggests that HPWS predicted RC among officers at the unit level. The findings of the study contribute to both theory and practice.


2019 ◽  
Vol 14 ◽  
pp. 155892501989525
Author(s):  
Yu Yang ◽  
Yanyan Jia

Ultrafine crystallization of industrial pure titanium allows for higher tensile strength, corrosion resistance, and thermal stability, and such material is therefore widely used in medical instrumentation, aerospace, and passenger vehicle manufacturing. However, batch preparation of ultrafine-grained tubular industrial pure titanium is limited by the development of the spinning process and has remained at the theoretical research stage. In this article, tubular TA2 industrial pure titanium is taken as the research object, and an ultrafine-crystal forming process based on “5-pass strong spin-heat treatment-3 pass-spreading-heat treatment” is proposed. Based on spinning process tests, the ultimate thinning rate of the method is explored, and the evolution of the surface microstructure is analyzed with a metallographic microscope. The research suggests that multi-pass spinning with small-to-medium thinning amounts causes the grain structure to be elongated in the axial and tangential directions and then refined, and that axial fiber uniformity is improved. The results have scientific significance for reducing the consumption of high-performance metals and improving material utilization and performance, and they also promote the development of preparation technology for ultrafine-grained metals.


Author(s):  
Kersten Schuster ◽  
Philip Trettner ◽  
Leif Kobbelt

We present a numerical optimization method to find highly efficient (sparse) approximations for convolutional image filters. Using a modified parallel tempering approach, we solve a constrained optimization that maximizes approximation quality while strictly staying within a user-prescribed performance budget. The results are multi-pass filters where each pass computes a weighted sum of bilinearly interpolated sparse image samples, exploiting hardware acceleration on the GPU. We systematically decompose the target filter into a series of sparse convolutions, trying to find good trade-offs between approximation quality and performance. Since our sparse filters are linear and translation-invariant, they do not exhibit the aliasing and temporal coherence issues that often appear in filters working on image pyramids. We show several applications, ranging from simple Gaussian or box blurs to the emulation of sophisticated Bokeh effects with user-provided masks. Our filters achieve high performance as well as high quality, often providing significant speed-up at acceptable quality even for separable filters. The optimized filters can be baked into shaders and used as a drop-in replacement for filtering tasks in image processing or rendering pipelines.
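To make the filter structure concrete, the following sketch evaluates one pass of such a sparse filter as a weighted sum of bilinearly interpolated samples. The tap offsets, weights, and function names are illustrative assumptions; a real implementation would rely on GPU texture hardware rather than Python loops.

```python
# Hedged sketch of one pass of a sparse convolution filter: a weighted
# sum of bilinearly interpolated samples at fractional offsets,
# mimicking what GPU texture filtering provides for free.
import numpy as np

def bilinear_sample(img, y, x):
    h, w = img.shape
    x0, y0 = int(np.floor(x)), int(np.floor(y))
    x1, y1 = min(x0 + 1, w - 1), min(y0 + 1, h - 1)
    fx, fy = x - x0, y - y0
    return ((1 - fy) * ((1 - fx) * img[y0, x0] + fx * img[y0, x1])
            + fy * ((1 - fx) * img[y1, x0] + fx * img[y1, x1]))

def sparse_filter_pass(img, taps):
    """taps: list of (dy, dx, weight) sparse samples, offsets may be fractional."""
    out = np.zeros_like(img, dtype=float)
    h, w = img.shape
    for y in range(h):
        for x in range(w):
            acc = 0.0
            for dy, dx, wgt in taps:
                sy = min(max(y + dy, 0.0), h - 1.0)   # clamp to image border
                sx = min(max(x + dx, 0.0), w - 1.0)
                acc += wgt * bilinear_sample(img, sy, sx)
            out[y, x] = acc
    return out
```

Placing taps at fractional offsets lets a single hardware-filtered fetch blend four pixels, which is how a handful of sparse samples per pass can approximate a much denser kernel.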


Electronics ◽  
2021 ◽  
Vol 10 (13) ◽  
pp. 1587
Author(s):  
Duo Sheng ◽  
Hsueh-Ru Lin ◽  
Li Tai

High-performance, complex system-on-chip (SoC) designs require a stable, high-throughput timing monitor to reduce the impact of timing uncertainty and to implement the dynamic voltage and frequency scaling (DVFS) scheme for overall power reduction. This paper presents a multi-stage timing monitor that combines three timing-monitoring stages to achieve a high timing-monitoring resolution and a wide timing-monitoring range simultaneously. Additionally, because the proposed timing monitor has high immunity to process–voltage–temperature (PVT) variation, it provides more stable timing-monitoring results. The timing-monitoring resolution and range of the proposed monitor are 47 ps and 2.2 µs, respectively, and the maximum measurement error is 0.06%. Therefore, the proposed multi-stage timing monitor not only provides the timing information of the specified signals to maintain the functionality and performance of the SoC, but also makes the operation of the DVFS scheme more efficient and accurate in SoC design.
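A simplified way to picture how multiple stages can combine a wide range with fine resolution is sketched below: a coarse stage covers the long interval while finer stages resolve the remainder. The stage step sizes and function names are assumptions for illustration, not the circuit parameters of the paper.

```python
# Hedged sketch of combining three timing-monitoring stage readouts
# into one measured interval; stage periods are illustrative only.
def combine_stages(coarse_count, mid_code, fine_code,
                   coarse_period_ps=10_000, mid_step_ps=500, fine_step_ps=47):
    """Reconstruct a measured interval (in ps) from three stage readouts."""
    return (coarse_count * coarse_period_ps
            + mid_code * mid_step_ps
            + fine_code * fine_step_ps)

# The coarse counter extends the range toward microseconds while the
# finest stage resolves ~47 ps steps.
interval_ps = combine_stages(coarse_count=120, mid_code=7, fine_code=3)
print(f"measured interval: {interval_ps} ps")
```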


2021 ◽  
Vol 2 (1) ◽  
pp. 46-62
Author(s):  
Santiago Iglesias-Baniela ◽  
Juan Vinagre-Ríos ◽  
José M. Pérez-Canosa

It is a well-known fact that the 1989 Exxon Valdez disaster caused the escort towing of laden tankers in many coastal areas of the world to become compulsory. In order to implement a new type of escort towing, specially designed to be employed in very adverse weather conditions, considerable changes in the hull form of escort tugs had to be made to improve their stability and performance. Since traditional winch and ropes technologies were only effective in calm waters, tugs had to be fitted with new devices. These improvements allowed the remodeled tugs to counterbalance the strong forces generated by the maneuvers in open waters. The aim of this paper is to perform a comprehensive literature review of the new high-performance automatic dynamic winches. Furthermore, a thorough analysis of the best available technologies regarding towline, essential to properly exploit the new winches, will be carried out. Through this review, the way in which the escort towing industry has faced this technological challenge is shown.


2021 ◽  
Vol 11 (3) ◽  
pp. 923
Author(s):  
Guohua Li ◽  
Joon Woo ◽  
Sang Boem Lim

The complexity of high-performance computing (HPC) workflows is an important issue in the provision of HPC cloud services in most national supercomputing centers. This complexity problem is especially critical because it affects HPC resource scalability, management efficiency, and convenience of use. To solve this problem while exploiting the advantage of bare-metal-level high performance, container-based cloud solutions have been developed. However, various problems still exist, such as an isolated environment between HPC and the cloud, security issues, and workload management issues. We propose an architecture that reduces this complexity by using Docker and Singularity, which are the container platforms most often used in the HPC cloud field. This HPC cloud architecture integrates both image management and job management, which are the two main elements of HPC cloud workflows. To evaluate the serviceability and performance of the proposed architecture, we developed and implemented a platform on an HPC cluster and carried out experiments. Experimental results indicated that the proposed HPC cloud architecture can reduce complexity to provide supercomputing resource scalability, high performance, user convenience, various HPC applications, and management efficiency.
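One step of the kind of workflow described above, converting a Docker image into a Singularity image and submitting it to a batch scheduler, might look like the following sketch. It assumes the Singularity CLI and Slurm are available on the cluster; the image names, paths, and helper functions are illustrative and not part of the proposed platform.

```python
# Hedged sketch of a container-based HPC job submission step:
# pull a Docker image as a Singularity image file, then wrap it in a
# batch script for the scheduler. Illustration only.
import subprocess

def docker_to_singularity(docker_ref, sif_path):
    # e.g. docker_ref = "docker://python:3.10"
    subprocess.run(["singularity", "pull", sif_path, docker_ref], check=True)

def submit_container_job(sif_path, command, job_script="job.sh"):
    with open(job_script, "w") as f:
        f.write("#!/bin/bash\n")
        f.write(f"singularity exec {sif_path} {command}\n")
    subprocess.run(["sbatch", job_script], check=True)

docker_to_singularity("docker://python:3.10", "python_3.10.sif")
submit_container_job("python_3.10.sif", "python --version")
```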

