Methods to Load Balance a GCR Pressure Solver Using a Stencil Framework on Multi- and Many-Core Architectures

2015 ◽  
Vol 2015 ◽  
pp. 1-13 ◽  
Author(s):  
Milosz Ciznicki ◽  
Michal Kulczewski ◽  
Piotr Kopta ◽  
Krzysztof Kurowski

The recent advent of novel multi- and many-core architectures forces application programmers to deal with hardware-specific implementation details and to be familiar with software optimisation techniques in order to benefit from new high-performance computing machines. Extra care must be taken with communication-intensive algorithms, which may become a bottleneck in the forthcoming era of exascale computing. This paper presents a high-level stencil framework, implemented for the EULerian or LAGrangian model (EULAG), that efficiently utilises multi- and many-core architectures. Only the efficient use of both many-core processors (CPUs) and graphics processing units (GPUs), combined with a flexible data decomposition method, can deliver maximum performance and scalability for the communication-intensive Generalized Conjugate Residual (GCR) elliptic solver with preconditioner.
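The preconditioned GCR iteration at the heart of the solver is a standard Krylov method; purely as a reference point, the following minimal NumPy sketch shows a restarted, unpreconditioned GCR(m) for A x = b, where the callable A stands in for the stencil application of the discretised elliptic operator. All names are illustrative; the distributed, preconditioned EULAG implementation is considerably more involved.

```python
import numpy as np

def gcr(A, b, x0=None, tol=1e-8, max_iter=200, restart=20):
    """Restarted GCR(m) for a nonsymmetric system A x = b.
    `A` is any callable computing the matrix-vector product,
    e.g. one application of the elliptic stencil."""
    x = np.zeros_like(b) if x0 is None else x0.copy()
    r = b - A(x)
    bnorm = np.linalg.norm(b)
    while np.linalg.norm(r) > tol * bnorm and max_iter > 0:
        ps, Aps = [], []                    # search directions and A*p images
        for _ in range(restart):
            max_iter -= 1
            p, Ap = r.copy(), A(r)
            # A-orthogonalise the new direction against the stored ones
            for pi, Api in zip(ps, Aps):
                beta = np.dot(Ap, Api) / np.dot(Api, Api)
                p -= beta * pi
                Ap -= beta * Api
            ps.append(p); Aps.append(Ap)
            alpha = np.dot(r, Ap) / np.dot(Ap, Ap)
            x += alpha * p                  # improve the solution
            r -= alpha * Ap                 # update the residual cheaply
            if np.linalg.norm(r) <= tol * bnorm or max_iter == 0:
                break
    return x
```

The dot products and the halo exchanges hidden inside A are exactly the communication-intensive steps that the framework's data decomposition has to balance across CPUs and GPUs.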

Author(s):  
Lidong Wang

Visualization with graphs is popular in the data analysis of Information Technology (IT) networks or computer networks. An IT network is often modelled as a graph, with hosts as nodes and traffic flows as edges. General visualization methods are introduced in this paper. Applications and technological progress of visualization in IT network analysis, as well as the role of big data in IT network visualization, are presented. The challenges of visualization and Big Data analytics in IT network visualization are also discussed. Big Data analytics with High Performance Computing (HPC) techniques, especially Graphics Processing Units (GPUs), helps accelerate IT network analysis and visualization.
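As a concrete illustration of the host-as-node, flow-as-edge model, the sketch below builds and draws such a graph in Python with the networkx and matplotlib libraries; the flow records are made up, and the paper itself does not prescribe any particular toolkit.

```python
import networkx as nx
import matplotlib.pyplot as plt

# Hypothetical flow records: (source host, destination host, bytes moved)
flows = [
    ("10.0.0.1", "10.0.0.2", 5_000),
    ("10.0.0.1", "10.0.0.3", 120_000),
    ("10.0.0.3", "10.0.0.2", 800),
]

G = nx.DiGraph()
for src, dst, nbytes in flows:
    # Accumulate traffic volume if the same host pair appears repeatedly
    prev = G[src][dst]["weight"] if G.has_edge(src, dst) else 0
    G.add_edge(src, dst, weight=prev + nbytes)

pos = nx.spring_layout(G, seed=42)                    # force-directed layout
widths = [1 + G[u][v]["weight"] / 40_000 for u, v in G.edges]
nx.draw(G, pos, with_labels=True, width=widths, node_color="lightsteelblue")
plt.show()
```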


Author(s):  
Timothy Dykes ◽  
Claudio Gheller ◽  
Marzia Rivi ◽  
Mel Krokos

With the increasing size and complexity of data produced by large-scale numerical simulations, it is of primary importance for scientists to be able to exploit all available hardware in heterogeneous high-performance computing environments for increased throughput and efficiency. We focus on the porting and optimization of Splotch, a scalable visualization algorithm, to utilize the Xeon Phi, Intel’s coprocessor based upon the Many Integrated Core (MIC) architecture. We discuss the steps taken to offload data to the coprocessor, along with algorithmic modifications that aid faster processing on the many-core architecture and make use of the uniquely wide vector capabilities of the device, with accompanying performance results using multiple Xeon Phi coprocessors. Finally, we compare performance against results achieved with the Graphics Processing Unit (GPU) based implementation of Splotch.
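Splotch renders a scene by accumulating per-particle contributions into an image buffer, and the wide-vector optimisations described here amount to expressing that accumulation as long, data-parallel array operations. As a loose single-node analogue only (not the authors’ code; real Splotch splats smoothed, coloured particles rather than single pixels), consider:

```python
import numpy as np

def splat(x, y, brightness, width, height):
    """Accumulate point-like particles into an image in one
    vectorised pass instead of a per-particle Python loop."""
    img = np.zeros((height, width), dtype=np.float32)
    ix = np.clip((x * width).astype(np.int64), 0, width - 1)
    iy = np.clip((y * height).astype(np.int64), 0, height - 1)
    # np.add.at adds correctly even when particles share a pixel
    np.add.at(img, (iy, ix), brightness)
    return img

rng = np.random.default_rng(0)
n = 1_000_000                                # one million mock particles
img = splat(rng.random(n), rng.random(n),
            rng.random(n).astype(np.float32), 1024, 768)
```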


Author(s):  
Ana Moreton–Fernandez ◽  
Hector Ortega–Arranz ◽  
Arturo Gonzalez–Escribano

Nowadays the use of hardware accelerators, such as graphics processing units (GPUs) or Xeon Phi coprocessors, is key to solving computationally costly problems that require high performance computing. However, programming efficient deployments for these kinds of devices is a very complex task that relies on manual management of memory transfers and configuration parameters. The programmer has to study in depth the particular data that need to be computed at each moment, across different computing platforms, while also considering architectural details. We introduce the controller concept as an abstract entity that allows the programmer to easily manage communications and kernel-launching details on hardware accelerators in a transparent way. This model also provides the possibility of defining and launching central processing unit (CPU) kernels on multi-core processors with the same abstraction and methodology used for the accelerators. It internally combines different native programming models and technologies to exploit the potential of each kind of device. Additionally, the model allows the programmer to simplify the selection of values for the several configuration parameters that must be set when a kernel is launched. This is done through a qualitative characterization process of the kernel code to be executed. Finally, we present the implementation of the controller model in a prototype library, together with its application in several case studies. Its use has led to reductions in development and porting costs, with significantly low overheads in execution times compared to manually programmed and optimized solutions that directly use CUDA and OpenMP.
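The paper’s prototype library wraps CUDA and OpenMP internally; the hypothetical Python sketch below only illustrates the shape of the controller abstraction, i.e. one front end that hides device selection, data transfers, and launch parameters behind a single launch call. Every name here is invented for illustration.

```python
import numpy as np

class Controller:
    """Toy stand-in for the controller concept: the programmer
    declares a target device and launches kernels through one
    interface, without writing transfer or launch code."""

    def __init__(self, device="cpu"):
        self.device = device                 # e.g. "cpu", "gpu"

    def launch(self, kernel, *arrays):
        # A real controller would move `arrays` to the accelerator,
        # derive launch parameters from a qualitative characterisation
        # of the kernel, and dispatch to a CUDA or OpenMP back end.
        # This sketch simply runs the kernel on the host.
        return kernel(*arrays)

def saxpy(a, x, y):                          # a plain "kernel" function
    return a * x + y

ctrl = Controller(device="gpu")              # device choice is declarative
x, y = np.ones(10), np.arange(10.0)
z = ctrl.launch(saxpy, 2.0, x, y)            # transfer/launch details hidden
```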


2021 ◽  
Author(s):  
Matthias Arzt ◽  
Joran Deschamps ◽  
Christopher Schmied ◽  
Tobias Pietzsch ◽  
Deborah Schmidt ◽  
...  

We present Labkit, a user-friendly Fiji plugin for the segmentation of microscopy image data. It offers easy-to-use manual and automated image segmentation routines that can be rapidly applied to single- and multi-channel images as well as to time-lapse movies in 2D or 3D. Labkit is specifically designed to work efficiently on big image data and enables users of consumer laptops to conveniently work with multi-terabyte images. This efficiency is achieved by using ImgLib2 and BigDataViewer as the foundation of our software. Furthermore, memory-efficient and fast random-forest-based pixel classification, inspired by the Waikato Environment for Knowledge Analysis (Weka), is implemented. Optionally, we harness the power of graphics processing units (GPUs) to gain additional runtime performance. Labkit is easy to install on virtually all laptops and workstations. Additionally, Labkit is compatible with high performance computing (HPC) clusters for distributed processing of big image data. The ability to use pixel classifiers trained in Labkit via the ImageJ macro language enables our users to integrate this functionality as a processing step in automated image processing workflows. Last but not least, Labkit comes with rich online resources such as tutorials and examples that will help users familiarize themselves with the available features and how to best use Labkit in a number of practical real-world use cases.
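Labkit itself is Java-based, built on ImgLib2 and BigDataViewer. Purely to illustrate the idea of random-forest pixel classification from sparse user labels, here is a minimal Python sketch using SciPy and scikit-learn; the feature set is Weka-style but illustrative, not Labkit’s actual implementation.

```python
import numpy as np
from scipy import ndimage
from sklearn.ensemble import RandomForestClassifier

def pixel_features(img, sigmas=(1, 2, 4)):
    """Per-pixel feature stack: raw intensity plus Gaussian blurs and
    gradient magnitudes at several scales (a Weka-style feature set)."""
    feats = [img]
    for s in sigmas:
        feats.append(ndimage.gaussian_filter(img, s))
        feats.append(ndimage.gaussian_gradient_magnitude(img, s))
    return np.stack(feats, axis=-1).reshape(-1, len(feats))

img = np.random.rand(128, 128)               # stand-in for a real image
scribbles = np.zeros(img.shape, dtype=int)   # 0 = unlabelled
scribbles[:5, :5] = 1                        # user marks background
scribbles[60:65, 60:65] = 2                  # user marks foreground

X = pixel_features(img)
labelled = scribbles.ravel() > 0
clf = RandomForestClassifier(n_estimators=50)
clf.fit(X[labelled], scribbles.ravel()[labelled])
segmentation = clf.predict(X).reshape(img.shape)   # label for every pixel
```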


2020 ◽  
Vol 22 (5) ◽  
pp. 1217-1235 ◽  
Author(s):  
M. Morales-Hernández ◽  
M. B. Sharif ◽  
S. Gangrade ◽  
T. T. Dullo ◽  
S.-C. Kao ◽  
...  

Abstract: This work presents a vision of future water resources hydrodynamics codes that can fully utilize the strengths of modern high-performance computing (HPC). Advances in computing power, formerly driven by the improvement of central processing unit (CPU) processors, now focus on parallel computing and, in particular, the use of graphics processing units (GPUs). However, this shift to a parallel framework requires refactoring the code to make efficient use of the data, and may even change the nature of the algorithm that solves the system of equations. These concepts, along with other features such as the precision of the computations, the management of dry regions, and input/output data, are analyzed in this paper. A 2D multi-GPU flood code applied to a large-scale test case is used to corroborate our statements and ascertain the new challenges for the next-generation parallel water resources codes.
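Among the features analyzed, the management of dry regions is easy to motivate with a toy example: updates must be masked so that dry cells neither acquire spurious velocities nor produce negative depths. The NumPy sketch below shows the masking idea only; the threshold and treatment are illustrative, and production flood codes use far more elaborate wetting/drying schemes.

```python
import numpy as np

DRY_TOL = 1e-6   # depths below this are treated as dry (illustrative value)

def wet_dry_update(h, dh):
    """Apply a depth increment `dh` only on wet cells and clamp the
    result at zero, so dry regions stay inert and depths stay physical.
    Real schemes also correct the associated fluxes and momentum."""
    wet = h > DRY_TOL
    h_new = np.where(wet, h + dh, h)   # dry cells are skipped entirely
    return np.maximum(h_new, 0.0)      # never allow negative water depth
```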


2014 ◽  
Vol 27 (13) ◽  
pp. 3403-3414 ◽  
Author(s):  
Yu-Shiang Lin ◽  
Chun-Yuan Lin ◽  
Che-Lun Hung ◽  
Yeh-Ching Chung ◽  
Kual-Zheng Lee
