Beyond spatial scalability limitations with a massively parallel method for linear oscillatory problems

Author(s):  
Martin Schreiber ◽  
Pedro S Peixoto ◽  
Terry Haut ◽  
Beth Wingate

This paper presents, discusses and analyses a massively parallel-in-time solver for linear oscillatory partial differential equations, which is a key numerical component for evolving weather, ocean, climate and seismic models. The time parallelization in this solver allows us to significantly exceed the computing resources used by parallelization-in-space methods, with a correspondingly significant reduction in wall-clock time. One of the major difficulties in achieving Exascale performance for weather prediction is that the strong scaling limit – the parallel performance for a fixed problem size with an increasing number of processors – saturates. A main avenue to circumvent this problem is to introduce new numerical techniques that take advantage of time parallelism. In this paper, we use a time-parallel approximation that retains the frequency information of oscillatory problems. This approximation is based on (a) reformulating the original problem into a large set of independent terms and (b) solving each of these terms independently of the others, which can then be accomplished on a large number of high-performance computing resources. Our experiments are run on up to 3586 cores, for problem sizes whose parallelization-in-space scalability is already saturated on a single node. With the parallelization-in-time approach, we obtain reductions in time-to-solution of 118.3× for spectral methods and 1503.0× for finite-difference methods. A calibrated performance model gives the scalability limitations of the new approach a priori and allows us to extrapolate its performance towards large-scale systems. This work has the potential to contribute as a basic building block of parallelization-in-time approaches, with possible major implications for applied areas modelling oscillatory-dominated problems.
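As a concrete illustration of steps (a) and (b), the sketch below applies a rational-approximation decomposition of the matrix exponential, in which a time step becomes a sum of mutually independent shifted linear solves. The coefficients `alphas`/`betas` are assumed given (e.g. from a rational approximation of exponential-integrator type); the function and variable names are illustrative, not the authors' implementation.

```python
import numpy as np

def time_parallel_step(L, u0, t, alphas, betas):
    """Advance u0 over a step of size t for du/dt = L u (sketch only).

    The matrix exponential is replaced by a sum of independent terms,
        exp(t L) u0  ~=  sum_k beta_k * (t L + alpha_k I)^{-1} u0,
    so each solve can run on its own group of processors; the final sum
    is the only point of communication. Coefficients may be complex for
    oscillatory L and are assumed to come from a rational approximation.
    """
    n = L.shape[0]
    terms = []
    for a_k, b_k in zip(alphas, betas):
        # Each shifted solve is independent of all the others; in an MPI
        # setting it would live on a separate communicator.
        terms.append(b_k * np.linalg.solve(t * L + a_k * np.eye(n), u0))
    return np.sum(terms, axis=0)  # one reduction combines all terms
```

Because every term requires only a standard linear solve, the number of usable processors is multiplied by the number of terms in the sum, which is what lets this approach scale past the spatial strong-scaling limit.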

2021 ◽  
Vol 40 (5) ◽  
pp. 1-14
Author(s):  
Michael Mara ◽  
Felix Heide ◽  
Michael Zollhöfer ◽  
Matthias Nießner ◽  
Pat Hanrahan

Large-scale optimization problems at the core of many graphics, vision, and imaging applications are often implemented by hand in tedious and error-prone processes in order to achieve high performance (in particular on GPUs), despite recent developments in libraries and DSLs. At the same time, these hand-crafted solver implementations reveal that the key to high performance is a problem-specific schedule that enables efficient usage of the underlying hardware. In this work, we incorporate this insight into Thallo, a domain-specific language for large-scale non-linear least squares optimization problems. We observe various code reorganizations performed by implementers of high-performance solvers in the literature, and then define a set of basic operations that span these scheduling choices, thereby defining a large scheduling space. Users can either specify code transformations in a scheduling language or use an autoscheduler. Thallo takes as input a compact, shader-like representation of an energy function and a (potentially auto-generated) schedule, translating the combination into high-performance GPU solvers. Since Thallo can generate solvers from a large scheduling space, it can handle a large set of large-scale non-linear and non-smooth problems with various degrees of non-locality and compute-to-memory ratios, including diverse applications such as bundle adjustment, face blendshape fitting, and spatially-varying Poisson deconvolution. Abstracting schedules from the optimization, we outperform state-of-the-art GPU-based optimization DSLs by an average of 16× across all applications introduced in this work, and even outperform some published hand-written GPU solvers by more than 30%.
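The solvers such DSLs emit are Gauss-Newton-style iterations over a sum-of-squares energy. The NumPy sketch below shows that underlying iteration on a toy curve-fitting energy; it illustrates the math only and borrows nothing from Thallo's actual scheduling language or generated GPU code.

```python
import numpy as np

def gauss_newton(residual, jacobian, x0, iters=10, damping=1e-6):
    """Minimise E(x) = sum_i r_i(x)^2 by (damped) Gauss-Newton."""
    x = x0.astype(float).copy()
    for _ in range(iters):
        r = residual(x)                            # stacked residuals r(x)
        J = jacobian(x)                            # Jacobian dr/dx
        H = J.T @ J + damping * np.eye(x.size)     # normal equations
        x -= np.linalg.solve(H, J.T @ r)
    return x

# Toy energy: fit y ~ a * exp(b * t) to noisy samples.
t = np.linspace(0.0, 1.0, 50)
y = 2.0 * np.exp(-1.5 * t) + 0.01 * np.random.randn(t.size)
residual = lambda x: x[0] * np.exp(x[1] * t) - y
jacobian = lambda x: np.stack([np.exp(x[1] * t),
                               x[0] * t * np.exp(x[1] * t)], axis=1)
print(gauss_newton(residual, jacobian, np.array([1.0, -1.0])))
```

A schedule, in these terms, decides how the J^T J products and solves are materialised, fused, and mapped onto GPU threads; the iteration above fixes one such ordering.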


Author(s):  
Pietro Cicotti ◽  
Manu Shantharam ◽  
Laura Carrington

In many scientific and computational domains, graphs are used to represent and analyze data. Such graphs often exhibit the characteristics of small-world networks: few high-degree vertices connect many low-degree vertices. Despite the randomness of a graph search, it is possible to capitalize on these characteristics and cache relevant information about high-degree vertices. We applied this idea by caching remote vertex ids in a parallel breadth-first search benchmark. Our experiments with different implementations demonstrated significant performance improvements over the reference implementation in several configurations, using 64 to 1024 cores. We proposed a system design in which resources are dedicated exclusively to caching and shared among a set of nodes. Our evaluation demonstrates that this design reduces communication and has the potential to improve performance on large-scale systems in which the communication cost increases significantly with the distance between nodes. We also tested a memcached system as the cache server and found that its generic protocol, which does not match our usage semantics, significantly hinders the potential performance improvements; we therefore suggest that a generic system should also support a basic, lightweight communication protocol to meet the needs of high-performance computing applications. Finally, we explored different configurations to find efficient ways to utilize the resources allocated to solve a given problem size; to this end, we found that utilizing half of the compute cores per allocated node improves performance, and that even in this case the caching variants always outperform the reference implementation.
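A minimal sketch of the idea on one rank follows, assuming a hypothetical send_remote(v) call that stands in for the inter-node message the cache is meant to suppress; the real benchmark's data structures and protocol are not reproduced here.

```python
from collections import deque

def bfs_with_remote_cache(local_adj, sources, is_local, send_remote,
                          cache_size=4096):
    """One rank's BFS loop with a cache of already-forwarded remote ids.

    Because small-world graphs funnel the traversal through a few
    high-degree vertices, the same remote ids reappear constantly, so
    remembering which ones were already sent suppresses most duplicate
    messages.
    """
    sent = set()                    # remote vertex ids already forwarded
    visited = set(sources)
    frontier = deque(sources)
    while frontier:
        u = frontier.popleft()
        for v in local_adj.get(u, ()):
            if is_local(v):
                if v not in visited:
                    visited.add(v)
                    frontier.append(v)
            elif v not in sent:     # cache hit => no message at all
                if len(sent) >= cache_size:
                    sent.pop()      # crude eviction; a real cache uses LRU
                sent.add(v)
                send_remote(v)      # hypothetical communication call
    return visited
```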


2006 ◽  
Vol 18 (12) ◽  
pp. 2923-2927 ◽  
Author(s):  
Robert J. Calin-Jageman ◽  
Paul S. Katz

After developing a model neuron or network, it is important to systematically explore its behavior across a wide range of parameter values or experimental conditions, or both. However, compiling a very large set of simulation runs is challenging because it typically requires both access to and expertise with high-performance computing facilities. To lower the barrier for large-scale model analysis, we have developed NeuronPM, a client/server application that creates a “screen-saver” cluster for running simulations in NEURON (Hines & Carnevale, 1997). NeuronPM provides a user-friendly way to use existing computing resources to catalog the performance of a neural simulation across a wide range of parameter values and experimental conditions. The NeuronPM client is a Windows-based screen saver, and the NeuronPM server can be hosted on any Apache/PHP/MySQL server. During idle time, the client retrieves model files and work assignments from the server, invokes NEURON to run the simulation, and returns results to the server. Administrative panels make it simple to upload model files, define the parameters and conditions to vary, and then monitor client status and work progress. NeuronPM is open-source freeware and is available for download at http://neuronpm.homeip.net. It is a useful entry-level tool for systematically analyzing complex neuron and network simulations.
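In outline, the client is a polling work-queue worker: fetch an assignment, run NEURON on it, post the results back. The sketch below captures that loop; the endpoint paths, JSON fields, and the nrniv invocation details are invented for illustration and are not the actual NeuronPM protocol.

```python
import json
import subprocess
import time
import urllib.parse
import urllib.request

SERVER = "http://example.org/neuronpm"   # hypothetical server URL

def run_client(poll_seconds=60):
    """Idle-time worker loop in the spirit of the NeuronPM client."""
    while True:
        # Ask the server for the next work assignment (endpoint invented).
        with urllib.request.urlopen(SERVER + "/next_job") as resp:
            job = json.load(resp)   # e.g. {"id": 7, "hoc": "model.hoc"}
        if not job:
            time.sleep(poll_seconds)          # nothing to do: stay idle
            continue
        # Invoke NEURON on the downloaded model file for this assignment.
        out = subprocess.run(["nrniv", "-nobanner", job["hoc"]],
                             capture_output=True, text=True)
        # Return the simulation output to the server (endpoint invented).
        payload = urllib.parse.urlencode({"id": job["id"],
                                          "result": out.stdout}).encode()
        urllib.request.urlopen(SERVER + "/submit", data=payload)
```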


SPE Journal ◽  
2016 ◽  
Vol 21 (03) ◽  
pp. 0853-0863 ◽  
Author(s):  
Kai Bao ◽  
Mi Yan ◽  
Rebecca Allen ◽  
Amgad Salama ◽  
Ligang Lu ◽  
...  

Summary The present work describes a parallel computational framework for carbon dioxide (CO2) sequestration simulation that couples reservoir simulation and molecular dynamics (MD) on massively parallel high-performance-computing (HPC) systems. In this framework, a parallel reservoir simulator, the reservoir-simulation toolbox (RST), solves the flow and transport equations that describe the subsurface flow behavior, whereas MD simulations are performed to provide the required physical parameters. Technologies from several different fields are used to make this novel coupled system work efficiently. One of the major applications of the framework is the modeling of large-scale CO2 sequestration for long-term storage in subsurface geological formations, such as depleted oil and gas reservoirs and deep saline aquifers, which has been proposed as one of the few attractive and practical solutions to reduce CO2 emissions and address the global-warming threat. Fine grids and accurate prediction of the properties of fluid mixtures under geological conditions are essential for accurate simulations. In this work, CO2 sequestration is presented as a first example of coupling reservoir simulation and MD, although the framework can be extended naturally to full multiphase multicomponent compositional flow simulation to handle more complicated physical processes in the future. Accuracy and scalability analyses are performed on an IBM BlueGene/P and on an IBM BlueGene/Q, the latest IBM supercomputer. Results show good accuracy of our MD simulations compared with published data, and good scalability is observed on the massively parallel HPC systems. The performance and capacity of the proposed framework are demonstrated with several experiments with hundreds of millions to one billion cells. To the best of our knowledge, the present work represents the first attempt to couple reservoir simulation and molecular simulation for large-scale modeling. Because of the complexity of subsurface systems, fluid thermodynamic properties are required over a broad range of temperature, pressure, and composition under different geological conditions, whereas experimental results are limited. Although equations of state can reproduce existing experimental data within certain ranges of conditions, their extrapolation beyond that range is still limited. The present framework thus provides better flexibility and predictability than conventional methods.
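The coupling pattern the summary describes — a flow solver that obtains fluid properties from MD rather than from an equation of state — can be sketched as below. The cell layout and the md_lookup and advance_flow names are placeholders for illustration, not the RST or MD interfaces from the paper.

```python
def coupled_step(cells, md_lookup, advance_flow, dt):
    """One coupling cycle between the flow solver and MD (sketch only).

    md_lookup(T, p, comp)   -> (density, viscosity), standing in for an
                               MD run or a table of pre-computed MD data.
    advance_flow(cells, dt) stands in for the reservoir simulator's
                               flow/transport step.
    """
    cache = {}
    for cell in cells:
        # Quantise the thermodynamic state so nearby cells share one
        # expensive MD evaluation; 'comp' must be hashable (e.g. a tuple).
        key = (round(cell["T"], 1), round(cell["p"], -4), cell["comp"])
        if key not in cache:
            cache[key] = md_lookup(*key)
        cell["density"], cell["viscosity"] = cache[key]
    return advance_flow(cells, dt)
```

On an HPC system the MD evaluations are the natural unit of parallelism: each distinct thermodynamic state can be simulated by an independent group of ranks while the flow solver occupies the rest.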


2008 ◽  
Vol 16 (2-3) ◽  
pp. 167-181 ◽  
Author(s):  
Brian J.N. Wylie ◽  
Markus Geimer ◽  
Felix Wolf

Developers of applications with large-scale computing requirements are currently presented with a variety of high-performance systems optimised for message passing; however, effectively exploiting the available computing resources remains a major challenge. In addition to fundamental application scalability characteristics, application and system peculiarities often only manifest at extreme scales, requiring highly scalable performance measurement and analysis tools that are convenient to incorporate in application development and tuning activities. We present our experiences with a multigrid solver benchmark and state-of-the-art real-world applications for numerical weather prediction and computational fluid dynamics, on three quite different multi-thousand-processor supercomputer systems – Cray XT3/4, MareNostrum & Blue Gene/L – using the newly developed SCALASCA toolset to quantify and isolate a range of significant performance issues.


Author(s):  
Haowen Fang ◽  
Amar Shrestha ◽  
Ziyi Zhao ◽  
Qinru Qiu

The recently discovered spatial-temporal information processing capability of bio-inspired spiking neural networks (SNNs) has enabled some interesting models and applications. However, designing large-scale, high-performance models remains a challenge due to the lack of robust training algorithms. A bio-plausible SNN model with spatial-temporal properties is a complex dynamic system. Synapses and neurons behave as filters capable of preserving temporal information. Because these neuron dynamics and filter effects are ignored in existing training algorithms, the SNN degrades into a memoryless system and loses the ability to process temporal signals. Furthermore, spike timing plays an important role in information representation, but conventional rate-based spike-coding models treat spike trains only statistically and discard the information carried by their temporal structure. To address these issues and exploit the temporal dynamics of SNNs, we formulate the SNN as a network of infinite impulse response (IIR) filters with neuron nonlinearity. We propose a training algorithm capable of learning spatial-temporal patterns by searching for the optimal synapse filter kernels and weights. The proposed model and training algorithm are applied to construct associative memories and classifiers for synthetic and public datasets, including MNIST, NMNIST and DVS128; their accuracy outperforms state-of-the-art approaches.
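To make the formulation concrete, the sketch below writes one spiking layer as a cascade of first-order IIR filters with a threshold-and-reset nonlinearity. The fixed decay constants a_syn and a_mem are a simplifying assumption; the paper's formulation learns general synapse filter kernels alongside the weights.

```python
import numpy as np

def snn_iir_forward(x, w, a_syn=0.9, a_mem=0.8, thresh=1.0):
    """Forward pass of one spiking layer written as IIR filters (sketch).

    x: (T, n_in) binary input spike trains; w: (n_in, n_out) weights.
    Each synapse is a first-order IIR filter s[t] = a_syn*s[t-1] + x[t]@w,
    and the membrane is another leaky filter followed by a
    threshold-and-reset nonlinearity. The recurrent filter states are
    what let the layer retain temporal information between time steps.
    """
    T = x.shape[0]
    n_out = w.shape[1]
    s = np.zeros(n_out)                    # synaptic filter state
    v = np.zeros(n_out)                    # membrane filter state
    out = np.zeros((T, n_out))
    for t in range(T):
        s = a_syn * s + x[t] @ w           # synapse IIR: preserves timing
        v = a_mem * v + s                  # membrane IIR (leaky integration)
        spikes = (v >= thresh).astype(float)
        v = v * (1.0 - spikes)             # reset wherever a spike fired
        out[t] = spikes
    return out
```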


Some problems associated with numerical weather prediction are discussed. For an adequate description of the growth of frontal depressions, a grid length at least as small as 100 km is required. Boundary conditions must be chosen to suppress the generation of short gravity-wave components, and care must be taken to avoid inconsistencies in the initial data, which may give rise to spurious large-scale oscillations.

