Beyond spatial scalability limitations with a massively parallel method for linear oscillatory problems

Author(s):  
Martin Schreiber ◽  
Pedro S Peixoto ◽  
Terry Haut ◽  
Beth Wingate

This paper presents, discusses and analyses a massively parallel-in-time solver for linear oscillatory partial differential equations, which is a key numerical component for evolving weather, ocean, climate and seismic models. The time parallelization in this solver allows us to significantly exceed the computing resources used by parallelization-in-space methods, with a correspondingly significant reduction in wall-clock time. One of the major difficulties in achieving Exascale performance for weather prediction is that the strong scaling limit – the parallel performance for a fixed problem size with an increasing number of processors – saturates. A main avenue to circumvent this problem is to introduce new numerical techniques that take advantage of time parallelism. In this paper, we use a time-parallel approximation that retains the frequency information of oscillatory problems. This approximation is based on (a) reformulating the original problem into a large set of independent terms and (b) solving each of these terms independently of the others, which can then be accomplished on a large number of high-performance computing resources. Our experiments are run on up to 3586 cores, for problem sizes whose parallelization-in-space scalability is already saturated on a single node. With the parallelization-in-time approach, we obtain reductions in time-to-solution of 118.3× for spectral methods and 1503.0× for finite-difference methods. A calibrated performance model gives the scalability limitations of the new approach a priori and allows us to extrapolate its performance towards large-scale systems. This work has the potential to contribute as a basic building block of parallelization-in-time approaches, with possible major implications for applied areas modelling oscillatory-dominated problems.
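As a concrete illustration of steps (a) and (b), the sketch below applies a rational-approximation decomposition of the matrix exponential, in which a time step becomes a sum of mutually independent shifted linear solves. The coefficients `alphas`/`betas` are assumed given (e.g. from a rational approximation of exponential-integrator type); the function and variable names are illustrative, not the authors' implementation.

```python
import numpy as np

def time_parallel_step(L, u0, t, alphas, betas):
    """Advance u0 over a step of size t for du/dt = L u (sketch only).

    The matrix exponential is replaced by a sum of independent terms,
        exp(t L) u0  ~=  sum_k beta_k * (t L + alpha_k I)^{-1} u0,
    so each solve can run on its own group of processors; the final sum
    is the only point of communication. Coefficients may be complex for
    oscillatory L and are assumed to come from a rational approximation.
    """
    n = L.shape[0]
    terms = []
    for a_k, b_k in zip(alphas, betas):
        # Each shifted solve is independent of all the others; in an MPI
        # setting it would live on a separate communicator.
        terms.append(b_k * np.linalg.solve(t * L + a_k * np.eye(n), u0))
    return np.sum(terms, axis=0)  # one reduction combines all terms
```

Because every term requires only a standard linear solve, the number of usable processors is multiplied by the number of terms in the sum, which is what lets this approach scale past the spatial strong-scaling limit.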

2021 ◽  
Vol 40 (5) ◽  
pp. 1-14
Author(s):  
Michael Mara ◽  
Felix Heide ◽  
Michael Zollhöfer ◽  
Matthias Nießner ◽  
Pat Hanrahan

Large-scale optimization problems at the core of many graphics, vision, and imaging applications are often implemented by hand in tedious and error-prone processes in order to achieve high performance (in particular on GPUs), despite recent developments in libraries and DSLs. At the same time, these hand-crafted solver implementations reveal that the key to high performance is a problem-specific schedule that enables efficient usage of the underlying hardware. In this work, we incorporate this insight into Thallo, a domain-specific language for large-scale non-linear least squares optimization problems. We observe various code reorganizations performed by implementers of high-performance solvers in the literature, and then define a set of basic operations that span these scheduling choices, thereby defining a large scheduling space. Users can either specify code transformations in a scheduling language or use an autoscheduler. Thallo takes as input a compact, shader-like representation of an energy function and a (potentially auto-generated) schedule, translating the combination into high-performance GPU solvers. Since Thallo can generate solvers from a large scheduling space, it can handle a large set of large-scale non-linear and non-smooth problems with various degrees of non-locality and compute-to-memory ratios, including diverse applications such as bundle adjustment, face blendshape fitting, and spatially-varying Poisson deconvolution. Abstracting schedules from the optimization, we outperform state-of-the-art GPU-based optimization DSLs by an average of 16× across all applications introduced in this work, and even outperform some published hand-written GPU solvers by more than 30%.
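The solvers such DSLs emit are Gauss-Newton-style iterations over a sum-of-squares energy. The NumPy sketch below shows that underlying iteration on a toy curve-fitting energy; it illustrates the math only and borrows nothing from Thallo's actual scheduling language or generated GPU code.

```python
import numpy as np

def gauss_newton(residual, jacobian, x0, iters=10, damping=1e-6):
    """Minimise E(x) = sum_i r_i(x)^2 by (damped) Gauss-Newton."""
    x = x0.astype(float).copy()
    for _ in range(iters):
        r = residual(x)                            # stacked residuals r(x)
        J = jacobian(x)                            # Jacobian dr/dx
        H = J.T @ J + damping * np.eye(x.size)     # normal equations
        x -= np.linalg.solve(H, J.T @ r)
    return x

# Toy energy: fit y ~ a * exp(b * t) to noisy samples.
t = np.linspace(0.0, 1.0, 50)
y = 2.0 * np.exp(-1.5 * t) + 0.01 * np.random.randn(t.size)
residual = lambda x: x[0] * np.exp(x[1] * t) - y
jacobian = lambda x: np.stack([np.exp(x[1] * t),
                               x[0] * t * np.exp(x[1] * t)], axis=1)
print(gauss_newton(residual, jacobian, np.array([1.0, -1.0])))
```

A schedule, in these terms, decides how the J^T J products and solves are materialised, fused, and mapped onto GPU threads; the iteration above fixes one such ordering.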


Author(s):  
Pietro Cicotti ◽  
Manu Shantharam ◽  
Laura Carrington

In many scientific and computational domains, graphs are used to represent and analyze data. Such graphs often exhibit the characteristics of small-world networks: few high-degree vertices connect many low-degree vertices. Despite the randomness of a graph search, it is possible to capitalize on these characteristics and cache relevant information about high-degree vertices. We applied this idea by caching remote vertex ids in a parallel breadth-first search benchmark. Our experiments with different implementations demonstrated significant performance improvements over the reference implementation in several configurations, using 64 to 1024 cores. We proposed a system design in which resources are dedicated exclusively to caching and shared among a set of nodes. Our evaluation demonstrates that this design reduces communication and has the potential to improve performance on large-scale systems in which the communication cost increases significantly with the distance between nodes. We also tested a memcached system as the cache server and found that its generic protocol, which does not match our usage semantics, significantly hinders the potential performance improvements; we therefore suggest that a generic system should also support a basic, lightweight communication protocol to meet the needs of high-performance computing applications. Finally, we explored different configurations to find efficient ways to utilize the resources allocated to solve a given problem size; to this end, we found that utilizing half of the compute cores per allocated node improves performance, and that even in this case the caching variants always outperform the reference implementation.
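A minimal sketch of the idea on one rank follows, assuming a hypothetical send_remote(v) call that stands in for the inter-node message the cache is meant to suppress; the real benchmark's data structures and protocol are not reproduced here.

```python
from collections import deque

def bfs_with_remote_cache(local_adj, sources, is_local, send_remote,
                          cache_size=4096):
    """One rank's BFS loop with a cache of already-forwarded remote ids.

    Because small-world graphs funnel the traversal through a few
    high-degree vertices, the same remote ids reappear constantly, so
    remembering which ones were already sent suppresses most duplicate
    messages.
    """
    sent = set()                    # remote vertex ids already forwarded
    visited = set(sources)
    frontier = deque(sources)
    while frontier:
        u = frontier.popleft()
        for v in local_adj.get(u, ()):
            if is_local(v):
                if v not in visited:
                    visited.add(v)
                    frontier.append(v)
            elif v not in sent:     # cache hit => no message at all
                if len(sent) >= cache_size:
                    sent.pop()      # crude eviction; a real cache uses LRU
                sent.add(v)
                send_remote(v)      # hypothetical communication call
    return visited
```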


2006 ◽  
Vol 18 (12) ◽  
pp. 2923-2927 ◽  
Author(s):  
Robert J. Calin-Jageman ◽  
Paul S. Katz

After developing a model neuron or network, it is important to systematically explore its behavior across a wide range of parameter values or experimental conditions, or both. However, compiling a very large set of simulation runs is challenging because it typically requires both access to and expertise with high-performance computing facilities. To lower the barrier for large-scale model analysis, we have developed NeuronPM, a client/server application that creates a “screen-saver” cluster for running simulations in NEURON (Hines & Carnevale, 1997). NeuronPM provides a user-friendly way to use existing computing resources to catalog the performance of a neural simulation across a wide range of parameter values and experimental conditions. The NeuronPM client is a Windows-based screen saver, and the NeuronPM server can be hosted on any Apache/PHP/MySQL server. During idle time, the client retrieves model files and work assignments from the server, invokes NEURON to run the simulation, and returns results to the server. Administrative panels make it simple to upload model files, define the parameters and conditions to vary, and then monitor client status and work progress. NeuronPM is open-source freeware and is available for download at http://neuronpm.homeip.net. It is a useful entry-level tool for systematically analyzing complex neuron and network simulations.
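In outline, the client is a polling work-queue worker: fetch an assignment, run NEURON on it, post the results back. The sketch below captures that loop; the endpoint paths, JSON fields, and the nrniv invocation details are invented for illustration and are not the actual NeuronPM protocol.

```python
import json
import subprocess
import time
import urllib.parse
import urllib.request

SERVER = "http://example.org/neuronpm"   # hypothetical server URL

def run_client(poll_seconds=60):
    """Idle-time worker loop in the spirit of the NeuronPM client."""
    while True:
        # Ask the server for the next work assignment (endpoint invented).
        with urllib.request.urlopen(SERVER + "/next_job") as resp:
            job = json.load(resp)   # e.g. {"id": 7, "hoc": "model.hoc"}
        if not job:
            time.sleep(poll_seconds)          # nothing to do: stay idle
            continue
        # Invoke NEURON on the downloaded model file for this assignment.
        out = subprocess.run(["nrniv", "-nobanner", job["hoc"]],
                             capture_output=True, text=True)
        # Return the simulation output to the server (endpoint invented).
        payload = urllib.parse.urlencode({"id": job["id"],
                                          "result": out.stdout}).encode()
        urllib.request.urlopen(SERVER + "/submit", data=payload)
```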


SPE Journal ◽  
2016 ◽  
Vol 21 (03) ◽  
pp. 0853-0863 ◽  
Author(s):  
Kai Bao ◽  
Mi Yan ◽  
Rebecca Allen ◽  
Amgad Salama ◽  
Ligang Lu ◽  
...  

Summary The present work describes a parallel computational framework for carbon dioxide (CO2) sequestration simulation that couples reservoir simulation and molecular dynamics (MD) on massively parallel high-performance-computing (HPC) systems. In this framework, a parallel reservoir simulator, the reservoir-simulation toolbox (RST), solves the flow and transport equations that describe the subsurface flow behavior, whereas MD simulations are performed to provide the required physical parameters. Technologies from several different fields are used to make this novel coupled system work efficiently. One of the major applications of the framework is the modeling of large-scale CO2 sequestration for long-term storage in subsurface geological formations, such as depleted oil and gas reservoirs and deep saline aquifers, which has been proposed as one of the few attractive and practical solutions to reduce CO2 emissions and address the global-warming threat. Fine grids and accurate prediction of the properties of fluid mixtures under geological conditions are essential for accurate simulations. In this work, CO2 sequestration is presented as a first example of coupling reservoir simulation and MD, although the framework can be extended naturally to full multiphase multicomponent compositional flow simulation to handle more complicated physical processes in the future. Accuracy and scalability analyses are performed on an IBM BlueGene/P and on an IBM BlueGene/Q, the latest IBM supercomputer. Results show good accuracy of our MD simulations compared with published data, and good scalability is observed on the massively parallel HPC systems. The performance and capacity of the proposed framework are demonstrated with several experiments with hundreds of millions to one billion cells. To the best of our knowledge, the present work represents the first attempt to couple reservoir simulation and molecular simulation for large-scale modeling. Because of the complexity of subsurface systems, fluid thermodynamic properties are required over a broad range of temperature, pressure, and composition under different geological conditions, whereas experimental results are limited. Although equations of state can reproduce existing experimental data within certain ranges of conditions, their extrapolation beyond that range is still limited. The present framework thus provides better flexibility and predictability than conventional methods.
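The coupling pattern the summary describes — a flow solver that obtains fluid properties from MD rather than from an equation of state — can be sketched as below. The cell layout and the md_lookup and advance_flow names are placeholders for illustration, not the RST or MD interfaces from the paper.

```python
def coupled_step(cells, md_lookup, advance_flow, dt):
    """One coupling cycle between the flow solver and MD (sketch only).

    md_lookup(T, p, comp)   -> (density, viscosity), standing in for an
                               MD run or a table of pre-computed MD data.
    advance_flow(cells, dt) stands in for the reservoir simulator's
                               flow/transport step.
    """
    cache = {}
    for cell in cells:
        # Quantise the thermodynamic state so nearby cells share one
        # expensive MD evaluation; 'comp' must be hashable (e.g. a tuple).
        key = (round(cell["T"], 1), round(cell["p"], -4), cell["comp"])
        if key not in cache:
            cache[key] = md_lookup(*key)
        cell["density"], cell["viscosity"] = cache[key]
    return advance_flow(cells, dt)
```

On an HPC system the MD evaluations are the natural unit of parallelism: each distinct thermodynamic state can be simulated by an independent group of ranks while the flow solver occupies the rest.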


2008 ◽  
Vol 16 (2-3) ◽  
pp. 167-181 ◽  
Author(s):  
Brian J.N. Wylie ◽  
Markus Geimer ◽  
Felix Wolf

Developers of applications with large-scale computing requirements are currently presented with a variety of high-performance systems optimised for message passing; however, effectively exploiting the available computing resources remains a major challenge. In addition to fundamental application scalability characteristics, application and system peculiarities often only manifest at extreme scales, requiring highly scalable performance measurement and analysis tools that are convenient to incorporate in application development and tuning activities. We present our experiences with a multigrid solver benchmark and state-of-the-art real-world applications for numerical weather prediction and computational fluid dynamics, on three quite different multi-thousand-processor supercomputer systems – Cray XT3/4, MareNostrum & Blue Gene/L – using the newly developed SCALASCA toolset to quantify and isolate a range of significant performance issues.


Author(s):  
Haowen Fang ◽  
Amar Shrestha ◽  
Ziyi Zhao ◽  
Qinru Qiu

The recently discovered spatial-temporal information processing capability of bio-inspired spiking neural networks (SNNs) has enabled some interesting models and applications. However, designing large-scale, high-performance models remains a challenge due to the lack of robust training algorithms. A bio-plausible SNN model with spatial-temporal properties is a complex dynamic system. Synapses and neurons behave as filters capable of preserving temporal information. Because these neuron dynamics and filter effects are ignored in existing training algorithms, the SNN degrades into a memoryless system and loses the ability to process temporal signals. Furthermore, spike timing plays an important role in information representation, but conventional rate-based spike-coding models treat spike trains only statistically and discard the information carried by their temporal structure. To address these issues and exploit the temporal dynamics of SNNs, we formulate the SNN as a network of infinite impulse response (IIR) filters with neuron nonlinearity. We propose a training algorithm capable of learning spatial-temporal patterns by searching for the optimal synapse filter kernels and weights. The proposed model and training algorithm are applied to construct associative memories and classifiers for synthetic and public datasets, including MNIST, NMNIST and DVS128; their accuracy outperforms state-of-the-art approaches.
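To make the formulation concrete, the sketch below writes one spiking layer as a cascade of first-order IIR filters with a threshold-and-reset nonlinearity. The fixed decay constants a_syn and a_mem are a simplifying assumption; the paper's formulation learns general synapse filter kernels alongside the weights.

```python
import numpy as np

def snn_iir_forward(x, w, a_syn=0.9, a_mem=0.8, thresh=1.0):
    """Forward pass of one spiking layer written as IIR filters (sketch).

    x: (T, n_in) binary input spike trains; w: (n_in, n_out) weights.
    Each synapse is a first-order IIR filter s[t] = a_syn*s[t-1] + x[t]@w,
    and the membrane is another leaky filter followed by a
    threshold-and-reset nonlinearity. The recurrent filter states are
    what let the layer retain temporal information between time steps.
    """
    T = x.shape[0]
    n_out = w.shape[1]
    s = np.zeros(n_out)                    # synaptic filter state
    v = np.zeros(n_out)                    # membrane filter state
    out = np.zeros((T, n_out))
    for t in range(T):
        s = a_syn * s + x[t] @ w           # synapse IIR: preserves timing
        v = a_mem * v + s                  # membrane IIR (leaky integration)
        spikes = (v >= thresh).astype(float)
        v = v * (1.0 - spikes)             # reset wherever a spike fired
        out[t] = spikes
    return out
```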


Some problems associated with numerical weather prediction are discussed. For an adequate description of the growth of frontal depressions, a grid length at least as small as 100 km is required. Boundary conditions must be chosen to suppress the generation of short gravity-wave components, and care must be taken to avoid inconsistencies in the initial data, which may give rise to spurious large-scale oscillations.

