High performance mapping for massively parallel hierarchical structures

This paper presents, discusses and analyses a massively parallel-in-time solver for linear oscillatory partial differential equations, which is a key numerical component for evolving weather, ocean, climate and seismic models. The time parallelization in this solver allows us to significantly exceed the computing resources used by parallelization-in-space methods and results in a correspondingly reduced wall-clock time. One of the major difficulties of achieving Exascale performance for weather prediction is that the strong scaling limit – the parallel performance for a fixed problem size with an increasing number of processors – saturates. A main avenue to circumvent this problem is to introduce new numerical techniques that take advantage of time parallelism. In this paper, we use a time-parallel approximation that retains the frequency information of oscillatory problems. This approximation is based on (a) reformulating the original problem into a large set of independent terms and (b) solving each of these terms independently of the others, which can now be accomplished on a large number of high-performance computing resources. Our experiments are conducted on up to 3586 cores for problem sizes at which the parallelization-in-space scalability is already limited on a single node. With the parallelization-in-time approach, we reduce the time-to-solution by factors of 118.3× for spectral methods and 1503.0× for finite-difference methods. A developed and calibrated performance model gives the scalability limitations a priori for this new approach and allows us to extrapolate the performance of the method towards large-scale systems. This work has the potential to contribute as a basic building block of parallelization-in-time approaches, with possible major implications in applied areas modelling oscillatory dominated problems.
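The idea of steps (a) and (b) can be illustrated with a small sketch. It is not the approximation used in the paper: as a stand-in, it assumes a Cauchy contour integral discretized by the trapezoidal rule, which also turns exp(t*L)*u0 into a sum of independent shifted linear solves. The 2x2 oscillator L = [[0, omega], [-omega, 0]], the function names, and the parameters `n_terms` and `radius` are all hypothetical choices for illustration.

```python
import cmath
import math
from concurrent.futures import ThreadPoolExecutor
from functools import partial


def solve_2x2(a, b, c, d, r0, r1):
    """Solve [[a, b], [c, d]] @ x = (r0, r1) with Cramer's rule."""
    det = a * d - b * c
    return (r0 * d - b * r1) / det, (a * r1 - c * r0) / det


def contour_term(k, n_terms, radius, omega, t, u0):
    """Term k of the sum: one shifted linear solve, independent of all others."""
    z = radius * cmath.exp(2j * math.pi * k / n_terms)  # quadrature node on the circle
    # Solve (z*I - t*L) x = u0 for the assumed L = [[0, omega], [-omega, 0]].
    x0, x1 = solve_2x2(z, -t * omega, t * omega, z, u0[0], u0[1])
    weight = z * cmath.exp(z) / n_terms
    return weight * x0, weight * x1


def propagate(u0, omega, t, n_terms=32, radius=3.0):
    """Approximate exp(t*L) @ u0; the circle must enclose the points +/- i*omega*t."""
    if radius <= abs(omega * t):
        raise ValueError("radius must exceed |omega * t|")
    term = partial(contour_term, n_terms=n_terms, radius=radius,
                   omega=omega, t=t, u0=u0)
    # Each term could run on a different worker; a thread pool is used here
    # only to show that no term depends on another.
    with ThreadPoolExecutor() as pool:
        terms = list(pool.map(term, range(n_terms)))
    return sum(x.real for x, _ in terms), sum(y.real for _, y in terms)
```

For `u0 = (1.0, 0.0)`, `omega = 2.0` and `t = 0.5`, the result of `propagate` should agree closely with the exact rotation `(cos(1), -sin(1))`. In a real solver each term would be a large sparse solve, which is where distributing the terms over many nodes would pay off.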

TTN: A High Performance Hierarchical Interconnection Network for Massively Parallel Computers

IEICE Transactions on Information and Systems ◽

10.1587/transinf.e92.d.1062 ◽

2009 ◽

Vol E92-D (5) ◽

pp. 1062-1078 ◽

Cited By ~ 14

Author(s):

M.M. Hafizur RAHMAN ◽

Yasushi INOGUCHI ◽

Yukinori SATO ◽

Susumu HORIGUCHI

Keyword(s):

High Performance ◽

Interconnection Network ◽

Parallel Computers ◽

Massively Parallel ◽

Massively Parallel Computers

Urchin-like CoO–C micro/nano hierarchical structures as high performance anode materials for Li-ion batteries

RSC Advances ◽

10.1039/c6ra26937k ◽

2017 ◽

Vol 7 (5) ◽

pp. 2637-2643 ◽

Cited By ~ 11

Author(s):

Lili Liu ◽

Lihui Mou ◽

Jia Yu ◽

Shimou Chen

Keyword(s):

Hierarchical Structure ◽

Anode Materials ◽

High Performance ◽

Hierarchical Structures ◽

Li Ion Batteries ◽

Carbon Coated ◽

Li Ion ◽

Li Storage

Urchin-like microspheres consisting of radial carbon-coated cobalt monoxide nanowires are designed to form a micro/nano hierarchical structure for efficient Li-storage.

Efficient parallelization of SPH algorithm on modern multi-core CPUs and massively parallel GPUs

International Journal of Modeling Simulation and Scientific Computing ◽

10.1142/s1793962321500549 ◽

2021 ◽

pp. 2150054

Author(s):

Pravin Jagtap ◽

Rupesh Nasre ◽

V. S. Sanapala ◽

B. S. V. Patnaik

Keyword(s):

High Performance ◽

Performance Metrics ◽

Computational Simulation ◽

Massively Parallel ◽

Benchmark Problems ◽

Processing Unit ◽

Central Processing ◽

Neighbor Search ◽

Computational Performance ◽

SPH Algorithm

Smoothed Particle Hydrodynamics (SPH) is fast emerging as a practically useful computational simulation tool for a wide variety of engineering problems. SPH is also gaining popularity as the backbone for fast and realistic animations in graphics and video games. The Lagrangian and mesh-free nature of the method facilitates fast and accurate simulation of material deformation, interface capture, etc. Typically, particle-based methods require particle search-and-locate algorithms to be implemented efficiently, as the continuous creation of neighbor particle lists is a computationally expensive step. Hence, it is advantageous to implement SPH on modern multi-core platforms with the help of High-Performance Computing (HPC) tools. In this work, the computational performance of an SPH algorithm is assessed on multi-core Central Processing Units (CPU) as well as massively parallel General Purpose Graphical Processing Units (GP-GPU). Parallelizing SPH faces several challenges, such as scalability of the neighbor search process, force calculations, minimizing thread divergence, achieving coalesced memory access patterns, balancing workload, ensuring optimum use of computational resources, etc. While addressing some of these challenges, a detailed analysis of performance metrics such as speedup, global load efficiency, global store efficiency, warp execution efficiency, occupancy, etc. is presented. The OpenMP and Compute Unified Device Architecture[Formula: see text] parallel programming models have been used for parallel computing on Intel Xeon[Formula: see text] E5-[Formula: see text] multi-core CPU and NVIDIA Quadro M[Formula: see text] and NVIDIA Tesla p[Formula: see text] massively parallel GPU architectures. Standard benchmark problems from the Computational Fluid Dynamics (CFD) literature are chosen for the validation. The key concern of how to identify a suitable architecture for mesh-less methods, which essentially require a heavy workload of neighbor search and evaluation of local force fields from neighbor interactions, is addressed.
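To make the neighbor search step concrete, here is a minimal sketch of one common approach, a uniform-grid (cell list) search in 2-D. The abstract does not say which search scheme the authors use, so this is an assumption chosen for illustration; the function names and the choice of a cell size equal to the search radius `h` are hypothetical.

```python
from collections import defaultdict
from itertools import product


def build_cells(points, h):
    """Bin particle indices into square cells of side h."""
    cells = defaultdict(list)
    for idx, (x, y) in enumerate(points):
        cells[(int(x // h), int(y // h))].append(idx)
    return cells


def neighbor_lists(points, h):
    """For each particle, list all other particles within distance h.

    Because the cell side equals h, any neighbor must lie in the particle's
    own cell or one of the 8 adjacent cells, so only those are scanned.
    """
    cells = build_cells(points, h)
    result = {i: [] for i in range(len(points))}
    for (cx, cy), members in cells.items():
        for dx, dy in product((-1, 0, 1), repeat=2):
            # .get() avoids inserting empty cells while iterating
            for j in cells.get((cx + dx, cy + dy), ()):
                xj, yj = points[j]
                for i in members:
                    xi, yi = points[i]
                    if i != j and (xi - xj) ** 2 + (yi - yj) ** 2 <= h * h:
                        result[i].append(j)
    return result
```

Each cell can be processed independently of the others, which is what makes this kind of search a candidate for the multi-core and GPU parallelization discussed in the abstract.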

Comparing High-Performance Computing Techniques for Modeling Structural Impact on Battery Cells

Volume 6A: Energy ◽

10.1115/imece2014-39271 ◽

2014 ◽

Author(s):

Mehdi Gilaki ◽

Ilya Avdeev

Keyword(s):

Finite Element ◽

Parallel Processing ◽

High Performance Computing ◽

High Performance ◽

Strain Curve ◽

Thin Layers ◽

Massively Parallel ◽

Structural Impact ◽

Performance Computing ◽

Battery Cells

In this study, we have investigated the feasibility of using the commercial explicit finite element code LS-DYNA on a massively parallel super-computing cluster for accurate modeling of structural impact on battery cells. Physical and numerical lateral impact tests have been conducted on cylindrical cells using a flat rigid drop cart in a custom-built drop test apparatus. The main component of a cylindrical cell, the jellyroll, is a layered spiral structure which consists of thin layers of electrodes and separator. Two numerical approaches were considered: (1) a homogenized model of the cell and (2) a heterogeneous (full) 3-D cell model. In the first approach, the jellyroll was considered as a homogeneous material with an effective stress-strain curve obtained through experiments. In the second model, individual layers of anode, cathode and separator were accounted for, leading to an extremely complex and computationally expensive finite element model. To overcome the limitations of desktop computers, high-performance computing (HPC) techniques on an HPC cluster were needed in order to get the results of transient simulations in a reasonable solution time. We have compared two HPC methods for this model: shared memory parallel processing (SMP) and massively parallel processing (MPP). Both the homogeneous and the heterogeneous models were considered for parallel simulations utilizing different numbers of computational nodes and cores, and the performance of these models was compared. This work brings us one step closer to accurate modeling of structural impact on the entire battery pack that consists of thousands of cells.

Fracture Toughness of Biological Composites With Multilevel Structural Hierarchy

Journal of Applied Mechanics ◽

10.1115/1.4046845 ◽

2020 ◽

Vol 87 (7) ◽

Author(s):

Fan Wang ◽

Kui Liu ◽

Dechang Li ◽

Baohua Ji

Keyword(s):

Fracture Toughness ◽

High Performance ◽

Nonlinear Deformation ◽

Crack Path ◽

Hierarchical Structures ◽

Soft Matrix ◽

Structural Hierarchy ◽

Biological Composites ◽

Underlying Mechanisms ◽

Dynamics Simulations

It is well known that biological composites have superior mechanical properties due to their exquisite multilevel structural hierarchy. However, the underlying mechanisms of the roles of this hierarchical design in the toughness of the biocomposites remain elusive. In this paper, the deformation and fracture mechanisms of multilevel hierarchical structures are explored by molecular dynamics simulations. The effects of the multilevel design on fracture toughness, nonlinear deformation of the soft matrix, and the crack path pattern were quantitatively analyzed. We showed that the toughness of composites is closely associated with the pattern of the crack path and the nonlinear deformation of the matrix. Additionally, structures with a higher level of hierarchy exhibit higher toughness, which is less sensitive to geometrical changes of the inclusions, such as the aspect ratio and the staggered ratio. This work provides more theoretical evidence of the toughening mechanism of the multilevel hierarchy in fracture toughness of biological materials via new methods of analyzing fracture of multilevel structures and provides guidelines for the design of high-performance engineering materials.

Optimizing Investment Strategies with the Reconfigurable Hardware Platform RIVYERA

International Journal of Reconfigurable Computing ◽

10.1155/2012/646984 ◽

2012 ◽

Vol 2012 ◽

pp. 1-10 ◽

Cited By ~ 2

Author(s):

Christoph Starke ◽

Vasco Grossmann ◽

Lars Wienbrandt ◽

Sven Koschnicke ◽

John Carstens ◽

...

Keyword(s):

Financial Markets ◽

High Performance ◽

Investment Strategy ◽

Reconfigurable Hardware ◽

Massively Parallel ◽

Processing Element ◽

Investment Strategies ◽

Hardware Platform ◽

Time Periods ◽

Hardware Structure

The hardware structure of a processing element used for optimization of an investment strategy for financial markets is presented. It is shown how this processing element can be multiply implemented on the massively parallel FPGA-machine RIVYERA. This leads to a speedup of a factor of about 17,000 in comparison to one single high-performance PC, while saving more than 99% of the consumed energy. Furthermore, it is shown for a special security and different time periods that the optimized investment strategy delivers an outperformance between 2 and 14 percent in relation to a buy and hold strategy.

The Fortran-P Translator: Towards Automatic Translation of Fortran 77 Programs for Massively Parallel Processors

Scientific Programming ◽

10.1155/1995/278064 ◽

1995 ◽

Vol 4 (1) ◽

pp. 1-21 ◽

Cited By ~ 3

Author(s):

Matthew O'Keefe ◽

Terence Parr ◽

B. Kevin Edgar ◽

Steve Anderson ◽

Paul Woodward ◽

...

Keyword(s):

High Performance ◽

Parallel Machines ◽

Parallel Processors ◽

Massively Parallel ◽

Automatic Translation ◽

Efficient Code ◽

Self Similar ◽

User Friendly ◽

Application Codes ◽

Fortran 77

Massively parallel processors (MPPs) hold the promise of extremely high performance that, if realized, could be used to study problems of unprecedented size and complexity. One of the primary stumbling blocks to this promise has been the lack of tools to translate application codes to MPP form. In this article we show how application codes written in a subset of Fortran 77, called Fortran-P, can be translated to achieve good performance on several massively parallel machines. This subset can express codes that are self-similar, where the algorithm applied to the global data domain is also applied to each subdomain. We have found many codes that match the Fortran-P programming style and have converted them using our tools. We believe a self-similar coding style will accomplish what a vectorizable style has accomplished for vector machines by allowing the construction of robust, user-friendly, automatic translation systems that increase programmer productivity and generate fast, efficient code for MPPs.
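The self-similar property described above can be sketched in a few lines. The example is written in Python rather than Fortran-P, and the 3-point averaging kernel, the one-cell halo, and the function names are hypothetical choices for illustration, not taken from the article. It shows the same routine being applied to the global domain and, unchanged, to each subdomain padded with its neighbors' boundary cells.

```python
def smooth(cells):
    """3-point average of the interior cells.

    The first and last entries act as boundary (or halo) values and are
    not part of the returned result.
    """
    return [(cells[i - 1] + cells[i] + cells[i + 1]) / 3.0
            for i in range(1, len(cells) - 1)]


def smooth_by_subdomains(cells, n_parts):
    """Apply the same kernel to each subdomain plus a one-cell halo."""
    interior = len(cells) - 2
    out = []
    for p in range(n_parts):
        lo = 1 + p * interior // n_parts        # first interior cell of part p
        hi = 1 + (p + 1) * interior // n_parts  # one past its last interior cell
        # The slice carries one halo cell on each side, so the kernel sees
        # the same neighbors it would see in the global domain.
        out.extend(smooth(cells[lo - 1:hi + 1]))
    return out
```

Since each subdomain call performs the same arithmetic as the global call on the same values, `smooth_by_subdomains(data, n)` returns the same list as `smooth(data)`. On a distributed machine the halo slice would be replaced by a boundary exchange between processors.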

Hierarchical NiSe2 spheres composed of tiny nanoparticles for high performance asymmetric supercapacitors

CrystEngComm ◽

10.1039/c8ce01805g ◽

2019 ◽

Vol 21 (6) ◽

pp. 994-1000 ◽

Cited By ~ 15

Author(s):

Jiaqin Yang ◽

Zhiying Sun ◽

Jiahui Wang ◽

Jing Zhang ◽

Yujiao Qin ◽

...

Keyword(s):

High Performance ◽

Electrode Materials ◽

Hierarchical Structures ◽

Superior Performance ◽

Asymmetric Supercapacitors

To achieve superior performance of electrode materials, the design of rational and advantageous hierarchical structures has been confirmed as an effective and feasible approach.

Relaxations for High-Performance Message Passing on Massively Parallel SIMT Processors

2017 IEEE International Parallel and Distributed Processing Symposium (IPDPS) ◽

10.1109/ipdps.2017.94 ◽

2017 ◽

Cited By ~ 10

Author(s):

Benjamin Klenk ◽

Holger Froening ◽

Hans Eberle ◽

Larry Dennison

Keyword(s):

Message Passing ◽

High Performance ◽

Massively Parallel
