High performance mapping for massively parallel hierarchical structures

Author(s):  
S.G. Ziavras
Author(s):  
Martin Schreiber ◽  
Pedro S Peixoto ◽  
Terry Haut ◽  
Beth Wingate

This paper presents, discusses and analyses a massively parallel-in-time solver for linear oscillatory partial differential equations, which is a key numerical component for evolving weather, ocean, climate and seismic models. The time parallelization in this solver allows us to use significantly more computing resources than parallelization-in-space methods can exploit, and yields a correspondingly large reduction in wall-clock time. One of the major difficulties of achieving Exascale performance for weather prediction is that the strong scaling limit (the parallel performance for a fixed problem size with an increasing number of processors) saturates. A main avenue to circumvent this problem is to introduce new numerical techniques that take advantage of time parallelism. In this paper, we use a time-parallel approximation that retains the frequency information of oscillatory problems. This approximation is based on (a) reformulating the original problem into a large set of independent terms and (b) solving each of these terms independently, which can now be done on a large number of high-performance computing resources. Our experiments are conducted on up to 3586 cores for problem sizes whose parallelization-in-space scalability is already limited on a single node. With the parallelization-in-time approach, we reduce the time-to-solution by a factor of 118.3× for spectral methods and 1503.0× for finite-difference methods. A developed and calibrated performance model gives the scalability limitations a priori for this new approach and allows us to extrapolate the performance of the method towards large-scale systems. This work has the potential to serve as a basic building block of parallelization-in-time approaches, with possible major implications in applied areas modelling oscillation-dominated problems.
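The two steps (a) and (b) can be illustrated with a small sketch. The abstract does not state which approximation the authors use, so the code below swaps in a different, well-known construction as an assumption: the Cauchy contour integral for the matrix exponential, discretized with the trapezoidal rule on a circle. Each quadrature node then contributes one independent linear solve. The helper names (`solve_2x2`, `contour_term`, `exp_apply`), the 2x2 test oscillator, and the values of `n_terms` and `radius` are all hypothetical choices made for illustration.

```python
import cmath
import math
from concurrent.futures import ThreadPoolExecutor


def solve_2x2(m, rhs):
    """Solve the 2x2 complex linear system m x = rhs by Cramer's rule."""
    (a, b), (c, d) = m
    det = a * d - b * c
    return ((rhs[0] * d - b * rhs[1]) / det,
            (a * rhs[1] - c * rhs[0]) / det)


def contour_term(k, n_terms, radius, op, u0):
    """One independent term: weight_k * (z_k I - op)^{-1} u0."""
    z = radius * cmath.exp(2j * math.pi * k / n_terms)  # node on the circle
    shifted = ((z - op[0][0], -op[0][1]),
               (-op[1][0], z - op[1][1]))
    x = solve_2x2(shifted, u0)
    weight = z * cmath.exp(z) / n_terms  # trapezoidal weight times e^z
    return (weight * x[0], weight * x[1])


def exp_apply(op, u0, n_terms=32, radius=3.0):
    """Approximate exp(op) u0 as a sum of independent shifted solves.

    Assumes all eigenvalues of op lie well inside the circle |z| = radius.
    """
    # The terms share no data, so they could be sent to separate workers.
    # Threads are used here only to make that independence explicit.
    with ThreadPoolExecutor() as pool:
        terms = list(pool.map(
            lambda k: contour_term(k, n_terms, radius, op, u0),
            range(n_terms)))
    return (sum(t[0] for t in terms), sum(t[1] for t in terms))


# Linear oscillator u' = L u with L = [[0, -w], [w, 0]], advanced by dt.
w, dt = 2.0, 0.5
op = ((0.0, -w * dt), (w * dt, 0.0))
u = exp_apply(op, (1.0, 0.0))
exact = (math.cos(w * dt), math.sin(w * dt))  # rotation of (1, 0) by w*dt
```

In this sketch the only communication is the final sum over terms, which is what makes such a reformulation attractive once spatial parallelism has saturated. In CPython the thread pool does not speed up this tiny example; it only shows that the terms can be evaluated in any order.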


RSC Advances ◽  
2017 ◽  
Vol 7 (5) ◽  
pp. 2637-2643 ◽  
Author(s):  
Lili Liu ◽  
Lihui Mou ◽  
Jia Yu ◽  
Shimou Chen

Urchin-like microspheres consisting of radial carbon-coated cobalt monoxide nanowires are designed to fabricate a micro/nano hierarchical structure for efficient Li-storage.


Author(s):  
Pravin Jagtap ◽  
Rupesh Nasre ◽  
V. S. Sanapala ◽  
B. S. V. Patnaik

Smoothed Particle Hydrodynamics (SPH) is fast emerging as a practically useful computational simulation tool for a wide variety of engineering problems. SPH is also gaining popularity as the backbone for fast and realistic animations in graphics and video games. The Lagrangian and mesh-free nature of the method facilitates fast and accurate simulation of material deformation, interface capture, etc. Typically, particle-based methods require particle search and locate algorithms to be implemented efficiently, as the continuous creation of neighbor particle lists is a computationally expensive step. Hence, it is advantageous to implement SPH on modern multi-core platforms with the help of High-Performance Computing (HPC) tools. In this work, the computational performance of an SPH algorithm is assessed on a multi-core Central Processing Unit (CPU) as well as on massively parallel General Purpose Graphics Processing Units (GP-GPUs). Parallelizing SPH faces several challenges, such as scalability of the neighbor search process, force calculations, minimizing thread divergence, achieving coalesced memory access patterns, balancing workload, and ensuring optimum use of computational resources. While addressing some of these challenges, a detailed analysis of performance metrics such as speedup, global load efficiency, global store efficiency, warp execution efficiency, and occupancy is carried out. The OpenMP and Compute Unified Device Architecture (CUDA) parallel programming models have been used for parallel computing on an Intel Xeon E5-series multi-core CPU and on NVIDIA Quadro M-series and NVIDIA Tesla P-series massively parallel GPU architectures. Standard benchmark problems from the Computational Fluid Dynamics (CFD) literature are chosen for the validation. The key concern of how to identify a suitable architecture for mesh-less methods, which essentially require a heavy workload of neighbor search and evaluation of local force fields from neighbor interactions, is addressed.
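To make the neighbor-search cost concrete, here is a minimal sketch of one common approach, a uniform-grid (cell list) search in 2-D. The abstract does not say which search algorithm the authors implemented, so this is an assumption chosen for illustration; the function name `build_neighbor_lists` and the cutoff parameter `h` are hypothetical.

```python
from collections import defaultdict
from itertools import product


def build_neighbor_lists(points, h):
    """For each 2-D particle, list the other particles within distance h.

    Particles are binned into square cells of side h, so every neighbor
    of a particle lies in its own cell or one of the 8 adjacent cells.
    """
    cells = defaultdict(list)
    for i, (x, y) in enumerate(points):
        cells[(int(x // h), int(y // h))].append(i)

    neighbors = [[] for _ in points]
    for i, (x, y) in enumerate(points):
        cx, cy = int(x // h), int(y // h)
        for dx, dy in product((-1, 0, 1), repeat=2):
            # .get avoids creating empty cells in the defaultdict
            for j in cells.get((cx + dx, cy + dy), ()):
                if j == i:
                    continue
                px, py = points[j]
                if (px - x) ** 2 + (py - y) ** 2 <= h * h:
                    neighbors[i].append(j)
    return neighbors


# Small usage example with a hypothetical cutoff of 1.0
sample = [(0.0, 0.0), (0.5, 0.0), (2.0, 2.0), (0.9, 0.9)]
sample_neighbors = build_neighbor_lists(sample, 1.0)
```

Binning replaces the all-pairs distance test with a search over nearby cells only. The outer loop over particles has no cross-iteration dependencies once the cells are built, which is the kind of loop that maps onto OpenMP threads or one GPU thread per particle.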


2014 ◽  
Author(s):  
Mehdi Gilaki ◽  
Ilya Avdeev

In this study, we have investigated the feasibility of using the commercial explicit finite element code LS-DYNA on a massively parallel super-computing cluster for accurate modeling of structural impact on battery cells. Physical and numerical lateral impact tests have been conducted on cylindrical cells using a flat rigid drop cart in a custom-built drop test apparatus. The main component of a cylindrical cell, the jellyroll, is a layered spiral structure which consists of thin layers of electrodes and separator. Two numerical approaches were considered: (1) a homogenized model of the cell and (2) a heterogeneous (full) 3-D cell model. In the first approach, the jellyroll was considered as a homogeneous material with an effective stress-strain curve obtained through experiments. In the second approach, individual layers of anode, cathode and separator were accounted for in the model, leading to an extremely complex and computationally expensive finite element model. To overcome the limitations of desktop computers, high-performance computing (HPC) techniques on an HPC cluster were needed in order to obtain the results of transient simulations in a reasonable solution time. We have compared two HPC methods for this model: shared memory parallel processing (SMP) and massively parallel processing (MPP). Both the homogeneous and the heterogeneous models were considered for parallel simulations utilizing different numbers of computational nodes and cores, and the performance of these models was compared. This work brings us one step closer to accurate modeling of structural impact on an entire battery pack that consists of thousands of cells.


2020 ◽  
Vol 87 (7) ◽  
Author(s):  
Fan Wang ◽  
Kui Liu ◽  
Dechang Li ◽  
Baohua Ji

It is well known that biological composites have superior mechanical properties due to their exquisite multilevel structural hierarchy. However, the underlying mechanisms by which this hierarchical design contributes to the toughness of biocomposites remain elusive. In this paper, the deformation and fracture mechanisms of multilevel hierarchical structures are explored by molecular dynamics simulations. The effects of the multilevel design on fracture toughness, nonlinear deformation of the soft matrix, and the crack path pattern were quantitatively analyzed. We showed that the toughness of composites is closely associated with the pattern of the crack path and the nonlinear deformation of the matrix. Additionally, structures with a higher level of hierarchy exhibit higher toughness, which is less sensitive to geometrical changes of the inclusions, such as the aspect ratio and the staggered ratio. This work provides more theoretical evidence of the toughening mechanism of the multilevel hierarchy in biological materials via new methods of analyzing fracture of multilevel structures, and provides guidelines for the design of high-performance engineering materials.


2012 ◽  
Vol 2012 ◽  
pp. 1-10 ◽  
Author(s):  
Christoph Starke ◽  
Vasco Grossmann ◽  
Lars Wienbrandt ◽  
Sven Koschnicke ◽  
John Carstens ◽  
...  

The hardware structure of a processing element used for optimization of an investment strategy for financial markets is presented. It is shown how this processing element can be instantiated multiple times on the massively parallel FPGA-machine RIVYERA. This leads to a speedup by a factor of about 17,000 in comparison to a single high-performance PC, while saving more than 99% of the consumed energy. Furthermore, it is shown for a particular security and different time periods that the optimized investment strategy outperforms a buy-and-hold strategy by between 2 and 14 percent.


1995 ◽  
Vol 4 (1) ◽  
pp. 1-21 ◽  
Author(s):  
Matthew O'Keefe ◽  
Terence Parr ◽  
B. Kevin Edgar ◽  
Steve Anderson ◽  
Paul Woodward ◽  
...  

Massively parallel processors (MPPs) hold the promise of extremely high performance that, if realized, could be used to study problems of unprecedented size and complexity. One of the primary stumbling blocks to this promise has been the lack of tools to translate application codes to MPP form. In this article we show how application codes written in a subset of Fortran 77, called Fortran-P, can be translated to achieve good performance on several massively parallel machines. This subset can express codes that are self-similar, where the algorithm applied to the global data domain is also applied to each subdomain. We have found many codes that match the Fortran-P programming style and have converted them using our tools. We believe a self-similar coding style will accomplish for MPPs what a vectorizable style has accomplished for vector machines, by allowing the construction of robust, user-friendly, automatic translation systems that increase programmer productivity and generate fast, efficient code.
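The self-similar property can be illustrated with a small sketch. This is Python, not Fortran-P, and it does not reproduce the authors' translation tools. It only shows, under assumed names (`smooth`, `smooth_by_subdomains`) and an assumed three-point stencil, that one routine applied to the global domain gives the same result when applied unchanged to each subdomain padded with one ghost cell per side.

```python
def smooth(cells):
    """Three-point average of the interior cells.

    The first and last entries act as boundary (or ghost) values and are
    read but not updated, so the result has len(cells) - 2 entries.
    """
    return [(cells[i - 1] + cells[i] + cells[i + 1]) / 3.0
            for i in range(1, len(cells) - 1)]


def smooth_by_subdomains(cells, n_sub):
    """Apply the same routine to each subdomain plus its ghost cells."""
    interior = len(cells) - 2
    # Simplifying assumption for this sketch: equal-sized subdomains
    assert interior % n_sub == 0, "interior must split evenly"
    size = interior // n_sub
    out = []
    for s in range(n_sub):
        start = 1 + s * size
        # One ghost cell on each side, copied from the neighboring subdomain
        sub = cells[start - 1:start + size + 1]
        out.extend(smooth(sub))  # could run on a separate processor
    return out


data = [float(i * i % 7) for i in range(14)]  # 12 interior cells, 2 boundary
global_result = smooth(data)
split_result = smooth_by_subdomains(data, 3)
```

Because `smooth` never needs to know whether it holds the whole domain or a piece of it, the decomposition and the ghost-cell exchange are the only parts a translator would need to generate.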


CrystEngComm ◽  
2019 ◽  
Vol 21 (6) ◽  
pp. 994-1000 ◽  
Author(s):  
Jiaqin Yang ◽  
Zhiying Sun ◽  
Jiahui Wang ◽  
Jing Zhang ◽  
Yujiao Qin ◽  
...  

To achieve superior performance of electrode materials, the design of rational and advantageous hierarchical structures has been confirmed as an effective and feasible approach.

