Custom Built of Smart Computing Platform for Supporting Optimization Methods and Artificial Intelligence Research

Author(s): Indar Sugiarto, Doddy Prayogo, Henry Palit, Felix Pasila, Resmana Lim, ...

This paper describes a prototype of a computing platform dedicated to artificial intelligence (AI) exploration. The platform, dubbed PakCarik, is essentially a high-throughput computing platform with GPU (graphics processing unit) acceleration. PakCarik is an Indonesian acronym for Platform Komputasi Cerdas Ramah Industri Kreatif, which can be translated as “Creative Industry friendly Intelligence Computing Platform”. The platform aims to provide a complete development and production environment for AI-based projects, especially those that rely on machine learning and multiobjective optimization paradigms. PakCarik was constructed by assembling commercial off-the-shelf hardware and was tested on several AI-related application scenarios. The testing methods in this experiment include High-Performance Linpack (HPL) benchmarking, Message Passing Interface (MPI) benchmarking, and TensorFlow (TF) benchmarking. From the experiments, the authors observe that PakCarik's performance is comparable to that of commonly used cloud computing services such as Google Compute Engine and Amazon EC2, even though it falls somewhat behind dedicated AI platforms such as the Nvidia DGX-1 used in the benchmarking experiment. Its maximum measured computing performance was 326 Gflops. The authors conclude that PakCarik is ready to be deployed in real-world applications and that it can be made even more powerful by adding more GPU cards.
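HPL reports performance by dividing a fixed operation count for the LU factorization of an N × N system by the wall-clock time. The sketch below assumes the count used in HPL's reported figure, (2/3)N³ + (3/2)N², as we understand its reporting convention, plus the common rule of thumb of sizing N so that the matrix fills about 80% of memory. The memory size and block size NB in the example are hypothetical, not PakCarik's configuration.

```python
import math

def hpl_gflops(n: int, seconds: float) -> float:
    """GFLOPS in the HPL convention (assumed): (2/3)N^3 + (3/2)N^2 operations over the run time."""
    ops = (2.0 / 3.0) * n**3 + 1.5 * n**2
    return ops / seconds / 1e9

def hpl_problem_size(mem_bytes: int, nb: int, fill: float = 0.8) -> int:
    """Largest N whose matrix of 8-byte doubles uses about `fill` of memory,
    rounded down to a multiple of the block size `nb` (rule of thumb, not a requirement)."""
    n = math.isqrt(int(fill * mem_bytes / 8))
    return n - n % nb

def hpl_efficiency(rmax_gflops: float, rpeak_gflops: float) -> float:
    """Fraction of theoretical peak actually sustained (Rmax / Rpeak)."""
    return rmax_gflops / rpeak_gflops

# Hypothetical node: 8 GiB of memory, NB = 192.
n = hpl_problem_size(8 * 2**30, nb=192)
print(n)  # 29184
```

Rounding N to a multiple of NB keeps the last block column full; the efficiency ratio is the usual way to compare a measured Rmax such as 326 Gflops against the hardware's theoretical peak.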

Author(s): Alan Gray, Kevin Stratford

Leading high-performance computing systems achieve their status through the use of highly parallel devices such as NVIDIA graphics processing units (GPUs) or Intel Xeon Phi many-core CPUs. Performance portability across such architectures, as well as traditional CPUs, is vital for the application programmer. In this paper we describe targetDP, a lightweight abstraction layer that allows grid-based applications to target data-parallel hardware in a platform-agnostic manner. We demonstrate the effectiveness of our pragmatic approach by presenting performance results for a complex fluid application (with which the model was co-designed), plus a separate lattice quantum chromodynamics particle physics code. For each application, a single source code base is seen to achieve portable performance, as assessed within the context of the Roofline model. TargetDP can be combined with the Message Passing Interface (MPI) to allow use on systems containing multiple nodes: we demonstrate this by providing scaling results on traditional and GPU-accelerated large-scale supercomputers.
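The Roofline model bounds attainable performance by the lesser of the compute peak and the product of arithmetic intensity (flops per byte moved from memory) and memory bandwidth. A minimal sketch follows; the peak, bandwidth, and per-site flop and byte counts in the example are illustrative assumptions, not measurements from the paper.

```python
def arithmetic_intensity(flops: float, bytes_moved: float) -> float:
    """Flops performed per byte of main-memory traffic."""
    return flops / bytes_moved

def roofline(ai: float, peak_gflops: float, bandwidth_gbs: float) -> float:
    """Attainable GFLOP/s = min(compute roof, memory roof)."""
    return min(peak_gflops, ai * bandwidth_gbs)

def ridge_point(peak_gflops: float, bandwidth_gbs: float) -> float:
    """Arithmetic intensity at which a kernel stops being memory bound."""
    return peak_gflops / bandwidth_gbs

# Illustrative device: 1000 GFLOP/s peak, 200 GB/s memory bandwidth.
# Hypothetical lattice kernel: 100 flops and 400 bytes of traffic per site.
ai = arithmetic_intensity(100, 400)   # 0.25 flop/byte
print(roofline(ai, 1000, 200))        # 50.0 (memory bound)
print(ridge_point(1000, 200))         # 5.0
```

Stencil-style lattice kernels often fall to the left of the ridge point, in which case the memory roof, not the compute peak, is the relevant yardstick when comparing performance across CPUs and GPUs.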


Author(s): Héctor Martínez, Sergio Barrachina, Maribel Castillo, Joaquín Tárraga, Ignacio Medina, ...

Advances in genomic sequencing during the past few years have motivated the development of fast and reliable software for DNA/RNA sequencing on current high-performance architectures. Most of these efforts target multicore processors; only a few can also exploit graphics processing units, and a much smaller set will run on clusters equipped with either of these multi-threaded architectures. Furthermore, the tools that can be used on clusters today are all tightly coupled to a particular aligner. In this paper we introduce an alignment framework that can be leveraged to coordinate the execution of any “single-node” aligner, taking advantage of the resources of a cluster without modifying any portion of the original software. The key to our transparent migration lies in hiding the complexity associated with multi-node execution (such as coordinating the processes running on the cluster nodes) inside the generic-aligner framework. Moreover, following the design and operation of our Message Passing Interface (MPI) version of HPG Aligner RNA BWT, we organize the framework into two stages so that a different aligner can be executed in each. With this configuration, for example, the first stage can apply a fast aligner to accelerate the process, while the second can be tuned to act as a refinement stage that further improves the global alignment at little cost.
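The two-stage organization can be sketched in a few lines: reads are split among nodes, a fast first-stage aligner places what it can, and only the reads it fails to place are passed to a slower refinement aligner. Both toy aligners below (exact substring match and a brute-force one-mismatch scan) are hypothetical stand-ins for the real single-node aligners the framework wraps, and the per-node loop stands in for MPI processes.

```python
from typing import Callable, Dict, List, Optional

# An aligner maps (read, reference) to a 0-based position, or None if unplaced.
Aligner = Callable[[str, str], Optional[int]]

def exact_aligner(read: str, ref: str) -> Optional[int]:
    """Hypothetical fast stage: exact substring match only."""
    pos = ref.find(read)
    return pos if pos >= 0 else None

def mismatch_aligner(read: str, ref: str, max_mismatches: int = 1) -> Optional[int]:
    """Hypothetical refinement stage: scan every offset, allowing a few mismatches."""
    for i in range(len(ref) - len(read) + 1):
        mismatches = sum(a != b for a, b in zip(read, ref[i:i + len(read)]))
        if mismatches <= max_mismatches:
            return i
    return None

def split_reads(reads: List[str], n_nodes: int) -> List[List[str]]:
    """Round-robin distribution of reads among nodes (stand-in for MPI ranks)."""
    return [reads[r::n_nodes] for r in range(n_nodes)]

def two_stage_align(reads: List[str], ref: str, stage1: Aligner, stage2: Aligner,
                    n_nodes: int = 2) -> Dict[str, Optional[int]]:
    results: Dict[str, Optional[int]] = {}
    for node_reads in split_reads(reads, n_nodes):  # each iteration = one node's share
        pending = []
        for read in node_reads:
            pos = stage1(read, ref)
            if pos is None:
                pending.append(read)                # hand off to the refinement stage
            else:
                results[read] = pos
        for read in pending:
            results[read] = stage2(read, ref)       # None if still unplaced
    return results

ref = "ACGTACGGTCA"
print(two_stage_align(["ACGT", "GGTC", "ACGA", "TTTT"], ref, exact_aligner, mismatch_aligner))
```

Because the stages only exchange lists of reads, either aligner can be swapped without touching the other, which is the property the framework relies on.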


2020, Vol 15
Author(s): Weiwen Zhang, Long Wang, Theint Theint Aye, Juniarto Samsudin, Yongqing Zhu

Background: Genotype imputation as a service has been developed to enable researchers to estimate genotypes from haplotyped data without performing whole-genome sequencing. However, genotype imputation is computationally intensive, so it remains a challenge to satisfy the high-performance requirements of genome-wide association studies (GWAS). Objective: In this paper, we propose a high-performance computing solution for genotype imputation on supercomputers to improve its execution performance. Method: We design and implement a multi-level parallelization comprising job-level, process-level, and thread-level parallelization, enabled by job scheduling management, the Message Passing Interface (MPI), and OpenMP, respectively. It involves job distribution, chunk partitioning and execution, parallelized iteration for imputation, and data concatenation. Owing to this multi-level design, we can exploit multi-machine/multi-core architectures to improve the performance of genotype imputation. Results: Experimental results show that our proposed method outperforms the Hadoop-based implementation of genotype imputation. Moreover, we conduct experiments on supercomputers to evaluate the performance of the proposed method; the evaluation shows that it significantly shortens execution time, thus improving the performance of genotype imputation. Conclusion: The proposed multi-level parallelization, when deployed as an imputation-as-a-service offering, will help bioinformatics researchers in Singapore conduct genotype imputation and enhance association studies.
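The process- and thread-level layers of this design (chunk partitioning, distribution, per-chunk execution, and concatenation) can be sketched as follows. This is an illustration under stated assumptions: `impute_chunk` is a hypothetical placeholder for a real imputation tool, ranks are simulated by a loop instead of MPI, a Python thread pool stands in for OpenMP threads, and the overlapping buffer regions that real pipelines often add around chunks are omitted. The job level (e.g., one scheduler job per chromosome) is not shown.

```python
from concurrent.futures import ThreadPoolExecutor

def partition(start, end, chunk_size):
    """Split the half-open genomic region [start, end) into consecutive chunks."""
    return [(s, min(s + chunk_size, end)) for s in range(start, end, chunk_size)]

def assign_round_robin(chunks, n_ranks):
    """Process level: chunk i goes to rank i % n_ranks, as an MPI program might do."""
    return {r: chunks[r::n_ranks] for r in range(n_ranks)}

def impute_chunk(chunk):
    """Hypothetical placeholder: emits one dummy genotype per position in the chunk."""
    s, e = chunk
    return [(pos, "0|1") for pos in range(s, e)]

def run_rank(my_chunks, n_threads=4):
    """Thread level: one rank imputes its chunks concurrently (stand-in for OpenMP)."""
    with ThreadPoolExecutor(max_workers=n_threads) as pool:
        return list(zip(my_chunks, pool.map(impute_chunk, my_chunks)))

def concatenate(per_rank_results):
    """Gather step: order chunk outputs by genomic start and join them."""
    ordered = sorted((c for res in per_rank_results for c in res), key=lambda x: x[0][0])
    return [row for _, rows in ordered for row in rows]

chunks = partition(0, 10, 3)                      # [(0, 3), (3, 6), (6, 9), (9, 10)]
assignment = assign_round_robin(chunks, n_ranks=2)
merged = concatenate([run_rank(c) for c in assignment.values()])
print([pos for pos, _ in merged] == list(range(10)))  # True
```

Sorting by chunk start during concatenation makes the output independent of which rank finished first.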


1996, Vol 22 (6), pp. 789-828
Author(s): William Gropp, Ewing Lusk, Nathan Doss, Anthony Skjellum

2013, Vol 718-720, pp. 1645-1650
Author(s): Gen Yin Cheng, Sheng Chen Yu, Zhi Yong Wei, Shao Jie Chen, You Cheng

Commonly used commercial simulation software such as SYSNOISE and ANSYS runs on a single machine (it cannot run directly on a parallel machine) when the finite element and boundary element methods are used to simulate muffler performance, and because of the large amount of numerical computation it can take more than ten days, sometimes even twenty, to obtain an accurate solution. To reduce the cost of numerical simulation, we use a high-performance parallel machine built from 32 commercial computers and convert the finite element and boundary element simulation software into a program that runs in an MPI (Message Passing Interface) parallel environment. The data obtained from the simulation experiments show that the numerical simulation produces good results, and the high-performance parallel machine computes 25 to 30 times faster than a single microcomputer.
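Converting a serial finite element/boundary element solver to MPI usually starts by giving each process an equal share of the work. Assuming the work can be treated as a list of independent items (elements, or rows of the system matrix), a balanced block decomposition for the 32-machine setup could look like this; the item count of 1000 is hypothetical.

```python
def block_range(n_items: int, n_procs: int, rank: int) -> tuple[int, int]:
    """Half-open index range [lo, hi) owned by `rank`.
    The first n_items % n_procs ranks get one extra item, so sizes differ by at most one."""
    base, extra = divmod(n_items, n_procs)
    lo = rank * base + min(rank, extra)
    hi = lo + base + (1 if rank < extra else 0)
    return lo, hi

# In an MPI program each process would call this with its own rank.
ranges = [block_range(1000, 32, r) for r in range(32)]
print(ranges[0], ranges[-1])  # (0, 32) (969, 1000)
```

For reference, the reported 25 to 30 times speedup on 32 machines corresponds to a parallel efficiency of roughly 78% to 94% (25/32 ≈ 0.78, 30/32 ≈ 0.94).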


2021
Author(s): Jiecheng Zhang, George Moridis, Thomas Blasingame

Abstract: The Reservoir GeoMechanics Simulator (RGMS), a geomechanics simulator based on the finite element method and parallelized using the Message Passing Interface (MPI), is developed in this work to model the stresses and deformations in subsurface systems. RGMS can be used stand-alone or coupled with flow and transport models. pT+H V1.5, a parallel MPI-based version of the serial T+H V1.5 code that describes mass and heat flow in hydrate-bearing porous media, is also developed. Using the fixed-stress split iterative scheme, RGMS is coupled with pT+H V1.5 to investigate the geomechanical responses associated with gas production from hydrate accumulations. The code development and testing process involves evaluation of the parallelization and of the coupling method, as well as verification and validation of the results. The parallel performance of the codes is tested on the Ada Linux cluster of Texas A&M High Performance Research Computing using up to 512 processors, and on a Mac Pro computer with 12 processors. The investigated problems are: Group 1: geomechanical problems solved by RGMS in 2D Cartesian and cylindrical domains and a 3D problem, involving 4 × 10⁶ and 3.375 × 10⁶ elements, respectively; Group 2: realistic problems of gas production from hydrates using pT+H V1.5 in 2D and 3D systems with 2.45 × 10⁵ and 3.6 × 10⁶ elements, respectively; Group 3: the 3D problem in Group 2 solved with the coupled RGMS-pT+H V1.5 simulator, fully accounting for geomechanics. Two domain partitioning options are investigated on the Ada Linux cluster and the Mac Pro, and the parallel performance of the codes is monitored. On the Ada Linux cluster using 512 processors, the simulation speedups are (a) 218.89, 188.13, and 284.70 for RGMS in the Group 1 problems, (b) 174.25 and 341.67 for pT+H V1.5 in the Group 2 cases, and (c) 331.80 for the coupled simulators in Group 3.
The results produced in this work show (a) the necessity of using full geomechanics simulators in marine hydrate-related studies, because of the pronounced geomechanical effects on production and displacements, and (b) the effectiveness of the parallel simulators developed in this study, which may be the only realistic option for these complex simulations of large multi-dimensional domains.
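Speedups are easier to compare across problem sizes when converted to parallel efficiency E = S/p. The sketch below applies that to the 512-processor figures reported above and also computes the Karp–Flatt metric (the experimentally determined serial fraction), a standard diagnostic that the authors do not report; the case labels are generic because the abstract does not map each number to a specific problem.

```python
def parallel_efficiency(speedup: float, n_procs: int) -> float:
    """E = S / p; 1.0 means perfect linear scaling."""
    return speedup / n_procs

def karp_flatt(speedup: float, n_procs: int) -> float:
    """Experimentally determined serial fraction e = (1/S - 1/p) / (1 - 1/p)."""
    return (1 / speedup - 1 / n_procs) / (1 - 1 / n_procs)

# Speedups reported for 512 processors on the Ada cluster.
reported = {
    "RGMS, Group 1, case 1": 218.89,
    "RGMS, Group 1, case 2": 188.13,
    "RGMS, Group 1, case 3": 284.70,
    "pT+H V1.5, Group 2, case 1": 174.25,
    "pT+H V1.5, Group 2, case 2": 341.67,
    "Coupled, Group 3": 331.80,
}
for name, s in reported.items():
    print(f"{name}: E = {parallel_efficiency(s, 512):.2f}, e = {karp_flatt(s, 512):.4f}")
```

By this measure the coupled 3D run (speedup 331.80 on 512 processors) sustains roughly 65% parallel efficiency.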


Author(s): Vladimir Stegailov, Ekaterina Dlinnova, Timur Ismagilov, Mikhail Khalilov, Nikolay Kondratyuk, ...

In this article, we describe the Desmos supercomputer, which consists of 32 hybrid nodes connected by a low-latency, high-bandwidth Angara interconnect with torus topology. This supercomputer is aimed at cost-effective classical molecular dynamics calculations. Desmos serves as a test bed for the Angara interconnect, which supports 3-D and 4-D torus network topologies, and verifies its ability to unite massively parallel programming systems and to effectively speed up message passing interface (MPI)-based applications. We describe the Angara interconnect and present typical MPI benchmarks. Desmos benchmark results for GROMACS, LAMMPS, VASP, and CP2K are compared with data for other high-performance computing (HPC) systems. We also consider job scheduling statistics from several months of Desmos deployment.
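In a torus, each node links to its nearest neighbours along every axis, and wraparound links close each ring, which shortens worst-case paths compared with a mesh. The sketch below computes neighbours and minimal hop counts; arranging Desmos's 32 nodes as a 4 × 4 × 2 3-D torus is a hypothetical layout chosen for illustration, not the machine's documented configuration.

```python
def torus_neighbors(coord, dims):
    """Distinct nearest neighbours of `coord`; links wrap around at each axis end.
    On an axis of size 2 the -1 and +1 neighbours coincide, so duplicates are dropped."""
    found = []
    for axis, size in enumerate(dims):
        for step in (-1, 1):
            n = list(coord)
            n[axis] = (n[axis] + step) % size
            t = tuple(n)
            if t != tuple(coord) and t not in found:
                found.append(t)
    return found

def torus_hops(a, b, dims):
    """Minimal hop count: along each axis, take the shorter way around the ring."""
    return sum(min(abs(x - y), size - abs(x - y)) for x, y, size in zip(a, b, dims))

dims = (4, 4, 2)  # hypothetical 32-node layout
print(len(torus_neighbors((0, 0, 0), dims)))   # 5
print(torus_hops((0, 0, 0), (3, 3, 1), dims))  # 3 (a mesh without wraparound needs 7)
```

In an MPI code, the same neighbour structure is what a periodic Cartesian communicator describes, which is one reason torus interconnects suit nearest-neighbour halo exchange in molecular dynamics.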


Author(s): Roberto Porcù, Edie Miglio, Nicola Parolini, Mattia Penati, Noemi Vergopolan

Helicopters can experience brownout when flying close to a dusty surface. Dust uplifted into the air can considerably restrict the pilot’s visibility. Consequently, brownout can disorient the pilot and lead to the helicopter colliding with the ground. Given its risks, brownout has become a high-priority problem for civil and military operations. Proper helicopter design is thus critical, as it has a strong influence on the shape and density of the dust cloud that forms when brownout occurs. One way to improve aircraft design against brownout is to use particle simulations. For simulations to be accurate and comparable to the real phenomenon, billions of particles are required; with such large numbers of particles, however, serial simulations can be too slow and computationally expensive to be performed. In this work, we investigate a message passing interface (MPI) + graphics processing unit (multi-GPU) approach to simulate brownout. Specifically, we use a semi-implicit Euler method to treat the particle dynamics in a Lagrangian way, and we adopt a precomputed aerodynamic field. We do not include particle–particle collisions in the model; this allows for independent trajectories and effective model parallelization. To support our methodology, we provide a speedup analysis of the parallelization with respect to the serial and pure-MPI simulations. The results show (i) very high speedups of the MPI + multi-GPU implementation with respect to the serial and pure-MPI ones, (ii) excellent weak and strong scalability of the implemented time-integration algorithm, and (iii) the possibility of running realistic simulations of brownout with billions of particles at a relatively small computational cost. This work paves the way toward more realistic brownout simulations, and it highlights the potential of high-performance computing for aiding and advancing aircraft design for brownout mitigation.
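Because particles do not collide, each trajectory can be advanced independently, so the particle set can be split among MPI ranks or GPU threads with no communication during time stepping. The sketch below assumes a hypothetical analytic `air_velocity` standing in for the precomputed aerodynamic field, linear drag toward the air velocity with a hypothetical relaxation time `TAU`, and the symplectic form of semi-implicit Euler (velocity updated first, then position with the new velocity); the paper's exact force model may differ.

```python
G = (0.0, 0.0, -9.81)  # gravitational acceleration [m/s^2]
TAU = 0.05             # hypothetical particle relaxation time [s]

def air_velocity(x):
    """Hypothetical stand-in for the precomputed field: radial outwash plus uniform downwash."""
    return (0.5 * x[0], 0.5 * x[1], -2.0)

def step(x, v, dt):
    """Semi-implicit (symplectic) Euler: v^{n+1} from the state at t^n, then x^{n+1} from v^{n+1}."""
    u = air_velocity(x)
    a = tuple((ui - vi) / TAU + gi for ui, vi, gi in zip(u, v, G))
    v_new = tuple(vi + dt * ai for vi, ai in zip(v, a))
    x_new = tuple(xi + dt * vi for xi, vi in zip(x, v_new))
    return x_new, v_new

def advance(particles, dt, n_steps):
    """Advance (position, velocity) pairs; no particle reads another particle's state."""
    result = []
    for x, v in particles:
        for _ in range(n_steps):
            x, v = step(x, v, dt)
        result.append((x, v))
    return result

# Splitting the set between two "ranks" gives the same answer as advancing it whole.
particles = [((float(i), 0.0, 1.0), (0.0, 0.0, 0.0)) for i in range(4)]
whole = advance(particles, 1e-3, 100)
split = advance(particles[:2], 1e-3, 100) + advance(particles[2:], 1e-3, 100)
print(whole == split)  # True
```

The equality check is exact because each particle undergoes the same floating-point operations regardless of which partition it lands in; this independence is what makes the weak and strong scaling reported above attainable.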


2012, Vol 433-440, pp. 2892-2898
Author(s): Guang Lei Fei, Jian Guo Ning, Tian Bao Ma

Parallel computing has been applied in many fields, and a PC cluster based on the MPI (Message Passing Interface) library under the Linux operating system is a cost-effective parallel computing platform. In this paper, the key algorithm of a parallel program for explosion and impact simulation is presented. Techniques for resolving data dependences and realizing communication between subdomains are proposed. Tests of the program show that the portability of the MMIC-3D parallel program is satisfactory and that, compared with a single computer, the PC cluster can greatly improve calculation speed and enlarge the feasible computation scale.
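Communication between subdomains in such codes typically follows a ghost-cell (halo) pattern: each process keeps a copy of its neighbours' boundary cells, refreshes it every step, and then updates its own cells independently. A minimal 1-D sketch follows, simulating the processes within one Python program; the three-point averaging stencil is a hypothetical placeholder for the actual explosion/impact update in MMIC-3D, and the zero-gradient outer boundary is an assumption.

```python
def split(field, n_sub):
    """Cut a 1-D field into n_sub contiguous subdomains, each padded with one ghost cell per side."""
    n = len(field)
    bounds = [(i * n // n_sub, (i + 1) * n // n_sub) for i in range(n_sub)]
    return [[0.0] + field[lo:hi] + [0.0] for lo, hi in bounds]

def exchange_halos(subs):
    """Fill ghost cells from neighbours' edge cells (an MPI code would typically use
    MPI_Sendrecv with each neighbour rank). Outer boundaries are zero-gradient."""
    for i, s in enumerate(subs):
        s[0] = subs[i - 1][-2] if i > 0 else s[1]
        s[-1] = subs[i + 1][1] if i < len(subs) - 1 else s[-2]

def smooth(sub):
    """Placeholder three-point stencil applied to interior cells only."""
    inner = [(sub[j - 1] + sub[j] + sub[j + 1]) / 3 for j in range(1, len(sub) - 1)]
    return [sub[0]] + inner + [sub[-1]]

def step_decomposed(subs):
    exchange_halos(subs)  # communication phase
    return [smooth(s) for s in subs]  # independent computation phase

def gather(subs):
    return [x for s in subs for x in s[1:-1]]

def global_step(field):
    """Serial reference: the same stencil on the undivided field."""
    g = [field[0]] + field + [field[-1]]
    return [(g[j - 1] + g[j] + g[j + 1]) / 3 for j in range(1, len(g) - 1)]

field = [float(i * i) for i in range(10)]
subs, ref = split(field, 3), field
for _ in range(2):
    subs = step_decomposed(subs)
    ref = global_step(ref)
print(gather(subs) == ref)  # True
```

Matching the serial result bit for bit is a convenient correctness test when parallelizing an existing code, since it isolates errors in the data-dependence handling from errors in the physics.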

