The Taiwan Central Weather Bureau Regional Spectral Model for Seasonal Prediction: Multiparallel Implementation and Preliminary Results

Abstract: A regional spectral model (RSM) has been developed at the Taiwan Central Weather Bureau (CWB). It is based on the same model structure, dynamics, and physics as the CWB global spectral model (GSM), and on the perturbation concept of the National Centers for Environmental Prediction (NCEP) RSM for lateral boundary treatment. The advantages of this new regional model include minimizing possible inconsistency between the GSM and RSM arising through lateral boundary influence, and reducing the resources needed to manage and maintain the model. One-dimensional decomposition is used to slice the model into subdomains that run on a massively parallel processor machine, and the Message-Passing Interface (MPI) is adopted for communication among subdomains. Computational dependencies, such as the summations in the spectral transformation, restrict how the model can be decomposed; respecting them makes the results reproducible when different numbers of processors are used. The performance in terms of wall-clock time follows the theoretical curve of parallelization: it reaches 95% parallelization on a “homemade” PC Linux cluster and 90% on the CWB Fujitsu VPP5000. One case is selected for a 2-month integration in both a simulation mode and a forecast mode. The results show a reasonable monsoon frontal evolution compared with the analysis, with a root-mean-square error (rmse) similar to or smaller than that of the CWB GSM. The same run with the NCEP RSM nested into the CWB GSM shows a larger rmse than the CWB RSM, which demonstrates the advantage of sharing the same model structure, dynamics, and physics between the CWB GSM and the CWB RSM.
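Two claims in this abstract can be made concrete. The first is the "theoretical curve of parallelization". If that curve is Amdahl's law and "95% parallelization" means a parallel fraction p = 0.95 (both are our assumptions, since the abstract names neither), the speedup can never exceed 1/(1 - p) = 20, however many processors are added.

The second is the reproducibility argument. It can be sketched as a 1-D decomposition that never splits a summation across processors. Here each rank owns whole wavenumbers and sums all latitudes for them in a fixed order. The layout, the `decomposed_sums` helper and the sample data are hypothetical and are not the CWB RSM code.

```python
from typing import Dict, List


def amdahl_speedup(parallel_fraction: float, nproc: int) -> float:
    """Amdahl's law: S(N) = 1 / ((1 - p) + p / N)."""
    return 1.0 / ((1.0 - parallel_fraction) + parallel_fraction / nproc)


def decomposed_sums(coeffs: List[List[float]], nproc: int) -> Dict[int, float]:
    """Hypothetical 1-D decomposition over wavenumbers.

    coeffs[m] holds the per-latitude contributions for wavenumber m.
    Wavenumbers are dealt to ranks cyclically, but a wavenumber's full
    latitude list is always summed by one rank in the same order. No sum
    is split across ranks, so the result is identical for any nproc.
    """
    result: Dict[int, float] = {}
    for rank in range(nproc):  # serial stand-in for the MPI ranks
        for m in range(rank, len(coeffs), nproc):
            total = 0.0
            for value in coeffs[m]:  # complete, ordered latitude sum
                total += value
            result[m] = total
    return result


if __name__ == "__main__":
    print(f"{amdahl_speedup(0.95, 16):.2f}")  # 9.14
    print(f"{amdahl_speedup(0.90, 16):.2f}")  # 6.40
```

Splitting one latitude sum across ranks and then adding the partial sums would change the order of the floating-point additions. The rounded result could then differ with the processor count. This is the dependency the abstract says restricts the decomposition.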


Experimental Weekly to Seasonal U.S. Forecasts with the Regional Spectral Model

Bulletin of the American Meteorological Society ◽ 10.1175/bams-85-12-1887 ◽ 2004 ◽ Vol 85 (12) ◽ pp. 1887-1902 ◽ Cited By ~ 15

Author(s): J. Roads

Keyword(s): Spectral Model ◽ Planetary Boundary Layer Height ◽ Forecast System ◽ Global Spectral Model ◽ Layer Height ◽ Boundary Layer Height ◽ Regional Spectral Model ◽ Environmental Prediction ◽ Climate Prediction Center ◽ Regional Forecast

Since 27 September 1997, the Scripps Experimental Climate Prediction Center (ECPC) has been making near-real-time experimental global and regional dynamical forecasts with the National Centers for Environmental Prediction (NCEP) global spectral model (GSM) and the corresponding regional spectral model (RSM). The RSM is based on the GSM but provides higher-resolution simulations and forecasts for limited regions. The global and regional forecast skill of the GSM was previously described in several papers. The purpose of this paper is to describe the RSM-based U.S. regional forecast system, the various biases and errors in these regional U.S. forecasts, and the significant skill of the temperature, precipitation, soil moisture, relative humidity, wind speed, and planetary boundary layer height forecasts at weekly to seasonal time scales. The skill of these RSM forecasts is comparable to that of the GSM forecasts.
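The abstract does not say which skill scores were used. As an assumption, two scores commonly used for gridded forecasts are the root-mean-square error and the anomaly correlation against a climatology. The sketch below works on flattened grid values and applies no area weighting, which a real verification would usually include. It is an illustration, not the ECPC verification code.

```python
import math
from typing import Sequence


def rmse(forecast: Sequence[float], observed: Sequence[float]) -> float:
    """Root-mean-square error between paired forecast and observed values."""
    if len(forecast) != len(observed) or not forecast:
        raise ValueError("need two non-empty sequences of equal length")
    squared = sum((f - o) ** 2 for f, o in zip(forecast, observed))
    return math.sqrt(squared / len(forecast))


def anomaly_correlation(forecast: Sequence[float],
                        observed: Sequence[float],
                        climatology: Sequence[float]) -> float:
    """Centred correlation between forecast and observed anomalies."""
    fa = [f - c for f, c in zip(forecast, climatology)]
    oa = [o - c for o, c in zip(observed, climatology)]
    mf, mo = sum(fa) / len(fa), sum(oa) / len(oa)
    num = sum((x - mf) * (y - mo) for x, y in zip(fa, oa))
    den = math.sqrt(sum((x - mf) ** 2 for x in fa) * sum((y - mo) ** 2 for y in oa))
    if den == 0.0:
        raise ValueError("anomalies have zero variance")
    return num / den
```

To compare "RSM skill comparable to GSM skill" as the abstract does, both models would be scored against the same observations on a common grid.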


Multi-level Parallelization of Genotype Imputation on Supercomputers

Current Bioinformatics ◽ 10.2174/1574893615999200420071307 ◽ 2020 ◽ Vol 15

Author(s): Weiwen Zhang ◽ Long Wang ◽ Theint Theint Aye ◽ Juniarto Samsudin ◽ Yongqing Zhu

Keyword(s): Association Study ◽ Message Passing ◽ High Performance ◽ Message Passing Interface ◽ Genome Wide Association Study ◽ Job Scheduling ◽ Genotype Imputation ◽ Job Level ◽ Multi Level ◽ High Performance Requirement

Background: Genotype imputation as a service has been developed to enable researchers to estimate genotypes from haplotyped data without performing whole-genome sequencing. However, genotype imputation is computationally intensive, so meeting the high-performance requirements of genome-wide association studies (GWAS) remains a challenge. Objective: In this paper, we propose a high-performance computing solution for genotype imputation on supercomputers to improve its execution performance. Method: We design and implement a multi-level parallelization that includes job-level, process-level and thread-level parallelization, enabled by job scheduling management, the Message Passing Interface (MPI) and OpenMP, respectively. It involves job distribution, chunk partitioning and execution, parallelized iteration for imputation, and data concatenation. This multi-level design lets us exploit multi-machine/multi-core architectures to improve the performance of genotype imputation. Results: Experimental results show that our proposed method outperforms the Hadoop-based implementation of genotype imputation. We also conduct experiments on supercomputers to evaluate the performance of the proposed method. The evaluation shows that it significantly shortens execution time and thus improves the performance of genotype imputation. Conclusion: The proposed multi-level parallelization, when deployed as an imputation service, will help bioinformatics researchers in Singapore conduct genotype imputation and enhance association studies.
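The chunk partition, parallel execution and concatenation steps described above can be sketched at a single level. MPI and OpenMP are not available in plain Python, so this uses a thread pool as a stand-in. Because of the global interpreter lock, Python threads do not speed up CPU-bound work, so the sketch shows the data flow, not the performance.

The genotype encoding (0/1/2, with `None` for missing) and the majority-genotype `impute_chunk` are hypothetical placeholders. Real imputation tools use reference haplotype panels, and this is not the authors' method.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from typing import List, Optional

Site = List[Optional[int]]  # one genotype per sample (0/1/2), None = missing


def partition(n_sites: int, n_chunks: int) -> List[range]:
    """Split site indices into contiguous, nearly equal chunks."""
    base, extra = divmod(n_sites, n_chunks)
    chunks, start = [], 0
    for i in range(n_chunks):
        size = base + (1 if i < extra else 0)
        chunks.append(range(start, start + size))
        start += size
    return chunks


def impute_chunk(sites: List[Site]) -> List[List[int]]:
    """Placeholder imputation: fill gaps with the site's most common genotype."""
    out = []
    for site in sites:
        observed = [g for g in site if g is not None]
        fill = Counter(observed).most_common(1)[0][0] if observed else 0
        out.append([fill if g is None else g for g in site])
    return out


def impute_parallel(sites: List[Site], n_workers: int) -> List[List[int]]:
    """Partition into chunks, impute each chunk concurrently, concatenate in order."""
    chunks = [sites[r.start:r.stop] for r in partition(len(sites), n_workers)]
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        results = list(pool.map(impute_chunk, chunks))  # map preserves chunk order
    return [row for chunk in results for row in chunk]
```

Because `pool.map` returns results in submission order, the concatenated output matches a serial run. In a real deployment, MPI ranks across nodes would replace the threads, and OpenMP threads would work within each rank.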


An Implementation of Single Precision Fast Spherical Harmonic Transform in Yin‐He Global Spectral Model

Quarterly Journal of the Royal Meteorological Society ◽ 10.1002/qj.4026 ◽ 2021

Author(s): Fukang Yin ◽ Junqiang Song ◽ Jianping Wu ◽ Weimin Zhang

Keyword(s): Spherical Harmonic ◽ Spectral Model ◽ Single Precision ◽ Global Spectral Model


Distributed Singular Value Decomposition Method for Fast Data Processing in Recommendation Systems

Energies ◽ 10.3390/en14082284 ◽ 2021 ◽ Vol 14 (8) ◽ pp. 2284

Author(s): Krzysztof Przystupa ◽ Mykola Beshley ◽ Olena Hordiichuk-Bublivska ◽ Marian Kyryk ◽ Halyna Beshley ◽ ...

Keyword(s): Distributed Systems ◽ Singular Value Decomposition ◽ Data Processing ◽ Message Passing ◽ Message Passing Interface ◽ Recommendation Systems ◽ Singular Value ◽ Singular Value Decomposition Method ◽ Value Decomposition ◽ Svd Method

Analyzing large amounts of user data to determine users' preferences and, based on these data, to recommend new products is an important problem. Depending on the correctness and timeliness of the recommendations, companies can make significant profits or suffer losses. The analysis of data on the users of companies' services is carried out in dedicated recommendation systems. However, with a large number of users, the data to be processed become very large, which complicates the operation of recommendation systems. For efficient data analysis in commercial systems, the Singular Value Decomposition (SVD) method can perform intelligent analysis of information. For large volumes of data, we propose using distributed systems. This approach reduces the time needed to process data and to generate recommendations for users. For the experimental study, we implemented the distributed SVD method using Message Passing Interface, Hadoop and Spark technologies, and found that data processing time was lower with the distributed systems than with non-distributed ones.
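The abstract does not say how the SVD was distributed. The sketch below assumes one common pattern that suits tall user-by-item matrices, and it is not the authors' MPI, Hadoop or Spark code. Each worker forms the partial Gram matrix AᵀA from its own block of user rows. The partial matrices are then summed; a real system would use an MPI reduce, and here it is a plain Python sum. Finally, power iteration on the summed matrix gives the leading singular value and right singular vector.

This recovers only the top singular pair, and forming AᵀA squares the condition number, so treat it as an illustration.

```python
import math
from typing import List, Tuple

Matrix = List[List[float]]


def local_gram(rows: Matrix) -> Matrix:
    """Partial A^T A computed from one worker's (non-empty) block of user rows."""
    n = len(rows[0])
    g = [[0.0] * n for _ in range(n)]
    for row in rows:
        for i in range(n):
            for j in range(n):
                g[i][j] += row[i] * row[j]
    return g


def reduce_sum(parts: List[Matrix]) -> Matrix:
    """Stand-in for an MPI reduce: element-wise sum of the partial Gram matrices."""
    n = len(parts[0])
    return [[sum(p[i][j] for p in parts) for j in range(n)] for i in range(n)]


def top_singular(a_blocks: List[Matrix], iters: int = 200) -> Tuple[float, List[float]]:
    """Leading singular value and right singular vector of the stacked blocks."""
    gram = reduce_sum([local_gram(block) for block in a_blocks])
    n = len(gram)
    v = [1.0] * n  # start vector; must not be orthogonal to the top eigenvector
    for _ in range(iters):  # power iteration on A^T A
        w = [sum(gram[i][j] * v[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    lam = sum(v[i] * sum(gram[i][j] * v[j] for j in range(n)) for i in range(n))
    return math.sqrt(lam), v


def rank1_scores(user_row: List[float], v: List[float]) -> List[float]:
    """Rank-1 reconstruction of one user's ratings: (u . v) v."""
    coeff = sum(r * x for r, x in zip(user_row, v))
    return [coeff * x for x in v]
```

In the rank-1 case, a user who rated only the first of two items receives a predicted score for the second item as well. That is how the decomposition turns into a recommendation.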


A high-performance, portable implementation of the MPI message passing interface standard

Parallel Computing ◽ 10.1016/0167-8191(96)00024-5 ◽ 1996 ◽ Vol 22 (6) ◽ pp. 789-828 ◽ Cited By ~ 1155

Author(s): William Gropp ◽ Ewing Lusk ◽ Nathan Doss ◽ Anthony Skjellum

Keyword(s): Message Passing ◽ High Performance ◽ Message Passing Interface


Parallel implementation for HSLO(3)-FDTD with message passing interface on Distributed Memory Architecture

2006 International Conference on Computing & Informatics ◽ 10.1109/icoci.2006.5276531 ◽ 2006

Author(s): Mohammad Khatim Hasan ◽ Mohamed Othman ◽ Jalil Md Desa ◽ Zulkifly Abbas ◽ Jumat Sulaiman

Keyword(s): Message Passing ◽ Message Passing Interface ◽ Distributed Memory ◽ Parallel Implementation ◽ Memory Architecture ◽ Distributed Memory Architecture


Based on Numerical Simulation of High-Performance Parallel Machine Muffler Experimental Calibration

Advanced Materials Research ◽ 10.4028/www.scientific.net/amr.718-720.1645 ◽ 2013 ◽ Vol 718-720 ◽ pp. 1645-1650

Author(s): Gen Yin Cheng ◽ Sheng Chen Yu ◽ Zhi Yong Wei ◽ Shao Jie Chen ◽ You Cheng

Keyword(s): Numerical Simulation ◽ Finite Element ◽ Boundary Element ◽ Message Passing ◽ High Performance ◽ Message Passing Interface ◽ Parallel Machine ◽ Simulation Software ◽ Experimental Calibration ◽ The Cost

The commonly used commercial simulation packages SYSNOISE and ANSYS run on a single machine (they cannot run directly on a parallel machine) when the finite element and boundary element methods are used to simulate muffler performance. Because of the large amount of numerical computation, obtaining an exact solution takes more than ten days, sometimes as long as twenty. To reduce the cost of numerical simulation, we use a high-performance parallel machine built from 32 commercial computers and port the finite element and boundary element simulation software to a program that runs in the MPI (Message Passing Interface) parallel environment. Data from the simulation experiments show that the numerical simulation results are good. The high-performance parallel machine computes 25 to 30 times faster than a single microcomputer.
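Assume the reported 25 to 30 times speedup S is measured against one of the N = 32 machines; the abstract does not state the baseline precisely. The parallel efficiency then follows directly. Inverting Amdahl's law gives the implied parallel fraction p.

```latex
E = \frac{S}{N}:\qquad \frac{25}{32} \approx 0.78,\qquad \frac{30}{32} \approx 0.94

S = \frac{1}{(1-p) + p/N}
\;\Longrightarrow\;
p = \frac{1 - 1/S}{1 - 1/N}:\qquad
p_{25} = \frac{0.96}{0.96875} \approx 0.991,\qquad
p_{30} = \frac{1 - 1/30}{0.96875} \approx 0.998
```

Under this reading, well over 99% of the original run time would have to be parallelizable to reach the reported speedups on 32 machines.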


A lightweight approach to performance portability with targetDP

The International Journal of High Performance Computing Applications ◽ 10.1177/1094342016682071 ◽ 2016 ◽ Vol 32 (2) ◽ pp. 288-301

Author(s): Alan Gray ◽ Kevin Stratford

Keyword(s): Particle Physics ◽ Message Passing ◽ Graphics Processing Units ◽ High Performance ◽ Large Scale ◽ Message Passing Interface ◽ Graphics Processing Unit ◽ Processing Unit ◽ Performance Portability ◽ Graphics Processing

Leading high-performance computing systems achieve their status through the use of highly parallel devices such as NVIDIA graphics processing units or Intel Xeon Phi many-core CPUs. The concept of performance portability across such architectures, as well as traditional CPUs, is vital for the application programmer. In this paper we describe targetDP, a lightweight abstraction layer which allows grid-based applications to target data-parallel hardware in a platform-agnostic manner. We demonstrate the effectiveness of our pragmatic approach by presenting performance results for a complex fluid application (with which the model was co-designed), plus a separate lattice quantum chromodynamics particle physics code. For each application, a single source code base is seen to achieve portable performance, as assessed within the context of the Roofline model. TargetDP can be combined with the Message Passing Interface (MPI) to allow use on systems containing multiple nodes. We demonstrate this by providing scaling results on traditional and graphics processing unit-accelerated large-scale supercomputers.
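The abstract assesses portable performance "within the context of the Roofline model". That model bounds a kernel's attainable throughput by the smaller of two limits. One is the device's peak floating-point rate. The other is the memory bandwidth multiplied by the kernel's arithmetic intensity, measured in FLOPs per byte moved to or from main memory. The sketch below is a minimal version of that bound. The device figures in the usage note are hypothetical, not values from the paper.

```python
def attainable_gflops(arith_intensity: float, peak_gflops: float,
                      bandwidth_gbs: float) -> float:
    """Roofline bound: min(peak compute, arithmetic intensity * memory bandwidth)."""
    return min(peak_gflops, arith_intensity * bandwidth_gbs)


def ridge_point(peak_gflops: float, bandwidth_gbs: float) -> float:
    """Arithmetic intensity (FLOP/byte) above which a kernel becomes compute-bound."""
    return peak_gflops / bandwidth_gbs


def roofline_fraction(measured_gflops: float, arith_intensity: float,
                      peak_gflops: float, bandwidth_gbs: float) -> float:
    """How close a measured kernel gets to its roofline bound (1.0 = on the roof)."""
    return measured_gflops / attainable_gflops(arith_intensity, peak_gflops, bandwidth_gbs)
```

Consider a hypothetical device with a peak of 7000 GFLOP/s and 900 GB/s of bandwidth, running a stencil kernel at 0.5 FLOP/byte. The kernel is bounded at 450 GFLOP/s and is memory-bound. Comparing each platform's measured rate to its own bound, rather than to raw peak, is what makes the portability comparison fair across CPUs and GPUs.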


Message-Passing-Interface MPI Parallelization of Iteratively Coupled Fluid Flow and Geomechanics Codes for the Simulation of System Behavior in Hydrate-Bearing Geologic Media

10.2118/206161-ms ◽ 2021

Author(s): Jiecheng Zhang ◽ George Moridis ◽ Thomas Blasingame

Keyword(s): Message Passing ◽ High Performance ◽ Message Passing Interface ◽ Gas Production ◽ Parallel Performance ◽ Group 3 ◽ Linux Cluster ◽ Group 2 ◽ 3D Problem ◽ Group 1

Abstract: The Reservoir GeoMechanics Simulator (RGMS), a geomechanics simulator based on the finite element method and parallelized using the Message Passing Interface (MPI), is developed in this work to model the stresses and deformations in subsurface systems. RGMS can be used stand-alone or coupled with flow and transport models. pT+H V1.5, a parallel MPI-based version of the serial T+H V1.5 code that describes mass and heat flow in hydrate-bearing porous media, is also developed. Using the fixed-stress split iterative scheme, RGMS is coupled with pT+H V1.5 to investigate the geomechanical responses associated with gas production from hydrate accumulations. The code development and testing process involves evaluation of the parallelization and of the coupling method, as well as verification and validation of the results. The parallel performance of the codes is tested on the Ada Linux cluster of Texas A&M High Performance Research Computing using up to 512 processors, and on a Mac Pro computer with 12 processors. The investigated problems are:

- Group 1: geomechanical problems solved by RGMS in 2D Cartesian and cylindrical domains and a 3D problem, involving 4×10⁶ and 3.375×10⁶ elements, respectively.
- Group 2: realistic problems of gas production from hydrates using pT+H V1.5 in 2D and 3D systems with 2.45×10⁵ and 3.6×10⁶ elements, respectively.
- Group 3: the 3D problem in Group 2 solved with the coupled RGMS-pT+H V1.5 simulator, fully accounting for geomechanics.

Two domain partitioning options are investigated on the Ada Linux cluster and the Mac Pro, and the parallel performance of the code is monitored. On the Ada Linux cluster using 512 processors, the simulation speedups are (a) 218.89, 188.13, and 284.70 for RGMS in the Group 1 problems, (b) 174.25 and 341.67 for pT+H V1.5 in the Group 2 cases, and (c) 331.80 for the coupled simulators in Group 3.
The results produced in this work show (a) the necessity of using full geomechanics simulators in marine hydrate-related studies, because of the pronounced geomechanical effects on production and displacements, and (b) the effectiveness of the parallel simulators developed in this study, which may be the only realistic option for such complex simulations of large multi-dimensional domains.
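The fixed-stress split used to couple RGMS with pT+H works in three repeated steps. It solves the flow problem while holding the total mean stress fixed, updates the mechanics with the new pressure, and repeats until the pressure stops changing.

The sketch below applies that idea to a single-cell, linear poroelastic (Biot) model over one backward-Euler time step. The symbols are assumptions chosen for illustration: drained bulk modulus `K`, Biot coefficient `alpha`, Biot modulus `M`, total mean stress `sigma` with tension positive, and volumetric strain `eps`. The toy problem is not RGMS or pT+H code. A monolithic solve of the same two equations is included as a reference.

```python
from typing import Tuple


def monolithic(K, alpha, M, sigma, q_dt, p_old, eps_old) -> Tuple[float, float]:
    """Solve mechanics and flow together (reference solution)."""
    # Mechanics: K*eps - alpha*p = sigma  ->  eps = (sigma + alpha*p) / K
    # Flow (backward Euler): (p - p_old)/M + alpha*(eps - eps_old) = q_dt
    p = (q_dt + p_old / M - alpha * sigma / K + alpha * eps_old) / (1.0 / M + alpha**2 / K)
    return p, (sigma + alpha * p) / K


def fixed_stress(K, alpha, M, sigma, q_dt, p_old, eps_old,
                 tol=1e-12, max_iter=50) -> Tuple[float, float, int]:
    """Sequential flow-then-mechanics iterations with the fixed-stress correction."""
    p, eps = p_old, eps_old
    c = 1.0 / M + alpha**2 / K  # storage term augmented by alpha^2 / K
    for it in range(1, max_iter + 1):
        # Flow step: at fixed total stress, delta(eps) = (alpha / K) * delta(p).
        p_new = (q_dt + p_old / M - alpha * (eps - eps_old) + alpha**2 / K * p) / c
        # Mechanics step with the updated pressure.
        eps = (sigma + alpha * p_new) / K
        if abs(p_new - p) <= tol * max(1.0, abs(p_new)):
            return p_new, eps, it
        p = p_new
    raise RuntimeError("fixed-stress iterations did not converge")
```

In this scalar case the split converges to the monolithic answer within a few iterations. On a real finite element mesh, the iteration count depends on the problem and the coupling strength.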


Custom Built of Smart Computing Platform for Supporting Optimization Methods and Artificial Intelligence Research

Proceedings of the Pakistan Academy of Sciences: A. Physical and Computational Sciences ◽ 10.53560/ppasa(58-sp1)733 ◽ 2021 ◽ Vol 58 (S) ◽ pp. 59-64

Author(s): Indar Sugiarto ◽ Doddy Prayogo ◽ Henry Palit ◽ Felix Pasila ◽ Resmana Lim ◽ ...

Keyword(s): Artificial Intelligence ◽ Message Passing ◽ Graphics Processing Units ◽ High Performance ◽ Message Passing Interface ◽ Optimization Methods ◽ Computer Hardware ◽ Production Environment ◽ Computing Platform ◽ Commercial Off The Shelf

This paper describes a prototype of a computing platform dedicated to artificial intelligence explorations. The platform, dubbed PakCarik, is essentially a high-throughput computing platform with GPU (graphics processing unit) acceleration. PakCarik is an Indonesian acronym for Platform Komputasi Cerdas Ramah Industri Kreatif, which can be translated as “Creative Industry friendly Intelligence Computing Platform”. The platform aims to provide a complete development and production environment for AI-based projects, especially those that rely on machine learning and multiobjective optimization paradigms. PakCarik was built by assembling commercial off-the-shelf computer hardware and was tested on several AI-related application scenarios. The tests in this experiment include High-Performance Linpack (HPL) benchmarking, Message Passing Interface (MPI) benchmarking, and TensorFlow (TF) benchmarking. From the experiment, the authors observe that PakCarik's performance is quite similar to that of commonly used cloud computing services such as Google Compute Engine and Amazon EC2. It falls slightly behind dedicated AI platforms such as the Nvidia DGX-1 used in the benchmarking experiment. Its maximum computing performance was measured at 326 Gflops. The authors conclude that PakCarik is ready to be deployed in real-world applications and that it can be made even more powerful by adding more GPU cards.
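An HPL result such as the 326 Gflops above depends strongly on the problem size N. A widely used tuning rule of thumb, assumed here rather than taken from the paper, picks the largest N whose N × N double-precision matrix fills about 80% of total memory. That N is then rounded down to a multiple of the block size NB.

The runtime estimate uses only the leading (2/3)N³ term of the LU factorization's operation count. The memory size and NB in the example are hypothetical.

```python
import math


def hpl_problem_size(mem_bytes: int, nb: int, mem_fraction: float = 0.8) -> int:
    """Largest N (a multiple of nb) whose N x N double matrix fits in mem_fraction of memory."""
    n = math.isqrt(int(mem_fraction * mem_bytes / 8))  # 8 bytes per double
    return (n // nb) * nb


def hpl_runtime_estimate(n: int, rmax_gflops: float) -> float:
    """Rough seconds for the ~(2/3) N^3 flops of LU at a sustained rmax_gflops."""
    return (2.0 / 3.0) * n**3 / (rmax_gflops * 1e9)


if __name__ == "__main__":
    n = hpl_problem_size(8 * 2**30, nb=192)  # hypothetical 8 GiB, NB = 192
    print(n)  # 29184
    print(f"{hpl_runtime_estimate(n, 326.0):.0f} s at 326 Gflops")
```

A larger N lets the O(N³) computation dominate the O(N²) communication and memory traffic, which is why HPL is usually run close to the memory limit.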
