Parallel approach of Schrödinger-based quantum corrections for ultrascaled semiconductor devices

2021 ◽

Author(s):  
Gabriel Espiñeira ◽  
Antonio J. García-Loureiro ◽  
Natalia Seoane

Abstract In the current technology node, purely classical numerical simulators lack the precision needed to obtain valid results. At the same time, simulating fully quantum models can be cumbersome in certain studies such as device variability analysis, since a single simulation can take up to weeks to compute and hundreds of device configurations need to be analyzed to obtain statistically significant results. A good compromise between speed and accuracy is to add corrections to the classical simulation that reproduce the quantum nature of matter. In this context, we present a new approach to Schrödinger equation-based quantum corrections. We have implemented it using the Message Passing Interface (MPI) in our in-house semiconductor simulation framework, VENDES, which can run on distributed systems and deliver more accurate results in a reasonable time frame. Using a 12-nm-gate-length gate-all-around nanowire FET (GAA NW FET) as a benchmark device, the new implementation shows almost perfect agreement in the output data, with less than a 2% difference between runs using 1 and 16 processes. A reduction of up to 98% in computational time has also been found when comparing the sequential and the 16-process simulations. For a reasonably dense mesh of 150k nodes, a variability study of 300 individual simulations can now be performed with VENDES in approximately 2.5 days instead of an estimated 137 days of sequential execution.
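As a sketch of the kind of domain decomposition such an MPI implementation relies on, the snippet below splits a mesh into near-equal contiguous blocks of nodes, one per process, in the style of an MPI_Scatterv counts/displacements layout. This is a generic illustration under assumed conventions; none of the names are taken from VENDES.

```python
# Illustrative block partition of mesh nodes across MPI ranks; the counts and
# displacements mirror what an MPI_Scatterv call would consume.

def block_partition(n_nodes, n_procs):
    """Return (counts, displs): nearly equal contiguous node blocks per rank."""
    base, rem = divmod(n_nodes, n_procs)
    counts = [base + (1 if r < rem else 0) for r in range(n_procs)]
    displs = [sum(counts[:r]) for r in range(n_procs)]
    return counts, displs

# 150k-node mesh split over 16 processes, as in the study above
counts, displs = block_partition(150_000, 16)
```

Contiguous blocks keep each rank's nodes local, which matters for the sparse matrix assembly a device simulator performs on its subdomain.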


2013 ◽  
Vol 6 (1) ◽  
pp. 17-28 ◽  
Author(s):  
E. Siewertsen ◽  
J. Piwonski ◽  
T. Slawig

Abstract. We have ported an implementation of the spin-up for marine ecosystem models based on transport matrices to graphics processing units (GPUs). The original implementation was designed for distributed-memory architectures and uses the Portable, Extensible Toolkit for Scientific Computation (PETSc) library, which is based on the Message Passing Interface (MPI) standard. The spin-up computes a steady seasonal cycle of ecosystem tracers with climatological ocean circulation data as forcing. Since the transport is linear with respect to the tracers, the resulting operator is represented by matrices. Each iteration of the spin-up involves two matrix-vector multiplications and the evaluation of the biogeochemical model used. The original code was written in C and Fortran. On the GPU, we use the Compute Unified Device Architecture (CUDA) standard, a customized version of PETSc and a commercial CUDA Fortran compiler. We describe the necessary extensions to PETSc and the modifications of the original C and Fortran codes, making use of freely available libraries for the GPU. We analyze the computational effort of the main parts of the spin-up for two exemplary ecosystem models and compare the overall computational time to that required on different CPUs. The results show that a consumer GPU can compete with a significant number of cluster CPUs without further code optimization.
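The iteration described above (apply the linear transport, evaluate the biogeochemical model, repeat until a steady cycle is reached) can be sketched as a fixed-point loop. The 2x2 "transport matrix" and restoring source term below are invented for illustration; real transport matrices are large and sparse.

```python
# Toy spin-up: iterate x <- T @ (x + q(x)) until the tracer field reaches a
# fixed point. T mimics a row-stochastic transport matrix; q is a simple
# restoring biogeochemical source. Both are illustrative, not from the paper.

def matvec(T, x):
    return [sum(t * xi for t, xi in zip(row, x)) for row in T]

def spin_up(T, q, x0, n_iter=500):
    x = x0[:]
    for _ in range(n_iter):
        # transport applied to tracer field plus biogeochemical update
        x = matvec(T, [xi + qi for xi, qi in zip(x, q(x))])
    return x

T = [[0.9, 0.1], [0.1, 0.9]]                   # mixing between two boxes
q = lambda x: [0.01 * (1.0 - xi) for xi in x]  # restore toward concentration 1
steady = spin_up(T, q, [0.0, 2.0])
```

Because every iteration is dominated by matrix-vector products and a pointwise model evaluation, both the MPI and GPU versions parallelize the same two kernels.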


2021 ◽  
Vol 3 ◽  
Author(s):  
Yuepeng Li ◽  
Qiang Chen ◽  
Dave M. Kelly ◽  
Keqi Zhang

In this study, a parallel extension of the Coastal and Estuarine Storm Tide (CEST) model is developed and applied to simulate the storm tide at South Florida induced by Hurricane Irma in 2017. An improvement is also made to the existing advection algorithm in CEST through the introduction of high-order, monotone semi-Lagrangian advection. Distributed-memory parallelization is implemented via the Message Passing Interface (MPI) library, so the parallel CEST model can run efficiently on machines ranging from multicore laptops to massive High Performance Computing (HPC) systems. The principal advantage of running the CEST model on multiple cores is that relatively low run-times become possible for real-world storm surge simulations on high-resolution grids, especially in the locality where the hurricane makes landfall. Computational time is critical for storm surge forecasting: simulations must finish within 30 min so that results are available to users before the arrival of the next advisory. In this study, simulating the storm surge induced by Hurricane Irma took approximately 22 min for a 4-day simulation, with the results validated against field measurements. Further efficiency analysis reveals that the parallel CEST model can achieve linear speedup when the number of processors is not very large.
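A minimal 1-D sketch of the monotone semi-Lagrangian idea mentioned above: each grid point traces its departure point upstream and interpolates the old field there; with linear interpolation the new value is a convex combination of old values, so no new extrema appear. The grid, velocity and field below are illustrative, not from CEST.

```python
# 1-D semi-Lagrangian advection on a periodic grid with monotone (linear)
# interpolation at the departure point. Illustrative parameters only.

def semi_lagrangian_step(field, u, dt, dx):
    n = len(field)
    new = []
    for i in range(n):
        x_dep = i - u * dt / dx            # departure point in grid units
        j = int(x_dep // 1) % n            # cell containing departure point
        frac = x_dep - (x_dep // 1)        # fractional position in that cell
        a, b = field[j], field[(j + 1) % n]
        new.append((1 - frac) * a + frac * b)  # convex combination => monotone
    return new

field = [0.0] * 10 + [1.0] * 10 + [0.0] * 10   # a square pulse
for _ in range(40):
    field = semi_lagrangian_step(field, u=0.5, dt=0.5, dx=1.0)
```

Semi-Lagrangian schemes also remain stable for Courant numbers above 1, which is one reason they suit operational forecasting where large timesteps keep run-times low.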


Author(s):  
JCS Kadupitiya ◽  
Geoffrey C Fox ◽  
Vikram Jadhao

Simulating the dynamics of ions near polarizable nanoparticles (NPs) using coarse-grained models is extremely challenging due to the need to solve the Poisson equation at every simulation timestep. Recently, a molecular dynamics (MD) method based on a dynamical optimization framework bypassed this obstacle by representing the polarization charge density as virtual dynamic variables and evolving them in parallel with the physical dynamics of the ions. We highlight the computational gains accessible through the integration of machine learning (ML) methods for parameter prediction in MD simulations by demonstrating how they were realized in MD simulations of ions near polarizable NPs. An artificial neural network-based regression model was integrated with the MD simulation and predicted the optimal simulation timestep and the optimization parameters characterizing the virtual system with 94.3% success. The ML-enabled auto-tuning of parameters generated accurate dynamics of ions for ≈10 million steps while improving the stability of the simulation by over an order of magnitude. Integrating the ML-enhanced framework with hybrid Open Multi-Processing/Message Passing Interface (OpenMP/MPI) parallelization techniques reduced the computational time for simulating systems with thousands of ions and induced charges from thousands of hours to tens of hours, yielding a maximum speedup of ≈3 from ML-only acceleration and a maximum speedup of ≈600 from the combination of ML and parallel computing methods. The extraction of ionic structure in concentrated electrolytes near oil–water emulsions demonstrates the success of the method. The approach can be generalized to select optimal parameters in other MD applications and energy minimization problems.
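To make the parameter-prediction idea concrete: a cheap regression model, trained on past runs, maps a system descriptor (here just the ion count) to a timestep that proved stable. The paper uses an artificial neural network; a 1-D least-squares fit stands in for it below, and the training data are entirely invented.

```python
# Toy stand-in for ML-based timestep prediction: fit a line to (system size,
# stable timestep) pairs from past runs, then predict for an unseen size.
# Data and the linear model are illustrative assumptions, not from the paper.

def fit_line(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    return slope, my - slope * mx

# invented training data: (ion count, timestep that proved stable)
ions = [500, 1000, 2000, 4000, 8000]
timestep = [2.0, 1.8, 1.5, 1.1, 0.4]

slope, intercept = fit_line(ions, timestep)
predicted_dt = slope * 3000 + intercept  # predict for an unseen system size
```

The payoff is the same in either model class: a prediction costs microseconds, whereas discovering a stable timestep by trial simulation costs hours.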


Processes ◽  
2021 ◽  
Vol 9 (11) ◽  
pp. 1980
Author(s):  
Lihua Shen ◽  
Hui Liu ◽  
Zhangxin Chen

In this paper, the deterministic ensemble Kalman filter is implemented with a Message Passing Interface (MPI)-based parallel technique in our in-house black oil simulator. The implementation covers two cases: (1) the ensemble size is greater than the number of processors, and (2) the ensemble size is smaller than or equal to the number of processors. Numerical experiments are presented for the estimation of three-phase relative permeabilities represented by power-law models with both known and unknown endpoints. It is shown that with known endpoints, good estimations can be obtained; with unknown endpoints, good estimations can still be obtained using more observations and a larger ensemble size. Reported computational times show that the run time is greatly reduced with more CPU cores: the MPI parallel efficiency is over 70% for a small ensemble size and 77% for a large ensemble size with up to 640 CPU cores.
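The two distribution cases described above can be sketched as a mapping from ensemble members to processes: when there are more members than processes, each process simulates several members; otherwise several processes can cooperate on one member. The round-robin mapping below is a plain illustration, not the paper's actual scheme.

```python
# Illustrative member-to-process assignment for the two EnKF cases above.

def assign_members(n_members, n_procs):
    if n_members > n_procs:
        # case 1: each process owns a round-robin share of ensemble members
        return {p: [m for m in range(n_members) if m % n_procs == p]
                for p in range(n_procs)}
    # case 2: split the processes evenly among the (fewer) members,
    # so several processes cooperate on one member's forward simulation
    per_member = n_procs // n_members
    return {p: [p // per_member] for p in range(n_members * per_member)}

case1 = assign_members(n_members=100, n_procs=16)  # 100 members on 16 procs
case2 = assign_members(n_members=4, n_procs=16)    # 4 members on 16 procs
```

In case 2 the member's reservoir simulation itself would run on a process sub-group, e.g. via an MPI sub-communicator.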


2012 ◽  
Vol 5 (3) ◽  
pp. 2179-2214 ◽  
Author(s):  
E. Siewertsen ◽  
J. Piwonski ◽  
T. Slawig

Abstract. We have ported an implementation of the spin-up for marine ecosystem models based on the "Transport Matrix Method" to graphics processing units (GPUs). The original implementation was designed for distributed-memory architectures and uses the PETSc library, which is based on the "Message Passing Interface (MPI)" standard. The spin-up computes a steady seasonal cycle of the ecosystem tracers with climatological ocean circulation data as forcing. Since the transport is linear with respect to the tracers, the resulting operator is represented in so-called "transport matrices". Each iteration of the spin-up involves two matrix-vector multiplications and the evaluation of the biogeochemical model used. The original code was written in C and Fortran. On the GPU, we use the CUDA standard, a specialized version of the PETSc toolkit and a CUDA Fortran compiler. We describe the necessary extensions to PETSc and the modifications of the original C and Fortran codes, making use of freely available libraries for the GPU. We analyze the computational effort of the main parts of the spin-up for two exemplary ecosystem models and compare the overall computational time to that required on different CPUs. The results show that a consumer GPU can beat a significant number of cluster CPUs without further code optimization.


2020 ◽  
Vol 15 ◽  
Author(s):  
Weiwen Zhang ◽  
Long Wang ◽  
Theint Theint Aye ◽  
Juniarto Samsudin ◽  
Yongqing Zhu

Background: Genotype imputation as a service enables researchers to estimate genotypes on haplotyped data without performing whole genome sequencing. However, genotype imputation is computation-intensive, and it remains a challenge to satisfy the high-performance requirements of genome-wide association studies (GWAS). Objective: In this paper, we propose a high-performance computing solution for genotype imputation on supercomputers to enhance its execution performance. Method: We design and implement a multi-level parallelization comprising job-level, process-level and thread-level parallelization, enabled by job scheduling management, the Message Passing Interface (MPI) and OpenMP, respectively. It involves job distribution, chunk partition and execution, parallelized iteration for imputation, and data concatenation. This multi-level design lets us exploit multi-machine/multi-core architectures to improve the performance of genotype imputation. Results: Experimental results show that our proposed method outperforms a Hadoop-based implementation of genotype imputation. Moreover, experiments on supercomputers show that the proposed method significantly shortens execution time, thus improving the performance of genotype imputation. Conclusion: The proposed multi-level parallelization, when deployed as imputation as a service, will facilitate bioinformatics researchers in Singapore in conducting genotype imputation and enhancing association studies.
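The chunk partition and concatenation steps can be sketched generically: a chromosome region is cut into fixed-size core chunks, each padded with a flanking overlap (imputation tools use flanks so sites near chunk edges still see enough haplotype context), and after imputation only the non-overlapping cores are concatenated. All sizes below are illustrative assumptions, not the paper's settings.

```python
# Illustrative chunk partition for parallel imputation: each tuple is
# (padded window start, padded window end, (core start, core end)).
# Workers impute padded windows; concatenation keeps only the cores.

def make_chunks(start, end, chunk_size, flank):
    chunks = []
    pos = start
    while pos < end:
        core_end = min(pos + chunk_size, end)
        chunks.append((max(start, pos - flank),
                       min(end, core_end + flank),
                       (pos, core_end)))
        pos = core_end
    return chunks

chunks = make_chunks(start=0, end=10_000_000,
                     chunk_size=3_000_000, flank=250_000)
```

Because the cores tile the region exactly, concatenating per-chunk results reproduces a single whole-region run, which is what makes the process-level parallelism safe.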


2021 ◽  
Vol 5 (3) ◽  
pp. 36
Author(s):  
Leilei Dong ◽  
Italo Mazzarino ◽  
Alessio Alexiadis

A comprehensive review is carried out on the models and correlations for solid/fluid reactions, which result from a complex multi-scale physicochemical process. Simulating this process with CFD requires various complicated submodels and significant computational time, which often makes it undesirable and impractical in many industrial activities that require a quick solution within a limited time frame, such as new product/process design, feasibility studies, and the evaluation or optimization of existing processes. In these circumstances, the models and correlations developed over the last few decades are of significant relevance and become a useful simulation tool. However, despite increasing research interest in this area over the last thirty years, no comprehensive review is available. This paper therefore reviews the models developed so far and provides guidance on selecting the model and correlations appropriate to a specific application, to help engineers and researchers choose the most suitable model for feasible solutions. The review is thus also of practical relevance to professionals who need to perform engineering design or simulation work. Areas needing further development in solid–fluid reaction modelling are also identified and discussed.


2021 ◽  
Vol 11 (1) ◽  
Author(s):  
Robert Markewitz ◽  
Antje Torge ◽  
Klaus-Peter Wandinger ◽  
Daniela Pauli ◽  
Andre Franke ◽  
...  

Abstract Laboratory testing for the severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) rests on two pillars: the detection of viral RNA via RT-PCR, the diagnostic gold standard in acute cases, and the detection of antibodies against SARS-CoV-2. Concerning the latter, however, questions remain about their diagnostic and prognostic value, and it is not clear whether all patients develop detectable antibodies. We examined sera from 347 Spanish COVID-19 patients, collected during the peak of the epidemic outbreak in Spain, for the presence of IgA and IgG antibodies against SARS-CoV-2 and evaluated possible associations with age, sex and disease severity (as measured by duration of hospitalization, kind of respiratory support, treatment in the ICU, and death). The presence, and to some degree the levels, of anti-SARS-CoV-2 antibodies depended mainly on the time between the onset of symptoms and the collection of serum. A subgroup of patients had not developed antibodies at the time of sample collection, yet no differences were found between them and the patients who had. The presence and level of antibodies were not associated with age, sex, duration of hospitalization, treatment in the ICU or death. The case-fatality rate increased exponentially with older age. Neither the presence nor the levels of anti-SARS-CoV-2 antibodies served as prognostic markers in our cohort, which is discussed as a possible consequence of the timing of the sample collection. Age was the most important risk factor for an adverse outcome in our cohort. Some patients appear not to develop antibodies within a reasonable time frame; it is unclear why, as these patients differed in no respect examined by us from those who developed antibodies.


Energies ◽  
2021 ◽  
Vol 14 (8) ◽  
pp. 2284
Author(s):  
Krzysztof Przystupa ◽  
Mykola Beshley ◽  
Olena Hordiichuk-Bublivska ◽  
Marian Kyryk ◽  
Halyna Beshley ◽  
...  

Analyzing large amounts of user data to determine preferences and, on that basis, to recommend new products is an important problem. Depending on the correctness and timeliness of the recommendations, significant profits or losses can result. The analysis of data on the users of a company's services is carried out by dedicated recommendation systems. With a large number of users, however, the data to be processed become very large, which complicates the work of recommendation systems. For efficient data analysis in commercial systems, the Singular Value Decomposition (SVD) method can perform intelligent analysis of the information. For large amounts of processed information, we propose the use of distributed systems; this approach reduces the time needed to process data and deliver recommendations to users. For the experimental study, we implemented the distributed SVD method using Message Passing Interface, Hadoop and Spark technologies, and observed a reduction in data processing time when using distributed systems compared to non-distributed ones.
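One common way to distribute an SVD-type computation, sketched below: split the matrix into row blocks (one per worker), let each worker compute its partial contribution to A^T A v, and sum the partials as an MPI Allreduce or a Spark reduce would. Plain sequential Python stands in here for the actual MPI/Hadoop/Spark implementations, and power iteration for the dominant singular triplet is one simple choice, not necessarily the paper's algorithm.

```python
# Distributed-style power iteration for the top singular value: each "worker"
# holds a row block A_i and contributes A_i^T (A_i v); partials are summed
# as a reduce step would. Sequential stand-in for MPI/Spark.

def matvec(rows, v):
    return [sum(r * x for r, x in zip(row, v)) for row in rows]

def partial_ata_v(block, v, w):
    """Accumulate this worker's A_i^T (A_i v) into w (the 'reduce')."""
    av = matvec(block, v)
    for row, a in zip(block, av):
        for j in range(len(w)):
            w[j] += row[j] * a

def top_singular_value(row_blocks, n_cols, n_iter=200):
    v = [1.0] * n_cols
    for _ in range(n_iter):
        w = [0.0] * n_cols
        for block in row_blocks:          # would run on separate workers
            partial_ata_v(block, v, w)
        norm = sum(x * x for x in w) ** 0.5
        v = [x / norm for x in w]
    w = [0.0] * n_cols
    for block in row_blocks:
        partial_ata_v(block, v, w)
    # v^T A^T A v = sigma_1^2 at convergence
    return sum(wi * vi for wi, vi in zip(w, v)) ** 0.5

blocks = [[[3.0, 0.0]], [[0.0, 1.0]]]     # A = diag(3, 1) split into 2 blocks
sigma1 = top_singular_value(blocks, n_cols=2)
```

Only the small vectors v and w cross worker boundaries each iteration, while the large user-item matrix stays partitioned, which is why this pattern scales well on distributed systems.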

