Designing Parallel Adaptive Laplacian Smoothing for Improving Tetrahedral Mesh Quality on the GPU

2021 · Vol 11 (12) · pp. 5543
Author(s): Ning Xi, Yinjie Sun, Lei Xiao, Gang Mei

Mesh quality is a critical issue in numerical computing because it directly impacts both computational efficiency and accuracy. Tetrahedral meshes are widely used in various engineering and science applications. However, large-scale and complicated application scenarios involve huge numbers of tetrahedrons, and in such cases improving mesh quality is computationally expensive. Laplacian mesh smoothing is a simple mesh optimization method that improves mesh quality by relocating nodes. In this paper, by exploiting the parallelism of the modern graphics processing unit (GPU), we designed a parallel adaptive Laplacian smoothing algorithm for improving the quality of large-scale tetrahedral meshes. In the proposed adaptive algorithm, we use the aspect ratio as a metric to judge mesh quality after each iteration, ensuring that every smoothing step improves the mesh. The adaptive algorithm thereby avoids the ordinary Laplacian algorithm's tendency to create invalid elements in concave regions. We conducted five groups of comparative experiments to evaluate the performance of the proposed parallel algorithm. The results demonstrate that the adaptive algorithm is up to 23 times faster than the serial algorithms, and that the quality of the tetrahedral mesh is satisfactorily improved after adaptive Laplacian smoothing. Compared with the ordinary Laplacian algorithm, the proposed adaptive Laplacian algorithm is more widely applicable and can effectively deal with tetrahedrons of extremely poor quality. This indicates that the proposed parallel algorithm can improve mesh quality in large-scale and complicated application scenarios.
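As a concrete illustration of the acceptance test described above, the sketch below smooths one node at a time and keeps the move only if the worst incident tetrahedron does not get worse and no element inverts. It is a minimal serial sketch, assuming numpy arrays `nodes` (N x 3) and `tets` (M x 4), precomputed neighbor and incidence lists, and a simple longest-to-shortest-edge proxy in place of the paper's exact aspect-ratio metric.

```python
import numpy as np
from itertools import combinations

def tet_volume(pts):
    """Signed volume; a non-positive value marks an inverted (invalid) element."""
    return np.dot(pts[1] - pts[0], np.cross(pts[2] - pts[0], pts[3] - pts[0])) / 6.0

def tet_aspect(pts):
    """Longest-to-shortest edge ratio: a simple stand-in quality proxy (lower is better)."""
    edges = [np.linalg.norm(pts[i] - pts[j]) for i, j in combinations(range(4), 2)]
    return max(edges) / min(edges)

def adaptive_laplacian_pass(nodes, tets, neighbors, node_tets, interior):
    """One smoothing pass with the adaptive acceptance test (serial sketch)."""
    for v in interior:
        old = nodes[v].copy()
        worst_before = max(tet_aspect(nodes[tets[t]]) for t in node_tets[v])
        nodes[v] = nodes[neighbors[v]].mean(axis=0)        # Laplacian target position
        valid = all(tet_volume(nodes[tets[t]]) > 0 for t in node_tets[v])
        worst_after = max(tet_aspect(nodes[tets[t]]) for t in node_tets[v])
        if not valid or worst_after > worst_before:        # reject moves that hurt quality
            nodes[v] = old
    return nodes
```

One common way to parallelize such a pass on a GPU is to update mutually non-adjacent nodes simultaneously (e.g. via graph coloring), so that acceptance tests never race on shared elements.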

2019 · Vol 9 (24) · pp. 5437
Author(s): Lei Xiao, Guoxiang Yang, Kunyang Zhao, Gang Mei

In numerical modeling, mesh quality is one of the decisive factors that strongly affects the accuracy of calculations and the convergence of iterations. To improve mesh quality, the Laplacian mesh smoothing method, which repositions each node to the barycenter of its adjacent nodes without changing the mesh topology, has been widely used. However, smoothing a large-scale three-dimensional mesh is computationally expensive, and few studies have focused on accelerating Laplacian mesh smoothing on the graphics processing unit (GPU). This paper presents a GPU-accelerated parallel algorithm for Laplacian smoothing in three dimensions that considers the influence of different data layouts and iteration forms. To evaluate the efficiency of the GPU implementation, the parallel solution is compared with the original serial solution. Experimental results show that our parallel implementation is up to 46 times faster than the serial version.
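The two design axes studied here, data layout and iteration form, can be sketched briefly. The fragment below is a hedged illustration rather than the authors' code: it stores coordinates as a structure of arrays (separate x, y, z, which favors coalesced GPU memory access) and uses a Jacobi-style update in which every node is recomputed from the previous iterate, making the loop embarrassingly parallel. `offsets`/`adj` form a CSR-style adjacency, and every node is assumed to have at least one neighbor.

```python
import numpy as np

def jacobi_smooth(x, y, z, offsets, adj, interior, iters=10):
    """Laplacian smoothing with SoA coordinates and a CSR adjacency."""
    counts = np.diff(offsets)                       # neighbor count per node
    for _ in range(iters):
        # Gather neighbor coordinates and sum them per node (segment sums).
        sx = np.add.reduceat(x[adj], offsets[:-1])
        sy = np.add.reduceat(y[adj], offsets[:-1])
        sz = np.add.reduceat(z[adj], offsets[:-1])
        # Jacobi form: all gathers read the previous iterate before any write,
        # which is what makes each iteration trivially parallel on a GPU.
        x[interior] = sx[interior] / counts[interior]
        y[interior] = sy[interior] / counts[interior]
        z[interior] = sz[interior] / counts[interior]
    return x, y, z
```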


2021 · Vol 11 (1)
Author(s): Yuzheng Ma, Monan Wang

In this paper, we propose a novel operation, multi-face reconstruction (MFRC), that reconstructs the tetrahedrons within a certain region. In existing tetrahedral mesh improvement methods, the flip operation is an important component; however, because a flip affects only a small local area, the quality improvement it can deliver is limited. The proposed MFRC operation addresses this problem: it can reconstruct the local mesh over a larger range and find the optimal tetrahedron division of the target area within acceptable time complexity. Based on MFRC, we combined other operations, including smoothing, edge removal, face removal, and vertex insertion/deletion, to develop an effective mesh quality improvement method. Numerical experiments on dozens of meshes show that the algorithm effectively improves low-quality elements in tetrahedral meshes and reduces the running time, which is significant for the quality improvement of large-scale meshes.
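The idea of searching for an optimal re-division of a local region can be made concrete with a classic 2D analogue: Klincsek-style dynamic programming over a polygonal cavity that maximizes the minimum triangle quality. The sketch below is for intuition only and is not the MFRC operation itself; it assumes a convex cavity given as ordered 2D numpy points.

```python
import numpy as np

def tri_quality(a, b, c):
    """4*sqrt(3)*area / sum of squared edges; equals 1 for an equilateral triangle."""
    area = abs((b[0]-a[0])*(c[1]-a[1]) - (b[1]-a[1])*(c[0]-a[0])) / 2.0
    per = sum(np.dot(u - w, u - w) for u, w in ((b, a), (c, b), (a, c)))
    return 4.0 * np.sqrt(3.0) * area / per if per > 0 else 0.0

def best_triangulation(pts):
    """Maximize the minimum triangle quality over all triangulations (O(n^3) DP)."""
    n = len(pts)
    q, choice = {}, {}
    for i in range(n - 1):
        q[(i, i + 1)] = 1.0                     # a bare edge constrains nothing
    for span in range(2, n):
        for i in range(n - span):
            j = i + span
            q[(i, j)] = -1.0
            for k in range(i + 1, j):           # apex of triangle (i, k, j)
                cand = min(tri_quality(pts[i], pts[k], pts[j]), q[(i, k)], q[(k, j)])
                if cand > q[(i, j)]:
                    q[(i, j)], choice[(i, j)] = cand, k
    return q[(0, n - 1)], choice                # best quality and chosen apexes
```

In 3D the candidate set and validity checks are far richer, which is why finding a good division of a larger region within acceptable time is the crux that MFRC addresses.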


2021 · Vol 10 (12) · pp. 826
Author(s): Mohammad Naser Lessani, Jiqiu Deng, Zhiyong Guo

Multiple geographical feature label placement (MGFLP) is an NP-hard problem, and its complexity degrades both label-position accuracy and the running time of placement algorithms. The problem is compounded as the number of features to label increases, causing execution time to grow exponentially. Additionally, in large-scale instances the algorithm can become trapped in local minima, which poses significant challenges for automatic label placement. To address these challenges, this paper proposes a novel parallel algorithm built on map segmentation, which decomposes the MGFLP problem into more tractable sub-problems. Parallel computing is then used to handle each decomposed problem simultaneously on a separate central processing unit (CPU) core to speed up label placement. The optimization component of the proposed algorithm is a hybrid of discrete differential evolution and genetic algorithms. Results on real-world datasets confirm the usability and scalability of the algorithm and illustrate its excellent performance. Moreover, the algorithm achieved superlinear speedup compared with previous studies that applied this hybrid algorithm.
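A minimal sketch of the decomposition step is shown below, assuming point features with x/y coordinates and a black-box `optimize_cell` standing in for the discrete differential-evolution/genetic hybrid; all names are illustrative.

```python
from multiprocessing import Pool
from collections import defaultdict

def segment(features, cell_size):
    """Group features by grid cell so each MGFLP sub-problem stays small."""
    cells = defaultdict(list)
    for f in features:
        key = (int(f["x"] // cell_size), int(f["y"] // cell_size))
        cells[key].append(f)
    return list(cells.values())

def place_labels(features, optimize_cell, cell_size=1000.0, workers=8):
    """Solve each decomposed instance on its own CPU worker, then merge."""
    subproblems = segment(features, cell_size)
    with Pool(workers) as pool:
        results = pool.map(optimize_cell, subproblems)   # one cell per task
    return [label for cell in results for label in cell]
```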


Electronics · 2021 · Vol 10 (3) · pp. 253
Author(s): Yosang Jeong, Hoon Ryu

The non-equilibrium Green's function (NEGF) method is utilized in nanoscience to predict the transport behavior of electronic devices. This work explores how much performance improvement can be achieved for quantum transport simulations with the aid of manycore computing, where the core numerical operation is a recursive process of matrix multiplication. The major techniques adopted for performance enhancement are data restructuring, matrix tiling, thread scheduling, and offload computing, and we present technical details on how they are applied to optimize simulation performance on computing hardware including Intel Xeon Phi Knights Landing (KNL) systems and NVIDIA general-purpose graphics processing unit (GPU) devices. With a target structure of a silicon nanowire that consists of 100,000 atoms and is described with an atomistic tight-binding model, the effects of the optimization techniques are rigorously tested in a KNL node equipped with two Quadro GV100 GPU devices, and we observe that computation is accelerated by a factor of up to ∼20 against the unoptimized case. The feasibility of handling large-scale workloads in a huge computing environment is also examined with nanowire simulations over a wide energy range, where good scalability is obtained up to 2048 KNL nodes.
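The recursive core can be sketched in a few lines. The forward sweep below is a generic recursive-Green's-function-style recursion over diagonal blocks, written with dense numpy blocks for clarity; the Hamiltonian blocks are schematic, overlap and contact self-energy terms are omitted, and the paper's tiling, scheduling, and offload optimizations are not reproduced.

```python
import numpy as np

def rgf_forward_sweep(E, H_diag, H_off):
    """H_diag: list of (b x b) diagonal blocks; H_off[i] couples block i to i+1.
    E may carry a small imaginary part in practice. Returns the
    left-connected Green's function blocks g[i]."""
    nb = len(H_diag)
    b = H_diag[0].shape[0]
    g = [None] * nb
    g[0] = np.linalg.inv(E * np.eye(b) - H_diag[0])
    for i in range(1, nb):
        # Recursive step: a chain of block matrix products followed by a
        # dense inverse -- the operation the optimizations target.
        sigma = H_off[i - 1].conj().T @ g[i - 1] @ H_off[i - 1]
        g[i] = np.linalg.inv(E * np.eye(b) - H_diag[i] - sigma)
    return g
```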


Energies · 2021 · Vol 14 (15) · pp. 4638
Author(s): Simon Pratschner, Pavel Skopec, Jan Hrdlicka, Franz Winter

A transformation of the global energy industry is indispensable for solving the climate crisis. However, renewable energy sources typically show significant seasonal and daily fluctuations. This paper provides a system concept model of a decentralized power-to-green-methanol plant consisting of a biomass heating plant with a thermal input of 20 MWth (oxyfuel or air mode), a CO2 processing unit (DeOxo reactor or MEA absorption), an alkaline electrolyzer, a methanol synthesis unit, an air separation unit, and a wind park. Oxyfuel combustion offers the potential to directly utilize the O2 generated by the electrolyzer, which was analyzed by varying critical model parameters. A major objective was to determine whether oxyfuel combustion improves the plant's power-to-liquid (PtL) efficiency. For cases utilizing more than 70% of the CO2 generated by combustion, the oxyfuel O2 demand is fully covered by the electrolyzer, making oxyfuel a viable option for large-scale applications. Conventional air combustion is recommended for small wind parks and scenarios using surplus electricity. Maximum PtL efficiencies of ηPtL,Oxy = 51.91% and ηPtL,Air = 54.21% can be realized. Additionally, a case study for one year of operation yielded an annual output of about 17,000 t/a of methanol and 100 GWhth/a of thermal energy for an input of 50,500 t/a of woodchips and a wind park size of 36 MWp.
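As a quick plausibility check on the reported figures, the chemical energy carried by 17,000 t/a of methanol can be estimated from a standard lower heating value of about 19.9 MJ/kg (an assumed literature value used purely for illustration):

```python
methanol_t_per_year = 17_000
lhv_mj_per_kg = 19.9                        # assumed LHV of methanol
energy_mj = methanol_t_per_year * 1_000 * lhv_mj_per_kg
energy_gwh = energy_mj / 3.6e6              # 1 GWh = 3.6e6 MJ
print(f"~{energy_gwh:.0f} GWh/a chemical energy in the methanol product")  # ~94 GWh/a
```

That is roughly 94 GWh/a, of the same order as the quoted 100 GWhth/a of thermal output.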


2018 · Vol 7 (12) · pp. 472
Author(s): Bo Wan, Lin Yang, Shunping Zhou, Run Wang, Dezhi Wang, ...

The road-network matching method is an effective tool for map integration, fusion, and updating. Due to the complexity of real-world road networks, matching methods often involve a series of complicated processes to identify homonymous roads and handle their intricate relationships. However, traditional road-network matching algorithms, which are mainly central processing unit (CPU)-based, can hit performance bottlenecks when facing big data. We developed a particle swarm optimization (PSO)-based parallel road-network matching method on the graphics processing unit (GPU). Based on the characteristics of the two main stages (similarity computation and matching-relationship identification), data-partition and task-partition strategies were employed, respectively, to fully utilize GPU threads. Experiments were conducted on datasets of 14 different scales. The results indicate that the parallel PSO-based matching algorithm (PSOM) correctly identified most matching relationships with an average accuracy of 84.44%, on par with the probability-relaxation-matching (PRM) benchmark. The PSOM approach significantly reduced road-network matching time on large datasets compared with the PRM method. This paper provides a common parallel framework for road-network matching algorithms and contributes to the integration and updating of large-scale road networks.
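For orientation, a generic continuous PSO loop of the kind underlying PSOM is sketched below; the solution encoding, the similarity-based fitness, the discrete variant, and the GPU data-/task-partition kernels are all treated as assumptions or black boxes here.

```python
import numpy as np

def pso(fitness, dim, n_particles=64, iters=100, w=0.7, c1=1.5, c2=1.5, seed=0):
    """Maximize `fitness` over [0, 1]^dim with a standard PSO update."""
    rng = np.random.default_rng(seed)
    x = rng.random((n_particles, dim))                 # candidate solutions
    v = np.zeros_like(x)                               # velocities
    pbest, pbest_f = x.copy(), np.array([fitness(p) for p in x])
    gbest = pbest[np.argmax(pbest_f)].copy()
    for _ in range(iters):
        r1, r2 = rng.random((2, n_particles, dim))
        v = w * v + c1 * r1 * (pbest - x) + c2 * r2 * (gbest - x)
        x = np.clip(x + v, 0.0, 1.0)
        f = np.array([fitness(p) for p in x])
        better = f > pbest_f                           # update personal bests
        pbest[better], pbest_f[better] = x[better], f[better]
        gbest = pbest[np.argmax(pbest_f)].copy()       # update global best
    return gbest, pbest_f.max()
```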


Author(s): Alan Gray, Kevin Stratford

Leading high performance computing systems achieve their status through the use of highly parallel devices such as NVIDIA graphics processing units or Intel Xeon Phi many-core CPUs. The concept of performance portability across such architectures, as well as traditional CPUs, is vital for the application programmer. In this paper we describe targetDP, a lightweight abstraction layer which allows grid-based applications to target data-parallel hardware in a platform-agnostic manner. We demonstrate the effectiveness of our pragmatic approach by presenting performance results for a complex fluid application (with which the model was co-designed), plus a separate lattice quantum chromodynamics particle physics code. For each application, a single source code base is seen to achieve portable performance, as assessed within the context of the Roofline model. TargetDP can be combined with the Message Passing Interface (MPI) to allow use on systems containing multiple nodes: we demonstrate this through scaling results on traditional and GPU-accelerated large-scale supercomputers.
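targetDP itself is a C abstraction layer whose macros expand to threaded CPU loops or CUDA kernels. The Python fragment below only mirrors that single-source philosophy, one data-parallel kernel body and one backend switch, and is not the targetDP API.

```python
import numpy as np
try:
    import cupy as cp            # GPU backend, if available
    xp = cp
except ImportError:
    xp = np                      # CPU fallback

def diffuse_step(field, rate=0.1):
    """A grid-based stencil update written once against the `xp` namespace;
    the identical source runs on CPU (numpy) or GPU (cupy)."""
    lap = (xp.roll(field, 1, 0) + xp.roll(field, -1, 0)
         + xp.roll(field, 1, 1) + xp.roll(field, -1, 1) - 4.0 * field)
    return field + rate * lap
```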

