Introduction to GPU Computing and CUDA Programming: A Case Study on FDTD [EM Programmer's Notebook]

2010 ◽  
Vol 52 (3) ◽  
pp. 116-122 ◽  
Author(s):  
Danilo De Donno ◽  
Alessandra Esposito ◽  
Luciano Tarricone ◽  
Luca Catarinucci

2012 ◽  
Vol 17 (4) ◽  
pp. 191-200 ◽  
Author(s):  
Zdzisława Rowińska ◽  
Jarosław Gocławski

Abstract In the paper, the authors verify the advantages of GPU computing applied to fuzzy c-means (FCM) segmentation. Three different algorithms implementing the FCM method have been compared by their execution times. All tests refer to images of polyurethane foam matrices filled with fungus (mould) and are aimed at separating mould regions from the matrix base. The authors propose a method using CUDA programming tools, which significantly speeds up FCM computations using the multiple cores built into a graphics card.
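To make the approach concrete, here is a minimal pure-Python sketch of fuzzy c-means on 1-D pixel intensities. It is illustrative only, not the authors' CUDA implementation; the point is that the per-pixel membership update is embarrassingly parallel, which is exactly what a CUDA kernel maps one thread per pixel onto.

```python
import random

def fcm(pixels, c=2, m=2.0, iters=100, seed=0):
    # Minimal fuzzy c-means on 1-D pixel intensities (pure-Python sketch).
    # In a CUDA version, the membership update below runs as one
    # GPU thread per pixel, since pixels are updated independently.
    rng = random.Random(seed)
    n = len(pixels)
    u = [[rng.random() for _ in range(c)] for _ in range(n)]
    for row in u:                              # normalise fuzzy memberships
        s = sum(row)
        for k in range(c):
            row[k] /= s
    centers = [0.0] * c
    p = 2.0 / (m - 1.0)
    for _ in range(iters):
        # update cluster centers from membership-weighted intensities
        for k in range(c):
            num = sum((u[i][k] ** m) * pixels[i] for i in range(n))
            den = sum(u[i][k] ** m for i in range(n))
            centers[k] = num / den
        # update memberships (independent per pixel -> GPU-friendly)
        for i in range(n):
            d = [abs(pixels[i] - centers[k]) + 1e-12 for k in range(c)]
            inv = [dk ** -p for dk in d]
            s = sum(inv)
            u[i] = [v / s for v in inv]
    return u, centers

# Toy image: dark "mould" pixels vs a bright matrix base.
img = [10, 12, 11, 200, 205, 198]
u, centers = fcm(img)
labels = [max(range(2), key=lambda k: u[i][k]) for i in range(len(img))]
```

Assigning each pixel to its highest-membership cluster separates the two intensity groups; the three FCM variants compared in the paper differ in how this update loop is organised, not in the update itself.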


2011 ◽  
Vol 21 (02) ◽  
pp. 245-272 ◽  
Author(s):  
DUANE MERRILL ◽  
ANDREW GRIMSHAW

The need to rank and order data is pervasive, and many algorithms are fundamentally dependent upon sorting and partitioning operations. Prior to this work, GPU stream processors have been perceived as challenging targets for problems with dynamic and global data-dependences such as sorting. This paper presents: (1) a family of very efficient parallel algorithms for radix sorting; and (2) our allocation-oriented algorithmic design strategies that match the strengths of GPU processor architecture to this genre of dynamic parallelism. We demonstrate multiple factors of speedup (up to 3.8x) compared to state-of-the-art GPU sorting. We also reverse the performance differentials observed between GPU and multi/many-core CPU architectures by recent comparisons in the literature, including those with 32-core CPU-based accelerators. Our average sorting rates exceed 1B 32-bit keys/sec on a single GPU microprocessor. Our sorting passes are constructed from a very efficient parallel prefix scan "runtime" that incorporates three design features: (1) kernel fusion for locally generating and consuming prefix scan data; (2) multi-scan for performing multiple related, concurrent prefix scans (one for each partitioning bin); and (3) flexible algorithm serialization for avoiding unnecessary synchronization and communication within algorithmic phases, allowing us to construct a single implementation that scales well across all generations and configurations of programmable NVIDIA GPUs.
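The core of an LSD radix sort pass is a per-digit histogram, an exclusive prefix scan over the bin counts, and a stable scatter. A minimal sequential Python sketch (not the authors' GPU code) shows the structure that the paper's "multi-scan" parallelises, one prefix scan per partitioning bin:

```python
def radix_sort_pass(keys, shift, bits=4):
    # One counting/scan/scatter pass of LSD radix sort over a
    # `bits`-wide digit. In the paper's GPU design, the bin counts
    # become concurrent prefix scans (one per bin, "multi-scan"),
    # fused with the scatter to avoid round-trips to global memory.
    nbins = 1 << bits
    mask = nbins - 1
    counts = [0] * nbins
    for k in keys:                       # digit histogram
        counts[(k >> shift) & mask] += 1
    offsets, total = [], 0
    for c in counts:                     # exclusive prefix scan
        offsets.append(total)
        total += c
    out = [None] * len(keys)
    for k in keys:                       # stable scatter
        b = (k >> shift) & mask
        out[offsets[b]] = k
        offsets[b] += 1
    return out

def radix_sort(keys, key_bits=32, bits=4):
    # Full LSD radix sort: repeat the pass over successive digits.
    for shift in range(0, key_bits, bits):
        keys = radix_sort_pass(keys, shift, bits)
    return keys
```

Stability of the scatter is what lets successive digit passes compose into a total order; the GPU version preserves exactly this invariant while distributing the histogram and scan across thread blocks.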


2011 ◽  
Vol 19 (4) ◽  
pp. 199-212 ◽  
Author(s):  
Gaurav ◽  
Steven F. Wojtkiewicz

Graphics processing units (GPUs) are rapidly emerging as a more economical and highly competitive alternative to CPU-based parallel computing. As the degree of software control of GPUs has increased, many researchers have explored their use in non-gaming applications. Recent studies have shown that GPUs consistently outperform their best corresponding CPU-based parallel computing alternatives in single-instruction multiple-data (SIMD) strategies. This study explores the use of GPUs for uncertainty quantification in computational mechanics. Five types of analysis procedures that are frequently utilized for uncertainty quantification of mechanical and dynamical systems have been considered and their GPU implementations have been developed. The numerical examples presented in this study show that considerable gains in computational efficiency can be obtained for these procedures. It is expected that the GPU implementations presented in this study will serve as initial bases for further developments in the use of GPUs in the field of uncertainty quantification and will (i) aid the understanding of the performance constraints on the relevant GPU kernels and (ii) provide some guidance regarding the computational and the data structures to be utilized in these novel GPU implementations.
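The reason uncertainty-quantification procedures map well onto SIMD hardware is that sampling-based methods evaluate the same model on many independent realisations. A minimal Monte Carlo sketch (a toy static spring, not one of the study's five procedures) makes the structure explicit:

```python
import random
import statistics

def monte_carlo_displacement(n_samples, load=1000.0, k_mean=5e4,
                             k_cov=0.1, seed=0):
    # Monte Carlo UQ of a static spring u = F / k with Gaussian-perturbed
    # stiffness. Every sample is independent, so this loop maps directly
    # onto a SIMD/GPU kernel: one thread per realisation, followed by a
    # parallel reduction for the statistics. Parameters are illustrative.
    rng = random.Random(seed)
    samples = []
    for _ in range(n_samples):
        k = k_mean * (1.0 + k_cov * rng.gauss(0.0, 1.0))
        samples.append(load / k)
    return statistics.mean(samples), statistics.stdev(samples)

mean_u, std_u = monte_carlo_displacement(20000)
```

The nominal displacement is 1000 / 5e4 = 0.02, with roughly 10% scatter inherited from the stiffness; on a GPU the per-sample loop disappears into the kernel launch and only the reduction remains serialised.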


Author(s):  
Y. Liu ◽  
M. Hou ◽  
A. Li ◽  
Y. Dong ◽  
L. Xie ◽  
...  

Abstract. Cracks, decay, deformation, and other damage are widespread in wooden architectural heritage (WAH), so detecting such damage automatically and rapidly is of great significance for assessing its condition and planning routine repairs. Traditional methods use hand-crafted features with point clouds and image-processing technology for object detection. With the growth of big data and of GPU computing performance, data-driven deep learning has been widely used for monitoring WAH, and it is more accurate, faster, and more robust than traditional methods. In this paper, we conducted a case study to detect timber-crack damage in WAH, selecting the YOLOv3 algorithm with DarkNet-53 as the backbone network according to the characteristics of cracks. A large timber-crack dataset was first constructed, on which the timber-crack detection model was trained and tested. The results were analyzed both qualitatively and quantitatively, showing that the proposed method reaches an accuracy of more than 90% while processing each image in less than 0.1 s. These promising results illustrate the validity of our self-constructed dataset as well as the reliability of the YOLOv3 algorithm for crack detection in wooden heritage.
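A YOLOv3-style detector emits many overlapping candidate boxes per crack, which are pruned with greedy non-maximum suppression before reporting damage regions. A minimal sketch of that post-processing step (illustrative only, not the authors' code; box format and threshold are assumptions):

```python
def iou(a, b):
    # Intersection-over-union of two axis-aligned boxes (x1, y1, x2, y2).
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

def nms(boxes, scores, thresh=0.5):
    # Greedy non-maximum suppression: keep the highest-scoring box,
    # discard candidates overlapping it above `thresh`, repeat.
    order = sorted(range(len(boxes)), key=lambda i: -scores[i])
    keep = []
    while order:
        i = order.pop(0)
        keep.append(i)
        order = [j for j in order if iou(boxes[i], boxes[j]) < thresh]
    return keep

# Two near-duplicate crack detections plus one distinct detection.
boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (20, 20, 30, 30)]
scores = [0.9, 0.8, 0.7]
kept = nms(boxes, scores)
```

Here the second box overlaps the first at IoU ≈ 0.68 and is suppressed, leaving two reported cracks.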


2012 ◽  
Vol 22 (4) ◽  
pp. 389-397 ◽  
Author(s):  
Wojciech Bożejko ◽  
Mariusz Uchroński ◽  
Mieczysław Wodecki

In the paper we propose a new framework for a distributed tabu search algorithm designed to be executed on a multi-GPU cluster, in which the cluster nodes are equipped with multicore GPU computing units. The proposed methodology is designed specifically to solve difficult discrete optimization problems, such as the flexible job shop scheduling problem, which we introduce as a case study to analyze the efficiency of the designed synchronous algorithm.
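For orientation, a minimal sequential tabu search over permutations with a swap neighbourhood (a generic sketch, not the paper's flexible-job-shop algorithm). The costly step is evaluating all O(n²) neighbours each iteration; that is what a multi-GPU framework distributes across nodes and cores, while the tabu list is kept synchronous:

```python
import random
from itertools import combinations

def tabu_search(cost, n, iters=200, tenure=7, seed=0):
    # Minimal tabu search over permutations of range(n) with a swap
    # neighbourhood, a fixed tabu tenure, and an aspiration criterion
    # (a tabu move is allowed if it improves the best-known solution).
    rng = random.Random(seed)
    perm = list(range(n))
    rng.shuffle(perm)
    best, best_cost = perm[:], cost(perm)
    tabu = {}                        # move -> iteration until which forbidden
    for it in range(iters):
        candidates = []
        for i, j in combinations(range(n), 2):   # evaluate all neighbours
            nb = perm[:]
            nb[i], nb[j] = nb[j], nb[i]
            c = cost(nb)
            if tabu.get((i, j), -1) >= it and c >= best_cost:
                continue                         # tabu, no aspiration
            candidates.append((c, (i, j), nb))
        if not candidates:
            break
        c, move, nb = min(candidates)            # best admissible move
        perm = nb
        tabu[move] = it + tenure                 # forbid reversing the swap
        if c < best_cost:
            best, best_cost = nb[:], c
    return best, best_cost

# Toy objective: total displacement from the identity permutation.
best, best_cost = tabu_search(
    lambda p: sum(abs(v - i) for i, v in enumerate(p)), 6)
```

The neighbour-evaluation loop is the natural parallel region: each GPU thread scores one candidate move, and only the argmin reduction and the tabu-list update remain synchronous, which matches the synchronous design analysed in the paper.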


2021 ◽  
Author(s):  
Marc Benito ◽  
Matina Maria Trompouki ◽  
Leonidas Kosmidis ◽  
Juan David Garcia ◽  
Sergio Carretero ◽  
...  

2014 ◽  
Vol 38 (01) ◽  
pp. 102-129 ◽  
Author(s):  
ALBERTO MARTÍN ÁLVAREZ ◽  
EUDALD CORTINA ORERO

Abstract Using interviews with former militants and previously unpublished documents, this article traces the genesis and internal dynamics of the Ejército Revolucionario del Pueblo (People's Revolutionary Army, ERP) in El Salvador during the early years of its existence (1970–6). This period was marked by the inability of the ERP to maintain internal coherence or any consensus on revolutionary strategy, which led to a series of splits and internal fights over control of the organisation. The evidence marshalled in this case study sheds new light on the origins of the armed Salvadorean Left and thus contributes to a wider understanding of the processes of formation and internal dynamics of armed left-wing groups that emerged from the 1960s onwards in Latin America.

