Introduction to GPU Computing and CUDA Programming: A Case Study on FDTD [EM Programmer's Notebook]

2010 ◽  
Vol 52 (3) ◽  
pp. 116-122 ◽  
Author(s):  
Danilo De Donno ◽  
Alessandra Esposito ◽  
Luciano Tarricone ◽  
Luca Catarinucci

2012 ◽  
Vol 17 (4) ◽  
pp. 191-200 ◽  
Author(s):  
Zdzisława Rowińska ◽  
Jarosław Gocławski

Abstract In the paper, the authors verify the advantages of GPU computing applied to fuzzy c-means (FCM) segmentation. Three different algorithms implementing the FCM method have been compared by their execution times. All tests refer to images of polyurethane foam matrices filled with fungus (mould) and are aimed at separating mould regions from the matrix base. The authors propose a method using CUDA programming tools, which significantly speeds up FCM computations using the multiple cores built into a graphics card.
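To make the approach concrete, here is a minimal pure-Python sketch of fuzzy c-means on 1-D pixel intensities. It is illustrative only, not the authors' CUDA implementation; the point is that the per-pixel membership update is embarrassingly parallel, which is exactly what a CUDA kernel maps one thread per pixel onto.

```python
import random

def fcm(pixels, c=2, m=2.0, iters=100, seed=0):
    # Minimal fuzzy c-means on 1-D pixel intensities (pure-Python sketch).
    # In a CUDA version, the membership update below runs as one
    # GPU thread per pixel, since pixels are updated independently.
    rng = random.Random(seed)
    n = len(pixels)
    u = [[rng.random() for _ in range(c)] for _ in range(n)]
    for row in u:                              # normalise fuzzy memberships
        s = sum(row)
        for k in range(c):
            row[k] /= s
    centers = [0.0] * c
    p = 2.0 / (m - 1.0)
    for _ in range(iters):
        # update cluster centers from membership-weighted intensities
        for k in range(c):
            num = sum((u[i][k] ** m) * pixels[i] for i in range(n))
            den = sum(u[i][k] ** m for i in range(n))
            centers[k] = num / den
        # update memberships (independent per pixel -> GPU-friendly)
        for i in range(n):
            d = [abs(pixels[i] - centers[k]) + 1e-12 for k in range(c)]
            inv = [dk ** -p for dk in d]
            s = sum(inv)
            u[i] = [v / s for v in inv]
    return u, centers

# Toy image: dark "mould" pixels vs a bright matrix base.
img = [10, 12, 11, 200, 205, 198]
u, centers = fcm(img)
labels = [max(range(2), key=lambda k: u[i][k]) for i in range(len(img))]
```

Assigning each pixel to its highest-membership cluster separates the two intensity groups; the three FCM variants compared in the paper differ in how this update loop is organised, not in the update itself.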


2011 ◽  
Vol 21 (02) ◽  
pp. 245-272 ◽  
Author(s):  
DUANE MERRILL ◽  
ANDREW GRIMSHAW

The need to rank and order data is pervasive, and many algorithms are fundamentally dependent upon sorting and partitioning operations. Prior to this work, GPU stream processors have been perceived as challenging targets for problems with dynamic and global data-dependences such as sorting. This paper presents: (1) a family of very efficient parallel algorithms for radix sorting; and (2) our allocation-oriented algorithmic design strategies that match the strengths of GPU processor architecture to this genre of dynamic parallelism. We demonstrate multiple factors of speedup (up to 3.8x) compared to state-of-the-art GPU sorting. We also reverse the performance differentials observed between GPU and multi/many-core CPU architectures by recent comparisons in the literature, including those with 32-core CPU-based accelerators. Our average sorting rates exceed 1B 32-bit keys/sec on a single GPU microprocessor. Our sorting passes are constructed from a very efficient parallel prefix scan "runtime" that incorporates three design features: (1) kernel fusion for locally generating and consuming prefix scan data; (2) multi-scan for performing multiple related, concurrent prefix scans (one for each partitioning bin); and (3) flexible algorithm serialization for avoiding unnecessary synchronization and communication within algorithmic phases, allowing us to construct a single implementation that scales well across all generations and configurations of programmable NVIDIA GPUs.
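The core of an LSD radix sort pass is a per-digit histogram, an exclusive prefix scan over the bin counts, and a stable scatter. A minimal sequential Python sketch (not the authors' GPU code) shows the structure that the paper's "multi-scan" parallelises, one prefix scan per partitioning bin:

```python
def radix_sort_pass(keys, shift, bits=4):
    # One counting/scan/scatter pass of LSD radix sort over a
    # `bits`-wide digit. In the paper's GPU design, the bin counts
    # become concurrent prefix scans (one per bin, "multi-scan"),
    # fused with the scatter to avoid round-trips to global memory.
    nbins = 1 << bits
    mask = nbins - 1
    counts = [0] * nbins
    for k in keys:                       # digit histogram
        counts[(k >> shift) & mask] += 1
    offsets, total = [], 0
    for c in counts:                     # exclusive prefix scan
        offsets.append(total)
        total += c
    out = [None] * len(keys)
    for k in keys:                       # stable scatter
        b = (k >> shift) & mask
        out[offsets[b]] = k
        offsets[b] += 1
    return out

def radix_sort(keys, key_bits=32, bits=4):
    # Full LSD radix sort: repeat the pass over successive digits.
    for shift in range(0, key_bits, bits):
        keys = radix_sort_pass(keys, shift, bits)
    return keys
```

Stability of the scatter is what lets successive digit passes compose into a total order; the GPU version preserves exactly this invariant while distributing the histogram and scan across thread blocks.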


2011 ◽  
Vol 19 (4) ◽  
pp. 199-212 ◽  
Author(s):  
Gaurav ◽  
Steven F. Wojtkiewicz

Graphics processing units (GPUs) are rapidly emerging as a more economical and highly competitive alternative to CPU-based parallel computing. As the degree of software control of GPUs has increased, many researchers have explored their use in non-gaming applications. Recent studies have shown that GPUs consistently outperform their best corresponding CPU-based parallel computing alternatives in single-instruction multiple-data (SIMD) strategies. This study explores the use of GPUs for uncertainty quantification in computational mechanics. Five types of analysis procedures that are frequently utilized for uncertainty quantification of mechanical and dynamical systems have been considered and their GPU implementations have been developed. The numerical examples presented in this study show that considerable gains in computational efficiency can be obtained for these procedures. It is expected that the GPU implementations presented in this study will serve as initial bases for further developments in the use of GPUs in the field of uncertainty quantification and will (i) aid the understanding of the performance constraints on the relevant GPU kernels and (ii) provide some guidance regarding the computational and the data structures to be utilized in these novel GPU implementations.
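The reason uncertainty-quantification procedures map well onto SIMD hardware is that sampling-based methods evaluate the same model on many independent realisations. A minimal Monte Carlo sketch (a toy static spring, not one of the study's five procedures) makes the structure explicit:

```python
import random
import statistics

def monte_carlo_displacement(n_samples, load=1000.0, k_mean=5e4,
                             k_cov=0.1, seed=0):
    # Monte Carlo UQ of a static spring u = F / k with Gaussian-perturbed
    # stiffness. Every sample is independent, so this loop maps directly
    # onto a SIMD/GPU kernel: one thread per realisation, followed by a
    # parallel reduction for the statistics. Parameters are illustrative.
    rng = random.Random(seed)
    samples = []
    for _ in range(n_samples):
        k = k_mean * (1.0 + k_cov * rng.gauss(0.0, 1.0))
        samples.append(load / k)
    return statistics.mean(samples), statistics.stdev(samples)

mean_u, std_u = monte_carlo_displacement(20000)
```

The nominal displacement is 1000 / 5e4 = 0.02, with roughly 10% scatter inherited from the stiffness; on a GPU the per-sample loop disappears into the kernel launch and only the reduction remains serialised.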


Author(s):  
Y. Liu ◽  
M. Hou ◽  
A. Li ◽  
Y. Dong ◽  
L. Xie ◽  
...  

Abstract. Cracks, decay, deformation, and other damage are widespread in wooden architectural heritage (WAH), so detecting such damage automatically and rapidly is of great significance for assessing its condition and planning routine repairs. Traditional methods use hand-crafted features with point clouds and image-processing technology for object detection. With the growth of big data and of GPU computing performance, data-driven deep learning has been widely used for monitoring WAH, and it is more accurate, faster, and more robust than traditional methods. In this paper, we conducted a case study to detect timber-crack damage in WAH, selecting the YOLOv3 algorithm with DarkNet-53 as the backbone network according to the characteristics of cracks. A large timber-crack dataset was first constructed, on which the timber-crack detection model was trained and tested. The results were analyzed both qualitatively and quantitatively, showing that the proposed method reaches an accuracy of more than 90% while processing each image in less than 0.1 s. These promising results illustrate the validity of our self-constructed dataset as well as the reliability of the YOLOv3 algorithm for crack detection in wooden heritage.
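A YOLOv3-style detector emits many overlapping candidate boxes per crack, which are pruned with greedy non-maximum suppression before reporting damage regions. A minimal sketch of that post-processing step (illustrative only, not the authors' code; box format and threshold are assumptions):

```python
def iou(a, b):
    # Intersection-over-union of two axis-aligned boxes (x1, y1, x2, y2).
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

def nms(boxes, scores, thresh=0.5):
    # Greedy non-maximum suppression: keep the highest-scoring box,
    # discard candidates overlapping it above `thresh`, repeat.
    order = sorted(range(len(boxes)), key=lambda i: -scores[i])
    keep = []
    while order:
        i = order.pop(0)
        keep.append(i)
        order = [j for j in order if iou(boxes[i], boxes[j]) < thresh]
    return keep

# Two near-duplicate crack detections plus one distinct detection.
boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (20, 20, 30, 30)]
scores = [0.9, 0.8, 0.7]
kept = nms(boxes, scores)
```

Here the second box overlaps the first at IoU ≈ 0.68 and is suppressed, leaving two reported cracks.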


2012 ◽  
Vol 22 (4) ◽  
pp. 389-397 ◽  
Author(s):  
Wojciech Bożejko ◽  
Mariusz Uchroński ◽  
Mieczysław Wodecki

In the paper we propose a new framework for a distributed tabu search algorithm designed to be executed on a multi-GPU cluster, in which the cluster nodes are equipped with multicore GPU computing units. The proposed methodology is designed specifically to solve difficult discrete optimization problems, such as the flexible job shop scheduling problem, which we introduce as a case study to analyze the efficiency of the designed synchronous algorithm.
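For orientation, a minimal sequential tabu search over permutations with a swap neighbourhood (a generic sketch, not the paper's flexible-job-shop algorithm). The costly step is evaluating all O(n²) neighbours each iteration; that is what a multi-GPU framework distributes across nodes and cores, while the tabu list is kept synchronous:

```python
import random
from itertools import combinations

def tabu_search(cost, n, iters=200, tenure=7, seed=0):
    # Minimal tabu search over permutations of range(n) with a swap
    # neighbourhood, a fixed tabu tenure, and an aspiration criterion
    # (a tabu move is allowed if it improves the best-known solution).
    rng = random.Random(seed)
    perm = list(range(n))
    rng.shuffle(perm)
    best, best_cost = perm[:], cost(perm)
    tabu = {}                        # move -> iteration until which forbidden
    for it in range(iters):
        candidates = []
        for i, j in combinations(range(n), 2):   # evaluate all neighbours
            nb = perm[:]
            nb[i], nb[j] = nb[j], nb[i]
            c = cost(nb)
            if tabu.get((i, j), -1) >= it and c >= best_cost:
                continue                         # tabu, no aspiration
            candidates.append((c, (i, j), nb))
        if not candidates:
            break
        c, move, nb = min(candidates)            # best admissible move
        perm = nb
        tabu[move] = it + tenure                 # forbid reversing the swap
        if c < best_cost:
            best, best_cost = nb[:], c
    return best, best_cost

# Toy objective: total displacement from the identity permutation.
best, best_cost = tabu_search(
    lambda p: sum(abs(v - i) for i, v in enumerate(p)), 6)
```

The neighbour-evaluation loop is the natural parallel region: each GPU thread scores one candidate move, and only the argmin reduction and the tabu-list update remain synchronous, which matches the synchronous design analysed in the paper.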


2021 ◽  
Author(s):  
Marc Benito ◽  
Matina Maria Trompouki ◽  
Leonidas Kosmidis ◽  
Juan David Garcia ◽  
Sergio Carretero ◽  
...  

2014 ◽  
Vol 38 (01) ◽  
pp. 102-129 ◽  
Author(s):  
ALBERTO MARTÍN ÁLVAREZ ◽  
EUDALD CORTINA ORERO

Abstract Using interviews with former militants and previously unpublished documents, this article traces the genesis and internal dynamics of the Ejército Revolucionario del Pueblo (People's Revolutionary Army, ERP) in El Salvador during the early years of its existence (1970–6). This period was marked by the inability of the ERP to maintain internal coherence or any consensus on revolutionary strategy, which led to a series of splits and internal fights over control of the organisation. The evidence marshalled in this case study sheds new light on the origins of the armed Salvadorean Left and thus contributes to a wider understanding of the processes of formation and internal dynamics of armed left-wing groups that emerged from the 1960s onwards in Latin America.

