CUDA Based Parallel Computation for Gauss Elimination Method

Author(s):  
Xiao Liu ◽  
Lei Xu

Graphics Processing Unit (GPU) parallel algorithms based on the Compute Unified Device Architecture (CUDA) have shown great speedup potential. How will this technique perform in structural computation? We choose the Gauss elimination method as the research object and implement parallel Gauss elimination in CUDA on a GPU. We then carry out two groups of numerical experiments. The first investigates the effect of Matrix Bandwidths (MBs) and Node Numbers (NNs) on the speedup ratio. The second compares our method with commercial software by analyzing two actual structural problems in ocean engineering.
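The abstract does not show the kernel itself, so the following is only a serial Python sketch of Gauss elimination, written to point out which loop a GPU version could distribute over threads. The function name `gauss_solve` and the use of partial pivoting are assumptions made for illustration, not details taken from the paper. For a banded matrix, the two inner loops could be limited to the bandwidth, which is not done here.

```python
def gauss_solve(a, b):
    """Solve a x = b by Gauss elimination (illustrative sketch, not the paper's code)."""
    n = len(a)
    # Work on an augmented copy [a | b] so the caller's data is untouched.
    m = [[float(v) for v in row] + [float(rhs)] for row, rhs in zip(a, b)]

    for k in range(n):
        # Partial pivoting (an assumption here): use the row with the
        # largest absolute entry in column k as the pivot row.
        p = max(range(k, n), key=lambda i: abs(m[i][k]))
        if m[p][k] == 0.0:
            raise ValueError("matrix is singular")
        m[k], m[p] = m[p], m[k]

        # The updates of rows k+1 .. n-1 do not depend on each other.
        # This is the loop a GPU implementation could map to threads.
        for i in range(k + 1, n):
            factor = m[i][k] / m[k][k]
            for j in range(k, n + 1):
                m[i][j] -= factor * m[k][j]

    # Back substitution on the upper-triangular system.
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x
```

The outer loop over `k` stays sequential because each elimination step needs the result of the previous one. Only the work inside a step is independent, which is one reason the achievable speedup can depend on matrix size and bandwidth.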

2014 ◽  
Vol 6 (2) ◽  
pp. 129-133
Author(s):  
Evaldas Borcovas ◽  
Gintautas Daunys

Image processing, computer vision, and other complicated optical information processing algorithms require large resources, and it is often desirable to execute them in real time. Such requirements are hard to fulfill with a single CPU. NVidia's CUDA technology enables the programmer to use the GPU resources in the computer. The research was carried out with an Intel Pentium Dual-Core T4500 2.3 GHz processor with 4 GB DDR3 RAM (CPU I), an NVidia GeForce GT320M CUDA-compatible graphics card (GPU I), an Intel Core I5-2500K 3.3 GHz processor with 4 GB DDR3 RAM (CPU II), and an NVidia GeForce GTX 560 CUDA-compatible graphics card (GPU II). The OpenCV 2.1 and CUDA-compatible OpenCV 2.4.0 libraries were used for the testing. The main tests were made with the standard function MatchTemplate from the OpenCV libraries. The algorithm uses a main image and a template, and the influence of both was tested: the main image and the template were resized, and the algorithm's computing time and performance in Gtpix/s were measured. According to the results, GPU computing on the hardware mentioned above is up to 24 times faster when processing a large amount of data. When the images are small, the performance of the CPU and the GPU does not differ significantly. The choice of template size influences the computation time on the CPU. The difference in computing time between the GPUs can be explained by the number of cores they have.
Image processing, computer vision, and other complex algorithms that process optical information consume large computational resources. These algorithms often have to run in real time, which is difficult to achieve using the capacity of a single CPU (central processing unit) alone. The CUDA (Compute Unified Device Architecture) technology proposed by nVidia makes it possible to use GPU (graphics processing unit) resources. Two different CPUs, the Intel Pentium Dual-Core T4500 and the Intel Core I5 2500K, and two GPUs, the nVidia GeForce GT320M and the NVidia GeForce 560, were chosen for the study, together with the image processing libraries OpenCV 2.1 and OpenCV 2.4. The template matching algorithm was chosen for the study. It requires an image to analyze and a template image of the object being sought. During the study, the sizes of the image and the template were varied, and the effect on the algorithm's execution time and on the number of operations performed per second was observed. The results indicate that, when processing a large amount of data, the GPU executes the algorithm up to 24 times faster than the CPU alone. With a small amount of data, the difference between the CPU and the GPU is minimal. Comparing the computations on the two GPUs, the computing speed was found to be directly proportional to the number of GPU cores: in our study the faster GPU had 16 times more cores, and the computations ran 16 times faster.
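To make the dependence on image and template size concrete, here is a minimal pure-Python sketch of template matching by sum of squared differences. The abstract does not say which comparison method was passed to MatchTemplate, so the squared-difference score is an assumption, and `match_template_ssd` is a hypothetical name rather than an OpenCV function.

```python
def match_template_ssd(image, template):
    """Return ((row, col), score) of the best match by sum of squared differences.

    `image` and `template` are lists of rows of numbers (grayscale values).
    Illustrative sketch only; OpenCV's own implementation is far more optimized.
    """
    ih, iw = len(image), len(image[0])
    th, tw = len(template), len(template[0])
    best_pos, best_score = None, None

    # Every candidate position is scored independently of the others,
    # which is what makes the search a good fit for many GPU threads.
    for r in range(ih - th + 1):
        for c in range(iw - tw + 1):
            score = 0
            for dr in range(th):
                for dc in range(tw):
                    diff = image[r + dr][c + dc] - template[dr][dc]
                    score += diff * diff
            if best_score is None or score < best_score:
                best_pos, best_score = (r, c), score
    return best_pos, best_score
```

The four nested loops show that the work grows roughly with the number of candidate positions times the template area. This is consistent with the reported sensitivity to image and template size, and with small inputs giving the GPU little to parallelize.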


2014 ◽  
Vol 2014 ◽  
pp. 1-8 ◽  
Author(s):  
Ronglin Jiang ◽  
Shugang Jiang ◽  
Yu Zhang ◽  
Ying Xu ◽  
Lei Xu ◽  
...  

This paper introduces a finite-difference time-domain (FDTD) code written in Fortran and CUDA for realistic electromagnetic calculations, parallelized with the Message Passing Interface (MPI) and Open Multiprocessing (OpenMP). Since both Central Processing Unit (CPU) and Graphics Processing Unit (GPU) resources are utilized, a faster execution speed can be reached than with a traditional pure GPU code. In our experiments, 64 NVIDIA TESLA K20m GPUs and 64 INTEL XEON E5-2670 CPUs are used to carry out the pure CPU, pure GPU, and CPU + GPU tests. Relative to the pure CPU calculations for the same problems, the speedup ratio achieved by the CPU + GPU calculations is around 14. Compared to the pure GPU calculations for the same problems, the CPU + GPU calculations show a 7.6%–13.2% performance improvement. Because of the small memory size of GPUs, the FDTD problem size is usually very small. However, this code can enlarge the maximum problem size by 25% without reducing the performance of traditional pure GPU code. Finally, using this code, a microstrip antenna array with 16 × 18 elements is calculated, and the radiation patterns are compared with those obtained with MoM. The results show good agreement between them.
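The code described in the abstract is a three-dimensional Fortran and CUDA implementation that is not reproduced here. As a rough illustration of the kind of update FDTD performs, the sketch below advances a one-dimensional field in normalized units. The function name `fdtd_1d`, the coefficient 0.5, and the Gaussian source parameters are all assumptions chosen for the example.

```python
import math


def fdtd_1d(n_cells, n_steps, source_cell):
    """Advance a 1D FDTD grid in normalized units and return the E field.

    Illustrative sketch: fixed (zero) E field at both ends, and an
    assumed update coefficient of 0.5 for both fields.
    """
    ez = [0.0] * n_cells          # electric field at integer cells
    hy = [0.0] * (n_cells - 1)    # magnetic field between cells

    for t in range(n_steps):
        # Each H value depends only on its two neighbouring E values.
        for i in range(n_cells - 1):
            hy[i] += 0.5 * (ez[i + 1] - ez[i])
        # Each interior E value depends only on its two neighbouring H values.
        for i in range(1, n_cells - 1):
            ez[i] += 0.5 * (hy[i] - hy[i - 1])
        # Additive Gaussian pulse source (parameters are assumptions).
        ez[source_cell] += math.exp(-((t - 30) / 10.0) ** 2)
    return ez
```

Because every cell update reads only nearest neighbours of the other field, the two inner loops can be split across threads, and the grid can be split across devices that exchange only boundary values. That locality is the property that CPU + GPU and multi-node FDTD codes rely on.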


2013 ◽  
Vol 13 (1) ◽  
pp. 15
Author(s):  
Harry S.J Koleangan

ANALYSIS OF MIXED COMPOUND CONCENTRATION USING VB 2008 ABSTRACT A VB 2008-based application program has been built to analyze a solution containing a mixture of four compounds: ethyl-benzene, o-xylene, m-xylene, and p-xylene. The concentration of each compound was determined with the Gauss elimination method, implemented as a computer program written in the Visual Basic 2008 programming language. Applying the program to secondary data gives the following concentrations (in molar): ethyl-benzene = 0.04153, o-xylene = 0.04067, m-xylene = 0.02772, and p-xylene = 0.02522. Keywords: Gauss method, VB 2008


2017 ◽  
Vol 14 (1) ◽  
pp. 789-795
Author(s):  
V Saveetha ◽  
S Sophia

Parallel data clustering aims at using algorithms and methods to extract knowledge from large databases in reasonable time on high-performance architectures. The computational challenge that growing data volumes pose for cluster analysis can be overcome by exploiting the power of these architectures. Recent developments in the parallel power of the Graphics Processing Unit enable low-cost, high-performance solutions for general-purpose applications. The Compute Unified Device Architecture programming model provides application programming interface methods to handle data efficiently on the Graphics Processing Unit for iterative clustering algorithms like K-Means. Existing Graphics Processing Unit based K-Means algorithms focus heavily on improving speedup and fall short in handling the large amount of time spent transferring data between the Central Processing Unit and the Graphics Processing Unit. This paper proposes an efficient K-Means algorithm that reduces the transfer time by introducing a novel approach to checking the convergence of the algorithm and by using pinned memory for direct access. The algorithm outperforms the other algorithms by maximizing parallelism and exploiting these memory features. The relative speedups and the validity measure of the proposed algorithm are higher than those of K-Means on Graphics Processing Unit and K-Means using Flag on Graphics Processing Unit. The proposed approach thus shows that communication overhead can be reduced in K-Means clustering.
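The abstract does not describe the proposed convergence check, so the sketch below is not the paper's method. It is a plain K-Means loop in Python that stops when no point changes cluster, which illustrates why a convergence test can be reduced to a single counter instead of comparing full arrays. The function name `kmeans` and its parameters are assumptions, and pinned memory has no equivalent in this host-only sketch.

```python
def kmeans(points, initial_centroids, max_iter=100):
    """Plain K-Means on tuples of coordinates (illustrative sketch).

    Returns (labels, centroids). Convergence is detected by counting
    how many points changed cluster in the current iteration.
    """
    centroids = [tuple(c) for c in initial_centroids]
    labels = [-1] * len(points)

    for _ in range(max_iter):
        changed = 0
        # Assignment step: each point is handled independently.
        for idx, p in enumerate(points):
            best = min(
                range(len(centroids)),
                key=lambda k: sum((a - b) ** 2 for a, b in zip(p, centroids[k])),
            )
            if best != labels[idx]:
                labels[idx] = best
                changed += 1

        # A single counter is all that is needed to decide whether to stop.
        if changed == 0:
            break

        # Update step: move each centroid to the mean of its members.
        for k in range(len(centroids)):
            members = [p for p, lab in zip(points, labels) if lab == k]
            if members:
                centroids[k] = tuple(sum(c) / len(members) for c in zip(*members))
    return labels, centroids
```

In a GPU setting, a check of this kind would mean that only one small value has to be copied back to the host per iteration. Whether the paper's approach works this way is not stated in the abstract.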

