GPU Optimization of Convolution for Large 3-D Real Images

Author(s):  
Pavel Karas ◽  
David Svoboda ◽  
Pavel Zemčík
2019 ◽  
Vol 37 (6) ◽  
pp. 1-12 ◽  
Author(s):  
Ming Gao ◽  
Xinlei Wang ◽  
Kui Wu ◽  
Andre Pradhana ◽  
Eftychios Sifakis ◽  
...  

2016 ◽  
Vol 80 ◽  
pp. 2158-2168
Author(s):  
Eduardo C. Vasconcellos ◽  
Esteban W.G. Clua ◽  
Reinaldo R. Rosa ◽  
João G. F.M. Gazolla ◽  
Nuno César da R. Ferreira ◽  
...  

2014 ◽  
Vol 29 ◽  
pp. 172-183 ◽  
Author(s):  
Christoph Riesinger ◽  
Tobias Neckel ◽  
Florian Rupp ◽  
Alfredo Parra Hinojosa ◽  
Hans-Joachim Bungartz

2013 ◽  
Vol 35 (5) ◽  
pp. S209-S228 ◽  
Author(s):  
Daniel Lowell ◽  
Jeswin Godwin ◽  
Justin Holewinski ◽  
Deepan Karthik ◽  
Chekuri Choudary ◽  
...  

2012 ◽  
Vol 2012 ◽  
pp. 1-8 ◽  
Author(s):  
Federico Raimondo ◽  
Juan E. Kamienkowski ◽  
Mariano Sigman ◽  
Diego Fernandez Slezak

In recent years, Independent Component Analysis (ICA) has become a standard tool for identifying relevant dimensions of data in neuroscience. ICA is a very reliable method for analyzing data, but it is computationally very costly, which makes it almost prohibitive for online analysis such as that required by brain-computer interfaces. We show that ICA can be sped up by about 25-fold at almost no cost (a fast video card). EEG data, which consist of many independent signals repeated across multiple channels, are well suited to processing on the vector processors included in graphics processing units. We profiled the implementation of the algorithm and found two main types of operations responsible for the processing bottleneck, together taking almost 80% of the computing time: vector-matrix and matrix-matrix multiplications. Replacing the calls to basic linear algebra functions with the standard CUBLAS routines provided by the GPU manufacturer does not increase performance, because of CUDA kernel launch overhead. Instead, we developed a GPU-based solution that, compared with the original BLAS and CUBLAS versions, achieves a 25x performance increase for the ICA calculation.
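To make the profiling result concrete, here is a minimal sketch of one block update of logistic Infomax ICA. It assumes the standard Bell-Sejnowski rule with Amari's natural gradient (the form used by EEGLAB's runica); the abstract does not state the update rule, so this is an illustration, not the authors' CUDA code. The names matmul, transpose and infomax_block_update are hypothetical. The point is where the dense products appear: W X, the (1 - 2y) u^T term, and the final product with W. With a block of a single sample, the first product degenerates to a matrix-vector product.

```python
import math
from typing import List, Tuple

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Dense product of a (n x k) and b (k x m): the kind of operation the profile flags."""
    k, m = len(b), len(b[0])
    return [[sum(row[t] * b[t][j] for t in range(k)) for j in range(m)] for row in a]


def transpose(a: Matrix) -> Matrix:
    """Return the transpose of a rectangular matrix."""
    return [list(col) for col in zip(*a)]


def infomax_block_update(
    weights: Matrix, bias: List[float], block: Matrix, lrate: float
) -> Tuple[Matrix, List[float]]:
    """One logistic Infomax natural-gradient step (hypothetical illustrative sketch).

    weights: n x n unmixing matrix W
    bias:    length-n bias vector
    block:   n x b matrix of EEG samples (channels x time points)
    """
    n, b = len(weights), len(block[0])

    # u = W X + bias: matrix-matrix product (matrix-vector when b == 1)
    wx = matmul(weights, block)
    u = [[wx[i][j] + bias[i] for j in range(b)] for i in range(n)]

    # 1 - 2*sigmoid(u) equals -tanh(u/2); the tanh form avoids overflow in exp
    g = [[-math.tanh(v / 2.0) for v in row] for row in u]

    # b*I + (1 - 2y) u^T: an (n x b) by (b x n) product plus a scaled identity
    grad = matmul(g, transpose(u))
    for i in range(n):
        grad[i][i] += b

    # Natural-gradient step: W <- W + lrate * (b*I + (1 - 2y) u^T) W
    delta = matmul(grad, weights)
    new_weights = [
        [weights[i][j] + lrate * delta[i][j] for j in range(n)] for i in range(n)
    ]
    new_bias = [bias[i] + lrate * sum(g[i]) for i in range(n)]
    return new_weights, new_bias
```

Each call performs three dense products on small matrices. A GPU port that launches one library kernel per product pays the launch cost several times per block, which is one plausible reading of why swapping in CUBLAS calls alone did not help.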
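The profiling figure also bounds what accelerating the multiplications alone could achieve. As a back-of-the-envelope check using Amdahl's law (a framing the abstract does not itself use), let p be the fraction of the original run time spent in the multiplications, here p ≈ 0.8, and s their speedup:

$$S(s) = \frac{1}{(1-p) + p/s}, \qquad \lim_{s\to\infty} S(s) = \frac{1}{1-p} = \frac{1}{0.2} = 5.$$

Assuming the 80% figure and the 25x result refer to the same baseline, an overall 25-fold gain requires the non-accelerated fraction to satisfy 1 - p ≤ 1/25 = 0.04. Work beyond the two multiplication types must therefore also have been sped up, which is consistent with a GPU-based solution for the whole ICA calculation rather than a drop-in replacement of the BLAS calls.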