AN EVALUATION OF MULTIPLE FEED-FORWARD NETWORKS ON GPUs

2011 ◽  
Vol 21 (01) ◽  
pp. 31-47 ◽  
Author(s):  
NOEL LOPES ◽  
BERNARDETE RIBEIRO

The Graphics Processing Unit (GPU) originally designed for rendering graphics and which is difficult to program for other tasks, has since evolved into a device suitable for general-purpose computations. As a result graphics hardware has become progressively more attractive yielding unprecedented performance at a relatively low cost. Thus, it is the ideal candidate to accelerate a wide variety of data parallel tasks in many fields such as in Machine Learning (ML). As problems become more and more demanding, parallel implementations of learning algorithms are crucial for a useful application. In particular, the implementation of Neural Networks (NNs) in GPUs can significantly reduce the long training times during the learning process. In this paper we present a GPU parallel implementation of the Back-Propagation (BP) and Multiple Back-Propagation (MBP) algorithms, and describe the GPU kernels needed for this task. The results obtained on well-known benchmarks show faster training times and improved performances as compared to the implementation in traditional hardware, due to maximized floating-point throughput and memory bandwidth. Moreover, a preliminary GPU based Autonomous Training System (ATS) is developed which aims at automatically finding high-quality NNs-based solutions for a given problem.

2021 ◽  
Author(s):  
Airidas Korolkovas ◽  
Alexander Katsevich ◽  
Michael Frenkel ◽  
William Thompson ◽  
Edward Morton

X-ray computed tomography (CT) can provide 3D images of density, and possibly the atomic number, for large objects like passenger luggage. This information, while generally very useful, is often insufficient to identify threats like explosives and narcotics, which can have a similar average composition as benign everyday materials such as plastics, glass, light metals, etc. A much more specific material signature can be measured with X-ray diffraction (XRD). Unfortunately, XRD signal is very faint compared to the transmitted one, and also challenging to reconstruct for objects larger than a small laboratory sample. In this article we analyze a novel low-cost scanner design which captures CT and XRD signals simultaneously, and uses the least possible collimation to maximize the flux. To simulate a realistic instrument, we derive a formula for the resolution of any diffraction pathway, taking into account the polychromatic spectrum, and the finite size of the source, detector, and each voxel. We then show how to reconstruct XRD patterns from a large phantom with multiple diffracting objects. Our approach includes a reasonable amount of photon counting noise (Poisson statistics), as well as measurement bias, in particular incoherent Compton scattering. The resolution of our reconstruction is sufficient to provide significantly more information than standard CT, thus increasing the accuracy of threat detection. Our theoretical model is implemented in GPU (Graphics Processing Unit) accelerated software which can be used to assess and further optimize scanner designs for specific applications in security, healthcare, and manufacturing quality control.


2019 ◽  
Vol 23 (2) ◽  
pp. 1505-1516 ◽  
Author(s):  
Mohammad Hossein Shafiabadi ◽  
Hossein Pedram ◽  
Midia Reshadi ◽  
Akram Reza

2009 ◽  
Vol 17 (25) ◽  
pp. 22320 ◽  
Author(s):  
Claudio Vinegoni ◽  
Lyuba Fexon ◽  
Paolo Fumene Feruglio ◽  
Misha Pivovarov ◽  
Jose-Luiz Figueiredo ◽  
...  

2017 ◽  
Author(s):  
Richard Wilton ◽  
Xin Li ◽  
Andrew P. Feinberg ◽  
Alexander S. Szalay

AbstractThe alignment of bisulfite-treated DNA sequences (BS-seq reads) to a large genome involves a significant computational burden beyond that required to align non-bisulfite-treated reads. In the analysis of BS-seq data, this can present an important performance bottleneck that can potentially be addressed by appropriate software-engineering and algorithmic improvements. One strategy is to integrate this additional programming logic into the read-alignment implementation in a way that the software becomes amenable to optimizations that lead to both higher speed and greater sensitivity than can be achieved without this integration.We have evaluated this approach using Arioc, a short-read aligner that uses GPU (general-purpose graphics processing unit) hardware to accelerate computationally-expensive programming logic. We integrated the BS-seq computational logic into both GPU and CPU code throughout the Arioc implementation. We then carried out a read-by-read comparison of Arioc's reported alignments with the alignments reported by the most widely used BS-seq read aligners. With simulated reads, Arioc's accuracy is equal to or better than the other read aligners we evaluated. With human sequencing reads, Arioc's throughput is at least 10 times faster than existing BS-seq aligners across a wide range of sensitivity settings.The Arioc software is available at https://github.com/RWilton/Arioc. It is released under a BSD open-source license.


2012 ◽  
Vol 53 ◽  
Author(s):  
Beatričė Andziulienė ◽  
Evaldas Žulkas ◽  
Audrius Kuprinavičius

In this work Fast Fourier transformation algorithm for general purpose graphics processing unit processing (GPGPU) is discussed. Algorithm structure and individual stages performance were analysed. With performance analysis method algorithm distribution and data allocation possibilities were determined, depending on algorithm stages execution speed and algorithm structure. Ratio between CPU and GPU execution during Fast Fourier transform signal processing was determined using computer-generated data with frequency. When adopting CPU code for CUDA execution, it not becomes more complex, even if stream procesor parallelization and data transfering algorith stages are considered. But central processing unit serial execution).


Sign in / Sign up

Export Citation Format

Share Document