High-Speed Implementations of Block Cipher ARIA Using Graphics Processing Units

AbstractThe technologically-relevant task of feature extraction from data performed in deep-learning systems is routinely accomplished as repeated fast Fourier transforms (FFT) electronically in prevalent domain-specific architectures such as in graphics processing units (GPU). However, electronics systems are limited with respect to power dissipation and delay, due to wire-charging challenges related to interconnect capacitance. Here we present a silicon photonics-based architecture for convolutional neural networks that harnesses the phase property of light to perform FFTs efficiently by executing the convolution as a multiplication in the Fourier-domain. The algorithmic executing time is determined by the time-of-flight of the signal through this photonic reconfigurable passive FFT ‘filter’ circuit and is on the order of 10’s of picosecond short. A sensitivity analysis shows that this optical processor must be thermally phase stabilized corresponding to a few degrees. Furthermore, we find that for a small sample number, the obtainable number of convolutions per {time, power, and chip area) outperforms GPUs by about two orders of magnitude. Lastly, we show that, conceptually, the optical FFT and convolution-processing performance is indeed directly linked to optoelectronic device-level, and improvements in plasmonics, metamaterials or nanophotonics are fueling next generation densely interconnected intelligent photonic circuits with relevance for edge-computing 5G networks by processing tensor operations optically.

Download Full-text

End-to-End High Speed Forward Error Correction Using Graphics Processing Units

Lecture Notes in Electrical Engineering - Mobile, Ubiquitous, and Intelligent Computing ◽

10.1007/978-3-642-40675-1_8 ◽

2014 ◽

pp. 47-53

Author(s):

Md Shohidul Islam ◽

Jong-Myon Kim

Keyword(s):

Error Correction ◽

Graphics Processing Units ◽

High Speed ◽

Forward Error Correction ◽

End To End ◽

Forward Error ◽

Graphics Processing

Download Full-text

High-Speed Nonlinear Finite Element Analysis for Surgical Simulation Using Graphics Processing Units

IEEE Transactions on Medical Imaging ◽

10.1109/tmi.2007.913112 ◽

2008 ◽

Vol 27 (5) ◽

pp. 650-663 ◽

Cited By ~ 109

Author(s):

Z.A. Taylor ◽

M. Cheng ◽

S. Ourselin

Keyword(s):

Finite Element Analysis ◽

Finite Element ◽

Graphics Processing Units ◽

High Speed ◽

Surgical Simulation ◽

Nonlinear Finite Element Analysis ◽

Nonlinear Finite Element ◽

Element Analysis ◽

Graphics Processing

Download Full-text

Parallel computing with graphics processing units for high-speed Monte Carlo simulation of photon migration

Journal of Biomedical Optics ◽

10.1117/1.3041496 ◽

2008 ◽

Vol 13 (6) ◽

pp. 060504 ◽

Cited By ~ 232

Author(s):

Erik Alerstam ◽

Tomas Svensson ◽

Stefan Andersson-Engels

Keyword(s):

Monte Carlo Simulation ◽

Monte Carlo ◽

Parallel Computing ◽

Graphics Processing Units ◽

High Speed ◽

Photon Migration ◽

Graphics Processing

Download Full-text

A GPU-accelerated image reduction pipeline

Publications of the Astronomical Society of Japan ◽

10.1093/pasj/psaa091 ◽

2020 ◽

Author(s):

Masafumi Niwano ◽

Katsuhiro L Murata ◽

Ryo Adachi ◽

Sili Wang ◽

Yutaro Tachibana ◽

...

Keyword(s):

Image Processing ◽

Graphics Processing Units ◽

High Speed ◽

Emission Measure ◽

Robotic Telescope ◽

Graphics Processing ◽

High Speed Image Processing ◽

Python Package ◽

Telescope System

Abstract We developed a high-speed image reduction pipeline using Graphics Processing Units (GPUs) as hardware accelerators. Astronomers desire to detect the emission measure counterpart of gravitational-wave sources as soon as possible and to share in the systematic follow-up observation. Therefore, high-speed image processing is important. We developed a new image-reduction pipeline for our robotic telescope system, which uses a GPU via the Python package CuPy for high-speed image processing. As a result, the new pipeline has increased in processing speed by more than 40 times compared with the current one, while maintaining the same functions.

Download Full-text

An Efficient Block Cipher Implementation on Many-Core Graphics Processing Units

Journal of Information Processing Systems ◽

10.3745/jips.2012.8.1.159 ◽

2012 ◽

Vol 8 (1) ◽

pp. 159-174 ◽

Cited By ~ 6

Author(s):

Sang-Pil Lee ◽

Deok-Ho Kim ◽

Jae-Young Yi ◽

Won-Woo Ro

Keyword(s):

Graphics Processing Units ◽

Block Cipher ◽

Many Core ◽

Graphics Processing

Download Full-text

Enhancing the performance of the aggregated bit vector algorithm in network packet classification using GPU

PeerJ Computer Science ◽

10.7717/peerj-cs.185 ◽

2019 ◽

Vol 5 ◽

pp. e185 ◽

Cited By ~ 2

Author(s):

Mahdi Abbasi ◽

Razieh Tahouri ◽

Milad Rafiee

Keyword(s):

Graphics Processing Units ◽

High Speed ◽

Parallel Implementation ◽

Packet Classification ◽

Experimental Results ◽

Analysis Method ◽

Network Systems ◽

Computationally Intensive ◽

Bit Vector ◽

Graphics Processing

Packet classification is a computationally intensive, highly parallelizable task in many advanced network systems like high-speed routers and firewalls that enable different functionalities through discriminating incoming traffic. Recently, graphics processing units (GPUs) have been exploited as efficient accelerators for parallel implementation of software classifiers. The aggregated bit vector is a highly parallelizable packet classification algorithm. In this work, first we present a parallel kernel for running this algorithm on GPUs. Next, we adapt an asymptotic analysis method which predicts any empirical result of the proposed kernel. Experimental results not only confirm the efficiency of the proposed parallel kernel but also reveal the accuracy of the analysis method in predicting important trends in experimental results.

Download Full-text

High-speed simulation of the dynamic neural responses of retinal and cortical simple neurons to complex visual scenes using general purpose computing on graphics processing units.

Frontiers in Neuroinformatics ◽

10.3389/conf.fninf.2014.08.00074 ◽

2014 ◽

Vol 8 ◽

Author(s):

Shouno Osamu ◽

Tsujino Hiroshi

Keyword(s):

Graphics Processing Units ◽

High Speed ◽

General Purpose ◽

Neural Responses ◽

Visual Scenes ◽

Graphics Processing ◽

Speed Simulation

Download Full-text

LDPC Decoding on GPU for Mobile Device

Mobile Information Systems ◽

10.1155/2016/7048482 ◽

2016 ◽

Vol 2016 ◽

pp. 1-6 ◽

Cited By ~ 1

Author(s):

Yiqin Lu ◽

Weiyue Su ◽

Jiancheng Qin

Keyword(s):

Mobile Device ◽

Graphics Processing Units ◽

High Speed ◽

Ldpc Decoder ◽

Ldpc Decoding ◽

Code Capacity ◽

The Cost ◽

Graphics Processing ◽

Video Calling ◽

Efficient Software

A flexible software LDPC decoder that exploits data parallelism for simultaneous multicode words decoding on the mobile device is proposed in this paper, supported by multithreading on OpenCL based graphics processing units. By dividing the check matrix into several parts to make full use of both the local memory and private memory on GPU and properly modify the code capacity each time, our implementation on a mobile phone shows throughputs above 100 Mbps and delay is less than 1.6 millisecond in decoding, which make high-speed communication like video calling possible. To realize efficient software LDPC decoding on the mobile device, the LDPC decoding feature on communication baseband chip should be replaced to save the cost and make it easier to upgrade decoder to be compatible with a variety of channel access schemes.

Download Full-text

High-speed Phase Recovery Using Chromatic Transport of Intensity Computation in Graphics Processing Units

Biomedical Optics and 3-D Imaging ◽

10.1364/biomed.2010.jma7 ◽

2010 ◽

Author(s):

Nick Loomis ◽

Laura Waller ◽

George Barbastathis

Keyword(s):

Graphics Processing Units ◽

High Speed ◽

Phase Recovery ◽

Graphics Processing

Download Full-text