LDPC Decoding on GPU for Mobile Device

2016 ◽  
Vol 2016 ◽  
pp. 1-6 ◽  
Author(s):  
Yiqin Lu ◽  
Weiyue Su ◽  
Jiancheng Qin

This paper proposes a flexible software LDPC decoder for mobile devices that exploits data parallelism to decode multiple codewords simultaneously, supported by multithreading on OpenCL-based graphics processing units. By dividing the check matrix into several parts to make full use of both the local and private memory on the GPU, and by properly adjusting the code capacity on each pass, our implementation on a mobile phone achieves throughputs above 100 Mbps with a decoding delay below 1.6 ms, which makes high-speed communication such as video calling possible. To realize efficient software LDPC decoding on mobile devices, the LDPC decoding feature on the communication baseband chip could be replaced, saving cost and making it easier to upgrade the decoder for compatibility with a variety of channel access schemes.
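
To make the data-parallel idea concrete, below is a minimal NumPy sketch of batched min-sum LDPC decoding, where one batch axis stands in for the simultaneous multi-codeword decoding the paper maps onto OpenCL work-items. The dense message layout and all dimensions are illustrative, not the authors' memory-partitioning scheme.

    import numpy as np

    def minsum_decode(H, llr, iters=20):
        # Batched min-sum decoding: 'llr' holds channel LLRs for a whole
        # batch of codewords, shape (B, n), emulating the paper's
        # multi-codeword parallelism. LLR convention: log P(0)/P(1).
        m, n = H.shape
        mask = H.astype(bool)                       # (m, n) Tanner-graph edges
        v2c = np.where(mask, llr[:, None, :], 0.0)  # variable-to-check msgs
        c2v = np.zeros_like(v2c)
        for _ in range(iters):
            # Check-node update: extrinsic sign product and min magnitude.
            mag = np.where(mask, np.abs(v2c), np.inf)
            sgn = np.where(mask, np.sign(v2c) + (v2c == 0), 1.0)
            total_sgn = np.prod(sgn, axis=2, keepdims=True)
            min1 = np.min(mag, axis=2, keepdims=True)
            mag2 = mag.copy()                       # mask argmin, re-take min
            np.put_along_axis(mag2, np.argmin(mag, axis=2)[:, :, None],
                              np.inf, axis=2)
            min2 = np.min(mag2, axis=2, keepdims=True)
            c2v = np.where(mask,
                           total_sgn * sgn * np.where(mag == min1, min2, min1),
                           0.0)
            # Variable-node update: channel LLR plus all other check msgs.
            total = llr[:, None, :] + c2v.sum(axis=1, keepdims=True) - c2v
            v2c = np.where(mask, total, 0.0)
        posterior = llr + c2v.sum(axis=1)
        return (posterior < 0).astype(np.uint8)     # hard decisions, (B, n)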

Nanophotonics ◽  
2020 ◽  
Vol 9 (13) ◽  
pp. 4097-4108 ◽  
Author(s):  
Moustafa Ahmed ◽  
Yas Al-Hadeethi ◽  
Ahmed Bakry ◽  
Hamed Dalir ◽  
Volker J. Sorger

Abstract The technologically relevant task of feature extraction from data in deep-learning systems is routinely accomplished as repeated fast Fourier transforms (FFT) performed electronically in prevalent domain-specific architectures such as graphics processing units (GPUs). However, electronic systems are limited with respect to power dissipation and delay, due to wire-charging challenges related to interconnect capacitance. Here we present a silicon-photonics-based architecture for convolutional neural networks that harnesses the phase property of light to perform FFTs efficiently by executing the convolution as a multiplication in the Fourier domain. The algorithmic execution time is determined by the time of flight of the signal through this photonic reconfigurable passive FFT ‘filter’ circuit and is on the order of tens of picoseconds. A sensitivity analysis shows that this optical processor must be thermally phase-stabilized to within a few degrees. Furthermore, we find that for a small sample number, the obtainable number of convolutions per unit time, power, and chip area outperforms GPUs by about two orders of magnitude. Lastly, we show that, conceptually, the optical FFT and convolution-processing performance is directly linked to optoelectronic device-level performance, and improvements in plasmonics, metamaterials, or nanophotonics are fueling next-generation densely interconnected intelligent photonic circuits with relevance for edge-computing 5G networks by processing tensor operations optically.
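
The core identity the photonic circuit exploits is the convolution theorem: convolution in real space becomes pointwise multiplication in the Fourier domain. A short, purely numeric NumPy sketch of that identity follows; it is a stand-in for the idea, not a model of the optical device, whose transforms run passively at time-of-flight speed.

    import numpy as np

    def fft_conv2d(image, kernel):
        # Linear 2-D convolution via the convolution theorem:
        # FFT both operands, multiply pointwise, inverse-FFT.
        ih, iw = image.shape
        kh, kw = kernel.shape
        H, W = ih + kh - 1, iw + kw - 1   # pad to avoid circular wrap-around
        F = np.fft.rfft2(image, s=(H, W)) * np.fft.rfft2(kernel, s=(H, W))
        return np.fft.irfft2(F, s=(H, W))

    def direct_conv2d(image, kernel):
        # Brute-force reference convolution for the check below.
        ih, iw = image.shape
        kh, kw = kernel.shape
        out = np.zeros((ih + kh - 1, iw + kw - 1))
        for y in range(kh):
            for x in range(kw):
                out[y:y + ih, x:x + iw] += kernel[y, x] * image
        return out

    rng = np.random.default_rng(0)
    img, ker = rng.normal(size=(32, 32)), rng.normal(size=(5, 5))
    assert np.allclose(fft_conv2d(img, ker), direct_conv2d(img, ker))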


Author(s):  
Masafumi Niwano ◽  
Katsuhiro L Murata ◽  
Ryo Adachi ◽  
Sili Wang ◽  
Yutaro Tachibana ◽  
...  

Abstract We developed a high-speed image-reduction pipeline using graphics processing units (GPUs) as hardware accelerators. Astronomers want to detect the electromagnetic counterpart of gravitational-wave sources as soon as possible and to share it for systematic follow-up observations, so high-speed image processing is important. We developed a new image-reduction pipeline for our robotic telescope system that uses a GPU via the Python package CuPy for high-speed image processing. As a result, the new pipeline is more than 40 times faster than the current one while maintaining the same functions.
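
Since the speed-up comes from CuPy's NumPy-compatible GPU arrays, here is a minimal sketch of what a GPU-side reduction step can look like. The dark-subtraction and flat-fielding shown are generic CCD calibration steps, not the authors' actual pipeline.

    import cupy as cp  # NumPy-compatible arrays that live on the GPU

    def reduce_frames(raw, dark, flat):
        # Generic CCD reduction on the GPU: 'raw' is a stack of frames
        # (N, H, W); 'dark' and 'flat' are calibration frames (H, W).
        raw_g  = cp.asarray(raw,  dtype=cp.float32)   # host -> device
        dark_g = cp.asarray(dark, dtype=cp.float32)
        flat_g = cp.asarray(flat, dtype=cp.float32)
        flat_g /= cp.median(flat_g)                   # normalize the flat
        calibrated = (raw_g - dark_g) / flat_g        # broadcasts over N
        return cp.asnumpy(calibrated)                 # device -> host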


2019 ◽  
Vol 5 ◽  
pp. e185 ◽  
Author(s):  
Mahdi Abbasi ◽  
Razieh Tahouri ◽  
Milad Rafiee

Packet classification is a computationally intensive, highly parallelizable task in many advanced network systems, such as high-speed routers and firewalls, that enable different functionalities by discriminating incoming traffic. Recently, graphics processing units (GPUs) have been exploited as efficient accelerators for the parallel implementation of software classifiers. The aggregated bit vector is a highly parallelizable packet-classification algorithm. In this work, we first present a parallel kernel for running this algorithm on GPUs. Next, we adapt an asymptotic analysis method that predicts the empirical behavior of the proposed kernel. Experimental results not only confirm the efficiency of the proposed parallel kernel but also demonstrate the accuracy of the analysis method in predicting important trends in the experimental results.
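
For readers unfamiliar with the algorithm, the sketch below shows the aggregated-bit-vector idea on the CPU in plain Python/NumPy: each field lookup yields a rule bit vector plus a one-bit-per-word aggregate, and intersecting the aggregates first lets the search skip words that cannot contain a match. The paper's kernel parallelizes this on the GPU; the helper names here are illustrative.

    import numpy as np

    def make_bitvec(matching_rules, n_rules):
        # Pack a set of matching rule indices into a uint64 bit vector.
        words = np.zeros((n_rules + 63) // 64, dtype=np.uint64)
        for r in matching_rules:
            words[r // 64] |= np.uint64(1) << np.uint64(r % 64)
        return words

    def abv_match(per_field_vecs):
        # AND the aggregates (one bit per 64-rule word) first, then AND
        # only the surviving words; return the highest-priority match.
        aggs = [vec != 0 for vec in per_field_vecs]
        live = np.logical_and.reduce(aggs)          # words worth visiting
        for w in np.flatnonzero(live):
            word = np.bitwise_and.reduce([vec[w] for vec in per_field_vecs])
            if word:
                low = int(word & (~word + np.uint64(1))).bit_length() - 1
                return 64 * int(w) + low            # lowest rule index wins
        return None                                 # no rule matches

    # e.g. two fields whose lookups matched rules {0, 3, 70} and {3, 70, 90}:
    vecs = [make_bitvec({0, 3, 70}, 128), make_bitvec({3, 70, 90}, 128)]
    assert abv_match(vecs) == 3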


2021 ◽  
Vol 13 (2) ◽  
pp. 7 ◽
Author(s):  
Maria Pantoja

Currently, practical network packet processing used for Intrusion Detection Systems/Intrusion Prevention Systems (IDS/IPS) tends to belong to one of two disjoint categories: software-only implementations running on general-purpose CPUs, or highly specialized network hardware implementations using ASICs or FPGAs for the most common functions and general-purpose CPUs for the rest. These approaches try to maximize performance and minimize cost, but neither system, when implemented effectively, is affordable to any clients except those at the well-funded enterprise level. In this paper, we aim to improve the performance of affordable network packet processing in heterogeneous systems with consumer Graphics Processing Unit (GPU) hardware by optimizing latency-tolerant packet processing operations, notably IDS, to obtain the maximum throughput required by such systems in networks sophisticated enough to demand a dedicated IDS/IPS system but not sophisticated enough to justify the high cost of cutting-edge specialized hardware. In particular, this project investigated increasing the granularity of OSI layer-based packet batching over that of previous batching approaches. We demonstrate that highly granular GPU-enabled packet processing is generally impractical compared with existing methods by implementing our own solution, which we call Corvyd, a heterogeneous real-time packet-processing engine.
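
As a point of reference for the batching granularity under discussion, the schematic below groups packets so that each batch shares one L3/L4 protocol pair and can follow a single, divergence-free code path when handed to a GPU kernel. The field names and dispatch callback are illustrative stand-ins, not Corvyd's actual interfaces.

    from collections import defaultdict

    def batch_by_layers(packets, dispatch):
        # Group packets by their (layer-3, layer-4) protocol pair so a
        # whole batch takes one code path; finer-grained keys mean more,
        # smaller batches -- the granularity trade-off evaluated here.
        batches = defaultdict(list)
        for pkt in packets:
            batches[(pkt["l3_proto"], pkt["l4_proto"])].append(pkt)
        for key, batch in batches.items():
            dispatch(key, batch)   # e.g. one kernel launch per batch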


Author(s):  
Hua He ◽  
Jimmy Lin ◽  
Adam Lopez

Grammars for machine translation can be materialized on demand by finding source phrases in an indexed parallel corpus and extracting their translations. This approach is limited in practical applications by the computational expense of online lookup and extraction. For phrase-based models, recent work has shown that on-demand grammar extraction can be greatly accelerated by parallelization on general-purpose graphics processing units (GPUs), but these algorithms do not work for hierarchical models, which require matching patterns that contain gaps. We address this limitation by presenting a novel GPU algorithm for on-demand hierarchical grammar extraction that is at least an order of magnitude faster than a comparable CPU algorithm when processing large batches of sentences. In terms of end-to-end translation, with decoding on the CPU, we increase throughput by roughly two thirds on a standard MT evaluation dataset. The GPU necessary to achieve these improvements increases the cost of a server by about a third. We believe that GPU-based extraction of hierarchical grammars is an attractive proposition, particularly for MT applications that demand high throughput.
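
The indexed lookup that on-demand extraction starts from is typically a suffix array over the tokenized source side of the corpus. The short sketch below (Python 3.10+ for bisect's key argument) finds every occurrence of a contiguous source phrase; gapped hierarchical patterns, the case this paper accelerates on the GPU, need substantially more machinery and are not handled here.

    import bisect

    def build_suffix_array(tokens):
        # Sorted suffix start positions; O(n^2 log n) toy construction.
        return sorted(range(len(tokens)), key=lambda i: tokens[i:])

    def find_phrase(tokens, sa, phrase):
        # Binary-search the suffix-array range whose suffixes begin
        # with 'phrase', then return the matching corpus positions.
        key = lambda i: tokens[i:i + len(phrase)]
        lo = bisect.bisect_left(sa, phrase, key=key)
        hi = bisect.bisect_right(sa, phrase, key=key)
        return sorted(sa[lo:hi])

    corpus = "the cat sat on the mat near the cat".split()
    sa = build_suffix_array(corpus)
    assert find_phrase(corpus, sa, ["the", "cat"]) == [0, 7]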

