Hardware Acceleration for RLNC: A Case Study Based on the Xtensa Processor with the Tensilica Instruction-Set Extension

Electronics ◽  
2018 ◽  
Vol 7 (9) ◽  
pp. 180 ◽  
Author(s):  
Javier Acevedo ◽  
Robert Scheffel ◽  
Simon Wunderlich ◽  
Mattis Hasler ◽  
Sreekrishna Pandi ◽  
...  

Random linear network coding (RLNC) can greatly aid data transmission in lossy wireless networks. However, RLNC requires computationally complex matrix multiplications and inversions in finite fields (Galois fields). These computations are highly demanding for energy-constrained mobile devices. The presented case study evaluates hardware acceleration strategies for RLNC in the context of the Tensilica Xtensa LX5 processor with the Tensilica instruction set extension (TIE). More specifically, we develop TIEs for multiply-accumulate (MAC) operations for accelerating matrix multiplications in Galois fields, single instruction multiple data (SIMD) instructions operating on consecutive memory locations, as well as the flexible-length instruction extension (FLIX). We evaluate the number of clock cycles required for RLNC encoding and decoding with and without the MAC, SIMD, and FLIX acceleration strategies. We also evaluate the RLNC encoding and decoding throughput and energy consumption for a range of RLNC generation and code word sizes. We find that for GF(2^8) and GF(2^16) RLNC encoding, the SIMD and FLIX acceleration strategies achieve speedups of approximately four-hundred-fold compared to a benchmark C code implementation without TIE. We also find that the unicore Xtensa LX5 with SIMD has seven to thirty times higher RLNC encoding and decoding throughput than the state-of-the-art ODROID XU3 system-on-a-chip (SoC) operating with a single core; the Xtensa LX5 with FLIX, in turn, increases the throughput by roughly 25% compared to utilizing only SIMD. Furthermore, the Xtensa LX5 with FLIX consumes roughly four orders of magnitude less energy than the ODROID XU3 SoC.
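To make the multiply-accumulate step in the abstract concrete, the following is a minimal software sketch of RLNC encoding over GF(2^8). It is not the authors' TIE implementation. The function names `gf256_mul` and `rlnc_encode` are hypothetical, and the reduction polynomial 0x11B (the AES polynomial) is an assumption, since the abstract does not state which polynomial the paper uses.

```python
def gf256_mul(a, b, poly=0x11B):
    """Multiply two GF(2^8) elements by shift-and-add.

    poly=0x11B is an assumed reduction polynomial (the AES one);
    the abstract does not say which polynomial the paper uses.
    """
    result = 0
    while b:
        if b & 1:
            result ^= a      # addition in GF(2^8) is XOR
        a <<= 1
        if a & 0x100:
            a ^= poly        # reduce back into 8 bits
        b >>= 1
    return result


def rlnc_encode(coeffs, packets):
    """Return one coded packet: the sum over i of coeffs[i] * packets[i].

    coeffs  -- one GF(2^8) coefficient per source packet
    packets -- equal-length lists of byte values (0..255)
    """
    coded = [0] * len(packets[0])
    for c, packet in zip(coeffs, packets):
        for j, symbol in enumerate(packet):
            # Multiply-accumulate: the operation the MAC extension targets.
            coded[j] ^= gf256_mul(c, symbol)
    return coded
```

The inner loop applies the same multiply-accumulate to consecutive symbols of a packet, which is the kind of access pattern the SIMD strategy in the abstract is described as exploiting.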

2012 ◽  
Vol 2012 ◽  
pp. 1-10 ◽  
Author(s):  
Dau-Chyrh Chang ◽  
Lihong Zhang ◽  
Xiaoling Yang ◽  
Shao-Hsiang Yen ◽  
Wenhua Yu

We introduce a hardware acceleration technique for the parallel finite difference time domain (FDTD) method using the SSE (streaming single instruction multiple data (SIMD) extensions) instruction set. Applying the SSE instruction set to the parallel FDTD method significantly improves simulation performance. Benchmarks of the SSE acceleration on both a multi-CPU workstation and a computer cluster demonstrate the advantages of vector arithmetic logic unit (VALU) acceleration over GPU acceleration. Several engineering applications are used to demonstrate the performance of the parallel FDTD method enhanced by the SSE instruction set.


2021 ◽  
Vol 13 (7) ◽  
pp. 1367
Author(s):  
Yuanzhi Cai ◽  
Hong Huang ◽  
Kaiyang Wang ◽  
Cheng Zhang ◽  
Lei Fan ◽  
...  

Over the last decade, 3D reconstruction techniques have been developed to present the latest as-is information for various objects and to build city information models. Meanwhile, deep learning-based approaches are employed to add semantic information to the models. Studies have shown that the accuracy of the model can be improved by combining multiple data channels (e.g., XYZ, Intensity, D, and RGB). Nevertheless, redundant data channels in large-scale datasets may cause high computation cost and long processing time. Few researchers have addressed the question of which combination of channels is optimal in terms of overall accuracy (OA) and mean intersection over union (mIoU). Therefore, a framework is proposed to explore an efficient data fusion approach for semantic segmentation by selecting an optimal combination of data channels. In the framework, a total of 13 channel combinations are investigated to pre-process data and the encoder-to-decoder structure is utilized for network permutations. A case study is carried out to investigate the efficiency of the proposed approach by adopting a city-level benchmark dataset and applying nine networks. It is found that the combination of IRGB channels provides the best OA performance, while the combination of IRGBD channels provides the best mIoU performance.
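The two metrics named in the abstract can be illustrated with a short sketch that uses their standard definitions, computed from a confusion matrix. This is not taken from the paper. The function names and the convention that rows are ground-truth classes and columns are predicted classes are assumptions made for illustration.

```python
def overall_accuracy(cm):
    """OA: correctly labelled samples divided by all samples.

    cm is a square confusion matrix (list of lists); rows are assumed
    to be ground-truth classes and columns predicted classes.
    """
    total = sum(sum(row) for row in cm)
    correct = sum(cm[i][i] for i in range(len(cm)))
    return correct / total


def mean_iou(cm):
    """mIoU: per-class TP / (TP + FP + FN), averaged over classes.

    Classes that appear in neither the ground truth nor the
    predictions are skipped (an assumption; conventions differ).
    """
    n = len(cm)
    ious = []
    for i in range(n):
        tp = cm[i][i]
        fn = sum(cm[i]) - tp                        # true class i, predicted other
        fp = sum(cm[r][i] for r in range(n)) - tp   # predicted i, true class other
        denom = tp + fp + fn
        if denom:
            ious.append(tp / denom)
    return sum(ious) / len(ious) if ious else 0.0
```

Because OA weights every sample equally while mIoU weights every class equally, the two metrics can rank channel combinations differently, which is consistent with the abstract reporting different best combinations for each.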


2021 ◽  
Vol 30 (1) ◽  
pp. 22-34
Author(s):  
Chawarat Rotejanaprasert ◽  
Duncan Lee ◽  
Nattwut Ekapirat ◽  
Prayuth Sudathip ◽  
Richard J Maude

In much of the Greater Mekong Sub-region, malaria is now confined to patches and small foci of transmission. Malaria transmission is seasonal, with spatiotemporal patterns associated with variation in environmental and climatic factors. However, the possible effect of different lag periods between meteorological variables and clinical malaria has not been well studied in the region. Thus, in this study we developed distributed lag models that account for spatiotemporal excess zero cases in a malaria elimination setting. A multivariate framework was also extended to incorporate multiple data streams and investigate the spatiotemporal patterns of multiple parasite species via their lagged association with climatic variables. A simulation study was conducted to examine the robustness of the methodology, and a case study is provided using weekly data on clinical malaria cases at the sub-district level in Thailand.


2012 ◽  
Vol 2 (1) ◽  
pp. 1-18 ◽  
Author(s):  
P. Grabher ◽  
J. Großschädl ◽  
S. Hoerder ◽  
K. Järvinen ◽  
D. Page ◽  
...  

2016 ◽  
Vol 18 (6) ◽  
pp. 1149-1162 ◽  
Author(s):  
Jin Wang ◽  
Jianping Wang ◽  
Kejie Lu ◽  
Yi Qian ◽  
Naijie Gu
