Hardware Acceleration for RLNC: A Case Study Based on the Xtensa Processor with the Tensilica Instruction-Set Extension

Javier Acevedo; Robert Scheffel; Simon Wunderlich; Mattis Hasler; Sreekrishna Pandi; Juan Cabrera; Frank Fitzek; Gerhard Fettweis; Martin Reisslein

doi:10.3390/electronics7090180


Electronics, 2018, Vol 7 (9), pp. 180. Cited By ~ 2


Keyword(s): Hardware Acceleration, Code Word, Instruction Set, Linear Network, Galois Fields, Linear Network Coding, Multiple Data, Instruction Set Extension, Energy Constrained

Random linear network coding (RLNC) can greatly aid data transmission in lossy wireless networks. However, RLNC requires computationally complex matrix multiplications and inversions in finite fields (Galois fields). These computations are highly demanding for energy-constrained mobile devices. The presented case study evaluates hardware acceleration strategies for RLNC in the context of the Tensilica Xtensa LX5 processor with the Tensilica Instruction-Set Extension (TIE). More specifically, we develop TIEs for multiply-accumulate (MAC) operations that accelerate matrix multiplications in Galois fields, single instruction multiple data (SIMD) instructions that operate on consecutive memory locations, and the flexible-length instruction extension (FLIX). We evaluate the number of clock cycles required for RLNC encoding and decoding with and without the MAC, SIMD, and FLIX acceleration strategies. We also evaluate the RLNC encoding and decoding throughput and energy consumption for a range of RLNC generation and code word sizes. We find that for GF(2^8) and GF(2^16) RLNC encoding, the SIMD and FLIX acceleration strategies achieve speedups of approximately 400-fold compared to a benchmark C code implementation without TIE. We also find that the single-core Xtensa LX5 with SIMD achieves seven to thirty times higher RLNC encoding and decoding throughput than the state-of-the-art ODROID XU3 system-on-a-chip (SoC) operating with a single core; the Xtensa LX5 with FLIX, in turn, increases throughput by roughly 25% compared to using SIMD alone. Furthermore, the Xtensa LX5 with FLIX consumes roughly four orders of magnitude less energy than the ODROID XU3 SoC.


A High-Performance Parallel FDTD Method Enhanced by Using SSE Instruction Set

International Journal of Antennas and Propagation, 2012, Vol 2012, pp. 1-10. doi:10.1155/2012/851465. Cited By ~ 1

Author(s): Dau-Chyrh Chang, Lihong Zhang, Xiaoling Yang, Shao-Hsiang Yen, Wenhua Yu

Keyword(s): High Performance, FDTD Method, Hardware Acceleration, Single Instruction Multiple Data, Instruction Set, Computer Cluster, Simulation Performance, Acceleration Technique, Multiple Data, Difference Time

We introduce a hardware acceleration technique for the parallel finite-difference time-domain (FDTD) method using the SSE (Streaming SIMD Extensions, where SIMD stands for single instruction, multiple data) instruction set. Applying the SSE instruction set to the parallel FDTD method significantly improves simulation performance. Benchmarks of SSE acceleration on both a multi-CPU workstation and a computer cluster demonstrate the advantages of vector arithmetic logic unit (VALU) acceleration over GPU acceleration. Several engineering applications are used to demonstrate the performance of the SSE-enhanced parallel FDTD method.


Instruction set extension and hardware acceleration for SVM application toward a vector processor

2017 International SoC Design Conference (ISOCC), 2017. doi:10.1109/isocc.2017.8368818

Author(s): Yalong Pang, Jun Han, Jianmin Zeng, Yujie Huang, Xiaoyang Zeng

Keyword(s): Hardware Acceleration, Instruction Set, Vector Processor, Instruction Set Extension


The Failure Probabilities of Random Linear Network Coding at Sink Nodes

IEICE Transactions on Fundamentals of Electronics, Communications and Computer Sciences, 2016, Vol E99.A (6), pp. 1255-1259. doi:10.1587/transfun.e99.a.1255. Cited By ~ 1

Author(s): Dan Li, Xuan Guang, Fang-Wei Fu

Keyword(s): Network Coding, Linear Network, Linear Network Coding, Random Linear Network Coding, Failure Probabilities


A case study of VAX-11 instruction set usage for compiler execution

ACM SIGARCH Computer Architecture News, 1982, Vol 10 (2), pp. 177-184. doi:10.1145/964750.801841. Cited By ~ 2

Author(s): Cheryl A. Wiecek

Keyword(s): Instruction Set


Selecting Optimal Combination of Data Channels for Semantic Segmentation in City Information Modelling (CIM)

Remote Sensing, 2021, Vol 13 (7), pp. 1367. doi:10.3390/rs13071367

Author(s): Yuanzhi Cai, Hong Huang, Kaiyang Wang, Cheng Zhang, Lei Fan, ...

Keyword(s): Large Scale, Semantic Segmentation, Optimal Combination, Reconstruction Technique, Process Data, Redundant Data, Multiple Data, Information Models, Efficient Data

Over the last decade, 3D reconstruction techniques have been developed to provide up-to-date, as-is information for various objects and to build city information models (CIM). Meanwhile, deep-learning-based approaches are employed to add semantic information to these models. Studies have shown that model accuracy can be improved by combining multiple data channels (e.g., XYZ, Intensity, D, and RGB). Nevertheless, redundant data channels in large-scale datasets may incur high computational cost and long processing times. Few researchers have addressed which combination of channels is optimal in terms of overall accuracy (OA) and mean intersection over union (mIoU). Therefore, a framework is proposed to explore an efficient data fusion approach for semantic segmentation by selecting an optimal combination of data channels. Within the framework, a total of 13 channel combinations are investigated for pre-processing the data, and an encoder-decoder structure is used for network permutations. A case study is carried out to investigate the efficiency of the proposed approach by adopting a city-level benchmark dataset and applying nine networks. The combination of IRGB channels is found to provide the best OA performance, while IRGBD channels provide the best mIoU performance.


Spatiotemporal distributed lag modelling of multiple Plasmodium species in a malaria elimination setting

Statistical Methods in Medical Research, 2021, Vol 30 (1), pp. 22-34. doi:10.1177/0962280220938977

Author(s): Chawarat Rotejanaprasert, Duncan Lee, Nattwut Ekapirat, Prayuth Sudathip, Richard J Maude

Keyword(s): Data Streams, Malaria Elimination, Parasite Species, Climatic Factors, Plasmodium Species, Clinical Malaria, Spatiotemporal Patterns, Distributed Lag, Multiple Data

In much of the Greater Mekong Sub-region, malaria is now confined to patches and small foci of transmission. Malaria transmission is seasonal, and its spatiotemporal patterns are associated with variation in environmental and climatic factors. However, the possible effects of meteorological variables on clinical malaria at different lag periods have not been well studied in the region. In this study, we therefore developed a distributed lag model that accounts for excess zero cases in space and time in a malaria elimination setting. We also extended a multivariate framework to incorporate multiple data streams and to investigate the spatiotemporal patterns of multiple parasite species through their lagged associations with climatic variables. A simulation study was conducted to examine the robustness of the methodology, and a case study is presented using weekly data on clinical malaria cases at the sub-district level in Thailand.
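
A distributed lag model spreads the effect of a climatic covariate on case counts over several past periods. One generic way to write this with excess zeros is a zero-inflated Poisson model, shown below as an illustrative assumption rather than the authors' specification (their framework is multivariate across parasite species). Here $y_{it}$ is the case count in area $i$ and week $t$, $x_{i,t-l}$ a climatic variable lagged by $l$ weeks up to a maximum lag $L$, $\beta_l$ the lag-specific effect, $\pi$ the probability of a structural zero, and $\phi_i$ and $\gamma_t$ hypothetical spatial and temporal random effects.

```latex
y_{it} \sim
\begin{cases}
  0 & \text{with probability } \pi,\\
  \operatorname{Poisson}(\mu_{it}) & \text{with probability } 1 - \pi,
\end{cases}
\qquad
\log \mu_{it} = \alpha + \sum_{l=0}^{L} \beta_l \, x_{i,t-l} + \phi_i + \gamma_t,
\qquad
\Pr(y_{it} = 0) = \pi + (1 - \pi)\, e^{-\mu_{it}}
```

Summing $\beta_l$ over $l = 0, \dots, L$ gives the cumulative effect on $\log \mu_{it}$ of a unit change in $x$ sustained over the whole lag window.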


Instruction-set-extension exploration using decomposable heuristic search

19th International Conference on VLSI Design held jointly with 5th International Conference on Embedded Systems Design (VLSID'06), 2006. doi:10.1109/vlsid.2006.106

Author(s): S. Das, P.P. Chakrabarti, P. Dasgupta

Keyword(s): Heuristic Search, Instruction Set, Instruction Set Extension


Modeling the Reliability of Complex Systems with Multiple Data Sources: A Case Study on Making Statistical Tools Accessible to Engineers

Quality Engineering, 2012, Vol 24 (2), pp. 280-291. doi:10.1080/08982112.2012.641152. Cited By ~ 6

Author(s): Christine M. Anderson-Cook, Richard M. Klamann, Jerome Morzinski

Keyword(s): Complex Systems, Data Sources, Statistical Tools, Multiple Data Sources, Multiple Data


An exploration of mechanisms for dynamic cryptographic instruction set extension

Journal of Cryptographic Engineering, 2012, Vol 2 (1), pp. 1-18. doi:10.1007/s13389-011-0025-8. Cited By ~ 1

Author(s): P. Grabher, J. Großschädl, S. Hoerder, K. Järvinen, D. Page, ...

Keyword(s): Instruction Set, Instruction Set Extension


On the Optimal Linear Network Coding Design for Information Theoretically Secure Unicast Streaming

IEEE Transactions on Multimedia, 2016, Vol 18 (6), pp. 1149-1162. doi:10.1109/tmm.2016.2545403. Cited By ~ 6

Author(s): Jin Wang, Jianping Wang, Kejie Lu, Yi Qian, Naijie Gu

Keyword(s): Network Coding, Linear Network, Linear Network Coding, Optimal Linear
