An Implementation of Configurable SIMD Core on FPGA

2013 ◽  
Vol 336-338 ◽  
pp. 1925-1929
Author(s):  
Guang Wang ◽  
Yin Sheng Gao

In order to meet the computing speed required by 4G wireless communications, and to provide the different data-processing widths required by different algorithms, an SIMD (Single Instruction Multiple Data) core has been designed. The ISA (Instruction Set Architecture) and the main components of the SIMD core are discussed, with a focus on how the core can be configured. Finally, the simulation result of multiplying two 8×8 matrices is presented to show the execution of instructions in the proposed SIMD core, and the result verifies the correctness of the design.
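
The abstract includes no code; below is a minimal sketch of the 8×8 matrix-multiplication workload used to verify the core, written with x86 SSE intrinsics purely for illustration (the paper's core uses its own configurable ISA, not SSE).

```c
/* Minimal sketch of the 8x8 matrix-multiply verification workload,
 * expressed with x86 SSE intrinsics for illustration only. */
#include <stdio.h>
#include <xmmintrin.h>   /* SSE */

#define N 8

/* C = A * B, all matrices row-major N x N floats. */
static void matmul8x8_simd(float A[N][N], float B[N][N], float C[N][N])
{
    for (int i = 0; i < N; ++i) {
        __m128 acc_lo = _mm_setzero_ps();   /* C[i][0..3] */
        __m128 acc_hi = _mm_setzero_ps();   /* C[i][4..7] */
        for (int k = 0; k < N; ++k) {
            __m128 a = _mm_set1_ps(A[i][k]);   /* broadcast A[i][k] to 4 lanes */
            acc_lo = _mm_add_ps(acc_lo, _mm_mul_ps(a, _mm_loadu_ps(&B[k][0])));
            acc_hi = _mm_add_ps(acc_hi, _mm_mul_ps(a, _mm_loadu_ps(&B[k][4])));
        }
        _mm_storeu_ps(&C[i][0], acc_lo);
        _mm_storeu_ps(&C[i][4], acc_hi);
    }
}

int main(void)
{
    float A[N][N], B[N][N], C[N][N];
    for (int i = 0; i < N; ++i)
        for (int j = 0; j < N; ++j) {
            A[i][j] = (float)(i + j);
            B[i][j] = (float)(i == j);   /* identity, so C should equal A */
        }
    matmul8x8_simd(A, B, C);
    printf("C[3][5] = %.1f (expect %.1f)\n", C[3][5], A[3][5]);
    return 0;
}
```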

VLSI Design ◽  
2007 ◽  
Vol 2007 ◽  
pp. 1-7 ◽  
Author(s):  
Zheng Shen ◽  
Hu He ◽  
Yanjun Zhang ◽  
Yihe Sun

This paper describes a novel video-specific instruction set architecture (VS-ISA) for ASIP design. With single instruction multiple data (SIMD) instructions, two destination modes, and video-specific instructions, the instruction set architecture is introduced to enhance performance for video applications. Furthermore, we quantify the improvement on H.263 encoding. In this paper, we evaluate and compare the performance of VS-ISA, other DSPs (digital signal processors), and conventional SIMD media extensions in the context of video coding. Our evaluation results show that VS-ISA improves the processor's performance by approximately 5x on H.263 encoding and outperforms the other architectures by 1.6x to 8.57x in computing the IDCT.
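
As an illustration of the kind of kernel such video-oriented SIMD instructions target, the sketch below computes a 16×16 sum of absolute differences (the dominant operation in H.263 motion estimation) using x86 SSE2; VS-ISA's own instructions are not documented in this abstract, so the choice of intrinsics here is an assumption.

```c
/* 16x16 sum-of-absolute-differences (SAD) kernel, the hot loop of block-based
 * motion estimation, written with the SSE2 PSADBW intrinsic for illustration. */
#include <stdint.h>
#include <stdio.h>
#include <emmintrin.h>   /* SSE2 */

static unsigned sad_16x16(const uint8_t *cur, const uint8_t *ref, int stride)
{
    __m128i acc = _mm_setzero_si128();
    for (int y = 0; y < 16; ++y) {
        __m128i c = _mm_loadu_si128((const __m128i *)(cur + y * stride));
        __m128i r = _mm_loadu_si128((const __m128i *)(ref + y * stride));
        acc = _mm_add_epi64(acc, _mm_sad_epu8(c, r));  /* two 64-bit partial sums */
    }
    /* Add the low and high 64-bit partial sums (both fit easily in 32 bits). */
    return (unsigned)_mm_cvtsi128_si32(acc) +
           (unsigned)_mm_cvtsi128_si32(_mm_srli_si128(acc, 8));
}

int main(void)
{
    uint8_t cur[16 * 16], ref[16 * 16];
    for (int i = 0; i < 256; ++i) { cur[i] = 100; ref[i] = 103; }
    printf("SAD = %u (expect 768)\n", sad_16x16(cur, ref, 16));
    return 0;
}
```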


2018 ◽  
Vol 232 ◽  
pp. 01046
Author(s):  
Wan Qiao ◽  
Dake Liu

In this paper, we propose a flexible, scalable BP Polar decoding application-specific instruction set processor (PASIP) that supports multiple code lengths (64 to 4096) and arbitrary code rates. High throughput and sufficient programmability are achieved by the single-instruction-multiple-data (SIMD) based architecture and specially designed Polar decoding acceleration instructions. Synthesis results using 65 nm CMOS technology show that the total area of PASIP is 2.71 mm². PASIP provides a maximum throughput of 1563 Mbps (for N = 1024) at a clock frequency of 400 MHz. The comparison with state-of-the-art Polar decoders reveals PASIP's high area efficiency.
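
As a hedged sketch of what such acceleration instructions operate on, the code below shows the elementary min-sum message updates commonly used in BP Polar decoding; the scalar per-lane loop stands in for what a SIMD datapath would compute in one instruction, and all names are illustrative rather than taken from the paper.

```c
/* Min-sum message updates commonly used in belief-propagation Polar decoding.
 * Names and the exact update form are illustrative assumptions. */
#include <math.h>
#include <stdio.h>

/* f(a,b) = sign(a)*sign(b)*min(|a|,|b|)  (min-sum check-node update) */
static float f_minsum(float a, float b)
{
    float s = ((a < 0.0f) != (b < 0.0f)) ? -1.0f : 1.0f;
    return s * fminf(fabsf(a), fabsf(b));
}

/* Combined update of the form g(a,b,c) = b + f(a,c). */
static float g_minsum(float a, float b, float c)
{
    return b + f_minsum(a, c);
}

int main(void)
{
    /* One stage of updates over 4 LLR lanes, done lane-by-lane here; a SIMD
     * datapath would compute all lanes with one instruction. */
    float a[4] = {  1.2f, -0.4f,  2.5f, -3.1f };
    float b[4] = {  0.3f,  1.1f, -0.8f,  2.2f };
    float c[4] = { -0.6f,  0.9f,  1.4f, -0.5f };
    for (int lane = 0; lane < 4; ++lane)
        printf("lane %d: f = %+.2f  g = %+.2f\n",
               lane, f_minsum(a[lane], b[lane]), g_minsum(a[lane], b[lane], c[lane]));
    return 0;
}
```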


2012 ◽  
Vol 2012 ◽  
pp. 1-10 ◽  
Author(s):  
Dau-Chyrh Chang ◽  
Lihong Zhang ◽  
Xiaoling Yang ◽  
Shao-Hsiang Yen ◽  
Wenhua Yu

We introduce a hardware acceleration technique for the parallel finite-difference time-domain (FDTD) method using the SSE (Streaming SIMD (single instruction multiple data) Extensions) instruction set. Applying the SSE instruction set to the parallel FDTD method achieves a significant improvement in simulation performance. Benchmarks of the SSE acceleration on both a multi-CPU workstation and a computer cluster demonstrate the advantages of VALU (vector arithmetic logic unit) acceleration over GPU acceleration. Several engineering applications are employed to demonstrate the performance of the parallel FDTD method enhanced by the SSE instruction set.
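
The sketch below illustrates the style of SSE (VALU) vectorization described, applied to a simple one-dimensional FDTD E-field update; the coefficient and array names are placeholders, not the authors' code.

```c
/* One-dimensional FDTD Ez update vectorized with SSE intrinsics, four cells
 * per operation; illustrative only. */
#include <stdio.h>
#include <xmmintrin.h>   /* SSE */

#define NX 1024

static float Ez[NX], Hy[NX];

/* Ez[i] += cb * (Hy[i] - Hy[i-1]) */
static void update_ez_sse(float cb)
{
    __m128 vcb = _mm_set1_ps(cb);
    for (int i = 1; i + 4 <= NX; i += 4) {
        __m128 hy   = _mm_loadu_ps(&Hy[i]);
        __m128 hym1 = _mm_loadu_ps(&Hy[i - 1]);
        __m128 ez   = _mm_loadu_ps(&Ez[i]);
        ez = _mm_add_ps(ez, _mm_mul_ps(vcb, _mm_sub_ps(hy, hym1)));
        _mm_storeu_ps(&Ez[i], ez);
    }
    /* Any remaining cells at the end of the row would be handled scalarly. */
}

int main(void)
{
    for (int i = 0; i < NX; ++i) { Ez[i] = 0.0f; Hy[i] = (float)i; }
    update_ez_sse(0.5f);                 /* Hy[i]-Hy[i-1] == 1, so Ez[i] == 0.5 */
    printf("Ez[10] = %.2f (expect 0.50)\n", Ez[10]);
    return 0;
}
```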


2017 ◽  
Author(s):  
fenglai liu ◽  
Jing Kong

In this work we present an efficient semi-numerical integral implementation specially designed for the Intel Xeon Phi processor to calculate the Hartree-Fock exchange matrix and energy. Compared with an implementation for the conventional CPU platform, a productive implementation on the Phi requires a focus on efficient utilization of the SIMD (single instruction, multiple data) processing units and maximal cache usage. To evaluate the efficiency of the implementation, we performed benchmark calculations on the buckyball molecules C60, C100, C180, and C240. For calculations with the 6-311G(2df) and cc-pVTZ basis sets, the benchmark shows a 7-12x speedup on the Knights Landing Xeon Phi 7250 processor in comparison with a traditional four-center electron repulsion integral calculation performed on a six-core Xeon E5-1650 CPU.
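
The abstract gives no code; the sketch below shows the kind of SIMD-friendly inner contraction over grid points that a semi-numerical exchange build reduces to, expressed with an OpenMP SIMD pragma of the sort that maps onto the Phi's wide vector units. All names (nGrid, phi_mu, interm_nu) are illustrative assumptions, not the authors' implementation.

```c
/* Grid-point contraction of the kind a semi-numerical exchange-matrix build
 * spends its time in; the pragma asks the compiler to vectorize the reduction. */
#include <stdio.h>

/* K_mu_nu += sum_g w[g] * phi_mu[g] * interm_nu[g], vectorized over g. */
static double contract_over_grid(int nGrid, const double *w,
                                 const double *phi_mu, const double *interm_nu)
{
    double k = 0.0;
    /* On Knights Landing this maps onto the 512-bit vector unit; elsewhere
     * the pragma is simply a hint and the code remains correct. */
    #pragma omp simd reduction(+:k)
    for (int g = 0; g < nGrid; ++g)
        k += w[g] * phi_mu[g] * interm_nu[g];
    return k;
}

int main(void)
{
    enum { NG = 1000 };
    double w[NG], phi[NG], interm[NG];
    for (int g = 0; g < NG; ++g) { w[g] = 0.001; phi[g] = 2.0; interm[g] = 3.0; }
    printf("K element = %.3f (expect 6.000)\n",
           contract_over_grid(NG, w, phi, interm));
    return 0;
}
```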


Author(s):  
Dae-Hwan Kim

Thumb-2 is the most recent instruction set architecture for ARM processors, which are among the most widely used embedded processors. In this paper, two extensions are proposed to improve the performance of the Thumb-2 instruction set architecture: addressing mode extensions and sign/zero extensions combined with data processing instructions. To speed up access to an element of aggregated data, the proposed approach first introduces three new addressing modes for load and store instructions: register-plus-immediate offset addressing, negative register offset addressing, and post-increment register offset addressing. Register-plus-immediate offset addressing permits two offsets, and negative register offset addressing allows the offset to be the negated value of a register. Post-increment register offset addressing automatically modifies the offset address after the memory operation. The second extension is sign/zero extension combined with a data processing instruction, which allows the result of a data processing operation to be sign/zero extended to accelerate type conversions. Several of the least frequently used instructions are removed to provide encoding space for the new extensions. Experiments show that the proposed approach improves performance by an average of 8.6% compared to the baseline Thumb-2 instruction set architecture.
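
Below is a C-level illustration of the access patterns the three proposed addressing modes target; the struct, loops, and mapping comments describe the intent of each extension and are assumptions for illustration, not actual Thumb-2 encodings.

```c
/* Source-level patterns that the proposed load/store addressing modes aim
 * to collapse into fewer instructions. Purely illustrative. */
#include <stdio.h>

struct pkt { int hdr; int len; int payload[4]; };

static long sum_fields(struct pkt *p, int n)
{
    long total = 0;
    for (int i = 0; i < n; ++i) {
        /* p[i].len is base + i*sizeof(struct pkt) + field offset:
         * register-plus-immediate offset addressing folds the register-scaled
         * index and the constant field offset into a single load. */
        total += p[i].len;
    }
    /* Walking backwards needs an address that decreases by a register amount;
     * negative register offset addressing covers this case. */
    for (int i = n - 1; i >= 0; --i)
        total += p[i].hdr;
    /* Sequential streaming with a pointer that advances after each access is
     * what post-increment register offset addressing folds into the load. */
    for (const int *q = p[0].payload; q < p[0].payload + 4; ++q)
        total += *q;
    return total;
}

int main(void)
{
    struct pkt a[3] = { {1, 10, {0}}, {2, 20, {0}}, {3, 30, {0}} };
    printf("sum = %ld (expect 66)\n", sum_fields(a, 3));
    return 0;
}
```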


2012 ◽  
Vol 50 (2) ◽  
pp. 122-130 ◽  
Author(s):  
Zukang Shen ◽  
Aris Papasakellariou ◽  
Juan Montojo ◽  
Dirk Gerstenberger ◽  
Fangli Xu
