A high-performance low-power asynchronous matrix-vector multiplier for discrete cosine transform

Rapid technological advances in recent years have led to a massive increase in the exchange of data (images, video, audio, etc.) between portable devices. This has created a need for algorithms that consume little power without compromising performance. This paper addresses that need by proposing an image compression technique based on the Repetitive Iteration CORDIC (RICO) architecture. The proposed method is power efficient because it uses RICO to generate the Discrete Cosine Transform (DCT) coefficients, and it performs equally well when compared with the Joint Photographic Experts Group (JPEG) standard. Results obtained in MATLAB (Matrix Laboratory) show that the proposed technique performs as well as the other techniques compared while consuming less power.
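To make the idea of CORDIC-based DCT coefficient generation concrete, the following is a minimal sketch of a generic rotation-mode CORDIC that evaluates the cosine terms cos((2n+1)kπ/(2N)) of an N-point DCT. It is not the RICO architecture from the abstract: the function names `cordic_cos_sin` and `dct_coefficient`, the iteration count, and the use of floating point are all assumptions made for illustration. A hardware design would typically use fixed-point shifts and adds in place of the multiplications by powers of two.

```python
import math

def cordic_cos_sin(theta, iterations=16):
    """Rotation-mode CORDIC (illustrative). Valid for theta in [-pi/2, pi/2]."""
    # Elementary rotation angles atan(2^-i); hardware would store these in a table.
    angles = [math.atan(2.0 ** -i) for i in range(iterations)]
    # Each micro-rotation stretches the vector; pre-scale x to cancel the total gain.
    gain = 1.0
    for i in range(iterations):
        gain /= math.sqrt(1.0 + 2.0 ** (-2 * i))
    x, y, z = gain, 0.0, theta
    for i in range(iterations):
        d = 1.0 if z >= 0 else -1.0  # rotate toward a zero residual angle
        x, y = x - d * y * 2.0 ** -i, y + d * x * 2.0 ** -i
        z -= d * angles[i]
    return x, y  # approximately (cos(theta), sin(theta))

def dct_coefficient(k, n, N=8):
    """Cosine term of the N-point DCT basis, cos((2n+1)k*pi/(2N)), via CORDIC."""
    theta = (2 * n + 1) * k * math.pi / (2 * N)
    # Range reduction: bring the angle into [-pi, pi], then fold into [-pi/2, pi/2].
    theta = math.remainder(theta, 2 * math.pi)
    sign = 1.0
    if theta > math.pi / 2:
        theta -= math.pi
        sign = -1.0
    elif theta < -math.pi / 2:
        theta += math.pi
        sign = -1.0
    c, _ = cordic_cos_sin(theta)
    return sign * c
```

With 16 iterations the residual angle is bounded by roughly 2^-15 radians, so `dct_coefficient(k, n)` should agree with `math.cos` to about four decimal places; more iterations trade extra cycles for accuracy.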

An asynchronous matrix-vector multiplier for discrete cosine transform

ISLPED'00: Proceedings of the 2000 International Symposium on Low Power Electronics and Design (Cat. No.00TH8514) ◽ 10.1109/lpe.2000.155295 ◽ 2000

Author(s): Kyeounsoo Kim ◽ P.A. Beerel ◽ Youpyo Hong

Keyword(s): Discrete Cosine Transform ◽ Cosine Transform ◽ Matrix Vector

Accelerating IDCT Algorithm on Xeon Phi Coprocessor

Advanced Materials Research ◽ 10.4028/www.scientific.net/amr.756-759.3114 ◽ 2013 ◽ Vol 756-759 ◽ pp. 3114-3120

Author(s): Jin Qi ◽ Can Qun Yang ◽ Cheng Chen ◽ Qiang Wu ◽ Tao Tang

Keyword(s): Discrete Cosine Transform ◽ High Performance ◽ Xeon Phi ◽ Cosine Transform ◽ Inverse Discrete Cosine Transform ◽ The Many ◽ Many Integrated Core ◽ Beta Version ◽ Very High ◽ Intel Xeon

The Inverse Discrete Cosine Transform (IDCT) is an important operation in image and video decompression, and accelerating the IDCT algorithm has been studied frequently. Intel recently introduced Xeon Phi coprocessors based on the Many Integrated Core (MIC) architecture. Xeon Phi integrates 61 cores, each with a 512-bit SIMD extension, and thus provides very high performance. In this paper, we use Knights Corner (a beta version of Xeon Phi) to accelerate the IDCT algorithm. By employing 512-bit SIMD instructions and data prefetching optimization, our implementation achieves (1) an average 5.82x speedup over the non-SIMD version, (2) an average 27.3% performance benefit from the data prefetching optimization, and (3) an average 1.53x speedup on one Knights Corner coprocessor over the implementation on one eight-core Intel Xeon E5-2670 CPU.
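For readers unfamiliar with the operation being accelerated, here is a minimal reference sketch of an 8-point IDCT written as a matrix-vector product, extended to 8x8 blocks by a row pass followed by a column pass. It assumes the orthonormal DCT-II/DCT-III pair; the names `BASIS`, `dct_1d`, `idct_1d`, and `idct_2d` are hypothetical, and the code does not reproduce the paper's Xeon Phi implementation or its SIMD and prefetching optimizations.

```python
import math

N = 8  # block size commonly used in image and video codecs

def _basis(k, n):
    """Entry (k, n) of the orthonormal DCT-II matrix."""
    scale = math.sqrt(1.0 / N) if k == 0 else math.sqrt(2.0 / N)
    return scale * math.cos((2 * n + 1) * k * math.pi / (2 * N))

# Precompute the basis matrix once; every transform below reuses it.
BASIS = [[_basis(k, n) for n in range(N)] for k in range(N)]

def dct_1d(x):
    """Forward DCT: X = BASIS * x (matrix-vector product)."""
    return [sum(BASIS[k][n] * x[n] for n in range(N)) for k in range(N)]

def idct_1d(X):
    """Inverse DCT: x = BASIS^T * X, since BASIS is orthogonal."""
    return [sum(BASIS[k][n] * X[k] for k in range(N)) for n in range(N)]

def idct_2d(block):
    """Separable 2-D IDCT: 1-D IDCT on each row, then on each column."""
    rows = [idct_1d(r) for r in block]
    cols = [idct_1d([rows[i][j] for i in range(N)]) for j in range(N)]
    # cols[j][i] holds output pixel (i, j); transpose back to row-major order.
    return [[cols[j][i] for j in range(N)] for i in range(N)]
```

Each output sample is an independent sum of eight multiply-accumulate terms, which is the kind of regular arithmetic that wide SIMD units are designed to process in parallel.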
