A unified VLSI algorithm for a high performance systolic array implementation of type IV DCT/DST

Author(s):  
Doru Florin Chiper ◽  
M. O. Ahmad ◽  
M. N. S. Swamy
2011 ◽  
Vol 1 (2) ◽  

A new VLSI algorithm and its associated systolic array architecture for a prime-length type IV discrete cosine transform (DCT) are presented. Together they form the basis of an efficient design approach for deriving a linear systolic array architecture for the type IV DCT. The proposed algorithm uses a regular computational structure, called a pseudoband correlation structure, that is well suited to VLSI implementation. The algorithm is then mapped onto a linear systolic array with a small number of I/O channels and low I/O bandwidth. Because the type IV DST has a similar kernel, the proposed architecture can be unified with the one obtained for the type IV DST. The result is a highly efficient VLSI chip whose architectural topology, computing parallelism, processing speed, hardware complexity and I/O costs are comparable to those of designs based on circular correlation and cyclic convolution computational structures.
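The remark that the type IV DCT and DST share a similar kernel can be checked numerically. The Python sketch below is only a direct O(N^2) reference, not the paper's pseudoband correlation algorithm or its systolic mapping. It assumes the unnormalized definitions X[k] = sum_n x[n] cos(pi/N (n+1/2)(k+1/2)) and Y[k] = sum_n x[n] sin(pi/N (n+1/2)(k+1/2)) (the paper may use a different scaling), and uses the standard identity that a DST-IV equals a DCT-IV of the reversed input with odd-indexed outputs negated. The function names are hypothetical.

```python
import math


def dct4(x):
    """Direct O(N^2) type IV DCT (unnormalized):
    X[k] = sum_n x[n] * cos(pi/N * (n + 1/2) * (k + 1/2))."""
    n_len = len(x)
    return [
        sum(x[n] * math.cos(math.pi / n_len * (n + 0.5) * (k + 0.5))
            for n in range(n_len))
        for k in range(n_len)
    ]


def dst4(x):
    """Direct O(N^2) type IV DST (unnormalized):
    Y[k] = sum_n x[n] * sin(pi/N * (n + 1/2) * (k + 1/2))."""
    n_len = len(x)
    return [
        sum(x[n] * math.sin(math.pi / n_len * (n + 0.5) * (k + 0.5))
            for n in range(n_len))
        for k in range(n_len)
    ]


def dst4_via_dct4(x):
    """Compute a DST-IV with a DCT-IV kernel: reverse the input,
    apply the DCT-IV, then negate the odd-indexed outputs."""
    y = dct4(x[::-1])
    return [(-1) ** k * v for k, v in enumerate(y)]


if __name__ == "__main__":
    # A prime length, as in the paper's setting.
    x = [0.3, -1.2, 2.5, 0.0, 4.1, -0.7, 1.9]
    print(dst4(x))
    print(dst4_via_dct4(x))
```

The identity is what makes a single DCT-IV datapath reusable for the DST-IV with only input reordering and output sign changes; whether the paper's unified array uses exactly this mechanism is not stated in the abstract.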


1992 ◽  
Vol 02 (03) ◽  
pp. 247-263
Author(s):  
CHEIN-WEI JEN ◽  
CHI-MIN LIU

A two-level pipelined systolic array can exploit parallelism down to lower levels and provides much higher throughput and computational speed than a conventional systolic array. This paper presents a design procedure that starts from an algorithm representation called the dependence graph (DG). Arrays with different performance characteristics can be obtained by applying various linear transformation matrices to the DG. Image resampling is a process used in image construction and display, with important applications in image processing and digital TV. In this paper, two design considerations are applied to build a high-performance VLSI image resampler. First, a two-level pipelined systolic array is designed to maximize parallelism while keeping VLSI implementation highly feasible. Second, a modified two-pass resampling scheme is devised to reduce the required storage and increase the concurrency between the two resampling passes. The resulting image resampler achieves a throughput of one pixel per clock period, where the clock period is shorter than the latency of an adder. Only a few line buffers of storage are required.
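The idea of obtaining different arrays by applying linear transformations to a DG can be illustrated with a small space-time mapping. The Python sketch below is not the paper's procedure. It assumes a textbook FIR-filter DG with nodes (i, j) (output index i, tap index j), dependence vectors (0, 1) for partial sums, (1, 0) for weights and (1, 1) for inputs, a hypothetical schedule vector (1, 1) and a hypothetical allocation vector (0, 1). It only checks two basic validity conditions: no two nodes share a processor at the same time step, and every dependence advances time by at least one step. All function names are hypothetical.

```python
def map_dependence_graph(nodes, schedule, allocation):
    """Apply linear maps to each DG node:
    time step = schedule . node, processor index = allocation . node."""
    mapping = {}
    for node in nodes:
        t = sum(s * n for s, n in zip(schedule, node))
        pe = sum(a * n for a, n in zip(allocation, node))
        mapping[node] = (t, pe)
    return mapping


def has_conflict(mapping):
    """True if two DG nodes land on the same processor at the same time."""
    seen = set()
    for slot in mapping.values():
        if slot in seen:
            return True
        seen.add(slot)
    return False


def respects_dependences(mapping, dependences):
    """True if every dependence edge inside the DG goes at least
    one time step forward (causality of the schedule)."""
    for node, (t, _) in mapping.items():
        for d in dependences:
            succ = tuple(n + di for n, di in zip(node, d))
            if succ in mapping and mapping[succ][0] - t < 1:
                return False
    return True


if __name__ == "__main__":
    n_out, n_taps = 6, 3  # hypothetical problem size
    nodes = [(i, j) for i in range(n_out) for j in range(n_taps)]
    deps = [(0, 1), (1, 0), (1, 1)]
    m = map_dependence_graph(nodes, schedule=(1, 1), allocation=(0, 1))
    print("conflict-free:", not has_conflict(m))
    print("causal:", respects_dependences(m, deps))
```

With allocation (0, 1) every tap j gets its own processor, so the linear array has one PE per tap; choosing a different allocation or schedule vector yields a different array, which is the trade-off space the design procedure explores.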

