subword parallelism
Recently Published Documents


TOTAL DOCUMENTS

12
(FIVE YEARS 0)

H-INDEX

3
(FIVE YEARS 0)

2015 ◽  
Vol 2015 ◽  
pp. 1-10 ◽  
Author(s):  
Shafqat Khan ◽  
Emmanuel Casseau ◽  
Daniel Menard

In this paper an efficient two-dimensional discrete cosine transform (DCT) operator is proposed for multimedia applications. It is based on the DCT operator proposed in Kovac and Ranganathan, 1995. Speed-up is obtained by using multimedia oriented subword parallelism (SWP). Rather than operating on a single pixel, the SWP-based DCT operator performs parallel computations on multiple pixels packed in word size input registers so that the performance of the operator is increased. Special emphasis is made to increase the coordination between pixel sizes and subword sizes to maximize resource utilization rate. Rather than using classical subword sizes (8, 16, and 32 bits), multimedia oriented subword sizes (8, 10, 12, and 16 bits) are used in the proposed DCT operator. The proposed SWP DCT operator unit can be used as a coprocessor for multimedia applications.


Author(s):  
Sanket Dessai ◽  
Krishna Bhushan Vutukuru

Graphical Processing Units (GPUs) have become an integral part of today’s mainstream computing systems. They are also being used as reprogrammable General Purpose GPUs (GP-GPUs) to perform complex scientific computations. Reconfigurability is an attractive approach to embedded systems allowing hardware level modification. Hence, there is a high demand for GPU designs based on reconfigurable hardware. Stream processor consists of clusters of functional units which provide a bandwidth hierarchy, supporting hundreds of arithmetic units. The arithmetic cluster units are designed to exploit instruction level parallelism and subword parallelism within a cluster and data parallelism across the clusters.For decreasing the area and power, a single controller is used to control data flow between clusters and between host processor and GPU. The designed of stream processor unit has been carried out in Verilog on Altera Quartus II and simulated using ModelSim tools. The functionality of the modelled blocks is verified using test inputs in the simulator.The simulated execution time of 8-bit pipelined multiplier is 60 ps and 100 ns for 8-bit pipelined adder while operating at 90 MHz.


1999 ◽  
Vol 45 (10) ◽  
pp. 781-790 ◽  
Author(s):  
Mark Christiaens ◽  
Bjorn De Sutter ◽  
Koen De Bosschere ◽  
Jan Van Campenhout ◽  
Ignace Lemahieu
Keyword(s):  

1998 ◽  
Vol 24 (9-10) ◽  
pp. 1537-1556 ◽  
Author(s):  
Bjorn De Sutter ◽  
Mark Christiaens ◽  
Koen De Bosschere ◽  
Jan Van Campenhout

Sign in / Sign up

Export Citation Format

Share Document