VLSI design of inner-product computers using distributed arithmetic

2016 ◽

Vol 5 (2) ◽

pp. 124

Author(s):

P. Hemanthkumar ◽

Y. Sai Kiran ◽

V. Nava Teja

Keyword(s):

Performance Metrics ◽

Finite Impulse Response ◽

VLSI Design ◽

FIR Filter ◽

Inner Product ◽

Distributed Arithmetic ◽

Power Efficient ◽

Consumption Energy ◽

The Look ◽

Suitable Area

<p>We present the design optimization of one- and two-dimensional fully pipelined computing structures for area-delay-power-efficient implementation of finite impulse response (FIR) filters by systolic decomposition of distributed arithmetic (DA)-based inner-product computation. The scheme offers a flexible choice of the address length of the look-up tables (LUTs) for DA-based computation, so that a suitable area-time trade-off can be selected. Smaller address lengths for the DA-based computing units reduce the memory size but increase the adder complexity and the latency. For efficient DA-based realization of FIR filters of different orders, the flexible linear systolic design is implemented on a Xilinx Virtex-E XCV2000E FPGA using a hybrid combination of Handel-C and parameterizable VHDL cores. Key performance metrics such as the number of slices, maximum usable frequency, dynamic power consumption, energy density and energy throughput are estimated for different filter orders and address lengths. The results show that the performance metrics of the proposed implementation are broadly in line with theoretical expectations. An address length of M=4 gives the best area-delay-power-efficient realization of the FIR filter across the filter orders considered. Moreover, the proposed FPGA implementation involves significantly less area-delay complexity than existing DA-based implementations of FIR filters.</p>
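The trade-off between LUT address length and adder count described in this abstract can be illustrated with a small software model. The following is a minimal sketch, not the authors' systolic hardware design: the function names `build_luts` and `da_inner_product` are hypothetical, and the model assumes integer coefficients and two's-complement samples that fit in `bits` bits. The taps are split into groups of `m`, each group gets its own LUT of up to 2^m partial sums, and the LUT outputs are added for every bit plane.

```python
def build_luts(coeffs, m):
    """Split the coefficients into groups of m taps and precompute, for each
    group, the sum of coefficients selected by every possible address."""
    luts = []
    for start in range(0, len(coeffs), m):
        group = coeffs[start:start + m]
        lut = [sum(c for j, c in enumerate(group) if (addr >> j) & 1)
               for addr in range(1 << len(group))]
        luts.append(lut)
    return luts


def da_inner_product(coeffs, samples, m, bits):
    """Bit-serial DA inner product (illustrative model only).

    Assumes every sample fits in `bits`-bit two's complement."""
    luts = build_luts(coeffs, m)
    mask = (1 << bits) - 1
    words = [s & mask for s in samples]  # two's-complement bit patterns
    acc = 0
    for b in range(bits):
        partial = 0
        for g, lut in enumerate(luts):
            # Address = bit b of each sample belonging to this group.
            addr = 0
            for j, w in enumerate(words[g * m:(g + 1) * m]):
                addr |= ((w >> b) & 1) << j
            partial += lut[addr]  # one extra addition per additional LUT
        # The most significant bit carries negative weight in two's complement.
        weight = -(1 << b) if b == bits - 1 else (1 << b)
        acc += weight * partial
    return acc
```

In this model, an 8-tap filter with `m=8` uses one 256-word LUT and needs no additions across LUTs per bit plane. With `m=4` it uses two 16-word LUTs and one addition, and with `m=2` four 4-word LUTs and three additions. This mirrors the memory-versus-adder trade-off stated in the abstract.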


A VLSI design methodology for distributed arithmetic

The Journal of VLSI Signal Processing Systems for Signal Image and Video Technology ◽

10.1007/bf00925468 ◽

1991 ◽

Vol 2 (4) ◽

pp. 235-252 ◽

Cited By ~ 24

Author(s):

Wayne P. Burleson ◽

Louis L. Scharf

Keyword(s):

Design Methodology ◽

VLSI Design ◽

Distributed Arithmetic ◽

VLSI Design Methodology


A modular distributed arithmetic implementation of inner product including quantization

Proceedings of the 33rd Midwest Symposium on Circuits and Systems ◽

10.1109/mwscas.1990.140850 ◽

2002 ◽

Author(s):

A.S. de la Vega ◽

P.S.R. Diniz ◽

A.C. Mesquita

Keyword(s):

Inner Product ◽

Distributed Arithmetic


Low Power High Throughput Memory Less Adaptive Filter using Distributed Arithmetic

International Journal of Innovative Technology and Exploring Engineering - Special Issue ◽

10.35940/ijitee.j9219.0881019 ◽

2019 ◽

Vol 8 (10) ◽

pp. 3381-3384

Keyword(s):

Power Consumption ◽

Low Power ◽

High Throughput ◽

Adaptive Filter ◽

Inner Product ◽

Filter Coefficient ◽

Distributed Arithmetic ◽

Area Efficient

This paper presents an area-efficient, low-power, high-throughput LMS adaptive filter based on a distributed arithmetic (DA) architecture. Throughput is increased by updating the filter coefficients and computing the inner product in parallel. A memory-less distributed arithmetic (MLDA) unit is proposed that replaces the LUT of conventional DA with a 2:1 multiplexer architecture, reducing the overall area of the filter. An enhanced compressor adder accumulates the partial products, which further reduces the area. Generating and accumulating the partial products in parallel also enhances the throughput of the design. The proposed architecture requires less than half the area of the existing LUT-based inner-product block. The design is implemented in Synopsys Design Compiler; the results show a 52.7% reduction in area and 69.25% lower power consumption for the MUX-based DA adaptive filter with N=16, 32 and 64 taps. The proposed design also achieves a 36.50% lower area-delay product (ADP).
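The idea of replacing the DA look-up table with 2:1 multiplexers can be sketched in software as follows. This is an illustrative model under stated assumptions, not the circuit from the paper: the names `mux2` and `mlda_inner_product` are hypothetical, coefficients are integers, samples are `bits`-bit two's complement, and a plain sum stands in for the compressor adder. The LMS coefficient update is not modeled.

```python
def mux2(select, a, b):
    """2:1 multiplexer: pass `a` when select is 1, otherwise `b`."""
    return a if select else b


def mlda_inner_product(coeffs, samples, bits):
    """LUT-free DA inner product (illustrative model only).

    For each bit plane, one multiplexer per tap passes either the
    coefficient or zero, and the selected values are summed."""
    mask = (1 << bits) - 1
    words = [s & mask for s in samples]  # two's-complement bit patterns
    acc = 0
    for b in range(bits):
        # Partial product for this bit plane; a plain sum stands in for
        # the compressor adder mentioned in the abstract.
        partial = sum(mux2((w >> b) & 1, c, 0) for c, w in zip(coeffs, words))
        # The most significant bit carries negative weight in two's complement.
        weight = -(1 << b) if b == bits - 1 else (1 << b)
        acc += weight * partial
    return acc
```

Because no partial sums are precomputed, the model stores only the coefficients themselves. Coefficient changes therefore take effect without rebuilding a table, which is one plausible reason such a structure suits an adaptive filter.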
