ANALYSIS OF EFFECTS OF USING 9/7 WAVELET COEFFICIENTS IN MULTI-RESOLUTION ANALYSIS

2016 ◽  
Vol 2 (1) ◽  
Author(s):  
Manish Sharma ◽  
Prof. Sonu Lal

Conventional distributed arithmetic (DA) is popular in field programmable gate array (FPGA) design, and it relies on on-chip ROM to achieve high speed and regularity. In this paper, we describe a high-speed, area-efficient 1-D discrete wavelet transform (DWT) based on the 9/7 filter and the new efficient distributed arithmetic (NEDA) technique. As an area-efficient architecture free of ROM, multiplication, and subtraction, NEDA also exposes the redundancy in the adder array, whose entries are only 0 and 1. The architecture supports any image pixel width and any level of decomposition, and the parallel structure achieves 100% hardware utilization efficiency.
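
Since the abstract does not give the NEDA derivation, the following is a minimal Python sketch of the coefficient-side distributed-arithmetic idea it builds on: the 9/7 filter's inner product is evaluated with shifts and additions over the bit-planes of the fixed-point coefficients instead of with multipliers. The CDF 9/7 low-pass values and the 12-bit precision are standard/illustrative assumptions, not taken from the paper, and the actual NEDA formulation additionally removes the subtractions used here for negative coefficients.

```python
# CDF 9/7 analysis low-pass coefficients (standard published values, rounded).
CDF97_LOWPASS = [0.026749, -0.016864, -0.078223, 0.266864, 0.602949,
                 0.266864, -0.078223, -0.016864, 0.026749]
FRAC_BITS = 12  # assumed fixed-point precision, not taken from the paper

def da_inner_product(samples, coeffs=CDF97_LOWPASS, frac_bits=FRAC_BITS):
    """Multiplier-free inner product: shift-and-add over coefficient bit-planes."""
    q = [round(c * (1 << frac_bits)) for c in coeffs]   # signed fixed-point coefficients
    acc = 0
    for bit in range(frac_bits + 1):                    # walk the coefficient bit-planes
        partial = 0
        for x, c in zip(samples, q):
            if (abs(c) >> bit) & 1:                     # 0/1 entry of the adder array
                partial += x if c > 0 else -x
        acc += partial * (1 << bit)                     # weight the plane by 2**bit
    return acc / (1 << frac_bits)                       # rescale to a real-valued output

# Agreement check against a direct multiply-accumulate:
x = [10, 12, 9, 7, 15, 14, 8, 6, 11]
direct = sum(c * xi for c, xi in zip(CDF97_LOWPASS, x))
print(da_inner_product(x), direct)   # equal up to the quantization error
```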


Author(s):  
David R. Selviah ◽  
Janti Shawash

This chapter celebrates 50 years of first- and higher-order neural network (HONN) implementations in terms of the physical layout and structure of electronic hardware, which offers high-speed, low-latency, compact, low-cost, low-power, mass-produced systems. Low latency is essential for practical applications in real-time control, for which software implementations running on CPUs are too slow. The literature review traces the chronological development of electronic neural networks (ENNs), discussing selected papers in detail, from analog electronic hardware through probabilistic RAM, generalizing RAM, custom silicon Very Large Scale Integrated (VLSI) circuits, neuromorphic chips, and pulse-stream interconnected neurons to Application Specific Integrated Circuits (ASICs) and Zero Instruction Set Chips (ZISCs). Reconfigurable Field Programmable Gate Arrays (FPGAs) are given particular attention, as the most recent generation incorporates Digital Signal Processing (DSP) units to provide full System on Chip (SoC) capability, offering the possibility of real-time, online, on-chip learning.


Author(s):  
A. Akilandeswari ◽  
Annie Grace Vimala ◽  
D. Sungeetha ◽  
...  

The wavelet transform is the most common technique used in image processing applications. The Discrete Wavelet Transform (DWT) retains both time and frequency information through a multi-resolution analysis structure, which classical transforms such as the Fast Fourier Transform (FFT) and the Discrete Cosine Transform (DCT) do not provide. Because of this feature, the quality of the reconstructed image is improved compared with the other transforms. To implement the DWT in a real-time codec, a fast device needs to be targeted. Compared with other implementation platforms such as PCs, ARM processors, and DSPs, a Field Programmable Gate Array (FPGA) implementation of the DWT offers higher processing speed at much lower cost. This paper proposes a fast DWT architecture based on the Kogge-Stone adder, in which the lifting-scheme coefficients are computed with shift-add logic and Kogge-Stone adders, whereas other designs use multipliers. The main aim of the proposed technique is to minimize computation and memory requirements. The proposed design is simulated with the Xilinx 14.1 design tool, and its performance is evaluated and compared with existing architectures.
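
To make the adder choice concrete, here is a small bit-level Python model of the Kogge-Stone parallel-prefix adder that the proposed lifting datapath relies on. The 16-bit width is an illustrative assumption; in hardware all prefix stages evaluate in parallel, which is what gives the adder its logarithmic carry delay.

```python
def kogge_stone_add(a, b, width=16):
    """Add two unsigned integers with a Kogge-Stone generate/propagate prefix network."""
    g = [((a >> i) & 1) & ((b >> i) & 1) for i in range(width)]  # generate bits
    p = [((a >> i) & 1) ^ ((b >> i) & 1) for i in range(width)]  # propagate bits

    # Parallel-prefix stages: the combination span doubles each stage,
    # giving ceil(log2(width)) stages in total.
    span = 1
    while span < width:
        g_next, p_next = g[:], p[:]
        for i in range(span, width):
            g_next[i] = g[i] | (p[i] & g[i - span])
            p_next[i] = p[i] & p[i - span]
        g, p = g_next, p_next
        span *= 2

    # After the scan, g[i] is the carry out of bit i; the carry into bit i is g[i-1].
    carries = [0] + g[:-1]
    result = 0
    for i in range(width):
        s_i = (((a >> i) & 1) ^ ((b >> i) & 1)) ^ carries[i]
        result |= s_i << i
    return result                         # sum modulo 2**width

# Sanity check against ordinary addition (assumed 16-bit operands):
assert kogge_stone_add(23749, 9982) == (23749 + 9982) & 0xFFFF
```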


2019 ◽  
Vol 8 (4) ◽  
pp. 10189-10198 ◽  

The Fast Fourier Transform (FFT) is a core element of high-speed signal processing applications and involves complex addition, complex subtraction, and complex multiplication. The complex multiplications make FFT structures demand considerable hardware. Hence, this work introduces an area-efficient FFT structure supporting various N-point radix-2 and radix-2² configurations, built from the proposed modified butterfly units and a radix-2/2² butterfly unit. The proposed modified butterfly units effectively reduce the number of complex multipliers and are therefore used, under certain conditions in the FFT design, in place of the existing radix-2/2² butterfly unit. Furthermore, the proposed design supports several FFT sizes in a single architecture without additional hardware. The proposed FFT structure is designed and implemented on a Xilinx Virtex-6 Field-Programmable Gate Array (FPGA) device (6vcx75tff484-2) and with the Cadence tool in 45 nm CMOS technology. The implementation results demonstrate that the proposed N-point (N = 16, 32, and 64) DIF-FFT design attains lower hardware complexity than the existing multi-mode FFT design. The proposed area-efficient 16-point, 32-point, and 64-point radix-2 FFT architectures reduce the total area by 20.99%, 11%, and 4.9%, respectively, and the corresponding radix-2² architectures reduce the total area by 32%, 19%, and 11%, respectively.
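
As a point of reference for the operation counts above, the following plain Python radix-2 decimation-in-frequency (DIF) FFT (a textbook reference, not the proposed architecture) makes the butterfly explicit: each butterfly performs one complex addition, one complex subtraction, and one twiddle multiplication, and it is those twiddle multipliers that the proposed units reduce.

```python
import cmath, math

def fft_dif_radix2(x):
    """Iterative radix-2 DIF FFT; len(x) must be a power of two."""
    x = list(map(complex, x))
    n = len(x)
    span = n
    while span > 1:
        half = span // 2
        w_step = cmath.exp(-2j * math.pi / span)     # twiddle increment for this stage
        for start in range(0, n, span):
            w = 1.0 + 0j
            for i in range(start, start + half):
                a, b = x[i], x[i + half]
                x[i] = a + b                          # complex addition
                x[i + half] = (a - b) * w             # complex subtraction + twiddle multiply
                w *= w_step
        span = half
    # DIF leaves results in bit-reversed order; reorder for natural-order output.
    bits = n.bit_length() - 1
    return [x[int(format(i, f'0{bits}b')[::-1], 2)] for i in range(n)]

# Quick check against a direct DFT for N = 16:
sig = [complex(i % 5, 0) for i in range(16)]
dft = [sum(sig[t] * cmath.exp(-2j * math.pi * k * t / 16) for t in range(16)) for k in range(16)]
assert all(abs(a - b) < 1e-9 for a, b in zip(fft_dif_radix2(sig), dft))
```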


2022 ◽  
Vol 15 (2) ◽  
pp. 1-31
Author(s):  
Joel Mandebi Mbongue ◽  
Danielle Tchuinkou Kwadjo ◽  
Alex Shuping ◽  
Christophe Bobda

Cloud deployments now increasingly exploit Field-Programmable Gate Array (FPGA) accelerators as part of virtual instances. While cloud FPGAs are still essentially single-tenant, the growing demand for efficient hardware acceleration paves the way to FPGA multi-tenancy. It then becomes necessary to explore architectures, design flows, and resource management features that expose multi-tenant FPGAs to cloud users. In this article, we discuss a hardware/software architecture that supports provisioning space-shared FPGAs in Kernel-based Virtual Machine (KVM) clouds. The proposed architecture introduces an FPGA organization that improves hardware consolidation and supports hardware elasticity with minimal data movement overhead. It also relies on VirtIO to decrease communication latency between the hardware and software domains. A prototype of the architecture on a Virtex UltraScale+ FPGA demonstrated near-specification maximum frequency for on-chip data movement and high throughput for virtual-instance access to hardware accelerators. We demonstrate performance similar to single-tenant deployment while increasing FPGA utilization, which is one of the goals of virtualization. Overall, our FPGA design achieves about 2× higher maximum frequency than the state of the art and a bandwidth of up to 28 Gbps on a 32-bit data width.
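
The KVM/VirtIO plumbing cannot be reproduced from the abstract, but the space-sharing bookkeeping can be illustrated. The Python toy model below is hypothetical (the names Region and FpgaSpaceManager are invented for illustration): the fabric is split into a fixed set of partially reconfigurable regions that a management layer hands out to tenant VMs, which is the consolidation idea the architecture builds on.

```python
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class Region:
    region_id: int
    tenant: Optional[str] = None      # VM identifier currently owning the region

@dataclass
class FpgaSpaceManager:
    # Four regions is an arbitrary illustrative choice.
    regions: list = field(default_factory=lambda: [Region(i) for i in range(4)])

    def allocate(self, tenant: str) -> Optional[int]:
        """Give the first free region to a tenant; None if the fabric is full."""
        for r in self.regions:
            if r.tenant is None:
                r.tenant = tenant
                return r.region_id
        return None                   # a real system would queue or trigger elasticity here

    def release(self, tenant: str) -> None:
        """Return all regions held by a tenant to the free pool."""
        for r in self.regions:
            if r.tenant == tenant:
                r.tenant = None

mgr = FpgaSpaceManager()
print(mgr.allocate("vm-a"), mgr.allocate("vm-b"))   # two tenants space-share the fabric
mgr.release("vm-a")
```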


2022 ◽  
Vol 15 (2) ◽  
pp. 1-29
Author(s):  
Paolo D'Alberto ◽  
Victor Wu ◽  
Aaron Ng ◽  
Rahul Nimaiyar ◽  
Elliott Delaye ◽  
...  

We present xDNN, an end-to-end system for deep-learning inference based on a family of specialized hardware processors synthesized on Field-Programmable Gate Arrays (FPGAs) and on Convolutional Neural Networks (CNNs). We present a design optimized for low latency, high throughput, and high compute efficiency with no batching. The design is scalable and a parametric function of the number of multiply-accumulate units, the on-chip memory hierarchy, and the numerical precision. The design can produce a scaled-down processor for embedded devices, be replicated to provide more cores for larger devices, or be resized to optimize efficiency. On a Xilinx Virtex UltraScale+ VU13P FPGA, we achieve 800 MHz, which is close to the Digital Signal Processing maximum frequency, and above 80% efficiency of on-chip compute resources. On top of our processor family, we present a runtime system enabling the execution of different networks for different input sizes (i.e., from 224×224 to 2048×1024). We present a compiler that reads CNNs from native frameworks (i.e., MXNet, Caffe, Keras, and TensorFlow), optimizes them, generates code, and provides performance estimates. The compiler combines quantization information from the native environment with optimizations to feed the runtime with code as efficient as any hardware expert could write. We present tools that partition a CNN into subgraphs to divide the work between CPU cores and FPGAs. Notice that the software will not change when or if the FPGA design becomes an ASIC, making our work vertical and not just a proof-of-concept FPGA project. We show experimental results for accuracy, latency, and power for several networks. In summary, we achieve up to 4 times higher throughput and 3 times better power efficiency than GPUs, and up to 20 times higher throughput than the latest CPUs. To our knowledge, we provide solutions faster than any previous FPGA-based solution and comparable to other off-the-shelf solutions.
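
The abstract does not specify xDNN's quantization scheme, so the snippet below only illustrates the generic idea of carrying quantization information from the native framework into generated code: symmetric per-tensor int8 quantization with a scale derived from the maximum absolute value. The function names and the 127-level range are illustrative assumptions, not xDNN's actual method.

```python
import numpy as np

def quantize_int8(tensor: np.ndarray):
    """Return (int8 values, scale) such that tensor ≈ values * scale."""
    scale = float(np.max(np.abs(tensor))) / 127.0 or 1.0   # avoid a zero scale
    q = np.clip(np.round(tensor / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
    """Recover an approximation of the original tensor from its quantized form."""
    return q.astype(np.float32) * scale

w = np.random.randn(3, 3).astype(np.float32)
q, s = quantize_int8(w)
print(np.max(np.abs(dequantize(q, s) - w)))   # quantization error, bounded by scale / 2
```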


2015 ◽  
Vol 2015 ◽  
pp. 1-12
Author(s):  
Mahendra Vucha ◽  
Arvind Rajawat

Modern embedded systems are being modeled as Reconfigurable High Speed Computing Systems (RHSCS), in which reconfigurable hardware, that is, a Field Programmable Gate Array (FPGA), and softcore processors configured on the FPGA act as computing elements. As system complexity increases, efficient task distribution methodologies are essential to obtain high performance. A dynamic task distribution methodology based on the Minimum Laxity First (MLF) policy (DTD-MLF) distributes the tasks of an application dynamically onto the RHSCS and utilizes the available RHSCS resources effectively. The DTD-MLF methodology takes advantage of the runtime design parameters of an application represented as a directed acyclic graph (DAG), and it considers the attributes of the DAG tasks and of the computing resources when distributing the tasks onto the RHSCS. In this paper, we describe the DTD-MLF model and verify its effectiveness by distributing several real-life benchmark applications onto an RHSCS configured on a Virtex-5 FPGA device. The benchmark applications are represented as DAGs and distributed to the RHSCS resources according to the DTD-MLF model. The performance of the MLF-based dynamic task distribution methodology is compared with a static task distribution methodology. The comparison shows that the dynamic task distribution model with the MLF criterion outperforms the static task distribution techniques in terms of schedule length and effective utilization of the available RHSCS resources.
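
A minimal software model of the MLF dispatch rule may help make the policy concrete: at every decision point, the ready DAG task with the smallest laxity (deadline minus current time minus execution time) is assigned to the first free computing resource. The task set, deadlines, and two-resource configuration below are illustrative, not taken from the paper, and this greedy list scheduler is only a sketch of the idea, not the DTD-MLF model itself.

```python
from dataclasses import dataclass

@dataclass
class Task:
    name: str
    exec_time: int
    deadline: int
    preds: tuple = ()                 # names of predecessor tasks in the DAG

def mlf_schedule(tasks, num_resources=2):
    """Greedy list scheduler with MLF priority; returns (task, resource, start) tuples."""
    finish = {}                       # task name -> finish time
    free_at = [0] * num_resources     # time at which each resource becomes free
    remaining = {t.name: t for t in tasks}
    schedule = []
    while remaining:
        # A task is ready once all of its predecessors have been scheduled.
        ready = [t for t in remaining.values() if all(p in finish for p in t.preds)]
        res = min(range(num_resources), key=free_at.__getitem__)
        now = free_at[res]
        # MLF: the smallest laxity (deadline - now - execution time) runs first.
        task = min(ready, key=lambda t: t.deadline - now - t.exec_time)
        start = max(now, max((finish[p] for p in task.preds), default=0))
        finish[task.name] = start + task.exec_time
        free_at[res] = finish[task.name]
        schedule.append((task.name, res, start))
        del remaining[task.name]
    return schedule

dag = [Task("A", 3, 10), Task("B", 2, 6, ("A",)),
       Task("C", 4, 12, ("A",)), Task("D", 1, 14, ("B", "C"))]
for name, res, start in mlf_schedule(dag):
    print(f"{name} -> resource {res} at t={start}")
```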


2007 ◽  
Vol 16 (06) ◽  
pp. 895-909 ◽  
Author(s):  
SYED MANZOOR QASIM ◽  
SHUJA AHMAD ABBASI

This paper presents a novel approach to the generation of periodic waveforms in digital form using a Field Programmable Gate Array (FPGA) and orthogonal functions. The orthogonal set consists of Rademacher–Walsh functions, and with these functions virtually any periodic waveform can be synthesized. Recent technological advancements in FPGAs and the availability of sophisticated digital design tools have made it possible to realize a high-speed waveform generator in a cost-effective way. We demonstrate the proposed technique through the successful generation of trapezoidal, sinusoidal, and triangular waveforms, as well as a complex version of these waveforms. Simulation results for the various waveforms implemented on a Xilinx Spartan-3 (XC3S200-4FT256) FPGA are presented in both analog and digital form and validated in MATLAB. The designed circuit can easily be integrated as a module in a System-on-Chip (SoC) for on-chip waveform generation.
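
A short Python sketch of the synthesis idea, under illustrative assumptions (16 samples per period and a triangular target, neither taken from the paper): Rademacher functions are dyadic-rate square waves, Walsh functions are formed as their products, and a periodic waveform is reconstructed as a weighted sum over this orthogonal basis, which mirrors the kind of weighted-sum operation such a generator performs with stored coefficients.

```python
N = 16                                    # samples per period (power of two, assumed)

def rademacher(k, n=N):
    """R_k: ±1 square wave with 2**(k-1) cycles per period (R_0 is constant +1)."""
    return [1 if (i * (1 << k) // n) % 2 == 0 else -1 for i in range(n)]

def walsh(m, n=N):
    """Walsh function m (Paley order): product of the Rademacher functions picked by m's bits."""
    w = [1] * n
    k = 0
    while (1 << k) <= m:
        if (m >> k) & 1:
            r = rademacher(k + 1, n)
            w = [wi * ri for wi, ri in zip(w, r)]
        k += 1
    return w

# Target: one period of a triangular wave (illustrative choice).
target = [abs((i / N) * 4 - 2) - 1 for i in range(N)]

# Projection coefficients: the basis is orthogonal, so plain inner products suffice.
coeffs = [sum(t * wi for t, wi in zip(target, walsh(m))) / N for m in range(N)]

# Reconstruction as a weighted sum of Walsh functions.
synth = [sum(coeffs[m] * walsh(m)[i] for m in range(N)) for i in range(N)]
print(max(abs(a - b) for a, b in zip(synth, target)))  # ~0: exact with all N basis functions
```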

