ANALYSIS OF EFFECTS OF USING 9/7 WAVELET COEFFICIENTS IN MULTI-RESOLUTION ANALYSIS

2016 ◽  
Vol 2 (1) ◽  
Author(s):  
Manish Sharma ◽  
Prof. Sonu Lal

Conventional distributed arithmetic (DA) is popular in field programmable gate array (FPGA) design, and it relies on on-chip ROM to achieve high speed and regularity. In this paper, we describe a high-speed, area-efficient 1-D discrete wavelet transform (DWT) based on the 9/7 filter and the new efficient distributed arithmetic (NEDA) technique. As an area-efficient architecture free of ROM, multiplication, and subtraction, NEDA also exposes the redundancy in the adder array, whose entries are only 0 and 1. The architecture supports any image pixel width and any level of decomposition, and the parallel structure achieves 100% hardware utilization efficiency.
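
Since the abstract does not give the NEDA derivation, the following is a minimal Python sketch of the coefficient-side distributed-arithmetic idea it builds on: the 9/7 filter's inner product is evaluated with shifts and additions over the bit-planes of the fixed-point coefficients instead of with multipliers. The CDF 9/7 low-pass values and the 12-bit precision are standard/illustrative assumptions, not taken from the paper, and the actual NEDA formulation additionally removes the subtractions used here for negative coefficients.

```python
# CDF 9/7 analysis low-pass coefficients (standard published values, rounded).
CDF97_LOWPASS = [0.026749, -0.016864, -0.078223, 0.266864, 0.602949,
                 0.266864, -0.078223, -0.016864, 0.026749]
FRAC_BITS = 12  # assumed fixed-point precision, not taken from the paper

def da_inner_product(samples, coeffs=CDF97_LOWPASS, frac_bits=FRAC_BITS):
    """Multiplier-free inner product: shift-and-add over coefficient bit-planes."""
    q = [round(c * (1 << frac_bits)) for c in coeffs]   # signed fixed-point coefficients
    acc = 0
    for bit in range(frac_bits + 1):                    # walk the coefficient bit-planes
        partial = 0
        for x, c in zip(samples, q):
            if (abs(c) >> bit) & 1:                     # 0/1 entry of the adder array
                partial += x if c > 0 else -x
        acc += partial * (1 << bit)                     # weight the plane by 2**bit
    return acc / (1 << frac_bits)                       # rescale to a real-valued output

# Agreement check against a direct multiply-accumulate:
x = [10, 12, 9, 7, 15, 14, 8, 6, 11]
direct = sum(c * xi for c, xi in zip(CDF97_LOWPASS, x))
print(da_inner_product(x), direct)   # equal up to the quantization error
```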


Author(s):  
David R. Selviah ◽  
Janti Shawash

This chapter celebrates 50 years of first- and higher-order neural network (HONN) implementations in terms of the physical layout and structure of electronic hardware, which offers high-speed, low-latency, compact, low-cost, low-power, mass-produced systems. Low latency is essential for practical applications in real-time control, for which software implementations running on CPUs are too slow. The literature review traces the chronological development of electronic neural networks (ENNs), discussing selected papers in detail, from analog electronic hardware through probabilistic RAM, generalizing RAM, custom silicon Very Large Scale Integrated (VLSI) circuits, neuromorphic chips, and pulse-stream interconnected neurons to Application Specific Integrated Circuits (ASICs) and Zero Instruction Set Chips (ZISCs). Reconfigurable Field Programmable Gate Arrays (FPGAs) are given particular attention, as the most recent generation incorporates Digital Signal Processing (DSP) units to provide full System on Chip (SoC) capability, offering the possibility of real-time, online, on-chip learning.


Author(s):  
A. Akilandeswari ◽  
Annie Grace Vimala ◽  
D. Sungeetha ◽  
...  

The wavelet transform is the most common technique used in image processing applications. The Discrete Wavelet Transform (DWT) retains both time and frequency information through a multi-resolution analysis structure, which classical transforms such as the Fast Fourier Transform (FFT) and the Discrete Cosine Transform (DCT) do not provide. Because of this feature, the quality of the reconstructed image is improved compared with the other transforms. To implement the DWT in a real-time codec, a fast device needs to be targeted. Compared with other implementation platforms such as PCs, ARM processors, and DSPs, a Field Programmable Gate Array (FPGA) implementation of the DWT offers higher processing speed at much lower cost. This paper proposes a fast DWT architecture based on the Kogge-Stone adder, in which the lifting-scheme coefficients are computed with shift-add logic and Kogge-Stone adders, whereas other designs use multipliers. The main aim of the proposed technique is to minimize computation and memory requirements. The proposed design is simulated with the Xilinx 14.1 design tool, and its performance is evaluated and compared with existing architectures.
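
To make the adder choice concrete, here is a small bit-level Python model of the Kogge-Stone parallel-prefix adder that the proposed lifting datapath relies on. The 16-bit width is an illustrative assumption; in hardware all prefix stages evaluate in parallel, which is what gives the adder its logarithmic carry delay.

```python
def kogge_stone_add(a, b, width=16):
    """Add two unsigned integers with a Kogge-Stone generate/propagate prefix network."""
    g = [((a >> i) & 1) & ((b >> i) & 1) for i in range(width)]  # generate bits
    p = [((a >> i) & 1) ^ ((b >> i) & 1) for i in range(width)]  # propagate bits

    # Parallel-prefix stages: the combination span doubles each stage,
    # giving ceil(log2(width)) stages in total.
    span = 1
    while span < width:
        g_next, p_next = g[:], p[:]
        for i in range(span, width):
            g_next[i] = g[i] | (p[i] & g[i - span])
            p_next[i] = p[i] & p[i - span]
        g, p = g_next, p_next
        span *= 2

    # After the scan, g[i] is the carry out of bit i; the carry into bit i is g[i-1].
    carries = [0] + g[:-1]
    result = 0
    for i in range(width):
        s_i = (((a >> i) & 1) ^ ((b >> i) & 1)) ^ carries[i]
        result |= s_i << i
    return result                         # sum modulo 2**width

# Sanity check against ordinary addition (assumed 16-bit operands):
assert kogge_stone_add(23749, 9982) == (23749 + 9982) & 0xFFFF
```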


2019 ◽  
Vol 8 (4) ◽  
pp. 10189-10198 ◽  

The Fast Fourier Transform (FFT) is a core element of high-speed signal processing applications and involves complex addition, complex subtraction, and complex multiplication. The complex multiplications make FFT structures demand considerable hardware. Hence, this work introduces an area-efficient FFT structure supporting various N-point radix-2 and radix-2² configurations, built from the proposed modified butterfly units and a radix-2/2² butterfly unit. The proposed modified butterfly units effectively reduce the number of complex multipliers and are therefore used, under certain conditions in the FFT design, in place of the existing radix-2/2² butterfly unit. Furthermore, the proposed design supports several FFT sizes in a single architecture without additional hardware. The proposed FFT structure is designed and implemented on a Xilinx Virtex-6 Field-Programmable Gate Array (FPGA) device (6vcx75tff484-2) and with the Cadence tool in 45 nm CMOS technology. The implementation results demonstrate that the proposed N-point (N = 16, 32, and 64) DIF-FFT design attains lower hardware complexity than the existing multi-mode FFT design. The proposed area-efficient 16-point, 32-point, and 64-point radix-2 FFT architectures reduce the total area by 20.99%, 11%, and 4.9%, respectively, and the corresponding radix-2² architectures reduce the total area by 32%, 19%, and 11%, respectively.
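
As a point of reference for the operation counts above, the following plain Python radix-2 decimation-in-frequency (DIF) FFT (a textbook reference, not the proposed architecture) makes the butterfly explicit: each butterfly performs one complex addition, one complex subtraction, and one twiddle multiplication, and it is those twiddle multipliers that the proposed units reduce.

```python
import cmath, math

def fft_dif_radix2(x):
    """Iterative radix-2 DIF FFT; len(x) must be a power of two."""
    x = list(map(complex, x))
    n = len(x)
    span = n
    while span > 1:
        half = span // 2
        w_step = cmath.exp(-2j * math.pi / span)     # twiddle increment for this stage
        for start in range(0, n, span):
            w = 1.0 + 0j
            for i in range(start, start + half):
                a, b = x[i], x[i + half]
                x[i] = a + b                          # complex addition
                x[i + half] = (a - b) * w             # complex subtraction + twiddle multiply
                w *= w_step
        span = half
    # DIF leaves results in bit-reversed order; reorder for natural-order output.
    bits = n.bit_length() - 1
    return [x[int(format(i, f'0{bits}b')[::-1], 2)] for i in range(n)]

# Quick check against a direct DFT for N = 16:
sig = [complex(i % 5, 0) for i in range(16)]
dft = [sum(sig[t] * cmath.exp(-2j * math.pi * k * t / 16) for t in range(16)) for k in range(16)]
assert all(abs(a - b) < 1e-9 for a, b in zip(fft_dif_radix2(sig), dft))
```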


2022 ◽  
Vol 15 (2) ◽  
pp. 1-31
Author(s):  
Joel Mandebi Mbongue ◽  
Danielle Tchuinkou Kwadjo ◽  
Alex Shuping ◽  
Christophe Bobda

Cloud deployments now increasingly exploit Field-Programmable Gate Array (FPGA) accelerators as part of virtual instances. While cloud FPGAs are still essentially single-tenant, the growing demand for efficient hardware acceleration paves the way to FPGA multi-tenancy. It then becomes necessary to explore architectures, design flows, and resource management features that expose multi-tenant FPGAs to cloud users. In this article, we discuss a hardware/software architecture that supports provisioning space-shared FPGAs in Kernel-based Virtual Machine (KVM) clouds. The proposed architecture introduces an FPGA organization that improves hardware consolidation and supports hardware elasticity with minimal data movement overhead. It also relies on VirtIO to decrease communication latency between the hardware and software domains. A prototype of the architecture on a Virtex UltraScale+ FPGA demonstrated near-specification maximum frequency for on-chip data movement and high throughput for virtual-instance access to hardware accelerators. We demonstrate performance similar to single-tenant deployment while increasing FPGA utilization, which is one of the goals of virtualization. Overall, our FPGA design achieves about 2× higher maximum frequency than the state of the art and a bandwidth of up to 28 Gbps on a 32-bit data width.
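
The KVM/VirtIO plumbing cannot be reproduced from the abstract, but the space-sharing bookkeeping can be illustrated. The Python toy model below is hypothetical (the names Region and FpgaSpaceManager are invented for illustration): the fabric is split into a fixed set of partially reconfigurable regions that a management layer hands out to tenant VMs, which is the consolidation idea the architecture builds on.

```python
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class Region:
    region_id: int
    tenant: Optional[str] = None      # VM identifier currently owning the region

@dataclass
class FpgaSpaceManager:
    # Four regions is an arbitrary illustrative choice.
    regions: list = field(default_factory=lambda: [Region(i) for i in range(4)])

    def allocate(self, tenant: str) -> Optional[int]:
        """Give the first free region to a tenant; None if the fabric is full."""
        for r in self.regions:
            if r.tenant is None:
                r.tenant = tenant
                return r.region_id
        return None                   # a real system would queue or trigger elasticity here

    def release(self, tenant: str) -> None:
        """Return all regions held by a tenant to the free pool."""
        for r in self.regions:
            if r.tenant == tenant:
                r.tenant = None

mgr = FpgaSpaceManager()
print(mgr.allocate("vm-a"), mgr.allocate("vm-b"))   # two tenants space-share the fabric
mgr.release("vm-a")
```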


2022 ◽  
Vol 15 (2) ◽  
pp. 1-29
Author(s):  
Paolo D'Alberto ◽  
Victor Wu ◽  
Aaron Ng ◽  
Rahul Nimaiyar ◽  
Elliott Delaye ◽  
...  

We present xDNN, an end-to-end system for deep-learning inference based on a family of specialized hardware processors synthesized on Field-Programmable Gate Arrays (FPGAs) and on Convolutional Neural Networks (CNNs). We present a design optimized for low latency, high throughput, and high compute efficiency with no batching. The design is scalable and a parametric function of the number of multiply-accumulate units, the on-chip memory hierarchy, and the numerical precision. The design can produce a scaled-down processor for embedded devices, be replicated to provide more cores for larger devices, or be resized to optimize efficiency. On a Xilinx Virtex UltraScale+ VU13P FPGA, we achieve 800 MHz, which is close to the Digital Signal Processing maximum frequency, and above 80% efficiency of on-chip compute resources. On top of our processor family, we present a runtime system enabling the execution of different networks for different input sizes (i.e., from 224×224 to 2048×1024). We present a compiler that reads CNNs from native frameworks (i.e., MXNet, Caffe, Keras, and TensorFlow), optimizes them, generates code, and provides performance estimates. The compiler combines quantization information from the native environment with optimizations to feed the runtime with code as efficient as any hardware expert could write. We present tools that partition a CNN into subgraphs to divide the work between CPU cores and FPGAs. Notice that the software will not change when or if the FPGA design becomes an ASIC, making our work vertical and not just a proof-of-concept FPGA project. We show experimental results for accuracy, latency, and power for several networks. In summary, we achieve up to 4 times higher throughput and 3 times better power efficiency than GPUs, and up to 20 times higher throughput than the latest CPUs. To our knowledge, we provide solutions faster than any previous FPGA-based solution and comparable to other off-the-shelf solutions.
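
The abstract does not specify xDNN's quantization scheme, so the snippet below only illustrates the generic idea of carrying quantization information from the native framework into generated code: symmetric per-tensor int8 quantization with a scale derived from the maximum absolute value. The function names and the 127-level range are illustrative assumptions, not xDNN's actual method.

```python
import numpy as np

def quantize_int8(tensor: np.ndarray):
    """Return (int8 values, scale) such that tensor ≈ values * scale."""
    scale = float(np.max(np.abs(tensor))) / 127.0 or 1.0   # avoid a zero scale
    q = np.clip(np.round(tensor / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
    """Recover an approximation of the original tensor from its quantized form."""
    return q.astype(np.float32) * scale

w = np.random.randn(3, 3).astype(np.float32)
q, s = quantize_int8(w)
print(np.max(np.abs(dequantize(q, s) - w)))   # quantization error, bounded by scale / 2
```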


2015 ◽  
Vol 2015 ◽  
pp. 1-12
Author(s):  
Mahendra Vucha ◽  
Arvind Rajawat

Modern embedded systems are being modeled as Reconfigurable High Speed Computing Systems (RHSCS), in which reconfigurable hardware, that is, a Field Programmable Gate Array (FPGA), and softcore processors configured on the FPGA act as computing elements. As system complexity increases, efficient task distribution methodologies are essential to obtain high performance. A dynamic task distribution methodology based on the Minimum Laxity First (MLF) policy (DTD-MLF) distributes the tasks of an application dynamically onto the RHSCS and utilizes the available RHSCS resources effectively. The DTD-MLF methodology takes advantage of the runtime design parameters of an application represented as a directed acyclic graph (DAG), and it considers the attributes of the DAG tasks and of the computing resources when distributing the tasks onto the RHSCS. In this paper, we describe the DTD-MLF model and verify its effectiveness by distributing several real-life benchmark applications onto an RHSCS configured on a Virtex-5 FPGA device. The benchmark applications are represented as DAGs and distributed to the RHSCS resources according to the DTD-MLF model. The performance of the MLF-based dynamic task distribution methodology is compared with a static task distribution methodology. The comparison shows that the dynamic task distribution model with the MLF criterion outperforms the static task distribution techniques in terms of schedule length and effective utilization of the available RHSCS resources.
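
A minimal software model of the MLF dispatch rule may help make the policy concrete: at every decision point, the ready DAG task with the smallest laxity (deadline minus current time minus execution time) is assigned to the first free computing resource. The task set, deadlines, and two-resource configuration below are illustrative, not taken from the paper, and this greedy list scheduler is only a sketch of the idea, not the DTD-MLF model itself.

```python
from dataclasses import dataclass

@dataclass
class Task:
    name: str
    exec_time: int
    deadline: int
    preds: tuple = ()                 # names of predecessor tasks in the DAG

def mlf_schedule(tasks, num_resources=2):
    """Greedy list scheduler with MLF priority; returns (task, resource, start) tuples."""
    finish = {}                       # task name -> finish time
    free_at = [0] * num_resources     # time at which each resource becomes free
    remaining = {t.name: t for t in tasks}
    schedule = []
    while remaining:
        # A task is ready once all of its predecessors have been scheduled.
        ready = [t for t in remaining.values() if all(p in finish for p in t.preds)]
        res = min(range(num_resources), key=free_at.__getitem__)
        now = free_at[res]
        # MLF: the smallest laxity (deadline - now - execution time) runs first.
        task = min(ready, key=lambda t: t.deadline - now - t.exec_time)
        start = max(now, max((finish[p] for p in task.preds), default=0))
        finish[task.name] = start + task.exec_time
        free_at[res] = finish[task.name]
        schedule.append((task.name, res, start))
        del remaining[task.name]
    return schedule

dag = [Task("A", 3, 10), Task("B", 2, 6, ("A",)),
       Task("C", 4, 12, ("A",)), Task("D", 1, 14, ("B", "C"))]
for name, res, start in mlf_schedule(dag):
    print(f"{name} -> resource {res} at t={start}")
```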


2007 ◽  
Vol 16 (06) ◽  
pp. 895-909 ◽  
Author(s):  
SYED MANZOOR QASIM ◽  
SHUJA AHMAD ABBASI

This paper presents a novel approach to the generation of periodic waveforms in digital form using a Field Programmable Gate Array (FPGA) and orthogonal functions. The orthogonal set consists of Rademacher–Walsh functions, and with these functions virtually any periodic waveform can be synthesized. Recent technological advancements in FPGAs and the availability of sophisticated digital design tools have made it possible to realize a high-speed waveform generator in a cost-effective way. We demonstrate the proposed technique through the successful generation of trapezoidal, sinusoidal, and triangular waveforms, as well as a complex version of these waveforms. Simulation results for the various waveforms implemented on a Xilinx Spartan-3 (XC3S200-4FT256) FPGA are presented in both analog and digital form and validated in MATLAB. The designed circuit can easily be integrated as a module in a System-on-Chip (SoC) for on-chip waveform generation.
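
A short Python sketch of the synthesis idea, under illustrative assumptions (16 samples per period and a triangular target, neither taken from the paper): Rademacher functions are dyadic-rate square waves, Walsh functions are formed as their products, and a periodic waveform is reconstructed as a weighted sum over this orthogonal basis, which mirrors the kind of weighted-sum operation such a generator performs with stored coefficients.

```python
N = 16                                    # samples per period (power of two, assumed)

def rademacher(k, n=N):
    """R_k: ±1 square wave with 2**(k-1) cycles per period (R_0 is constant +1)."""
    return [1 if (i * (1 << k) // n) % 2 == 0 else -1 for i in range(n)]

def walsh(m, n=N):
    """Walsh function m (Paley order): product of the Rademacher functions picked by m's bits."""
    w = [1] * n
    k = 0
    while (1 << k) <= m:
        if (m >> k) & 1:
            r = rademacher(k + 1, n)
            w = [wi * ri for wi, ri in zip(w, r)]
        k += 1
    return w

# Target: one period of a triangular wave (illustrative choice).
target = [abs((i / N) * 4 - 2) - 1 for i in range(N)]

# Projection coefficients: the basis is orthogonal, so plain inner products suffice.
coeffs = [sum(t * wi for t, wi in zip(target, walsh(m))) / N for m in range(N)]

# Reconstruction as a weighted sum of Walsh functions.
synth = [sum(coeffs[m] * walsh(m)[i] for m in range(N)) for i in range(N)]
print(max(abs(a - b) for a, b in zip(synth, target)))  # ~0: exact with all N basis functions
```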

