scholarly journals EE-TCAM: An Energy-Efficient SRAM-Based TCAM on FPGA

Electronics ◽  
2018 ◽  
Vol 7 (9) ◽  
pp. 186 ◽  
Author(s):  
Inayat Ullah ◽  
Zahid Ullah ◽  
Jeong-A Lee

Ternary content-addressable memories (TCAMs) are used to design high-speed search engines. TCAM is implemented on application-specific integrated circuit (native TCAMs) and field-programmable gate array (FPGA) (static random-access memory (SRAM)-based TCAMs) platforms but both have the drawback of high power consumption. This paper presents a pre-classifier-based architecture for an energy-efficient SRAM-based TCAM. The first classification stage divides the TCAM table into several sub-tables of balanced size. The second SRAM-based implementation stage maps each of the resultant TCAM sub-tables to a separate row of configured SRAM blocks in the architecture. The proposed architecture selectively activates at most one row of SRAM blocks for each incoming TCAM word. Compared with the existing SRAM-based TCAM designs on FPGAs, the proposed design consumes significantly reduced energy as it activates a part of SRAM memory used for lookup rather than the entire SRAM memory as in the previous schemes. We implemented the proposed approach sample designs of size 512 × 36 on Xilinx Virtex-6 FPGA. The experimental results showed that the proposed design achieved at least three times lower power consumption per performance than other SRAM-based TCAM architectures.

2018 ◽  
Vol 7 (2.8) ◽  
pp. 204
Author(s):  
G A.E Satish Kumar

This paper presents the convolution operation based on the Number Theoretic Transfom for two n=8 input sequences. The convolution of two n-point sequences using Fast Fourier Transform exhibits design complexity leading to high power consumption. The Number Theoretic Transform utilizes the matrix of modulus values to evaluate the convolution. The Number Theoretic Transform is as an integer transform which makes the design comparatively simple. The convolution based Number Theoretic Transform is developed using the Very High Speed Integrated Circuit Hardware Description language.Also the real time implementation of the proposed method is validated by the Xilinx Spartan FPGA family devices. The performance analysis of power, speed and area are evaluated and compared with 3A DSP FPGA and Virtex 6 FPGA devices.


2018 ◽  
Vol 7 (4) ◽  
pp. 2569
Author(s):  
Priyanka Chauhan ◽  
Dippal Israni ◽  
Karan Jasani ◽  
Ashwin Makwana

Data acquisition is the most demanding application for the acquisition and monitoring of various sensor signals. The data received are processed in real-time environment. This paper proposes a novel Data Acquisition (DAQ) technique for better resource utilization with less power consumption. Present work has designed and compared advanced Quad Data Rate (QDR) technique with traditional Dual Data Rate (DDR) technique in terms of resource utilization and power consumption of Field Programmable Gate Array (FPGA) hardware. Xilinx ISE is used to verify results of FPGA resource utilization by QDR with state of the art DDR approach. The paper ratiocinates that QDR technique outperforms traditional DDR technique in terms of FPGA resource utilization.  


2008 ◽  
Vol 2008 ◽  
pp. 1-9 ◽  
Author(s):  
Y. Guillemenet ◽  
L. Torres ◽  
G. Sassatelli ◽  
N. Bruchon

This paper describes the integration of field-induced magnetic switching (FIMS) and thermally assisted switching (TAS) magnetic random access memories in FPGA design. The nonvolatility of the latter is achieved through the use of magnetic tunneling junctions (MTJs) in the MRAM cell. A thermally assisted switching scheme helps to reduce power consumption during write operation in comparison to the writing scheme in the FIMS-MTJ device. Moreover, the nonvolatility of such a design based on either an FIMS or a TAS writing scheme should reduce both power consumption and configuration time required at each power up of the circuit in comparison to classical SRAM-based FPGAs. A real-time reconfigurable (RTR) micro-FPGA using FIMS-MRAM or TAS-MRAM allows dynamic reconfiguration mechanisms, while featuring simple design architecture.


2014 ◽  
pp. 27-33
Author(s):  
Mounir Bouhedda ◽  
Mokhtar Attari

The aim of this paper is to introduce a new architecture using Artificial Neural Networks (ANN) in designing a 6-bit nonlinear Analog to Digital Converter (ADC). A study was conducted to synthesise an optimal ANN in view to FPGA (Field Programmable Gate Array) implementation using Very High-speed Integrated Circuit Hardware Description Language (VHDL). Simulation and tests results are carried out to show the efficiency of the designed ANN.


2019 ◽  
Author(s):  
Kevin A. Williams ◽  
Marija Trajkovic ◽  
Valeria Rustichelli ◽  
Florian Lemaître ◽  
Huub P. Ambrosius ◽  
...  

Computers ◽  
2020 ◽  
Vol 9 (3) ◽  
pp. 70
Author(s):  
Carolina Fernández ◽  
Sergio Giménez ◽  
Eduard Grasa ◽  
Steve Bunch

The lack of high-performance RINA (Recursive InterNetwork Architecture) implementations to date makes it hard to experiment with RINA as an underlay networking fabric solution for different types of networks, and to assess RINA’s benefits in practice on scenarios with high traffic loads. High-performance router implementations typically require dedicated hardware support, such as FPGAs (Field Programmable Gate Arrays) or specialized ASICs (Application Specific Integrated Circuit). With the advance of hardware programmability in recent years, new possibilities unfold to prototype novel networking technologies. In particular, the use of the P4 programming language for programmable ASICs holds great promise for developing a RINA router. This paper details the design and part of the implementation of the first P4-based RINA interior router, which reuses the layer management components of the IRATI Linux-based RINA implementation and implements the data-transfer components using a P4 program. We also describe the configuration and testing of our initial deployment scenarios, using ancillary open-source tools such as the P4 reference test software switch (BMv2) or the P4Runtime API.


Electronics ◽  
2020 ◽  
Vol 9 (2) ◽  
pp. 353 ◽  
Author(s):  
Anees Ullah ◽  
Ali Zahir ◽  
Noaman A. Khan ◽  
Waleed Ahmad ◽  
Alexis Ramos ◽  
...  

Field Programmable Gate Arrays (FPGAs) based Ternary Content Addressable Memories (TCAMs) are widely used in high-speed networking applications.However, TCAMs are not present on state-of-the-art FPGAs and need to be emulated on SRAM-based memories (i.e., LUTRAMs and Block RAMs) which requires a large amount of FPGA resources. In this paper, we present an efficient methodology to implement FPGA-based TCAMs with significant resource savings compared to existing schemes. The proposed methodology exploits the fracturable nature of Look Up Tables (LUTs) and the built-in slice carry-chains for simultaneous mapping of two rules and its matching logic to a single FPGA slice. Multiple slices can be stacked together to build deeper and wider TCAMs in a modular way. The combination of all these techniques results in significant savings in resource utilization compared to existing approaches.


Electronics ◽  
2018 ◽  
Vol 7 (8) ◽  
pp. 135 ◽  
Author(s):  
Nikolay Chervyakov ◽  
Pavel Lyakhov ◽  
Dmitry Kaplun ◽  
Denis Butusov ◽  
Nikolay Nagornov

In this paper, we analyze the noise quantization effects in coefficients of discrete wavelet transform (DWT) filter banks for image processing. We propose the implementation of the DWT method, making it possible to determine the effective bit-width of the filter banks coefficients at which the quantization noise does not significantly affect the image processing results according to the peak signal-to-noise ratio (PSNR). The dependence between the PSNR of the DWT image quality on the wavelet and the bit-width of the wavelet filter coefficients is analyzed. The formulas for determining the minimal bit-width of the filter coefficients at which the processed image achieves high quality (PSNR ≥ 40 dB) are given. The obtained theoretical results were confirmed through the simulation of DWT for a test image using the calculated bit-width values. All considered algorithms operate with fixed-point numbers, which simplifies their hardware implementation on modern devices: field-programmable gate array (FPGA), application-specific integrated circuit (ASIC), etc.


2017 ◽  
Vol 2017 ◽  
pp. 1-11
Author(s):  
Yichun Sun ◽  
Hengzhu Liu ◽  
Tong Zhou

Cholesky factorization is a fundamental problem in most engineering and science computation applications. When dealing with a large sparse matrix, numerical decomposition consumes the most time. We present a vector architecture to parallelize numerical decomposition of Cholesky factorization. We construct an integrated analytical parameterized performance model to accurately predict the execution times of typical matrices under varying parameters. Our proposed approach is general for accelerator and limited by neither field-programmable gate arrays (FPGAs) nor application-specific integrated circuit. We implement a simplified module in FPGAs to prove the accuracy of the model. The experiments show that, for most cases, the performance differences between the predicted and measured execution are less than 10%. Based on the performance model, we optimize parameters and obtain a balance of resources and performance after analyzing the performance of varied parameter settings. Comparing with the state-of-the-art implementation in CPU and GPU, we find that the performance of the optimal parameters is 2x that of CPU. Our model offers several advantages, particularly in power consumption. It provides guidance for the design of future acceleration components.


Sign in / Sign up

Export Citation Format

Share Document