scholarly journals BPR-TCAM—Block and Partial Reconfiguration based TCAM on Xilinx FPGAs

Electronics ◽  
2020 ◽  
Vol 9 (2) ◽  
pp. 353 ◽  
Author(s):  
Anees Ullah ◽  
Ali Zahir ◽  
Noaman A. Khan ◽  
Waleed Ahmad ◽  
Alexis Ramos ◽  
...  

Field Programmable Gate Arrays (FPGAs) based Ternary Content Addressable Memories (TCAMs) are widely used in high-speed networking applications.However, TCAMs are not present on state-of-the-art FPGAs and need to be emulated on SRAM-based memories (i.e., LUTRAMs and Block RAMs) which requires a large amount of FPGA resources. In this paper, we present an efficient methodology to implement FPGA-based TCAMs with significant resource savings compared to existing schemes. The proposed methodology exploits the fracturable nature of Look Up Tables (LUTs) and the built-in slice carry-chains for simultaneous mapping of two rules and its matching logic to a single FPGA slice. Multiple slices can be stacked together to build deeper and wider TCAMs in a modular way. The combination of all these techniques results in significant savings in resource utilization compared to existing approaches.

Symmetry ◽  
2019 ◽  
Vol 11 (10) ◽  
pp. 1265 ◽  
Author(s):  
Zhuang Cao ◽  
Huiguo Zhang ◽  
Junnan Li ◽  
Mei Wen ◽  
Chunyuan Zhang

The development of modern networking requires that high-performance network processors be designed quickly and efficiently to support new protocols. As a very important part of the processor, the parser parses the headers of the packets—this is the precondition for further processing and finally forwarding these packets. This paper presents a framework designed to transform P4 programs to VHDL and to generate parsers on Field Programmable Gate Arrays (FPGAs). The framework includes a pipeline-based hardware architecture and a back-end compiler. The hardware architecture comprises many components with varying functionality, each of which has its own optimized VHDL template. By using the output of a standard frontend P4 compiler, our proposed compiler extracts the parameters and relationships from within the used components, which can then be mapped to corresponding templates by configuring, optimizing, and instantiating them. Finally, these templates are connected to output VHDL code. When a prototype of this framework is implemented and evaluated, the results demonstrate that the throughputs of the generated parsers achieve nearly 320 Gbps at a clock rate of around 300 MHz. Compared with state-of-the-art solutions, our proposed parsers achieve an average of twice the throughput when similar amounts of resources are being used.


2020 ◽  
Vol 34 (04) ◽  
pp. 4780-4787
Author(s):  
Yuhang Li ◽  
Xin Dong ◽  
Sai Qian Zhang ◽  
Haoli Bai ◽  
Yuanpeng Chen ◽  
...  

To deploy deep neural networks on resource-limited devices, quantization has been widely explored. In this work, we study the extremely low-bit networks which have tremendous speed-up, memory saving with quantized activation and weights. We first bring up three omitted issues in extremely low-bit networks: the squashing range of quantized values; the gradient vanishing during backpropagation and the unexploited hardware acceleration of ternary networks. By reparameterizing quantized activation and weights vector with full precision scale and offset for fixed ternary vector, we decouple the range and magnitude from direction to extenuate above problems. Learnable scale and offset can automatically adjust the range of quantized values and sparsity without gradient vanishing. A novel encoding and computation pattern are designed to support efficient computing for our reparameterized ternary network (RTN). Experiments on ResNet-18 for ImageNet demonstrate that the proposed RTN finds a much better efficiency between bitwidth and accuracy and achieves up to 26.76% relative accuracy improvement compared with state-of-the-art methods. Moreover, we validate the proposed computation pattern on Field Programmable Gate Arrays (FPGA), and it brings 46.46 × and 89.17 × savings on power and area compared with the full precision convolution.


Author(s):  
Miriam Leeser ◽  
Suranga Handagala ◽  
Michael Zink

As cloud computing grows,  the types of computational hardware available in the cloud are diversifying. Field Programmable Gate Arrays (FPGAs) are a relatively new addition to high-performance computing in the cloud, with the ability to accelerate a range of different applications, and the flexibility to offer different cloud computing models. A new and growing configuration is to have the FPGAs directly connected to the network and thus reduce the latency in delivering data to processing elements. We survey the state-of-the-art in FPGAs in the cloud and present the Open Cloud Testbed (OCT), a testbed for research and experimentation into new cloud platforms, which includes network-attached FPGAs in the cloud.


Author(s):  
ROY CROSBIE

Some applications of real-time simulation now require frame times that are shorter in duration than can be delivered by traditional methods such as real-time versions of Linux (RT-Linux). RT-Linux can be satisfactory for frames as short as 10μS, but there is now a need, for example in the simulation of power-electronic systems, for frame times as short as 1 μS or even less. Techniques based on the interfacing of digital signal processors (DSPs) to a Windows PC have achieved a 2 μS frame time for a typical power electronics application and less than 1 μS is shown to be possible using field-programmable gate arrays (FPGAs). Combining these high-speed techniques with simulations of the rest of the system necessitates the use of multi-rate techniques. Software tools, interfacing issues, and system architecture for a high-speed, real-time, distributed, multi-rate (HRDM) simulator are discussed.


Sign in / Sign up

Export Citation Format

Share Document