Field Programmable Photonic Gate Arrays

Principal Component Analysis (PCA) is a dimensionality-reduction technique that removes redundant information from data in applications such as Microwave Imaging (MI) and Hyperspectral Imaging (HI). The computational complexity of PCA has made its hardware acceleration an active research topic in recent years. Although the hardware design flow can be optimized using High Level Synthesis (HLS) tools, efficient high-performance solutions for complex embedded systems still require careful design. In this paper, we propose a flexible PCA hardware accelerator for Field-Programmable Gate Arrays (FPGAs) that we designed entirely in HLS. To make the internal PCA computations more efficient, we also introduce a new block-streaming method. Several HLS optimization strategies are adopted to produce efficient hardware. The flexibility of our design allows it to be used on different FPGA targets with flexible input data dimensions, and it also makes it easy to switch from a more accurate floating-point implementation to a faster fixed-point solution. The results show the efficiency of our design compared to state-of-the-art implementations on GPUs, many-core CPUs, and other FPGA approaches in terms of resource usage, execution time, and power consumption.
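The abstract does not describe the block-streaming method itself, so the following is only a generic software sketch of the underlying idea: the covariance matrix PCA needs can be accumulated one block of rows at a time, without holding the whole dataset in memory. The function names, the block layout, and the use of power iteration for the leading component are all assumptions made for illustration, not the authors' design.

```python
import math

def streaming_covariance(blocks, dim):
    """Accumulate the mean and sample covariance from an iterable of row blocks.

    Hypothetical helper: only running sums are kept between blocks.
    """
    n = 0
    sums = [0.0] * dim
    outer = [[0.0] * dim for _ in range(dim)]
    for block in blocks:
        for row in block:
            n += 1
            for i in range(dim):
                sums[i] += row[i]
                for j in range(dim):
                    outer[i][j] += row[i] * row[j]
    mean = [s / n for s in sums]
    # cov = (sum(x x^T) - n * mean mean^T) / (n - 1)
    cov = [[(outer[i][j] - n * mean[i] * mean[j]) / (n - 1)
            for j in range(dim)] for i in range(dim)]
    return mean, cov

def leading_component(cov, iters=100):
    """Estimate the first principal direction by power iteration."""
    dim = len(cov)
    v = [1.0 / math.sqrt(dim)] * dim
    for _ in range(iters):
        w = [sum(cov[i][j] * v[j] for j in range(dim)) for i in range(dim)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            break
        v = [x / norm for x in w]
    return v

# Two blocks of two samples each; every point lies on the line y = 2x.
blocks = [[(1.0, 2.0), (2.0, 4.0)], [(3.0, 6.0), (4.0, 8.0)]]
mean, cov = streaming_covariance(blocks, dim=2)
pc1 = leading_component(cov)
print(mean, pc1)
```

Because the sample points are collinear, the estimated direction is parallel to (1, 2). The sum-of-products form of the covariance is used here for brevity; it can lose precision when the mean is large relative to the spread of the data.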


Design and Field Programmable Gate Array Implementation of Basic Building Blocks for Power-Efficient Baugh-Wooley Multipliers

American Journal of Engineering and Applied Sciences ◽

10.3844/ajeassp.2010.307.311 ◽

2010 ◽

Vol 3 (2) ◽

pp. 307-311 ◽

Cited By ~ 1

Author(s):

Rais

Keyword(s):

Field Programmable Gate Array ◽

Building Blocks ◽

Power Efficient ◽

Field Programmable ◽

Gate Array


Reconfigurable processing with field programmable gate arrays

Proceedings of International Conference on Application Specific Systems, Architectures and Processors: ASAP '96 ◽

10.1109/asap.1996.542824 ◽

2002 ◽

Cited By ~ 4

Author(s):

B.K. Fawcett ◽

J. Watson

Keyword(s):

Field Programmable Gate Arrays ◽

Gate Arrays ◽

Field Programmable ◽

Programmable Gate Arrays ◽

Reconfigurable Processing


mNet2FPGA: A Design Flow for Mapping a Fixed-Point CNN to Zynq SoC FPGA

Electronics ◽

10.3390/electronics9111823 ◽

2020 ◽

Vol 9 (11) ◽

pp. 1823

Author(s):

Tomyslav Sledevič ◽

Artūras Serackis

Keyword(s):

Neural Networks ◽

Fixed Point ◽

Processing System ◽

Design Flow ◽

Feature Maps ◽

Gate Arrays ◽

The Core ◽

Field Programmable ◽

Programmable Gate Arrays ◽

Fully Connected

Convolutional neural networks (CNNs) are a computation- and memory-demanding class of deep neural networks. Because of the high computational complexity of CNNs, field-programmable gate arrays (FPGAs) are often used to accelerate networks deployed on embedded platforms. In most cases, CNNs are trained with existing deep learning frameworks and then mapped to FPGAs with specialized toolflows. In this paper, we propose a CNN core architecture called mNet2FPGA that places a trained CNN on a SoC FPGA. The processing system (PS) configures the convolution and fully connected cores according to a list of prescheduled instructions. The programmable logic holds the convolution and fully connected cores. The hardware architecture is based on advanced extensible interface (AXI) stream processing with simultaneous bidirectional transfers between RAM and the CNN core. The core was tested on a cost-optimized Z-7020 FPGA with 16-bit fixed-point VGG networks. Kernel binarization and merging with the batch normalization layer were applied to reduce the number of DSPs in the multi-channel convolutional core. The convolutional core processes eight input feature maps at once and generates eight output channels of the same size and composition at 50 MHz. The core of the fully connected (FC) layer works at 100 MHz with up to 4096 neurons per layer. In the current version of the CNN core, the size of the convolutional kernel is fixed to 3×3. The estimated average performance is 8.6 GOPS for VGG13 and nearly 8.4 GOPS for VGG16/19 networks.
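The abstract mentions merging the kernel with the batch normalization layer but gives no formulas. As a rough illustration, the sketch below shows the standard way batch normalization is folded into the weights and bias of a preceding linear operation for one output channel. The function names and sample values are hypothetical, and the paper's own merging, which is combined with kernel binarization, may differ.

```python
import math

def fold_batch_norm(weights, bias, gamma, beta, mean, var, eps=1e-5):
    """Fold BN parameters of one output channel into its kernel and bias."""
    scale = gamma / math.sqrt(var + eps)
    folded_weights = [w * scale for w in weights]
    folded_bias = (bias - mean) * scale + beta
    return folded_weights, folded_bias

def dot_plus_bias(x, weights, bias):
    """One output pixel of a convolution: weighted sum of a flattened patch."""
    return sum(w * xi for w, xi in zip(weights, x)) + bias

def batch_norm(y, gamma, beta, mean, var, eps=1e-5):
    """Inference-time batch normalization of a single value."""
    return gamma * (y - mean) / math.sqrt(var + eps) + beta

# A flattened 3x3 patch and kernel (illustrative values only).
patch = [0.5, -1.0, 2.0, 0.0, 1.5, -0.5, 1.0, 1.0, -2.0]
kernel = [0.1, 0.2, -0.3, 0.4, 0.0, -0.1, 0.25, -0.2, 0.05]
bias, gamma, beta, mean, var = 0.3, 1.2, -0.4, 0.1, 0.8

reference = batch_norm(dot_plus_bias(patch, kernel, bias), gamma, beta, mean, var)
fw, fb = fold_batch_norm(kernel, bias, gamma, beta, mean, var)
merged = dot_plus_bias(patch, fw, fb)
print(reference, merged)
```

The two printed values agree up to floating-point rounding, which is why the normalization layer can be removed at inference time once its parameters are folded into the kernel.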


AN EVOLUTIONARY ALGORITHM FOR THE ALLOCATION PROBLEM IN HIGH-LEVEL SYNTHESIS

Journal of Circuits System and Computers ◽

10.1142/s0218126605002362 ◽

2005 ◽

Vol 14 (02) ◽

pp. 347-366 ◽

Cited By ~ 3

Author(s):

HAIDAR M. HARMANANI ◽

RONY SALIBA

Keyword(s):

Evolutionary Algorithm ◽

Allocation Problem ◽

High Level Synthesis ◽

Gate Arrays ◽

Field Programmable ◽

Programmable Gate Arrays ◽

Functional Units ◽

High Level ◽

The Cost ◽

Short Time

This paper presents an evolutionary algorithm to solve the datapath allocation problem in high-level synthesis. The method performs allocation of functional units, registers, and multiplexers in addition to controller synthesis with the objective of minimizing the cost of hardware resources. The system handles multicycle functional units as well as structural pipelining. The proposed method was implemented using C++ on a Linux workstation. We tested our method on a set of high-level synthesis benchmarks, all yielding good solutions in a short time. An integration path to Field Programmable Gate Arrays (FPGAs) is provided through VHDL.


Area Optimisation for Field-Programmable Gate Arrays in SystemC Hardware Compilation

International Journal of Reconfigurable Computing ◽

10.1155/2008/674340 ◽

2008 ◽

Vol 2008 ◽

pp. 1-14

Author(s):

Johan Ditmar ◽

Steve McKeever ◽

Alex Wilson

Keyword(s):

Clock Cycle ◽

Gate Arrays ◽

Field Programmable ◽

Separate Block ◽

Programmable Gate Arrays ◽

Source Level ◽

High Level ◽

Specific Implementation ◽

Mapping Arrays ◽

Language Construct

This paper discusses a pair of synthesis algorithms that optimise a SystemC design to minimise area when targeting FPGAs. Each can significantly improve the synthesis of a high-level language construct, thus allowing a designer to concentrate more on an algorithm description and less on hardware-specific implementation details. The first algorithm is a source-level transformation implementing function exlining—where a separate block of hardware implements a function and is shared between multiple calls to the function. The second is a novel algorithm for mapping arrays to memories which involves assigning array accesses to memory ports such that no port is ever accessed more than once in a clock cycle. This algorithm assigns accesses to read/write only ports and read-write ports concurrently, solving the assignment problem more efficiently for a wider range of memories compared to existing methods. Both optimisations operate on a high-level program representation and have been implemented in a commercial SystemC compiler. Experiments show that in suitable circumstances these techniques result in significant reductions in logic utilisation for FPGAs.
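The constraint behind the second algorithm, that no memory port is used more than once in a clock cycle, can be illustrated with a small sketch. This is a simple greedy per-cycle assignment written for illustration, not the paper's algorithm, and the access and port representations are assumptions.

```python
from collections import defaultdict

def assign_ports(accesses, ports):
    """Assign each array access to a memory port, one use per port per cycle.

    accesses: list of (name, cycle, kind) with kind 'r' or 'w'.
    ports: dict mapping port name to 'r', 'w', or 'rw'.
    Returns a dict mapping access name to port name.
    Raises ValueError if some cycle needs more ports than are available.
    """
    by_cycle = defaultdict(list)
    for name, cycle, kind in accesses:
        by_cycle[cycle].append((name, kind))

    assignment = {}
    for cycle, group in sorted(by_cycle.items()):
        free = dict(ports)  # ports not yet used in this cycle
        for name, kind in group:
            # Prefer a dedicated port so shared read-write ports stay available.
            dedicated = [p for p, caps in free.items() if caps == kind]
            shared = [p for p, caps in free.items() if caps == "rw"]
            candidates = dedicated or shared
            if not candidates:
                raise ValueError(f"no free port for {name} in cycle {cycle}")
            port = candidates[0]
            assignment[name] = port
            del free[port]
    return assignment

ports = {"p0": "r", "p1": "rw"}
accesses = [("a", 0, "r"), ("b", 0, "w"), ("c", 1, "r")]
result = assign_ports(accesses, ports)
print(result)
```

Here the read in cycle 0 takes the read-only port, which leaves the read-write port free for the write in the same cycle.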


Minimizing the layout area of 2-input look up tables

10.32920/ryerson.14648718 ◽

2021 ◽

Author(s):

Nafiul Hyder

Keyword(s):

Systematic Approach ◽

Metal Layer ◽

Building Blocks ◽

Gate Arrays ◽

Fundamental Building Block ◽

Routing Switches ◽

Field Programmable ◽

Programmable Gate Arrays ◽

Layout Area ◽

Metal Layers

This work investigates the minimum layout area of multiplexers, a fundamental building block of Field-Programmable Gate Arrays (FPGAs). In particular, we investigate the minimum layout area of 4:1 multiplexers, which are the building blocks of 2-input Look-Up Tables (LUTs) and can be used recursively to build higher-order LUTs and multiplexer-based routing switches. We observe that previous work routes all four data inputs of 4:1 multiplexers on a single metal layer, resulting in a wiring-area-dominated layout. In this work, we explore the various transistor-level placement options for implementing the 4:1 multiplexers while routing the multiplexer data inputs through multiple metal layers to reduce wiring area. Feasible placement options with their corresponding data input distributions are routed using an automated maze router, and the routing results are then refined manually. Through this systematic approach, we identified three 4:1 multiplexer layouts that are 30% to 35% smaller than the previously proposed layouts. In particular, the two larger layouts of the three are only 33% to 45% larger than the layout area predicted by the two widely used active area models from previous FPGA architectural studies, and the smallest of the three layouts is 1% to 11% larger than the layout area predicted by these models.
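The logical relationship between 4:1 multiplexers and LUTs can be shown with a short behavioural sketch: the four configuration bits of a 2-input LUT drive the multiplexer's data inputs, and the LUT inputs drive its select lines. The function names and the bit ordering below are illustrative assumptions; the sketch says nothing about the transistor-level layouts studied in the work.

```python
def mux4(data, s1, s0):
    """4:1 multiplexer: select one of four data inputs with two select bits."""
    return data[(s1 << 1) | s0]

def lut2(truth_table, a, b):
    """2-input LUT: four configuration bits feed a 4:1 multiplexer."""
    return mux4(truth_table, a, b)

def lut4(truth_table, a, b, c, d):
    """4-input LUT built recursively from four 2-input LUTs and one more 4:1 mux.

    Assumed bit ordering: table index = (d << 3) | (c << 2) | (a << 1) | b.
    """
    partial = [lut2(truth_table[4 * k:4 * k + 4], a, b) for k in range(4)]
    return mux4(partial, d, c)

# Configure the 2-input LUT as XOR and the 4-input LUT as a 4-way AND.
xor_table = [0, 1, 1, 0]
and4_table = [0] * 15 + [1]

xor_outputs = [lut2(xor_table, a, b) for a in (0, 1) for b in (0, 1)]
and_outputs = [lut4(and4_table, a, b, c, d)
               for d in (0, 1) for c in (0, 1) for a in (0, 1) for b in (0, 1)]
print(xor_outputs, and_outputs)
```

Changing only the truth table reprograms the function, which is why the multiplexer is the recurring structure whose area matters.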



A Gracefully Degrading and Energy-Efficient FPGA Programming using LabVIEW

International Journal of Reconfigurable and Embedded Systems (IJRES) ◽

10.11591/ijres.v5.i3.pp165-175 ◽

2016 ◽

Vol 5 (3) ◽

pp. 165

Author(s):

B. Naresh Kumar Reddy ◽

N. Suresh ◽

J.V.N. Ramesh

Keyword(s):

Complex Signal ◽

Graphical Language ◽

Gate Arrays ◽

Signal Flow Graphs ◽

Field Programmable ◽

Signal Processing Algorithms ◽

Programmable Gate Arrays ◽

LabVIEW FPGA ◽

High Level ◽

Flow Graphs

Programming Field Programmable Gate Arrays (FPGAs) has long been the domain of engineers with VHDL or Verilog expertise. FPGAs have caught the attention of algorithm developers and communication researchers, who want to use them to instantiate systems or implement DSP algorithms. These efforts, however, are often stifled by the complexities of programming FPGAs. RTL programming in either VHDL or Verilog generally does not offer the high level of abstraction needed to represent signal flow graphs and complex signal processing algorithms. This paper describes programming FPGAs in a graphical language rather than Verilog or VHDL, using LabVIEW and the features of the LabVIEW FPGA environment.
