Field Programmable Photonic Gate Arrays

Author(s):  
José Capmany ◽  
Daniel Pérez

The field programmable photonic gate array (FPPGA) is an integrated photonic device/subsystem that operates similarly to a field programmable gate array in electronics. It is a set of programmable photonic analogue blocks (PPABs) and reconfigurable photonic interconnects (RPIs) implemented on a photonic chip. The PPABs provide the building blocks for implementing basic optical analogue operations (reconfigurable/independent power splitting and phase shifting). Broadly, they enable reconfigurable processing just as configurable logic elements (CLEs) or programmable logic blocks (PLBs) carry out digital operations in electronic FPGAs, or configurable analogue blocks (CABs) carry out analogue operations in electronic field programmable analogue arrays (FPAAs). Reconfigurable interconnections between PPABs are provided by the RPIs. This chapter presents the basic principles of integrated FPPGAs. It describes their main building blocks and discusses alternatives for their high-level layouts, design flow, technology mapping and physical implementation. Finally, it shows that waveguide meshes lead naturally to a compact solution.
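To make "reconfigurable/independent power splitting and phase shifting" concrete, here is a minimal sketch of a two-port block modelled as an ideal lossless tunable coupler. The transfer matrix, the function names `ppab_transfer` and `propagate`, and the parameters `k` (power coupling ratio) and `phi` (common phase shift) are illustrative assumptions, not taken from the chapter.

```python
import cmath
import math
from typing import Tuple

Complex2 = Tuple[complex, complex]
Matrix2 = Tuple[Complex2, Complex2]


def ppab_transfer(k: float, phi: float) -> Matrix2:
    """Assumed ideal 2x2 block: power coupling ratio k in [0, 1], common phase phi."""
    bar = math.sqrt(1.0 - k)        # amplitude staying in the same waveguide
    cross = 1j * math.sqrt(k)       # amplitude coupled to the other waveguide
    phase = cmath.exp(1j * phi)     # phase shift applied to both outputs
    return ((phase * bar, phase * cross),
            (phase * cross, phase * bar))


def propagate(matrix: Matrix2, inputs: Complex2) -> Complex2:
    """Apply the 2x2 transfer matrix to the two input field amplitudes."""
    (a, b), (c, d) = matrix
    x, y = inputs
    return (a * x + b * y, c * x + d * y)


if __name__ == "__main__":
    out = propagate(ppab_transfer(k=0.5, phi=0.3), (1.0 + 0j, 0j))
    print([abs(field) ** 2 for field in out])  # output powers at the two ports
```

In this model, `k = 0` leaves light in its input waveguide, `k = 1` routes it fully across, and intermediate values split the power; `phi` is set independently of `k`.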

Electronics ◽  
2020 ◽  
Vol 9 (3) ◽  
pp. 449
Author(s):  
Mohammad Amir Mansoori ◽  
Mario R. Casu

Principal Component Analysis (PCA) is a technique for dimensionality reduction that is useful in removing redundant information in data for various applications such as Microwave Imaging (MI) and Hyperspectral Imaging (HI). The computational complexity of PCA has made its hardware acceleration an active research topic in recent years. Although the hardware design flow can be optimized using High Level Synthesis (HLS) tools, efficient high-performance solutions for complex embedded systems still require careful design. In this paper we propose a flexible PCA hardware accelerator in Field-Programmable Gate Arrays (FPGAs) that we designed entirely in HLS. In order to make the internal PCA computations more efficient, a new block-streaming method is also introduced. Several HLS optimization strategies are adopted to create efficient hardware. The flexibility of our design allows us to use it for different FPGA targets, with flexible input data dimensions, and it also lets us easily switch from a more accurate floating-point implementation to a higher-speed fixed-point solution. The results show the efficiency of our design compared to state-of-the-art implementations on GPUs, many-core CPUs, and other FPGA approaches in terms of resource usage, execution time and power consumption.
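The abstract does not describe the block-streaming method itself, so the following is only a generic sketch of the underlying idea: the covariance matrix needed by PCA can be accumulated one block of rows at a time, without holding the whole data set in memory. The function names and the use of power iteration for the leading component are assumptions made for illustration, not the authors' design.

```python
from typing import Iterable, List

Matrix = List[List[float]]


def streaming_covariance(blocks: Iterable[Matrix], dim: int) -> Matrix:
    """Accumulate the sample covariance one block of rows at a time."""
    n = 0
    total = [0.0] * dim                         # running sum of each feature
    outer = [[0.0] * dim for _ in range(dim)]   # running sum of x * x^T
    for block in blocks:
        for row in block:
            n += 1
            for i in range(dim):
                total[i] += row[i]
                for j in range(dim):
                    outer[i][j] += row[i] * row[j]
    mean = [t / n for t in total]
    # cov = (sum of x x^T - n * mean mean^T) / (n - 1)
    return [[(outer[i][j] - n * mean[i] * mean[j]) / (n - 1)
             for j in range(dim)] for i in range(dim)]


def leading_component(cov: Matrix, iterations: int = 200) -> List[float]:
    """Power iteration: unit vector along the dominant eigenvector of cov."""
    dim = len(cov)
    v = [1.0] * dim
    for _ in range(iterations):
        w = [sum(cov[i][j] * v[j] for j in range(dim)) for i in range(dim)]
        norm = sum(x * x for x in w) ** 0.5
        v = [x / norm for x in w]
    return v


if __name__ == "__main__":
    data = [[1.0, 2.0], [2.0, 4.0], [3.0, 6.0], [4.0, 8.0]]
    cov = streaming_covariance([data[:2], data[2:]], dim=2)  # two blocks of two rows
    print(leading_component(cov))
```

Only the running sums are kept between blocks, which is what makes this style of accumulation a natural fit for streaming hardware.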


Electronics ◽  
2020 ◽  
Vol 9 (11) ◽  
pp. 1823
Author(s):  
Tomyslav Sledevič ◽  
Artūras Serackis

Convolutional neural networks (CNNs) are a computation- and memory-demanding class of deep neural networks. Field-programmable gate arrays (FPGAs) are often used to accelerate networks deployed on embedded platforms because of the high computational complexity of CNNs. In most cases, CNNs are trained with existing deep learning frameworks and then mapped to FPGAs with specialized toolflows. In this paper, we propose a CNN core architecture called mNet2FPGA that places a trained CNN on a SoC FPGA. The processing system (PS) is responsible for configuring the convolution and fully connected cores according to a list of prescheduled instructions. The programmable logic holds the cores of the convolution and fully connected layers. The hardware architecture is based on advanced extensible interface (AXI) stream processing with simultaneous bidirectional transfers between RAM and the CNN core. The core was tested on a cost-optimized Z-7020 FPGA with 16-bit fixed-point VGG networks. Kernel binarization and merging with the batch normalization layer were applied to reduce the number of DSPs in the multi-channel convolutional core. The convolutional core processes eight input feature maps at once and generates eight output channels of the same size and composition at 50 MHz. The core of the fully connected (FC) layer works at 100 MHz with up to 4096 neurons per layer. In the current version of the CNN core, the size of the convolutional kernel is fixed to 3×3. The estimated average performance is 8.6 GOPS for VGG13 and nearly 8.4 GOPS for VGG16/19 networks.
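Merging a batch normalization layer into the preceding convolution is a standard inference-time transformation, sketched below for a single output channel. How the paper combines it with kernel binarization is not stated in the abstract; the function names and the scalar per-channel parameters here are illustrative assumptions.

```python
import math
from typing import List, Tuple


def conv_output(weights: List[float], bias: float, patch: List[float]) -> float:
    """One output value of a convolution: dot product of kernel and patch plus bias."""
    return sum(w * x for w, x in zip(weights, patch)) + bias


def batch_norm(y: float, gamma: float, beta: float,
               mean: float, var: float, eps: float = 1e-5) -> float:
    """Inference-time batch normalization for one channel."""
    return gamma * (y - mean) / math.sqrt(var + eps) + beta


def fold_batch_norm(weights: List[float], bias: float, gamma: float, beta: float,
                    mean: float, var: float,
                    eps: float = 1e-5) -> Tuple[List[float], float]:
    """Return kernel and bias that reproduce convolution followed by batch norm."""
    scale = gamma / math.sqrt(var + eps)
    folded_weights = [w * scale for w in weights]
    folded_bias = (bias - mean) * scale + beta
    return folded_weights, folded_bias


if __name__ == "__main__":
    w, b, patch = [0.5, -1.0, 2.0], 0.1, [1.0, 2.0, 3.0]
    bn = dict(gamma=1.5, beta=-0.2, mean=0.3, var=4.0)
    fw, fb = fold_batch_norm(w, b, **bn)
    print(conv_output(fw, fb, patch), batch_norm(conv_output(w, b, patch), **bn))
```

After folding, the normalization costs no extra multipliers at run time, since its scale and shift are absorbed into the stored kernel and bias.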


2005 ◽  
Vol 14 (02) ◽  
pp. 347-366 ◽  
Author(s):  
HAIDAR M. HARMANANI ◽  
RONY SALIBA

This paper presents an evolutionary algorithm to solve the datapath allocation problem in high-level synthesis. The method performs allocation of functional units, registers, and multiplexers in addition to controller synthesis with the objective of minimizing the cost of hardware resources. The system handles multicycle functional units as well as structural pipelining. The proposed method was implemented using C++ on a Linux workstation. We tested our method on a set of high-level synthesis benchmarks, all yielding good solutions in a short time. An integration path to Field Programmable Gate Arrays (FPGAs) is provided through VHDL.


2008 ◽  
Vol 2008 ◽  
pp. 1-14
Author(s):  
Johan Ditmar ◽  
Steve McKeever ◽  
Alex Wilson

This paper discusses a pair of synthesis algorithms that optimise a SystemC design to minimise area when targeting FPGAs. Each can significantly improve the synthesis of a high-level language construct, thus allowing a designer to concentrate more on an algorithm description and less on hardware-specific implementation details. The first algorithm is a source-level transformation implementing function exlining—where a separate block of hardware implements a function and is shared between multiple calls to the function. The second is a novel algorithm for mapping arrays to memories which involves assigning array accesses to memory ports such that no port is ever accessed more than once in a clock cycle. This algorithm assigns accesses to read/write only ports and read-write ports concurrently, solving the assignment problem more efficiently for a wider range of memories compared to existing methods. Both optimisations operate on a high-level program representation and have been implemented in a commercial SystemC compiler. Experiments show that in suitable circumstances these techniques result in significant reductions in logic utilisation for FPGAs.
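The constraint behind the array-to-memory mapping can be illustrated for a single clock cycle: each access needs its own port, read-only and write-only ports serve one kind of access, and read-write ports serve either. The greedy sketch below is not the paper's algorithm, which assigns accesses across the whole design; the function name and port labels are hypothetical.

```python
from typing import Dict, List, Optional, Tuple


def assign_ports(accesses: List[Tuple[str, str]],
                 read_only: List[str],
                 write_only: List[str],
                 read_write: List[str]) -> Optional[Dict[str, str]]:
    """Map same-cycle accesses to distinct ports, or return None if impossible.

    accesses: (name, kind) pairs, where kind is "r" for a read or "w" for a write.
    """
    dedicated = {"r": list(read_only), "w": list(write_only)}
    shared = list(read_write)
    assignment: Dict[str, str] = {}
    for name, kind in accesses:
        if dedicated[kind]:
            # Prefer a dedicated port so read-write ports stay free for either kind.
            assignment[name] = dedicated[kind].pop(0)
        elif shared:
            assignment[name] = shared.pop(0)
        else:
            return None  # more accesses in this cycle than usable ports
    return assignment


if __name__ == "__main__":
    cycle = [("a[i]", "r"), ("a[j]", "r"), ("a[k]", "w")]
    print(assign_ports(cycle, ["R0"], [], ["RW0", "RW1"]))
    print(assign_ports(cycle, ["R0"], [], ["RW0"]))
```

Using dedicated ports first is safe within one cycle because a dedicated port could never have served the other kind of access anyway.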


2021 ◽  
Author(s):  
Nafiul Hyder

This work investigates the minimum layout area of multiplexers, a fundamental building block of Field-Programmable Gate Arrays (FPGAs). In particular, we investigate the minimum layout area of 4:1 multiplexers, which are the building blocks of 2-input Look-Up Tables (LUTs) and can be used recursively to build higher-order LUTs and multiplexer-based routing switches. We observe that previous work routes all four data inputs of 4:1 multiplexers on a single metal layer, resulting in a wiring-area-dominated layout. In this work, we explore the various transistor-level placement options for implementing 4:1 multiplexers while routing the multiplexer data inputs through multiple metal layers in order to reduce wiring area. Feasible placement options with their corresponding data input distributions are then routed using an automated maze router, and the routing results are further refined manually. Through this systematic approach, we identified three 4:1 multiplexer layouts that are smaller than the previously proposed layouts by 30% to 35%. In particular, the two larger layouts of the three are only 33% to 45% larger than the layout area predicted by the two widely used active area models from previous FPGA architectural studies, and the smallest of the three layouts is 1% to 11% larger than the layout area predicted by these models.
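The relationship between 4:1 multiplexers and LUTs can be shown with a small behavioural model: a 2-input LUT is a 4:1 multiplexer whose data inputs are four configuration bits, and a 4-input LUT can be composed from four such LUTs plus one more 4:1 multiplexer. This is a functional sketch only and says nothing about layout; the function names are illustrative.

```python
from typing import Sequence


def mux4(data: Sequence[int], s1: int, s0: int) -> int:
    """4:1 multiplexer: select one of four data inputs with two select bits."""
    return data[(s1 << 1) | s0]


def lut2(config: Sequence[int], a: int, b: int) -> int:
    """2-input LUT: the four configuration bits are the multiplexer data inputs."""
    return mux4(config, a, b)


def lut4(config: Sequence[int], a: int, b: int, c: int, d: int) -> int:
    """4-input LUT built from four 2-input LUTs and one more 4:1 multiplexer."""
    stage = [lut2(config[4 * k:4 * k + 4], c, d) for k in range(4)]
    return mux4(stage, a, b)


if __name__ == "__main__":
    xor_config = [0, 1, 1, 0]          # truth table of a XOR b
    and4_config = [0] * 15 + [1]       # truth table of a AND b AND c AND d
    print(lut2(xor_config, 1, 0), lut4(and4_config, 1, 1, 1, 1))
```

Because every LUT level reuses the same 4:1 multiplexer, any area saved in that one cell is repeated throughout the LUT and routing structures.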


Author(s):  
B. Naresh Kumar Reddy ◽  
N. Suresh ◽  
J.V.N. Ramesh

Programming of Field Programmable Gate Arrays (FPGAs) has long been the domain of engineers with VHDL or Verilog expertise. FPGAs have caught the attention of algorithm developers and communication researchers, who want to use them to instantiate systems or implement DSP algorithms. These efforts, however, are often stifled by the complexities of programming FPGAs. RTL programming in either VHDL or Verilog generally does not offer the high level of abstraction needed to represent signal flow graphs and complex signal processing algorithms. This paper describes programming FPGAs with a graphical language rather than Verilog or VHDL, using LabVIEW and the features of the LabVIEW FPGA environment.

