Cost Effective Implementation of Fixed Point Adders for LUT based FPGAs using Technology Dependent Optimizations

Modern field programmable gate arrays (FPGAs) offer large and versatile logic resources, which has shifted their application domain from prototype design to low- and medium-volume production design. Unfortunately, most work on FPGA implementations does not focus on the technology dependent optimizations that can implement a desired functionality at reduced cost. In this paper we consider the mapping of simple ripple carry fixed-point adders (RCA) onto look-up table (LUT) based FPGAs. The objective is to transform the given RCA Boolean network into an optimized circuit netlist that implements the desired functionality at minimum cost. We particularly focus on the 6-input LUTs that are inherent in all modern FPGAs. Technology dependent optimizations are carried out to utilize this FPGA primitive efficiently, and the result is compared against various adder designs. The implementation targets the XC5VLX30-3FF324 device from the Xilinx Virtex-5 FPGA family. The cost of the circuit is expressed in terms of the resources utilized, the critical path delay, and the amount of on-chip power dissipated. Our implementation results show a reduction in resource usage of at least 50%, an increase in speed of at least 10%, and a reduction in dynamic power dissipation of at least 30%. All this is achieved without any technology independent (architectural) modification.
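To make the idea of mapping a ripple carry adder onto 6-input LUTs concrete, the following is a minimal behavioural sketch in Python. It is not the netlist from the paper: the grouping of two adder bits per LUT slice, and the names `make_lut`, `lut_eval`, and `rca_add`, are assumptions chosen only for illustration. Each LUT is modelled as a 64-entry truth table stored in an integer.

```python
def make_lut(func, k=6):
    """Build a k-input LUT as an integer bitmask: bit i stores func(bits of i)."""
    table = 0
    for idx in range(1 << k):
        bits = [(idx >> j) & 1 for j in range(k)]
        if func(*bits):
            table |= 1 << idx
    return table


def lut_eval(table, inputs):
    """Look up the output bit selected by the given list of input bits."""
    idx = 0
    for j, bit in enumerate(inputs):
        idx |= (bit & 1) << j
    return (table >> idx) & 1


def _carry(a, b, c):
    # Carry-out of a single full adder.
    return (a & b) | (c & (a ^ b))


# Hypothetical two-bit adder slice. Inputs: (a0, b0, a1, b1, cin, unused).
SUM0 = make_lut(lambda a0, b0, a1, b1, c, _: a0 ^ b0 ^ c)
SUM1 = make_lut(lambda a0, b0, a1, b1, c, _: a1 ^ b1 ^ _carry(a0, b0, c))
COUT = make_lut(lambda a0, b0, a1, b1, c, _: _carry(a1, b1, _carry(a0, b0, c)))


def rca_add(a, b, width=8):
    """Add two unsigned integers by rippling the carry through LUT slices."""
    assert width % 2 == 0, "this sketch processes two bits per slice"
    carry, result = 0, 0
    for i in range(0, width, 2):
        ins = [(a >> i) & 1, (b >> i) & 1,
               (a >> (i + 1)) & 1, (b >> (i + 1)) & 1, carry, 0]
        result |= lut_eval(SUM0, ins) << i
        result |= lut_eval(SUM1, ins) << (i + 1)
        carry = lut_eval(COUT, ins)
    return result, carry
```

Calling `rca_add(a, b)` returns the `width`-bit sum and the final carry-out. The sketch only checks functional behaviour; it says nothing about the area, delay, or power figures reported above.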

Exploring Shared SRAM Tables in FPGAs for Larger LUTs and Higher Degree of Sharing

International Journal of Reconfigurable Computing ◽

10.1155/2017/7021056 ◽

2017 ◽

Vol 2017 ◽

pp. 1-9 ◽

Cited By ~ 2

Author(s):

Ali Asghar ◽

Muhammad Mazher Iqbal ◽

Waqar Ahmed ◽

Mujahid Ali ◽

Husain Parvez ◽

...

Keyword(s):

High Performance ◽

Critical Path ◽

Path Delay ◽

Gate Arrays ◽

Area Reduction ◽

Area Overhead ◽

Logic Block ◽

Field Programmable ◽

Boolean Matching ◽

Programmable Gate Arrays

In modern SRAM-based Field Programmable Gate Arrays, a Look-Up Table (LUT) is the principal constituent logic element and can realize every possible Boolean function of its inputs. However, this flexibility comes with a heavy area penalty. Part of this area overhead comes from the configuration memory, which grows exponentially with the LUT size. In this paper, we first present a detailed analysis of a previously proposed FPGA architecture that allows sharing of LUT memory (SRAM) tables among NPN-equivalent functions, to reduce the area as well as the number of configuration bits. We then propose several methods to improve the existing architecture. A new clustering technique is proposed which packs NPN-equivalent functions together inside a Configurable Logic Block (CLB). We also make use of a recently proposed high-performance Boolean matching algorithm to perform NPN classification. To enhance area savings further, we evaluate the feasibility of more than two LUTs sharing the same SRAM table. Consequently, this work explores the SRAM table sharing approach for a range of LUT sizes (4–7), while varying the cluster sizes (4–16). Experimental results on the MCNC benchmark circuit set show an overall area reduction of ~7% while maintaining the same critical path delay.
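Two Boolean functions are NPN-equivalent when one can be obtained from the other by negating inputs, permuting inputs, and optionally negating the output. The brute-force sketch below illustrates that definition for small functions. It is not the high-performance Boolean matching algorithm the abstract refers to, and the function name `npn_canonical` is a hypothetical choice. A truth table is stored as an integer whose bit `i` is the function value for input pattern `i`.

```python
from itertools import permutations


def npn_canonical(table, k):
    """Return the smallest truth table in the NPN class of a k-input function.

    Exhaustive search over all input permutations, input negations, and
    output negation; practical only for small k.
    """
    size = 1 << k
    full = (1 << size) - 1
    best = None
    for perm in permutations(range(k)):
        for neg in range(1 << k):          # which inputs to invert
            t = 0
            for idx in range(size):
                src = idx ^ neg            # apply input negation
                dst = 0
                for j in range(k):         # apply input permutation
                    dst |= ((src >> j) & 1) << perm[j]
                if (table >> dst) & 1:
                    t |= 1 << idx
            for cand in (t, t ^ full):     # with and without output negation
                if best is None or cand < best:
                    best = cand
    return best
```

For example, 2-input AND (`0b1000`) and 2-input OR (`0b1110`) map to the same canonical table by De Morgan's law, so in a table-sharing scheme they could in principle be served by one stored table plus inverters, whereas XOR (`0b0110`) falls in a different class.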

Comparative analysis of soft and hard on-chip interconnects for field-programmable gate arrays

IET Computers & Digital Techniques ◽

10.1049/iet-cdt.2011.0169 ◽

2012 ◽

Vol 6 (6) ◽

pp. 396-405 ◽

Cited By ~ 2

Author(s):

J.Y. Hur ◽

M.A. Wahlah ◽

L. Mhamdi ◽

K. Goossens

Keyword(s):

Comparative Analysis ◽

Field Programmable Gate Arrays ◽

Gate Arrays ◽

Field Programmable ◽

Programmable Gate Arrays ◽

On Chip

Soft Core Processor Generated Based on the Machine Code of the Application

Journal of Circuits, Systems and Computers ◽

10.1142/s0218126616500298 ◽

2016 ◽

Vol 25 (04) ◽

pp. 1650029 ◽

Cited By ~ 11

Author(s):

Adam Ziebinski ◽

Stanislaw Swierc

Keyword(s):

Embedded System ◽

Soft Core ◽

Machine Code ◽

Correct Operation ◽

Gate Arrays ◽

Field Programmable ◽

Programmable Gate Arrays ◽

Instruction Set Extensions ◽

The Cost ◽

Application Specific

Embedded system designs currently aim to improve aspects such as speed, energy efficiency, and the cost of an application. Application-specific instruction set extensions on reconfigurable hardware provide such opportunities. The article presents a new approach for generating soft core processors that are optimized for specific tasks. In this work, we describe an automatic method for selecting custom instructions for generating soft core processors, based on the machine code of the application program. As a result, a soft core processor contains only the logic that is absolutely necessary. This solution requires fewer gates to be synthesized in the field programmable gate array (FPGA) and has the potential to increase the speed of the information processing performed by the system in the target FPGA. Experiments have confirmed the correct operation of the method. After the reduction mechanism was enabled, the total number of occupied slices decreased to 47% of its initial value in the best case for the Xilinx Spartan3 (xc3s200), and the maximum frequency increased by approximately 44% in the best case for the Xilinx Spartan6 (xc6slx4).

An Efficient FPGA-Based Convolutional Neural Network for Classification: Ad-MobileNet

Electronics ◽

10.3390/electronics10182272 ◽

2021 ◽

Vol 10 (18) ◽

pp. 2272

Author(s):

Safa Bouguezzi ◽

Hana Ben Fredj ◽

Tarek Belabed ◽

Carlos Valderrama ◽

Hassene Faiedh ◽

...

Keyword(s):

Recognition Rate ◽

Hardware Acceleration ◽

Implementation Model ◽

Gate Arrays ◽

Proposed Model ◽

Field Programmable ◽

Programmable Gate Arrays ◽

Computer Vision Applications ◽

On Chip ◽

Segmentation Image

Convolutional Neural Networks (CNNs) continue to dominate research in the area of hardware acceleration using Field Programmable Gate Arrays (FPGAs), proving their effectiveness in a variety of computer vision applications such as object segmentation, image classification, face detection, and traffic sign recognition, among others. However, there are numerous constraints on deploying CNNs on FPGAs, including limited on-chip memory, CNN size, and configuration parameters. This paper introduces Ad-MobileNet, an advanced CNN model inspired by the baseline MobileNet model. The proposed model uses an Ad-depth engine, which is an improved version of the depth-wise separable convolution unit. Moreover, we propose an FPGA-based implementation model that supports the Mish, TanhExp, and ReLU activation functions. The experimental results using the CIFAR-10 dataset show that our Ad-MobileNet has a classification accuracy of 88.76% while requiring few computational hardware resources. Compared to state-of-the-art methods, our proposed method has a fairly high recognition rate while using fewer computational hardware resources. Indeed, the proposed model reduces hardware resources by more than 41% compared to the baseline model.
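For reference, the three activation functions named in this abstract have simple closed forms: ReLU(x) = max(0, x), Mish(x) = x · tanh(ln(1 + e^x)), and TanhExp(x) = x · tanh(e^x). The floating-point sketch below evaluates them in Python. It is only a software reference under the usual definitions of these functions, not the FPGA implementation described in the paper; the overflow guards and the saturation threshold of 20 are assumptions of this sketch.

```python
import math


def relu(x):
    """Rectified linear unit: max(0, x)."""
    return x if x > 0.0 else 0.0


def softplus(x):
    """Numerically stable ln(1 + e^x)."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def mish(x):
    """Mish activation: x * tanh(softplus(x))."""
    return x * math.tanh(softplus(x))


def tanhexp(x):
    """TanhExp activation: x * tanh(e^x).

    For x > 20, tanh(e^x) equals 1.0 in double precision, so the
    function returns x directly and avoids overflow in exp().
    """
    if x > 20.0:
        return x
    return x * math.tanh(math.exp(x))
```

All three functions pass through the origin and approach the identity for large positive inputs. Mish and TanhExp differ from ReLU mainly in being smooth and slightly negative for small negative inputs.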

Hardware synthesis of artificial neural networks using field programmable gate arrays and fixed-point numbers

2006 IEEE Region 5 Conference ◽

10.1109/tpsd.2006.5507410 ◽

2006 ◽

Cited By ~ 2

Author(s):

Mychal Hoffman ◽

Paul Bauer ◽

Brian Hemmelman ◽

Abul Hasan

Keyword(s):

Neural Networks ◽

Fixed Point ◽

Artificial Neural Networks ◽

Field Programmable Gate Arrays ◽

Hardware Synthesis ◽

Gate Arrays ◽

Field Programmable ◽

Programmable Gate Arrays ◽

Artificial Neural

Low-Complexity Nonlinear Self-Inverse Permutation for Creating Physically Clone-Resistant Identities

Cryptography ◽

10.3390/cryptography4010006 ◽

2020 ◽

Vol 4 (1) ◽

pp. 6 ◽

Cited By ~ 1

Author(s):

Saleh Mulhem ◽

Ayoub Mars ◽

Wael Adi

Keyword(s):

Field Programmable Gate Arrays ◽

Low Complexity ◽

System On Chip ◽

Large Classes ◽

Physical Unclonable Functions ◽

Gate Arrays ◽

Field Programmable ◽

Programmable Gate Arrays ◽

Security Levels ◽

On Chip

New large classes of permutations over ℤ_{2^n} based on T-Functions as Self-Inverting Permutation Functions (SIPFs) are presented. The presented classes exhibit negligible or low complexity when implemented in emerging FPGA technologies. The target use of such functions is in creating the so-called Secret Unknown Ciphers (SUCs) to serve as resilient clone-resistant structures in smart non-volatile Field Programmable Gate Array (FPGA) devices. SUC concepts were proposed a decade ago as digital, consistent alternatives to the conventional analog, inconsistent Physical Unclonable Functions (PUFs). The proposed permutation classes are designed and optimized particularly to use non-consumed Mathblock cores in programmable System-on-Chip (SoC) FPGA devices. Hardware and software complexities for realizing such structures are optimized and evaluated for a sample expected target FPGA technology. The attained security levels of the resulting SUCs are evaluated and shown to be scalable and usable even for post-quantum crypto systems.
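A T-function is a mapping in which output bit i depends only on input bits 0 through i, and a self-inverting permutation satisfies f(f(x)) = x. The Python sketch below shows one simple way to obtain both properties at once: the low half of the word passes through unchanged and is used to mask the high half by XOR. This construction, the name `sipf`, and the constant 5 are assumptions made purely for illustration. They are not the permutation classes proposed in the paper.

```python
def sipf(x, n=16):
    """Toy self-inverse T-function on n-bit words (n must be even).

    The low half is left unchanged; the high half is XORed with a
    nonlinear function of the low half. Applying the map twice cancels
    the XOR, so the map is its own inverse.
    """
    assert n % 2 == 0 and 0 <= x < (1 << n)
    half = n // 2
    low_mask = (1 << half) - 1
    low = x & low_mask
    high = x >> half
    # Hypothetical nonlinear mixing of the low half (multiply and add).
    h = (low * low + 5) & low_mask
    return ((high ^ h) << half) | low
```

Because the mask `h` depends only on bits that the map does not modify, the involution property holds for any choice of mixing function, which is what makes this kind of structure cheap to vary.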

Efficient Use of On-Chip Memories and Scheduling Techniques to Eliminate the Reconfiguration Overheads in Reconfigurable Systems

Journal of Circuits, Systems and Computers ◽

10.1142/s0218126619502463 ◽

2019 ◽

Vol 28 (14) ◽

pp. 1950246

Author(s):

I. Hariharan ◽

M. Kannan

Keyword(s):

System Performance ◽

Replacement Policy ◽

Reconfigurable Systems ◽

Gate Arrays ◽

Field Programmable ◽

Programmable Gate Arrays ◽

Configuration Data ◽

On Chip ◽

Time And Energy ◽

Static Systems

Modern embedded systems are packed with dedicated Field Programmable Gate Arrays (FPGAs) to accelerate overall system performance. However, FPGAs are susceptible to reconfiguration overheads. These overheads arise mainly because the configuration data is fetched from off-chip memory at run-time, and also because of improper management of tasks during execution. To reduce these overheads, our proposed methodology focuses on a prefetch heuristic, a reuse technique, and the available memory hierarchy to provide an efficient mapping of tasks over the available memories. Our paper includes a new replacement policy which reduces the overall time and energy reconfiguration overheads for static systems in their subsequent iterations. The results show that most of the reconfiguration overheads are eliminated when the applications are managed and executed based on our methodology.

New Mathblocks-Based Feistel-Like Ciphers for Creating Clone-Resistant FPGA Devices

Cryptography ◽

10.3390/cryptography3040028 ◽

2019 ◽

Vol 3 (4) ◽

pp. 28 ◽

Cited By ~ 2

Author(s):

Saleh Mulhem ◽

Wael Adi

Keyword(s):

Hard Core ◽

Major Elements ◽

Physical Unclonable Functions ◽

Mapping Functions ◽

The Public ◽

Gate Arrays ◽

Field Programmable ◽

Programmable Gate Arrays ◽

On Chip ◽

Near Future

The Secret Unknown Cipher (SUC) concept was introduced a decade ago as a promising technique for creating purely digital clone-resistant electronic units as alternatives to the traditional non-consistent Physical Unclonable Functions (PUFs). In this work, a very special unconventional cipher design is presented. The design uses hard-core Mathblocks available in modern system-on-chip (SoC) Field Programmable Gate Arrays (FPGAs). Such Mathblocks are often not completely used in many FPGA applications; therefore, it seems wise to make use of such dead (unused) modules to fabricate usable physical security functions for free. Standard cipher designs usually avoid deploying multipliers in the cipher mapping functions due to their high complexity. The main target of this work is to design large cipher classes (e.g., cipher class size > 2^600) by mainly deploying the FPGA-specific mathematical cores. The proposed cipher designs are novel, hardware-oriented, and new to the public literature, using entirely new and unusual mapping functions. If a random unknown selection of one cipher out of 2^600 ciphers is self-configured in a device, then a Secret Unknown Cipher module is created within the device, making it physically hard to clone. We consider the cipher module to be free (zero cost) if its major elements make use of unused, reanimated Mathblocks. Such ciphers are usable in many future mass products for protecting vehicular units against cloning and modeling attacks. The self-reconfigurable devices required for this concept are not available now; however, they are expected to emerge in the near future.
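A Feistel structure is attractive for this kind of design because it stays invertible no matter what round function is plugged in, so a multiplier can be used freely inside it. The Python sketch below illustrates that general property with a 32-bit balanced Feistel network whose round function is a 16-bit multiply-add. The round function, word sizes, key values, and the names `round_fn`, `encrypt`, and `decrypt` are all hypothetical; this is not the cipher proposed in the paper and has no claimed security.

```python
MASK16 = 0xFFFF


def round_fn(half, key):
    """Hypothetical multiply-add round function on 16-bit words."""
    return (half * (key | 1) + (key >> 3)) & MASK16


def encrypt(block, keys):
    """Encrypt a 32-bit block with one Feistel round per key."""
    left, right = (block >> 16) & MASK16, block & MASK16
    for key in keys:
        left, right = right, left ^ round_fn(right, key)
    return (left << 16) | right


def decrypt(block, keys):
    """Invert encrypt() by undoing the rounds in reverse key order."""
    left, right = (block >> 16) & MASK16, block & MASK16
    for key in reversed(keys):
        left, right = right ^ round_fn(left, key), left
    return (left << 16) | right
```

With an illustrative key list such as `[0x1234, 0xBEEF, 0x0F0F, 0xA5A5]`, `decrypt(encrypt(b, keys), keys)` returns `b` for every 32-bit block, even though the multiply-add itself is not an invertible function of the key and data together.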

FPGA Prototyping of Micro-Blaze soft-processor based Multi-core System on Chip

International Journal of Engineering & Technology ◽

10.14419/ijet.v7i2.16.11416 ◽

2018 ◽

Vol 7 (2.16) ◽

pp. 57

Author(s):

G Prasad Acharya ◽

M Asha Rani

Keyword(s):

Computer Aided Design ◽

Processing System ◽

System On Chip ◽

Core System ◽

Design Cycle ◽

Field Programmable ◽

Programmable Gate Arrays ◽

On Chip ◽

Aided Design ◽

Level Parallelism

The increased demand for processor-level parallelism has multiplied the challenges SoC designers face in designing, simulating, and verifying/validating today’s multi-core System-on-Chip (SoC) designs, owing to the increased system complexity. There is also a need to reduce the design cycle time for a complex multi-core SoC so that the product can be brought to market within an affordable time. Computer-Aided Design (CAD) tools and Field Programmable Gate Arrays (FPGAs) provide a solution for rapidly prototyping and validating the system. This paper presents an implementation of a multi-core SoC consisting of 6 Xilinx Micro-Blaze soft-core processors integrated with the Zynq Processing System (PS) using IP Integrator; the cores communicate through an AXI bus. The functionality of the system is verified using the Micro-Blaze system debugger. The hardware framework for the system is implemented and verified on an FPGA.

SS-OCT and FD-OCT System Prototyping Using LabVIEW FPGA

10.1115/biomed2011-66007 ◽

2011 ◽

Author(s):

Zach Olson

Keyword(s):

Real Time ◽

Graphics Processing Units ◽

Image Data ◽

Light Sources ◽

Field Programmable ◽

Programmable Gate Arrays ◽

Labview Fpga ◽

And Performance ◽

Control Light ◽

The Cost

Optical coherence tomography (OCT) techniques have opened up a number of new medical imaging applications in research and clinical settings. Key application areas include cancer research, vascular applications such as imaging arterial plaque, and ophthalmology applications such as pre- and post-operative cataract surgery imaging. Emerging technologies in galvo control, light sources, detector technologies, and parallel hardware-based processing are increasing the quality and performance of images, as well as reducing the cost and footprint of OCT systems. The parallel computing capabilities of field programmable gate arrays (FPGAs), multi-core processors, and graphics processing units (GPUs) have enabled real-time OCT image processing, which provides real-time image data to support surgical procedures.
