floating point unit
Recently Published Documents


TOTAL DOCUMENTS: 135 (FIVE YEARS 22)

H-INDEX: 13 (FIVE YEARS 1)

2021 ◽  
Vol 11 (23) ◽  
pp. 11164
Author(s):  
Luca Mocerino ◽  
Andrea Calimera

The reduction in energy consumption is key for deep neural networks (DNNs) to ensure usability and reliability, whether they are deployed on low-power end-nodes with limited resources or high-performance platforms that serve large pools of users. By leveraging the over-parametrization shown by many DNN models, convolutional neural networks (ConvNets) in particular, energy efficiency can be improved substantially while preserving model accuracy. The solution proposed in this work exploits the intrinsic redundancy of ConvNets to maximize the reuse of partial arithmetic results during the inference stages. Specifically, the weight-set of a given ConvNet is discretized through a clustering procedure such that the largest possible number of inner multiplications fall into predefined bins; this allows an off-line computation of the most frequent results, which in turn can be stored locally and retrieved when needed during the forward pass. Such a reuse mechanism leads to remarkable energy savings with the aid of a custom processing element (PE) that integrates an associative memory with a standard floating-point unit (FPU). Moreover, the adoption of an approximate associative rule based on a partial bit-match increases the hit rate over the pre-computed results, maximizing the energy reduction even further. Results collected on a set of ConvNets trained for computer vision and speech processing tasks reveal that the proposed associative-based hw-sw co-design achieves up to 77% energy savings with less than 1% accuracy loss.
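
The reuse mechanism described in this abstract can be illustrated with a small software model. This is only a sketch under stated assumptions, not the authors' processing element: the class name `ReuseMultiplier`, the nearest-centroid quantization, the `dropped_bits` parameter, and the on-demand filling of the table (the abstract describes off-line pre-computation) are all hypothetical simplifications.

```python
import struct


def float_to_bits(x):
    """Return the IEEE 754 single-precision bit pattern of x as an int."""
    return struct.unpack("<I", struct.pack("<f", x))[0]


def partial_key(x, dropped_bits):
    """Partial bit-match key: ignore the lowest `dropped_bits` bits of x."""
    return float_to_bits(x) >> dropped_bits


class ReuseMultiplier:
    """Hypothetical model of a lookup table placed in front of a multiplier."""

    def __init__(self, centroids, dropped_bits=12):
        self.centroids = centroids        # clustered weight values (assumed given)
        self.dropped_bits = dropped_bits  # how approximate the match is
        self.table = {}                   # stands in for the associative memory
        self.hits = 0
        self.misses = 0

    def quantize(self, w):
        """Map a weight to the index of its nearest centroid."""
        return min(range(len(self.centroids)),
                   key=lambda i: abs(self.centroids[i] - w))

    def multiply(self, w, a):
        """Return centroid(w) * a, reusing a stored result on a key match."""
        idx = self.quantize(w)
        key = (idx, partial_key(a, self.dropped_bits))
        if key in self.table:
            self.hits += 1
            return self.table[key]        # reused (possibly approximate) result
        self.misses += 1
        result = self.centroids[idx] * a  # falls back to the "FPU"
        self.table[key] = result
        return result
```

With more dropped bits, more activations share a key, so the hit rate rises while the returned product becomes less exact; that is the accuracy-versus-energy trade-off the abstract refers to.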


Author(s):  
Shruthi . ◽  
Jamuna S

RISC-V is an open, free standard instruction set architecture (ISA). Because it is open source, it can be used in many applications, such as embedded processors, IoT, artificial intelligence, machine learning, and military and defense systems. Parameters such as throughput, performance, and speed are essential in designing a processor architecture. Pipelining is one technique commonly used in RISC-V implementations: it overlaps the execution of multiple instructions so that several are in flight in the same cycle, which improves processor performance. The pipeline considered here has five stages: instruction fetch, instruction decode, execute, memory, and write-back. The work covered in this paper involves the design and implementation of subsystems of a RISC-V processor that sit in different stages of the pipeline. The subsystems included are the Floating Point Unit (FPU) of the execute stage, the Branch Prediction Unit (BPU) of the instruction fetch stage, the Forwarding Unit of the execute stage, the Operand Logic of the decode stage, and the floating-point register file of the write-back stage. These subsystems are designed in Verilog Hardware Description Language using Xilinx ISE. Following implementation, the floating-point unit and the forwarding unit are verified using SystemVerilog Assertions in QuestaSim, and the assertion coverage report is extracted.
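
As a rough illustration of what a forwarding unit decides, the sketch below models the textbook forwarding conditions for a five-stage pipeline. It is not taken from the paper: the function name `forward_select`, its arguments, and the returned labels are hypothetical, and the register-zero check applies to the integer register file (x0), not to floating-point registers.

```python
def forward_select(rs, ex_mem_rd, ex_mem_regwrite, mem_wb_rd, mem_wb_regwrite):
    """Choose where the execute stage should read source register `rs` from.

    rs               -- source register number of the instruction in execute
    ex_mem_rd        -- destination register of the instruction one stage ahead
    ex_mem_regwrite  -- True if that instruction writes a register
    mem_wb_rd        -- destination register of the instruction two stages ahead
    mem_wb_regwrite  -- True if that instruction writes a register
    """
    # The most recent producer wins, so the EX/MEM stage is checked first.
    if ex_mem_regwrite and ex_mem_rd != 0 and ex_mem_rd == rs:
        return "EX/MEM"
    # Otherwise take the older result that is about to be written back.
    if mem_wb_regwrite and mem_wb_rd != 0 and mem_wb_rd == rs:
        return "MEM/WB"
    # No hazard: the value in the register file is already up to date.
    return "REGFILE"
```

Checking the EX/MEM stage before MEM/WB matters when two in-flight instructions write the same register: the younger result is the one the dependent instruction must see.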


2021 ◽  
Vol 17 (3) ◽  
pp. 1-24
Author(s):  
Ioannis Tsiokanos ◽  
Jack Miskelly ◽  
Chongyan Gu ◽  
Maire O’neill ◽  
Georgios Karakonstantis

In recent years, physical unclonable functions (PUFs) have gained a lot of attention as mechanisms for hardware-rooted device authentication. While the majority of the previously proposed PUFs derive entropy using dedicated circuitry, software PUFs achieve this from existing circuitry in a system. Such software-derived designs are highly desirable for low-power embedded systems as they require no hardware overhead. However, these software PUFs induce considerable processing overheads that hinder their adoption in resource-constrained devices. In this article, we propose DTA-PUF, a novel, software PUF design that exploits the instruction- and data-dependent dynamic timing behaviour of pipelined cores to provide a reliable challenge-response mechanism without requiring any extra hardware. DTA-PUF accepts sequences of instructions as an input challenge and produces an output response based on the manifested timing errors under specific over-clocked settings. To lower the required processing effort, we systematically select instruction sequences that maximise error-rate. The application to a post-layout pipelined floating-point unit, which is implemented in 45 nm process technology, demonstrates the effectiveness and practicability of our PUF design. Finally, DTA-PUF requires up to 50× fewer instructions than existing software processor PUF designs, limiting processing costs and resulting in up to 26% power savings.
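
The challenge-response idea can be pictured with a toy model. Everything here is an assumption made for illustration and not the DTA-PUF design itself: each device is reduced to a list of per-path delays drawn from a seeded random generator, a challenge is a sequence of path indices standing in for instructions, and a response bit is 1 when the exercised path is slower than the over-clocked period.

```python
import random


def make_device(seed, n_paths=8, nominal=1.0, spread=0.1):
    """Hypothetical device: per-path delays that vary with manufacturing."""
    rng = random.Random(seed)  # the seed stands in for process variation
    return [rng.gauss(nominal, spread) for _ in range(n_paths)]


def response(device, challenge, clock_period):
    """Return one bit per challenge element: 1 marks a timing error."""
    return [1 if device[path] > clock_period else 0 for path in challenge]


# Usage: the same challenge applied to one device at one clock setting.
device = make_device(seed=1)
challenge = [0, 3, 5, 7, 2]
bits = response(device, challenge, clock_period=1.0)
```

In this toy model, a shorter clock period makes more paths fail, which loosely mirrors why the abstract selects instruction sequences that maximise the error rate.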


Author(s):  
Kishan Maladkar

A floating point unit (FPU) is a math co-processor in high demand in digital signal processing (DSP), general-purpose processors, and other applications. It performs operations on floating point numbers, such as addition, subtraction, multiplication, division, and square root. It is designed specifically to carry out these mathematical operations, which can otherwise be emulated in software on the CPU. Floating point arithmetic is common in advanced digital signal processing and various processor applications. The aim was to develop an optimized floating point unit with reduced delay and increased efficiency. The unit follows the IEEE 754 standard, and the entire design is coded in Verilog HDL. Using a Vedic multiplier improves the results by 12%, giving a delay of 4.450 ns compared with 5.123 ns for an array multiplier. The designs can be further optimized using low-power design techniques at the architectural level. Different behaviour can be observed for different sizes and technologies.
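
For readers unfamiliar with Vedic multipliers, the sketch below models the commonly described structure in software: a 2x2 block built from AND and XOR operations (the Urdhva Tiryagbhyam "vertical and crosswise" pattern), composed recursively into wider multipliers. It is an assumption-laden illustration rather than the paper's Verilog design; the function names and the power-of-two `width` restriction are choices made here.

```python
def vedic_2x2(a, b):
    """Multiply two 2-bit values using only bitwise gates."""
    a0, a1 = a & 1, (a >> 1) & 1
    b0, b1 = b & 1, (b >> 1) & 1
    p0 = a0 & b0               # vertical product of the low bits
    x, y = a1 & b0, a0 & b1    # crosswise products
    p1 = x ^ y                 # half-adder sum
    c1 = x & y                 # half-adder carry
    z = a1 & b1                # vertical product of the high bits
    p2 = z ^ c1
    p3 = z & c1
    return p0 | (p1 << 1) | (p2 << 2) | (p3 << 3)


def vedic_multiply(a, b, width):
    """Multiply two unsigned `width`-bit values; width is a power of two >= 2."""
    if width == 2:
        return vedic_2x2(a, b)
    half = width // 2
    mask = (1 << half) - 1
    a_lo, a_hi = a & mask, a >> half
    b_lo, b_hi = b & mask, b >> half
    # Four half-width partial products, as in the hardware block diagram.
    ll = vedic_multiply(a_lo, b_lo, half)
    lh = vedic_multiply(a_lo, b_hi, half)
    hl = vedic_multiply(a_hi, b_lo, half)
    hh = vedic_multiply(a_hi, b_hi, half)
    # Shifted additions combine the partial products into the full result.
    return ll + ((lh + hl) << half) + (hh << width)
```

In hardware the four partial products are computed in parallel, which is the usual argument for the shorter delay; the software model only checks that the decomposition is arithmetically correct.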


2021 ◽  
Vol 18 (3) ◽  
pp. 1-26
Author(s):  
Sugandha Tiwari ◽  
Neel Gala ◽  
Chester Rebeiro ◽  
V. Kamakoti

Owing to the failure of Dennard’s scaling, the past decade has seen a steep growth of prominent new paradigms leveraging opportunities in computer architecture. Two technologies of interest are Posit and RISC-V. Posit was introduced in mid-2017 as a viable alternative to IEEE-754, and RISC-V provides a commercial-grade open source Instruction Set Architecture (ISA). In this article, we bring these two technologies together and propose a Configurable Posit Enabled RISC-V Core called PERI. The article provides insights on how the Single-Precision Floating Point (“F”) extension of RISC-V can be leveraged to support posit arithmetic. We also present the implementation details of a parameterized and feature-complete posit Floating Point Unit (FPU). The configurability and the parameterization features of this unit generate optimal hardware, which caters to the accuracy and energy/area tradeoffs imposed by the applications, a feature not possible with an IEEE-754 implementation. The posit FPU has been integrated with the RISC-V compliant SHAKTI C-class core as an execution unit. To further leverage the potential of posit, we enhance our posit FPU to support two different exponent sizes (with the posit size being 32 bits), thereby enabling multiple precision at runtime. To enable the compilation and execution of C programs on PERI, we have made minimal modifications to the GNU C Compiler (GCC), targeting the “F” extension of RISC-V. We compare posit with IEEE-754 in terms of hardware area, application accuracy, and runtime. We also present an alternate methodology of integrating the posit FPU with the RISC-V core as an accelerator using the custom opcode space of RISC-V.
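
To make the role of the exponent size concrete, the sketch below decodes a posit bit pattern into a Python float following the general posit format (sign, regime, exponent, fraction). It is a software illustration only and says nothing about how the PERI hardware is built; the function name `decode_posit` and the default values `n=32` and `es=2` are assumptions, since the abstract does not state which two exponent sizes are supported.

```python
def decode_posit(bits, n=32, es=2):
    """Decode an n-bit posit with es exponent bits into a float."""
    mask = (1 << n) - 1
    bits &= mask
    if bits == 0:
        return 0.0
    if bits == 1 << (n - 1):
        return float("nan")          # NaR ("not a real")
    sign = bits >> (n - 1)
    if sign:
        bits = (-bits) & mask        # negative posits are two's complement
    # Everything after the sign bit, as a string of '0'/'1' characters.
    body = format(bits & ((1 << (n - 1)) - 1), "0{}b".format(n - 1))
    first = body[0]
    run = len(body) - len(body.lstrip(first))   # regime run length
    k = run - 1 if first == "1" else -run
    rest = body[run + 1:]            # skip the bit that terminates the regime
    exp_bits = rest[:es].ljust(es, "0")         # missing bits count as zero
    frac_bits = rest[es:]
    e = int(exp_bits, 2) if es else 0
    f = int(frac_bits, 2) / (1 << len(frac_bits)) if frac_bits else 0.0
    value = 2.0 ** (k * (1 << es) + e) * (1.0 + f)
    return -value if sign else value
```

The same bit pattern decodes to different values under different `es` settings: a larger `es` widens the dynamic range at the cost of fraction bits, which is the trade-off behind supporting more than one exponent size at runtime.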


2020 ◽  
Vol 1716 ◽  
pp. 012047
Author(s):  
S Sushma ◽  
Smruthi Koushika Ravindran ◽  
Pavan Rajendar Nadagoudar ◽  
P. Augusta Sophy

Author(s):  
Mohammed Falih Hassan ◽  
Karime Farhood Hussein ◽  
Bahaa Al-Musawi

Due to growing demand for high-performance applications that require high numerical stability and accuracy, the need for floating-point support on FPGAs has increased. In this work, an open-source and efficient floating-point unit is implemented on a standard Xilinx Spartan-6 FPGA platform. The proposed design is described hierarchically, starting from functional block descriptions and moving toward module-level design. Our implementation uses minimal resources on the target FPGA board; it was tested on the Spartan-6 FPGA platform and verified in ModelSim. The open-source framework can be embedded or customized for low-cost FPGA devices that do not offer floating-point units.


Author(s):  
Andres Gersnoviez ◽  
Maria Brox ◽  
Carlos Castillo-Marquez ◽  
Miguel A. Montijano-Vizcaino ◽  
Manuel A. Ortiz-Lopez ◽  
...  

Author(s):  
Adrián Stacul ◽  
Daniel Pastafiglia ◽  
Ariel Di Giovanni ◽  
Martín Morales ◽  
Sergio Saluzzi ◽  
...  

The Institute of Scientific and Technical Research for Defense in Argentina (Instituto de Investigaciones Científicas y Técnicas para la Defensa - CITEDEF) is developing a processing hardware module based on an ARM Cortex-M4 processor from STMicroelectronics. The microcontroller (MCU) can run at a maximum clock frequency of 180 MHz and integrates a Floating Point Unit (FPU). An 8 MB SDRAM was included for dynamic data allocation. This hardware will host and process the algorithms that calculate and determine the nanosatellite’s attitude. The module is intended to be CubeSat compatible, to have a flexible design, to handle various inertial sensors, and to manage backups on microSD memory cards with sizes up to 32 GB.

