approximate computing Latest Research Papers

Plasticine: A Cross-layer Approximation Methodology for Multi-kernel Applications through Minimally Biased, High-throughput, and Energy-efficient SIMD Soft Multiplier-divider

ACM Transactions on Design Automation of Electronic Systems ◽

10.1145/3486616 ◽

2022 ◽

Vol 27 (2) ◽

pp. 1-33

Author(s):

Zahra Ebrahimi ◽

Dennis Klar ◽

Mohammad Aasim Ekhtiyar ◽

Akash Kumar

Keyword(s):

High Throughput ◽

Performance Metrics ◽

Synergistic Effects ◽

Rapid Evolution ◽

Future Research ◽

Cross Layer ◽

Approximate Computing ◽

Approximation Techniques ◽

End To End ◽

A Chain

The rapid evolution of error-resilient programs, intertwined with their quest for high throughput, has motivated the use of Single Instruction, Multiple Data (SIMD) components in Field-Programmable Gate Arrays (FPGAs). In particular, to exploit the error resiliency of such applications, the cross-layer approximation paradigm has recently gained traction; its ultimate goal is to efficiently exploit approximation potential across layers of abstraction. From circuit to application level, valuable studies have proposed various approximation techniques, albeit with four drawbacks. First, most approximate multipliers and dividers operate only in SISD mode. Second, imprecise units are often substituted in only a single kernel of a multi-kernel application, with an end-to-end analysis of Quality of Results (QoR) but not of the gained performance. Third, state-of-the-art (SoA) strategies neglect the fact that each kernel contributes differently to the end-to-end QoR and performance metrics; they therefore lack a generic methodology for adjusting the approximation knobs to maximize performance gains under a user-defined quality constraint. Finally, multi-level techniques are not efficiently supported, from application to architecture to circuit level, in a cohesive cross-layer hierarchy. In this article, we propose Plasticine, a cross-layer methodology for multi-kernel applications, which addresses the aforementioned challenges by efficiently utilizing the synergistic effects of a chain of techniques across layers of abstraction. To this end, we propose an application sensitivity analysis and a heuristic that tailor the precision of the application's constituent kernels by finding the most tolerable degree of approximation for each consecutive kernel while still satisfying the ultimate user-defined QoR. The chain of approximations is also effectively enabled in a cross-layer hierarchy, from application to architecture to circuit level, through the plasticity of SIMD multiplier-dividers, each supporting dynamic precision variability along with hybrid functionality. The end-to-end evaluations of Plasticine on three multi-kernel applications employed in bio-signal processing, image processing, and moving object tracking for Unmanned Air Vehicles (UAV) demonstrate 41%–64%, 39%–62%, and 70%–86% improvements in area, latency, and Area-Delay-Product (ADP), respectively, over 32-bit fixed precision, with negligible loss in QoR. To springboard future research in the reconfigurable and approximate computing communities, our implementations will be available and open-sourced at https://cfaed.tu-dresden.de/pd-downloads.

Double-Shift: A Low-Power DNN Weights Storage and Access Framework based on Approximate Decomposition and Quantization

ACM Transactions on Design Automation of Electronic Systems ◽

10.1145/3477047 ◽

2022 ◽

Vol 27 (2) ◽

pp. 1-16

Author(s):

Ming Han ◽

Ye Wang ◽

Jian Dong ◽

Gang Qu

Keyword(s):

Energy Consumption ◽

Low Power ◽

Energy Cost ◽

Computing Methodology ◽

High Energy ◽

Classification Error ◽

Approximate Computing ◽

Storage Allocation ◽

Original Size ◽

Iot Devices

One major challenge in deploying Deep Neural Networks (DNNs) in resource-constrained applications, such as edge nodes, mobile embedded systems, and IoT devices, is their high energy cost. The emerging approximate computing methodology can effectively reduce the energy consumed by computation in a DNN. However, a recent study shows that weight storage and access operations can dominate a DNN's energy consumption, because the large volume of DNN weights must be stored in high-energy-cost DRAM. In this paper, we propose Double-Shift, a low-power DNN weight storage and access framework, to solve this problem. Enabled by approximate decomposition and quantization, Double-Shift can reduce the data size of the weights effectively. By designing a novel weight storage allocation strategy, Double-Shift can boost energy efficiency by trading the energy-consuming weight storage and access operations for low-energy-cost computations. Our experimental results show that Double-Shift can reduce DNN weights to 3.96%–6.38% of the original size and achieve an energy saving of 86.47%–93.62%, while introducing a DNN classification error within 2%.

Leveraging Automatic High-Level Synthesis Resource Sharing to Maximize Dynamical Voltage Overscaling with Error Control

ACM Transactions on Design Automation of Electronic Systems ◽

10.1145/3473909 ◽

2022 ◽

Vol 27 (2) ◽

pp. 1-18

Author(s):

Prattay Chowdhury ◽

Benjamin Carrion Schafer

Keyword(s):

Resource Sharing ◽

Error Control ◽

Supply Voltage ◽

Maximum Error ◽

Error Threshold ◽

Training Data ◽

High Level Synthesis ◽

Approximate Computing ◽

Workload Distribution ◽

High Level

Approximate computing has emerged as an alternative way to further reduce the power consumption of integrated circuits (ICs) by trading off errors at the output for simpler, more efficient logic. So far, the main approach in approximate computing has been to simplify the hardware circuit by pruning it until the maximum error threshold is met. One of the critical issues, though, is the training data used to prune the circuit. The output error can significantly exceed the maximum error if the final workload does not match the training data. Most previous work therefore simply assumes that the training data matches the workload data distribution. In this work, we present a method that dynamically overscales the supply voltage based on different workload distributions at runtime. This allows the supply voltage that leads to the largest power savings to be selected adaptively while ensuring that the error never exceeds the maximum error threshold. This approach also allows the original error-free circuit to be restored if no matching workload distribution is found. The proposed method also leverages the ability of High-Level Synthesis (HLS) to automatically generate circuits with different properties by setting different synthesis constraints to maximize the available timing slack and, hence, maximize the power savings. Experimental results show that our proposed method saves on average 47.08% of power compared to the exact output circuit and 20.25% more than a traditional approximation method.
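The runtime selection step described in this abstract can be pictured with a small sketch. It is an illustration only, not the authors' implementation: the function name `select_voltage`, the layout of `error_table` (workload profile name mapped to per-voltage worst-case error, assumed to be characterized offline), and the fallback policy are all assumptions made for this example.

```python
def select_voltage(profile, error_table, max_error, nominal_voltage):
    """Pick the lowest supply voltage whose characterized error stays within bounds.

    error_table is a hypothetical structure: {profile_name: {voltage: worst_case_error}}.
    If the observed workload profile is unknown, or no overscaled voltage is safe,
    fall back to the nominal voltage (the error-free circuit).
    """
    candidates = error_table.get(profile)
    if candidates is None:
        # No matching workload distribution: restore error-free operation.
        return nominal_voltage
    # Keep only the voltages whose error does not exceed the threshold.
    safe = [v for v, err in candidates.items() if err <= max_error]
    # A lower voltage means larger power savings, so take the minimum safe one.
    return min(safe) if safe else nominal_voltage
```

The sketch treats the error threshold as a hard constraint and power savings as the objective, which mirrors the guarantee stated above that the error never exceeds the maximum threshold.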

Accurate reliability analysis methods for approximate computing circuits

Tsinghua Science & Technology ◽

10.26599/tst.2020.9010032 ◽

2022 ◽

Vol 27 (4) ◽

pp. 729-740

Author(s):

Zhen Wang ◽

Guofa Zhang ◽

Jing Ye ◽

Jianhui Jiang ◽

Fengyong Li ◽

...

Keyword(s):

Reliability Analysis ◽

Approximate Computing ◽

Analysis Methods

Approximate Computing Circuits for Embedded Tactile Data Processing

Electronics ◽

10.3390/electronics11020190 ◽

2022 ◽

Vol 11 (2) ◽

pp. 190

Author(s):

Mario Osta ◽

Ali Ibrahim ◽

Maurizio Valle

Keyword(s):

Energy Consumption ◽

Support Vector ◽

Approximate Computing ◽

Energy Consumption Reduction ◽

Consumption Reduction ◽

Kernel Approach ◽

Computational Bottleneck ◽

Value Decomposition ◽

The Cost ◽

The Impact

In this paper, we demonstrate the feasibility and efficiency of approximate computing techniques (ACTs) in the embedded Support Vector Machine (SVM) tensorial kernel circuit implementation in tactile sensing systems. Singular Value Decomposition (SVD) is the main computational bottleneck of the tensorial kernel approach, and digital multipliers are used extensively in its implementation. The performance of the embedded SVM in terms of power, area, and delay can therefore be improved by using approximate multipliers in the SVD, so we aim to optimize the implementation of the multiplier circuit. We present the implementation of the approximate SVD circuit based on the Approximate Baugh-Wooley (Approx-BW) multiplier. The approximate SVD achieves an energy consumption reduction of up to 16% at the cost of a Mean Relative Error (MRE) of less than 5%. We assess the impact of the approximate SVD on classification accuracy, showing that the approximate SVD increases the error rate (Err) by one to eight percent. In addition, we propose a hybrid evaluation test approach that consists of implementing three different approximate SVD circuits having different numbers of approximated Least Significant Bits (LSBs). The results show that energy consumption is reduced by more than five percent with the same accuracy loss.
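To make the trade-off between approximated LSBs and Mean Relative Error concrete, here is a minimal sketch. It does not model the Approx-BW multiplier from the paper; as a stand-in it uses plain operand truncation on unsigned integers (clearing the lowest bits before multiplying), and the names `approx_multiply` and `mean_relative_error` are hypothetical.

```python
def approx_multiply(a, b, approx_lsbs):
    """Multiply two non-negative integers after clearing their lowest approx_lsbs bits.

    Operand truncation is a generic stand-in here, not the Approx-BW design.
    """
    mask = ~((1 << approx_lsbs) - 1)
    return (a & mask) * (b & mask)


def mean_relative_error(approx_lsbs, operand_pairs):
    """Average |exact - approx| / exact over all pairs with a non-zero exact product."""
    errors = []
    for a, b in operand_pairs:
        exact = a * b
        if exact == 0:
            continue  # relative error is undefined for a zero reference value
        errors.append(abs(exact - approx_multiply(a, b, approx_lsbs)) / exact)
    return sum(errors) / len(errors) if errors else 0.0
```

Sweeping `approx_lsbs` over a set of representative operand pairs gives an error curve that can be set against the energy figures of each circuit variant.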

Design of Low Power Architecture for Approximate Parallel Mid-Point Filter

10.21203/rs.3.rs-1191570/v1 ◽

2022 ◽

Author(s):

Nelson Kingsley Joel Peter Thiagarajan ◽

Vijeyakumar K N ◽

Saravanakumar S

Keyword(s):

Image Processing ◽

Low Power ◽

Filter Design ◽

Parallel Architecture ◽

Approximate Computing ◽

Error Resilient ◽

Trade Off ◽

Power Efficient ◽

Precise Estimation ◽

Power Delay Product

Approximate computing is a modern technique for designing low-power, efficient arithmetic circuits for portable error-resilient applications. In this work, we propose an Adaptive Parallel Mid-Point Filter (APMPF) architecture, built on a proposed imprecise Max-Min Estimator (MME), targeting digital image processing. Parallel architectures for the MME that trade accuracy for reduced hardware are proposed and used in the APMPF. In the APMPF, we use three levels of sorting to estimate the mid-point of a 3 × 3 window. A switching-based trimmed filter is proposed for precise estimation of the selected window. Experimental results in terms of area, power, and delay with a 90 nm ASIC technology show that the proposed filters achieve reductions of at least 7% in Area Delay Product (ADP) and 9% in Power Delay Product (PDP) compared to the precise filter design.
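For readers unfamiliar with the operation, a mid-point filter replaces each pixel with the average of the maximum and minimum of its neighbourhood. The sketch below shows the exact 3 × 3 version and, through the hypothetical `dropped_bits` parameter, one simple way an imprecise max/min search could be imitated in software by ignoring low-order pixel bits. It is not the MME or APMPF architecture from the paper.

```python
def midpoint_filter(image, dropped_bits=0):
    """Apply a 3x3 mid-point filter to a 2D list of non-negative integer pixels.

    Each interior pixel becomes (max + min) // 2 of its 3x3 window.
    With dropped_bits > 0 the max/min search only sees the upper bits of each
    pixel (an assumed stand-in for an imprecise estimator).
    Border pixels are copied through unchanged.
    """
    mask = ~((1 << dropped_bits) - 1)
    height, width = len(image), len(image[0])
    out = [row[:] for row in image]
    for r in range(1, height - 1):
        for c in range(1, width - 1):
            window = [
                image[r + dr][c + dc] & mask
                for dr in (-1, 0, 1)
                for dc in (-1, 0, 1)
            ]
            out[r][c] = (max(window) + min(window)) // 2
    return out
```

Comparing the output for `dropped_bits=0` against larger values on the same image gives a quick feel for how much accuracy the approximation costs.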

An efficient look up table based approximate adder for field programmable gate array

Indonesian Journal of Electrical Engineering and Computer Science ◽

10.11591/ijeecs.v25.i1.pp144-151 ◽

2022 ◽

Vol 25 (1) ◽

pp. 144

Author(s):

Hadise Ramezani ◽

Majid Mohammadi ◽

Amir Sabbagh Molahosseini

Keyword(s):

Integrated Circuits ◽

High Performance ◽

Efficient Implementation ◽

Approximate Computing ◽

Gaussian Filter ◽

Gate Arrays ◽

Output Quality ◽

Field Programmable ◽

Programmable Gate Arrays ◽

Application Specific

Approximate computing is an alternative computing approach that can lead to high-performance implementations of audio and image processing as well as deep learning applications. However, most of the available approximate adders have been designed for application-specific integrated circuits (ASICs), and they do not result in an efficient implementation on field-programmable gate arrays (FPGAs). In this paper, we design a new approximate adder customized for efficient implementation on FPGAs and use it to build a Gaussian filter. The experimental results of implementing the Gaussian filter based on the proposed approximate adder on a Virtex-7 FPGA indicate that the resource utilization has decreased by 20-51%, and the designed filter delay based on the modified design methodology for building approximate adders for FPGA-based systems (MDeMAS) adder has improved by 10-35%, due to the obtained output quality.
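As background on what an approximate adder does, the following sketch models a lower-part OR adder, a well-known ASIC-oriented design in which the low bits are combined with a bitwise OR and only the high bits are added exactly. It is shown purely for orientation and is not the LUT-based adder proposed in this paper; the function name `loa_add` and its parameters are chosen for this example.

```python
def loa_add(a, b, width, approx_bits):
    """Lower-part OR adder model for unsigned operands of the given bit width.

    The lowest approx_bits of the result are a | b instead of a true sum.
    The carry into the exact upper part is the AND of the top bits of the
    two lower parts. With approx_bits == 0 the addition is exact.
    """
    low_mask = (1 << approx_bits) - 1
    low = (a | b) & low_mask
    if approx_bits > 0:
        carry = (a >> (approx_bits - 1)) & (b >> (approx_bits - 1)) & 1
    else:
        carry = 0
    high = (a >> approx_bits) + (b >> approx_bits) + carry
    # Keep width + 1 bits so the carry-out of the upper part is preserved.
    return ((high << approx_bits) | low) & ((1 << (width + 1)) - 1)
```

Shortening the exact carry chain is what reduces delay and area in such designs; how well that maps onto FPGA LUTs and carry logic is the question this paper addresses.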

Machine Learning-Based Pruning Technique for Low Power Approximate Computing

Computer Systems Science and Engineering ◽

10.32604/csse.2022.021637 ◽

2022 ◽

Vol 42 (1) ◽

pp. 397-406

Author(s):

B. Sakthivel ◽

K. Jayaram ◽

N. Manikanda Devarajan ◽

S. Mahaboob Basha ◽

S. Rajapriya

Keyword(s):

Machine Learning ◽

Low Power ◽

Approximate Computing ◽

Pruning Technique

Stochastic Computing Emulation of Memristor Cellular Nonlinear Networks

Micromachines ◽

10.3390/mi13010067 ◽

2021 ◽

Vol 13 (1) ◽

pp. 67

Author(s):

Oscar Camps ◽

Mohamad Moner Al Chawa ◽

Stavros G. Stavrinides ◽

Rodrigo Picos

Keyword(s):

Real Time ◽

Color Image ◽

Low Cost ◽

Approximate Computing ◽

Nonlinear Networks ◽

Processing Elements ◽

Stochastic Computing ◽

Time Operation ◽

Real Time Operation ◽

Image Edge

Cellular Nonlinear Networks (CNN) are a concept introduced in 1988 by Leon Chua and Lin Yang as a bio-inspired architecture capable of massively parallel computation. Since then, CNN have been enhanced by designs that incorporate memristors to profit from their processing and memory capabilities. In addition, Stochastic Computing (SC) can be used to reduce the number of required processing elements; it thus provides a lightweight approximate computing framework that is nevertheless quite accurate and effective. In this work, we propose the use of SC in designing and implementing a memristor-based CNN. As a proof of the proposed concept, an example application is presented. This application combines Matlab and an FPGA in order to create the CNN. The implemented CNN was then used to perform three different real-time applications on a 512 × 512 gray-scale image and a 768 × 512 color image: storage of the image, edge detection, and image sharpening. It should be pointed out that the same CNN was used for the three different tasks, changing only some programmable parameters. Results show excellent capability with significant accompanying advantages, such as the low number of elements needed, which allows a low-cost FPGA-based system implementation and confirms the system's capacity for real-time operation.
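The reason stochastic computing needs so few processing elements can be seen in its basic multiplication: in the unipolar encoding, a value in [0, 1] is represented by the fraction of ones in a random bitstream, and a single AND gate applied to two independent streams yields a stream whose fraction of ones approximates the product. The sketch below is a software illustration of that principle only, with hypothetical helper names; it does not model the memristor-based CNN from the paper.

```python
import random


def to_stream(value, length, rng):
    """Encode a value in [0, 1] as a bitstream whose bits are 1 with probability value."""
    return [1 if rng.random() < value else 0 for _ in range(length)]


def from_stream(bits):
    """Decode a unipolar bitstream: the value is the fraction of ones."""
    return sum(bits) / len(bits)


def sc_multiply(a, b, length=10000, seed=0):
    """Approximate a * b by AND-ing two independently generated bitstreams."""
    rng = random.Random(seed)  # fixed seed keeps the sketch reproducible
    stream_a = to_stream(a, length, rng)
    stream_b = to_stream(b, length, rng)
    product = [x & y for x, y in zip(stream_a, stream_b)]
    return from_stream(product)
```

Accuracy improves with stream length, so precision is traded against latency rather than against hardware area.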

Design Space Exploration on High-Order QAM Demodulation Circuits: Algorithms, Arithmetic and Approximation Techniques

Electronics ◽

10.3390/electronics11010039 ◽

2021 ◽

Vol 11 (1) ◽

pp. 39

Author(s):

Ioannis Stratakos ◽

Vasileios Leon ◽

Giorgos Armeniakos ◽

George Lentaris ◽

Dimitrios Soudris

Keyword(s):

Fixed Point ◽

Orthogonal Frequency Division Multiplexing ◽

Design Space Exploration ◽

Performance Metrics ◽

Circuit Complexity ◽

Error Rates ◽

High Order ◽

Approximate Computing ◽

Clock Frequency ◽

Approximation Techniques

Every new generation of wireless communication standard aims to improve the overall performance and quality of service (QoS), compared to the previous generations. Increased data rates, numbers and capabilities of connected devices, new applications, and higher data volume transfers are some of the key parameters that are of interest. To satisfy these increased requirements, the synergy between wireless technologies and optical transport will dominate the 5G network topologies. This work focuses on a fundamental digital function in an orthogonal frequency-division multiplexing (OFDM) baseband transceiver architecture and aims at improving the throughput and circuit complexity of this function. Specifically, we consider high-order QAM demodulation and apply approximation techniques to achieve our goals. We adopt approximate computing as a design strategy to exploit the error resiliency of the QAM function and deliver significant gains in terms of critical performance metrics. In particular, we explore four demodulation algorithms and develop accurate floating- and fixed-point circuits in VHDL. In addition, we further explore the effects of introducing approximate arithmetic components. For our test case, we consider 64-QAM demodulators, and the results suggest that, in terms of accuracy, the most promising design provides bit error rates (BER) ranging from 10⁻¹ to 10⁻⁴ for SNR 0–14 dB. Targeting a Xilinx Zynq Ultrascale+ ZCU106 (XCZU7EV) FPGA device, the approximate circuits achieve up to 98% reduction in LUT utilization, compared to the accurate floating-point model of the same algorithm, and up to a 122% increase in operating frequency. In terms of power consumption, our most efficient circuit configurations consume 0.6–1.1 W when operating at their maximum clock frequency. Our results show that if the objective is to achieve high accuracy in terms of BER, the prevailing solution is the approximate LLR algorithm configured with fixed-point arithmetic and 8-bit truncation, which provides an 81% decrease in LUTs and a 13% increase in frequency and sustains a throughput of 323 Msamples/s.
