ACM Transactions on Design Automation of Electronic Systems
Latest Publications


TOTAL DOCUMENTS

1217
(FIVE YEARS 199)

H-INDEX

46
(FIVE YEARS 5)

Published By Association For Computing Machinery

1084-4309

2022 ◽  
Vol 27 (3) ◽  
pp. 1-19
Author(s):  
Si Chen ◽  
Guoqi Xie ◽  
Renfa Li ◽  
Keqin Li

Reasonable partitioning is a critical issue for cyber-physical system (CPS) design. Traditional CPS partitioning methods run in a determined context and depend on the parameter pre-estimations, but they ignore the uncertainty of parameters and hardly consider reliability. The state-of-the-art work proposed an uncertainty theory based CPS partitioning method, which includes parameter uncertainty and reliability analysis, but it only considers linear uncertainty distributions for variables and ignores the uncertainty of reliability. In this paper, we propose an uncertainty theory based CPS partitioning method with uncertain reliability analysis. We convert the uncertain objective and constraint into determined forms; such conversion methods can be applied to all forms of uncertain variables, not just for linear. By applying uncertain reliability analysis in the uncertainty model, we for the first time include the uncertainty of reliability into the CPS partitioning, where the reliability enhancement algorithm is proposed. We study the performance of the reliability obtained through uncertain reliability analysis, and experimental results show that the system reliability with uncertainty does not change significantly with the growth of task module numbers.


2022 ◽  
Vol 27 (3) ◽  
pp. 1-26
Author(s):  
Mahabub Hasan Mahalat ◽  
Suraj Mandal ◽  
Anindan Mondal ◽  
Bibhash Sen ◽  
Rajat Subhra Chakraborty

Secure authentication of any Internet-of-Things (IoT) device becomes the utmost necessity due to the lack of specifically designed IoT standards and intrinsic vulnerabilities with limited resources and heterogeneous technologies. Despite the suitability of arbiter physically unclonable function (APUF) among other PUF variants for the IoT applications, implementing it on field-programmable gate arrays (FPGAs) is challenging. This work presents the complete characterization of the path changing switch (PCS) 1 based APUF on two different families of FPGA, like Spartan-3E (90 nm CMOS) and Artix-7 (28 nm CMOS). A comprehensive study of the existing tuning concept for programmable delay logic (PDL) based APUF implemented on FPGA is presented, leading to establishment of its practical infeasibility. We investigate the entropy, randomness properties of the PCS based APUF suitable for practical applications, and the effect of temperature variation signifying the adequate tolerance against environmental variation. The XOR composition of PCS based APUF is introduced to boost performance and security. The robustness of the PCS based APUF against machine learning based modeling attack is evaluated, showing similar characteristics as the conventional APUF. Experimental results validate the efficacy of PCS based APUF with a little hardware footprint removing the paucity of lightweight security primitive for IoT.


2022 ◽  
Vol 27 (3) ◽  
pp. 1-26
Author(s):  
Skandha Deepsita S ◽  
Dhayala Kumar M ◽  
Noor Mahammad SK

The approximate hardware design can save huge energy at the cost of errors incurred in the design. This article proposes the approximate algorithm for low-power compressors, utilized to build approximate multiplier with low energy and acceptable error profiles. This article presents two design approaches (DA1 and DA2) for higher bit size approximate multipliers. The proposed multiplier of DA1 have no propagation of carry signal from LSB to MSB, resulted in a very high-speed design. The increment in delay, power, and energy are not exponential with increment of multiplier size ( n ) for DA1 multiplier. It can be observed that the maximum combinations lie in the threshold Error Distance of 5% of the maximum value possible for any particular multiplier of size n . The proposed 4-bit DA1 multiplier consumes only 1.3 fJ of energy, which is 87.9%, 78%, 94%, 67.5%, and 58.9% less when compared to M1, M2, LxA, MxA, accurate designs respectively. The DA2 approach is recursive method, i.e., n -bit multiplier built with n/2-bit sub-multipliers. The proposed 8-bit multiplication has 92% energy savings with Mean Relative Error Distance (MRED) of 0.3 for the DA1 approach and at least 11% to 40% of energy savings with MRED of 0.08 for the DA2 approach. The proposed multipliers are employed in the image processing algorithm of DCT, and the quality is evaluated. The standard PSNR metric is 55 dB for less approximation and 35 dB for maximum approximation.


2022 ◽  
Vol 27 (3) ◽  
pp. 1-31
Author(s):  
Yukui Luo ◽  
Shijin Duan ◽  
Xiaolin Xu

With the emerging cloud-computing development, FPGAs are being integrated with cloud servers for higher performance. Recently, it has been explored to enable multiple users to share the hardware resources of a remote FPGA, i.e., to execute their own applications simultaneously. Although being a promising technique, multi-tenant FPGA unfortunately brings its unique security concerns. It has been demonstrated that the capacitive crosstalk between FPGA long-wires can be a side-channel to extract secret information, giving adversaries the opportunity to implement crosstalk-based side-channel attacks. Moreover, recent work reveals that medium-wires and multiplexers in configurable logic block (CLB) are also vulnerable to crosstalk-based information leakage. In this work, we propose FPGAPRO: a defense framework leveraging P lacement, R outing, and O bfuscation to mitigate the secret leakage on FPGA components, including long-wires, medium-wires, and logic elements in CLB. As a user-friendly defense strategy, FPGAPRO focuses on protecting the security-sensitive instances meanwhile considering critical path delay for performance maintenance. As the proof-of-concept, the experimental result demonstrates that FPGAPRO can effectively reduce the crosstalk-caused side-channel leakage by 138 times. Besides, the performance analysis shows that this strategy prevents the maximum frequency from timing violation.


2022 ◽  
Vol 27 (3) ◽  
pp. 1-24
Author(s):  
Lang Feng ◽  
Jiayi Huang ◽  
Jeff Huang ◽  
Jiang Hu

Data-Flow Integrity (DFI) is a well-known approach to effectively detecting a wide range of software attacks. However, its real-world application has been quite limited so far because of the prohibitive performance overhead it incurs. Moreover, the overhead is enormously difficult to overcome without substantially lowering the DFI criterion. In this work, an analysis is performed to understand the main factors contributing to the overhead. Accordingly, a hardware-assisted parallel approach is proposed to tackle the overhead challenge. Simulations on SPEC CPU 2006 benchmark show that the proposed approach can completely enforce the DFI defined in the original seminal work while reducing performance overhead by 4×, on average.


2022 ◽  
Vol 27 (2) ◽  
pp. 1-33
Author(s):  
Zahra Ebrahimi ◽  
Dennis Klar ◽  
Mohammad Aasim Ekhtiyar ◽  
Akash Kumar

The rapid evolution of error-resilient programs intertwined with their quest for high throughput has motivated the use of Single Instruction, Multiple Data (SIMD) components in Field-Programmable Gate Arrays (FPGAs). Particularly, to exploit the error-resiliency of such applications, Cross-layer approximation paradigm has recently gained traction, the ultimate goal of which is to efficiently exploit approximation potentials across layers of abstraction. From circuit- to application-level, valuable studies have proposed various approximation techniques, albeit linked to four drawbacks: First, most of approximate multipliers and dividers operate only in SISD mode. Second, imprecise units are often substituted, merely in a single kernel of a multi-kernel application, with an end-to-end analysis in Quality of Results (QoR) and not in the gained performance. Third, state-of-the-art (SoA) strategies neglect the fact that each kernel contributes differently to the end-to-end QoR and performance metrics. Therefore, they lack in adopting a generic methodology for adjusting the approximation knobs to maximize performance gains for a user-defined quality constraint. Finally, multi-level techniques lack in being efficiently supported, from application-, to architecture-, to circuit-level, in a cohesive cross-layer hierarchy. In this article, we propose Plasticine , a cross-layer methodology for multi-kernel applications, which addresses the aforementioned challenges by efficiently utilizing the synergistic effects of a chain of techniques across layers of abstraction. To this end, we propose an application sensitivity analysis and a heuristic that tailor the precision at constituent kernels of the application by finding the most tolerable degree of approximations for each of consecutive kernels, while also satisfying the ultimate user-defined QoR. The chain of approximations is also effectively enabled in a cross-layer hierarchy, from application- to architecture- to circuit-level, through the plasticity of SIMD multiplier-dividers, each supporting dynamic precision variability along with hybrid functionality. The end-to-end evaluations of Plasticine  on three multi-kernel applications employed in bio-signal processing, image processing, and moving object tracking for Unmanned Air Vehicles (UAV) demonstrate 41%–64%, 39%–62%, and 70%–86% improvements in area, latency, and Area-Delay-Product (ADP), respectively, over 32-bit fixed precision, with negligible loss in QoR. To springboard future research in reconfigurable and approximate computing communities, our implementations will be available and open-sourced at https://cfaed.tu-dresden.de/pd-downloads.


2022 ◽  
Vol 27 (2) ◽  
pp. 1-33
Author(s):  
Liu Liu ◽  
Sibren Isaacman ◽  
Ulrich Kremer

Many embedded environments require applications to produce outcomes under different, potentially changing, resource constraints. Relaxing application semantics through approximations enables trading off resource usage for outcome quality. Although quality is a highly subjective notion, previous work assumes given, fixed low-level quality metrics that often lack a strong correlation to a user’s higher-level quality experience. Users may also change their minds with respect to their quality expectations depending on the resource budgets they are willing to dedicate to an execution. This motivates the need for an adaptive application framework where users provide execution budgets and a customized quality notion. This article presents a novel adaptive program graph representation that enables user-level, customizable quality based on basic quality aspects defined by application developers. Developers also define application configuration spaces, with possible customization to eliminate undesirable configurations. At runtime, the graph enables the dynamic selection of the configuration with maximal customized quality within the user-provided resource budget. An adaptive application framework based on our novel graph representation has been implemented on Android and Linux platforms and evaluated on eight benchmark programs, four with fully customizable quality. Using custom quality instead of the default quality, users may improve their subjective quality experience value by up to 3.59×, with 1.76× on average under different resource constraints. Developers are able to exploit their application structure knowledge to define configuration spaces that are on average 68.7% smaller as compared to existing, structure-oblivious approaches. The overhead of dynamic reconfiguration averages less than 1.84% of the overall application execution time.


2022 ◽  
Vol 27 (2) ◽  
pp. 1-16
Author(s):  
Ming Han ◽  
Ye Wang ◽  
Jian Dong ◽  
Gang Qu

One major challenge in deploying Deep Neural Network (DNN) in resource-constrained applications, such as edge nodes, mobile embedded systems, and IoT devices, is its high energy cost. The emerging approximate computing methodology can effectively reduce the energy consumption during the computing process in DNN. However, a recent study shows that the weight storage and access operations can dominate DNN's energy consumption due to the fact that the huge size of DNN weights must be stored in the high-energy-cost DRAM. In this paper, we propose Double-Shift, a low-power DNN weight storage and access framework, to solve this problem. Enabled by approximate decomposition and quantization, Double-Shift can reduce the data size of the weights effectively. By designing a novel weight storage allocation strategy, Double-Shift can boost the energy efficiency by trading the energy consuming weight storage and access operations for low-energy-cost computations. Our experimental results show that Double-Shift can reduce DNN weights to 3.96%–6.38% of the original size and achieve an energy saving of 86.47%–93.62%, while introducing a DNN classification error within 2%.


2022 ◽  
Vol 27 (2) ◽  
pp. 1-18
Author(s):  
Shaahin Angizi ◽  
Navid Khoshavi ◽  
Andrew Marshall ◽  
Peter Dowben ◽  
Deliang Fan

Magneto-Electric FET ( MEFET ) is a recently developed post-CMOS FET, which offers intriguing characteristics for high-speed and low-power design in both logic and memory applications. In this article, we present MeF-RAM , a non-volatile cache memory design based on 2-Transistor-1-MEFET ( 2T1M ) memory bit-cell with separate read and write paths. We show that with proper co-design across MEFET device, memory cell circuit, and array architecture, MeF-RAM is a promising candidate for fast non-volatile memory ( NVM ). To evaluate its cache performance in the memory system, we, for the first time, build a device-to-architecture cross-layer evaluation framework to quantitatively analyze and benchmark the MeF-RAM design with other memory technologies, including both volatile memory (i.e., SRAM, eDRAM) and other popular non-volatile emerging memory (i.e., ReRAM, STT-MRAM, and SOT-MRAM). The experiment results for the PARSEC benchmark suite indicate that, as an L2 cache memory, MeF-RAM reduces Energy Area Latency ( EAT ) product on average by ~98% and ~70% compared with typical 6T-SRAM and 2T1R SOT-MRAM counterparts, respectively.


2022 ◽  
Vol 27 (2) ◽  
pp. 1-19
Author(s):  
Tiancong Bu ◽  
Kaige Yan ◽  
Jingweijia Tan

Dense SLAM is an important application on an embedded environment. However, embedded platforms usually fail to provide enough computation resources for high-accuracy real-time dense SLAM, even with high-parallelism architecture such as GPUs. To tackle this problem, one solution is to design proper approximation techniques for dense SLAM on embedded GPUs. In this work, we propose two novel approximation techniques, critical data identification and redundant branch elimination. We also analyze the error characteristics of the other two techniques—loop skipping and thread approximation. Then, we propose SLaPP, an online adaptive approximation controller, which aims to control the error to be under an acceptable threshold. The evaluation shows SLaPP can achieve 2.0× performance speedup and 30% energy saving on average compared to the case without approximation.


Sign in / Sign up

Export Citation Format

Share Document