hardware accelerators
Recently Published Documents


TOTAL DOCUMENTS

342
(FIVE YEARS 126)

H-INDEX

21
(FIVE YEARS 6)

2022 ◽  
Vol 15 (1) ◽  
pp. 1-26
Author(s):  
Mathieu Gross ◽  
Konrad Hohentanner ◽  
Stefan Wiehler ◽  
Georg Sigl

Isolated execution is a concept commonly used for increasing the security of a computer system. In the embedded world, ARM TrustZone technology enables this goal and is currently used on mobile devices for applications such as secure payment or biometric authentication. In this work, we investigate the security benefits achievable through the usage of ARM TrustZone on FPGA-SoCs. We first adapt Microsoft’s implementation of a firmware Trusted Platform Module (fTPM) running inside ARM TrustZone for the Zynq UltraScale+ platform. This adaptation consists in integrating hardware accelerators available on the device to fTPM’s implementation and to enhance fTPM with an entropy source derived from on-chip SRAM start-up patterns. With our approach, we transform a software implementation of a TPM into a hybrid hardware/software design that could address some of the security drawbacks of the original implementation while keeping its flexibility. To demonstrate the security gains obtained via the usage of ARM TrustZone and our hybrid-TPM on FPGA-SoCs, we propose a framework that combines them for enabling a secure remote bitstream loading. The approach consists in preventing the insecure usages of a bitstream reconfiguration interface that are made possible by the manufacturer and to integrate the interface inside a Trusted Execution Environment.


2021 ◽  
Author(s):  
Xiaofan Zhang ◽  
Dawei Wang ◽  
Pierce Chuang ◽  
Shugao Ma ◽  
Deming Chen ◽  
...  

2021 ◽  
pp. 41-52
Author(s):  
Sree Ranjani Rajendran ◽  
Rajat Subhra Chakraborty

2021 ◽  
Vol 20 (5s) ◽  
pp. 1-21
Author(s):  
Fateme S. Hosseini ◽  
Fanruo Meng ◽  
Chengmo Yang ◽  
Wujie Wen ◽  
Rosario Cammarota

Hardware accelerators are essential to the accommodation of ever-increasing Deep Neural Network (DNN) workloads on the resource-constrained embedded devices. While accelerators facilitate fast and energy-efficient DNN operations, their accuracy is threatened by faults in their on-chip and off-chip memories, where millions of DNN weights are held. The use of emerging Non-Volatile Memories (NVM) further exposes DNN accelerators to a non-negligible rate of permanent defects due to immature fabrication, limited endurance, and aging. To tolerate defects in NVM-based DNN accelerators, previous work either requires extra redundancy in hardware or performs defect-aware retraining, imposing significant overhead. In comparison, this paper proposes a set of algorithms that exploit the flexibility in setting the fault-free bits in weight memory to effectively approximate weight values, so as to mitigate defect-induced accuracy drop. These algorithms can be applied as a one-step solution when loading the weights to embedded devices. They only require trivial hardware support and impose negligible run-time overhead. Experiments on popular DNN models show that the proposed techniques successfully boost inference accuracy even in the face of elevated defect rates in the weight memory.


Micromachines ◽  
2021 ◽  
Vol 12 (10) ◽  
pp. 1243
Author(s):  
Tommaso Zanotti ◽  
Paolo Pavan ◽  
Francesco Maria Puglisi

Logic-in-memory (LIM) circuits based on the material implication logic (IMPLY) and resistive random access memory (RRAM) technologies are a candidate solution for the development of ultra-low power non-von Neumann computing architectures. Such architectures could enable the energy-efficient implementation of hardware accelerators for novel edge computing paradigms such as binarized neural networks (BNNs) which rely on the execution of logic operations. In this work, we present the multi-input IMPLY operation implemented on a recently developed smart IMPLY architecture, SIMPLY, which improves the circuit reliability, reduces energy consumption, and breaks the strict design trade-offs of conventional architectures. We show that the generalization of the typical logic schemes used in LIM circuits to multi-input operations strongly reduces the execution time of complex functions needed for BNNs inference tasks (e.g., the 1-bit Full Addition, XNOR, Popcount). The performance of four different RRAM technologies is compared using circuit simulations leveraging a physics-based RRAM compact model. The proposed solution approaches the performance of its CMOS equivalent while bypassing the von Neumann bottleneck, which gives a huge improvement in bit error rate (by a factor of at least 108) and energy-delay product (projected up to a factor of 1010).


2021 ◽  
Vol 25 (2) ◽  
pp. 5-13
Author(s):  
Mahadev Satyanarayanan ◽  
Nathan Beckmann ◽  
Grace A. Lewis ◽  
Brandon Lucia

This position paper examines a spectrum of approaches to overcoming the limited computing power of mobile devices caused by their need to be small, lightweight and energy efficient. At one extreme is offloading of compute-intensive operations to a cloudlet nearby. At the other extreme is the use of fixed-function hardware accelerators on mobile devices. Between these endpoints lie various configurations of programmable hardware accelerators. We explore the strengths and weaknesses of these approaches and conclude that they are, in fact, complementary. Based on this insight, we advocate a softwarehardware co-evolution path that combines their strengths.


2021 ◽  
Vol 13 (18) ◽  
pp. 3606
Author(s):  
Lorenzo Diana ◽  
Jia Xu ◽  
Luca Fanucci

Oil spills represent one of the major threats to marine ecosystems. Satellite synthetic-aperture radar (SAR) sensors have been widely used to identify oil spills due to their ability to provide high resolution images during day and night under all weather conditions. In recent years, the use of artificial intelligence (AI) systems, especially convolutional neural networks (CNNs), have led to many important improvements in performing this task. However, most of the previous solutions to this problem have focused on obtaining the best performance under the assumption that there are no constraints on the amount of hardware resources being used. For this reason, the amounts of hardware resources such as memory and power consumption required by previous solutions make them unsuitable for remote embedded systems such as nano and micro-satellites, which usually have very limited hardware capability and very strict limits on power consumption. In this paper, we present a CNN architecture for semantically segmenting SAR images into multiple classes. The proposed CNN is specifically designed to run on remote embedded systems, which have very limited hardware capability and strict limits on power consumption. Even if the performance in terms of results accuracy does not represent a step forward compared with previous solutions, the presented CNN has the important advantage of being able to run on remote embedded systems with limited hardware resources while achieving good performance. The presented CNN is compatible with dedicated hardware accelerators available on the market due to its low memory footprint and small size. It also provides many additional very significant advantages, such as having shorter inference times, requiring shorter training times, and avoiding transmission of irrelevant data. Our goal is to allow embedded low power remote devices such as satellite systems for remote sensing to be able to directly run CNNs on board, so that the amount of data that needs to be transmitted to ground and processed on ground can be substantially reduced, which will be greatly beneficial in significantly reducing the amount of time needed for identification of oil spills from SAR images.


Materials ◽  
2021 ◽  
Vol 14 (18) ◽  
pp. 5223
Author(s):  
Nikolaos Vasileiadis ◽  
Vasileios Ntinas ◽  
Georgios Ch. Sirakoulis ◽  
Panagiotis Dimitrakis

State-of-the-art IoT technologies request novel design solutions in edge computing, resulting in even more portable and energy-efficient hardware for in-the-field processing tasks. Vision sensors, processors, and hardware accelerators are among the most demanding IoT applications. Resistance switching (RS) two-terminal devices are suitable for resistive RAMs (RRAM), a promising technology to realize storage class memories. Furthermore, due to their memristive nature, RRAMs are appropriate candidates for in-memory computing architectures. Recently, we demonstrated a CMOS compatible silicon nitride (SiNx) MIS RS device with memristive properties. In this paper, a report on a new photodiode-based vision sensor architecture with in-memory computing capability, relying on memristive device, is disclosed. In this context, the resistance switching dynamics of our memristive device were measured and a data-fitted behavioral model was extracted. SPICE simulations were made highlighting the in-memory computing capabilities of the proposed photodiode-one memristor pixel vision sensor. Finally, an integration and manufacturing perspective was discussed.


Sign in / Sign up

Export Citation Format

Share Document