Design and Implementation of a Reconfigurable Cryptographic Coprocessor with Multiple Side-Channel Attacks Countermeasures

Countermeasures against side-channel attacks (SCAs) have become necessary in hardware security, and the need to support multiple crypto algorithms on a single chip is increasing. We propose a reconfigurable crypto coprocessor that not only supports multiple crypto algorithms but also provides multiple effective countermeasures against SPA, DPA and EMA by making use of its own reconfigurable features rather than extra resources. The countermeasures include several global and encryption-flow-related methods, which can also be reconfigured along with the circuit function. The coprocessor is a coarse-grained reconfigurable architecture composed of several reconfigurable modules, such as logic-arithmetic, shift, modular add/subtract, permutation, S-box and modular multiplication units. It is integrated into a system-on-chip with a 32-bit CPU and fabricated in a 0.18 μm CMOS process with a 1.8 V supply and a 100 MHz maximum frequency. Experimental results show that it successfully resists SPA and DPA with one million power traces. With full countermeasures enabled, it resists EMA with up to 1.2 million electromagnetic traces without revealing the right subkey. Thus, this reconfigurable coprocessor offers a good solution for both supporting multiple algorithms and providing SCA resistance, with no impact on frequency, negligible area overhead and small power overhead.

AMROFloor: An Efficient Aging Mitigation and Resource Optimization Floorplanner for Virtual Coarse-Grained Runtime Reconfigurable FPGAs

Electronics ◽

10.3390/electronics11020273 ◽

2022 ◽

Vol 11 (2) ◽

pp. 273

Author(s):

Zeyu Li ◽

Zhao Huang ◽

Quan Wang ◽

Junjie Wang

Keyword(s):

Resource Optimization ◽

Coarse Grained ◽

Cmos Process ◽

Time To Failure ◽

Effective Solution ◽

Aging Effects ◽

Rapid Reduction ◽

Silicon Accumulation ◽

Long Time ◽

On Chip

With the rapid reduction of CMOS process size, FPGAs built with high-silicon accumulation technology are becoming more sensitive to aging effects, which reduces the reliability and service life of the device. Offline aging-aware layout planning based on stress balancing is an effective solution. However, existing methods take a long time to solve the floorplanning problem, and the resulting layout solutions occupy many on-chip resources. To this end, we propose an efficient Aging Mitigation and Resource Optimization Floorplanner (AMROFloor) for FPGAs. First, the layout solution is implemented on the Virtual Coarse-Grained Runtime Reconfigurable Architecture, which helps avoid rule constraints for placement and routing. Second, the Maximize Reconfigurable Regions Algorithm (MRRA) is proposed to quickly determine the number and size of the reconfigurable regions (RRs), shortening the solving time while still ensuring an effective solution. Furthermore, the Resource Combination Algorithm (RCA) is proposed to optimize the on-chip resources, reducing the on-Chip Resource Utilization (CRU) while achieving the same aging relief effect. Experiments were simulated and implemented on a Xilinx FPGA. The results demonstrate that the AMROFloor method can extend the Mean Time to Failure (MTTF) by 13.8% and reduce the resource overhead by 19.2% on average compared to existing aging-aware layout solutions.

Design and Implementation of Operation-Reduced LDPC Decoder Based on a Check Node Stopping Scheme

Journal of Circuits, Systems and Computers ◽

10.1142/s0218126617500281 ◽

2016 ◽

Vol 26 (02) ◽

pp. 1750028

Author(s):

Cheng-Hung Lin ◽

Tzu-Hsuan Huang ◽

Shu-Yen Lin ◽

Yu-Hsuan Lee

Keyword(s):

Power Dissipation ◽

Message Passing ◽

Cmos Process ◽

Two Phase ◽

Ldpc Decoder ◽

Design And Implementation ◽

Ldpc Decoding ◽

Area Overhead ◽

Check Node ◽

Decoder Design

In this paper, we propose the design and implementation of an operation-reduced low-density parity check (LDPC) decoder that stops the operation of reliable check nodes in the iterative two-phase message passing (TPMP) min-sum algorithm (MSA). A check node stopping (CNS) scheme tags the reliability of check nodes by comparing the magnitudes of the check node belief messages with a threshold. The operation of reliable check nodes tagged by the CNS scheme can be stopped in later iterations. The proposed LDPC decoder that employs the CNS scheme can eliminate a significant share of the redundant check node operations and efficiently reduce the power consumption of the decoder. In simulations of WiMAX QC LDPC decoding with high channel quality, the CNS scheme achieves a check node stopping rate of up to 12% with a loss of coding gain of less than 0.1 dB. The WiMAX QC LDPC decoder chip that employs the CNS scheme is implemented in a 90-nm CMOS process. Compared with an LDPC decoder that employs no CNS scheme, the overall power dissipation of the proposed LDPC decoder is decreased by 4.1% with 0.5% area overhead.
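The check node stopping idea described in this abstract can be illustrated with a minimal sketch. The abstract does not give the exact reliability criterion, so the rule used here (tag a check node as reliable once the smallest magnitude among its outgoing messages reaches a threshold) is an assumption, and the function names `check_node_update` and `update_with_stopping` are hypothetical, not taken from the paper.

```python
def check_node_update(v2c):
    """Plain min-sum check node update.

    For each edge, the outgoing message is the product of the signs and the
    minimum of the magnitudes of all *other* incoming messages.
    """
    out = []
    for i in range(len(v2c)):
        others = v2c[:i] + v2c[i + 1:]
        negatives = sum(1 for x in others if x < 0)
        sign = -1.0 if negatives % 2 else 1.0
        out.append(sign * min(abs(x) for x in others))
    return out


def update_with_stopping(v2c, prev_c2v, stopped, threshold):
    """Check node update with an assumed stopping rule.

    If the node was already tagged reliable, skip the computation and reuse
    the previous messages. Otherwise compute the update and tag the node as
    reliable when every outgoing magnitude is at least `threshold`
    (assumed criterion, chosen only for illustration).
    """
    if stopped:
        return prev_c2v, True
    c2v = check_node_update(v2c)
    reliable = min(abs(x) for x in c2v) >= threshold
    return c2v, reliable
```

A decoder loop would keep one `stopped` flag per check node and pass it back in on the next iteration, so that tagged nodes no longer spend operations on the min-sum update.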

Heterogeneous Integration of Boost Power Supply and On-Chip Solar Cell using triple well CMOS Process

IEEJ Transactions on Electronics Information and Systems ◽

10.1541/ieejeiss.138.41 ◽

2018 ◽

Vol 138 (1) ◽

pp. 41-49

Author(s):

Kazuma Igarashi ◽

Yoshimasa Minami ◽

Nobuhiko Nakano

Keyword(s):

Solar Cell ◽

Power Supply ◽

Heterogeneous Integration ◽

Cmos Process ◽

On Chip

Fabrication of Monolithic Integrated Bimaterial Resonant Uncooled IR Sensor

Key Engineering Materials ◽

10.4028/www.scientific.net/kem.543.176 ◽

2013 ◽

Vol 543 ◽

pp. 176-179 ◽

Cited By ~ 1

Author(s):

D.Q. Zhao ◽

Xia Zhang ◽

P. Liu ◽

F. Yang ◽

C. Lin ◽

...

Keyword(s):

Silicon Nitride ◽

Hole Mobility ◽

Micro Electro Mechanical System ◽

Cmos Process ◽

Standard Cmos Process ◽

Ir Sensor ◽

Cantilever Structure ◽

Cmos Mems ◽

On Chip ◽

Micro Cantilever

In this work we studied the fabrication of a monolithic bimaterial micro-cantilever resonant IR sensor with on-chip drive circuits. The effects of the high-temperature process and stress-induced performance degradation were investigated. The post-CMOS MEMS (micro electro mechanical system) fabrication process of this IR sensor is the focus of this paper, starting from theoretical analysis and simulation, and then moving to experimental verification. The capacitive cantilever structure was fabricated by surface micromachining, and the drive circuits were prepared by a standard CMOS process. The stress introduced by MEMS films, such as the tensile silicon nitride that serves as a contact etch stop layer for the MOSFETs and a release stop layer for the MEMS structure, increases the electron mobility of NMOS but decreases the hole mobility of PMOS. Moreover, the NMOS threshold voltage (Vth) shifts, and the transconductance (Gm) degrades. An additional step that selectively removes the silicon nitride capping layer and the polysilicon layer over the IC area was inserted into the standard CMOS process to lower the stress in the MOSFET channel regions. Selectively removing the silicon nitride and polysilicon before annealing can avoid 77% of the Vth shift and 86% of the Gm loss.

Design and Implementation of Tunable Network on Chip for FPGA applications

2020 Fourth International Conference on I-SMAC (IoT in Social, Mobile, Analytics and Cloud) (I-SMAC) ◽

10.1109/i-smac49090.2020.9243305 ◽

2020 ◽

Author(s):

Varsha Joy

Keyword(s):

Network On Chip ◽

Design And Implementation ◽

On Chip

A 1.93-pJ/Bit PCI Express Gen4 PHY Transmitter with On-Chip Supply Regulators in 28 nm CMOS

Electronics ◽

10.3390/electronics10010068 ◽

2021 ◽

Vol 10 (1) ◽

pp. 68

Author(s):

Woorham Bae ◽

Sung-Yong Cho ◽

Deog-Kyoon Jeong

Keyword(s):

Finite Impulse Response ◽

Cmos Process ◽

Pci Express ◽

Fully Integrated ◽

Peripheral Component Interconnect ◽

Low Power Cmos ◽

On Chip ◽

Output Driver ◽

Wide Operating ◽

28 Nm

This paper presents a fully integrated Peripheral Component Interconnect (PCI) Express (PCIe) Gen4 physical layer (PHY) transmitter. The prototype chip is fabricated in a 28 nm low-power CMOS process, and the active area of the proposed transmitter is 0.23 mm². To enable voltage scaling across wide operating rates from 2.5 Gb/s to 16 Gb/s, two on-chip supply regulators are included in the transmitter. At the same time, the regulators maintain the output impedance of the transmitter to meet the return loss specification of the PCIe, by including replica segments of the output driver and reference resistance in the regulator loop. A three-tap finite-impulse-response (FIR) equalization is implemented and, therefore, the transmitter provides more than 9.5 dB equalization which is required in the PCIe specification. At 16 Gb/s, the prototype chip achieves energy efficiency of 1.93 pJ/bit including all the interface, bias, and built-in self-test circuits.
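As a rough illustration of what a three-tap FIR transmit equalizer computes, the sketch below applies a pre-cursor, main and post-cursor tap to a symbol sequence and reports the ratio between the gain for an alternating pattern and the gain for a constant pattern, in dB. The function names and the tap values in the usage note are hypothetical; they are not the coefficients or presets used in the paper or defined by the PCIe specification.

```python
import math


def fir3(symbols, pre, main, post):
    """Three-tap FIR: y[n] = pre*x[n+1] + main*x[n] + post*x[n-1].

    Samples outside the sequence are treated as 0 (assumption for the sketch).
    """
    n = len(symbols)
    out = []
    for i in range(n):
        nxt = symbols[i + 1] if i + 1 < n else 0.0
        prv = symbols[i - 1] if i >= 1 else 0.0
        out.append(pre * nxt + main * symbols[i] + post * prv)
    return out


def boost_db(pre, main, post):
    """Ratio of the alternating-pattern gain to the constant-pattern gain, in dB.

    For x[n] = (-1)^n the output amplitude is |main - pre - post|;
    for constant x[n] = 1 it is |main + pre + post|.
    """
    high = abs(main - pre - post)
    low = abs(main + pre + post)
    return 20.0 * math.log10(high / low)
```

For example, `boost_db(-0.125, 0.625, -0.25)` evaluates the boost for one illustrative choice of taps with negative pre- and post-cursor weights; larger cursor magnitudes relative to the main tap give more equalization.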

ThermalAttackNet: Are CNNs Making It Easy to Perform Temperature Side-Channel Attack in Mobile Edge Devices?

Future Internet ◽

10.3390/fi13060146 ◽

2021 ◽

Vol 13 (6) ◽

pp. 146

Author(s):

Somdip Dey ◽

Amit Kumar Singh ◽

Klaus McDonald-Maier

Keyword(s):

Information Flow ◽

Heat Dissipation ◽

Side Channel ◽

Side Channel Attack ◽

Side Channel Attacks ◽

Information Flow Control ◽

On Chip ◽

Temperature Side ◽

Over Time ◽

Memory Efficient

Side-channel attacks remain a challenge to information flow control and security in mobile edge devices to this date. One such important security flaw could be exploited through temperature side-channel attacks, where heat dissipation and propagation from the processing cores are observed over time in order to deduce security flaws. In this paper, we study how computer vision-based convolutional neural networks (CNNs) could be used to mount a temperature (thermal) side-channel attack on different Linux governors in mobile edge devices utilizing a multi-processor system-on-chip (MPSoC). We also designed a power- and memory-efficient CNN model that is capable of performing a thermal side-channel attack on the MPSoC and can be used by industry practitioners and academics as a benchmark to design methodologies to secure against such an attack in MPSoCs.

Timing attacks and local timing attacks against Barrett’s modular multiplication algorithm

Journal of Cryptographic Engineering ◽

10.1007/s13389-020-00254-3 ◽

2021 ◽

Author(s):

Johannes Mittmann ◽

Werner Schindler

Keyword(s):

Side Channel ◽

Modular Exponentiation ◽

Modular Multiplication ◽

Multiplication Algorithm ◽

Diffie Hellman ◽

Mathematical Difficulties ◽

Execution Times ◽

Stochastic Properties ◽

Timing Attacks ◽

Theoretical Results

Montgomery’s and Barrett’s modular multiplication algorithms are widely used in modular exponentiation algorithms, e.g. to compute RSA or ECC operations. While Montgomery’s multiplication algorithm has been studied extensively in the literature and many side-channel attacks have been detected, to our best knowledge no thorough analysis exists for Barrett’s multiplication algorithm. This article closes this gap. For both Montgomery’s and Barrett’s multiplication algorithm, differences of the execution times are caused by conditional integer subtractions, so-called extra reductions. Barrett’s multiplication algorithm allows even two extra reductions, and this feature increases the mathematical difficulties significantly. We formulate and analyse a two-dimensional Markov process, from which we deduce relevant stochastic properties of Barrett’s multiplication algorithm within modular exponentiation algorithms. This allows us to transfer the timing attacks and local timing attacks (where a second side-channel attack exhibits the execution times of the particular modular squarings and multiplications) on Montgomery’s multiplication algorithm to attacks on Barrett’s algorithm. However, there are also differences. Barrett’s multiplication algorithm requires additional attack substeps, and the attack efficiency is much more sensitive to variations of the parameters. We treat timing attacks on RSA with CRT, on RSA without CRT, and on Diffie–Hellman, as well as local timing attacks against these algorithms in the presence of basis blinding. Experiments confirm our theoretical results.
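The extra reductions mentioned in this abstract can be made concrete with a small sketch of textbook Barrett modular multiplication. It assumes base 2 and a precomputed constant `mu = floor(2^(2k) / m)`, where `k` is the bit length of the modulus; the parameterization analysed in the article may differ, and the function names are hypothetical. The function also returns how many conditional subtractions were executed, which is the data-dependent quantity a timing attack observes.

```python
def barrett_setup(m):
    """Precompute k (bit length of m) and mu = floor(2^(2k) / m)."""
    k = m.bit_length()
    mu = (1 << (2 * k)) // m
    return k, mu


def barrett_mulmod(a, b, m, k, mu):
    """Return (a*b mod m, number of extra reductions) for 0 <= a, b < m.

    The quotient estimate q3 never exceeds the true quotient and falls short
    of it by at most 2, so the final loop runs 0, 1 or 2 times.
    """
    x = a * b
    q1 = x >> (k - 1)        # drop the low k-1 bits of the product
    q3 = (q1 * mu) >> (k + 1)  # estimate of floor(x / m)
    r = x - q3 * m
    extra = 0
    while r >= m:            # conditional subtractions ("extra reductions")
        r -= m
        extra += 1
    return r, extra
```

Calling `barrett_setup(m)` once per modulus and then `barrett_mulmod(a, b, m, k, mu)` inside a square-and-multiply loop shows how the extra-reduction count varies with the operands, which is the source of the timing differences the abstract describes.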

Categorization and SEU Fault Simulations of Radiation-Hardened-by-Design Flip-Flops

Electronics ◽

10.3390/electronics10131572 ◽

2021 ◽

Vol 10 (13) ◽

pp. 1572

Author(s):

Ehab A. Hamed ◽

Inhee Lee

Keyword(s):

Soft Error ◽

Error Tolerance ◽

Cmos Process ◽

Flip Flop ◽

Single Event Upsets ◽

Area Overhead ◽

Fair Comparison ◽

Reference Design ◽

Radiation Hardened ◽

Radiation Hardened By Design

In the previous three decades, many Radiation-Hardened-by-Design (RHBD) Flip-Flops (FFs) have been designed and improved to be immune to Single Event Upsets (SEUs). Their specifications are enhanced regarding soft error tolerance, area overhead, power consumption, and delay. In this review, previously presented RHBD FFs are classified into three categories with an overview of each category. Six well-known RHBD FFs architectures are simulated using a 180 nm CMOS process to show a fair comparison between them while the conventional Transmission Gate Flip-Flop (TGFF) is used as a reference design for this comparison. The results of the comparison are analyzed to give some important highlights about each design.

UltraSynth: Insights of a CGRA Integration into a Control Engineering Environment

Journal of Signal Processing Systems ◽

10.1007/s11265-021-01641-7 ◽

2021 ◽

Author(s):

Dennis Wolf ◽

Andreas Engel ◽

Tajas Ruschke ◽

Andreas Koch ◽

Christian Hochberger

Keyword(s):

Computing System ◽

Coarse Grained ◽

Instruction Level Parallelism ◽

Control Engineering ◽

Processing Elements ◽

Actual Application ◽

Reconfigurable Arrays ◽

Engineering Environment ◽

On Chip ◽

Level Parallelism

Coarse Grained Reconfigurable Arrays (CGRAs) or Architectures are a concept for hardware accelerators based on the idea of distributing workload over Processing Elements. These processors exploit instruction level parallelism, while being energy efficient due to their simplistic internal structure. However, the incorporation into a complete computing system raises severe challenges at the hardware and software level. This article evaluates a CGRA integrated into a control engineering environment targeting a Xilinx Zynq System on Chip (SoC) in detail. Besides the actual application execution performance, the practicability of the configuration toolchain is validated. Challenges of the real-world integration are discussed and practical insights are highlighted.
