Design and Implementation of a Reconfigurable Cryptographic Coprocessor with Multiple Side-Channel Attacks Countermeasures

2018 ◽  
Vol 27 (11) ◽  
pp. 1850180 ◽  
Author(s):  
Xinchao Shang ◽  
Weiwei Shan ◽  
Xinning Liu

Nowadays, countermeasures against side-channel attack (SCA) have become necessary in hardware security. And the need for supporting multiple crypto algorithms on a chip is increasing. We propose a reconfigurable crypto coprocessor, which not only supports multiple crypto algorithms, but also provides multiple effective SCA countermeasures of SPA, DPA and EMA, by making use of its own reconfigurable features other than using extra resources. The countermeasure methods include several global and encryption flow related countermeasures, which can also be reconfigured along with the circuit function. This coprocessor is a coarse-grained reconfigurable architecture composed of several reconfigurable modules, such as logic arithmetic, shift, modular ADD/Substrate, permutation, S-box and modular multiplication units, all of which are reconfigurable. This reconfigurable cryptographic coprocessor is integrated into a system-on chip with a 32-bit CPU and fabricated in 0.18 m CMOS process with 1.8[Formula: see text]V supply and 100 MHz maximum frequency. Experimental results show that it can successfully resist SPA and DPA with one million power traces. As for EMA, if we use full countermeasures, it can resist EMA with up to 1.2 million electromagnetic traces without revealing the right subkey. Thus, this reconfigurable coprocessor can provide a good solution for both supporting multiple algorithms and providing SCA resistance, with no frequency influence, neglectable area overhead and small power overhead.

Electronics ◽  
2022 ◽  
Vol 11 (2) ◽  
pp. 273
Author(s):  
Zeyu Li ◽  
Zhao Huang ◽  
Quan Wang ◽  
Junjie Wang

With the rapid reduction of CMOS process size, the FPGAs with high-silicon accumulation technology are becoming more sensitive to aging effects. This reduces the reliability and service life of the device. The offline aging-aware layout planning based on balance stress is an effective solution. However, the existing methods need to take a long time to solve the floorplanner, and the corresponding layout solutions occupy many on-chip resources. To this end, we proposed an efficient Aging Mitigation and Resource Optimization Floorplanner (AMROFloor) for FPGAs. First, the layout solution is implemented on the Virtual Coarse-Grained Runtime Reconfigurable Architecture, which contributes to avoiding rule constraints for placement and routing. Second, the Maximize Reconfigurable Regions Algorithm (MRRA) is proposed to quickly determine the RRs’ number and size to save the solving time and ensure an effective solution. Furthermore, the Resource Combination Algorithm (RCA) is proposed to optimize the on-chip resources, reducing the on-Chip Resource Utilization (CRU) while achieving the same aging relief effect. Experiments were simulated and implemented on Xilinx FPGA. The results demonstrate that the AMROFloor method designed in this paper can extend the Mean Time to Failure (MTTF) by 13.8% and optimize the resource overhead by 19.2% on average compared to the existing aging-aware layout solutions.


2016 ◽  
Vol 26 (02) ◽  
pp. 1750028
Author(s):  
Cheng-Hung Lin ◽  
Tzu-Hsuan Huang ◽  
Shu-Yen Lin ◽  
Yu-Hsuan Lee

In this paper, we propose an operation-reduced low-density parity check (LDPC) decoder design and implementation by stopping reliable operation of check nodes of the iterative two-phase message passing (TPMP) min-sum algorithm (MSA). A check node stopping (CNS) scheme is used to tag reliability of check nodes by detecting the magnitudes of the check node belief messages with a threshold. The operation of reliable check nodes tagged by the CNS scheme can be stopped in the later iterations. The proposed LDPC decoder that employs the CNS scheme can significantly terminate the redundant operations of check nodes and efficiently reduce the power consumption of decoder. From the simulations under WiMAX QC LDPC decoding with high channel quality, the CNS scheme achieves up to 12% stopping rate of check nodes with a loss of coding gain less than 0.1 dB. The WiMAX QC LDPC decoder chip that employs the CNS scheme is implemented by a 90-nm CMOS process. Compared with the LDPC decoder that employs no CNS scheme, the overall power dissipation of the proposed LDPC decoder is decreased by 4.1% with 0.5% area overhead.


2013 ◽  
Vol 543 ◽  
pp. 176-179 ◽  
Author(s):  
D.Q. Zhao ◽  
Xia Zhang ◽  
P. Liu ◽  
F. Yang ◽  
C. Lin ◽  
...  

In this work we studied the fabrication of a monolithic bimaterial micro-cantilever resonant IR sensor with on-chip drive circuits. The effects of high temperature process and stress induced performance degradation were investigated. The post-CMOS MEMS (micro electro mechanical system) fabrication process of this IR sensor is the focus of this paper, starting from theoretical analysis and simulation, and then moving to experimental verification. The capacitive cantilever structure was fabricated by surface micromachining method, and drive circuits were prepared by standard CMOS process. While the stress introduced by MEMS films, such as the tensile silicon nitride which works as a contact etch stopper layer for MOSFETs and releasing stop layer for the MEMS structure, increases the electron mobility of NMOS, PMOS hole mobility decreases. Moreover, the NMOS threshold voltage (Vth) shifts, and transconductance (Gm) degrades. An additional step of selective removing silicon nitride capping layer and polysilicon layer upon IC area were inserted into the standard CMOS process to lower the stress in MOSFET channel regions. Selective removing silicon nitride and polysilicon before annealing can void 77% Vth shift and 86% Gm loss.


Electronics ◽  
2021 ◽  
Vol 10 (1) ◽  
pp. 68
Author(s):  
Woorham Bae ◽  
Sung-Yong Cho ◽  
Deog-Kyoon Jeong

This paper presents a fully integrated Peripheral Component Interconnect (PCI) Express (PCIe) Gen4 physical layer (PHY) transmitter. The prototype chip is fabricated in a 28 nm low-power CMOS process, and the active area of the proposed transmitter is 0.23 mm2. To enable voltage scaling across wide operating rates from 2.5 Gb/s to 16 Gb/s, two on-chip supply regulators are included in the transmitter. At the same time, the regulators maintain the output impedance of the transmitter to meet the return loss specification of the PCIe, by including replica segments of the output driver and reference resistance in the regulator loop. A three-tap finite-impulse-response (FIR) equalization is implemented and, therefore, the transmitter provides more than 9.5 dB equalization which is required in the PCIe specification. At 16 Gb/s, the prototype chip achieves energy efficiency of 1.93 pJ/bit including all the interface, bias, and built-in self-test circuits.


2021 ◽  
Vol 13 (6) ◽  
pp. 146
Author(s):  
Somdip Dey ◽  
Amit Kumar Singh ◽  
Klaus McDonald-Maier

Side-channel attacks remain a challenge to information flow control and security in mobile edge devices till this date. One such important security flaw could be exploited through temperature side-channel attacks, where heat dissipation and propagation from the processing cores are observed over time in order to deduce security flaws. In this paper, we study how computer vision-based convolutional neural networks (CNNs) could be used to exploit temperature (thermal) side-channel attack on different Linux governors in mobile edge device utilizing multi-processor system-on-chip (MPSoC). We also designed a power- and memory-efficient CNN model that is capable of performing thermal side-channel attack on the MPSoC and can be used by industry practitioners and academics as a benchmark to design methodologies to secure against such an attack in MPSoC.


Author(s):  
Johannes Mittmann ◽  
Werner Schindler

AbstractMontgomery’s and Barrett’s modular multiplication algorithms are widely used in modular exponentiation algorithms, e.g. to compute RSA or ECC operations. While Montgomery’s multiplication algorithm has been studied extensively in the literature and many side-channel attacks have been detected, to our best knowledge no thorough analysis exists for Barrett’s multiplication algorithm. This article closes this gap. For both Montgomery’s and Barrett’s multiplication algorithm, differences of the execution times are caused by conditional integer subtractions, so-called extra reductions. Barrett’s multiplication algorithm allows even two extra reductions, and this feature increases the mathematical difficulties significantly. We formulate and analyse a two-dimensional Markov process, from which we deduce relevant stochastic properties of Barrett’s multiplication algorithm within modular exponentiation algorithms. This allows to transfer the timing attacks and local timing attacks (where a second side-channel attack exhibits the execution times of the particular modular squarings and multiplications) on Montgomery’s multiplication algorithm to attacks on Barrett’s algorithm. However, there are also differences. Barrett’s multiplication algorithm requires additional attack substeps, and the attack efficiency is much more sensitive to variations of the parameters. We treat timing attacks on RSA with CRT, on RSA without CRT, and on Diffie–Hellman, as well as local timing attacks against these algorithms in the presence of basis blinding. Experiments confirm our theoretical results.


Electronics ◽  
2021 ◽  
Vol 10 (13) ◽  
pp. 1572
Author(s):  
Ehab A. Hamed ◽  
Inhee Lee

In the previous three decades, many Radiation-Hardened-by-Design (RHBD) Flip-Flops (FFs) have been designed and improved to be immune to Single Event Upsets (SEUs). Their specifications are enhanced regarding soft error tolerance, area overhead, power consumption, and delay. In this review, previously presented RHBD FFs are classified into three categories with an overview of each category. Six well-known RHBD FFs architectures are simulated using a 180 nm CMOS process to show a fair comparison between them while the conventional Transmission Gate Flip-Flop (TGFF) is used as a reference design for this comparison. The results of the comparison are analyzed to give some important highlights about each design.


Author(s):  
Dennis Wolf ◽  
Andreas Engel ◽  
Tajas Ruschke ◽  
Andreas Koch ◽  
Christian Hochberger

AbstractCoarse Grained Reconfigurable Arrays (CGRAs) or Architectures are a concept for hardware accelerators based on the idea of distributing workload over Processing Elements. These processors exploit instruction level parallelism, while being energy efficient due to their simplistic internal structure. However, the incorporation into a complete computing system raises severe challenges at the hardware and software level. This article evaluates a CGRA integrated into a control engineering environment targeting a Xilinx Zynq System on Chip (SoC) in detail. Besides the actual application execution performance, the practicability of the configuration toolchain is validated. Challenges of the real-world integration are discussed and practical insights are highlighted.


Sign in / Sign up

Export Citation Format

Share Document