AMROFloor: An Efficient Aging Mitigation and Resource Optimization Floorplanner for Virtual Coarse-Grained Runtime Reconfigurable FPGAs

Zeyu Li; Zhao Huang; Quan Wang; Junjie Wang

doi:10.3390/electronics11020273

Electronics ◽

10.3390/electronics11020273 ◽

2022 ◽

Vol 11 (2) ◽

pp. 273

Author(s):

Zeyu Li ◽

Zhao Huang ◽

Quan Wang ◽

Junjie Wang

Keyword(s):

Resource Optimization ◽

Coarse Grained ◽

CMOS Process ◽

Time To Failure ◽

Effective Solution ◽

Aging Effects ◽

Rapid Reduction ◽

Silicon Accumulation ◽

Long Time ◽

On Chip

With the rapid reduction of CMOS process size, FPGAs built with high silicon accumulation technology are becoming more sensitive to aging effects, which reduces the reliability and service life of the device. Offline aging-aware layout planning based on balanced stress is an effective solution. However, existing methods take a long time to solve the floorplanning problem, and the resulting layout solutions occupy many on-chip resources. To this end, we propose an efficient Aging Mitigation and Resource Optimization Floorplanner (AMROFloor) for FPGAs. First, the layout solution is implemented on the Virtual Coarse-Grained Runtime Reconfigurable Architecture, which avoids rule constraints for placement and routing. Second, the Maximize Reconfigurable Regions Algorithm (MRRA) is proposed to quickly determine the number and size of the reconfigurable regions (RRs), which shortens the solving time while ensuring an effective solution. Furthermore, the Resource Combination Algorithm (RCA) is proposed to optimize the on-chip resources, reducing the on-Chip Resource Utilization (CRU) while achieving the same aging relief effect. Experiments were simulated and implemented on a Xilinx FPGA. The results demonstrate that AMROFloor extends the Mean Time to Failure (MTTF) by 13.8% and reduces the resource overhead by 19.2% on average compared to existing aging-aware layout solutions.

Design and Implementation of a Reconfigurable Cryptographic Coprocessor with Multiple Side-Channel Attacks Countermeasures

Journal of Circuits, Systems and Computers ◽

10.1142/s0218126618501803 ◽

2018 ◽

Vol 27 (11) ◽

pp. 1850180 ◽

Cited By ~ 1

Author(s):

Xinchao Shang ◽

Weiwei Shan ◽

Xinning Liu

Keyword(s):

Coarse Grained ◽

Side Channel ◽

CMOS Process ◽

Modular Multiplication ◽

Small Power ◽

Design And Implementation ◽

Area Overhead ◽

On Chip ◽

The Right ◽

Circuit Function

Countermeasures against side-channel attacks (SCA) have become necessary in hardware security, and the need to support multiple crypto algorithms on a single chip is increasing. We propose a reconfigurable crypto coprocessor that not only supports multiple crypto algorithms but also provides multiple effective SCA countermeasures against SPA, DPA and EMA by making use of its own reconfigurable features rather than extra resources. The countermeasure methods include several global and encryption-flow-related countermeasures, which can also be reconfigured along with the circuit function. The coprocessor is a coarse-grained reconfigurable architecture composed of several reconfigurable modules, such as logic arithmetic, shift, modular add/subtract, permutation, S-box and modular multiplication units. It is integrated into a system-on-chip with a 32-bit CPU and fabricated in a 0.18 μm CMOS process with a 1.8 V supply and a 100 MHz maximum frequency. Experimental results show that it successfully resists SPA and DPA with one million power traces. With full countermeasures enabled, it resists EMA with up to 1.2 million electromagnetic traces without revealing the right subkey. Thus, this reconfigurable coprocessor provides a good solution for both supporting multiple algorithms and providing SCA resistance, with no influence on frequency, negligible area overhead and small power overhead.

Heterogeneous Integration of Boost Power Supply and On-Chip Solar Cell using triple well CMOS Process

IEEJ Transactions on Electronics Information and Systems ◽

10.1541/ieejeiss.138.41 ◽

2018 ◽

Vol 138 (1) ◽

pp. 41-49

Author(s):

Kazuma Igarashi ◽

Yoshimasa Minami ◽

Nobuhiko Nakano

Keyword(s):

Solar Cell ◽

Power Supply ◽

Heterogeneous Integration ◽

CMOS Process ◽

On Chip

Fabrication of Monolithic Integrated Bimaterial Resonant Uncooled IR Sensor

Key Engineering Materials ◽

10.4028/www.scientific.net/kem.543.176 ◽

2013 ◽

Vol 543 ◽

pp. 176-179 ◽

Cited By ~ 1

Author(s):

D.Q. Zhao ◽

Xia Zhang ◽

P. Liu ◽

F. Yang ◽

C. Lin ◽

...

Keyword(s):

Silicon Nitride ◽

Hole Mobility ◽

Micro Electro Mechanical System ◽

CMOS Process ◽

Standard CMOS Process ◽

IR Sensor ◽

Cantilever Structure ◽

CMOS MEMS ◽

On Chip ◽

Micro Cantilever

In this work we studied the fabrication of a monolithic bimaterial micro-cantilever resonant IR sensor with on-chip drive circuits. The effects of the high-temperature process and of stress-induced performance degradation were investigated. The paper focuses on the post-CMOS MEMS (micro electro mechanical system) fabrication process of this IR sensor, starting from theoretical analysis and simulation and then moving to experimental verification. The capacitive cantilever structure was fabricated by surface micromachining, and the drive circuits were prepared in a standard CMOS process. The stress introduced by MEMS films, such as the tensile silicon nitride that serves as a contact etch stop layer for the MOSFETs and as a release stop layer for the MEMS structure, increases the NMOS electron mobility but decreases the PMOS hole mobility. Moreover, the NMOS threshold voltage (Vth) shifts and the transconductance (Gm) degrades. An additional step that selectively removes the silicon nitride capping layer and the polysilicon layer over the IC area was inserted into the standard CMOS process to lower the stress in the MOSFET channel regions. Selectively removing the silicon nitride and polysilicon before annealing avoids 77% of the Vth shift and 86% of the Gm loss.

A 1.93-pJ/Bit PCI Express Gen4 PHY Transmitter with On-Chip Supply Regulators in 28 nm CMOS

Electronics ◽

10.3390/electronics10010068 ◽

2021 ◽

Vol 10 (1) ◽

pp. 68

Author(s):

Woorham Bae ◽

Sung-Yong Cho ◽

Deog-Kyoon Jeong

Keyword(s):

Finite Impulse Response ◽

CMOS Process ◽

PCI Express ◽

Fully Integrated ◽

Peripheral Component Interconnect ◽

Low Power CMOS ◽

On Chip ◽

Output Driver ◽

Wide Operating ◽

28 nm

This paper presents a fully integrated Peripheral Component Interconnect (PCI) Express (PCIe) Gen4 physical layer (PHY) transmitter. The prototype chip is fabricated in a 28 nm low-power CMOS process, and the active area of the proposed transmitter is 0.23 mm². To enable voltage scaling across a wide range of operating rates, from 2.5 Gb/s to 16 Gb/s, two on-chip supply regulators are included in the transmitter. The regulators also maintain the output impedance of the transmitter to meet the PCIe return loss specification by including replica segments of the output driver and a reference resistance in the regulator loop. Three-tap finite-impulse-response (FIR) equalization is implemented, so the transmitter provides more than 9.5 dB of equalization, as required by the PCIe specification. At 16 Gb/s, the prototype chip achieves an energy efficiency of 1.93 pJ/bit, including all the interface, bias, and built-in self-test circuits.
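The energy-efficiency figure can be turned into an approximate power number, since energy per bit multiplied by bit rate gives power. The sketch below is only this unit conversion applied to the two values quoted in the abstract; the function name is hypothetical and the resulting power is derived here, not reported in the abstract.

```python
def link_power_mw(energy_pj_per_bit: float, rate_gbps: float) -> float:
    """Convert energy per bit and data rate into power.

    (pJ/bit) * (Gb/s) = 1e-12 J/bit * 1e9 bit/s = 1e-3 W, i.e. milliwatts.
    """
    return energy_pj_per_bit * rate_gbps


# Values quoted in the abstract: 1.93 pJ/bit at 16 Gb/s.
power_mw = link_power_mw(1.93, 16.0)
print(f"Implied transmitter power: {power_mw:.2f} mW")
```

Under these assumptions the transmitter would draw roughly 31 mW at the top data rate.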

Unveiling the Dynamics of KRAS4b on Lipid Model Membranes

The Journal of Membrane Biology ◽

10.1007/s00232-021-00176-z ◽

2021 ◽

Author(s):

Cesar A. López ◽

Animesh Agarwal ◽

Que N. Van ◽

Andrew G. Stephen ◽

S. Gnanakaran

Keyword(s):

Molecular Mechanisms ◽

Hypervariable Region ◽

Small GTPase ◽

Model Systems ◽

Coarse Grained ◽

Regulatory Function ◽

Limited Information ◽

Long Time ◽

Cell Growth And Differentiation ◽

State Models

Small GTPase proteins are ubiquitous and responsible for regulating several processes related to cell growth and differentiation. Mutations that stabilize their active state can lead to uncontrolled cell proliferation and cancer. Although these proteins are well characterized at the cellular scale, the molecular mechanisms governing their functions are still poorly understood. In addition, there is limited information about the regulatory function of the cell membrane that supports their activity. Thus, we have studied the dynamics and conformations of farnesylated KRAS4b in various membrane model systems, ranging from binary fluid mixtures to heterogeneous raft mimics. Our approach combines long time-scale coarse-grained (CG) simulations and Markov state models to dissect the membrane-supported dynamics of KRAS4b. Our simulations reveal that protein dynamics are mainly modulated by the presence of anionic lipids and, to some extent, by the nucleotide state (activation) of the protein. In addition, our results suggest that both the farnesyl and the polybasic hypervariable region (HVR) are responsible for its preferential partitioning within the liquid-disordered (Ld) domains in membranes, potentially enhancing the formation of membrane-driven signaling platforms.
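As a rough illustration of what a Markov state model involves, the sketch below estimates a transition matrix from a discretized trajectory by counting transitions at a fixed lag and normalizing each row. It is a generic textbook construction, not the authors' pipeline; the state labels, the toy trajectory, and the function name are all made up for the example.

```python
def estimate_transition_matrix(traj, n_states, lag=1):
    """Row-normalized transition counts between discrete states at a given lag."""
    counts = [[0] * n_states for _ in range(n_states)]
    # Pair each frame with the frame `lag` steps later and count the jump.
    for src, dst in zip(traj[:-lag], traj[lag:]):
        counts[src][dst] += 1
    matrix = []
    for row in counts:
        total = sum(row)
        # A state that is never left gets an all-zero row instead of dividing by zero.
        matrix.append([c / total if total else 0.0 for c in row])
    return matrix


# Hypothetical two-state trajectory (e.g. 0 = one orientation, 1 = another).
toy_traj = [0, 0, 1, 1, 0, 1]
T = estimate_transition_matrix(toy_traj, n_states=2)
for row in T:
    print([round(p, 3) for p in row])
```

In practice the states would come from clustering simulation frames, and the lag time would be chosen so that the dynamics are approximately Markovian.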

UltraSynth: Insights of a CGRA Integration into a Control Engineering Environment

Journal of Signal Processing Systems ◽

10.1007/s11265-021-01641-7 ◽

2021 ◽

Author(s):

Dennis Wolf ◽

Andreas Engel ◽

Tajas Ruschke ◽

Andreas Koch ◽

Christian Hochberger

Keyword(s):

Computing System ◽

Coarse Grained ◽

Instruction Level Parallelism ◽

Control Engineering ◽

Processing Elements ◽

Actual Application ◽

Reconfigurable Arrays ◽

Engineering Environment ◽

On Chip ◽

Level Parallelism

Coarse-Grained Reconfigurable Arrays (CGRAs), or Architectures, are a concept for hardware accelerators based on the idea of distributing workload over Processing Elements. These processors exploit instruction-level parallelism while being energy efficient due to their simple internal structure. However, incorporating them into a complete computing system raises severe challenges at the hardware and software levels. This article evaluates in detail a CGRA integrated into a control engineering environment targeting a Xilinx Zynq System on Chip (SoC). Besides the actual application execution performance, the practicability of the configuration toolchain is validated. Challenges of the real-world integration are discussed and practical insights are highlighted.

Simba

Communications of the ACM ◽

10.1145/3460227 ◽

2021 ◽

Vol 64 (6) ◽

pp. 107-116

Author(s):

Yakun Sophia Shao ◽

Jason Clemons ◽

Rangharajan Venkatesan ◽

Brian Zimmer ◽

Matthew Fojtik ◽

...

Keyword(s):

Deep Learning ◽

Large Scale ◽

Data Locality ◽

Coarse Grained ◽

Batch Size ◽

Peak Performance ◽

Large Scale Systems ◽

High Area ◽

On Chip ◽

And Storage

Package-level integration using multi-chip-modules (MCMs) is a promising approach for building large-scale systems. Compared to a large monolithic die, an MCM combines many smaller chiplets into a larger system, substantially reducing fabrication and design costs. Current MCMs typically contain only a handful of coarse-grained large chiplets due to the high area, performance, and energy overheads associated with inter-chiplet communication. This work investigates and quantifies the costs and benefits of using MCMs with fine-grained chiplets for deep learning inference, an application domain with large compute and on-chip storage requirements. To evaluate the approach, we architected, implemented, fabricated, and tested Simba, a 36-chiplet prototype MCM system for deep-learning inference. Each chiplet achieves 4 TOPS peak performance, and the 36-chiplet MCM package achieves up to 128 TOPS and up to 6.1 TOPS/W. The MCM is configurable to support a flexible mapping of DNN layers to the distributed compute and storage units. To mitigate inter-chiplet communication overheads, we introduce three tiling optimizations that improve data locality. These optimizations achieve up to 16% speedup compared to the baseline layer mapping. Our evaluation shows that Simba can process 1988 images/s running ResNet-50 with a batch size of one, delivering an inference latency of 0.50 ms.
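The throughput and latency figures at the end of the abstract are consistent with each other if one assumes that, at a batch size of one, images are processed strictly one after another with no overlap. The sketch below shows only that arithmetic; the no-overlap assumption and the function name are not taken from the abstract.

```python
def latency_ms_from_throughput(images_per_second: float) -> float:
    """Per-image latency in milliseconds, assuming one image in flight at a time."""
    return 1000.0 / images_per_second


# Throughput quoted in the abstract for ResNet-50 at batch size one.
latency_ms = latency_ms_from_throughput(1988.0)
print(f"Implied per-image latency: {latency_ms:.2f} ms")
```

The result rounds to the 0.50 ms latency quoted in the abstract.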

Substrate-triggered technique for on-chip ESD protection design in a 0.18-μm salicided CMOS process

IEEE Transactions on Electron Devices ◽

10.1109/ted.2003.812495 ◽

2003 ◽

Vol 50 (4) ◽

pp. 1050-1057 ◽

Cited By ~ 15

Author(s):

Ming-Dou Ker ◽

Tung-Yang Chen

Keyword(s):

CMOS Process ◽

ESD Protection ◽

On Chip

Data Storage Layout for Object-Based De-Duplication System

Applied Mechanics and Materials ◽

10.4028/www.scientific.net/amm.303-306.2284 ◽

2013 ◽

Vol 303-306 ◽

pp. 2284-2288

Author(s):

Fang Yan ◽

Yu An Tan

Keyword(s):

Data Storage ◽

Storage System ◽

Unstructured Data ◽

Effective Solution ◽

Advanced Method ◽

Data Object ◽

Object Based ◽

The World ◽

Long Time ◽

Reduce Energy Consumption

The world is increasingly awash in unstructured data. Object-based data de-duplication is currently the most advanced method and an effective solution for detecting duplicate data. We developed an energy-saving policy for conventional disk-based RAID systems. Based on the characteristics of object-based data de-duplication, we introduce object layout strategies for unstructured data applications: disk accesses are concentrated on a subset of the disks for long periods, which makes it possible to schedule the other disks into standby or shutdown mode. Our proposed methods reduce the energy consumption of the de-duplication storage system.

Clockwise and counterclockwise hysteresis characterize state changes in the same aquatic ecosystem

10.1101/2020.05.01.073239 ◽

2020 ◽

Author(s):

Amanda C. Northrop ◽

Vanessa Avalone ◽

Aaron M. Ellison ◽

Bryan A. Ballif ◽

Nicholas J. Gotelli

Keyword(s):

Aquatic Ecosystem ◽

Recovery Phase ◽

Ecosystem Recovery ◽

Pitcher Plant ◽

Rapid Reduction ◽

Long Time ◽

High Enrichment ◽

Change State ◽

Abrupt Shifts ◽

Clockwise Hysteresis

Incremental increases in a driver variable, such as nutrients or detritus, can trigger abrupt shifts in aquatic ecosystems. Once these ecosystems change state, a simple reduction in the driver variable may not return them to their original state. Because of the long time scales involved, we still have a poor understanding of the dynamics of ecosystem recovery after a state change. A model system for understanding ecosystem recovery is the aquatic microecosystem that inhabits the cup-shaped leaves of the pitcher plant Sarracenia purpurea. With enrichment of organic matter, this system flips within 1 to 3 days from an oxygen-rich state to an oxygen-poor (hypoxic) state. In a replicated greenhouse experiment, we enriched pitcher plant leaves at different rates with bovine serum albumin (BSA), a molecular substitute for detritus. Changes in dissolved oxygen ([O2]) and undigested BSA concentration were monitored during enrichment and recovery phases. At low enrichment rates, ecosystems showed a substantial lag in the recovery of [O2] (clockwise hysteresis). At intermediate enrichment rates, [O2] tracked the levels of undigested BSA with the same profile during the enrichment and recovery phases (no hysteresis). At high enrichment rates, we observed a novel response: changes in [O2] were proportionally larger during the recovery phase than during the enrichment phase (counter-clockwise hysteresis). These experiments demonstrate that detrital enrichment rate can modulate a diversity of hysteretic responses in a single aquatic ecosystem. With counter-clockwise hysteresis, rapid reduction of a driver variable following high enrichment rates may be a viable restoration strategy.
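One common way to label a hysteresis loop as clockwise or counter-clockwise is to compute the signed area enclosed by the trajectory in the (driver, response) plane with the shoelace formula. The sketch below does this on made-up data; the abstract does not say how the authors classified their loops, and the axis choice (undigested BSA on x, [O2] on y) is an assumption.

```python
def signed_area(xs, ys):
    """Shoelace formula: positive for counter-clockwise traversal, negative for clockwise."""
    n = len(xs)
    total = 0.0
    for i in range(n):
        j = (i + 1) % n  # wrap around to close the loop
        total += xs[i] * ys[j] - xs[j] * ys[i]
    return total / 2.0


def hysteresis_direction(xs, ys, tol=1e-9):
    """Classify a closed (driver, response) trajectory by the sign of its area."""
    area = signed_area(xs, ys)
    if area > tol:
        return "counter-clockwise"
    if area < -tol:
        return "clockwise"
    return "none"


# Synthetic loop (illustrative values only): oxygen falls as BSA rises,
# then stays low while BSA drops, and only recovers afterwards.
bsa = [0.0, 1.0, 0.0]
oxygen = [1.0, 0.0, 0.0]
lagged = hysteresis_direction(bsa, oxygen)

# Synthetic trajectory that retraces the same path on the way back.
retraced = hysteresis_direction([0.0, 1.0, 2.0, 1.0], [2.0, 1.0, 0.0, 1.0])
print(lagged, retraced)
```

With these axes, a lagged oxygen recovery traces a clockwise loop, which matches the way the abstract uses the term.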
