area efficiency
Recently Published Documents


2022 ◽  
Vol 12 (1) ◽  
Author(s):  
Chih-Cheng Chang ◽  
Shao-Tzu Li ◽  
Tong-Lin Pan ◽  
Chia-Ming Tsai ◽  
I-Ting Wang ◽  
...  

Device quantization of in-memory computing (IMC) that considers the non-negligible variation and finite dynamic range of practical memory technology is investigated, aiming to quantitatively co-optimize system performance in terms of accuracy, power, and area. Architecture- and algorithm-level solutions are taken into consideration. Weight-separate mapping, a VGG-like algorithm, multiple cells per weight, and fine-tuning of the classifier layer are effective for suppressing inference accuracy loss due to variation and allow for the lowest possible weight precision to improve area and energy efficiency. Higher priority should be given to developing low-conductance and low-variability memory devices, which are essential for energy- and area-efficient IMC, whereas low bit precision (< 3b) and memory window (< 10) are of less concern.


Author(s):  
Xiangyu Chen ◽  
Takeaki Yajima ◽  
Isao H. Inoue ◽  
Tetsuya Iizuka

Spiking neural networks (SNNs) inspired by biological neurons enable a more realistic mimicry of the human brain. To realize SNNs similar to large-scale biological networks, neuron circuits with high area efficiency are essential. In this paper, we propose a compact leaky integrate-and-fire (LIF) neuron circuit with a long and tunable time constant, which consists of a capacitor and two pseudo resistors (PRs). The prototype chip was fabricated with TSMC 65 nm CMOS technology, and it occupies a die area of 1392 µm². The fabricated LIF neuron has a power consumption of 6 µW and a leak time constant of up to 1.2 ms (the resistance of PR is up to 600 MΩ). In addition, the time constants are tunable by changing the bias voltage of PRs. Overall, this proposed neuron circuit facilitates the very-large-scale integration (VLSI) of adaptive SNNs, which is crucial for the implementation of bio-scale brain-inspired computing.
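As a rough consistency check on the time-constant figures above, the sketch below assumes the leak behaves as a first-order RC decay (tau = R * C). The abstract does not state the capacitor value, so the capacitance computed here is only an implied estimate, and the function names are illustrative.

```python
import math

def implied_capacitance(tau_s: float, resistance_ohm: float) -> float:
    """Capacitance implied by tau = R * C (first-order leak is an assumption)."""
    return tau_s / resistance_ohm

def membrane_voltage(v0: float, t_s: float, tau_s: float) -> float:
    """Membrane voltage after t_s seconds of pure leak, with no input current."""
    return v0 * math.exp(-t_s / tau_s)

TAU_MAX_S = 1.2e-3   # maximum leak time constant quoted in the abstract
R_PR_OHM = 600e6     # maximum pseudo-resistor resistance quoted in the abstract

c_farad = implied_capacitance(TAU_MAX_S, R_PR_OHM)
remaining = membrane_voltage(1.0, TAU_MAX_S, TAU_MAX_S)

print(f"implied capacitance: {c_farad * 1e12:.2f} pF")
print(f"fraction of voltage left after one time constant: {remaining:.3f}")
```

Under this assumption the quoted 1.2 ms and 600 MΩ correspond to a capacitance of about 2 pF.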


Author(s):  
Min-Hua Ho ◽  
Chung-I.G. Hsu ◽  
Jhuo-Ting Hung ◽  
You-Lin Shen

This work proposes miniaturized trisection bandpass filters (BPFs) using size-reduced substrate integrated coaxial resonators (SICRs). The applied SICRs are operated in a coaxial mode. The SICR is developed from the structurally similar substrate integrated waveguide (SIW) cavity, yet it occupies only 6.2% of the latter's circuit area, corresponding to a circuit-area reduction rate of 93.8%. The cross-coupling between the input and output resonators can be either a magnetic or electric coupling for locating the transmission zero near either the upper or lower passband edge, respectively. Sample trisection BPFs with magnetic/electric cross-couplings are built for experimental verification. Agreement between measured and simulated data is observed. These miniaturized trisection BPFs with a freely switchable transmission zero have the advantage of excellent circuit-area efficiency in the category of SICR and SIW cavity trisection BPFs.


Electronics ◽  
2021 ◽  
Vol 10 (22) ◽  
pp. 2856
Author(s):  
Fang Tang ◽  
Qiyun Ma ◽  
Zhou Shu ◽  
Yuanjin Zheng ◽  
Amine Bermak

This paper presents a 10 bit 100 MS/s asynchronous successive approximation register (SAR) analog-to-digital converter (ADC) without calibration for industrial control system (ICS) applications. Several techniques are adopted in the proposed switching procedure to achieve better linearity, power and area efficiency. A single-side-fixed technique is utilized to reduce the number of capacitors; a parallel split capacitor array in combination with a partially thermometer coded technique can minimize the switching energy, improve speed, and decrease differential non-linearity (DNL). In addition, a compact timing-protection scheme is proposed to ensure the stability of the asynchronous SAR ADC. The proposed ADC is fabricated in a 28 nm CMOS process with an active area of 0.026 mm². At 100 MS/s, the ADC achieves a signal-to-noise-and-distortion ratio (SNDR) of 51.54 dB and a spurious free dynamic range (SFDR) of 55.12 dB with the Nyquist input. The measured DNL and integral non-linearity (INL) without calibration are +0.37/−0.44 and +0.48/−0.63 LSB, respectively. The power consumption is 1.1 mW with a supply voltage of 0.9 V, leading to a figure of merit (FoM) of 35.6 fJ/conversion-step.
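The quoted figure of merit can be reproduced from the other numbers in the abstract. The sketch below assumes the commonly used Walden definition, FoM = P / (2^ENOB * fs) with ENOB = (SNDR - 1.76) / 6.02; the abstract does not say which definition the authors used, so treat this as a plausibility check only.

```python
def enob(sndr_db: float) -> float:
    """Effective number of bits from SNDR, using the standard sine-wave formula."""
    return (sndr_db - 1.76) / 6.02

def walden_fom_joules(power_w: float, sndr_db: float, fs_hz: float) -> float:
    """Walden figure of merit in joules per conversion-step (assumed definition)."""
    return power_w / (2 ** enob(sndr_db) * fs_hz)

POWER_W = 1.1e-3   # 1.1 mW from the abstract
SNDR_DB = 51.54    # SNDR at Nyquist input from the abstract
FS_HZ = 100e6      # 100 MS/s from the abstract

fom_fj = walden_fom_joules(POWER_W, SNDR_DB, FS_HZ) * 1e15
print(f"ENOB = {enob(SNDR_DB):.2f} bits, FoM = {fom_fj:.1f} fJ/conversion-step")
```

The result lands within a rounding step of the reported 35.6 fJ/conversion-step, which suggests the Walden definition is the one in use.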


Author(s):  
Pham Khoi Dong ◽  
Hung K Nguyen ◽  
Fawnizu Azmadi Hussin ◽  
Xuan Tu Tran

Security in high-speed data transfer between devices is always a big challenge. At the same time, new data transfer standards such as IEEE P802.3bs 2017 stipulate maximum data rates of up to 400 Gbps. Encryption therefore needs high throughput to keep up with data transfer rates and low latency to ensure quality of service. In this paper, we propose a multi-core AES encryption hardware architecture to achieve ultra-high-throughput encryption. To reduce area cost and power consumption, these cores share the same KeyExpansion blocks. A fully parallel, outer-round pipeline technique is also applied to the proposed architecture to achieve low-latency encryption. The design has been modelled at RTL (Register-Transfer-Level) in VHDL and then synthesized with a CMOS 45 nm technology using Synopsys Design Compiler. With 10 fully parallel cores and outer-round pipelining, the implementation results show that our architecture achieves a throughput of 1 Tbps at the maximum operating frequency of 800 MHz. These results meet the speed requirements of future communication standards. In addition, our design achieves a high power efficiency of 2377 Gbps/W and an area efficiency of 833 Gbps/mm², which are 2.6× and 4.5× higher, respectively, than those of the highest-throughput single-core AES design.
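The headline throughput follows from simple arithmetic if each fully pipelined core is assumed to emit one 128-bit AES block per clock cycle. That per-cycle assumption, and the power and area values derived below, are not stated in the abstract; they are back-of-the-envelope estimates from the reported efficiencies.

```python
AES_BLOCK_BITS = 128  # AES block size is fixed at 128 bits

def pipelined_throughput_gbps(cores: int, freq_mhz: int,
                              block_bits: int = AES_BLOCK_BITS) -> float:
    """Throughput in Gbps, assuming one block per core per clock cycle."""
    return cores * block_bits * freq_mhz / 1000

throughput_gbps = pipelined_throughput_gbps(cores=10, freq_mhz=800)

# Implied totals, derived from the efficiencies quoted in the abstract.
implied_power_w = throughput_gbps / 2377    # Gbps divided by Gbps/W
implied_area_mm2 = throughput_gbps / 833    # Gbps divided by Gbps/mm^2

print(f"throughput: {throughput_gbps:.0f} Gbps")
print(f"implied power: {implied_power_w:.2f} W")
print(f"implied area: {implied_area_mm2:.2f} mm^2")
```

Ten cores at 800 MHz give 1024 Gbps, consistent with the stated 1 Tbps.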


2021 ◽  
Vol 20 (5s) ◽  
pp. 1-20
Author(s):  
Hyungmin Cho

Depthwise convolutions are widely used in convolutional neural networks (CNNs) targeting mobile and embedded systems. Depthwise convolution layers reduce the computation load and the number of parameters compared to conventional convolution layers. Many deep neural network (DNN) accelerators adopt an architecture that exploits the high data-reuse factor of DNN computations, such as a systolic array. However, depthwise convolutions have a low data-reuse factor and under-utilize the processing elements (PEs) in systolic arrays. In this paper, we present a DNN accelerator design called RiSA, which provides a novel mechanism that boosts the PE utilization for depthwise convolutions on a systolic array with minimal overheads. In addition, the PEs in systolic arrays can be efficiently used only if the data items (tensors) are arranged in the desired layout. Typical DNN accelerators provide various types of PE interconnects or additional modules to flexibly rearrange the data items and manage data movements during DNN computations. RiSA provides a lightweight set of tensor management tasks within the PE array itself that eliminates the need for an additional module for tensor reshaping tasks. Using this embedded tensor reshaping, RiSA supports various DNN models, including convolutional neural networks and natural language processing models, while maintaining a high area efficiency. Compared to Eyeriss v2, RiSA improves the area and energy efficiency for MobileNet-V1 inference by 1.91× and 1.31×, respectively.
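The reduction in computation and parameters, and the matching loss of data reuse, can be illustrated with a simple count. The sketch below assumes stride 1 and "same" padding so the output keeps the input's spatial size; the layer shape is a hypothetical example, not one taken from the paper.

```python
def standard_conv_cost(h: int, w: int, c_in: int, c_out: int, k: int):
    """(parameters, multiply-accumulates) of a standard k x k convolution."""
    params = k * k * c_in * c_out
    return params, params * h * w

def depthwise_conv_cost(h: int, w: int, channels: int, k: int):
    """(parameters, multiply-accumulates) of a depthwise k x k convolution."""
    params = k * k * channels   # one k x k filter per channel
    return params, params * h * w

# Hypothetical layer: 56 x 56 feature map, 128 channels, 3 x 3 kernels.
H, W, C, K = 56, 56, 128, 3
std_params, std_macs = standard_conv_cost(H, W, C, C, K)
dw_params, dw_macs = depthwise_conv_cost(H, W, C, K)

print(f"standard : {std_params} params, {std_macs} MACs")
print(f"depthwise: {dw_params} params, {dw_macs} MACs")
print(f"reduction factor: {std_macs // dw_macs}x")
```

In the standard layer each input activation is shared by all output-channel filters, whereas in the depthwise layer it is used by a single filter, which is why a systolic array sized for the former is left under-utilized by the latter.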


2021 ◽  
Vol 13 (21) ◽  
pp. 11984
Author(s):  
Najib Rahman Sabory ◽  
Tomonobu Senjyu ◽  
Mir Sayed Shah Danish ◽  
Ayaz Hosham ◽  
Ajmal Noorzada ◽  
...  

A smart city is fundamentally intended to reduce the consumption of resources and optimize efficiencies. In almost any area, efficiency results in energy saving, reduced energy intensity, sustainable economic development, enhanced productivity, a protected environment, and most importantly, cooperation with the climate change battle. Although budget, technology, and the required infrastructure are major constraints for poor cities to achieve smart and sustainable city goals, the benefits of smart cities are multiple for poor cities compared to developing and developed cities. Poor cities achieve improved living environments, security, safety, economic development, governance, and quality of life in addition to achieving sustainable energy goals, and this study seeks to identify those smart renewable energy and energy efficiency strategies that are economically feasible and technically applicable in poor cities. The findings of this research would help poor and low-income, developing cities take the initial steps towards becoming smart cities by applying smart, innovative, and economically feasible sustainable energy projects and initiatives. As a result, these cities will be able to enhance their environment, economy, and employment by transitioning to smart ones.


2021 ◽  
Vol 11 (20) ◽  
pp. 9730
Author(s):  
Zulfikar Zulfikar ◽  
Norhayati Soin ◽  
Sharifah Fatmadiana Wan Muhamad Hatta ◽  
Mohamad Sofian Abu Talip ◽  
Anuar Jaafar

Research into the ring oscillator physically unclonable function (RO-PUF) continues to expand owing to its simple structure, ease of generating responses, and promise as a security primitive. However, a substantial study has yet to be carried out on FPGA-based RO-PUF designs that effectively balance performance and area efficiency. This work proposes a modified RO-PUF in which the ring oscillators are connected directly to the counters. The proposed RO-PUF requires fewer ROs than the conventional structure because it uses the direct pulse count method. This work seeks the ideal routing density of ROs to improve uniqueness. For this purpose, five logic arrangements covering a wide range of RO routing densities were tested. Upon implementation on the FPGA chip, the routing densities of the ROs varied significantly in terms of wire utilization (higher than 25%) and routing hotspots (higher than 80%). The best uniqueness attained was 52.71%, while the highest reliability was 99.51%. Restricting the design to ROs with a narrow range of routing density improves the uniqueness by 2%. The best ranges of wire utilization and routing hotspots of an individual RO in this work are 3–5% and 20–50%, respectively. The performance metrics (uniqueness and reliability) of the proposed RO-PUF are much better than those of existing works using a similar FPGA platform (Altera), and they are as good as those of recent RO-PUFs realized on Xilinx. Additionally, this work estimates the minimum runtimes needed to reduce errors and response bit-flips of the RO-PUF.
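For readers unfamiliar with the two metrics, the sketch below uses the definitions that are common in the PUF literature: uniqueness as the mean pairwise inter-chip Hamming distance (ideal value 50%) and reliability as 100% minus the mean intra-chip Hamming distance across repeated measurements. The abstract does not spell out its formulas, and the response bit strings are made-up toy data.

```python
from itertools import combinations

def hamming_distance(a: str, b: str) -> int:
    """Number of differing bits between two equal-length bit strings."""
    if len(a) != len(b):
        raise ValueError("responses must have the same length")
    return sum(x != y for x, y in zip(a, b))

def uniqueness_percent(responses: list[str]) -> float:
    """Mean pairwise inter-chip Hamming distance, as a percentage of bits."""
    pairs = list(combinations(responses, 2))
    n_bits = len(responses[0])
    total = sum(hamming_distance(a, b) for a, b in pairs)
    return 100.0 * total / (len(pairs) * n_bits)

def reliability_percent(reference: str, remeasured: list[str]) -> float:
    """100% minus the mean intra-chip bit-flip rate against a reference."""
    flips = sum(hamming_distance(reference, r) for r in remeasured)
    return 100.0 - 100.0 * flips / (len(remeasured) * len(reference))

# Toy data: one response per chip, then repeated reads of the first chip.
chip_responses = ["10110010", "01100111", "11011000"]
repeated_reads = ["10110010", "10110011", "10110010"]

print(f"uniqueness : {uniqueness_percent(chip_responses):.2f}%")
print(f"reliability: {reliability_percent(chip_responses[0], repeated_reads):.2f}%")
```

Against these definitions, the reported 52.71% uniqueness sits close to the 50% ideal and the 99.51% reliability corresponds to roughly one flipped bit in two hundred.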


Electronics ◽  
2021 ◽  
Vol 10 (20) ◽  
pp. 2533
Author(s):  
Wenjia Fu ◽  
Jincheng Xia ◽  
Xu Lin ◽  
Ming Liu ◽  
Mingjiang Wang

The CORDIC algorithm is used for low-cost hardware implementation to calculate transcendental functions. This paper proposes a low-latency high-precision architecture for the computation of the hyperbolic functions sinh x and cosh x based on an improved CORDIC algorithm, namely the QH-CORDIC. The principle, structure, and range of convergence of the QH-CORDIC are discussed, and the hardware circuit architecture of the functions sinh x and cosh x using the QH-CORDIC is presented in this paper. The proposed architecture is implemented using an FPGA device, showing that it has 75% and 50% latency overhead over the two latest prior works. In the synthesis using TSMC 65 nm standard cell library, ASIC implementation results show that the proposed architecture is also superior to the two latest prior works in terms of total time (latency × period), ATP (area × total time), total energy (power × total time), energy efficiency (total energy/efficient bits), and area efficiency (efficient bits/area/total time). Comparison of related works indicates that it is much more favorable for the proposed architecture to perform high-precision floating-point computations on the functions sinh x and cosh x than the LUT method, stochastic computing, and other CORDIC algorithms.
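For context, the sketch below shows the textbook rotation-mode hyperbolic CORDIC, not the QH-CORDIC proposed in the paper, whose details the abstract does not give. It uses floating-point multiplications by powers of two where hardware would use shifts, repeats iterations 4, 13, 40, ... as the basic algorithm requires for convergence, and is valid only for roughly |theta| <= 1.118.

```python
import math

def iteration_indices(n_iter: int) -> list[int]:
    """Shift indices 1, 2, 3, 4, 4, 5, ... with the repeats needed to converge."""
    indices, i, repeat_at = [], 1, 4
    while len(indices) < n_iter:
        indices.append(i)
        if i == repeat_at:
            indices.append(i)              # repeated step (4, 13, 40, ...)
            repeat_at = 3 * repeat_at + 1
        i += 1
    return indices[:n_iter]

def hyperbolic_cordic(theta: float, n_iter: int = 24) -> tuple[float, float]:
    """Return (cosh(theta), sinh(theta)) using basic hyperbolic CORDIC."""
    indices = iteration_indices(n_iter)
    # Each micro-rotation shrinks the vector by sqrt(1 - 2^(-2i)).
    gain = math.prod(math.sqrt(1.0 - 2.0 ** (-2 * i)) for i in indices)
    x, y, z = 1.0 / gain, 0.0, theta       # pre-compensate the gain
    for i in indices:
        d = 1.0 if z >= 0 else -1.0        # rotate toward z = 0
        step = 2.0 ** (-i)
        x, y = x + d * y * step, y + d * x * step
        z -= d * math.atanh(step)
    return x, y

cosh_val, sinh_val = hyperbolic_cordic(0.5)
print(f"CORDIC : cosh = {cosh_val:.6f}, sinh = {sinh_val:.6f}")
print(f"math   : cosh = {math.cosh(0.5):.6f}, sinh = {math.sinh(0.5):.6f}")
```

The limited convergence range and the one-bit-per-iteration latency of this basic form are the kinds of constraints that improved variants aim to relax.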


Author(s):  
Saurabh J Shewale ◽  
Sonal A Shirsath

This paper presents a comparative study of complementary MOSFET (CMOS) full adder circuits. Our approach is based on hybrid-design full adder circuits combined in a single unit. The full adder circuit is an essential component in the design of various digital systems. It is used in applications such as digital signal processors (DSPs), microcontrollers, microprocessors, and data processing units. In most of these systems, the adder lies in the critical path that determines the overall speed of the system. The full adder is mainly used in VLSI devices such as microprocessors for computational purposes. The proposed full adder cell has low power consumption and better area efficiency. Recently, there has been massive research interest in this area due to the growing need for low-power and high-performance computing systems. Our aim is to design the full adder circuit in various technologies and compare their power capacity. By using a hybrid structure of NMOS and PMOS, we have implemented the full adder circuit.
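As a reminder of the logic function that any such cell must implement, here is a behavioural sketch of a one-bit full adder and a ripple-carry chain built from it. It models only the Boolean behaviour, not the hybrid CMOS transistor-level design the paper describes, and the function names are illustrative.

```python
from itertools import product

def full_adder(a: int, b: int, cin: int) -> tuple[int, int]:
    """Return (sum, carry_out) for three one-bit inputs."""
    s = a ^ b ^ cin
    cout = (a & b) | (cin & (a ^ b))
    return s, cout

def ripple_carry_add(x: int, y: int, width: int) -> tuple[int, int]:
    """Add two width-bit integers by chaining full adders; return (sum, carry)."""
    carry, result = 0, 0
    for bit in range(width):
        s, carry = full_adder((x >> bit) & 1, (y >> bit) & 1, carry)
        result |= s << bit   # each stage waits for the previous stage's carry
    return result, carry

# Exhaustively check the cell against ordinary integer addition.
for a, b, cin in product((0, 1), repeat=3):
    s, cout = full_adder(a, b, cin)
    assert a + b + cin == s + 2 * cout

print(ripple_carry_add(5, 3, width=4))
```

The carry passed from stage to stage is what places the adder on the critical path: the top bit cannot settle until every lower stage has produced its carry.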

