Using Pruning and Truncation for Power-Efficient 2-D Approximate Tchebichef Transform Hardware Architecture

2018 ◽  
Vol 13 (1) ◽  
pp. 1-6
Author(s):  
Guilherme Paim ◽  
Leandro M. G. Rocha ◽  
Gustavo M. Santana ◽  
Leonardo B. Soares ◽  
Eduardo A. C. Da Costa ◽  
...  

Due to the intensive use of discrete transforms in picture coding, the search for fast and power-efficient approaches for their hardware implementation has gained importance. The DTT (Discrete Tchebichef Transform) represents a discrete class of the Chebyshev orthogonal polynomials, and it is an alternative to the DCT (Discrete Cosine Transform), commonly used in picture coding. In this work, we propose a new approximation for the integer DTT, with better quality and power efficiency, by exploring truncation and pruning. The principal idea is that reducing the coefficient values to fractions enables truncation by shifts in the internal transform calculations and leads to lower values for the non-diagonal residues, which reduces non-orthogonality. We have also selectively pruned the rows of the state-of-the-art approximate DTT matrix. The approximate DTT architectures were synthesized for ASIC in the Cadence RTL Compiler tool with the Nangate 45 nm standard cell library, using a realistic power extraction methodology that considers real input vectors and delays. The synthesis results show that the proposed pruned approximate DTT hardwired solution increases the maximum frequency by about 10.78% and reduces cell area by 50.2%, with savings of up to 55.9% in power dissipation, a higher compression ratio, and lower quality loss in the compressed image, when compared with state-of-the-art approximate DTT hardware designs.
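The two ideas named in this abstract, truncation by shifts and non-diagonal residues as a measure of non-orthogonality, can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the function names are hypothetical, and the 4×4 matrices are simple placeholders, not the DTT matrices or coefficients used by the authors.

```python
def shift_scale(x: int, k: int) -> int:
    """Scale x by 2**-k with an arithmetic right shift.

    The shift discards the low-order bits, so the result is truncated
    (rounded toward negative infinity for Python integers).
    """
    return x >> k


def gram(T):
    """Return T * T^T for a matrix given as a list of rows."""
    n = len(T)
    return [
        [sum(a * b for a, b in zip(T[i], T[j])) for j in range(n)]
        for i in range(n)
    ]


def off_diagonal_residue(T):
    """Sum of absolute off-diagonal entries of T * T^T.

    The value is zero exactly when the rows of T are mutually orthogonal.
    """
    G = gram(T)
    n = len(G)
    return sum(abs(G[i][j]) for i in range(n) for j in range(n) if i != j)


# Placeholder matrices (NOT the DTT): one with orthogonal rows,
# and one where a single row has been altered.
ORTHOGONAL = [
    [1, 1, 1, 1],
    [1, 1, -1, -1],
    [1, -1, -1, 1],
    [1, -1, 1, -1],
]
PERTURBED = [
    [1, 1, 1, 1],
    [2, 1, -1, -2],
    [1, -1, -1, 1],
    [1, -1, 1, -1],
]

if __name__ == "__main__":
    print(shift_scale(5, 1), shift_scale(-5, 1))
    print(off_diagonal_residue(ORTHOGONAL), off_diagonal_residue(PERTURBED))
```

In hardware, a shift by a constant amount is only wiring, which is why power-of-two fractional coefficients are attractive; the residue function shows the kind of quantity that would shrink as an approximate matrix gets closer to orthogonal.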

2019 ◽  
Vol 15 (4) ◽  
pp. 379-387
Author(s):  
Tayebeh Asiyabi ◽  
Jafar Torfifard

In this paper, a new architecture of a four-stage CMOS operational transconductance amplifier (OTA) based on an alternative differential AC boosting compensation, called DACBC, is proposed. The presented structure simultaneously removes the feedforward paths and boosts the feedback paths of the compensation network. Moreover, the presented circuit uses a fairly small compensation capacitor, on the order of 1 pF, which makes the circuit very compact while enhancing several small-signal and large-signal characteristics. The proposed circuit and several state-of-the-art schemes from the literature have been extensively analysed and compared. The simulation results show that, with the same capacitive load and power dissipation, the unity-gain frequency (UGF) can be improved by more than 60 times compared with conventional nested Miller compensation. With a 15 pF capacitive load, the presented OTA achieves a 65° phase margin, a UGF of 18.88 MHz, and a DC gain of 115 dB, with a power dissipation of 462 μW from a 1.8 V supply.


Author(s):  
Yunhong Gong ◽  
Yanan Sun ◽  
Dezhong Peng ◽  
Peng Chen ◽  
Zhongtai Yan ◽  
...  

The COVID-19 pandemic has caused a global alarm. With the advances in artificial intelligence, COVID-19 testing capabilities have been greatly expanded, and the strain on hospital resources has been significantly alleviated. Over the past years, computer vision research has focused on convolutional neural networks (CNNs), which can significantly improve image analysis ability. However, CNN architectures are usually manually designed with rich expertise that is scarce in practice. Evolutionary algorithms (EAs) can automatically search for proper CNN architectures and optimize the related hyperparameters. The networks searched by EAs can be used to effectively process COVID-19 computed tomography images without expert knowledge and manual setup. In this paper, we propose a novel EA-based algorithm with a dynamic search space to design the optimal CNN architectures for diagnosing COVID-19 before the pathogenic test. The experiments are performed on the COVID-CT data set against a series of state-of-the-art CNN models. The experiments demonstrate that the architecture searched by the proposed EA-based algorithm achieves the best performance without any preprocessing operations. Furthermore, we found through experimentation that the intensive use of batch normalization may deteriorate the performance. This contrasts with the common-sense approach of manually designing CNN architectures and will help the related experts in handcrafting CNN models to achieve the best performance without any preprocessing operations.


Author(s):  
Subhadeep Banik ◽  
Takanori Isobe ◽  
Fukang Liu ◽  
Kazuhiko Minematsu ◽  
Kosei Sakamoto

We present Orthros, a 128-bit block pseudorandom function. It is designed with a primary focus on the latency of fully unrolled circuits. For this purpose, we adopt a parallel structure comprising two keyed permutations. The round function of each permutation is similar to that of Midori, a low-energy block cipher; however, we thoroughly revise it to reduce latency, and introduce different rounds to significantly improve cryptographic strength in a small number of rounds. We provide a comprehensive, dedicated security analysis. For hardware implementation, Orthros achieves the lowest latency among the state-of-the-art low-latency primitives. For example, using the STM 90nm library, Orthros achieves a minimum latency of around 2.4 ns, while other constructions like PRINCE, Midori-128 and QARMA9-128-σ0 achieve 2.56 ns, 4.10 ns, and 4.38 ns, respectively.


Author(s):  
Subhrajit Sinha Roy ◽  
Abhishek Basu ◽  
Avik Chattopadhyay

In this chapter, a hardware implementation of an LSB replacement-based digital image watermarking algorithm is introduced. The proposed scheme is developed in the spatial domain. In this watermarking process, the data or watermark is embedded into the cover image pixels through an adaptive least significant bit (LSB) replacement technique. The real-time execution of the watermarking logic is developed here using reversible logic. The use of reversible logic reduces power dissipation because no information is lost. The lower power dissipation enables faster operation and helps sustain Moore's law. The experimental results confirm that the proposed scheme offers high imperceptibility with justified robustness.
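The basic LSB replacement step that this abstract builds on can be sketched in a few lines of Python. This is only a minimal software illustration under stated assumptions: it uses plain (non-adaptive) replacement on 8-bit grayscale pixel values, it does not model the adaptive selection or the reversible-logic hardware described by the authors, and the function names are hypothetical.

```python
def embed_lsb(pixels, watermark_bits):
    """Replace the least significant bit of each pixel with a watermark bit.

    pixels: list of 8-bit integers (0..255), the cover image samples.
    watermark_bits: list of 0/1 values, no longer than pixels.
    Pixels beyond the length of the watermark are left unchanged.
    """
    if len(watermark_bits) > len(pixels):
        raise ValueError("watermark is longer than the cover data")
    marked = list(pixels)
    for i, bit in enumerate(watermark_bits):
        # Clear bit 0, then set it to the watermark bit.
        marked[i] = (marked[i] & 0xFE) | (bit & 1)
    return marked


def extract_lsb(pixels, n_bits):
    """Read back the first n_bits watermark bits from the pixel LSBs."""
    return [p & 1 for p in pixels[:n_bits]]


if __name__ == "__main__":
    cover = [200, 13, 255, 0]
    bits = [1, 0, 0, 1]
    marked = embed_lsb(cover, bits)
    print(marked)
    print(extract_lsb(marked, len(bits)))
```

Because only bit 0 changes, each pixel value moves by at most 1, which is the reason LSB schemes are generally associated with high imperceptibility.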


2019 ◽  
Vol 3 (4) ◽  
pp. 382-396 ◽  
Author(s):  
Ioannis Karageorgos ◽  
Mehmet M. Isgenc ◽  
Samuel Pagliarini ◽  
Larry Pileggi

In today’s globalized integrated circuit (IC) ecosystem, untrusted foundries are often procured to build critical systems since they offer state-of-the-art silicon with the best performance available. On the other hand, ICs that originate from trusted fabrication cannot match the same performance level since trusted fabrication is often available only on legacy nodes. Split-Chip is a dual-IC approach that leverages the performance of an untrusted IC and combines it with the guarantees of a trusted IC. In this paper, we provide a framework for chip-to-chip authentication that can further improve a Split-Chip system by protecting it from attacks that are unique to Split-Chip. A hardware implementation that utilizes an SRAM-based PUF as an identifier and public key cryptography for the handshake is discussed. Circuit characteristics are provided, where the trusted IC is designed in a 28-nm CMOS technology and the untrusted IC is designed in a commercial 16-nm CMOS technology. Most importantly, our solution does not require a processor for performing any of the handshake or cryptography tasks, and is therefore not susceptible to software vulnerabilities and exploits.


2012 ◽  
Vol 21 (08) ◽  
pp. 1240025 ◽  
Author(s):  
CHUN-YUAN CHENG ◽  
JINN-SHYAN WANG ◽  
CHENG-TAI YEH

This paper presents an all-digital delay locked loop (ADDLL) that uses asynchronous-deskewing technology and achieves low power/voltage, small jitter, fast locking, and high process, voltage, and temperature (PVT)-variation tolerance. The measurement results show that the maximum frequency is 100 MHz at 0.35 V with 19 μW power dissipation, 62 ps peak-to-peak jitter, and 3 locking cycles. When operated at 0.5 V, the measured maximal operating clock frequency is 450 MHz with 12 ps peak-to-peak jitter, 6 locking cycles and 119 μW power dissipation. The ADDLL is fabricated with 55 nm CMOS technology, and the active area is only 0.019 mm².


2016 ◽  
Vol 2016 ◽  
pp. 1-14 ◽  
Author(s):  
Miquel L. Alomar ◽  
Vincent Canals ◽  
Nicolas Perez-Mora ◽  
Víctor Martínez-Moll ◽  
Josep L. Rosselló

Hardware implementation of artificial neural networks (ANNs) allows exploiting the inherent parallelism of these systems. Nevertheless, they require a large amount of resources in terms of area and power dissipation. Recently, Reservoir Computing (RC) has arisen as a strategic technique to design recurrent neural networks (RNNs) with simple learning capabilities. In this work, we show a new approach to implement RC systems with digital gates. The proposed method is based on the use of probabilistic computing concepts to reduce the hardware required to implement different arithmetic operations. The result is the development of a highly functional system with low hardware resources. The presented methodology is applied to chaotic time-series forecasting.
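One common way probabilistic computing reduces arithmetic hardware is unipolar stochastic encoding, where a value in [0, 1] is represented by the fraction of ones in a bitstream and multiplication reduces to a bitwise AND of two independent streams. The Python sketch below simulates that idea. It is an assumption that this particular encoding matches the one used by the authors; the function names, stream length, and seed are hypothetical.

```python
import random


def to_bitstream(p, n, rng):
    """Encode a probability p in [0, 1] as n random bits with P(bit = 1) = p."""
    return [1 if rng.random() < p else 0 for _ in range(n)]


def from_bitstream(bits):
    """Decode a bitstream back to a value: the fraction of ones."""
    return sum(bits) / len(bits)


def stochastic_multiply(p1, p2, n=20000, seed=0):
    """Estimate p1 * p2 by AND-ing two independently generated bitstreams.

    For independent streams, P(a AND b) = P(a) * P(b), so a single AND gate
    per bit stands in for a full multiplier, at the cost of a long stream
    and a result that is only approximate.
    """
    rng = random.Random(seed)
    a = to_bitstream(p1, n, rng)
    b = to_bitstream(p2, n, rng)
    return from_bitstream([x & y for x, y in zip(a, b)])


if __name__ == "__main__":
    print(stochastic_multiply(0.5, 0.6))  # close to 0.3, not exact
```

The trade-off is visible in the parameters: a longer stream gives a more accurate product but takes more clock cycles, while the logic per bit stays at a single gate.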


2020 ◽  
Author(s):  
Somdip Dey ◽  
Suman Saha ◽  
Amit Singh ◽  
Klaus D. Mcdonald-Maier

Fruit and vegetable classification using Convolutional Neural Networks (CNNs) has become a popular application in the agricultural industry; however, to the best of our knowledge, no previously recorded study has designed and evaluated such an application on a mobile platform. In this paper, we propose a power-efficient CNN model, FruitVegCNN, to perform classification of fruits and vegetables in a mobile multi-processor system-on-a-chip (MPSoC). We also evaluated the efficacy of FruitVegCNN compared to popular state-of-the-art CNN models on real mobile platforms (Huawei P20 Lite and Samsung Galaxy Note 9), and experimental results show the efficacy and power efficiency of our proposed CNN architecture.

