clock cycle
Recently Published Documents


TOTAL DOCUMENTS

184
(FIVE YEARS 78)

H-INDEX

15
(FIVE YEARS 3)

2021 ◽  
Author(s):  
Gunnar Carlstedt ◽  
Mats Rimborg

A clock system for a huge grid of small clock regions is presented. There is an oscillator in each clock region, which drives the local clock of a processing element (PE). The oscillators are kept synchronized by exploiting the phase of their neighbors. In an infinite mesh, the clock skew would be zero, but in a network of limited size there will be fringe effects. In a mesh with 25×25 oscillators, the maximum skew between neighboring regions is within 3.3 ps. By slightly adjusting the free-running frequency of the oscillators, this skew can be reduced to 1.2 ps. The mesh may contain millions of clock regions.
Because there is no central clock, both power consumption and clock frequency can be improved compared to a conventional clock distribution network. A PE of 150×150 µm² running at 6.7 GHz with 93 master-slave flip-flops is used as an example. The PE-internal clock skew is less than 2.3 ps, and the clock system consumes 807 µW per PE. This corresponds to an effective gate and wire capacitance of 509 aF, or 7.3 gate capacitances.
Power noise is reduced by scheduling the local oscillators gradually along one of the grid’s axes. In this way, surge currents, which generally have their peaks at the clock edges, are distributed evenly over a full clock cycle.
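The idea of spreading clock edges over a full cycle can be illustrated with a small calculation. This is only a sketch under a stated assumption: the offsets are taken to grow linearly along one axis so that they cover exactly one period, which the abstract does not specify. The function name `staggered_offsets_ps` and the choice of 25 columns are illustrative.

```python
# Sketch only: ASSUMES the clock-edge offsets grow linearly along one grid
# axis and together span exactly one clock period. The abstract does not
# state the actual schedule.

def staggered_offsets_ps(freq_hz: float, columns: int) -> list[float]:
    """Return one clock-edge offset per column, in picoseconds."""
    period_ps = 1e12 / freq_hz
    return [col * period_ps / columns for col in range(columns)]


# 6.7 GHz is the example frequency from the abstract; 25 columns is illustrative.
offsets = staggered_offsets_ps(6.7e9, 25)
print(f"period = {1e12 / 6.7e9:.2f} ps, step between columns = {offsets[1]:.2f} ps")
```

With edges staggered this way, no two columns switch at the same instant, so the current peaks no longer coincide.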


Author(s):  
Tim Beyne ◽  
Siemen Dhooghe ◽  
Amir Moradi ◽  
Aein Rezaei Shahmirzadi

This work introduces second-order masked implementations of the LED, Midori, Skinny, and Prince ciphers that do not require fresh masks to be updated at every clock cycle. The main idea lies in a combination of the constructions given by Shahmirzadi and Moradi at CHES 2021 and the theory presented by Beyne et al. at Asiacrypt 2020. The presented masked designs use only the minimal number of shares, i.e., three, to achieve second-order security, and we make use of a trick that pairs S-boxes to reduce their latency. The theoretical security analyses of our constructions are based on the linear-cryptanalytic properties of the underlying masked primitive as well as on SILVER, the leakage verification tool presented at Asiacrypt 2020. To improve this cryptanalytic analysis, we use the noisy probing model, which allows for the inclusion of noise in the framework of Beyne et al. We further provide an FPGA-based experimental security analysis confirming the second-order protection of our masked implementations.
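For readers unfamiliar with masking, the following minimal sketch shows what a three-share Boolean sharing looks like. It is not the construction from the paper: it draws fresh randomness for the initial sharing and only demonstrates a linear (XOR) operation, whereas the paper's contribution concerns nonlinear layers without fresh masks. The names `share`, `unshare`, and `masked_xor` are hypothetical.

```python
import secrets

# Sketch only: generic three-share Boolean masking, NOT the paper's design.


def share(value: int, bits: int = 4) -> tuple[int, int, int]:
    """Split value into three shares whose XOR equals value."""
    mask = (1 << bits) - 1
    s0 = secrets.randbelow(1 << bits)
    s1 = secrets.randbelow(1 << bits)
    s2 = (value ^ s0 ^ s1) & mask
    return s0, s1, s2


def unshare(shares: tuple[int, int, int]) -> int:
    """Recombine the three shares."""
    return shares[0] ^ shares[1] ^ shares[2]


def masked_xor(a: tuple[int, int, int], b: tuple[int, int, int]) -> tuple[int, int, int]:
    """XOR is linear, so it can be applied to each share independently."""
    return a[0] ^ b[0], a[1] ^ b[1], a[2] ^ b[2]


x, k = 0b1011, 0b0110
print(unshare(masked_xor(share(x), share(k))) == x ^ k)
```

Linear layers can be handled share by share like this; the S-boxes are the part that needs the dedicated constructions the abstract refers to.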


2021 ◽  
Vol 11 (1) ◽  
Author(s):  
Yasmin Halawani ◽  
Dima Kilani ◽  
Eman Hassan ◽  
Huruy Tesfai ◽  
Hani Saleh ◽  
...  

Content addressable memory (CAM) for search and match operations demands high speed and low power for near real-time decision-making across many critical domains. Resistive RAM (RRAM)-based in-memory computing has high potential in realizing an efficient static CAM for artificial intelligence tasks, especially on resource-constrained platforms. This paper presents an XNOR-based RRAM-CAM with a time-domain analog adder for efficient winning class computation. The CAM compares two operands, one a voltage and the other a resistance, and outputs a voltage proportional to the similarity between the input query and the pre-stored patterns. Summing the output similarity voltages in the time domain helps avoid the voltage saturation, variation, and noise that dominate analog voltage-based computing. To determine the winning class among the multiple classes, a digital realization then selects the class with the longest pulse width. As a demonstrator, hyperdimensional computing for efficient MNIST classification is considered. The proposed design uses 65 nm CMOS foundry technology and realistic data for RRAM, has a total area of 0.0077 mm², and consumes 13.6 pJ of energy per 1 k query within a 10 ns clock cycle. It shows a reduction of ~31× in area and ~3× in energy consumption compared to a fully digital ASIC implementation using 65 nm foundry technology. The proposed design also exhibits a remarkable reduction in area and energy compared to two state-of-the-art RRAM designs.
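The search-and-match principle can be sketched in purely digital form. This is an illustrative software analogue, not the paper's circuit: the paper computes the similarity as an analog voltage and picks the winner by pulse width, while the sketch below simply counts matching bits. The names `xnor_similarity` and `winning_class` and the example patterns are hypothetical.

```python
# Sketch only: a digital stand-in for the analog XNOR-based CAM match.


def xnor_similarity(query: int, pattern: int, width: int) -> int:
    """Count bit positions where query and pattern agree (bitwise XNOR)."""
    mask = (1 << width) - 1
    return bin(~(query ^ pattern) & mask).count("1")


def winning_class(query: int, class_patterns: dict[str, int], width: int) -> str:
    """Return the class whose stored pattern is most similar to the query."""
    return max(
        class_patterns,
        key=lambda name: xnor_similarity(query, class_patterns[name], width),
    )


patterns = {"zero": 0b11110000, "one": 0b00001111}  # hypothetical stored patterns
print(winning_class(0b11100001, patterns, width=8))
```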


2021 ◽  
Vol 2021 (2) ◽  
pp. 11-14
Author(s):  
VIOLETA-VALI CIUCUR

PWM (pulse-width modulation) is the most effective way to control analog circuits using digital outputs, by changing the duration and frequency of the signal. The durations of the two states, t1 and t2, determine the fill factor, where T = t1 + t2 = constant. If only one of the times (t1 or t2) varies, then the period T of a cycle varies, and so the frequency f = 1 / T varies. The PWM signal is thus a rectangular signal modulated in duration, obtained by modifying the duration of each state t1, t2 of the cycle and, where required, the frequency. The full benefit of a stepper motor can be obtained only if it is driven correctly, which requires a direct-current source, an electronic switch, and a controlled pulse generator (digital information). The frequency of the clock cycle is measured in Hz and the fill factor in percent (%). The amplitude of the output signal is constant even if the amplitude of the signals producing the fill factor varies.

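The relations T = t1 + t2 and f = 1 / T from the PWM abstract above can be written out as a short calculation. This is a minimal sketch; it assumes the fill factor is defined as t1 / T (the high-state share of the period), and the function name `pwm_parameters` is illustrative.

```python
# Sketch only: ASSUMES the fill factor is t1 / T, with t1 the high-state time.


def pwm_parameters(t1: float, t2: float) -> tuple[float, float]:
    """Return (frequency in Hz, fill factor in percent) for state times in seconds."""
    period = t1 + t2  # T = t1 + t2
    frequency = 1.0 / period  # f = 1 / T
    fill_factor = 100.0 * t1 / period
    return frequency, fill_factor


# Example: 0.25 ms high and 0.75 ms low.
freq, fill = pwm_parameters(0.25e-3, 0.75e-3)
print(f"f = {freq:.0f} Hz, fill factor = {fill:.0f} %")
```

Changing t1 and t2 in opposite directions keeps T constant and alters only the fill factor, while changing just one of them also shifts the frequency.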

Author(s):  
Paolo Visconti ◽  
Ramiro Velazquez ◽  
Stefano Capoccia ◽  
Roberto De Fazio

In this research work, a fast and lightweight AES-128 cypher based on the Xilinx ZCU102 FPGA board is presented, suitable for 5G communications. In particular, both the encryption and decryption algorithms have been developed using a pipelined approach, thus enabling the simultaneous processing of the rounds on multiple data packets at each clock cycle. Both the encryption and decryption systems support an operating frequency of up to 220 MHz, reaching a maximum data throughput of 28.16 Gbit/s; in addition, the encryption and decryption phases each take only ten clock periods. To guarantee the interoperability of the developed encryption/decryption system with the other sections of the 5G communication apparatus, synchronization and control signals have been integrated. The encryption system uses only 1631 CLBs, whereas the decryption system uses 3464 CLBs, the difference being ascribable mainly to the Inverse Mix Columns step. The developed cypher shows higher efficiency (8.63 Mbps/slice) than similar solutions in the literature.
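The quoted throughput follows directly from the block size and the clock rate. The sketch below reproduces the arithmetic, assuming that a full pipeline delivers one 128-bit block per clock cycle; that assumption is consistent with the figures in the abstract but is not stated there explicitly. The function names are illustrative.

```python
# Sketch only: ASSUMES one 128-bit block leaves the pipeline per clock cycle.

BLOCK_BITS = 128  # AES block size in bits


def pipeline_throughput_gbps(clock_hz: float) -> float:
    """Steady-state throughput of a fully pipelined design, in Gbit/s."""
    return BLOCK_BITS * clock_hz / 1e9


def latency_ns(cycles: int, clock_hz: float) -> float:
    """Time for one block to traverse the pipeline, in nanoseconds."""
    return cycles / clock_hz * 1e9


print(f"{pipeline_throughput_gbps(220e6):.2f} Gbit/s")
print(f"{latency_ns(10, 220e6):.2f} ns")
```

At 220 MHz this gives 128 × 220 × 10⁶ = 28.16 Gbit/s, matching the abstract, and ten clock periods correspond to roughly 45 ns of latency per block.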


Author(s):  
M. N. Sudha ◽  
M. Rajendiran ◽  
Mariusz Specht ◽  
Kasarla Satish Reddy ◽  
S. Sugumaran

The Internet of Things (IoT) is an integration of heterogeneous physical devices that are interconnected and communicate over the Internet. A secure, lightweight, and effective authentication protocol is required because information is transmitted between the remote user and numerous sensing devices over the IoT network. Recently, a two-factor authentication (TFA) scheme was developed to provide security among IoT devices. However, the performance of the IoT network is limited by the small memory and restricted resources of IoT devices. In this paper, the integration of a data inverting encoding scheme (DIES) and a substitution-box-based inverter is proposed to provide security using the random values of the one-time alias identity, challenge, server nonce, and device nonce. Here, the linearity of the produced random values is decreased in each clock cycle based on the switching characteristics of the selection line in DIES. Moreover, a linear feedback shift register is used in the adaptive physically unclonable function (APUF) to generate the random response value. The APUF–DIES-IoT architecture is analyzed in terms of lookup tables, flip-flops, slices, frequency, and delay, as well as for different security and authentication performances. It is evaluated against two existing methods, TFA-PUF-IoT and TFA-APUF-IoT. The APUF–DIES-IoT architecture uses 36 flip-flops on a Virtex 6, fewer than TFA-PUF-IoT and TFA-APUF-IoT.
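A linear feedback shift register advances by one state per clock cycle. The sketch below shows a generic 16-bit Fibonacci LFSR; the register width, the tap positions (16, 14, 13, 11), and the seed are textbook choices assumed for illustration, not parameters taken from the paper.

```python
# Sketch only: a generic 16-bit Fibonacci LFSR. Width, taps and seed are
# ASSUMED textbook values, not the configuration used in the paper.


def lfsr16_step(state: int) -> int:
    """Advance the register by one clock cycle."""
    # XOR the tapped bits to form the new most significant bit.
    bit = (state ^ (state >> 2) ^ (state >> 3) ^ (state >> 5)) & 1
    return (state >> 1) | (bit << 15)


def lfsr16_period(seed: int = 0xACE1) -> int:
    """Count clock cycles until the register returns to the seed."""
    state = lfsr16_step(seed)
    cycles = 1
    while state != seed:
        state = lfsr16_step(state)
        cycles += 1
    return cycles


print(hex(lfsr16_step(0xACE1)))
print(lfsr16_period())
```

The output of such a register is fully linear, which is why designs like the one described add nonlinear elements on top of it.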


Doklady BGUIR ◽  
2021 ◽  
Vol 19 (5) ◽  
pp. 86-93
Author(s):  
V. V. Kliuchenia

The hardware implementations of fixed-point DCT blocks, known as IntDCT [1] and BinDCT [2], raise several design issues. One of the main issues is the choice between implementing the transform on an FPGA or on a digital signal processor (DSP). Each implementation has its own pros and cons. One of the most important advantages of the DSP implementation is the presence of special instructions, in particular the ability to multiply two numbers in one clock cycle. Therefore, with the advent of DSPs, the limitation on the number of multiplications in algorithms was removed. On the other hand, when implementing a block on an FPGA, we need not limit ourselves in the bit width of the data (within reasonable limits), and we can parallelize all incoming data and implement specialized computing cores for various tasks. In fact, designing multimedia systems on FPGAs resembles the design of similar systems based on small- and medium-scale integration logic. Such an implementation has the same limitations: a relatively small amount of available memory, the need to design basic structural elements (multipliers, dividers), etc. It is the unequal cost of addition and multiplication on FPGAs that prompted the search for DCT algorithms with the smallest number of multiplications. However, even this is not enough, since the structure of a multiplier is many times more complex than that of an adder, which made it necessary to look for ways to perform the transform without using multiplications at all.
This article shows how to create, on the basis of the integer direct and inverse DCT and distributed arithmetic, a new universal architecture for a decorrelating transform on FPGAs without multiplication operations, intended for image transform coding systems that operate on the lossless-to-lossy (L2L) principle, and how to obtain the best experimental results in terms of hardware resources compared to similar compression systems.
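To illustrate what a multiplication-free, exactly invertible integer step can look like, here is a minimal sketch of a single lifting step with a dyadic coefficient, a building block commonly used in lifting-based integer transforms. It is not the distributed-arithmetic architecture described in the article, and the coefficient 3/8 is an arbitrary illustrative choice.

```python
# Sketch only: one lifting step with the ASSUMED dyadic coefficient 3/8,
# built from shifts and adds. Not the article's architecture.


def scale_3_8(x: int) -> int:
    """Approximate 3/8 * x without a multiplier: (2x + x) >> 3."""
    return ((x << 1) + x) >> 3


def lift_forward(a: int, b: int) -> tuple[int, int]:
    """Forward lifting step: b is updated from a, a passes through."""
    return a, b + scale_3_8(a)


def lift_inverse(a: int, b: int) -> tuple[int, int]:
    """Inverse step subtracts the same rounded term, so it is exact."""
    return a, b - scale_3_8(a)


a, b = 37, -12
print(lift_inverse(*lift_forward(a, b)) == (a, b))
```

Because the inverse subtracts exactly the value the forward step added, rounding in the shift does not prevent perfect reconstruction, which is what makes a lossless mode possible.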


Sensors ◽  
2021 ◽  
Vol 21 (15) ◽  
pp. 5081
Author(s):  
Hsu-Yu Kao ◽  
Xin-Jia Chen ◽  
Shih-Hsu Huang

Convolution operations have a significant influence on the overall performance of a convolutional neural network, especially in edge-computing hardware design. In this paper, we propose a signed convolver hardware architecture that is well suited for low-power edge computing. The basic idea of the proposed convolver design is to combine all multipliers’ final additions and their corresponding adder tree to form a partial product matrix (PPM) and then to use the reduction tree algorithm to reduce this PPM. As a result, compared with the state-of-the-art approach, our convolver design not only saves many carry propagation adders but also saves one clock cycle per convolution operation. Moreover, the proposed convolver design can be adapted for different dataflows (including input stationary dataflow, weight stationary dataflow, and output stationary dataflow). Depending on the dataflow, two types of convolve-accumulate units are proposed to perform the accumulation of convolution results. The results show that, compared with the state-of-the-art approach, the proposed convolver design reduces power consumption by 15.6%. Furthermore, the proposed convolve-accumulate units reduce power consumption by 15.7% on average.
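The benefit of reducing many addends before a single final addition can be sketched with carry-save arithmetic. The example below is a generic software illustration for non-negative integers, not the paper's signed partial product matrix: it repeatedly applies a 3:2 compressor (a row of full adders) until two rows remain and then performs one carry-propagate addition. The function names are illustrative.

```python
# Sketch only: generic carry-save reduction for non-negative integers,
# not the paper's signed PPM design.


def compress_3_to_2(a: int, b: int, c: int) -> tuple[int, int]:
    """Full-adder row: three addends in, sum and carry rows out."""
    partial_sum = a ^ b ^ c
    carry = ((a & b) | (a & c) | (b & c)) << 1
    return partial_sum, carry


def reduce_and_add(operands: list[int]) -> int:
    """Reduce to two rows without carry propagation, then add once."""
    rows = list(operands)
    while len(rows) > 2:
        a, b, c = rows.pop(), rows.pop(), rows.pop()
        rows.extend(compress_3_to_2(a, b, c))
    return sum(rows)  # the single carry-propagate addition


print(reduce_and_add([3, 5, 7, 9, 11]) == 3 + 5 + 7 + 9 + 11)
```

Each compressor stage has a delay that does not depend on the word length, so only the last addition pays the cost of carry propagation.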

