clock cycle Latest Research Papers

A Low Energy Clock Network with a Huge Number of Local Synchronized Oscillators

10.36227/techrxiv.17088812.v1 ◽

2021 ◽

Author(s):

Gunnar Carlstedt ◽

Mats Rimborg

Keyword(s):

Clock Cycle ◽

Internal Clock ◽

Clock Frequency ◽

Clock Skew ◽

Huge Number ◽

Local Oscillators ◽

Clock System ◽

Clock Distribution Network ◽

Free Running ◽

Central Clock

A clock system for a huge grid of small clock regions is presented. There is an oscillator in each clock region, which drives the local clock of a processing element (PE). The oscillators are kept synchronized by exploiting the phase of their neighbors. In an infinite mesh, the clock skew would be zero, but in a network of limited size there will be fringe effects. In a mesh with 25×25 oscillators, the maximum skew between neighboring regions is within 3.3 ps. By slightly adjusting the free running frequency of the oscillators, this skew can be reduced to 1.2 ps. The mesh may contain millions of clock regions. Because there is no central clock, both power consumption and clock frequency can be improved compared to a conventional clock distribution network. A PE of 150×150 µm² running at 6.7 GHz with 93 master-slave flip-flops is used as an example. The PE-internal clock skew is less than 2.3 ps, and the energy consumption of the clock system is 807 µW per PE. It corresponds to an effective gate and wire capacitance of 509 aF, or 7.3 gate capacitances. Power noise is reduced by scheduling the local oscillators gradually along one of the grid’s axes. In this way, surge currents, which generally have their peaks at the clock edges, are distributed evenly over a full clock cycle.
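As a rough sanity check on the figures quoted in this abstract, the sketch below converts the 6.7 GHz clock frequency into a period, expresses the quoted skews as a fraction of that period, and divides the 807 µW figure by the clock frequency to get an energy per cycle. The function names are illustrative and not from the paper, and the energy figure assumes that 807 µW is an average power.

```python
# Back-of-the-envelope check of figures quoted in the abstract.
# All names are illustrative; none are taken from the paper.

def clock_period_ps(freq_hz: float) -> float:
    """Return the clock period in picoseconds for a frequency in hertz."""
    return 1e12 / freq_hz


def skew_fraction(skew_ps: float, freq_hz: float) -> float:
    """Return a skew (in ps) as a fraction of one clock period."""
    return skew_ps / clock_period_ps(freq_hz)


def energy_per_cycle_fj(power_w: float, freq_hz: float) -> float:
    """Return energy per clock cycle in femtojoules.

    Assumes power_w is an average power, so energy = power / frequency.
    """
    return power_w / freq_hz * 1e15


if __name__ == "__main__":
    freq_hz = 6.7e9  # clock frequency of the example PE
    print(f"period: {clock_period_ps(freq_hz):.1f} ps")
    for label, skew_ps in [("neighbor skew", 3.3),
                           ("tuned neighbor skew", 1.2),
                           ("PE-internal skew", 2.3)]:
        pct = 100 * skew_fraction(skew_ps, freq_hz)
        print(f"{label}: {skew_ps} ps = {pct:.2f}% of a period")
    print(f"energy per cycle: {energy_per_cycle_fj(807e-6, freq_hz):.1f} fJ")
```

Under these assumptions every quoted skew is a small percentage of the clock period.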

Cryptanalysis of Efficient Masked Ciphers: Applications to Low Latency

IACR Transactions on Cryptographic Hardware and Embedded Systems ◽

10.46586/tches.v2022.i1.679-721 ◽

2021 ◽

pp. 679-721

Author(s):

Tim Beyne ◽

Siemen Dhooghe ◽

Amir Moradi ◽

Aein Rezaei Shahmirzadi

Keyword(s):

Clock Cycle ◽

Security Analysis ◽

Main Idea ◽

Second Order ◽

Low Latency ◽

Verification Tool

This work introduces second-order masked implementations of the LED, Midori, Skinny, and Prince ciphers which do not require fresh masks to be updated at every clock cycle. The main idea lies in a combination of the constructions given by Shahmirzadi and Moradi at CHES 2021 and the theory presented by Beyne et al. at Asiacrypt 2020. The presented masked designs use only a minimal number of shares, i.e., three to achieve second-order security, and we make use of a trick to pair a couple of S-boxes to reduce their latency. The theoretical security analyses of our constructions are based on the linear-cryptanalytic properties of the underlying masked primitive as well as SILVER, the leakage verification tool presented at Asiacrypt 2020. To improve this cryptanalytic analysis, we use the noisy probing model, which allows for the inclusion of noise in the framework of Beyne et al. We further provide FPGA-based experimental security analysis confirming second-order protection of our masked implementations.
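To illustrate what "three shares" means, here is a minimal software sketch of Boolean masking: a secret value is split into three shares whose XOR equals the secret, and a linear operation (XOR) is computed share by share. It shows only sharing and recombination, not the paper's masked S-box constructions or its hardware designs. The function names `share3`, `unshare` and `masked_xor` are hypothetical.

```python
import secrets
from typing import Sequence, Tuple


def share3(value: int, bits: int = 8) -> Tuple[int, int, int]:
    """Split `value` into three Boolean shares with s0 ^ s1 ^ s2 == value."""
    mask = (1 << bits) - 1
    s0 = secrets.randbits(bits)          # first random mask
    s1 = secrets.randbits(bits)          # second random mask
    s2 = (value ^ s0 ^ s1) & mask        # last share fixes the XOR sum
    return s0, s1, s2


def unshare(shares: Sequence[int]) -> int:
    """Recombine shares by XOR-ing them together."""
    out = 0
    for s in shares:
        out ^= s
    return out


def masked_xor(a: Sequence[int], b: Sequence[int]) -> Tuple[int, ...]:
    """XOR two shared values share by share, never recombining either one."""
    return tuple(x ^ y for x, y in zip(a, b))


if __name__ == "__main__":
    a = share3(0x3C)
    b = share3(0xA5)
    print(hex(unshare(masked_xor(a, b))))  # equals 0x3C ^ 0xA5
```

Non-linear operations such as S-boxes cannot be computed share by share in this simple way, which is where constructions like those in the paper come in.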

RRAM-based CAM combined with time-domain circuits for hyperdimensional computing

Scientific Reports ◽

10.1038/s41598-021-99000-w ◽

2021 ◽

Vol 11 (1) ◽

Author(s):

Yasmin Halawani ◽

Dima Kilani ◽

Eman Hassan ◽

Huruy Tesfai ◽

Hani Saleh ◽

...

Keyword(s):

Time Domain ◽

High Speed ◽

Clock Cycle ◽

Content Addressable Memory ◽

The Time Domain ◽

Analog Voltage ◽

Voltage Saturation ◽

Reduction In Area ◽

Remarkable Reduction ◽

Analog Adder

Content addressable memory (CAM) for search and match operations demands high speed and low power for near real-time decision-making across many critical domains. Resistive RAM (RRAM)-based in-memory computing has high potential in realizing an efficient static CAM for artificial intelligence tasks, especially on resource-constrained platforms. This paper presents an XNOR-based RRAM-CAM with a time-domain analog adder for efficient winning class computation. The CAM compares two operands, one a voltage and the other a resistance, and outputs a voltage proportional to the similarity between the input query and the pre-stored patterns. Processing the summation of the output similarity voltages in the time domain helps avoid the voltage saturation, variation, and noise that dominate analog voltage-based computing. After that, to determine the winning class among the multiple classes, a digital realization is used that takes the class with the longest pulse width as the winning class. As a demonstrator, hyperdimensional computing for efficient MNIST classification is considered. The proposed design uses 65 nm CMOS foundry technology and realistic data for RRAM; it has a total area of 0.0077 mm² and consumes 13.6 pJ of energy per 1 k query within a 10 ns clock cycle. It shows a reduction of ~31× in area and ~3× in energy consumption compared to a fully digital ASIC implementation using 65 nm foundry technology. The proposed design exhibits a remarkable reduction in area and energy compared to two state-of-the-art RRAM designs.
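The search-and-match step described above can be sketched in software as a bitwise XNOR comparison followed by picking the best-matching class. This is only a digital analogue for illustration: the paper performs the comparison in RRAM and the summation in the time domain. The function names and the tiny example vectors are hypothetical.

```python
from typing import Dict, Sequence


def xnor_similarity(query: Sequence[int], stored: Sequence[int]) -> int:
    """Count the positions where two binary vectors agree (XNOR popcount)."""
    if len(query) != len(stored):
        raise ValueError("vectors must have the same length")
    return sum(1 for q, s in zip(query, stored) if q == s)


def winning_class(query: Sequence[int],
                  prototypes: Dict[str, Sequence[int]]) -> str:
    """Return the label whose stored pattern is most similar to the query."""
    return max(prototypes,
               key=lambda label: xnor_similarity(query, prototypes[label]))


if __name__ == "__main__":
    # Toy 4-bit patterns; real hyperdimensional vectors are far longer.
    prototypes = {"a": [1, 0, 1, 1], "b": [0, 0, 0, 1]}
    print(winning_class([1, 0, 1, 0], prototypes))
```

In the hardware described in the abstract, the role of `max` is played by selecting the class with the longest pulse width.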

ELECTRICAL DRIVE AND CONTROL OF AN ELECTRIC MOTOR STEP BY STEP BY STEP BIPOLAR

Journal of marine Technology and Environment ◽

10.53464/jmte.02.2021.02 ◽

2021 ◽

Vol 2021 (2) ◽

pp. 11-14

Author(s):

VIOLETA-VALI CIUCUR

Keyword(s):

Analog Circuits ◽

Filling Factor ◽

Pulse Width Modulation ◽

Pulse Generator ◽

Clock Cycle ◽

Current Source ◽

Maximum Benefit ◽

Rectangular Signal ◽

The Times ◽

Drive And Control

PWM (Pulse Width Modulation) is the most effective way to control analog circuits using numerical outputs, by changing the duration and frequency of the signal. The durations of the two states, t1 and t2, determine the filling factor, where T = t1 + t2 = constant. If only one of the times (t1 or t2) varies, then the period T of a cycle varies, so the frequency f = 1 / T varies. The PWM signal is in effect a rectangular signal modulated in duration, by modifying the durations t1 and t2 within each cycle as well as the frequency. The maximum benefit of a stepper motor can only be obtained if it is controlled correctly, which requires a direct current source, an electronic switch and a controlled pulse generator (numerical information). The frequency of the CLOCK cycle is measured in Hz and the filling factor is measured in percent (%). The amplitude of the output signal is constant even if the amplitude of the signals producing the fill factor varies.

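The PWM relations in the abstract above (T = t1 + t2, f = 1 / T) can be sketched as a small calculation. The sketch assumes the usual definition of the filling factor (duty cycle) as t1 / T, with t1 taken as the on-time; the function name `pwm_parameters` is illustrative.

```python
from typing import Tuple


def pwm_parameters(t1_s: float, t2_s: float) -> Tuple[float, float, float]:
    """Return (period in s, frequency in Hz, filling factor in %).

    t1_s is assumed to be the on-time and t2_s the off-time of one cycle.
    """
    if t1_s < 0 or t2_s < 0 or t1_s + t2_s <= 0:
        raise ValueError("durations must be non-negative with a positive sum")
    period_s = t1_s + t2_s               # T = t1 + t2
    frequency_hz = 1.0 / period_s        # f = 1 / T
    fill_percent = 100.0 * t1_s / period_s
    return period_s, frequency_hz, fill_percent


if __name__ == "__main__":
    period, freq, fill = pwm_parameters(0.25e-3, 0.75e-3)
    print(f"T = {period * 1e3:.2f} ms, f = {freq:.0f} Hz, fill = {fill:.0f} %")
```

Changing t1 and t2 while keeping their sum fixed changes only the filling factor; changing just one of them also changes the frequency, as the abstract notes.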
High-performance AES-128 algorithm implementation by FPGA-based SoC for 5G communications

International Journal of Electrical and Computer Engineering (IJECE) ◽

10.11591/ijece.v11i5.pp4221-4232 ◽

2021 ◽

Vol 11 (5) ◽

pp. 4221

Author(s):

Paolo Visconti ◽

Ramiro Velazquez ◽

Stefano Capoccia ◽

Roberto De Fazio

Keyword(s):

High Performance ◽

Clock Cycle ◽

Research Work ◽

Data Packets ◽

Multiple Data ◽

Encryption And Decryption ◽

Encryption Decryption

A low-area design of two-factor authentication using DIES and SBI for IoT security

The Journal of Supercomputing ◽

10.1007/s11227-021-04022-w ◽

2021 ◽

Author(s):

M. N. Sudha ◽

M. Rajendiran ◽

Mariusz Specht ◽

Kasarla Satish Reddy ◽

S. Sugumaran

Keyword(s):

Clock Cycle ◽

Memory Storage ◽

Lookup Table ◽

Linear Feedback ◽

Selection Line ◽

Iot Security ◽

Low Area ◽

Physical Internet ◽

Iot Devices ◽

Switching Characteristics

The Internet of things (IoT) is an integration of heterogeneous physical devices which are interconnected and communicate over the physical Internet. A secure, lightweight and effective authentication protocol is required, because information is transmitted between the remote user and numerous sensing devices over the IoT network. Recently, two-factor authentication (TFA) schemes have been developed to provide security among IoT devices. However, the performance of the IoT network is affected by the limited memory storage and restricted resources of IoT devices. In this paper, the integration of a data inverting encoding scheme (DIES) and a substitution-box-based inverter is proposed for providing security using the random values of one-time alias identity, challenge, server nonce and device nonce. Here, the linearity of the produced random values is decreased for each clock cycle based on the switching characteristics of the selection line in DIES. Moreover, a linear feedback shift register is used in the adaptive physically unclonable function (APUF) for generating the random response value. The APUF–DIES-IoT architecture is analyzed in terms of lookup tables, flip flops, slices, frequency and delay, as well as for different security and authentication performances. Two existing methods, TFA-PUF-IoT and TFA-APUF-IoT, are used to evaluate the APUF–DIES-IoT architecture. The APUF–DIES-IoT architecture uses 36 flip flops on Virtex 6, which is fewer than TFA-PUF-IoT and TFA-APUF-IoT.

A Connected Component Labelling algorithm for a multi-pixel per clock cycle video stream

10.1109/dsd53832.2021.00016 ◽

2021 ◽

Author(s):

Marcin Kowalczyk ◽

Tomasz Kryjak

Keyword(s):

Clock Cycle ◽

Video Stream ◽

Connected Component ◽

Labelling Algorithm

Architecture of the discrete cosine transformation processor for image compression systems on the lossless-to-lossy circuit

Doklady BGUIR ◽

10.35596/1729-7648-2021-19-5-86-93 ◽

2021 ◽

Vol 19 (5) ◽

pp. 86-93

Author(s):

V. V. Kliuchenia

Keyword(s):

Digital Signal Processor ◽

Clock Cycle ◽

Digital Signal ◽

Multimedia Systems ◽

Coding Systems ◽

Number Of Factors ◽

Hardware Implementations ◽

Pros And Cons ◽

Inverse Dct ◽

Signal Processor

Hardware implementations of the fixed-point DCT blocks known as IntDCT [1] and BinDCT [2] raise several design issues. One of the main issues is the choice between implementing the transform on an FPGA or on a digital signal processor (DSP). Each implementation has its own pros and cons. One of the most important advantages of the DSP implementation is the presence of special instructions, in particular the ability to multiply two numbers in one clock cycle. Therefore, with the advent of DSPs, the limitation on the number of multiplications in algorithms was removed. On the other hand, when implementing a block on an FPGA, we need not limit ourselves in the bit width of the data (within reasonable limits), and we can parallelize all incoming data and implement specialized computing cores for various tasks. In fact, designing multimedia systems on FPGAs resembles the design of similar systems based on small- and medium-scale integration logic. Such an implementation has the same limitations: a relatively small amount of available memory, the need to design basic structural elements (multipliers, dividers), etc. It is the unequal cost of the addition and multiplication operations when implemented on FPGAs that prompted the search for DCT algorithms with the smallest number of factors. However, even this is not enough, since the structure of a multiplier is many times more complex than that of an adder, which made it necessary to look for ways to perform the transform without using multiplications at all.
This article shows how to create, on the basis of the integer forward and inverse DCT and distributed arithmetic, a new universal architecture for a decorrelating transform on FPGAs without multiplication operations, for image transform coding systems that operate on the lossless-to-lossy (L2L) principle, and how to obtain the best experimental results in terms of hardware resources compared to comparable compression systems.
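To make the idea of multiplication-free computation concrete, the following is a minimal sketch of textbook distributed arithmetic for an inner product with fixed integer coefficients and unsigned inputs. It is not the architecture from the article: the coefficients, bit width and function names are hypothetical. At run time it uses only table lookups, shifts and additions.

```python
from typing import List, Sequence


def build_lut(coeffs: Sequence[int]) -> List[int]:
    """Precompute, for every bit pattern `addr`, the sum of the coefficients
    whose index bit is set in `addr`. Done once, offline."""
    n = len(coeffs)
    return [sum(c for k, c in enumerate(coeffs) if (addr >> k) & 1)
            for addr in range(1 << n)]


def da_inner_product(lut: Sequence[int], xs: Sequence[int], bits: int) -> int:
    """Compute sum(coeffs[k] * xs[k]) for unsigned `bits`-wide inputs
    using only lookups, shifts and additions."""
    acc = 0
    for b in range(bits):
        # Gather bit b of every input into one table address.
        addr = 0
        for k, x in enumerate(xs):
            addr |= ((x >> b) & 1) << k
        # Weight the looked-up partial sum by 2**b via a shift.
        acc += lut[addr] << b
    return acc


if __name__ == "__main__":
    coeffs = [3, -5, 7, 2]               # hypothetical fixed coefficients
    xs = [10, 200, 33, 255]              # unsigned 8-bit inputs
    lut = build_lut(coeffs)
    print(da_inner_product(lut, xs, bits=8))
```

The table grows as 2^n in the number of coefficients, so practical designs typically split long inner products into several smaller tables.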

Convolver Design and Convolve-Accumulate Unit Design for Low-Power Edge Computing

Sensors ◽

10.3390/s21155081 ◽

2021 ◽

Vol 21 (15) ◽

pp. 5081

Author(s):

Hsu-Yu Kao ◽

Xin-Jia Chen ◽

Shih-Hsu Huang

Keyword(s):

Power Consumption ◽

Low Power ◽

State Of The Art ◽

Clock Cycle ◽

The State ◽

Edge Computing ◽

Convolution Operation ◽

Product Matrix ◽

Unit Design ◽

Overall Performance

Convolution operations have a significant influence on the overall performance of a convolutional neural network, especially in edge-computing hardware design. In this paper, we propose a low-power signed convolver hardware architecture that is well suited for low-power edge computing. The basic idea of the proposed convolver design is to combine all multipliers’ final additions and their corresponding adder tree to form a partial product matrix (PPM) and then to use the reduction tree algorithm to reduce this PPM. As a result, compared with the state-of-the-art approach, our convolver design not only saves many carry propagation adders but also saves one clock cycle per convolution operation. Moreover, the proposed convolver design can be adapted for different dataflows (including input stationary dataflow, weight stationary dataflow, and output stationary dataflow). According to the dataflow, two types of convolve-accumulate units are proposed to perform the accumulation of convolution results. The results show that, compared with the state-of-the-art approach, the proposed convolver design reduces power consumption by 15.6%. Furthermore, the proposed convolve-accumulate units reduce power consumption by 15.7% on average.
