Heuristic method for bitsliced representation of randomly generated 8×8 cryptographic S-Box

The article addresses the security and efficiency of software implementations of symmetric block ciphers. For implementations on low-end CPUs (8/16/32-bit microcontrollers), resistance to power-analysis attacks is the main concern. For implementations on high-end CPUs (x86, ARM Cortex-A), the priority is eliminating vulnerability to timing and cache attacks. The authors use a bitslice approach to implement block ciphers securely; its potential advantages are high speed and low demand on computing resources. However, known bitslicing methods are significantly limited: they work only with deterministic S-Boxes or with arbitrary S-Boxes of smaller sizes. The paper proposes a new heuristic method for the bitsliced representation of cryptographic 8×8 S-Boxes containing randomly generated values, which cannot be described by algebraic expressions. The method decomposes the truth table that describes the S-Box into two parts. One part of the table forms logical masks, and the other is split into bit vectors. An exhaustive search is used to find a logical description of each vector. Once all vectors have been described, the two parts of the table are combined into one using logical operations. The method targets software implementation in the logical basis {AND, OR, XOR, NOT} and minimizes arbitrary 8×8 S-Boxes. It can be implemented with standard logical instructions on any 8/16/32/64-bit processor. Logical SIMD instructions from the SSE, AVX and AVX-512 extensions for x86-64 processors can also be used; their long registers provide high performance.
Software has been developed that searches for bitsliced representations of a given S-Box and automatically generates C++ code for them based on SSE, AVX and AVX-512 instructions. The effectiveness of the method has been investigated on the S-Boxes of known block ciphers, in particular the Ukrainian encryption standard "Kalyna". The developed algorithm requires almost half as many gates for the bitsliced description of an arbitrary S-Box as the best known algorithm (370 gates versus 680, respectively). For ciphers that use two or four S-Box tables, joint minimization can bring the count down to 330 or 300 gates per table, respectively.

Keywords: bitslicing; S-Box; logical minimization; SIMD; x86-64 CPU; software implementation; block ciphers.
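
To make the idea of a bitsliced S-Box concrete, the following is a minimal sketch, not the heuristic method of the paper. It assumes a naive multiplexer tree (Shannon expansion) over the truth table, which uses far more gates than the 370 reported above, and it uses Python integers as stand-ins for long registers. All function names are hypothetical.

```python
import random

def to_bitplanes(values, width=8):
    """Transpose a batch of bytes: bit j of plane i is bit i of values[j]."""
    planes = [0] * width
    for j, v in enumerate(values):
        for i in range(width):
            planes[i] |= ((v >> i) & 1) << j
    return planes

def from_bitplanes(planes, count):
    """Inverse of to_bitplanes for a batch of `count` values."""
    return [sum(((planes[i] >> j) & 1) << i for i in range(len(planes)))
            for j in range(count)]

def bitsliced_sbox(sbox, in_planes, count):
    """Evaluate an 8x8 S-Box on `count` inputs at once using only AND/OR/XOR.

    Naive approach (an assumption of this sketch): each output bit is a
    256-entry truth table folded by one input variable per level.
    """
    full = (1 << count) - 1                      # all-ones word over the batch
    out_planes = []
    for bit in range(8):
        # Truth-table column for this output bit, as all-zero/all-one words.
        layer = [full if (sbox[x] >> bit) & 1 else 0 for x in range(256)]
        for var in range(8):                     # fold on input bit 0, then 1, ...
            sel = in_planes[var]
            nsel = sel ^ full                    # NOT via XOR with all-ones
            layer = [(layer[2 * k] & nsel) | (layer[2 * k + 1] & sel)
                     for k in range(len(layer) // 2)]
        out_planes.append(layer[0])
    return out_planes

# Usage: a randomly generated S-Box evaluated on 64 inputs in parallel.
rng = random.Random(1)
sbox = list(range(256))
rng.shuffle(sbox)
inputs = [rng.randrange(256) for _ in range(64)]
outputs = from_bitplanes(bitsliced_sbox(sbox, to_bitplanes(inputs), 64), 64)
```

No table lookup depends on the secret input here, which is the property that removes cache-timing leakage. A minimized gate list, such as the one the paper searches for, would replace the multiplexer tree.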

High-Performance Symmetric Block Ciphers on CUDA

2011 Second International Conference on Networking and Computing ◽

10.1109/icnc.2011.40 ◽

2011 ◽

Cited By ~ 14

Author(s):

Naoki Nishikawa ◽

Keisuke Iwai ◽

Takakazu Kurokawa

Keyword(s):

High Performance ◽

Block Ciphers ◽

Symmetric Block

Device for Calculating Logical and Arithmetic Operations

PROGRAMMNAYA INGENERIA ◽

10.17587/prin.12.350-357 ◽

2021 ◽

Vol 12 (7) ◽

pp. 350-357

Author(s):

S. S. Shevelev ◽

Keyword(s):

Fixed Point ◽

High Speed ◽

High Performance ◽

Arithmetic Operations ◽

Structural Scheme ◽

Computing Systems ◽

Logical Operations ◽

Addition And Subtraction

A device has been developed that performs logical and arithmetic operations and can be used to build high-performance, high-speed computing systems. Specialized blocks perform the logical operations AND, OR and NOT and the arithmetic operations of addition and subtraction of binary numbers. Arithmetic operations are performed in direct fixed-point codes. The device is presented as a structural scheme, together with structural and functional schemes of its blocks and an algorithm for the operation of the device.

SIDH on ARM: Faster Modular Multiplications for Faster Post-Quantum Supersingular Isogeny Key Exchange

IACR Transactions on Cryptographic Hardware and Embedded Systems ◽

10.46586/tches.v2018.i3.1-20 ◽

2018 ◽

pp. 1-20 ◽

Cited By ~ 3

Author(s):

Hwajeong Seo ◽

Zhe Liu ◽

Patrick Longa ◽

Zhi Hu

Keyword(s):

High Speed ◽

High Performance ◽

Key Exchange ◽

Modular Arithmetic ◽

Diffie Hellman ◽

Real World Applications ◽

Diffie Hellman Key Exchange ◽

Memory Accesses ◽

Montgomery Reduction ◽

Cache Attacks

We present high-speed implementations of the post-quantum supersingular isogeny Diffie-Hellman key exchange (SIDH) and the supersingular isogeny key encapsulation (SIKE) protocols for 32-bit ARMv7-A processors with NEON support. The high performance of our implementations is mainly due to carefully optimized multiprecision and modular arithmetic that finely integrates both ARM and NEON instructions in order to reduce the number of pipeline stalls and memory accesses, and a new Montgomery reduction technique that combines the use of the UMAAL instruction with a variant of the hybrid-scanning approach. In addition, we present efficient implementations of SIDH and SIKE for 64-bit ARMv8-A processors, based on a high-speed Montgomery multiplication that leverages the power of 64-bit instructions. Our experimental results consolidate the practicality of supersingular isogeny-based protocols for many real-world applications. For example, a full key-exchange execution of SIDHp503 is performed in about 176 million cycles on an ARM Cortex-A15 from the ARMv7-A family (i.e., 88 milliseconds @2.0GHz). On an ARM Cortex-A72 from the ARMv8-A family, the same operation can be carried out in about 90 million cycles (i.e., 45 milliseconds @1.992GHz). All our software is protected against timing and cache attacks. The techniques for modular multiplication presented in this work have broad applications to other cryptographic schemes.
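
As background for the Montgomery reduction mentioned above, here is a minimal sketch of the textbook word-free REDC algorithm. It is not the UMAAL-based hybrid-scanning variant of the paper, and the function names and the choice of modulus in the usage lines are assumptions made for illustration only. It requires an odd modulus `n` smaller than `R = 2**k`.

```python
def montgomery_params(n, k):
    """Return R = 2**k and n' = -n^{-1} mod R for an odd modulus n < R."""
    r = 1 << k
    n_neg_inv = (-pow(n, -1, r)) % r   # modular inverse via pow (Python 3.8+)
    return r, n_neg_inv

def redc(t, n, k, n_neg_inv):
    """Montgomery reduction: return t * R^{-1} mod n for 0 <= t < n * R."""
    r_mask = (1 << k) - 1
    m = ((t & r_mask) * n_neg_inv) & r_mask   # m = (t mod R) * n' mod R
    u = (t + m * n) >> k                      # exact division by R
    return u - n if u >= n else u             # single conditional subtraction

def mont_mul(a, b, n, k):
    """Multiply a * b mod n by passing through the Montgomery domain."""
    r, n_neg_inv = montgomery_params(n, k)
    a_bar = (a * r) % n                       # convert operands to Montgomery form
    b_bar = (b * r) % n
    prod_bar = redc(a_bar * b_bar, n, k, n_neg_inv)   # equals a*b*R mod n
    return redc(prod_bar, n, k, n_neg_inv)            # convert back: a*b mod n

# Usage with an illustrative odd modulus (not a SIDH prime).
n = (1 << 127) - 1
product = mont_mul(123456789, 987654321, n, 128)
```

The point of the technique is that the reduction uses only multiplications, additions, and shifts by `k` bits, with no division by `n`. Production code would also replace the final branch with a constant-time subtraction, in line with the timing-attack protection the abstract claims.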

The SPEEDY Family of Block Ciphers

IACR Transactions on Cryptographic Hardware and Embedded Systems ◽

10.46586/tches.v2021.i4.510-545 ◽

2021 ◽

pp. 510-545

Author(s):

Gregor Leander ◽

Thorben Moos ◽

Amir Moradi ◽

Shahram Rasoolzadeh

Keyword(s):

Design Process ◽

High Speed ◽

High Performance ◽

Hardware Security ◽

Block Ciphers ◽

Low Latency ◽

Substitution Box ◽

Single Cycle ◽

Low Area ◽

Pseudorandom Function

We introduce SPEEDY, a family of ultra low-latency block ciphers. We mix engineering expertise into each step of the cipher’s design process in order to create a secure encryption primitive with an extremely low latency in CMOS hardware. The centerpiece of our constructions is a high-speed 6-bit substitution box whose coordinate functions are realized as two-level NAND trees. In contrast to other low-latency block ciphers such as PRINCE, PRINCEv2, MANTIS and QARMA, we neither constrain ourselves by demanding decryption at low overhead, nor by requiring a super low area or energy. This freedom together with our gate- and transistor-level considerations allows us to create an ultra low-latency cipher which outperforms all known solutions in single-cycle encryption speed. Our main result, SPEEDY-6-192, is a 6-round 192-bit block and 192-bit key cipher which can be executed faster in hardware than any other known encryption primitive (including Gimli in Even-Mansour scheme and the Orthros pseudorandom function) and offers 128-bit security. One round more, i.e., SPEEDY-7-192, provides full 192-bit security. SPEEDY primarily targets hardware security solutions embedded in high-end CPUs, where area and energy restrictions are secondary while high performance is the number one priority.
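
The phrase "two-level NAND trees" can be illustrated with a small sketch. Any Boolean function written as a sum of products can be computed by one NAND per product term followed by one NAND over the results. The example below uses a hypothetical 3-input majority function and its unminimized minterms, not the 6-bit SPEEDY S-Box, and it assumes complemented inputs are available at no cost.

```python
from itertools import product

def nand(*bits):
    """n-input NAND gate on 0/1 values."""
    return 0 if all(bits) else 1

def minterms(f, n):
    """List every input tuple on which the n-input function f outputs 1."""
    return [x for x in product((0, 1), repeat=n) if f(*x)]

def two_level_nand(terms, x):
    """Evaluate a sum of products as NAND(NAND(term_1), NAND(term_2), ...).

    Each term is a tuple of 0/1; a 0 in position i means the literal is the
    complement of x[i] (assumed to be available as an input).
    """
    first_level = [
        nand(*[xi if ti else 1 - xi for ti, xi in zip(term, x)])
        for term in terms
    ]
    return nand(*first_level)

def majority(a, b, c):
    """Hypothetical example function: 1 when at least two inputs are 1."""
    return 1 if a + b + c >= 2 else 0

# Usage: rebuild the truth table of `majority` from its two-level NAND form.
terms = minterms(majority, 3)
table = [two_level_nand(terms, x) for x in product((0, 1), repeat=3)]
```

The depth is two gates regardless of the number of terms, which is why this form suits latency-driven designs; the cost is gate fan-in and area.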

High-Performance Symmetric Block Ciphers on Multicore CPU and GPUs

International Journal of Networking and Computing ◽

10.15803/ijnc.2.2_251 ◽

2012 ◽

Vol 2 (2) ◽

pp. 251-268 ◽

Cited By ~ 19

Author(s):

Naoki Nishikawa ◽

Keisuke Iwai ◽

Takakazu Kurokawa

Keyword(s):

High Performance ◽

Block Ciphers ◽

Multicore Cpu ◽

Symmetric Block

Analysis of Software Implemented Low Entropy Masking Schemes

Security and Communication Networks ◽

10.1155/2018/7206835 ◽

2018 ◽

Vol 2018 ◽

pp. 1-8 ◽

Cited By ~ 1

Author(s):

Dan Li ◽

Jiazhe Chen ◽

An Wang ◽

Xiaoyun Wang

Keyword(s):

High Performance ◽

Selection Criterion ◽

Block Ciphers ◽

Absolute Difference ◽

Hardware Implementations ◽

Software Implementations ◽

Low Entropy ◽

Symmetric Block

Low Entropy Masking Schemes (LEMS) are countermeasure techniques to mitigate the high performance overhead of masked hardware and software implementations of symmetric block ciphers by reducing the entropy of the mask sets. The security of LEMS depends on the choice of the mask sets. Previous research mainly focused on searching balanced mask sets for hardware implementations. In this paper, we find that those balanced mask sets may have vulnerabilities in terms of absolute difference when applied in software implemented LEMS. The experiments verify that such vulnerabilities certainly make the software LEMS implementations insecure. To fix the vulnerabilities, we present a selection criterion to choose the mask sets. When some feasible mask sets are already picked out by certain searching algorithms, our selection criterion could be a reference factor to help decide on a more secure one for software LEMS.

Vacuum System to Minimize the Specimen Contamination of High-Performance EM

Proceedings, annual meeting, Electron Microscopy Society of America ◽

10.1017/s0424820100077967 ◽

1977 ◽

Vol 35 ◽

pp. 68-69

Author(s):

N. Yoshimura ◽

K. Shirota ◽

T. Etoh

Keyword(s):

Electron Microscope ◽

High Speed ◽

High Performance ◽

High Vacuum ◽

Vacuum System ◽

Pump System ◽

Pumping System ◽

Diffusion Pump ◽

Almost All ◽

Cascade Type

One of the most important requirements for a high-performance EM, especially an analytical EM using a fine beam probe, is to prevent specimen contamination by providing a clean high vacuum in the vicinity of the specimen. However, in almost all commercial EMs, the pressure in the vicinity of the specimen under observation is usually more than ten times higher than the pressure measured at the pumping line. The EM column inevitably requires the use of greased Viton O-rings for fine movement, specimens and films need to be exchanged frequently, and several attachments may also be exchanged. For these reasons, a high-speed pumping system, as well as a clean vacuum system, is now required. A newly developed electron microscope, the JEM-100CX, features a clean high vacuum in the vicinity of the specimen, realized by the use of a CASCADE type diffusion pump system which has been essentially improved over its predecessor employed on the JEM-100C.

PHAX-SCAN: Functional integration of a Scanning Electron Microscope and an energy-dispersive x-ray analyser

Proceedings, annual meeting, Electron Microscopy Society of America ◽

10.1017/s0424820100152252 ◽

1989 ◽

Vol 47 ◽

pp. 56-57

Author(s):

Marc H. Peeters ◽

Max T. Otten

Keyword(s):

Electron Microscope ◽

Scanning Electron Microscope ◽

High Speed ◽

High Performance ◽

Functional Integration ◽

Energy Dispersive ◽

X Rays ◽

X Ray ◽

High Speed Analysis ◽

Scanning Electron

Over the past decades, the combination of energy-dispersive analysis of X-rays and scanning electron microscopy has proved to be a powerful tool for fast and reliable elemental characterization of a large variety of specimens. The technique has evolved rapidly from a purely qualitative characterization method to a reliable quantitative method of analysis. In the last 5 years, an increasing need for automation has been observed, whereby energy-dispersive analysers control the beam and stage movement of the scanning electron microscope in order to collect digital X-ray images and perform unattended point analysis over multiple locations. The Philips High-speed Analysis of X-rays system (PHAX-Scan) makes use of the high-performance dual-processor structure of the EDAX PV9900 analyser and the databus structure of the Philips series 500 scanning electron microscope to provide a highly automated, user-friendly and extremely fast microanalysis system. The software that runs on the hardware described above was specifically designed to provide the ultimate attainable speed on the system.

The bright future of digital imaging in scanning electron microscopy

Proceedings, annual meeting, Electron Microscopy Society of America ◽

10.1017/s0424820100149672 ◽

1993 ◽

Vol 51 ◽

pp. 768-769

Author(s):

M. T. Postek ◽

A. E. Vladar

Keyword(s):

Electron Microscopy ◽

Scanning Electron Microscopy ◽

Digital Imaging ◽

High Speed ◽

High Performance ◽

Mass Storage ◽

Analog To Digital ◽

Central Processing ◽

Digital Imaging Technology ◽

Scanning Electron

One of the major advancements applied to scanning electron microscopy (SEM) during the past 10 years has been the development and application of digital imaging technology. Advancements in technology, notably the availability of less expensive, high-density memory chips and the development of high-speed analog-to-digital converters, mass storage and high-performance central processing units, have fostered this revolution. Today, most modern SEM instruments have digital electronics as a standard feature. These instruments generally have 8 bit or 256 gray levels with at least 512 × 512 pixel density operating at TV rate. In addition, current slow-scan commercial frame-grabber cards, directly applicable to the SEM, can have upwards of 12-14 bit lateral resolution, permitting image acquisition at 4096 × 4096 resolution or greater. The two major categories of SEM systems to which digital technology has been applied are: In the analog SEM system the scan generator is normally operated in an analog manner and the image is displayed in an analog or "slow scan" mode.
