ModuloNET: Neural Networks Meet Modular Arithmetic for Efficient Hardware Masking

Intellectual Property (IP) thefts of trained machine learning (ML) models through side-channel attacks on inference engines are becoming a major threat. Indeed, several recent works have shown reverse engineering of the model internals using such attacks, but the research on building defenses is largely unexplored. There is a critical need to efficiently and securely transform those defenses from cryptography such as masking to ML frameworks. Existing works, however, revealed that a straightforward adaptation of such defenses either provides partial security or leads to high area overheads. To address those limitations, this work proposes a fundamentally new direction to construct neural networks that are inherently more compatible with masking. The key idea is to use modular arithmetic in neural networks and then efficiently realize masking, in either Boolean or arithmetic fashion, depending on the type of neural network layers. We demonstrate our approach on the edge-computing friendly binarized neural networks (BNN) and show how to modify the training and inference of such a network to work with modular arithmetic without sacrificing accuracy. We then design novel masking gadgets using Domain-Oriented Masking (DOM) to efficiently mask the unique operations of ML such as the activation function and the output layer classification, and we prove their security in the glitch-extended probing model. Finally, we implement fully masked neural networks on an FPGA, quantify that they can achieve a similar latency while reducing the FF and LUT costs over the state-of-the-art protected implementations by 34.2% and 42.6%, respectively, and demonstrate their first-order side-channel security with up to 1M traces.

Download Full-text

A secure and highly efficient first-order masking scheme for AES linear operations

Cybersecurity ◽

10.1186/s42400-021-00082-w ◽

2021 ◽

Vol 4 (1) ◽

Author(s):

Jingdian Ming ◽

Yongbin Zhou ◽

Huizhong Li ◽

Qian Zhang

Keyword(s):

Cost Reduction ◽

Provable Security ◽

State Of The Art ◽

Side Channel ◽

First Order ◽

Highly Efficient ◽

Non Linear ◽

Device Independence ◽

Practical Implications ◽

Provably Secure

AbstractDue to its provable security and remarkable device-independence, masking has been widely accepted as a noteworthy algorithmic-level countermeasure against side-channel attacks. However, relatively high cost of masking severely limits its applicability. Considering the high tackling complexity of non-linear operations, most masked AES implementations focus on the security and cost reduction of masked S-boxes. In this paper, we focus on linear operations, which seems to be underestimated, on the contrary. Specifically, we discover some security flaws and redundant processes in popular first-order masked AES linear operations, and pinpoint the underlying root causes. Then we propose a provably secure and highly efficient masking scheme for AES linear operations. In order to show its practical implications, we replace the linear operations of state-of-the-art first-order AES masking schemes with our proposal, while keeping their original non-linear operations unchanged. We implement four newly combined masking schemes on an Intel Core i7-4790 CPU, and the results show they are roughly 20% faster than those original ones. Then we select one masked implementation named RSMv2 due to its popularity, and investigate its security and efficiency on an AVR ATMega163 processor and four different FPGA devices. The results show that no exploitable first-order side-channel leakages are detected. Moreover, compared with original masked AES implementations, our combined approach is nearly 25% faster on the AVR processor, and at least 70% more efficient on four FPGA devices.

Download Full-text

Learning deep linear neural networks: Riemannian gradient flows and convergence to global minimizers

Information and Inference A Journal of the IMA ◽

10.1093/imaiai/iaaa039 ◽

2021 ◽

Author(s):

Bubacarr Bah ◽

Holger Rauhut ◽

Ulrich Terstiege ◽

Michael Westdickenberg

Keyword(s):

Neural Networks ◽

Critical Point ◽

Global Minimum ◽

Gradient Flow ◽

Activation Function ◽

Riemannian Metric ◽

Gradient Flows ◽

Global Minimizers ◽

Network Layers ◽

Almost All

Abstract We study the convergence of gradient flows related to learning deep linear neural networks (where the activation function is the identity map) from data. In this case, the composition of the network layers amounts to simply multiplying the weight matrices of all layers together, resulting in an overparameterized problem. The gradient flow with respect to these factors can be re-interpreted as a Riemannian gradient flow on the manifold of rank-$r$ matrices endowed with a suitable Riemannian metric. We show that the flow always converges to a critical point of the underlying functional. Moreover, we establish that, for almost all initializations, the flow converges to a global minimum on the manifold of rank $k$ matrices for some $k\leq r$.

Download Full-text

Multiplicative Masking for AES in Hardware

IACR Transactions on Cryptographic Hardware and Embedded Systems ◽

10.46586/tches.v2018.i3.431-468 ◽

2018 ◽

pp. 431-468

Author(s):

Lauren De Meyer ◽

Oscar Reparaz ◽

Begül Bilgin

Keyword(s):

Design Process ◽

Hardware Implementation ◽

State Of The Art ◽

Second Order ◽

The Other ◽

Bivariate Analysis ◽

Side Channel ◽

Power Model ◽

First Order ◽

Previous State

Hardware masked AES designs usually rely on Boolean masking and perform the computation of the S-box using the tower-field decomposition. On the other hand, splitting sensitive variables in a multiplicative way is more amenable for the computation of the AES S-box, as noted by Akkar and Giraud. However, multiplicative masking needs to be implemented carefully not to be vulnerable to first-order DPA with a zero-value power model. Up to now, sound higher-order multiplicative masking schemes have been implemented only in software. In this work, we demonstrate the first hardware implementation of AES using multiplicative masks. The method is tailored to be secure even if the underlying gates are not ideal and glitches occur in the circuit. We detail the design process of first- and second-order secure AES-128 cores, which result in the smallest die area to date among previous state-of-the-art masked AES implementations with comparable randomness cost and latency. The first- and second-order masked implementations improve resp. 29% and 18% over these designs. We deploy our construction on a Spartan-6 FPGA and perform a side-channel evaluation. No leakage is detected with up to 50 million traces for both our first- and second-order implementation. For the latter, this holds both for univariate and bivariate analysis.

Download Full-text

Efficient Side-Channel Protections of ARX Ciphers

IACR Transactions on Cryptographic Hardware and Embedded Systems ◽

10.46586/tches.v2018.i3.627-653 ◽

2018 ◽

pp. 627-653 ◽

Cited By ~ 1

Author(s):

Bernhard Jungk ◽

Richard Petri ◽

Marc Stöttinger

Keyword(s):

High Performance ◽

State Of The Art ◽

Side Channel ◽

First Order ◽

Current State ◽

Assembly Instructions ◽

Improved Performance ◽

Good Trade ◽

Published Result ◽

Very High

The current state of the art of Boolean masking for the modular addition operation in software has a very high performance overhead. Firstly, the instruction count is very high compared to a normal addition operation. Secondly, until recently, the entropy consumed by such protections was also quite high. Our paper significantly improves both aspects, by applying the Threshold Implementation (TI) methodology with two shares and by reusing internal values as randomness source in such a way that the uniformity is always preserved. Our approach performs considerably faster compared to the previously known masked addition and subtraction algorithms by Coron et al. and Biryukov et al. improving the state of the art by 36%, if we only consider the number of ARM assembly instructions. Furthermore, similar to the masked adder from Biryukov et al. we reduce the amount of randomness and only require one bit additional entroy per addition, which is a good trade-off for the improved performance. We applied our improved masked adder to ChaCha20, for which we provide two new first-order protected implementations and achieve a 36% improvement over the best published result for ChaCha20 using an ARM Cortex-M4 microprocessor.

Download Full-text

Methodology for Efficient CNN Architectures in Profiling Attacks

IACR Transactions on Cryptographic Hardware and Embedded Systems ◽

10.46586/tches.v2020.i1.1-36 ◽

2019 ◽

pp. 1-36

Author(s):

Gabriel Zaid ◽

Lilian Bossuet ◽

Amaury Habrard ◽

Alexandre Venelli

Keyword(s):

Neural Networks ◽

Deep Learning ◽

State Of The Art ◽

Side Channel ◽

Clear Understanding ◽

Learning Techniques ◽

Previous State ◽

Template Attacks ◽

Public Datasets ◽

Visualization Techniques

The side-channel community recently investigated a new approach, based on deep learning, to significantly improve profiled attacks against embedded systems. Previous works have shown the benefit of using convolutional neural networks (CNN) to limit the effect of some countermeasures such as desynchronization. Compared with template attacks, deep learning techniques can deal with trace misalignment and the high dimensionality of the data. Pre-processing is no longer mandatory. However, the performance of attacks depends to a great extent on the choice of each hyperparameter used to configure a CNN architecture. Hence, we cannot perfectly harness the potential of deep neural networks without a clear understanding of the network’s inner-workings. To reduce this gap, we propose to clearly explain the role of each hyperparameters during the feature selection phase using some specific visualization techniques including Weight Visualization, Gradient Visualization and Heatmaps. By highlighting which features are retained by filters, heatmaps come in handy when a security evaluator tries to interpret and understand the efficiency of CNN. We propose a methodology for building efficient CNN architectures in terms of attack efficiency and network complexity, even in the presence of desynchronization. We evaluate our methodology using public datasets with and without desynchronization. In each case, our methodology outperforms the previous state-of-the-art CNN models while significantly reducing network complexity. Our networks are up to 25 times more efficient than previous state-of-the-art while their complexity is up to 31810 times smaller. Our results show that CNN networks do not need to be very complex to perform well in the side-channel context.

Download Full-text

SCORING MODELING BASED ON NEURAL NETWORKS FOR DETERMINING A BANK BORROWER'S RATING

Economy of Ukraine ◽

10.15407/economyukr.2020.10.054 ◽

2020 ◽

Vol 2020 (10) ◽

pp. 54-62

Author(s):

Oleksii VASYLIEV ◽

Keyword(s):

Neural Network ◽

Neural Networks ◽

Network Architecture ◽

Statistical Data ◽

Activation Function ◽

Decision Making Process ◽

Neural Network Architecture ◽

Acceptable Accuracy ◽

The Neural Network ◽

Sigmoid Activation Function

The problem of applying neural networks to calculate ratings used in banking in the decision-making process on granting or not granting loans to borrowers is considered. The task is to determine the rating function of the borrower based on a set of statistical data on the effectiveness of loans provided by the bank. When constructing a regression model to calculate the rating function, it is necessary to know its general form. If so, the task is to calculate the parameters that are included in the expression for the rating function. In contrast to this approach, in the case of using neural networks, there is no need to specify the general form for the rating function. Instead, certain neural network architecture is chosen and parameters are calculated for it on the basis of statistical data. Importantly, the same neural network architecture can be used to process different sets of statistical data. The disadvantages of using neural networks include the need to calculate a large number of parameters. There is also no universal algorithm that would determine the optimal neural network architecture. As an example of the use of neural networks to determine the borrower's rating, a model system is considered, in which the borrower's rating is determined by a known non-analytical rating function. A neural network with two inner layers, which contain, respectively, three and two neurons and have a sigmoid activation function, is used for modeling. It is shown that the use of the neural network allows restoring the borrower's rating function with quite acceptable accuracy.

Download Full-text

AutoLinker: Automatic Fragment Linking with Deep Conditional Transformer Neural Networks

10.26434/chemrxiv.12271508.v2 ◽

2020 ◽

Author(s):

Yuyao Yang ◽

Shuangjia Zheng ◽

Shimin Su ◽

Jun Xu ◽

Hongming Chen

Keyword(s):

Neural Networks ◽

Drug Discovery ◽

Drug Design ◽

State Of The Art ◽

The State ◽

Generative Model ◽

Lead Optimization ◽

Scaffold Hopping ◽

Reference Models ◽

Lead Generation

Fragment based drug design represents a promising drug discovery paradigm complimentary to the traditional HTS based lead generation strategy. How to link fragment structures to increase compound affinity is remaining a challenge task in this paradigm. Hereby a novel deep generative model (AutoLinker) for linking fragments is developed with the potential for applying in the fragment-based lead generation scenario. The state-of-the-art transformer architecture was employed to learn the linker grammar and generate novel linker. Our results show that, given starting fragments and user customized linker constraints, our AutoLinker model can design abundant drug-like molecules fulfilling these constraints and its performance was superior to other reference models. Moreover, several examples were showcased that AutoLinker can be useful tools for carrying out drug design tasks such as fragment linking, lead optimization and scaffold hopping.

Download Full-text

Levenshtein Augmentation Improves Performance of SMILES Based Deep-Learning Synthesis Prediction

10.26434/chemrxiv.12562121 ◽

2020 ◽

Author(s):

Dean Sumner ◽

Jiazhen He ◽

Amol Thakkar ◽

Ola Engkvist ◽

Esben Jannik Bjerrum

Keyword(s):

Neural Networks ◽

Pattern Recognition ◽

Deep Learning ◽

Recurrent Neural Networks ◽

Data Augmentation ◽

State Of The Art ◽

Sequence Similarity ◽

Learning Models ◽

Underlying Network

<p>SMILES randomization, a form of data augmentation, has previously been shown to increase the performance of deep learning models compared to non-augmented baselines. Here, we propose a novel data augmentation method we call “Levenshtein augmentation” which considers local SMILES sub-sequence similarity between reactants and their respective products when creating training pairs. The performance of Levenshtein augmentation was tested using two state of the art models - transformer and sequence-to-sequence based recurrent neural networks with attention. Levenshtein augmentation demonstrated an increase performance over non-augmented, and conventionally SMILES randomization augmented data when used for training of baseline models. Furthermore, Levenshtein augmentation seemingly results in what we define as <i>attentional gain </i>– an enhancement in the pattern recognition capabilities of the underlying network to molecular motifs.</p>

Download Full-text

The New Activation Function for Complex Valued Neural Networks: Complex Swish Function

4th International Symposium on Innovative Approaches in Engineering and Natural Sciences Proceedings ◽

10.36287/setsci.4.6.050 ◽

2019 ◽

Author(s):

Mehmet Çelebi ◽

Murat Ceylan

Keyword(s):

Neural Networks ◽

Activation Function ◽

Complex Valued

Download Full-text

Analysis of Non-Linear Activation Functions for Classification Tasks Using Convolutional Neural Networks

Recent Patents on Computer Science ◽

10.2174/2213275911666181025143029 ◽

2019 ◽

Vol 12 (3) ◽

pp. 156-161 ◽

Cited By ~ 3

Author(s):

Aman Dureja ◽

Payal Pahwa

Keyword(s):

Neural Networks ◽

Deep Neural Networks ◽

Activation Function ◽

Primary Objective ◽

Experimental Comparison ◽

Activation Functions ◽

Practical Applications ◽

Network Activation ◽

Non Linear ◽

Hidden Layer

Background: In making the deep neural network, activation functions play an important role. But the choice of activation functions also affects the network in term of optimization and to retrieve the better results. Several activation functions have been introduced in machine learning for many practical applications. But which activation function should use at hidden layer of deep neural networks was not identified. Objective: The primary objective of this analysis was to describe which activation function must be used at hidden layers for deep neural networks to solve complex non-linear problems. Methods: The configuration for this comparative model was used by using the datasets of 2 classes (Cat/Dog). The number of Convolutional layer used in this network was 3 and the pooling layer was also introduced after each layer of CNN layer. The total of the dataset was divided into the two parts. The first 8000 images were mainly used for training the network and the next 2000 images were used for testing the network. Results: The experimental comparison was done by analyzing the network by taking different activation functions on each layer of CNN network. The validation error and accuracy on Cat/Dog dataset were analyzed using activation functions (ReLU, Tanh, Selu, PRelu, Elu) at number of hidden layers. Overall the Relu gave best performance with the validation loss at 25th Epoch 0.3912 and validation accuracy at 25th Epoch 0.8320. Conclusion: It is found that a CNN model with ReLU hidden layers (3 hidden layers here) gives best results and improve overall performance better in term of accuracy and speed. These advantages of ReLU in CNN at number of hidden layers are helpful to effectively and fast retrieval of images from the databases.

Download Full-text