scholarly journals The Curse of Class Imbalance and Conflicting Metrics with Machine Learning for Side-channel Evaluations

Author(s):  
Stjepan Picek ◽  
Annelie Heuser ◽  
Alan Jovic ◽  
Shivam Bhasin ◽  
Francesco Regazzoni

We concentrate on machine learning techniques used for profiled sidechannel analysis in the presence of imbalanced data. Such scenarios are realistic and often occurring, for instance in the Hamming weight or Hamming distance leakage models. In order to deal with the imbalanced data, we use various balancing techniques and we show that most of them help in mounting successful attacks when the data is highly imbalanced. Especially, the results with the SMOTE technique are encouraging, since we observe some scenarios where it reduces the number of necessary measurements more than 8 times. Next, we provide extensive results on comparison of machine learning and side-channel metrics, where we show that machine learning metrics (and especially accuracy as the most often used one) can be extremely deceptive. This finding opens a need to revisit the previous works and their results in order to properly assess the performance of machine learning in side-channel analysis.

2020 ◽  
Vol 4 (4) ◽  
pp. 314-328
Author(s):  
Léo Weissbart ◽  
Łukasz Chmielewski ◽  
Stjepan Picek ◽  
Lejla Batina

AbstractProfiling attacks, especially those based on machine learning, proved to be very successful techniques in recent years when considering the side-channel analysis of symmetric-key crypto implementations. At the same time, the results for implementations of asymmetric-key cryptosystems are very sparse. This paper considers several machine learning techniques to mount side-channel attacks on two implementations of scalar multiplication on the elliptic curve Curve25519. The first implementation follows the baseline implementation with complete formulae as used for EdDSA in WolfSSl, where we exploit power consumption as a side-channel. The second implementation features several countermeasures, and in this case, we analyze electromagnetic emanations to find side-channel leakage. Most techniques considered in this work result in potent attacks, and especially the method of choice appears to be convolutional neural networks (CNNs), which can break the first implementation with only a single measurement in the attack phase. The same convolutional neural network demonstrated excellent performance for attacking AES cipher implementations. Our results show that some common grounds can be established when using deep learning for profiling attacks on very different cryptographic algorithms and their corresponding implementations.


Author(s):  
Jiajia Zhang ◽  
Mengce Zheng ◽  
Jiehui Nan ◽  
Honggang Hu ◽  
Nenghai Yu

Since Kocher (CRYPTO’96) proposed timing attack, side channel analysis (SCA) has shown great potential to break cryptosystems via physical leakage. Recently, deep learning techniques are widely used in SCA and show equivalent and even better performance compared to traditional methods. However, it remains unknown why and when deep learning techniques are effective and efficient for SCA. Masure et al. (IACR TCHES 2020(1):348–375) illustrated that deep learning paradigm is suitable for evaluating implementations against SCA from a worst-case scenario point of view, yet their work is limited to balanced data and a specific loss function. Besides, deep learning metrics are not consistent with side channel metrics. In most cases, they are deceptive in foreseeing the feasibility and complexity of mounting a successful attack, especially for imbalanced data. To mitigate the gap between deep learning metrics and side channel metrics, we propose a novel Cross Entropy Ratio (CER) metric to evaluate the performance of deep learning models for SCA. CER is closely related to traditional side channel metrics Guessing Entropy (GE) and Success Rate (SR) and fits to deep learning scenario. Besides, we show that it works stably while deep learning metrics such as accuracy becomes rather unreliable when the training data tends to be imbalanced. However, estimating CER can be done as easy as natural metrics in deep learning algorithms with low computational complexity. Furthermore, we adapt CER metric to a new kind of loss function, namely CER loss function, designed specifically for deep learning in side channel scenario. In this way, we link directly the SCA objective to deep learning optimization. Our experiments on several datasets show that, for SCA with imbalanced data, CER loss function outperforms Cross Entropy loss function in various conditions.


Author(s):  
Lejla Batina ◽  
Milena Djukanovic ◽  
Annelie Heuser ◽  
Stjepan Picek

AbstractSide-channel attacks (SCAs) are powerful attacks based on the information obtained from the implementation of cryptographic devices. Profiling side-channel attacks has received a lot of attention in recent years due to the fact that this type of attack defines the worst-case security assumptions. The SCA community realized that the same approach is actually used in other domains in the form of supervised machine learning. Consequently, some researchers started experimenting with different machine learning techniques and evaluating their effectiveness in the SCA context. More recently, we are witnessing an increase in the use of deep learning techniques in the SCA community with strong first results in side-channel analyses, even in the presence of countermeasures. In this chapter, we consider the evolution of profiling attacks, and subsequently we discuss the impacts they have made in the data preprocessing, feature engineering, and classification phases. We also speculate on the future directions and the best-case consequences for the security of small devices.


2019 ◽  
Vol 8 (4) ◽  
pp. 2687-2695

Class imbalance problem is often observed when instances of major class exceed instances of minor class. The performance of machine learning techniques is immensely afflicted by imbalanced data in several fields. The skewed distribution either predicts the majority class with high error rate or will not foresee the minority class. To solve the problem of imbalanced data of software bugs, Synthetic minority oversampling technique (SMOTE) is used which balances the imbalanced datasets of Apache Projects. It is applied on summary of bugs to balance the dataset and predicts severity at system and component level. Several machine learning techniques are applied on imbalanced as well as balanced datasets to predict the severity of software bugs using textual description. Test outcomes and statistical analysis shows improved results on balanced datasets in respect to Gmean and balance metrics instead of machine learning techniques applied on imbalanced data. Evaluation metrics Gmean improves by 34% and balance by 11% at system level and by 42% and 62% at component level. Further, it was observed that solving class imbalance problem on textual data is helpful in augmenting the performance.


2019 ◽  
Vol 119 (3) ◽  
pp. 676-696 ◽  
Author(s):  
Zhongyi Hu ◽  
Raymond Chiong ◽  
Ilung Pranata ◽  
Yukun Bao ◽  
Yuqing Lin

Purpose Malicious web domain identification is of significant importance to the security protection of internet users. With online credibility and performance data, the purpose of this paper to investigate the use of machine learning techniques for malicious web domain identification by considering the class imbalance issue (i.e. there are more benign web domains than malicious ones). Design/methodology/approach The authors propose an integrated resampling approach to handle class imbalance by combining the synthetic minority oversampling technique (SMOTE) and particle swarm optimisation (PSO), a population-based meta-heuristic algorithm. The authors use the SMOTE for oversampling and PSO for undersampling. Findings By applying eight well-known machine learning classifiers, the proposed integrated resampling approach is comprehensively examined using several imbalanced web domain data sets with different imbalance ratios. Compared to five other well-known resampling approaches, experimental results confirm that the proposed approach is highly effective. Practical implications This study not only inspires the practical use of online credibility and performance data for identifying malicious web domains but also provides an effective resampling approach for handling the class imbalance issue in the area of malicious web domain identification. Originality/value Online credibility and performance data are applied to build malicious web domain identification models using machine learning techniques. An integrated resampling approach is proposed to address the class imbalance issue. The performance of the proposed approach is confirmed based on real-world data sets with different imbalance ratios.


Sign in / Sign up

Export Citation Format

Share Document