The Curse of Class Imbalance and Conflicting Metrics with Machine Learning for Side-channel Evaluations

We concentrate on machine learning techniques used for profiled side-channel analysis in the presence of imbalanced data. Such scenarios are realistic and occur often, for instance in the Hamming weight or Hamming distance leakage models. To deal with the imbalanced data, we use various balancing techniques and show that most of them help in mounting successful attacks when the data is highly imbalanced. The results with the SMOTE technique are especially encouraging, since we observe some scenarios where it reduces the number of necessary measurements by more than 8 times. Next, we provide extensive results comparing machine learning and side-channel metrics, where we show that machine learning metrics (especially accuracy, the most often used one) can be extremely deceptive. This finding creates a need to revisit previous works and their results in order to properly assess the performance of machine learning in side-channel analysis.
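To illustrate why the Hamming weight leakage model is imbalanced, the following minimal sketch counts the Hamming weight classes of an 8-bit intermediate value. It assumes the value is uniformly distributed over 0..255, which is an assumption made here for illustration and is not stated in the abstract; all names are hypothetical.

```python
from collections import Counter


def hamming_weight(value: int) -> int:
    """Number of set bits in a non-negative integer."""
    return bin(value).count("1")


# Assumption: the targeted intermediate byte takes each of its 256 values equally often.
class_counts = Counter(hamming_weight(v) for v in range(256))

# The most frequent class and the accuracy of a model that always predicts it.
majority_class, majority_count = class_counts.most_common(1)[0]
majority_accuracy = majority_count / 256

for hw in range(9):
    print(f"HW={hw}: {class_counts[hw]} of 256 values")
print(f"always predicting HW={majority_class} gives accuracy {majority_accuracy:.4f}")
```

Under this assumption the nine classes follow a binomial distribution (1, 8, 28, 56, 70, 56, 28, 8, 1), so a model that always outputs Hamming weight 4 reaches about 27% accuracy, well above the 1/9 expected from uniform guessing over nine classes, while its constant output cannot distinguish one key candidate from another. This is one way accuracy can look better than the attack it is meant to describe.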


Systematic Side-Channel Analysis of Curve25519 with Machine Learning

Journal of Hardware and Systems Security ◽

10.1007/s41635-020-00106-w ◽

2020 ◽

Vol 4 (4) ◽

pp. 314-328

Author(s):

Léo Weissbart ◽

Łukasz Chmielewski ◽

Stjepan Picek ◽

Lejla Batina

Keyword(s):

Machine Learning ◽

Machine Learning Techniques ◽

Side Channel ◽

Single Measurement ◽

Excellent Performance ◽

Side Channel Analysis ◽

Learning Techniques ◽

Attack Phase ◽

Channel Analysis ◽

Symmetric Key

Profiling attacks, especially those based on machine learning, have proved to be very successful techniques in recent years when considering the side-channel analysis of symmetric-key crypto implementations. At the same time, the results for implementations of asymmetric-key cryptosystems are very sparse. This paper considers several machine learning techniques to mount side-channel attacks on two implementations of scalar multiplication on the elliptic curve Curve25519. The first implementation follows the baseline implementation with complete formulae as used for EdDSA in WolfSSL, where we exploit power consumption as a side-channel. The second implementation features several countermeasures, and in this case, we analyze electromagnetic emanations to find side-channel leakage. Most techniques considered in this work result in potent attacks, and the method of choice appears to be convolutional neural networks (CNNs), which can break the first implementation with only a single measurement in the attack phase. The same convolutional neural network demonstrated excellent performance for attacking AES cipher implementations. Our results show that some common ground can be established when using deep learning for profiling attacks on very different cryptographic algorithms and their corresponding implementations.


Reverse engineering smart card malware using side channel analysis with machine learning techniques

2016 IEEE International Conference on Big Data (Big Data) ◽

10.1109/bigdata.2016.7841039 ◽

2016 ◽

Cited By ~ 2

Author(s):

Hippolyte Djonon Tsague ◽

Bheki Twala

Keyword(s):

Machine Learning ◽

Reverse Engineering ◽

Smart Card ◽

Machine Learning Techniques ◽

Side Channel ◽

Side Channel Analysis ◽

Learning Techniques ◽

Channel Analysis


A Novel Evaluation Metric for Deep Learning-Based Side Channel Analysis and Its Extended Application to Imbalanced Data

IACR Transactions on Cryptographic Hardware and Embedded Systems ◽

10.46586/tches.v2020.i3.73-96 ◽

2020 ◽

pp. 73-96

Author(s):

Jiajia Zhang ◽

Mengce Zheng ◽

Jiehui Nan ◽

Honggang Hu ◽

Nenghai Yu

Keyword(s):

Deep Learning ◽

Loss Function ◽

Imbalanced Data ◽

Cross Entropy ◽

Side Channel ◽

Worst Case ◽

Side Channel Analysis ◽

Learning Techniques ◽

Channel Analysis ◽

Learning Metrics

Since Kocher (CRYPTO’96) proposed the timing attack, side channel analysis (SCA) has shown great potential to break cryptosystems via physical leakage. Recently, deep learning techniques have been widely used in SCA and show equivalent or even better performance compared to traditional methods. However, it remains unknown why and when deep learning techniques are effective and efficient for SCA. Masure et al. (IACR TCHES 2020(1):348–375) illustrated that the deep learning paradigm is suitable for evaluating implementations against SCA from a worst-case scenario point of view, yet their work is limited to balanced data and a specific loss function. Besides, deep learning metrics are not consistent with side channel metrics. In most cases, they are deceptive in foreseeing the feasibility and complexity of mounting a successful attack, especially for imbalanced data. To narrow the gap between deep learning metrics and side channel metrics, we propose a novel Cross Entropy Ratio (CER) metric to evaluate the performance of deep learning models for SCA. CER is closely related to the traditional side channel metrics Guessing Entropy (GE) and Success Rate (SR) and fits the deep learning scenario. Besides, we show that it works stably while deep learning metrics such as accuracy become rather unreliable when the training data tends to be imbalanced. Moreover, estimating CER is as easy as estimating natural metrics in deep learning algorithms and has low computational complexity. Furthermore, we adapt the CER metric to a new kind of loss function, namely the CER loss function, designed specifically for deep learning in the side channel scenario. In this way, we directly link the SCA objective to deep learning optimization. Our experiments on several datasets show that, for SCA with imbalanced data, the CER loss function outperforms the Cross Entropy loss function in various conditions.
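The mismatch between accuracy and side channel metrics can be made concrete with a toy example. The sketch below does not implement CER; it only contrasts per-trace accuracy with the key rank obtained by summing log-probabilities over traces, which is the quantity that Guessing Entropy averages over many experiments. The probability values are invented for illustration, and the sketch assumes the model outputs one probability per key candidate directly; all names are hypothetical.

```python
import math


def key_rank(prob_rows, correct_key):
    """Rank of the correct key (0 = best) after accumulating log-probabilities over all traces."""
    n_keys = len(prob_rows[0])
    scores = [sum(math.log(row[k]) for row in prob_rows) for k in range(n_keys)]
    # Count how many key candidates score strictly better than the correct one.
    return sum(1 for k in range(n_keys) if scores[k] > scores[correct_key])


def accuracy(prob_rows, correct_key):
    """Fraction of traces whose single most likely candidate is the correct key."""
    hits = sum(
        1
        for row in prob_rows
        if max(range(len(row)), key=row.__getitem__) == correct_key
    )
    return hits / len(prob_rows)


# Invented toy data: three key candidates, two traces, correct key is candidate 0.
# Candidate 0 is never the top guess for a single trace, but it is consistently likely.
toy_probs = [
    [0.4, 0.5, 0.1],
    [0.4, 0.1, 0.5],
]
toy_accuracy = accuracy(toy_probs, correct_key=0)
toy_rank = key_rank(toy_probs, correct_key=0)
print("accuracy:", toy_accuracy, "key rank:", toy_rank)
```

In this toy case the accuracy is zero, yet the correct key ranks first once the two traces are combined, because the wrong candidates do not stay on top consistently. The opposite case, high accuracy with a poor key rank, is the one that imbalanced data tends to produce.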


It Started with Templates: The Future of Profiling in Side-Channel Analysis

Security of Ubiquitous Computing Systems ◽

10.1007/978-3-030-10591-4_8 ◽

2021 ◽

pp. 133-145

Author(s):

Lejla Batina ◽

Milena Djukanovic ◽

Annelie Heuser ◽

Stjepan Picek

Keyword(s):

Machine Learning ◽

Supervised Machine Learning ◽

Machine Learning Techniques ◽

Side Channel ◽

Side Channel Attacks ◽

Worst Case ◽

Learning Techniques ◽

The Future ◽

First Results ◽

Channel Analysis

Side-channel attacks (SCAs) are powerful attacks based on the information obtained from the implementation of cryptographic devices. Profiling side-channel attacks have received a lot of attention in recent years because this type of attack defines the worst-case security assumptions. The SCA community realized that the same approach is actually used in other domains in the form of supervised machine learning. Consequently, some researchers started experimenting with different machine learning techniques and evaluating their effectiveness in the SCA context. More recently, we are witnessing an increase in the use of deep learning techniques in the SCA community with strong first results in side-channel analyses, even in the presence of countermeasures. In this chapter, we consider the evolution of profiling attacks, and subsequently we discuss the impacts they have made in the data preprocessing, feature engineering, and classification phases. We also speculate on the future directions and the best-case consequences for the security of small devices.


Bug Severity Prediction using Class Imbalance Problem

International Journal of Recent Technology and Engineering - 2 ◽

10.35940/ijrte.d7297.118419 ◽

2019 ◽

Vol 8 (4) ◽

pp. 2687-2695

Keyword(s):

Machine Learning ◽

Class Imbalance ◽

Imbalanced Data ◽

Machine Learning Techniques ◽

System Level ◽

Class Imbalance Problem ◽

Component Level ◽

Software Bugs ◽

Imbalance Problem ◽

Learning Techniques

The class imbalance problem arises when instances of the majority class outnumber instances of the minority class. The performance of machine learning techniques suffers greatly from imbalanced data in several fields. With a skewed distribution, a model either predicts the majority class with a high error rate or fails to predict the minority class. To solve the problem of imbalanced software bug data, the synthetic minority oversampling technique (SMOTE) is used to balance the imbalanced datasets of Apache projects. It is applied to bug summaries to balance the dataset, and severity is then predicted at the system and component levels. Several machine learning techniques are applied to the imbalanced as well as the balanced datasets to predict the severity of software bugs from their textual descriptions. Test outcomes and statistical analysis show improved results on the balanced datasets with respect to the Gmean and balance metrics, compared with machine learning techniques applied to the imbalanced data. Gmean improves by 34% and balance by 11% at the system level, and by 42% and 62% at the component level. Further, it was observed that solving the class imbalance problem on textual data helps improve performance.
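As a rough illustration of the oversampling idea behind SMOTE, the sketch below creates synthetic minority samples by interpolating between a minority sample and one of its nearest minority neighbours. It is a simplified, standard-library-only sketch with hypothetical names and invented toy points, not the pipeline used in the paper; a maintained implementation is available in libraries such as imbalanced-learn.

```python
import math
import random


def smote_sketch(minority, n_new, k=2, seed=0):
    """Generate n_new synthetic points by interpolating between minority neighbours."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # The k nearest minority neighbours of the base point, excluding the point itself.
        others = [p for p in minority if p is not base]
        neighbours = sorted(others, key=lambda p: math.dist(base, p))[:k]
        neighbour = rng.choice(neighbours)
        # A random position on the line segment between base and neighbour.
        gap = rng.random()
        synthetic.append(tuple(b + gap * (n - b) for b, n in zip(base, neighbour)))
    return synthetic


# Invented toy minority class: the four corners of the unit square.
minority_points = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
new_points = smote_sketch(minority_points, n_new=5)
print(new_points)
```

Because every synthetic point lies on a segment between two existing minority points, the new samples stay inside the region the minority class already occupies instead of duplicating existing samples exactly.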


Prediction of Clinical Risk Factors of Diabetes Using Multiple Machine Learning Techniques Resolving Class Imbalance

2020 23rd International Conference on Computer and Information Technology (ICCIT) ◽

10.1109/iccit51783.2020.9392694 ◽

2020 ◽

Author(s):

Kazi Amit Hasan ◽

Md. Al Mehedi Hasan

Keyword(s):

Machine Learning ◽

Risk Factors ◽

Class Imbalance ◽

Clinical Risk Factors ◽

Machine Learning Techniques ◽

Clinical Risk ◽

Learning Techniques


Class Imbalance Issue in Software Defect Prediction Models by various Machine Learning Techniques: An Empirical Study

10.1109/icscc51209.2021.9528170 ◽

2021 ◽

Author(s):

Sushant Kumar Pandey ◽

Anil Kumar Tripathi

Keyword(s):

Machine Learning ◽

Empirical Study ◽

Prediction Models ◽

Class Imbalance ◽

Machine Learning Techniques ◽

Defect Prediction ◽

Software Defect Prediction ◽

Software Defect ◽

Learning Techniques ◽

Defect Prediction Models


Template Attacks vs. Machine Learning Revisited (and the Curse of Dimensionality in Side-Channel Analysis)

Constructive Side-Channel Analysis and Secure Design - Lecture Notes in Computer Science ◽

10.1007/978-3-319-21476-4_2 ◽

2015 ◽

pp. 20-33 ◽

Cited By ~ 32

Author(s):

Liran Lerman ◽

Romain Poussier ◽

Gianluca Bontempi ◽

Olivier Markowitch ◽

François-Xavier Standaert

Keyword(s):

Machine Learning ◽

Curse Of Dimensionality ◽

Side Channel ◽

Side Channel Analysis ◽

Channel Analysis ◽

Template Attacks


Malicious web domain identification using online credibility and performance data by considering the class imbalance issue

Industrial Management & Data Systems ◽

10.1108/imds-02-2018-0072 ◽

2019 ◽

Vol 119 (3) ◽

pp. 676-696 ◽

Cited By ~ 5

Author(s):

Zhongyi Hu ◽

Raymond Chiong ◽

Ilung Pranata ◽

Yukun Bao ◽

Yuqing Lin

Keyword(s):

Machine Learning ◽

Class Imbalance ◽

Performance Data ◽

Machine Learning Techniques ◽

Data Sets ◽

Real World Data ◽

Content Type ◽

Domain Identification ◽

Learning Techniques ◽

And Performance

Purpose: Malicious web domain identification is of significant importance to the security protection of internet users. With online credibility and performance data, the purpose of this paper is to investigate the use of machine learning techniques for malicious web domain identification by considering the class imbalance issue (i.e. there are more benign web domains than malicious ones).

Design/methodology/approach: The authors propose an integrated resampling approach to handle class imbalance by combining the synthetic minority oversampling technique (SMOTE) and particle swarm optimisation (PSO), a population-based meta-heuristic algorithm. The authors use SMOTE for oversampling and PSO for undersampling.

Findings: By applying eight well-known machine learning classifiers, the proposed integrated resampling approach is comprehensively examined using several imbalanced web domain data sets with different imbalance ratios. Compared to five other well-known resampling approaches, experimental results confirm that the proposed approach is highly effective.

Practical implications: This study not only inspires the practical use of online credibility and performance data for identifying malicious web domains but also provides an effective resampling approach for handling the class imbalance issue in the area of malicious web domain identification.

Originality/value: Online credibility and performance data are applied to build malicious web domain identification models using machine learning techniques. An integrated resampling approach is proposed to address the class imbalance issue. The performance of the proposed approach is confirmed based on real-world data sets with different imbalance ratios.
