scholarly journals BMOP: Bidirectional Universal Adversarial Learning for Binary OpCode Features

2020 ◽  
Vol 2020 ◽  
pp. 1-11
Author(s):  
Xiang Li ◽  
Yuanping Nie ◽  
Zhi Wang ◽  
Xiaohui Kuang ◽  
Kefan Qiu ◽  
...  

For malware detection, current state-of-the-art research concentrates on machine learning techniques. Binary n -gram OpCode features are commonly used for malicious code identification and classification with high accuracy. Binary OpCode modification is much more difficult than modification of image pixels. Traditional adversarial perturbation methods could not be applied on OpCode directly. In this paper, we propose a bidirectional universal adversarial learning method for effective binary OpCode perturbation from both benign and malicious perspectives. Benign features are those OpCodes that represent benign behaviours, while malicious features are OpCodes for malicious behaviours. From a large dataset of benign and malicious binary applications, we select the most significant benign and malicious OpCode features based on the feature SHAP value in the trained machine learning model. We implement an OpCode modification method that insert benign OpCodes into executables as garbage codes without execution and modify malicious OpCodes by equivalent replacement preserving execution semantics. The experimental results show that the benign and malicious OpCode perturbation (BMOP) method could bypass malicious code detection models based on the SVM, XGBoost, and DNN algorithms.

2022 ◽  
pp. 220-249
Author(s):  
Md Ariful Haque ◽  
Sachin Shetty

Financial sectors are lucrative cyber-attack targets because of their immediate financial gain. As a result, financial institutions face challenges in developing systems that can automatically identify security breaches and separate fraudulent transactions from legitimate transactions. Today, organizations widely use machine learning techniques to identify any fraudulent behavior in customers' transactions. However, machine learning techniques are often challenging because of financial institutions' confidentiality policy, leading to not sharing the customer transaction data. This chapter discusses some crucial challenges of handling cybersecurity and fraud in the financial industry and building machine learning-based models to address those challenges. The authors utilize an open-source e-commerce transaction dataset to illustrate the forensic processes by creating a machine learning model to classify fraudulent transactions. Overall, the chapter focuses on how the machine learning models can help detect and prevent fraudulent activities in the financial sector in the age of cybersecurity.


2020 ◽  
Vol 9 (6) ◽  
pp. 379 ◽  
Author(s):  
Eleonora Grilli ◽  
Fabio Remondino

The use of machine learning techniques for point cloud classification has been investigated extensively in the last decade in the geospatial community, while in the cultural heritage field it has only recently started to be explored. The high complexity and heterogeneity of 3D heritage data, the diversity of the possible scenarios, and the different classification purposes that each case study might present, makes it difficult to realise a large training dataset for learning purposes. An important practical issue that has not been explored yet, is the application of a single machine learning model across large and different architectural datasets. This paper tackles this issue presenting a methodology able to successfully generalise to unseen scenarios a random forest model trained on a specific dataset. This is achieved looking for the best features suitable to identify the classes of interest (e.g., wall, windows, roof and columns).


Author(s):  
D Djordjevic ◽  
J Tracey ◽  
M Alqahtani ◽  
J Boyd ◽  
C Go

Background: Infantile spasms (IS) is a devastating pediatric seizure disorder for which EEG referrals are prioritized at the Hospital for Sick Children, representing a resource challenge. The goal of this study was to improve the triaging system for these referrals. Methods: Part 1: descriptive analysis was performed retrospectively on EEG referrals. Part 2: prospective questionnaires were used to determine relative risk of various predictive factors. Part 3: electronic referral form was amended to include 5 positive predictive factors. A triage point system was tested by assigning EEGs as high risk (3 days), standard risk (1 week), or low risk (2 weeks). A machine learning model was developed. Results: Most EEG referrals were from community pediatricians with a low yield of IS diagnoses. Using the 5 predictive factors, the proposed triage system accurately diagnosed all IS within 3 days. No abnormal EEGs were missed in the low-risk category. The machine learning model had over 90% predictive accuracy and will be prospectively tested. Conclusions: Improving EEG triaging for IS may be possible to prioritize higher risk patients. Machine Learning techniques can potentially be applied to help with predictions. We hope that our findings will ultimately improve resource utilization and patient care.


2021 ◽  
Vol 2021 ◽  
pp. 1-10
Author(s):  
Choudhary Sobhan Shakeel ◽  
Saad Jawaid Khan ◽  
Beenish Chaudhry ◽  
Syeda Fatima Aijaz ◽  
Umer Hassan

Alopecia areata is defined as an autoimmune disorder that results in hair loss. The latest worldwide statistics have exhibited that alopecia areata has a prevalence of 1 in 1000 and has an incidence of 2%. Machine learning techniques have demonstrated potential in different areas of dermatology and may play a significant role in classifying alopecia areata for better prediction and diagnosis. We propose a framework pertaining to the classification of healthy hairs and alopecia areata. We used 200 images of healthy hairs from the Figaro1k dataset and 68 hair images of alopecia areata from the Dermnet dataset to undergo image preprocessing including enhancement and segmentation. This was followed by feature extraction including texture, shape, and color. Two classification techniques, i.e., support vector machine (SVM) and k -nearest neighbor (KNN), are then applied to train a machine learning model with 70% of the images. The remaining image set was used for the testing phase. With a 10-fold cross-validation, the reported accuracies of SVM and KNN are 91.4% and 88.9%, respectively. Paired sample T -test showed significant differences between the two accuracies with a p < 0.001 . SVM generated higher accuracy (91.4%) as compared to KNN (88.9%). The findings of our study demonstrate potential for better prediction in the field of dermatology.


2021 ◽  
Vol 8 (1) ◽  
Author(s):  
Chalachew Muluken Liyew ◽  
Haileyesus Amsaya Melese

AbstractPredicting the amount of daily rainfall improves agricultural productivity and secures food and water supply to keep citizens healthy. To predict rainfall, several types of research have been conducted using data mining and machine learning techniques of different countries’ environmental datasets. An erratic rainfall distribution in the country affects the agriculture on which the economy of the country depends on. Wise use of rainfall water should be planned and practiced in the country to minimize the problem of the drought and flood occurred in the country. The main objective of this study is to identify the relevant atmospheric features that cause rainfall and predict the intensity of daily rainfall using machine learning techniques. The Pearson correlation technique was used to select relevant environmental variables which were used as an input for the machine learning model. The dataset was collected from the local meteorological office at Bahir Dar City, Ethiopia to measure the performance of three machine learning techniques (Multivariate Linear Regression, Random Forest, and Extreme Gradient Boost). Root mean squared error and Mean absolute Error methods were used to measure the performance of the machine learning model. The result of the study revealed that the Extreme Gradient Boosting machine learning algorithm performed better than others.


Analysis of patient’s data is always a great idea to get accurate results on using classifiers. A combination of classifiers would give an accurate result than using a single classifier because one single classifier does not give accurate results but always appropriate ones. The aim is to predict the outcome feature of the data set. The “outcome” can contain only two values that is 0 and 1. 0 means patient doesn’t have heart disease and 1 means patient have heart diseases. So, there is a need to build a classification algorithm that can predict the Outcome feature of the test dataset with good accuracy. For this understanding the data is important, and then various classification algorithm can be tested. Then the best model can be selected which gives highest accuracy among all. The built model can then be given to the software developer for building the end user application using the selected machine learning model that will be able to predict the heart disease in a patient.


Personality has been important for a number of types of cooperation; it has useful in predicting job achievement, expert and emotional relationship achievement, and even tendency towards a variety of interfaces. To accurately examine the characters of users, a personality test must be carried out. In numerous areas of online life it is usually impractical to use character research. . We used SVM classification, Random Forest algorithm, Naïve Bayes Algorithm and Logistic regression to comparatively predict the user’s personality accurately. The main goal of the paper is to evaluate the machine learning models using the four parameters- accuracy, precision, recall, f1 score and basing upon these parameters the best machine learning model will be used to classify the big five personality traits of the twitter users.


2022 ◽  
Vol 2022 ◽  
pp. 1-13
Author(s):  
Rajkumar Gangappa Nadakinamani ◽  
A. Reyana ◽  
Sandeep Kautish ◽  
A. S. Vibith ◽  
Yogita Gupta ◽  
...  

Cardiovascular disease is difficult to detect due to several risk factors, including high blood pressure, cholesterol, and an abnormal pulse rate. Accurate decision-making and optimal treatment are required to address cardiac risk. As machine learning technology advances, the healthcare industry’s clinical practice is likely to change. As a result, researchers and clinicians must recognize the importance of machine learning techniques. The main objective of this research is to recommend a machine learning-based cardiovascular disease prediction system that is highly accurate. In contrast, modern machine learning algorithms such as REP Tree, M5P Tree, Random Tree, Linear Regression, Naive Bayes, J48, and JRIP are used to classify popular cardiovascular datasets. The proposed CDPS’s performance was evaluated using a variety of metrics to identify the best suitable machine learning model. When it came to predicting cardiovascular disease patients, the Random Tree model performed admirably, with the highest accuracy of 100%, the lowest MAE of 0.0011, the lowest RMSE of 0.0231, and the fastest prediction time of 0.01 seconds.


2020 ◽  
Vol 12 (1) ◽  
pp. 12 ◽  
Author(s):  
You Guo ◽  
Hector Marco-Gisbert ◽  
Paul Keir

A webshell is a command execution environment in the form of web pages. It is often used by attackers as a backdoor tool for web server operations. Accurately detecting webshells is of great significance to web server protection. Most security products detect webshells based on feature-matching methods—matching input scripts against pre-built malicious code collections. The feature-matching method has a low detection rate for obfuscated webshells. However, with the help of machine learning algorithms, webshells can be detected more efficiently and accurately. In this paper, we propose a new PHP webshell detection model, the NB-Opcode (naïve Bayes and opcode sequence) model, which is a combination of naïve Bayes classifiers and opcode sequences. Through experiments and analysis on a large number of samples, the experimental results show that the proposed method could effectively detect a range of webshells. Compared with the traditional webshell detection methods, this method improves the efficiency and accuracy of webshell detection.


2020 ◽  
Vol 34 (20) ◽  
pp. 2050196
Author(s):  
Haozhen Situ ◽  
Zhimin He

Machine learning techniques can help to represent and solve quantum systems. Learning measurement outcome distribution of quantum ansatz is useful for characterization of near-term quantum computing devices. In this work, we use the popular unsupervised machine learning model, variational autoencoder (VAE), to reconstruct the measurement outcome distribution of quantum ansatz. The number of parameters in the VAE are compared with the number of measurement outcomes. The numerical results show that VAE can efficiently learn the measurement outcome distribution with few parameters. The influence of entanglement on the task is also revealed.


Sign in / Sign up

Export Citation Format

Share Document