An Opcode-Based Malware Detection Model Using Supervised Learning Algorithms

Om Prakash Samantray; Satya Narayan Tripathy

doi:10.4018/ijisp.2021100102

International Journal of Information Security and Privacy ◽

10.4018/ijisp.2021100102 ◽

2021 ◽

Vol 15 (4) ◽

pp. 18-30

Author(s):

Om Prakash Samantray ◽

Satya Narayan Tripathy

Keyword(s):

Supervised Learning ◽

Learning Algorithms ◽

Malware Detection ◽

Support Vector ◽

Detection Accuracy ◽

Detection Techniques ◽

Detection Model ◽

Proposed Model ◽

Tree Classifier ◽

Supervised Learning Algorithms

Several available malware detection techniques rely on a signature-based approach. This approach detects known malware very effectively but may fail to detect unknown or zero-day attacks. In this article, the authors propose a malware detection model that uses the operation codes (opcodes) of malicious and benign executables as features. The proposed model uses the opcode extract and count (OPEC) algorithm to prepare the opcode feature vector for the experiment. The most relevant features are selected using the extra tree classifier feature selection technique and then passed to several supervised learning algorithms (support vector machine, naive Bayes, decision tree, random forest, logistic regression, and k-nearest neighbour) to build classification models for malware detection. The proposed model achieved a detection accuracy of 98.7%, which makes it better than many of the similar works discussed in the literature.
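The abstract does not spell out the OPEC algorithm, so the following is only a minimal sketch of the general idea of turning a disassembly listing into an opcode count vector. The function names `extract_opcodes` and `count_vector`, the sample listing, and the five-opcode vocabulary are all hypothetical and are not taken from the paper.

```python
from collections import Counter


def extract_opcodes(disassembly):
    """Return the mnemonic (first token) of each instruction line.

    Assumes one instruction per line, ';' for comments, and labels
    ending in ':' (an illustrative format, not the paper's input format).
    """
    opcodes = []
    for line in disassembly.splitlines():
        line = line.split(";", 1)[0].strip()  # drop trailing comments
        if not line or line.endswith(":"):    # skip blank lines and labels
            continue
        opcodes.append(line.split()[0].lower())
    return opcodes


def count_vector(opcodes, vocabulary):
    """Count how often each vocabulary opcode occurs, in vocabulary order."""
    counts = Counter(opcodes)
    return [counts[op] for op in vocabulary]


# Hypothetical listing and vocabulary, for illustration only.
sample = """
start:
    push ebp        ; prologue
    mov ebp, esp
    mov eax, 1
    pop ebp
    ret
"""
vocabulary = ["mov", "push", "pop", "ret", "call"]
vector = count_vector(extract_opcodes(sample), vocabulary)
print(vector)  # [2, 1, 1, 1, 0]
```

One such vector per executable, stacked into a matrix with a benign/malicious label per row, would be the kind of input that a feature selector and the classifiers named in the abstract consume.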

An Enhanced Hidden Semi-Markov model for Outlier Detection in Multivariate Datasets

10.22541/au.162685414.44316319/v1 ◽

2021 ◽

Author(s):

G Manoharan ◽

K Sivakumar

Keyword(s):

Markov Model ◽

Hidden Markov Model ◽

Outlier Detection ◽

Hidden Markov ◽

Research Work ◽

Detection Accuracy ◽

Detection Techniques ◽

Detection Model ◽

Huge Data ◽

Proposed Model

Outlier detection is an important area of data mining in which detection models are developed to discover objects that do not conform to the expected behavior. The huge volume of data generated by real-time applications makes outlier detection both more crucial and more challenging. Traditional detection techniques based on mean and covariance are not suited to large amounts of data, and their results are themselves affected by outliers. It is therefore essential to develop an efficient model for detecting outliers in large datasets. The objective of this research work is to develop an efficient outlier detection model for multivariate data employing the enhanced Hidden Semi-Markov Model (HSMM). It is an extension of the conventional Hidden Markov Model (HMM) in which the proposed model allows an arbitrary time distribution in its states to detect outliers. Experimental results demonstrate the better performance of the proposed model in terms of detection accuracy and detection rate. The proposed model obtains a detection accuracy of 98.62%, which is significantly better than conventional Hidden Markov Model based outlier detection for large multivariate datasets.

A Hybrid Model for Android Malware Detection

International Journal of Innovative Technology and Exploring Engineering - Special Issue ◽

10.35940/ijitee.k2250.1081219 ◽

2019 ◽

Vol 8 (12) ◽

pp. 2656-2662

Keyword(s):

Malware Detection ◽

Machine Learning Algorithms ◽

Detection Accuracy ◽

Dynamic Parameters ◽

Android Malware ◽

Detection Techniques ◽

Advantages And Disadvantages ◽

Android Malware Detection ◽

Tree Classifier ◽

Hybrid Detection

Android malware has risen exponentially over the past few years, posing several serious threats such as system damage, financial loss, and mobile botnets. Various detection techniques have been proposed in the literature for Android malware detection. Some of the techniques analyze static parameters such as permissions or intents, whereas others focus on dynamic parameters such as network traffic or system calls. Static techniques are relatively easy to implement; however, recent stealthy malware evades static detection by means of update attacks. Dynamic detection can detect such stealthy malware, but it increases the computation overhead. Hence, both kinds of techniques have their own advantages and disadvantages. In this paper, we propose an innovative hybrid detection model that uses both static and dynamic features for malware analysis and detection. We first rank the static and dynamic parameters according to their information gain and then apply machine learning algorithms in the testing phase. The results indicate that the hybrid approach is better than both the static and the dynamic approaches, and the proposed model achieves 98.9% detection accuracy with the Decision Tree classifier.
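The ranking step mentioned in this abstract can be illustrated with the standard entropy-based definition of information gain. This is a minimal sketch on invented toy data: the feature names `sends_sms_permission` and `uses_network`, the labels, and the helper functions are hypothetical and do not come from the paper.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total)
                for n in Counter(labels).values())


def information_gain(feature_values, labels):
    """Entropy of the labels minus the weighted entropy after splitting
    the samples by the value of one discrete feature."""
    total = len(labels)
    remainder = 0.0
    for value in set(feature_values):
        subset = [y for x, y in zip(feature_values, labels) if x == value]
        remainder += len(subset) / total * entropy(subset)
    return entropy(labels) - remainder


# Invented toy data: 1 = malware, 0 = benign.
labels = [1, 1, 0, 0]
features = {
    "uses_network": [1, 0, 1, 0],          # tells us nothing about the label
    "sends_sms_permission": [1, 1, 0, 0],  # separates the classes perfectly
}

# Rank features from most to least informative.
ranking = sorted(features,
                 key=lambda name: information_gain(features[name], labels),
                 reverse=True)
print(ranking)  # ['sends_sms_permission', 'uses_network']
```

A pipeline of this kind would typically keep only the top-ranked static and dynamic features before training a classifier such as a decision tree.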

Effect of normalization methods on the performance of supervised learning algorithms applied to HTSeq-FPKM-UQ data sets: 7SK RNA expression as a predictor of survival in patients with colon adenocarcinoma

Briefings in Bioinformatics ◽

10.1093/bib/bbx153 ◽

2017 ◽

Vol 20 (3) ◽

pp. 985-994 ◽

Cited By ~ 15

Author(s):

Leili Shahriyari

Keyword(s):

Supervised Learning ◽

Unit Length ◽

Learning Algorithms ◽

Colon Adenocarcinoma ◽

Support Vector ◽

Data Sets ◽

Data Set ◽

Maximum Accuracy ◽

Normalization Methods ◽

Supervised Learning Algorithms

Motivation: One of the main challenges in machine learning (ML) is choosing an appropriate normalization method. Here, we examine the effect of various normalization methods on analyzing FPKM upper quartile (FPKM-UQ) RNA sequencing data sets. We collect the HTSeq-FPKM-UQ files of patients with colon adenocarcinoma from the TCGA-COAD project. We compare the three most common normalization methods (scaling, standardizing using z-score, and vector normalization) by visualizing the normalized data set and evaluating the performance of 12 supervised learning algorithms on the normalized data set. Additionally, for each of these normalization methods, we use two different normalization strategies: normalizing samples (files) or normalizing features (genes). Results: Regardless of the normalization method, a support vector machine (SVM) model with the radial basis function kernel had the maximum accuracy (78%) in predicting the vital status of the patients. However, the fitting time of the SVM depended on the normalization method, and it reached its minimum when files were normalized to unit length. Furthermore, among all 12 learning algorithms and 6 different normalization techniques, the Bernoulli naive Bayes model after standardizing files had the best performance in terms of maximizing the accuracy as well as minimizing the fitting time. We also investigated the effect of dimensionality reduction methods on the performance of the supervised ML algorithms. Reducing the dimension of the data set did not increase the maximum accuracy of 78%. However, it led to the discovery of 7SK RNA gene expression as a predictor of survival in patients with colon adenocarcinoma, with an accuracy of 78%.
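The three normalization methods and two strategies named in this abstract combine into six variants. The sketch below shows one plausible reading of them using textbook definitions (min-max scaling, population z-score, Euclidean unit length); the function names, the `axis` argument, and the toy matrix are assumptions made for illustration and are not the authors' code.

```python
import math
import statistics


def min_max_scale(values):
    """Scale values linearly into [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def z_score(values):
    """Standardize to zero mean and unit (population) standard deviation."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)
    if std == 0:
        return [0.0 for _ in values]
    return [(v - mean) / std for v in values]


def unit_length(values):
    """Divide by the Euclidean norm so the vector has length 1."""
    norm = math.sqrt(sum(v * v for v in values))
    if norm == 0:
        return [0.0 for _ in values]
    return [v / norm for v in values]


def normalize(matrix, method, axis="samples"):
    """Apply `method` to each row (samples) or each column (features).

    `matrix` is a list of rows; each row is one sample (file) and each
    column is one feature (gene).
    """
    if axis == "samples":
        return [method(row) for row in matrix]
    if axis == "features":
        columns = [method(list(col)) for col in zip(*matrix)]
        return [list(row) for row in zip(*columns)]
    raise ValueError("axis must be 'samples' or 'features'")


# Toy expression matrix: 2 samples x 2 genes (invented numbers).
matrix = [[3.0, 4.0], [0.0, 10.0]]
by_sample = normalize(matrix, unit_length, axis="samples")
by_feature = normalize(matrix, min_max_scale, axis="features")
print(by_feature)  # [[1.0, 0.0], [0.0, 1.0]]
```

Normalizing by sample changes each file independently of the others, whereas normalizing by feature makes each gene comparable across files, which is why the two strategies can lead to different classifier behaviour.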

Identification Of Hepatocellular Carcinoma Using Supervised Learning Algorithms

INTERNATIONAL JOURNAL OF COMPUTERS & TECHNOLOGY ◽

10.24297/ijct.v14i3.1992 ◽

2014 ◽

Vol 14 (3) ◽

pp. 5535-5542

Author(s):

Sagri Sharma ◽

Sanjay Kadam ◽

Hemant Darbari

Keyword(s):

Gene Expression ◽

Hepatocellular Carcinoma ◽

Supervised Learning ◽

Dna Analysis ◽

Learning Algorithm ◽

Expression Profiles ◽

Learning Algorithms ◽

Support Vector ◽

Unseen Data ◽

Supervised Learning Algorithms

Analyzing diseases that involve multiple interacting factors increases the complexity of the problem, and the development of frameworks for such analysis is therefore a topic of intense research. Because the various parameters are inter-dependent, traditional methodologies have not been very effective, and newer methodologies are being sought to deal with the problem. Supervised learning algorithms are commonly used for prediction on previously unseen data. They are applied in fields ranging from image analysis to protein structure and function prediction, and they are trained on a known dataset to produce a predictor model that generates reasonable predictions for new data. Gene expression profiles generated by DNA analysis experiments can be quite complex, since these experiments can involve hypotheses involving entire genomes. A well-known machine learning algorithm, the Support Vector Machine, is therefore applied to analyze the expression levels of thousands of genes simultaneously in a timely, automated, and cost-effective way. The objectives of the presented work are to develop a methodology that identifies genes relevant to Hepatocellular Carcinoma (HCC) from a gene expression dataset using supervised learning algorithms and statistical evaluations, and to develop a predictive framework that can perform classification tasks on new, unseen data.

Supervised Learning Algorithms of Machine Learning: Prediction of Brand Loyalty

International Journal of Innovative Technology and Exploring Engineering - Special Issue ◽

10.35940/ijitee.j9498.0981119 ◽

2019 ◽

Vol 8 (11) ◽

pp. 3886-3889

Keyword(s):

Logistic Regression ◽

Supervised Learning ◽

Supervised Classification ◽

Learning Algorithms ◽

Support Vector ◽

K Nearest Neighbors ◽

Data Set ◽

Sample Data ◽

Supervised Learning Algorithms ◽

Bayes Algorithm

The present research explores the problem of predicting brand loyalty using supervised classification algorithms: logistic regression, decision tree, support vector machine, Bayes algorithm, and K-nearest neighbors (KNN). Sample data on the FMCG loyalty of 265 customers were collected; the variables of the data set include loyalty status, gender, family size, age, frequency of purchase, and FMCG purchase. The data were analyzed with Python packages such as Pandas (data analysis), NumPy (numerical calculation), Matplotlib (visualization), and Sklearn (modeling). Among the supervised classification algorithms, logistic regression outperformed the other techniques.

Heart ID: Human Identification Based on Radar Micro-Doppler Signatures of the Heart Using Deep Learning

Remote Sensing ◽

10.3390/rs11101220 ◽

2019 ◽

Vol 11 (10) ◽

pp. 1220 ◽

Cited By ~ 1

Author(s):

Peibei Cao ◽

Weijie Xia ◽

Yi Li

Keyword(s):

Supervised Learning ◽

Doppler Radar ◽

User Authentication ◽

Learning Algorithms ◽

Human Identification ◽

Optical Systems ◽

Support Vector ◽

Time Frequency ◽

Average Accuracy ◽

Supervised Learning Algorithms

Human identification based on radar signatures of individual heartbeats is crucial in various applications, including user authentication in mobile devices, identification of escaped criminals, etc. Optical systems employed to recognize humans are usually sensitive to ambient light, while radar does not have this drawback, since it has high penetration and all-weather capability. Moreover, since the micro-Doppler characteristics of the heart differ between people and are not easy to fake, they can be used for identification. In this paper, we employed a deep convolutional neural network (DCNN) and conventional supervised learning methods to realize heartbeat-based identification. First, the heartbeat signals were acquired by a Doppler radar and processed by short-time Fourier transform. Then, predefined features were extracted for the conventional supervised learning algorithms, while time–frequency graphs were fed directly to the DCNN, since the network has its own feature extraction part. It is shown that the DCNN could achieve an average accuracy of 98.5% for identifying four people, and higher than 80% when the number of people was less than ten. For conventional supervised learning algorithms when identifying four people, the accuracy of the support vector machine (SVM) was 88.75%, and the accuracy of SVM–Bayes was 91.25%, while naive Bayes had the lowest accuracy of 80.75%.

Analysis of Tree Based Supervised Learning Algorithms on Medical Data

International Journal of Scientific and Research Publications (IJSRP) ◽

10.29322/ijsrp.9.04.2019.p8817 ◽

2019 ◽

Vol 9 (4) ◽

pp. p8817

Author(s):

Thin Thin Swe

Keyword(s):

Supervised Learning ◽

Learning Algorithms ◽

Medical Data ◽

Supervised Learning Algorithms

Using UAV-Based Hyperspectral Imagery to Detect Winter Wheat Fusarium Head Blight

Remote Sensing ◽

10.3390/rs13153024 ◽

2021 ◽

Vol 13 (15) ◽

pp. 3024

Author(s):

Huiqin Ma ◽

Wenjiang Huang ◽

Yingying Dong ◽

Linyi Liu ◽

Anting Guo

Keyword(s):

Winter Wheat ◽

Fusarium Head Blight ◽

Hyperspectral Imagery ◽

Spectral Feature ◽

Support Vector ◽

Detection Accuracy ◽

Feature Combination ◽

Field Scale ◽

Head Blight ◽

Detection Model

Fusarium head blight (FHB) is a major winter wheat disease in China. The accurate and timely detection of wheat FHB is vital to scientific field management. In this study, we explore the potential of using hyperspectral imagery obtained from an unmanned aerial vehicle (UAV) to detect wheat FHB by combining three types of spectral features: spectral bands (SBs), vegetation indices (VIs), and wavelet features (WFs). First, during the wheat filling period, two UAV-based hyperspectral images were acquired. SBs, VIs, and WFs that were sensitive to wheat FHB were extracted and optimized from the two images. Subsequently, a field-scale wheat FHB detection model was formulated, based on the optimal spectral feature combination of SBs, VIs, and WFs (SBs + VIs + WFs), using a support vector machine. Two commonly used data normalization algorithms were applied before the construction of the model. The WFs alone, and the spectral feature combination of optimal SBs and VIs (SBs + VIs), were each used to formulate models for comparison and testing. The results showed that the detection model based on the normalized SBs + VIs + WFs, using the min–max normalization algorithm, achieved the highest R² of 0.88 and the lowest RMSE of 2.68% among the three models. Our results suggest that UAV-based hyperspectral imaging technology is promising for the field-scale detection of wheat FHB. Combining traditional SBs and VIs with WFs can effectively improve the detection accuracy of wheat FHB.
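The two evaluation metrics quoted in this abstract can be computed as follows. This is a minimal sketch that assumes R² means the coefficient of determination, 1 - SS_res / SS_tot; the paper may instead use the squared correlation coefficient. The observed and predicted values are invented and are not the study's data.

```python
import math


def rmse(observed, predicted):
    """Root mean squared error between observed and predicted values."""
    squared = [(o - p) ** 2 for o, p in zip(observed, predicted)]
    return math.sqrt(sum(squared) / len(squared))


def r_squared(observed, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


# Invented disease-severity values in percent, for illustration only.
observed = [10.0, 20.0, 30.0, 40.0]
predicted = [12.0, 18.0, 33.0, 39.0]

print(round(rmse(observed, predicted), 4))       # 2.1213
print(round(r_squared(observed, predicted), 4))  # 0.964
```

Because RMSE is expressed in the same unit as the target, a value such as 2.68% is read directly as a typical prediction error in percentage points, while R² summarizes the share of variance the model explains.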

Investigating the performance of the supervised learning algorithms for estimating NPPs parameters in combination with the different feature selection techniques

Annals of Nuclear Energy ◽

10.1016/j.anucene.2021.108299 ◽

2021 ◽

Vol 158 ◽

pp. 108299

Author(s):

Khalil Moshkbar-Bakhshayesh

Keyword(s):

Feature Selection ◽

Supervised Learning ◽

Learning Algorithms ◽

Supervised Learning Algorithms ◽

Feature Selection Techniques

Supervised learning algorithms in the classification of plant populations with different degrees of kinship

Revista Brasileira de Botânica ◽

10.1007/s40415-021-00703-1 ◽

2021 ◽

Author(s):

Leandro Skowronski ◽

Paula Martin de Moraes ◽

Mario Luiz Teixeira de Moraes ◽

Wesley Nunes Gonçalves ◽

Michel Constantino ◽

...

Keyword(s):

Supervised Learning ◽

Learning Algorithms ◽

Plant Populations ◽

Supervised Learning Algorithms
