Ensemble Malware Classification System Using Deep Neural Networks

Electronics ◽  
2020 ◽  
Vol 9 (5) ◽  
pp. 721 ◽  
Author(s):  
Barath Narayanan Narayanan ◽  
Venkata Salini Priyamvada Davuluru

With the advancement of technology, there is a growing need to classify malware programs that could harm computer systems and smaller devices. In this research, an ensemble classification system comprising convolutional and recurrent neural networks is proposed to distinguish malware programs. Microsoft’s Malware Classification Challenge (BIG 2015) dataset with nine distinct classes is utilized for this study. This dataset contains an assembly file and a compiled file for each malware program. Compiled files are visualized as images and are classified using Convolutional Neural Networks (CNNs). Assembly files consist of machine language opcodes, which are converted into sequences and then distinguished among classes using Long Short-Term Memory (LSTM) networks. In addition, features are extracted from these architectures (CNNs and LSTM) and are classified using a support vector machine or logistic regression. An accuracy of 97.2% is achieved using the LSTM network for distinguishing assembly files, 99.4% using the CNN architecture for classifying compiled files, and an overall accuracy of 99.8% using the proposed ensemble approach, thereby setting a new benchmark. An independent and automated classification system for assembly and/or compiled files lets anti-malware industry experts choose the type of system depending on their available computational resources.
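
The abstract says compiled files are visualized as images but does not describe the procedure. A minimal sketch of one common approach follows, in which each byte becomes one grayscale pixel (0 to 255) and the bytes are laid out in rows of a fixed width. The function name `bytes_to_image`, the width of 16, and the zero padding are illustrative assumptions, not the authors' method.

```python
def bytes_to_image(data: bytes, width: int = 16) -> list[list[int]]:
    """Reshape raw bytes into rows of `width` pixel intensities (0-255).

    Assumption: one byte maps to one grayscale pixel; the width is arbitrary.
    """
    if width <= 0:
        raise ValueError("width must be positive")
    # Pad the tail with zeros so the last row is complete.
    padding = (-len(data)) % width
    padded = data + bytes(padding)
    return [list(padded[i:i + width]) for i in range(0, len(padded), width)]


sample = bytes(range(40))              # stand-in for a compiled file's contents
image = bytes_to_image(sample, width=16)
print(len(image), len(image[0]))       # number of rows, number of columns
```

The resulting two-dimensional array could then be resized and passed to an image classifier such as a CNN.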

2020 ◽  
Vol 12 (2) ◽  
pp. 326 ◽  
Author(s):  
Heriberto A. Garcia ◽  
Trenton Couture ◽  
Amit Galor ◽  
Jessica M. Topple ◽  
Wei Huang ◽  
...  

A large variety of sound sources in the ocean, including biological, geophysical, and man-made, can be simultaneously monitored over instantaneous continental-shelf scale regions via the passive ocean acoustic waveguide remote sensing (POAWRS) technique by employing a large-aperture densely-populated coherent hydrophone array system. Millions of acoustic signals received on the POAWRS system per day can make it challenging to identify individual sound sources. An automated classification system is necessary to enable sound sources to be recognized. Here, the objectives are to (i) gather a large training and test data set of fin whale vocalization and other acoustic signal detections; (ii) build multiple fin whale vocalization classifiers, including a logistic regression, support vector machine (SVM), decision tree, convolutional neural network (CNN), and long short-term memory (LSTM) network; (iii) evaluate and compare performance of these classifiers using multiple metrics including accuracy, precision, recall and F1-score; and (iv) integrate one of the classifiers into the existing POAWRS array and signal processing software. The findings presented here will (1) provide an automatic classifier for near real-time fin whale vocalization detection and recognition, useful in marine mammal monitoring applications; and (2) lay the foundation for building an automatic classifier applied for near real-time detection and recognition of a wide variety of biological, geophysical, and man-made sound sources typically detected by the POAWRS system in the ocean.
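
The four metrics named in objective (iii) can be computed from the counts of a binary confusion matrix. The sketch below uses the standard definitions; the counts are invented for illustration and are not results from the study.

```python
def binary_metrics(tp: int, fp: int, fn: int, tn: int) -> dict[str, float]:
    """Accuracy, precision, recall and F1-score from binary confusion counts."""
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # F1 is the harmonic mean of precision and recall.
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Hypothetical counts: 80 detected calls, 10 false alarms, 20 misses,
# 90 correctly rejected signals.
metrics = binary_metrics(tp=80, fp=10, fn=20, tn=90)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```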


2020 ◽  
Vol 13 (1) ◽  
pp. 65
Author(s):  
Jingtao Li ◽  
Yonglin Shen ◽  
Chao Yang

Due to the increasing demand for the monitoring of crop conditions and food production, it is a challenging and meaningful task to identify crops from remote sensing images. The state-of-the-art crop classification models are mostly built on supervised classification models such as support vector machines (SVM), convolutional neural networks (CNN), and long short-term memory neural networks (LSTM). Meanwhile, as an unsupervised generative model, the generative adversarial network (GAN) is rarely used to complete classification tasks for agricultural applications. In this work, we propose a new method that combines GAN, CNN, and LSTM models to classify crops of corn and soybeans from remote sensing time-series images, in which the GAN’s discriminator is used as the final classifier. The method remains feasible when only a small number of training samples is available, and it takes full advantage of the spectral, spatial, and phenology features of crops from satellite data. The classification experiments were conducted on crops of corn, soybeans, and others. To verify the effectiveness of the proposed method, comparisons with models of SVM, SegNet, CNN, LSTM, and different combinations were also conducted. The results show that our method achieved the best classification results, with a Kappa coefficient of 0.7933 and an overall accuracy of 0.86. Experiments in other study areas also demonstrate the extensibility of the proposed method.
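
The Kappa coefficient and overall accuracy reported above are both derived from a confusion matrix. The sketch below uses the standard definition of Cohen's kappa, (p_o - p_e) / (1 - p_e), where p_o is the observed agreement and p_e the agreement expected by chance. The matrix values are invented for illustration and do not reproduce the paper's results.

```python
def overall_accuracy(matrix: list[list[int]]) -> float:
    """Fraction of samples on the diagonal of a square confusion matrix."""
    total = sum(sum(row) for row in matrix)
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    return correct / total


def cohen_kappa(matrix: list[list[int]]) -> float:
    """Cohen's kappa: agreement corrected for chance agreement."""
    n = sum(sum(row) for row in matrix)
    p_o = overall_accuracy(matrix)
    row_totals = [sum(row) for row in matrix]
    col_totals = [sum(col) for col in zip(*matrix)]
    # Chance agreement from the products of the marginal totals.
    p_e = sum(r * c for r, c in zip(row_totals, col_totals)) / (n * n)
    if p_e == 1.0:
        raise ValueError("kappa is undefined when chance agreement is 1")
    return (p_o - p_e) / (1.0 - p_e)


# Hypothetical 3-class matrix (rows: reference, columns: prediction)
# for corn, soybeans, and others.
confusion = [[50, 5, 5],
             [4, 40, 6],
             [6, 4, 30]]
print(f"overall accuracy: {overall_accuracy(confusion):.2f}")
print(f"kappa: {cohen_kappa(confusion):.4f}")
```

Kappa is lower than overall accuracy here because part of the observed agreement would be expected by chance alone.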


Author(s):  
Hongguang Pan ◽  
Tao Su ◽  
Xiangdong Huang ◽  
Zheng Wang

To address the high cost, complicated process, and low accuracy of oxygen content measurement in the flue gas of coal-fired power plants, this paper proposes a method based on a long short-term memory (LSTM) network that replaces the oxygen sensor and estimates the oxygen content in boiler flue gas. Specifically, first, the LSTM model was built with the Keras deep learning framework, and the accuracy of the model was further improved by selecting appropriate hyperparameters through experiments. Secondly, the flue gas oxygen content was taken as the leading variable, and primary auxiliary variables were identified from the mechanism and the boiler process. Based on the actual production data collected from a coal-fired power plant in Yulin, China, the data sets were preprocessed. Moreover, a selection model of auxiliary variables based on grey relational analysis is proposed to construct a new data set, which is divided into a training set and a testing set. Finally, this model is compared with traditional soft-sensing modelling methods (i.e., methods based on support vector machine and BP neural network). The RMSE of the LSTM model is 4.51% lower than that of the GA-SVM model and 3.55% lower than that of the PSO-BP model. The results show that the oxygen content model based on LSTM generalizes better and has industrial value.
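
The abstract does not detail its grey relational analysis. The sketch below follows the widely used Deng formulation: each series is min-max normalized, the absolute differences from the reference series are converted into relational coefficients, and the coefficients are averaged into one grade per candidate. The distinguishing coefficient `rho = 0.5` is the customary default. The variable names (`air_flow`, `coal_feed`) and all numbers are hypothetical.

```python
def min_max(series: list[float]) -> list[float]:
    """Scale a series to [0, 1]; a constant series maps to all zeros."""
    lo, hi = min(series), max(series)
    if hi == lo:
        return [0.0] * len(series)
    return [(v - lo) / (hi - lo) for v in series]


def grey_relational_grades(reference: list[float],
                           candidates: dict[str, list[float]],
                           rho: float = 0.5) -> dict[str, float]:
    """Grey relational grade of each candidate against the reference."""
    ref = min_max(reference)
    deltas = {
        name: [abs(r - v) for r, v in zip(ref, min_max(values))]
        for name, values in candidates.items()
    }
    # Global minimum and maximum difference over all candidates and samples.
    flat = [d for ds in deltas.values() for d in ds]
    d_min, d_max = min(flat), max(flat)
    if d_max == 0.0:
        return {name: 1.0 for name in deltas}
    grades = {}
    for name, ds in deltas.items():
        coeffs = [(d_min + rho * d_max) / (d + rho * d_max) for d in ds]
        grades[name] = sum(coeffs) / len(coeffs)
    return grades


# Hypothetical readings: oxygen content (%) and two candidate variables.
oxygen = [3.0, 3.5, 4.0, 4.5, 5.0]
candidates = {
    "air_flow": [10.0, 12.0, 14.0, 16.0, 18.0],   # tracks oxygen closely
    "coal_feed": [5.0, 1.0, 4.0, 2.0, 3.0],       # weakly related
}
grades = grey_relational_grades(oxygen, candidates)
ranked = sorted(grades, key=grades.get, reverse=True)
print(ranked, grades)
```

Candidates with the highest grades would be kept as auxiliary variables. Any cutoff threshold is a modelling choice that the abstract does not state.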


Author(s):  
Chen Li ◽  
Junjun Zheng

Malicious software, called malware, can perform harmful actions on computer systems, which may cause economic damage and information leakage. Therefore, malware classification is meaningful and required to prevent malware attacks. Application programming interface (API) call sequences are easily observed and are good choices as features for malware classification. However, one of the main issues is how to generate a suitable feature for the classification algorithms to achieve high classification accuracy. Different malware samples produce API call sequences of different lengths, and these lengths may reach millions, which increases computation cost and time complexity. Recurrent neural networks (RNNs) are among the most versatile approaches to processing time series data and can be applied to API call-based malware classification. In this paper, we propose a malware classification model with RNNs, specifically the long short-term memory (LSTM) and the gated recurrent unit (GRU), to classify variants of malware by using long sequences of API calls. In numerical experiments, a benchmark dataset is used to illustrate the proposed approach and validate its accuracy. The numerical results show that the proposed RNN model works well for malware classification.
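
Because API call sequences vary in length, a recurrent model usually needs them mapped to integer IDs and brought to a common length first. The sketch below shows one simple way to do this, by truncating long sequences and padding short ones. The abstract does not specify its preprocessing, so `build_vocab`, `encode`, the reserved IDs, and `max_len` are illustrative assumptions, and the API names are used only as sample tokens.

```python
PAD, UNK = 0, 1   # assumed reserved IDs: padding and unseen API calls


def build_vocab(sequences: list[list[str]]) -> dict[str, int]:
    """Assign an integer ID to each distinct API call, in first-seen order."""
    vocab: dict[str, int] = {}
    for seq in sequences:
        for call in seq:
            if call not in vocab:
                vocab[call] = len(vocab) + 2   # IDs 0 and 1 are reserved
    return vocab


def encode(seq: list[str], vocab: dict[str, int], max_len: int) -> list[int]:
    """Map calls to IDs, truncate to max_len, then right-pad with PAD."""
    ids = [vocab.get(call, UNK) for call in seq[:max_len]]
    return ids + [PAD] * (max_len - len(ids))


train = [
    ["CreateFileW", "WriteFile", "CloseHandle"],
    ["RegOpenKeyExW", "CreateFileW"],
]
vocab = build_vocab(train)
short = encode(train[1], vocab, max_len=4)
unseen = encode(["VirtualAlloc", "WriteFile"], vocab, max_len=4)
long_seq = encode(train[0] * 3, vocab, max_len=4)
print(short, unseen, long_seq)
```

Truncation discards everything after `max_len` calls. For sequences that run to millions of calls, other reductions such as windowing or subsampling may be preferable.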


2021 ◽  
Vol 27 (4) ◽  
pp. 230-245
Author(s):  
Chih-Chiang Wei

Strong wind during extreme weather conditions (e.g., typhoons) is one of the natural factors that cause the collapse of frame-type scaffolds used in façade work. This study developed an alert system for determining whether the scaffold structure can withstand the stress of the wind force. Conceptually, the scaffold collapse warning system developed in this study contains three modules. The first module involves the establishment of wind velocity prediction models. This study employed various deep learning and machine learning techniques, namely deep neural networks, long short-term memory neural networks, support vector regressions, random forest, and k-nearest neighbors. The second module contains the analysis of wind force on the scaffolds. The third module involves the development of the scaffold collapse evaluation approach. The study area was Taichung City, Taiwan. This study collected meteorological data from the ground stations from 2012 to 2019. Results revealed that the system successfully predicted the possible collapse time for scaffolds 1 to 6 h ahead and issued timely warnings. Overall, the warning system can provide construction teams with practical warning information about scaffold destruction and thereby reduce the risk of damage.


Author(s):  
M. Rußwurm ◽  
M. Körner

Land cover classification (LCC) is a central and wide field of research in earth observation and has already put forth a variety of classification techniques. Many approaches are based on classification techniques considering observation at certain points in time. However, some land cover classes, such as crops, change their spectral characteristics due to environmental influences and can thus not be monitored effectively with classical mono-temporal approaches. Nevertheless, these temporal observations should be utilized to benefit the classification process. After extensive research has been conducted on modeling temporal dynamics by spectro-temporal profiles using vegetation indices, we propose a deep learning approach to utilize these temporal characteristics for classification tasks. In this work, we show how long short-term memory (LSTM) neural networks can be employed for crop identification purposes with SENTINEL 2A observations from large study areas and label information provided by local authorities. We compare these temporal neural network models, i.e., LSTM and recurrent neural network (RNN), with a classical non-temporal convolutional neural network (CNN) model and an additional support vector machine (SVM) baseline. With our rather straightforward LSTM variant, we exceeded state-of-the-art classification performance, thus opening promising potential for further research.


2019 ◽  
Vol 9 (8) ◽  
pp. 1687 ◽  
Author(s):  
Huafeng Qin ◽  
Peng Wang

Finger-vein biometrics has been extensively investigated for personal verification. A challenge is that the finger-vein acquisition is affected by many factors, which results in many ambiguous regions in the finger-vein image. Generally, the separability between vein and background is poor in such regions. Despite recent advances in finger-vein pattern segmentation, current solutions still lack the robustness to extract finger-vein features from raw images because they do not take into account the complex spatial dependencies of vein pattern. This paper proposes a deep learning model to extract vein features by combining the Convolutional Neural Networks (CNN) model and Long Short-Term Memory (LSTM) model. Firstly, we automatically assign the label based on a combination of known state-of-the-art handcrafted finger-vein image segmentation techniques, and generate various sequences for each labeled pixel along different directions. Secondly, several Stacked Convolutional Neural Networks and Long Short-Term Memory (SCNN-LSTM) models are independently trained on the resulting sequences. The outputs of various SCNN-LSTMs form a complementary and over-complete representation and are conjointly put into Probabilistic Support Vector Machine (P-SVM) to predict the probability of each pixel of being foreground (i.e., vein pixel) given several sequences centered on it. Thirdly, we propose a supervised encoding scheme to extract the binary vein texture. A threshold is automatically computed by taking into account the maximal separation between the inter-class distance and the intra-class distance. In our approach, the CNN learns robust features for vein texture pattern representation and LSTM stores the complex spatial dependencies of vein patterns. So, the pixels in any region of a test image can then be classified effectively. In addition, the supervised information is employed to encode the vein patterns, so the resulting encoding images contain more discriminating features. The experimental results on one public finger-vein database show that the proposed approach significantly improves the finger-vein verification accuracy.


2013 ◽  
Vol 4 (4) ◽  
pp. 72-84 ◽  
Author(s):  
Salim Lahmiri ◽  
Mounir Boukadoum ◽  
Sylvain Chartier

The purpose of this paper is to present an automated system to classify financial data patterns as indicators of future upward or downward moves of the stock market. The classification system uses the wavelet packet transform (WPT) for data decomposition and backpropagation neural networks (BPNN) for the classification task. Its results are compared to those of a common classification system found in the literature, which is based on the ordinary wavelet transform (WT) and BPNN. In particular, the WPT is applied to the stock market data to obtain two categories of patterns: (i) approximation coefficients that represent the major trend of the original data, and (ii) the residuals of the original data that capture its short-time variations. These patterns carry complementary information and are both used as inputs to classify future stock market shifts. For comparison purposes, BPNN and support vector machine (SVM) are separately used to classify patterns. Using S&P500 price index data, simulation results showed that both BPNN and SVM perform better with WPT-extracted patterns (residuals and approximation coefficients) than with the standard approach based on WT approximation coefficients. In addition, BPNN outperforms SVM. The WPT-NN based approach for financial data classification is more effective and promising than the standard approach adopted in the literature. The finding supports the adoption of the proposed classification system as an appropriate decision-making system in the financial industry to classify financial data for forecasting purposes.
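
To illustrate the split into a trend part and a short-time variation part, the sketch below applies one level of the Haar wavelet transform, the simplest filter pair. This is a stand-in rather than the paper's method: a full wavelet packet transform would recursively decompose both output branches, and the abstract does not name the wavelet used. The price values are invented.

```python
import math


def haar_step(signal: list[float]) -> tuple[list[float], list[float]]:
    """One Haar decomposition level: (approximation, detail) coefficients."""
    if len(signal) % 2:
        raise ValueError("signal length must be even")
    s = math.sqrt(2.0)
    pairs = list(zip(signal[0::2], signal[1::2]))
    approx = [(a + b) / s for a, b in pairs]   # smoothed trend
    detail = [(a - b) / s for a, b in pairs]   # short-time variation
    return approx, detail


def haar_inverse(approx: list[float], detail: list[float]) -> list[float]:
    """Rebuild the signal from one level of Haar coefficients."""
    s = math.sqrt(2.0)
    out: list[float] = []
    for a, d in zip(approx, detail):
        out.extend([(a + d) / s, (a - d) / s])
    return out


prices = [100.0, 102.0, 101.0, 105.0, 107.0, 106.0, 110.0, 108.0]  # made up
approx, detail = haar_step(prices)
restored = haar_inverse(approx, detail)
print(approx)
print(detail)
```

Both coefficient lists together retain all the information in the input, which is why the reconstruction recovers the original series up to rounding.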


Sensors ◽  
2018 ◽  
Vol 18 (10) ◽  
pp. 3226 ◽  
Author(s):  
Lingfeng Xu ◽  
Xiang Chen ◽  
Shuai Cao ◽  
Xu Zhang ◽  
Xun Chen

To assess the feasibility of different neural networks in sEMG-based force estimation, three types of networks, namely convolutional neural network (CNN), long short-term memory (LSTM) network, and their combination (C-LSTM), were applied to predict the muscle force generated in static isometric elbow flexion across three different circumstances (multi-subject, subject-dependent, and subject-independent). Eight healthy men were recruited for the experiments, and the results demonstrated that all three models were applicable for force estimation, with LSTM and C-LSTM achieving better performance. Even in the subject-independent situation, they maintained mean RMSE% values as low as 9.07 ± 1.29 and 8.67 ± 1.14, respectively. CNN turned out to be a worse choice, yielding a mean RMSE% of 12.13 ± 1.98. To our knowledge, this work was the first to employ CNN, LSTM, and C-LSTM in sEMG-based force estimation, and the results not only demonstrate the strength of the proposed networks but also point to a potential way of achieving high accuracy in real-time, subject-independent force estimation.


Sequence classification is one of the in-demand research topics in the field of Natural Language Processing (NLP). Classifying a set of images or text into an appropriate category or class is a complex task that many Machine Learning (ML) models fail to accomplish accurately, ending up under-fitting the given dataset. Some of the ML algorithms used in text classification are KNN, Naïve Bayes, Support Vector Machines, Convolutional Neural Networks (CNNs), Recursive CNNs, Recurrent Neural Networks (RNNs), Long Short-Term Memory (LSTM), etc. For this experimental study, LSTM and a few other algorithms were chosen for comparison. The dataset used is the SMS Spam Collection Dataset from Kaggle, to which 150 entries from different sources were added. The two possible class labels for the data points are spam and ham. Each entry consists of the class label and a few sentences of text, followed by a few irrelevant features that are removed. After converting the text to the required format, the models are run and then evaluated using various metrics. In the experiments, the LSTM gives much better classification accuracy than the other machine learning models. F1-scores in the high nineties were achieved using LSTM for classifying the text. The other models showed very low F1-scores and cosine similarities, indicating that they underperformed on the dataset. Another interesting observation is that the LSTM produced fewer false positives and false negatives than any other model.

