Detecting Portable Executable Malware by Binary Code Using an Artificial Evolutionary Fuzzy LSTM Immune System

2021 ◽  
Vol 2021 ◽  
pp. 1-12
Author(s):  
Jian Jiang ◽  
Fen Zhang

As the planet watches the evolution of the COVID-19 pandemic in shock, new forms of sophisticated, versatile, and extremely difficult-to-detect malware expose society, and especially the global economy, to serious risk. Machine learning techniques are playing an increasingly important role in the field of malware identification and analysis. However, due to the complexity of the problem, the training of intelligent systems proves insufficient for recognizing advanced cyberthreats. The biggest challenge in securing information systems with machine learning methods is to understand the polymorphism and metamorphism mechanisms used by malware developers and how to address them effectively. This work presents an innovative Artificial Evolutionary Fuzzy LSTM Immune System which, by using a heuristic machine learning method that combines evolutionary intelligence, Long Short-Term Memory (LSTM), and fuzzy knowledge, proves able to adequately protect modern information systems from Portable Executable (PE) malware. The main innovation in the technical implementation of the proposed approach is that the machine learning system is trained solely on the raw bytes of an executable file to determine whether the file is malicious. The performance of the proposed system was tested on a sophisticated dataset of high complexity, which emerged after extensive research on PE malware and offered a realistic representation of their operating states. The high accuracy of the developed model strongly supports the validity of the proposed method. The final evaluation was carried out through in-depth comparisons with corresponding machine learning algorithms and revealed the superiority of the proposed immune system.
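The abstract's core technical idea, training directly on the raw bytes of an executable, implies a preprocessing step the text does not spell out. A minimal sketch follows; this is not the authors' code, and the sequence length and the 1-256 byte shift (reserving 0 for padding) are illustrative assumptions:

```python
import numpy as np

def bytes_to_sequence(raw: bytes, max_len: int = 4096) -> np.ndarray:
    """Map raw file bytes to a fixed-length integer sequence.

    Each byte value 0-255 is shifted to 1-256 so that 0 is free to act as
    the padding token; sequences are truncated or zero-padded to max_len,
    the usual preprocessing before an embedding + LSTM stack.
    """
    seq = np.frombuffer(raw[:max_len], dtype=np.uint8).astype(np.int64) + 1
    if seq.size < max_len:
        seq = np.pad(seq, (0, max_len - seq.size))
    return seq

# Example: a tiny fake "executable" -- the 'MZ' magic bytes plus padding
sample = bytes([0x4D, 0x5A]) + b"\x00" * 10
x = bytes_to_sequence(sample, max_len=16)
```

The resulting integer sequence would then feed an embedding layer followed by the LSTM classifier.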

2020 ◽  
Vol 12 (2) ◽  
pp. 84-99
Author(s):  
Li-Pang Chen

In this paper, we investigate analysis and prediction of time-dependent data. We focus our attention on four different stocks selected from the Yahoo Finance historical database. To build models and predict future stock prices, we consider three different machine learning techniques: Long Short-Term Memory (LSTM), Convolutional Neural Networks (CNN), and Support Vector Regression (SVR). By treating close price, open price, daily low, daily high, adjusted close price, and volume of trades as predictors in the machine learning methods, we show that prediction accuracy is improved.
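As a rough illustration of how the six predictors might be arranged for an LSTM or CNN, here is a sliding-window construction in plain NumPy; the lookback length and synthetic values are assumptions for the example, not the paper's settings:

```python
import numpy as np

def make_windows(features: np.ndarray, target: np.ndarray, lookback: int):
    """Build (samples, lookback, n_features) windows plus next-step targets,
    the input shape a sequence regressor such as an LSTM expects."""
    X, y = [], []
    for t in range(lookback, len(target)):
        X.append(features[t - lookback:t])
        y.append(target[t])
    return np.stack(X), np.array(y)

# 6 predictors per day: open, high, low, close, adjusted close, volume
days, n_feat, lookback = 10, 6, 3
feats = np.arange(days * n_feat, dtype=float).reshape(days, n_feat)
close = feats[:, 3]                     # use close price as the target
X, y = make_windows(feats, close, lookback)
```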


Author(s):  
Sumit Kumar ◽  
Sanlap Acharya

The prediction of stock prices has always been a very challenging problem for investors. Using machine learning techniques to predict stock prices is also one of the favourite topics for academics working in this domain. This chapter discusses five supervised learning techniques and two unsupervised learning techniques for the problem of stock price prediction and compares the performance of all the algorithms. Among the supervised learning techniques, the Long Short-Term Memory (LSTM) algorithm performed best, whereas among the unsupervised learning techniques, the Restricted Boltzmann Machine (RBM) performed best. RBM is found to perform even better than LSTM.
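For readers unfamiliar with the RBM mentioned above, a minimal numerical sketch of one contrastive-divergence (CD-1) update for a binary RBM follows; the layer sizes, learning rate, and data are illustrative only and do not come from the chapter:

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def cd1_step(v0, W, b_h, b_v, lr=0.1):
    """One contrastive-divergence (CD-1) update for a binary RBM."""
    h_prob0 = sigmoid(v0 @ W + b_h)                    # hidden activations
    h0 = (rng.random(h_prob0.shape) < h_prob0).astype(float)
    v_prob1 = sigmoid(h0 @ W.T + b_v)                  # one-step reconstruction
    h_prob1 = sigmoid(v_prob1 @ W + b_h)
    # Positive phase minus negative phase, averaged over the batch
    W += lr * (v0.T @ h_prob0 - v_prob1.T @ h_prob1) / len(v0)
    b_h += lr * (h_prob0 - h_prob1).mean(axis=0)
    b_v += lr * (v0 - v_prob1).mean(axis=0)
    return W, b_h, b_v

n_vis, n_hid = 6, 4
W = rng.normal(0, 0.01, (n_vis, n_hid))
b_h, b_v = np.zeros(n_hid), np.zeros(n_vis)
v = rng.integers(0, 2, (8, n_vis)).astype(float)       # toy binary batch
W, b_h, b_v = cd1_step(v, W, b_h, b_v)
```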


2020 ◽  
Author(s):  
Victoria Da Poian ◽  
Eric Lyness ◽  
Melissa Trainer ◽  
Xiang Li ◽  
William Brinckerhoff ◽  
...  

The majority of planetary missions return only one thing: data. The volume of data returned from distant planets is typically minuscule compared to Earth-based investigations, and it decreases further for more distant solar system missions. Meanwhile, the data produced by planetary science instruments continue to grow along with mission ambitions. Moreover, the time required for decisional data to reach science and operations teams on Earth, and for commands to be sent, also increases with distance. To maximize the value of each bit within these mission time and volume constraints, instruments need to be selective about what they send back to Earth. We envision instruments that analyze science data onboard, such that they can adjust and tune themselves, select the next operations to be run without requiring ground-in-the-loop, and transmit home only the most interesting or time-critical data.

Recent developments have demonstrated the tremendous potential of robotic explorers for planetary exploration and for other extreme environments. We believe that science autonomy has the potential to be as important as robotic autonomy (e.g., roving terrain) in improving the science potential of these missions because it directly optimizes the returned data. On-board science data processing, interpretation, and reaction, as well as prioritization of telemetry, therefore comprise new, critical challenges of mission design.

We present a first step toward this vision: a machine learning (ML) approach for analyzing science data from the Mars Organic Molecule Analyzer (MOMA) instrument, which will land on Mars within the ExoMars rover Rosalind Franklin in 2023. MOMA is a dual-source (laser desorption and gas chromatograph) mass spectrometer that will search for past or present life on the Martian surface and subsurface through analysis of soil samples. We use data collected from the MOMA flight-like engineering model to develop mass-spectrometry-focused machine learning techniques. We first apply unsupervised algorithms in order to cluster input data based on inherent patterns and separate the bulk data into clusters. Then, optimized classification algorithms designed for MOMA's scientific goals provide information to the scientists about the likely content of the sample. This will help the scientists with their analysis of the sample and decision-making process regarding subsequent operations.

We used MOMA data to develop initial machine learning algorithms and strategies as a proof of concept and to design software to support intelligent operations of more autonomous systems in development for future exploratory missions. This data characterization and categorization is the first step of a longer-term objective to enable the spacecraft and instruments themselves to make real-time adjustments during operations, thus optimizing the potentially complex search for life in our solar system and beyond.
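The unsupervised clustering step described above can be illustrated with a minimal k-means over synthetic "spectra"; this is a generic sketch, not MOMA's actual algorithm, and the channel counts and peak positions are invented for the example:

```python
import numpy as np

def kmeans(spectra, k, iters=50):
    """Minimal k-means: cluster spectra (rows) by Euclidean distance."""
    # Initialize centers from evenly spaced rows of the input
    idx = np.linspace(0, len(spectra) - 1, k).astype(int)
    centers = spectra[idx].astype(float)
    for _ in range(iters):
        # Distance of every spectrum to every center: (n, k)
        d = np.linalg.norm(spectra[:, None, :] - centers[None, :, :], axis=2)
        labels = d.argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = spectra[labels == j].mean(axis=0)
    return labels, centers

# Two synthetic groups of 5-channel "spectra" with distinct peak positions
rng = np.random.default_rng(1)
a = rng.normal(0.0, 0.1, (10, 5)); a[:, 0] += 5.0   # peak in channel 0
b = rng.normal(0.0, 0.1, (10, 5)); b[:, 4] += 5.0   # peak in channel 4
labels, _ = kmeans(np.vstack([a, b]), k=2)
```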


The stock market has long been one of the primary revenue streams for many people. The stock market is often incalculable and uncertain; predicting its ups and downs is therefore an uphill task even for financial experts, who have been trying to tackle it with little success. But it is now possible to predict stock markets thanks to rapid improvements in technology, which have led to better processing speed and more accurate algorithms. It is necessary to forswear the misconception that stock market prediction is only meant for people with expertise in finance; an application can therefore be developed to guide the user about the tempo of the stock market and the risk associated with it. Predicting prices in the stock market is a complicated task, and various techniques are used to solve the problem; this paper investigates some of these techniques and compares the accuracy of each. Forecasting time series data is an important topic in economics, statistics, finance, and business. Of the many techniques for forecasting time series data, such as the Autoregressive (AR), Moving Average (MA), and Autoregressive Integrated Moving Average (ARIMA) models, it is ARIMA that has higher accuracy and higher precision than the other methods. With recent advancements in the computational power of processors and in machine learning and deep learning, new algorithms can be developed to tackle the problem of predicting the stock market. This paper investigates one such machine learning algorithm for forecasting time series data, Long Short-Term Memory (LSTM), and compares it with traditional algorithms such as ARIMA to determine how superior LSTM is to traditional methods for predicting the stock market.
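The autoregressive component at the heart of ARIMA can be sketched in a few lines: a least-squares fit of an AR(p) model on a synthetic, noise-free AR(1) series recovers the generating coefficient. This is a generic illustration, not the paper's experimental setup:

```python
import numpy as np

def fit_ar(series, p):
    """Least-squares fit of an AR(p) model: x_t = c + sum_i a_i * x_{t-i}."""
    # Column i holds the lag-(i+1) values aligned with targets y = x_p..x_T
    X = np.column_stack(
        [series[p - i - 1 : len(series) - i - 1] for i in range(p)]
    )
    X = np.column_stack([np.ones(len(X)), X])
    y = series[p:]
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef                       # [c, a_1, ..., a_p]

def predict_next(series, coef):
    """One-step-ahead forecast from the fitted AR coefficients."""
    p = len(coef) - 1
    return coef[0] + coef[1:] @ series[-1 : -p - 1 : -1]

# Synthetic noise-free AR(1) process: x_t = 0.8 * x_{t-1}
x = [1.0]
for _ in range(50):
    x.append(0.8 * x[-1])
x = np.array(x)
coef = fit_ar(x, p=1)
```

An LSTM tackles the same one-step-ahead task but learns nonlinear dependencies from windows of past values rather than fixed linear lags.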


Author(s):  
Sangeetha Rajesh ◽  
N. J. Nalini

Singer identification is a challenging task in music information retrieval because the singing voice is combined with instrumental music. Previous approaches focus on identifying singers based on individual features extracted from the music clips. The objective of this work is to combine Mel Frequency Cepstral Coefficients (MFCC) and Chroma DCT-reduced Pitch (CRP) features for a singer identification (SID) system using machine learning techniques. The proposed system has two main phases. In the feature extraction phase, MFCC, ΔMFCC, ΔΔMFCC, and CRP features are extracted from the music clips. In the identification phase, the extracted features are trained with Bidirectional Long Short-Term Memory (BLSTM)-based Recurrent Neural Networks (RNN) and Convolutional Neural Networks (CNN) and tested to identify different singer classes. Identification accuracy and Equal Error Rate (EER) are used as performance measures. Further, the experiments also demonstrate the effectiveness of score-level fusion of MFCC and CRP features in the singer identification system. The experimental results are also compared with a baseline system using Support Vector Machines (SVM).
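Score-level fusion, as evaluated in the experiments, can be sketched generically: per-class scores from the MFCC-based and CRP-based models are normalized and combined with a weight before the final decision. The fusion weight, logits, and class count below are illustrative assumptions, not values from the paper:

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def fuse_scores(mfcc_scores, crp_scores, w=0.6):
    """Weighted score-level fusion: combine per-singer posteriors from the
    MFCC-based and CRP-based classifiers before taking the argmax."""
    return w * softmax(mfcc_scores) + (1 - w) * softmax(crp_scores)

# Hypothetical logits over 3 singer classes from the two feature streams
mfcc = np.array([2.0, 0.5, 0.1])   # MFCC stream favours singer 0
crp = np.array([0.2, 0.3, 1.8])    # CRP stream favours singer 2
fused = fuse_scores(mfcc, crp)
pred = int(fused.argmax())         # the fused decision
```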


Author(s):  
Georgios Alexandridis ◽  
John Aliprantis ◽  
Konstantinos Michalakis ◽  
Konstantinos Korovesis ◽  
Panagiotis Tsantilas ◽  
...  

The task of sentiment analysis tries to predict the affective state of a document by examining its content and metadata through the application of machine learning techniques. Recent advances in the field consider sentiment to be a multi-dimensional quantity that pertains to different interpretations (or aspects), rather than a single one. Based on earlier research, the current work examines the said task in the framework of a larger architecture that crawls documents from various online sources. Subsequently, the collected data are pre-processed in order to extract useful features that assist the machine learning algorithms in the sentiment analysis task. More specifically, the words that comprise each text are mapped to a neural embedding space and are provided to a hybrid, bi-directional long short-term memory network, coupled with convolutional layers and an attention mechanism that outputs the final textual features. Additionally, a number of document metadata are extracted, including the number of a document's repetitions in the collected corpus (i.e. the number of reposts/retweets), the frequency and type of emoji ideograms, and the presence of keywords, either extracted automatically or assigned manually in the form of hashtags. The novelty of the proposed approach lies in the semantic annotation of the retrieved keywords, since an ontology-based knowledge management system is queried with the purpose of retrieving the classes the aforementioned keywords belong to. Finally, all features are provided to a fully connected, multi-layered, feed-forward artificial neural network that performs the analysis task. The overall architecture is compared, on a manually collected corpus of documents, with two other state-of-the-art approaches, achieving the best results in identifying negative sentiment, which is of particular interest to certain parties (for example, companies) that wish to measure their online reputation.
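The attention mechanism that produces the final textual features can be sketched in its simplest form, attention pooling over token vectors; this is a generic illustration with random data, not the system's trained network:

```python
import numpy as np

def attention_pool(H, w):
    """Attention pooling: score each token vector with a learned vector w,
    softmax the scores, and return the weighted sum as the text feature."""
    scores = H @ w                        # one scalar score per token
    alpha = np.exp(scores - scores.max()) # numerically stable softmax
    alpha /= alpha.sum()
    return alpha @ H, alpha               # pooled feature, attention weights

rng = np.random.default_rng(0)
H = rng.normal(size=(5, 8))   # 5 token vectors (e.g. BLSTM outputs), dim 8
w = rng.normal(size=8)        # stand-in for the learned attention vector
feat, alpha = attention_pool(H, w)
```

The pooled feature would then be concatenated with the metadata features before the final feed-forward network.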


Information ◽  
2021 ◽  
Vol 12 (12) ◽  
pp. 528
Author(s):  
David Opeoluwa Oyewola ◽  
Emmanuel Gbenga Dada ◽  
Sanjay Misra ◽  
Robertas Damaševičius

The application of machine learning techniques to the epidemiology of COVID-19 is a necessary measure that can be exploited to curtail the further spread of this pandemic. Conventional techniques used to determine the epidemiology of COVID-19 are slow and costly, and data are scarce. We investigate the effects of noise filters on the performance of machine learning algorithms on a COVID-19 epidemiology dataset. Noise filter algorithms are used to remove noise from the datasets utilized in this study. We applied nine machine learning techniques to classify the epidemiology of COVID-19: bagging, boosting, support vector machines, bidirectional long short-term memory, decision trees, naïve Bayes, k-nearest neighbors, random forests, and multinomial logistic regression. Data from patients who contracted coronavirus disease were collected from the Kaggle database between 23 January 2020 and 24 June 2020. Both noisy and filtered data were used in our experiments. As a result of denoising, the machine learning models produced highly accurate predictions of COVID-19 cases in South Korea. For isolated cases, after performing noise filtering operations, the machine learning techniques achieved accuracies between 98% and 100%. The results indicate that filtering noise from the dataset can improve the accuracy of COVID-19 case prediction algorithms.
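As a generic illustration of the kind of noise filtering described (the abstract does not specify which filter algorithms were used), here is a simple moving-average smoother applied to a synthetic daily-case series:

```python
import numpy as np

def moving_average(series, window=7):
    """Moving-average filter: each output is the mean of a sliding window,
    a simple example of denoising a daily case-count series."""
    kernel = np.ones(window) / window
    return np.convolve(series, kernel, mode="valid")

rng = np.random.default_rng(0)
t = np.arange(60, dtype=float)
clean = 10 + 0.5 * t                       # underlying trend in daily cases
noisy = clean + rng.normal(0, 3, t.size)   # reported counts with noise
smoothed = moving_average(noisy, window=7)
```

Smoothing shrinks the noise before the series is handed to a classifier, which is the effect the study measures across its nine models.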


2020 ◽  
Vol 39 (5) ◽  
pp. 6579-6590
Author(s):  
Sandy Çağlıyor ◽  
Başar Öztayşi ◽  
Selime Sezgin

The motion picture industry is one of the largest industries worldwide and has significant importance in the global economy. Considering the high stakes and high risks in the industry, forecast models and decision support systems are gaining importance. Several attempts have been made to estimate the theatrical performance of a movie before or at the early stages of its release. Nevertheless, these models are mostly used for predicting domestic performance, and the industry still struggles to predict box office performance in overseas markets. In this study, the aim is to design a forecast model using different machine learning algorithms to estimate the theatrical success of US movies in Turkey. A dataset of 1559 movies is constructed from various sources. First, independent variables are grouped as pre-release, distributor type, and international distribution based on their characteristics. The number of attendances is discretized into three classes. Four popular machine learning algorithms (artificial neural networks, decision tree regression, gradient boosting trees, and random forests) are employed, and the impact of each variable group is observed by comparing the performance of the models. Then the number of target classes is increased to five and to eight, and the results are compared with previously developed models in the literature.
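The discretization of attendance into three classes can be sketched with `np.digitize`; the cut-off values below are hypothetical, not the study's actual bin edges:

```python
import numpy as np

def discretize(attendance, edges):
    """Bin continuous attendance counts into ordinal classes.

    np.digitize returns, for each value, the index of the bin it falls in:
    below edges[0] -> 0, between edges[0] and edges[1] -> 1, and so on.
    """
    return np.digitize(attendance, edges)

# Hypothetical cut-offs: <50k -> class 0, 50k-250k -> class 1, >=250k -> class 2
edges = np.array([50_000, 250_000])
attendance = np.array([12_000, 80_000, 400_000, 250_000])
classes = discretize(attendance, edges)
```

Moving to five or eight classes, as in the later experiments, only requires a longer `edges` array.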


Author(s):  
Anantvir Singh Romana

Accurate diagnostic detection of disease in a patient is critical and may alter the subsequent treatment and increase the chances of survival. Machine learning techniques have been instrumental in disease detection and are currently being used in various classification problems due to their accurate prediction performance. Different techniques may provide different accuracies, and it is therefore imperative to use the most suitable method that provides the best results. This research provides a comparative analysis of Support Vector Machine, naïve Bayes, J48 Decision Tree, and neural network classifiers on breast cancer and diabetes datasets.
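Classifier comparisons of this kind typically rest on a confusion matrix and accuracy; a minimal sketch follows, with invented labels rather than the study's data:

```python
import numpy as np

def confusion_matrix(y_true, y_pred, n_classes=2):
    """Confusion matrix and accuracy, the usual basis for comparing
    classifiers on a labelled medical dataset."""
    cm = np.zeros((n_classes, n_classes), dtype=int)
    for t, p in zip(y_true, y_pred):
        cm[t, p] += 1                       # rows: true class, cols: predicted
    accuracy = np.trace(cm) / cm.sum()
    return cm, accuracy

# Toy labels: 1 = disease present, 0 = absent (illustrative only)
y_true = np.array([1, 0, 1, 1, 0, 1])
y_pred = np.array([1, 0, 0, 1, 0, 1])
cm, acc = confusion_matrix(y_true, y_pred)
```

Per-class rows of the matrix also expose false negatives, which matter most in diagnostic settings.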


2020 ◽  
Author(s):  
Joseph Prinable ◽  
Peter Jones ◽  
David Boland ◽  
Alistair McEwan ◽  
Cindy Thamrin

BACKGROUND The ability to continuously monitor breathing metrics may have indications for general health as well as respiratory conditions such as asthma. However, few studies have focused on breathing due to a lack of available wearable technologies. OBJECTIVE To examine the performance of two machine learning algorithms in extracting breathing metrics from a finger-based pulse oximeter, which is amenable to long-term monitoring. METHODS Pulse oximetry data were collected from 11 healthy and 11 asthma subjects who breathed at a range of controlled respiratory rates. UNET and Long Short-Term Memory (LSTM) algorithms were applied to the data, and the results were compared against breathing metrics derived from respiratory inductance plethysmography measured simultaneously as a reference. RESULTS The UNET and LSTM models provided breathing metrics that were strongly correlated with those from the reference signal (all p<0.001, except for inspiratory:expiratory ratio). The following relative mean biases (95% confidence intervals) were observed for UNET vs LSTM: inspiration time 1.89 (-52.95, 56.74)% vs 1.30 (-52.15, 54.74)%, expiration time -3.70 (-55.21, 47.80)% vs -4.97 (-56.84, 46.89)%, inspiratory:expiratory ratio -4.65 (-87.18, 77.88)% vs -5.30 (-87.07, 76.47)%, inter-breath intervals -2.39 (-32.76, 27.97)% vs -3.16 (-33.69, 27.36)%, and respiratory rate 2.99 (-27.04, 33.02)% vs 3.69 (-27.17, 34.56)%. CONCLUSIONS Both machine learning models show strong correlation and good comparability with the reference, with low bias though wide variability, for deriving breathing metrics in asthma and healthy cohorts. Future efforts should focus on improving the performance of these models, e.g. by increasing the size of the training dataset at the lower breathing rates. CLINICALTRIAL Sydney Local Health District Human Research Ethics Committee (#LNR\16\HAWKE99 ethics approval).
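The relative mean bias (95% confidence interval) figures quoted in the results follow the Bland-Altman pattern of bias plus limits of agreement; here is a generic sketch of that computation on synthetic data (the distributions below are illustrative assumptions, not the study's measurements):

```python
import numpy as np

def relative_bias_loa(test, reference):
    """Relative mean bias and 95% limits of agreement, in percent,
    matching the bias (lower, upper)% reporting style."""
    rel = 100.0 * (test - reference) / reference   # per-breath relative error
    bias = rel.mean()
    half = 1.96 * rel.std(ddof=1)                  # 95% limits of agreement
    return bias, bias - half, bias + half

rng = np.random.default_rng(0)
ref = rng.uniform(2.0, 5.0, 200)                   # e.g. inspiration times (s)
est = ref * (1 + rng.normal(0.02, 0.1, 200))       # model estimates, ~2% bias
bias, lo, hi = relative_bias_loa(est, ref)
```

Low bias with wide limits, as in the abstract, corresponds to a small `bias` but a large `hi - lo` spread.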

