Detection of Malicious Software by Analyzing Distinct Artifacts Using Machine Learning and Deep Learning Algorithms

Mathew Ashik; A. Jyothish; S. Anandaram; P. Vinod; Francesco Mercaldo; Fabio Martinelli; Antonella Santone

doi:10.3390/electronics10141694

Detection of Malicious Software by Analyzing Distinct Artifacts Using Machine Learning and Deep Learning Algorithms

Electronics ◽

10.3390/electronics10141694 ◽

2021 ◽

Vol 10 (14) ◽

pp. 1694

Author(s):

Mathew Ashik ◽

A. Jyothish ◽

S. Anandaram ◽

P. Vinod ◽

Francesco Mercaldo ◽

...

Keyword(s):

Neural Network ◽

Machine Learning ◽

Deep Learning ◽

Support Vector ◽

Malware Analysis ◽

Learning Approaches ◽

Dynamic Features ◽

System Calls ◽

Prevention Methods ◽

Structural Aspects

Malware is one of the most significant threats in today’s computing world since the number of websites distributing malware is increasing at a rapid rate. Malware analysis and prevention methods are increasingly becoming necessary for computer systems connected to the Internet. Malware exploits the system’s vulnerabilities to steal valuable information without the user’s knowledge and stealthily sends it to remote servers controlled by attackers. Traditionally, anti-malware products use signatures for detecting known malware. However, the signature-based method does not scale in detecting obfuscated and packed malware. Considering that the cause of a problem is often best understood by studying the structural aspects of a program, such as mnemonics, instruction opcodes, and API calls, in this paper we investigate the relevance of the features of unpacked malicious and benign executables, such as mnemonics, instruction opcodes, and APIs, to identify features that classify the executable. Prominent features are extracted using Minimum Redundancy and Maximum Relevance (mRMR) and Analysis of Variance (ANOVA). Experiments were conducted on four datasets using machine learning and deep learning approaches such as Support Vector Machine (SVM), Naïve Bayes, J48, Random Forest (RF), and XGBoost. In addition, we evaluate the performance of a collection of deep neural networks, such as a Deep Dense network, a One-Dimensional Convolutional Neural Network (1D-CNN), and a CNN-LSTM, in classifying unknown samples, and we observed promising results using APIs and system calls. On combining APIs/system calls with static features, a marginal performance improvement was attained compared with models trained only on dynamic features. Moreover, to improve accuracy, we implemented our solution using distinct deep learning methods and demonstrated a fine-tuned deep neural network that resulted in an F1-score of 99.1% and 98.48% on Dataset-2 and Dataset-3, respectively.
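As a rough illustration of the ANOVA-based feature ranking mentioned in the abstract, the sketch below computes a one-way ANOVA F-statistic per feature and sorts features by it. It is not the authors' implementation: the feature names, counts, and labels are hypothetical toy values, the helper names `anova_f` and `rank_features` are made up for this example, and mRMR is not covered. The code assumes at least two classes and more samples than classes.

```python
from statistics import mean

def anova_f(values, labels):
    """One-way ANOVA F-statistic of a single feature across class labels."""
    groups = {}
    for v, y in zip(values, labels):
        groups.setdefault(y, []).append(v)
    grand = mean(values)
    n, k = len(values), len(groups)
    # Between-group and within-group sums of squares.
    ss_between = sum(len(g) * (mean(g) - grand) ** 2 for g in groups.values())
    ss_within = sum((v - mean(g)) ** 2 for g in groups.values() for v in g)
    if ss_within == 0:
        return float("inf") if ss_between > 0 else 0.0
    return (ss_between / (k - 1)) / (ss_within / (n - k))

def rank_features(rows, labels, names):
    """Return feature names sorted by descending F-statistic."""
    scores = {
        name: anova_f([row[i] for row in rows], labels)
        for i, name in enumerate(names)
    }
    return sorted(names, key=lambda name: scores[name], reverse=True)

# Hypothetical toy data: each row holds opcode/API counts for one executable.
names = ["mov", "CreateRemoteThread", "push"]
rows = [
    [10, 0, 5],
    [12, 1, 4],
    [11, 7, 5],
    [9, 8, 4],
]
labels = ["benign", "benign", "malware", "malware"]
ranking = rank_features(rows, labels, names)
print(ranking)
```

A feature whose class means differ strongly relative to its within-class spread receives a high F-statistic; in this toy data that is the hypothetical API-call count.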


Analysis of the Nosema Cells Identification for Microscopic Images

Sensors ◽

10.3390/s21093068 ◽

2021 ◽

Vol 21 (9) ◽

pp. 3068

Author(s):

Soumaya Dghim ◽

Carlos M. Travieso-González ◽

Radim Burget

Keyword(s):

Neural Network ◽

Machine Learning ◽

Image Processing ◽

Deep Learning ◽

The Other ◽

Support Vector ◽

Learning Approaches ◽

Microscopic Images ◽

Trained Neural Network ◽

Nosema Disease

The use of image processing tools, machine learning, and deep learning approaches has become very useful and robust in recent years. This paper introduces the detection of the Nosema disease, which is considered to be one of the most economically significant diseases today. This work shows a solution for recognizing and identifying Nosema cells among the other objects present in the microscopic image. Two main strategies are examined. The first strategy uses image processing tools to extract the most valuable information and features from the dataset of microscopic images. Then, machine learning methods, such as an artificial neural network (ANN) and a support vector machine (SVM), are applied for detecting and classifying the Nosema disease cells. The second strategy explores deep learning and transfer learning. Several approaches were examined, including a convolutional neural network (CNN) classifier and several methods of transfer learning (AlexNet, VGG-16 and VGG-19), which were fine-tuned and applied to the object sub-images in order to distinguish the Nosema images from the other object images. The best accuracy, 96.25%, was reached by the pre-trained VGG-16 neural network.


Deep Learning Assisted Neonatal Cry Classification via Support Vector Machine Models

Frontiers in Public Health ◽

10.3389/fpubh.2021.670352 ◽

2021 ◽

Vol 9 ◽

Author(s):

Ashwini K ◽

P. M. Durai Raj Vincent ◽

Kathiravan Srinivasan ◽

Chuan-Yu Chang

Keyword(s):

Neural Network ◽

Machine Learning ◽

Support Vector Machine ◽

Feature Extraction ◽

Deep Learning ◽

Convolutional Neural Network ◽

Support Vector ◽

Svm Classifier ◽

Infant Cry ◽

Learning Techniques

Neonatal infants communicate with us through cries. The infant cry signals have distinct patterns depending on the purpose of the cries. For audio signals, preprocessing, feature extraction, and feature selection need expert attention and take much effort. Deep learning techniques automatically extract and select the most important features, but they require an enormous amount of data for effective classification. This work mainly discriminates the neonatal cries into pain, hunger, and sleepiness. The neonatal cry auditory signals are transformed into spectrogram images using the short-time Fourier transform (STFT). The deep convolutional neural network (DCNN) takes the spectrogram images as input. The features obtained from the convolutional neural network are passed to the support vector machine (SVM) classifier, a machine learning technique that classifies the neonatal cries. This work combines the advantages of machine learning and deep learning techniques to get the best results even with a moderate number of data samples. The experimental results show that CNN-based feature extraction with an SVM classifier provides promising results. Comparing the SVM kernels, namely radial basis function (RBF), linear, and polynomial, SVM-RBF provides the highest accuracy for the kernel-based infant cry classification system, at 88.89%.
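The STFT step described above can be sketched in plain Python as follows. This is only an illustration of how a signal becomes a magnitude spectrogram, not the authors' pipeline: the Hann window, the frame length, the hop size, and the synthetic test tone are all assumptions chosen for the example, and a practical system would use an FFT library rather than this naive DFT.

```python
import cmath
import math

def stft_magnitude(signal, frame_len, hop):
    """Magnitude spectrogram: one list of frequency-bin magnitudes per frame."""
    # Hann window to reduce spectral leakage at the frame edges.
    window = [0.5 - 0.5 * math.cos(2 * math.pi * n / frame_len)
              for n in range(frame_len)]
    frames = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        chunk = [signal[start + n] * window[n] for n in range(frame_len)]
        # Naive DFT over the non-negative frequency bins 0 .. frame_len // 2.
        spectrum = []
        for k in range(frame_len // 2 + 1):
            acc = sum(chunk[n] * cmath.exp(-2j * math.pi * k * n / frame_len)
                      for n in range(frame_len))
            spectrum.append(abs(acc))
        frames.append(spectrum)
    return frames

# Hypothetical test tone with 4 cycles per 32-sample frame,
# so the energy should concentrate in frequency bin 4.
frame_len, hop = 32, 16
signal = [math.sin(2 * math.pi * 4 * n / frame_len) for n in range(128)]
spec = stft_magnitude(signal, frame_len, hop)
peak_bins = [max(range(len(frame)), key=frame.__getitem__) for frame in spec]
print(len(spec), len(spec[0]), peak_bins)
```

Stacking the per-frame spectra as columns gives the time-frequency image that a CNN can consume as input.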


Comparison between Deep Learning and Tree-Based Machine Learning Approaches for Landslide Susceptibility Mapping

Water ◽

10.3390/w13192664 ◽

2021 ◽

Vol 13 (19) ◽

pp. 2664

Author(s):

Sunil Saha ◽

Jagabandhu Roy ◽

Tusar Kanti Hembram ◽

Biswajeet Pradhan ◽

Abhirup Dikshit ◽

...

Keyword(s):

Neural Network ◽

Machine Learning ◽

Deep Learning ◽

Landslide Susceptibility ◽

Learning Model ◽

Susceptibility Mapping ◽

Landslide Susceptibility Mapping ◽

Learning Approaches ◽

Statistical Measures ◽

Deep Learning Model

Deep learning and tree-based machine learning approaches have gained immense popularity in various fields owing to their efficiency. A deep learning model, viz. a convolutional neural network (CNN), an artificial neural network (ANN), and four tree-based machine learning models, namely, alternating decision tree (ADTree), classification and regression tree (CART), functional tree (FTree) and logistic model tree (LMT), were used for landslide susceptibility mapping in the East Sikkim Himalaya region of India, and the results were compared. Landslide areas were delimited and mapped as a landslide inventory map (LIM) after gathering information from historical records and periodic field investigations. In the LIM, 91 landslides were plotted and randomly divided into training (64 landslides) and testing (27 landslides) subsets to train and validate the models. A total of 21 landslide conditioning factors (LCFs) were considered as model inputs, and the results of each model were categorised under five susceptibility classes. The receiver operating characteristic curve and 21 statistical measures were used to evaluate and prioritise the models. The CNN deep learning model achieved priority rank 1, with an area under the curve of 0.918 and 0.933 on the training and testing data, respectively, classifying 23.02% and 14.40% of the area as very highly and highly susceptible, followed by the ANN, ADTree, CART, FTree and LMT models. This research might be useful in landslide studies, especially in locations with comparable geophysical and climatological characteristics, to aid in decision making for land use planning.
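For readers unfamiliar with the area under the ROC curve used to rank the models, the following minimal sketch computes it as the fraction of positive/negative pairs in which the positive sample receives the higher score, with ties counted as half. The scores and labels are hypothetical toy values, not data from the study, and the function name `roc_auc` is chosen for this example.

```python
def roc_auc(scores, labels):
    """AUC as the probability that a positive outranks a negative (ties count 0.5)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))

# Hypothetical susceptibility scores; label 1 marks a landslide location.
scores = [0.9, 0.8, 0.35, 0.6, 0.2, 0.1]
labels = [1, 1, 1, 0, 0, 0]
auc = roc_auc(scores, labels)
print(round(auc, 3))
```

An AUC of 0.5 corresponds to random ranking and 1.0 to perfect separation, which is the scale on which the reported values of 0.918 and 0.933 should be read.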


A Machine Learning View on Momentum and Reversal Trading

Algorithms ◽

10.3390/a11110170 ◽

2018 ◽

Vol 11 (11) ◽

pp. 170 ◽

Cited By ~ 2

Author(s):

Zhixi Li ◽

Vincent Tam

Keyword(s):

Neural Network ◽

Machine Learning ◽

Stock Market ◽

Short Term Memory ◽

Predictive Ability ◽

Trading Strategies ◽

Machine Learning Techniques ◽

Support Vector ◽

Learning Approaches ◽

Learning Techniques

Momentum and reversal effects are important phenomena in stock markets. In academia, relevant studies have been conducted for years. Researchers have attempted to analyze these phenomena using statistical methods and to give some plausible explanations. However, those explanations are sometimes unconvincing. Furthermore, it is very difficult to transfer the findings of these studies to real-world investment trading strategies due to the lack of predictive ability. This paper represents the first attempt to adopt machine learning techniques for investigating the momentum and reversal effects occurring in any stock market. In the study, various machine learning techniques, including the Decision Tree (DT), Support Vector Machine (SVM), Multilayer Perceptron Neural Network (MLP), and Long Short-Term Memory Neural Network (LSTM) were explored and compared carefully. Several models built on these machine learning approaches were used to predict the momentum or reversal effect on the stock market of mainland China, thus allowing investors to build corresponding trading strategies. The experimental results demonstrated that these machine learning approaches, especially the SVM, are beneficial for capturing the relevant momentum and reversal effects, and possibly building profitable trading strategies. Moreover, we propose the corresponding trading strategies in terms of market states to acquire the best investment returns.


A Study on Brain Tumor Detection and Segmentation Using Deep Learning Techniques

Journal of Computational and Theoretical Nanoscience ◽

10.1166/jctn.2020.8468 ◽

2020 ◽

Vol 17 (4) ◽

pp. 1925-1930

Author(s):

Ambeshwar Kumar ◽

R. Manikandan ◽

Robbi Rahim

Keyword(s):

Neural Network ◽

Machine Learning ◽

Deep Learning ◽

Brain Tumor ◽

Tumor Detection ◽

Machine Learning Techniques ◽

Support Vector ◽

Learning Techniques ◽

Convolution Algorithm ◽

The Brain

A new era of technology in the field of medical engineering is raising awareness of various healthcare features. Deep learning is a part of machine learning; it is capable of handling high-dimensional data and is efficient in concentrating on the right features. Tumor is an unbelievably complex disease: a multifaceted cell has more than a hundred billion cells, and each cell acquires mutations exclusively. Detection of tumor particles in experiments is easily done by MRI or CT. Brain tumors can also be detected by MRI; however, deep learning techniques give a better approach to segmenting brain tumor images. Deep learning models are loosely inspired by information processing and communication patterns in biological nervous systems. Classification plays a significant role in brain tumor detection. A neural network creates a well-organized rule for classification. To handle medical image data, the neural network is trained using the convolution algorithm. A multilayer perceptron is intended for identification of an image. In this study, the brain images are categorized into two types: normal and abnormal. This article emphasizes the importance of classification and feature selection approaches for predicting brain tumors. The classification is done by machine learning techniques such as Artificial Neural Networks, Support Vector Machines and Deep Neural Networks. More than one technique can be applied for the segmentation of a tumor. Several samples of brain tumor images are classified using deep learning algorithms, a convolutional neural network and a multilayer perceptron.


Brain Signal Classification Based on Deep CNN

International Journal of Security and Privacy in Pervasive Computing ◽

10.4018/ijsppc.2020040102 ◽

2020 ◽

Vol 12 (2) ◽

pp. 17-29

Author(s):

Terry Gao ◽

Grace Ying Wang

Keyword(s):

Neural Network ◽

Machine Learning ◽

Mental Status ◽

Machine Learning Techniques ◽

Support Vector ◽

Imaging Features ◽

Learning Approaches ◽

Computationally Efficient ◽

Set Up ◽

Brain Data

It is essential to increase the accuracy and robustness of classification of brain data, including EEG, in order to facilitate direct communication between the human brain and computerized devices. Different machine learning approaches, such as support vector machine (SVM), neural network, and linear discriminant analysis (LDA), have been applied to set up an automatic subjective classifier, and the findings on their capacities in this regard have been inconclusive. The present study developed an effective classifier for human mental status using deep learning in a convolutional neural network. In contrast to most previous studies, which commonly used the EEG waveform or numeric values of brain signals for classification, the authors utilised imaging features generated from EEG data in the alpha frequency band. The new model proposed in this study provides a simple and computationally efficient approach to distinguishing mental status during resting. With training, this model could classify new 2D EEG images with above 90% accuracy, while traditional machine learning techniques failed to achieve this accuracy.


Sentiment Analysis of Lithuanian Texts Using Traditional and Deep Learning Approaches

Computers ◽

10.3390/computers8010004 ◽

2019 ◽

Vol 8 (1) ◽

pp. 4 ◽

Cited By ~ 4

Author(s):

Jurgita Kapočiūtė-Dzikienė ◽

Robertas Damaševičius ◽

Marcin Woźniak

Keyword(s):

Machine Learning ◽

Deep Learning ◽

Sentiment Analysis ◽

Short Term Memory ◽

Machine Learning Techniques ◽

Support Vector ◽

Learning Approaches ◽

Full Dataset ◽

Learning Techniques ◽

Long Short Term Memory

We describe the sentiment analysis experiments that were performed on the Lithuanian Internet comment dataset using traditional machine learning (Naïve Bayes Multinomial (NBM) and Support Vector Machine (SVM)) and deep learning (Long Short-Term Memory (LSTM) and Convolutional Neural Network (CNN)) approaches. The traditional machine learning techniques were used with features based on lexical, morphological, and character information. The deep learning approaches were applied on top of two types of word embeddings (Word2Vec continuous bag-of-words with negative sampling and FastText). Both traditional and deep learning approaches had to solve the positive/negative/neutral sentiment classification task on the balanced and full dataset versions. The best deep learning results (reaching an accuracy of 0.706) were achieved on the full dataset with CNN applied on top of the FastText embeddings, replaced emoticons, and eliminated diacritics. The traditional machine learning approaches demonstrated the best performance (an accuracy of 0.735) on the full dataset with the NBM method, replaced emoticons, restored diacritics, and lemma unigrams as features. Although traditional machine learning approaches were superior to the deep learning methods, deep learning demonstrated good results when applied to the small datasets.


PARROT is a flexible recurrent neural network framework for analysis of large protein datasets

eLife ◽

10.7554/elife.70576 ◽

2021 ◽

Vol 10 ◽

Author(s):

Daniel Griffith ◽

Alex S Holehouse

Keyword(s):

Neural Network ◽

Machine Learning ◽

Deep Learning ◽

High Throughput ◽

Recurrent Neural Network ◽

Transcriptional Activation ◽

Network Architecture ◽

Learning Approaches ◽

Large Protein ◽

Protein Datasets

The rise of high-throughput experiments has transformed how scientists approach biological questions. The ubiquity of large-scale assays that can test thousands of samples in a day has necessitated the development of new computational approaches to interpret this data. Among these tools, machine learning approaches are increasingly being utilized due to their ability to infer complex nonlinear patterns from high-dimensional data. Despite their effectiveness, machine learning (and in particular deep learning) approaches are not always accessible or easy to implement for those with limited computational expertise. Here we present PARROT, a general framework for training and applying deep learning-based predictors on large protein datasets. Using an internal recurrent neural network architecture, PARROT is capable of tackling both classification and regression tasks while only requiring raw protein sequences as input. We showcase the potential uses of PARROT on three diverse machine learning tasks: predicting phosphorylation sites, predicting transcriptional activation function of peptides generated by high-throughput reporter assays, and predicting the fibrillization propensity of amyloid beta with data generated by deep mutational scanning. Through these examples, we demonstrate that PARROT is easy to use, performs comparably to state-of-the-art computational tools, and is applicable for a wide array of biological problems.


Sentiment Analysis and Topic Modeling on Tweets about Online Education during COVID-19

Applied Sciences ◽

10.3390/app11188438 ◽

2021 ◽

Vol 11 (18) ◽

pp. 8438

Author(s):

Muhammad Mujahid ◽

Ernesto Lee ◽

Furqan Rustam ◽

Patrick Bernard Washington ◽

Saleem Ullah ◽

...

Keyword(s):

Machine Learning ◽

Deep Learning ◽

Online Education ◽

Sentiment Analysis ◽

Topic Modeling ◽

Support Vector ◽

Learning Approaches ◽

Learning Models ◽

E Learning ◽

Machine Learning Models

Amid the worldwide COVID-19 pandemic lockdowns, the closure of educational institutes led to an unprecedented rise in online learning. To limit the impact of COVID-19 and obstruct its spread, educational institutions closed their campuses immediately and academic activities were moved to e-learning platforms. The effectiveness of e-learning is a critical concern for both students and parents, specifically in terms of its suitability to students and teachers and its technical feasibility with respect to different social scenarios. Such concerns must be reviewed from several aspects before e-learning can be adopted at such a large scale. This study endeavors to investigate the effectiveness of e-learning by analyzing the sentiments of people about e-learning. Due to the recent rise of social media as an important mode of communication, people’s views can be found on platforms such as Twitter, Instagram, Facebook, etc. This study uses a Twitter dataset containing 17,155 tweets about e-learning. Machine learning and deep learning approaches have shown their suitability, capability, and potential for image processing, object detection, and natural language processing tasks, and text analysis is no exception. Machine learning approaches have been largely used both for annotation and for text and sentiment analysis. Keeping in view the adequacy and efficacy of machine learning models, this study adopts TextBlob, VADER (Valence Aware Dictionary for Sentiment Reasoning), and SentiWordNet to analyze the polarity and subjectivity score of tweets’ text. Furthermore, bearing in mind the fact that machine learning models display high classification accuracy, various machine learning models have been used for sentiment classification. Two feature extraction techniques, TF-IDF (Term Frequency-Inverse Document Frequency) and BoW (Bag of Words), have been used to effectively build and evaluate the models. All the models have been evaluated in terms of various important performance metrics such as accuracy, precision, recall, and F1 score. The results reveal that the random forest and support vector machine classifiers achieve the highest accuracy of 0.95 when used with BoW features. Performance comparison is carried out for the results of TextBlob, VADER, and SentiWordNet, as well as the classification results of machine learning models and deep learning models such as CNN (Convolutional Neural Network), LSTM (Long Short Term Memory), CNN-LSTM, and Bi-LSTM (Bidirectional-LSTM). Additionally, topic modeling is performed to find the problems associated with e-learning, which indicates that uncertainty about the campus opening date, children’s inability to grasp online education, and a lack of efficient networks for online education are the top three problems.
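The two feature extraction techniques named in the abstract can be illustrated with a small stdlib-only sketch. It assumes whitespace tokenisation, term frequency defined as count divided by document length, and an unsmoothed inverse document frequency of log(N / df); library implementations often use smoothed or normalised variants, and the study's exact settings are not given here. The example sentences and function names are hypothetical.

```python
import math
from collections import Counter

def bag_of_words(docs):
    """Return (vocabulary, count vectors) for whitespace-tokenised documents."""
    tokenised = [doc.lower().split() for doc in docs]
    vocab = sorted({token for doc in tokenised for token in doc})
    vectors = []
    for doc in tokenised:
        counts = Counter(doc)
        vectors.append([counts[token] for token in vocab])
    return vocab, vectors

def tf_idf(docs):
    """TF-IDF with tf = count / doc length and idf = log(N / document frequency)."""
    vocab, counts = bag_of_words(docs)
    n_docs = len(docs)
    # Document frequency: number of documents containing each vocabulary term.
    df = [sum(1 for row in counts if row[j] > 0) for j in range(len(vocab))]
    weighted = []
    for row in counts:
        length = sum(row) or 1
        weighted.append([(c / length) * math.log(n_docs / df[j])
                         for j, c in enumerate(row)])
    return vocab, weighted

# Hypothetical example texts standing in for tweets about e-learning.
docs = [
    "online classes are great",
    "online classes are tiring",
    "campus reopening date unknown",
]
vocab, bow = bag_of_words(docs)
_, weights = tf_idf(docs)
great, online = vocab.index("great"), vocab.index("online")
print(bow[0][online], round(weights[0][great], 4), round(weights[0][online], 4))
```

BoW keeps raw counts, whereas TF-IDF down-weights terms that occur in many documents, so a word shared by two of the three texts scores lower than a word unique to one of them.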


Use of Machine Learning and Deep Learning Techniques to Predict Dengue-Related Hospitalizations in Municipalities of Paraíba

10.5753/ercemapi.2021.17914 ◽

2021 ◽

Author(s):

Ewerthon Dyego de Araújo Batista ◽

Wellington Candeia de Araújo ◽

Romeryto Vieira Lira ◽

Laryssa Izabel de Araújo Batista

Keyword(s):

Neural Network ◽

Machine Learning ◽

Deep Learning ◽

Random Forest ◽

Convolutional Neural Network ◽

Support Vector Regression ◽

Multilayer Perceptron ◽

Support Vector

Dengue is a public health problem in Brazil, and cases of the disease have risen again in Paraíba. The epidemiological bulletin of Paraíba, released in August 2021, reports a 53% increase in cases relative to the previous year. Machine Learning (ML) and Deep Learning techniques are being used as tools for predicting the disease and supporting efforts to combat it. Using the Random Forest (RF), Support Vector Regression (SVR), Multilayer Perceptron (MLP), Long Short-Term Memory (LSTM), and Convolutional Neural Network (CNN) techniques, this article presents a system capable of forecasting dengue-related hospitalizations for the cities of Bayeux, Cabedelo, João Pessoa, and Santa Rita. The system produced forecasts for Bayeux with an error rate of 0.5290; the error was 0.92742 for Cabedelo, 9.55288 for João Pessoa, and 0.74551 for Santa Rita.
