Network Intrusion Detection with Auto-encoder and One-class Support Vector Machine

2022 ◽  
Vol 16 (1) ◽  
pp. 0-0

Recent advances in machine learning have shown promising results for detecting network intrusions through supervised learning. However, such techniques are ineffective against new types of attacks. Unsupervised and semi-supervised techniques are preferred in that setting, but they suffer from lower accuracy and higher false-alarm rates. This work proposes a machine learning model that combines an auto-encoder with a one-class support vector machine. In this model, the auto-encoder learns a representation of the input data in a latent space and thereby reduces its dimensionality. The dimensionality-reduced input is then extracted from the auto-encoder and passed to a one-class support vector machine, which classifies each network event as an attack or a normal event. The model is trained on normal network events only. The proposed model is evaluated and compared with several existing models. It achieves high accuracy when tested on the NSL-KDD and KDD99 datasets, with total accuracies of 96.24% and 99.45%, respectively.
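
The pipeline described in this abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: it uses synthetic data rather than NSL-KDD or KDD99, a scikit-learn `MLPRegressor` trained to reproduce its own input as a stand-in auto-encoder, and illustrative values for the bottleneck size (3) and the one-class SVM parameter `nu`.

```python
import numpy as np
from sklearn.neural_network import MLPRegressor
from sklearn.preprocessing import StandardScaler
from sklearn.svm import OneClassSVM

rng = np.random.default_rng(0)

# Synthetic stand-in for network events (NOT NSL-KDD or KDD99).
X_normal = rng.normal(0.0, 1.0, size=(500, 10))
X_attack = rng.normal(6.0, 1.0, size=(50, 10))

# Scale using statistics of the normal events only.
scaler = StandardScaler().fit(X_normal)
Xn = scaler.transform(X_normal)

# Stand-in auto-encoder: an MLP trained to reproduce its input through a
# 3-unit bottleneck (the bottleneck size is an illustrative assumption).
autoencoder = MLPRegressor(hidden_layer_sizes=(3,), activation="tanh",
                           max_iter=500, random_state=0)
autoencoder.fit(Xn, Xn)

def encode(X_scaled):
    """Return the bottleneck (latent) activations for scaled inputs."""
    return np.tanh(X_scaled @ autoencoder.coefs_[0] + autoencoder.intercepts_[0])

# One-class SVM fitted on the latent codes of normal events only.
ocsvm = OneClassSVM(kernel="rbf", gamma="scale", nu=0.05).fit(encode(Xn))

def predict_is_attack(X_raw):
    """True where the one-class SVM flags an event as an outlier."""
    return ocsvm.predict(encode(scaler.transform(X_raw))) == -1

attack_detection_rate = predict_is_attack(X_attack).mean()
false_alarm_rate = predict_is_attack(X_normal).mean()
print(f"flagged attacks: {attack_detection_rate:.2f}, "
      f"false alarms: {false_alarm_rate:.2f}")
```

Only normal events are used to fit the scaler, the auto-encoder, and the one-class SVM, which mirrors the training setup stated in the abstract. The rates printed here describe the synthetic data only and say nothing about the reported results.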

In this paper, we propose a novel supervised machine learning model to predict the polarity of sentiments expressed in microblogs. The proposed model has a stacked neural network structure consisting of Long Short-Term Memory (LSTM) and Convolutional Neural Network (CNN) layers. To capture the long-term dependencies of sentiments in the text ordering of a microblog, the proposed model employs an LSTM layer. The encodings produced by the LSTM layer are then fed to a CNN layer, which generates localized patterns of higher accuracy. These patterns are capable of capturing both local and global long-term dependencies in the text of the microblogs. The proposed model was observed to give improved prediction accuracy compared to semantic, machine learning, and deep neural network approaches such as SVM, CNN, LSTM, and CNN-LSTM. This paper uses the benchmark Stanford Large Movie Review dataset to show the significance of the new approach. The prediction accuracy of the proposed approach is comparable to other state-of-the-art approaches.


Author(s):  
Dhilsath Fathima.M ◽  
S. Justin Samuel ◽  
R. Hari Haran

Aim: This work develops an improved and robust machine learning model for predicting myocardial infarction (MI), which could have substantial clinical impact. Objectives: This paper explains how to build a machine-learning-based computer-aided analysis system for early and accurate prediction of MI, using the Framingham Heart Study dataset for validation and evaluation. The proposed computer-aided analysis model will support medical professionals in predicting myocardial infarction proficiently. Methods: The proposed model uses mean imputation to fill in missing values in the dataset, then applies principal component analysis (PCA) to extract the optimal features and enhance the performance of the classifiers. After PCA, the reduced features are partitioned into a training dataset and a testing dataset: 70% of the data is given as input to four widely used classifiers (support vector machine, k-nearest neighbor, logistic regression, and decision tree) for training, and the remaining 30% is used to evaluate the output of the machine learning model, with the confusion matrix, classifier accuracy, precision, sensitivity, F1-score, and AUC-ROC curve as performance metrics. Results: The outputs of the classifiers are evaluated using these performance measures. We observed that logistic regression provides higher accuracy than the k-NN, SVM, and decision tree classifiers, and that PCA performs well as a feature extraction method for enhancing the performance of the proposed model. From these analyses, we conclude that logistic regression has a good mean accuracy and standard deviation of accuracy compared with the other three algorithms. The AUC-ROC curves of the classifiers (Figures 4 and 5) show that logistic regression exhibits a good AUC-ROC score, around 70%, compared to the k-NN and decision tree algorithms.
Conclusion: From the result analysis, we infer that the proposed machine learning model can act as an optimal decision-making system that predicts acute myocardial infarction at an earlier stage than existing machine-learning-based prediction models. It is capable of predicting the presence of acute myocardial infarction in humans using heart disease risk factors, which helps to decide when to start lifestyle modification and medical treatment to prevent heart disease.
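
The Methods steps (mean imputation, PCA, a 70/30 split, and four classifiers) could be sketched with scikit-learn as below. This is a hedged illustration, not the authors' code: the data is synthetic rather than the Framingham dataset, the number of principal components (5) is arbitrary, and feature scaling before PCA is an added assumption. The sketch also fits the preprocessing on the training split only, which differs from the order described in the abstract (PCA before the split).

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in data (NOT the Framingham dataset), with ~5% missing values.
X, y = make_classification(n_samples=1000, n_features=15, n_informative=5,
                           random_state=0)
rng = np.random.default_rng(0)
X[rng.random(X.shape) < 0.05] = np.nan

# 70% training / 30% testing split.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0, stratify=y)

classifiers = {
    "svm": SVC(),
    "knn": KNeighborsClassifier(),
    "logistic_regression": LogisticRegression(max_iter=1000),
    "decision_tree": DecisionTreeClassifier(random_state=0),
}

accuracy = {}
for name, clf in classifiers.items():
    # Mean imputation -> scaling (an added assumption) -> PCA -> classifier.
    model = make_pipeline(SimpleImputer(strategy="mean"), StandardScaler(),
                          PCA(n_components=5), clf)
    model.fit(X_train, y_train)
    accuracy[name] = model.score(X_test, y_test)

for name, acc in accuracy.items():
    print(f"{name}: {acc:.3f}")
```

Wrapping the steps in a pipeline keeps the imputation means and PCA components from being estimated on test data. The other metrics named in the abstract (precision, sensitivity, F1-score, AUC-ROC) are available in `sklearn.metrics` and are omitted here for brevity.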


Data is the most crucial component of a successful ML system. Once a machine learning model is developed, it becomes obsolete over time because new input data is generated every second. To keep predictions accurate, we need a way to keep our models up to date. Our research involves finding a mechanism that can retrain the model with new data automatically. It also explores the possibilities of automating machine learning processes. We started this project by training and testing our model using conventional machine learning methods. The outcome was then compared with the outcome of experiments conducted using AutoML methods such as TPOT. This helped us find an efficient technique to retrain our models. These techniques can be used in areas where people do not deal with the inner workings of an ML model but only require the outputs of ML processes.


Sensors ◽  
2020 ◽  
Vol 20 (3) ◽  
pp. 800 ◽  
Author(s):  
Irshad Khan ◽  
Seonhwa Choi ◽  
Young-Woo Kwon

Detecting earthquakes using smartphones or IoT devices in real time is an arduous and challenging task, not only because it is subject to hard real-time constraints but also because earthquake signals resemble non-earthquake signals (i.e., noise or other activities). Moreover, the variety of human activities makes detection more difficult when a smartphone is used as an earthquake-detecting sensor. To that end, in this article, we leverage a machine learning technique with earthquake features rather than traditional seismic methods. First, we split the detection task into two categories: static environment and dynamic environment. Then, we experimentally evaluate different features and propose the most appropriate machine learning model and features for the static environment, to tackle the issue of noisy components and detect earthquakes in real time with lower false alarm rates. The proposed model shows promising results not only on the given dataset but also on unseen data, which points to the generalization characteristics of the model. Finally, we demonstrate that the proposed model can also be used in the dynamic environment if it is trained with a different dataset.


2020 ◽  
Vol 28 (4) ◽  
pp. 532-551
Author(s):  
Blake Miller ◽  
Fridolin Linder ◽  
Walter R. Mebane

Supervised machine learning methods are increasingly employed in political science. Such models require costly manual labeling of documents. In this paper, we introduce active learning, a framework in which data to be labeled by human coders are not chosen at random but rather targeted in such a way that the required amount of data to train a machine learning model can be minimized. We study the benefits of active learning using text data examples. We perform simulation studies that illustrate conditions where active learning can reduce the cost of labeling text data. We perform these simulations on three corpora that vary in size, document length, and domain. We find that in cases where the document class of interest is not balanced, researchers can label a fraction of the documents one would need using random sampling (or “passive” learning) to achieve equally performing classifiers. We further investigate how varying levels of intercoder reliability affect the active learning procedures and find that even with low reliability, active learning performs more efficiently than does random sampling.
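
The idea of targeting which documents to label can be illustrated with uncertainty sampling, one common active learning query strategy. The abstract does not say which strategy the authors use, so this choice is an assumption, and the probabilities below are made-up placeholder values for a binary classifier.

```python
def select_most_uncertain(probabilities, batch_size):
    """Return indices of the unlabeled documents whose predicted
    positive-class probability is closest to 0.5."""
    ranked = sorted(range(len(probabilities)),
                    key=lambda i: abs(probabilities[i] - 0.5))
    return ranked[:batch_size]

# Hypothetical classifier outputs for six unlabeled documents.
unlabeled_probs = [0.97, 0.52, 0.10, 0.45, 0.80, 0.03]

# Documents 1 and 3 are the ones the classifier is least sure about,
# so they are sent to the human coders first.
to_label = select_most_uncertain(unlabeled_probs, batch_size=2)
print(to_label)  # [1, 3]
```

In a full loop, the newly labeled documents would be added to the training set, the classifier refitted, and the selection repeated. Random ("passive") sampling corresponds to drawing the indices uniformly instead.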


2020 ◽  
Author(s):  
Jihane Elyahyioui ◽  
Valentijn Pauwels ◽  
Edoardo Daly ◽  
Francois Petitjean ◽  
Mahesh Prakash

Flooding is one of the most common and costly natural hazards at global scale. Flood models are important in supporting flood management. This is a computationally expensive process, due to the high nonlinearity of the equations involved and the complexity of the surface topography. New modelling approaches based on deep learning algorithms have recently emerged for multiple applications.
This study aims to investigate the capacity of machine learning to achieve spatio-temporal flood modelling. The combination of spatial and temporal input data to obtain dynamic results of water levels and flows from a machine learning model on multiple domains for applications in flood risk assessments has not been achieved yet. Here, we develop increasingly complex architectures aimed at interpreting the raw input data of precipitation and terrain to generate essential spatio-temporal variables (water level and velocity fields) and derived products (flood maps) by training these based on hydrodynamic simulations.
An extensive training dataset is generated by solving the 2D shallow water equations on simplified topographies using Lisflood-FP.
As a first task, the machine learning model is trained to reproduce the maximum water depth, using as inputs the precipitation time series and the topographic grid. The models combine the spatial and temporal information through a combination of 1D and 2D convolutional layers, pooling, merging and upscaling. Multiple variations of this generic architecture are trained to determine the best one(s). Overall, the trained models return good results regarding performance indices (mean squared error, mean absolute error and classification accuracy) but fail at predicting the maximum water depths with sufficient precision for practical applications.
A major limitation of this approach is the availability of training examples. As a second task, models will be trained to bring the state of the system (spatially distributed water depth and velocity) from one time step to the next, based on the same inputs as previously, generating the full solution equivalent to that of a hydrodynamic solver. The training database becomes much larger as each pair of consecutive time steps constitutes one training example.
Assuming that a reliable model can be built and trained, such methodology could be applied to build models that are faster and less computationally demanding than hydrodynamic models. Indeed, with the synthetic cases shown here, the simulation times of the machine learning models (< seconds) are far shorter than those of the hydrodynamic model (a few minutes at least). These data-driven models could be used for interpolation and forecasting. The potential for extrapolation beyond the range of training datasets will also be investigated (different topography and high intensity precipitation events).
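
The second task (learning a one-step state update) can be sketched as below, showing how consecutive time steps become training pairs and how a one-step model is applied repeatedly to produce a full trajectory. The array shapes, channel layout, and the placeholder `step_model` are all hypothetical, and the precipitation and topography inputs mentioned in the abstract are omitted for brevity.

```python
import numpy as np

def make_step_pairs(states):
    """Split a simulated sequence into (state at t, state at t+1) pairs."""
    return states[:-1], states[1:]

def rollout(step_model, initial_state, n_steps):
    """Apply a one-step model repeatedly to generate a full trajectory."""
    trajectory = [initial_state]
    for _ in range(n_steps):
        trajectory.append(step_model(trajectory[-1]))
    return np.stack(trajectory)

# Hypothetical simulation output: 6 time steps, 3 channels
# (e.g. water depth and two velocity components) on an 8x8 grid.
states = np.zeros((6, 3, 8, 8))

# One simulation with T time steps yields T - 1 training examples.
inputs, targets = make_step_pairs(states)
print(inputs.shape, targets.shape)

# Placeholder for a trained model; here it simply adds 1 to every cell.
trajectory = rollout(lambda s: s + 1.0, states[0], n_steps=4)
print(trajectory.shape)
```

This makes the data argument in the abstract concrete: a single hydrodynamic run contributes one example to the first task (maximum water depth) but one example per pair of consecutive time steps to the second.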


2021 ◽  
Vol 11 (21) ◽  
pp. 9797
Author(s):  
Solaf A. Hussain ◽  
Nadire Cavus ◽  
Boran Sekeroglu

Obesity, or excessive body fat, causes multiple health problems and diseases. However, obesity treatment and control need an accurate determination of body fat percentage (BFP). The existing methods for BFP estimation require several procedures, which reduces their cost-effectiveness and generalizability. Therefore, developing cost-effective models for BFP estimation is vital for obesity treatment. Machine learning models, particularly hybrid models, have a strong ability to analyze challenging data and perform predictions by combining different characteristics of the models. This study proposed a hybrid machine learning model based on support vector regression and emotional artificial neural networks (SVR-EANNs) for accurate BFP prediction using a primary BFP dataset. SVR was applied as a consistent attribute selection model on seven properties and measurements, using left-out sensitivity analysis, and the regression ability of the EANN was considered in the prediction phase. The proposed model was compared to seven benchmark machine learning models. The obtained results show that the proposed hybrid model (SVR-EANN) outperformed the other machine learning models by achieving superior results in the three considered evaluation metrics. Furthermore, the proposed model suggested that abdominal circumference is a significant factor in BFP prediction, while age has a minor effect.
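
One way to read the attribute selection step is as a leave-one-feature-out sensitivity analysis: refit the SVR with each feature removed and see how much the error grows. The sketch below follows that reading, which is an assumption about the authors' procedure. It uses synthetic data instead of the study's BFP dataset, assumes a linear kernel, and does not cover the EANN prediction stage.

```python
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVR

rng = np.random.default_rng(0)

# Synthetic stand-in for seven measurements (NOT the study's dataset).
# By construction, the target depends strongly on feature 0 and weakly
# on feature 1; the remaining features are pure noise.
X = rng.normal(size=(200, 7))
y = 3.0 * X[:, 0] + 0.2 * X[:, 1] + rng.normal(scale=0.1, size=200)

def cv_mse(features, target):
    """Mean 5-fold cross-validated squared error of a linear-kernel SVR."""
    scores = cross_val_score(SVR(kernel="linear"), features, target,
                             cv=5, scoring="neg_mean_squared_error")
    return -scores.mean()

baseline = cv_mse(X, y)

# Sensitivity of feature j = increase in error when feature j is left out.
sensitivity = [cv_mse(np.delete(X, j, axis=1), y) - baseline
               for j in range(X.shape[1])]
most_important = int(np.argmax(sensitivity))

for j, s in enumerate(sensitivity):
    print(f"feature {j}: error increase {s:.3f}")
```

Features whose removal barely changes the error are candidates for dropping before the prediction stage, while a large increase marks a feature the model relies on.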

