Infrequent Pattern Detection for Reliable Network Traffic Analysis Using Robust Evolutionary Computation

Sensors ◽  
2021 ◽  
Vol 21 (9) ◽  
pp. 3005
Author(s):  
A. N. M. Bazlur Rashid ◽  
Mohiuddin Ahmed ◽  
Al-Sakib Khan Pathan

While anomaly detection is very important in many domains, such as in cybersecurity, there are many rare anomalies or infrequent patterns in cybersecurity datasets. Detection of infrequent patterns is computationally expensive. Cybersecurity datasets consist of many features, mostly irrelevant, resulting in lower classification performance by machine learning algorithms. Hence, a feature selection (FS) approach, i.e., selecting relevant features only, is an essential preprocessing step in cybersecurity data analysis. Despite many FS approaches proposed in the literature, cooperative co-evolution (CC)-based FS approaches can be more suitable for cybersecurity data preprocessing considering the Big Data scenario. Accordingly, in this paper, we have applied our previously proposed CC-based FS with random feature grouping (CCFSRFG) to a benchmark cybersecurity dataset as the preprocessing step. The dataset with original features and the dataset with a reduced number of features were used for infrequent pattern detection. Experimental analysis was performed and evaluated using 10 unsupervised anomaly detection techniques. Therefore, the proposed infrequent pattern detection is termed Unsupervised Infrequent Pattern Detection (UIPD). Then, we compared the experimental results with and without FS in terms of true positive rate (TPR). Experimental analysis indicates that the highest rate of TPR improvement was by cluster-based local outlier factor (CBLOF) of the backdoor infrequent pattern detection, and it was 385.91% when using FS. Furthermore, the highest overall infrequent pattern detection TPR was improved by 61.47% for all infrequent patterns using clustering-based multivariate Gaussian outlier score (CMGOS) with FS.
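The TPR comparison described in this abstract can be illustrated with a minimal sketch. It assumes the reported improvement is a relative percentage change in TPR between the run without FS and the run with FS, which the abstract does not state explicitly. The function names and the counts in the usage lines are hypothetical and are not results from the paper.

```python
def true_positive_rate(true_positives, false_negatives):
    """TPR = TP / (TP + FN): the share of actual anomalies that were detected."""
    total = true_positives + false_negatives
    if total == 0:
        raise ValueError("no positive instances, so TPR is undefined")
    return true_positives / total


def relative_improvement_pct(tpr_without_fs, tpr_with_fs):
    """Relative change in TPR, in percent, measured against the run without FS."""
    if tpr_without_fs == 0:
        raise ValueError("baseline TPR is zero, so relative change is undefined")
    return (tpr_with_fs - tpr_without_fs) / tpr_without_fs * 100.0


# Hypothetical counts for one rare class (illustrative only, not from the paper).
baseline = true_positive_rate(true_positives=20, false_negatives=80)
with_fs = true_positive_rate(true_positives=50, false_negatives=50)
improvement = relative_improvement_pct(baseline, with_fs)
```

Under this reading, a relative improvement above 100% implies a baseline TPR below 0.5, since TPR itself cannot exceed 1.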

Author(s):  
Catherine Cheung ◽  
Julio J. Valdés ◽  
Richard Salas Chavez ◽  
Srishti Sehgal

In this work, the sensor data related to a diesel engine system and specifically its turbocharger subsystem were analyzed. An incident where the turbocharger seized was recorded by dozens of standard turbocharger-related sensors. By training models to distinguish between normal healthy operating conditions and deteriorated conditions, there is an opportunity to develop prognostic and predictive tools to ideally help prevent a similar occurrence in the future. Analysis of this event provides an opportunity to identify changes in equipment indicators with a known outcome. A number of data analysis tools were used to characterize the healthy and deteriorated states of the turbocharger system, including various supervised classification as well as semi-supervised and unsupervised anomaly detection techniques. The leader clustering algorithm was also implemented to reduce the amount of data to train and develop the models. This paper describes the results of this modeling process, validated by testing on healthy data from the same propulsion system and a second distinct one. Although this problem posed challenges due to the severely imbalanced class distribution, the supervised classifiers, in particular Support Vector Machine (SVM) and Random Forest (RF), performed very well in all metrics while the unsupervised anomaly detection models achieved near-perfect accuracy for identifying healthy turbocharger states.
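As a rough illustration of how leader clustering can reduce a training set, the sketch below implements the classic single-pass, first-match variant: each point joins the first leader within a distance threshold, otherwise it becomes a new leader. The abstract does not say which variant, distance measure, or threshold the authors used, so the Euclidean distance, the `threshold` parameter, and the sample points are assumptions.

```python
import math


def leader_clustering(points, threshold):
    """Single-pass leader clustering.

    Returns (leaders, assignments), where assignments[i] is the index of the
    leader that points[i] was attached to.
    """
    leaders = []
    assignments = []
    for point in points:
        for index, leader in enumerate(leaders):
            # Attach the point to the first leader that is close enough.
            if math.dist(point, leader) <= threshold:
                assignments.append(index)
                break
        else:
            # No leader within the threshold: the point starts a new cluster.
            leaders.append(point)
            assignments.append(len(leaders) - 1)
    return leaders, assignments


# Hypothetical 2-D sensor readings (illustrative only).
readings = [(0.0, 0.0), (0.1, 0.0), (5.0, 5.0), (5.1, 5.0), (0.0, 0.2)]
leaders, assignments = leader_clustering(readings, threshold=1.0)
```

Training on the leaders alone keeps one representative per neighbourhood. Note that the result depends on the order in which the points are presented.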


Sensors ◽  
2019 ◽  
Vol 19 (11) ◽  
pp. 2451 ◽  
Author(s):  
Mohsin Munir ◽  
Shoaib Ahmed Siddiqui ◽  
Muhammad Ali Chattha ◽  
Andreas Dengel ◽  
Sheraz Ahmed

The need for robust unsupervised anomaly detection in streaming data is increasing rapidly in the current era of smart devices, where enormous data are gathered from numerous sensors. These sensors record the internal state of a machine, the external environment, and the interaction of machines with other machines and humans. It is of prime importance to leverage this information in order to minimize downtime of machines, or even avoid downtime completely by constant monitoring. Since each device generates a different type of streaming data, it is normally the case that a specific kind of anomaly detection technique performs better than the others depending on the data type. For some types of data and use-cases, statistical anomaly detection techniques work better, whereas for others, deep learning-based techniques are preferred. In this paper, we present a novel anomaly detection technique, FuseAD, which takes advantage of both statistical and deep-learning-based approaches by fusing them together in a residual fashion. The obtained results show an increase in area under the curve (AUC) as compared to state-of-the-art anomaly detection methods when FuseAD is tested on a publicly available dataset (Yahoo Webscope benchmark). The obtained results advocate that this fusion-based technique can obtain the best of both worlds by combining their strengths and complementing their weaknesses. We also perform an ablation study to quantify the contribution of the individual components in FuseAD, i.e., the statistical ARIMA model as well as the deep-learning-based convolutional neural network (CNN) model.


Anomaly detection plays a vital role in data preprocessing and in mining outstanding points for marketing, network sensors, fraud detection, intrusion detection, and stock market analysis. Recent studies have concentrated more on outlier detection for real-time datasets. Anomaly detection research currently focuses on developing innovative machine learning methods and on reducing computation time. Sentiment mining is the process of discovering how people feel about a particular topic. Although many anomaly detection techniques have been proposed, comparative performance evaluations on sentiment mining datasets are lacking. In this study, three popular unsupervised anomaly detection approaches (density-based, statistical, and cluster-based) are evaluated on a movie review sentiment mining dataset. This paper sets a baseline for anomaly detection methods in sentiment mining research. The results show that the density-based method (LOF) suits the movie review sentiment dataset best.
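To make the density-based method concrete, here is a minimal pure-Python sketch of the local outlier factor (LOF) under simplifying assumptions: Euclidean distance, exactly `k` neighbours per point with ties broken by index, and no duplicate points. The function name, the toy points, and the choice of `k` are hypothetical; the abstract does not describe the authors' implementation or parameters.

```python
import math


def lof_scores(points, k):
    """Return one LOF score per point; values well above 1 suggest outliers.

    Simplifications: ties at the k-th distance are broken by index, and
    duplicate points (zero reachability distance) are not handled.
    """
    n = len(points)
    # neighbours[i] holds (distance, index) pairs for the k nearest points to i.
    neighbours = []
    for i in range(n):
        dists = sorted(
            (math.dist(points[i], points[j]), j) for j in range(n) if j != i
        )
        neighbours.append(dists[:k])
    # k-distance: distance from each point to its k-th nearest neighbour.
    k_distance = [nb[-1][0] for nb in neighbours]
    # Local reachability density: inverse of the mean reachability distance.
    lrd = []
    for i in range(n):
        reach = [max(k_distance[j], d) for d, j in neighbours[i]]
        mean_reach = sum(reach) / len(reach)
        lrd.append(1.0 / mean_reach if mean_reach > 0 else float("inf"))
    # LOF: mean ratio of the neighbours' density to the point's own density.
    return [
        sum(lrd[j] / lrd[i] for _, j in neighbours[i]) / len(neighbours[i])
        for i in range(n)
    ]


# Four points on a unit square plus one distant point (illustrative only).
sample = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0), (10.0, 10.0)]
scores = lof_scores(sample, k=2)
```

Points inside a uniformly dense region score close to 1, while an isolated point scores much higher because its neighbours are far denser than it is.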


2010 ◽  
Vol E93-D (9) ◽  
pp. 2544-2554 ◽  
Author(s):  
Jungsuk SONG ◽  
Hiroki TAKAKURA ◽  
Yasuo OKABE ◽  
Daisuke INOUE ◽  
Masashi ETO ◽  
...  

2019 ◽  
Vol 2019 ◽  
pp. 1-10
Author(s):  
Jiazhong Lu ◽  
Fengmao Lv ◽  
Zhongliu Zhuo ◽  
Xiaosong Zhang ◽  
Xiaolei Liu ◽  
...  

Advanced cyberattacks often involve multiple types, layers, and stages, with the goal of deceiving the monitors. Existing anomaly detection systems usually search logs or traffic alone for evidence of attacks but do not analyze the attack process further. For instance, traffic detection methods can detect attack flows only roughly; they fail to reconstruct the attack event process or reveal the current status of network nodes. As a result, they cannot fully model complex multistage attacks. To address these problems, we present Traffic-Log Combined Detection (TLCD), a multistage intrusion analysis system. Inspired by multiplatform intrusion detection techniques, we integrate traffic with network device logs through association rules. TLCD correlates log data with traffic characteristics to reflect the attack process and construct a federated detection platform. Specifically, TLCD can discover the process steps of a cyberattack, reflect the current network status, and reveal the behaviors of normal users. Our experimental results over different cyberattacks demonstrate that TLCD achieves high accuracy and a low false positive rate.


Wireless networks continuously face challenges in the field of information security, which has led to substantial research in the area of intrusion detection. Intrusion detection is performed mainly by signature-based detection and anomaly-based detection. Anomaly-based detection relies on the behavior of the network. One of the major challenges in this domain is to identify and detect malicious nodes in wireless networks. The intrusion detection mechanism has to analyse the behavior of each node in the network by means of the several features the node possesses. Intelligent schemes are needed in such a scenario. This paper takes a standard dataset for studying the features of wireless nodes and reduces the features by applying the most efficient Correlation Attribute feature selection method. Machine learning algorithms are applied to obtain an effective training model, which is then applied to the testing dataset to validate the model. The accuracy of the model is measured by performance parameters such as true positive rate, false positive rate, and ROC area. Neural network, bagging, and the decision tree algorithm RepTree give promising results in comparison with other classification algorithms.
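One plausible reading of correlation-based attribute selection is sketched below: score each feature by the absolute Pearson correlation between its values and the class label, then keep the highest-ranked features. This is an assumption about how the method works, not a description of the authors' tool or settings. The function names and the toy rows are hypothetical.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient; returns 0.0 if either series is constant."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / math.sqrt(var_x * var_y)


def rank_features(rows, labels):
    """Return (feature_index, score) pairs, highest absolute correlation first."""
    n_features = len(rows[0])
    scored = []
    for f in range(n_features):
        column = [row[f] for row in rows]
        scored.append((f, abs(pearson(column, labels))))
    # Sort by descending score; break ties by feature index for determinism.
    scored.sort(key=lambda item: (-item[1], item[0]))
    return scored


def select_top_k(rows, labels, k):
    """Project every row onto the k highest-ranked features."""
    keep = sorted(f for f, _ in rank_features(rows, labels)[:k])
    return [tuple(row[f] for f in keep) for row in rows]


# Toy data: feature 0 mirrors the label, feature 1 is constant (illustrative only).
rows = [(0, 5, 1), (1, 5, 0), (0, 5, 1), (1, 5, 1)]
labels = [0, 1, 0, 1]
ranking = rank_features(rows, labels)
reduced = select_top_k(rows, labels, k=2)
```

A univariate ranking like this ignores redundancy between features, so two highly correlated features can both be selected.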


Author(s):  
Ravinder Ahuja ◽  
Vishal Vivek ◽  
Manika Chandna ◽  
Shivani Virmani ◽  
Alisha Banga

An early diagnosis of insomnia can prevent further medical problems such as anger issues, heart disease, anxiety, depression, and hypertension. Fifteen machine learning algorithms have been applied, and 14 leading factors have been taken into consideration for predicting insomnia. Seven performance parameters (accuracy, kappa, true positive rate, false positive rate, precision, f-measure, and AUC) are used, and the implementation is in Python. The support vector machine gives the highest performance of all algorithms, with an accuracy of 91.6%, an f-measure of 92.13, and a kappa of 0.83. Further, SVM is applied to another dataset of 100 patients, giving an accuracy of 92%. In addition, the variable importance of CART, C5.0, decision tree, random forest, adaptive boost, and XGBoost is analyzed. The analysis shows that insomnia primarily depends on three factors: vision problems, mobility problems, and sleep disorder. This chapter mainly examines the usage and effectiveness of machine learning algorithms in insomnia prediction.
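Most of the performance parameters listed in this abstract follow directly from a binary confusion matrix. The sketch below computes six of them using their standard definitions; AUC is omitted because it needs ranked scores rather than counts. The function name and the example counts are hypothetical and are not results from the chapter.

```python
def binary_metrics(tp, fp, fn, tn):
    """Compute common metrics from binary confusion-matrix counts."""
    n = tp + fp + fn + tn
    accuracy = (tp + tn) / n
    tpr = tp / (tp + fn)          # true positive rate (recall)
    fpr = fp / (fp + tn)          # false positive rate
    precision = tp / (tp + fp)
    f_measure = 2 * precision * tpr / (precision + tpr)
    # Cohen's kappa: observed agreement corrected for chance agreement.
    expected = ((tp + fp) * (tp + fn) + (fn + tn) * (fp + tn)) / (n * n)
    kappa = (accuracy - expected) / (1 - expected)
    return {
        "accuracy": accuracy,
        "tpr": tpr,
        "fpr": fpr,
        "precision": precision,
        "f_measure": f_measure,
        "kappa": kappa,
    }


# Hypothetical counts (illustrative only).
metrics = binary_metrics(tp=40, fp=10, fn=10, tn=40)
```

The sketch assumes every denominator is non-zero; a production version would need to handle empty classes and the degenerate case where expected agreement equals 1.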


2014 ◽  
Vol 2014 ◽  
pp. 1-11 ◽  
Author(s):  
Kirsi Varpa ◽  
Kati Iltanen ◽  
Martti Juhola

Genetic algorithms have been utilized in many complex optimization and simulation tasks because of their powerful search method. In this research we studied whether the classification performance of the attribute weighted methods based on the nearest neighbour search can be improved when using the genetic algorithm in the evolution of attribute weighting. The attribute weights in the starting population were based on the weights set by the application area experts and machine learning methods instead of random weight setting. The genetic algorithm improved the total classification accuracy and the median true positive rate of the attribute weighted k-nearest neighbour method using neighbour’s class-based attribute weighting. With other methods, the changes after genetic algorithm were moderate.
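The idea of evolving attribute weights for a nearest-neighbour classifier can be sketched as follows. This is a generic illustration, not the authors' algorithm: the leave-one-out fitness, elitist selection, one-point crossover, Gaussian mutation, and all parameter values are assumptions. The only detail taken from the abstract is that the starting population is seeded with given weights rather than random ones. The sketch requires `pop_size` of at least 4.

```python
import random
from collections import Counter


def weighted_distance(a, b, weights):
    """Euclidean distance with one non-negative weight per attribute."""
    return sum(w * (x - y) ** 2 for w, x, y in zip(weights, a, b)) ** 0.5


def loo_accuracy(weights, X, y, k=1):
    """Leave-one-out accuracy of a weighted k-nearest-neighbour classifier."""
    correct = 0
    for i, xi in enumerate(X):
        nearest = sorted(
            (weighted_distance(xi, xj, weights), j)
            for j, xj in enumerate(X) if j != i
        )[:k]
        votes = Counter(y[j] for _, j in nearest)
        if votes.most_common(1)[0][0] == y[i]:
            correct += 1
    return correct / len(X)


def evolve_weights(X, y, seed_weights, generations=20, pop_size=12,
                   mutation_sd=0.2, rng=None):
    """Evolve attribute weights starting from seeded (non-random) weights."""
    rng = rng or random.Random(0)
    n = len(seed_weights[0])
    population = [list(w) for w in seed_weights]
    # Pad the population with mutated copies of the seeds (an assumption).
    while len(population) < pop_size:
        base = rng.choice(seed_weights)
        population.append([max(0.0, w + rng.gauss(0, mutation_sd)) for w in base])
    for _ in range(generations):
        ranked = sorted(population, key=lambda w: loo_accuracy(w, X, y),
                        reverse=True)
        elite = ranked[: pop_size // 2]   # elitism: best half survives unchanged
        children = []
        while len(elite) + len(children) < pop_size:
            p1, p2 = rng.sample(elite, 2)
            cut = rng.randrange(1, n) if n > 1 else 0
            child = p1[:cut] + p2[cut:]   # one-point crossover
            children.append(
                [max(0.0, w + rng.gauss(0, mutation_sd)) for w in child]
            )
        population = elite + children
    return max(population, key=lambda w: loo_accuracy(w, X, y))


# Toy data: attribute 0 separates the classes, attribute 1 is noise.
X = [(0.0, 5.0), (0.1, 0.0), (1.0, 5.1), (1.1, 0.1), (0.05, 2.5), (1.05, 2.6)]
y = [0, 0, 1, 1, 0, 1]
seed = [[1.0, 1.0]]
best = evolve_weights(X, y, seed, rng=random.Random(0))
```

Because the best half of each generation is carried over unchanged, the best fitness in the population never falls below that of the seeded weights.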


Forests ◽  
2021 ◽  
Vol 12 (2) ◽  
pp. 194
Author(s):  
Álvaro García Faura ◽  
Dejan Štepec ◽  
Matija Cankar ◽  
Miha Humar

Wood is considered one of the most important construction materials, as well as a natural material prone to degradation, with fungi being the main reason for wood failure in a temperate climate. Visual inspection of wood or other approaches for monitoring are time-consuming, and the incipient stages of decay are not always visible. Thus, visual decay detection and such manual monitoring could be replaced by automated real-time monitoring systems. The capabilities of such systems can range from simple monitoring, periodically reporting data, to the automatic detection of anomalous measurements that may happen due to various environmental or technical reasons. In this paper, we explore the application of Unsupervised Anomaly Detection (UAD) techniques to wood Moisture Content (MC) data. Specifically, data were obtained from a wood construction that was monitored for four years using sensors at different positions. Our experimental results prove the validity of these techniques to detect both artificial and real anomalies in MC signals, encouraging further research to enable their deployment in real use cases.

