A Random Fourier Features based Streaming Algorithm for Anomaly Detection in Large Datasets

Author(s):  
Deena P. Francis ◽  
Kumudha Raimond
Electronics ◽  
2020 ◽  
Vol 9 (7) ◽  
pp. 1164
Author(s):  
João Henriques ◽  
Filipe Caldeira ◽  
Tiago Cruz ◽  
Paulo Simões

Computing and networking systems traditionally record their activity in log files, which have been used for multiple purposes, such as troubleshooting, accounting, post-incident analysis of security breaches, capacity planning and anomaly detection. In earlier systems those log files were processed manually by system administrators, or with the support of basic applications for filtering, compiling and pre-processing the logs for specific purposes. However, as the volume of these log files continues to grow (more logs per system, more systems per domain), it is becoming increasingly difficult to process those logs using traditional tools, especially for less straightforward purposes such as anomaly detection. On the other hand, as systems continue to become more complex, the potential of using large datasets built of logs from heterogeneous sources for detecting anomalies without prior domain knowledge grows. Anomaly detection tools for such scenarios face two challenges. The first is devising appropriate data analysis solutions for effectively detecting anomalies from large data sources, possibly without prior domain knowledge. The second is adopting data processing platforms able to cope with the large datasets and complex data analysis algorithms required for such purposes. In this paper we address those challenges by proposing an integrated scalable framework that aims at efficiently detecting anomalous events on large amounts of unlabeled data logs. Detection is supported by clustering and classification methods that take advantage of parallel computing environments. We validate our approach using the well-known NASA Hypertext Transfer Protocol (HTTP) log datasets. Fourteen features were extracted in order to train a k-means model for separating anomalous and normal events in highly coherent clusters. A second model, making use of the XGBoost system implementing a gradient tree boosting algorithm, uses the previous binary clustered data for producing a set of simple interpretable rules. These rules represent the rationale for generalizing its application over a massive number of unseen events in a distributed computing environment. The classified anomaly events produced by our framework can be used, for instance, as candidates for further forensic and compliance auditing analysis in security management.
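
The two-stage idea in the abstract above (cluster unlabeled events, then learn simple rules from the cluster assignments) can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' framework: synthetic two-feature data stands in for the fourteen features extracted from the NASA HTTP logs, a plain single-machine k-means stands in for the parallel implementation, and a one-threshold decision stump is swapped in for XGBoost. Treating the smaller cluster as the anomalous one is also an assumption made here. All function and variable names are hypothetical.

```python
import random

def sqdist(p, q):
    """Squared Euclidean distance between two feature tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def kmeans(points, k=2, iters=20):
    """Plain Lloyd's k-means with deterministic farthest-point initialization."""
    centroids = [points[0]]
    while len(centroids) < k:
        # Next centroid: the point farthest from all centroids chosen so far.
        centroids.append(max(points, key=lambda p: min(sqdist(p, c) for c in centroids)))
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: index of the nearest centroid for every point.
        labels = [min(range(k), key=lambda c: sqdist(p, centroids[c])) for p in points]
        # Update step: each centroid moves to the mean of its members.
        for c in range(k):
            members = [p for p, l in zip(points, labels) if l == c]
            if members:  # keep the old centroid if a cluster is empty
                centroids[c] = tuple(sum(col) / len(members) for col in zip(*members))
    return centroids, labels

def fit_stump(points, targets):
    """Best single rule 'feature f > threshold t => 1' (stand-in for boosted trees).

    Only the '>' direction is tried, which is enough for this synthetic data.
    Returns (feature_index, threshold, training_accuracy).
    """
    best = None
    for f in range(len(points[0])):
        values = sorted({p[f] for p in points})
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2
            pred = [1 if p[f] > t else 0 for p in points]
            acc = sum(a == b for a, b in zip(pred, targets)) / len(targets)
            if best is None or acc > best[2]:
                best = (f, t, acc)
    return best

# Synthetic stand-in features: (requests per minute, error ratio). Assumed, not from the paper.
rng = random.Random(0)
normal = [(rng.gauss(10, 2), rng.gauss(0.02, 0.005)) for _ in range(200)]
anomalous = [(rng.gauss(200, 10), rng.gauss(0.6, 0.05)) for _ in range(10)]
points = normal + anomalous

# Stage 1: unsupervised split into two clusters.
_, labels = kmeans(points, k=2)
anomaly_cluster = min(range(2), key=labels.count)  # assumption: smaller cluster = anomalies
y = [1 if l == anomaly_cluster else 0 for l in labels]

# Stage 2: learn an interpretable rule that reproduces the cluster labels.
rule = fit_stump(points, y)
print(f"rule: feature {rule[0]} > {rule[1]:.2f} (training accuracy {rule[2]:.2f})")
```

The resulting rule is cheap to evaluate per event, which is the property that makes rule-based generalization over large volumes of unseen events attractive; a real pipeline would also need feature scaling and a held-out evaluation, both omitted here.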


Proceedings ◽  
2020 ◽  
Vol 54 (1) ◽  
pp. 7
Author(s):  
Iñigo López-Riobóo Botana ◽  
Carlos Eiras-Franco ◽  
Amparo Alonso-Betanzos

This work presents EADMNC (Explainable Anomaly Detection on Mixed Numerical and Categorical spaces), a novel approach to address explanation using an anomaly detection algorithm, ADMNC, which provides accurate detections on mixed numerical and categorical input spaces. Our improved algorithm leverages the formulation of the ADMNC model to offer pre-hoc explainability based on CART (Classification and Regression Trees). The explanation is presented as a segmentation of the input data into homogeneous groups that can be described with a few variables, offering supervisors novel information for justifications. To demonstrate scalability and interpretability, we report experimental results on large real-world datasets, focusing on the network intrusion detection domain.


2014 ◽  
Vol 10 (S306) ◽  
pp. 124-130 ◽  
Author(s):  
Hiranya V. Peiris

Abstract: Anomalies drive scientific discovery – they are associated with the cutting edge of the research frontier, and thus typically exploit data in the low signal-to-noise regime. In astronomy, the prevalence of systematics – both “known unknowns” and “unknown unknowns” – combined with increasingly large datasets, the widespread use of ad hoc estimators for anomaly detection, and the “look-elsewhere” effect, can lead to spurious false detections. In this informal note, I argue that anomaly detection leading to discoveries of new physics requires a combination of physical understanding, careful experimental design to avoid confirmation bias, and self-consistent statistical methods. These points are illustrated with several concrete examples from cosmology.
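
As a small worked illustration of the “look-elsewhere” effect mentioned above (a textbook calculation, not taken from the note itself): assume N independent tests, each run at local significance level p, on data that contain no real anomaly. The independence of the tests and the symbols N, p and P_global are assumptions introduced here.

```latex
% Probability that at least one of N independent null tests fires at level p
P_{\text{global}} = 1 - (1 - p)^{N} \approx N p \quad \text{for } N p \ll 1 .

% Example: p = 0.0027 (two-sided 3-sigma threshold), N = 100 places searched
P_{\text{global}} = 1 - 0.9973^{100} \approx 0.24 .
```

Under these assumptions, a result that looks like a 3-sigma anomaly locally would turn up somewhere in roughly one search out of four even when nothing is there.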


The number of internet and network service users keeps growing, and data is generated at very high speed. At the same time, security threats to the internet, enterprise networks and websites are increasing. Anomalies are known to be among the most effective cyber threats on the internet; their number is increasing exponentially, overwhelming the commonly used approaches for anomaly detection and classification. Anomaly detection is used in big data analytics to recognize unexpected behaviour. The defining characteristics of data in network environments are size and dimensionality: the datasets are big, which makes it hard to recognize useful patterns, for example to identify network traffic anomalies in large datasets. Given the enormous growth of computer-network-based facilities, performing fast and efficient anomaly detection is a challenge. Anomaly recognition in big datasets is particularly useful for discovering fraud and abnormal activity. Here, we focus on the problems of anomaly detection and introduce a novel machine learning based anomaly detection technique. The machine learning approach is used to increase detection speed, which is very useful for detecting anomalies in large datasets. We evaluate the proposed framework through experiments on larger datasets and compare it with several existing techniques such as fuzzy, SVM (Support Vector Machine) and PSO (Particle Swarm Optimization). The proposed classifier shows 98% accuracy and a false rate of 0.002%. The experimental results indicate better performance than existing anomaly detection techniques in a big data environment.


2018 ◽  
Vol 18 (1) ◽  
pp. 20-32 ◽  
Author(s):  
Jong-Min Kim ◽  
Jaiwook Baik

2016 ◽  
Vol 136 (3) ◽  
pp. 363-372
Author(s):  
Takaaki Nakamura ◽  
Makoto Imamura ◽  
Masashi Tatedoko ◽  
Norio Hirai
