Robust anomaly detection algorithms for real-time big data: Comparison of algorithms

Summary. In recent years artificial intelligence has advanced considerably, which has driven its adoption in many different areas and made it the focus of much research. This can be attributed mainly to improvements in learning algorithms and to Big Data techniques, which can provide tremendous amounts of training data. The goal of this paper is to summarize the current state of artificial intelligence. We present its history, introduce the terminology used, and show technological areas that use artificial intelligence as a core part of their applications. The paper also introduces the security concerns related to artificial intelligence solutions and highlights how the technology can be used to enhance security in different applications, including cybersecurity. Finally, we present future opportunities and possible improvements. The paper shows a selection of general artificial intelligence applications that demonstrate the wide range of uses of the technology. Many applications are built around artificial intelligence technologies, and there are many services that a developer can use to achieve intelligent behavior.
The foundation of the different approaches is a well-designed learning algorithm, and the key to every learning algorithm is the quality of the data set used during the learning phase. Some applications focus on image processing, such as face detection or other gesture detection to identify a person. Other solutions compare signatures, while others detect objects or license plate numbers (for example, the automatic parking system of an office building). Artificial intelligence and accurate data handling can also be used for anomaly detection in real-time systems; for example, research is ongoing on anomaly detection at the ZalaZone autonomous car test field based on the collected sensor data. There are also more general applications, such as user profiling and automatic content recommendation using behavior analysis techniques. However, artificial intelligence also carries security risks that need to be eliminated before an application is released publicly. One concern is the generation of fake content, which must be detected with other algorithms that focus on small but noticeable differences. It is also essential to protect the data used by the learning algorithm and the logic flow of the solution; network security can help protect these applications. Conversely, artificial intelligence can strengthen the security of a solution, as it is able to detect network anomalies and signs of a security issue, which is why the technology is widely used in IT security to prevent different types of attacks. As Big Data technologies, computational power, and storage capacity grow over time, there is room for improved artificial intelligence solutions that can learn from large, real-time data sets. Advances in sensors can also provide more precise data for different solutions. Finally, advanced natural language processing can help communication between humans and computer-based solutions.
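
The summary mentions anomaly detection on real-time sensor data (such as the ZalaZone research) without describing a method. As a minimal illustration only, and not the approach used in that research, the sketch below flags a reading whose distance from the mean of all earlier readings exceeds a multiple of their standard deviation, updating the statistics online with Welford's algorithm so memory stays constant. The class name and the `threshold` and `warmup` defaults are hypothetical choices.

```python
import math


class StreamingZScoreDetector:
    """Online z-score anomaly detector for a single sensor stream.

    Each reading is compared with the mean and sample standard deviation of
    all earlier readings; the statistics are then updated with Welford's
    algorithm, so memory use stays constant however long the stream runs.
    """

    def __init__(self, threshold: float = 3.0, warmup: int = 5) -> None:
        self.threshold = threshold  # hypothetical default: 3 standard deviations
        self.warmup = max(warmup, 2)  # need >= 2 readings for a sample std
        self.n = 0
        self.mean = 0.0
        self.m2 = 0.0  # running sum of squared deviations from the mean

    def update(self, x: float) -> bool:
        """Return True if x is anomalous, then absorb x into the statistics."""
        is_anomaly = False
        if self.n >= self.warmup:
            std = math.sqrt(self.m2 / (self.n - 1))
            # With zero spread, any change from the mean counts as anomalous.
            is_anomaly = abs(x - self.mean) > self.threshold * std
        # Welford's online update of the mean and M2.
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (x - self.mean)
        return is_anomaly
```

Feeding `[20.1, 20.3, 19.9, 20.0, 20.2, 20.1, 35.0, 20.0]` through one detector flags only the 35.0 reading (the first five readings serve as warm-up). In this sketch flagged readings are still absorbed into the running statistics; a practical detector might exclude them or use a sliding window instead.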

FAAD: A Self-Optimizing Algorithm for Anomaly Detection

The International Arab Journal of Information Technology, Vol 17 (2), pp. 272-280, 2019. DOI: 10.34028/iajit/17/2/16

Author(s): Adeel Hashmi, Tanvir Ahmad

Keyword(s): Big Data, Anomaly Detection, Unsupervised Learning, Outlier Detection, Data Stream, Firefly Algorithm, Learning Approach, Detection Algorithms, Data Points, Optimizing Algorithm

Anomaly (outlier) detection is the process of finding abnormal data points in a dataset or data stream. Most anomaly detection algorithms require setting parameters that significantly affect their performance. These parameters are generally set by trial and error, so performance is compromised when default or random values are used. In this paper, the authors propose a self-optimizing algorithm for anomaly detection based on the firefly meta-heuristic, named Firefly Algorithm for Anomaly Detection (FAAD). The proposed solution is a non-clustering, unsupervised learning approach to anomaly detection. The algorithm is implemented on Apache Spark for scalability, so the solution can also handle big data. Experiments were conducted on various datasets, and the results show that the proposed solution is much more accurate than standard anomaly detection algorithms.
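
The abstract does not give FAAD's objective function or how its parameters are encoded, so the sketch below shows only the generic firefly meta-heuristic the paper builds on: each candidate ("firefly") moves toward every better-scoring one with an attractiveness of beta0 * exp(-gamma * r^2) that decays with squared distance r^2, plus a random step that shrinks over time. It minimizes an arbitrary function over a box; the function name and all default values are assumptions, and applying it to anomaly detection would require an objective that scores a parameter setting, which is not specified here.

```python
import math
import random
from typing import Callable, List, Sequence, Tuple


def firefly_minimize(
    f: Callable[[Sequence[float]], float],
    bounds: List[Tuple[float, float]],
    n_fireflies: int = 15,
    n_iter: int = 100,
    beta0: float = 1.0,
    gamma: float = 1.0,
    alpha: float = 0.2,
    alpha_decay: float = 0.97,
    seed: int = 0,
) -> Tuple[List[float], float]:
    """Minimise f over a box with the basic firefly meta-heuristic."""
    rng = random.Random(seed)
    # Random initial positions inside the bounds; lower cost = brighter firefly.
    xs = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(n_fireflies)]
    costs = [f(x) for x in xs]
    for _ in range(n_iter):
        for i in range(n_fireflies):
            for j in range(n_fireflies):
                if costs[j] < costs[i]:  # j is brighter: move i toward j
                    r2 = sum((a - b) ** 2 for a, b in zip(xs[i], xs[j]))
                    beta = beta0 * math.exp(-gamma * r2)  # attractiveness
                    for d, (lo, hi) in enumerate(bounds):
                        step = beta * (xs[j][d] - xs[i][d])
                        noise = alpha * (rng.random() - 0.5) * (hi - lo)
                        # Clamp the new coordinate back into the search box.
                        xs[i][d] = min(hi, max(lo, xs[i][d] + step + noise))
                    costs[i] = f(xs[i])
        alpha *= alpha_decay  # gradually shift from exploration to refinement
    best = min(range(n_fireflies), key=costs.__getitem__)
    return xs[best], costs[best]
```

For example, `firefly_minimize(lambda x: sum(v * v for v in x), [(-5.0, 5.0), (-5.0, 5.0)])` returns a point near the origin. The current best firefly has no brighter neighbour and therefore never moves, so the best cost found never gets worse from one iteration to the next.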

Building cyber resilience in space assets with real-time autonomous Graph Database Anomaly Detection Algorithms

ASCEND 2020, 2020. DOI: 10.2514/6.2020-4113

Author(s): Sam Adhikari

Keyword(s): Anomaly Detection, Real Time, Graph Database, Detection Algorithms, Time Autonomous

Evaluating Real-Time Anomaly Detection Algorithms -- The Numenta Anomaly Benchmark

2015 IEEE 14th International Conference on Machine Learning and Applications (ICMLA), 2015. DOI: 10.1109/icmla.2015.141. Cited by ~66

Author(s): Alexander Lavin, Subutai Ahmad

Keyword(s): Anomaly Detection, Real Time, Detection Algorithms

BRNADS: Big data real-time node anomaly detection in social networks

2018 2nd International Conference on Inventive Systems and Control (ICISC), 2018. DOI: 10.1109/icisc.2018.8398937

Author(s): H. C. Manjunatha, R Mohanasundaram

Keyword(s): Social Networks, Big Data, Anomaly Detection, Real Time

Multi Comm_Plus: A Community Detection System for Identification of Community in Multi-Dimensional Networks

International Journal of Innovative Research in Computer and Communication Engineering, Vol 4 (05), pp. 9879-9884, 2016. DOI: 10.15680/10.15680/ijircce.2016.0405255

Author(s): Dhanya Sudhakaran, Shini Renjith

Keyword(s): Big Data, Real Time, Community Detection, Data Analytics, Selection Criteria, Large Scale, Detection System, Big Data Analytics, Detection Algorithms, Large Scale Networks

Community detection is a common problem in graph and big data analytics. It consists of finding groups of densely connected nodes that have few connections to nodes outside the group. Identifying communities in large-scale networks in particular is an important task in many scientific domains. Community detection algorithms in the literature have proven less efficient, because they generate communities with noisy interactions. To address this limitation, a system is needed that identifies the best community among multi-dimensional networks based on relevant selection criteria and the dimensionality of entities, thereby eliminating noisy interactions in a real-time environment.
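
The abstract defines communities as densely connected groups with few outside connections but does not describe the Multi Comm_Plus selection criteria. One standard way to quantify that definition, shown here as an assumption rather than the authors' method, is Newman-Girvan modularity, which many community detection algorithms optimize. The sketch below scores a given partition of an undirected, unweighted graph; the function and type names are illustrative.

```python
from collections import defaultdict
from typing import Dict, Hashable, Iterable, Tuple

Edge = Tuple[Hashable, Hashable]


def modularity(edges: Iterable[Edge], community: Dict[Hashable, int]) -> float:
    """Newman-Girvan modularity of a partition of an undirected, unweighted graph.

    Q = sum over communities c of  L_c / m - (d_c / (2 m)) ** 2,
    where m is the number of edges, L_c the number of edges inside c,
    and d_c the total degree of the nodes in c.
    """
    edges = list(edges)
    m = len(edges)
    internal = defaultdict(int)    # L_c: edges with both ends in community c
    degree_sum = defaultdict(int)  # d_c: sum of node degrees in community c
    for u, v in edges:
        cu, cv = community[u], community[v]
        degree_sum[cu] += 1
        degree_sum[cv] += 1
        if cu == cv:
            internal[cu] += 1
    # High Q: many internal edges relative to what random wiring would give.
    return sum(internal[c] / m - (degree_sum[c] / (2 * m)) ** 2 for c in degree_sum)
```

For two triangles joined by a single edge, splitting the graph into the two triangles gives Q = 5/14 (about 0.357), while putting every node in one community gives Q = 0, matching the intuition that the triangles are the natural communities.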

A Real-time Anomaly Detection Algorithm for Taxis Based on Trajectory Big Data

Proceedings of the 3rd International Conference on Data Science and Information Technology, 2020. DOI: 10.1145/3414274.3414511

Author(s): Jiahui Zhu, Yuepeng Chen, Qingwen Fu, Jiawen Zhang

Keyword(s): Big Data, Anomaly Detection, Real Time, Detection Algorithm

Anomaly detection for machinery by using Big Data Real-Time processing and clustering technique

Proceedings of the 2019 3rd International Conference on Big Data Research, 2019. DOI: 10.1145/3372454.3372480

Author(s): Zhuo Wang, Yanghui Zhou, Gangmin Li

Keyword(s): Big Data, Anomaly Detection, Real Time, Real Time Processing, Time Processing, Clustering Technique

Unsupervised Network Anomaly Detection in Real-Time on Big Data

Communications in Computer and Information Science - New Trends in Databases and Information Systems, pp. 197-206, 2015. DOI: 10.1007/978-3-319-23201-0_22. Cited by ~7

Author(s): Juliette Dromard, Gilles Roudière, Philippe Owezarski

Keyword(s): Big Data, Anomaly Detection, Real Time, Network Anomaly Detection

Performance Comparison of Anomaly Detection Algorithms for Streaming Data

2020. DOI: 10.21203/rs.3.rs-28521/v1

Author(s): Zirije Hasani, Jakup Fondaj

Keyword(s): Anomaly Detection, Real Time, Time Series Data, Scale Up, Detection Algorithm, Streaming Data, Series Data, Process Data, Significant Information, Detection Algorithms

Most of today's data are streaming time-series data, in which anomaly detection gives significant information about possible critical situations. Yet detecting anomalies in big streaming data is a difficult task: detectors must acquire and process data in real time, as they occur and even before they are stored, and must instantly raise alarms on potential threats. To meet the need for real-time alarms and unsupervised procedures in massive streaming-data anomaly detection, algorithms have to be robust and have low processing time, possibly at the cost of accuracy. In this work we compare the performance of our proposed anomaly detection algorithm HW-GA [1] with existing methods such as ARIMA [10], Moving Average [11] and Holt-Winters [12]. The algorithms are tested, and the results visualized, in R on three Numenta datasets with known anomalies and on our own e-dnevnik dataset with unknown anomalies. Evaluation is done by comparing the achieved results (algorithm execution time and CPU usage). Our interest is monitoring the streaming log data generated in the national educational network (e-dnevnik), which receives a massive number of online queries, and detecting anomalies in order to scale up performance, prevent network outages, alert on possible attacks, and similar.
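
The abstract names Moving Average [11] as one of the baselines but does not state its exact configuration. A minimal sketch of a moving-average detector, assuming a trailing window and a k-sigma rule (the function name and the `window` and `k` parameters are hypothetical), flags points that deviate too far from the mean of the preceding values. It is not HW-GA, whose details are not given here.

```python
from collections import deque
from statistics import mean, pstdev
from typing import Deque, List, Sequence


def moving_average_anomalies(
    series: Sequence[float], window: int = 10, k: float = 3.0
) -> List[int]:
    """Return the indices of points that deviate from the trailing moving
    average of the previous `window` points by more than k standard deviations."""
    history: Deque[float] = deque(maxlen=window)  # keeps only the last `window` points
    anomalies: List[int] = []
    for i, x in enumerate(series):
        if len(history) == window:  # score only once the window is full
            mu = mean(history)
            sigma = pstdev(history)
            # With zero spread, any change from the mean counts as anomalous.
            if abs(x - mu) > k * sigma:
                anomalies.append(i)
        history.append(x)  # the oldest point drops out automatically
    return anomalies
```

For example, `moving_average_anomalies([10, 11, 10, 12, 11, 10, 11, 12, 10, 11, 30, 11, 10], window=5)` returns `[10]`, the index of the spike. Because flagged points stay in the window, a large spike temporarily widens the threshold for the points that follow it.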
