Pipeline for Real-time Anomaly Detection in Log Data Streams using Apache Kafka and Apache Spark

2018 ◽  
Vol 182 (24) ◽  
pp. 8-13 ◽  
Author(s):  
Poojitha G. ◽  
Sowmyarani C.

2019 ◽  
Vol 15 (6) ◽  
pp. 814-823
Author(s):  
Jakup Fondaj ◽  
Zirije Hasani

Author(s):  
J. C. Whittier ◽  
S. Nittel ◽  
I. Subasinghe

With live streaming sensors and sensor networks, increasingly large numbers of individual sensors are deployed in physical space. Sensor data streams are a fundamentally novel mechanism to deliver observations to information systems. They enable us to represent spatio-temporal continuous phenomena such as radiation accidents, toxic plumes, or earthquakes almost as soon as they happen in the real world. Sensor data streams sample an earthquake discretely, although the earthquake itself is continuous over space and time. Programmers who want to analyze earthquake activity and scope across many streams must write tedious application code to integrate potentially very large sets of asynchronously sampled, concurrent streams. In previous work, we proposed the field stream data model (Liang et al., 2016) for data stream engines. By abstracting the stream of an individual sensor as a temporal field, the model represents the Earth's movement at the sensor position as continuous. This significantly simplifies analysis across many sensors. In this paper, we undertake a feasibility study of using the field stream model and the open source Data Stream Engine (DSE) Apache Spark (Apache Spark, 2017) to implement real-time earthquake event detection with a subset of the 250 GPS sensor data streams of the Southern California Integrated GPS Network (SCIGN). The field-based real-time stream queries compute maximum displacement values over the latest query window of each stream and of spatially neighboring streams to identify earthquake events and their extent. Further, we correlated the detected events with a USGS earthquake event feed. The query results are visualized in real time.
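The windowed query described in this abstract can be illustrated with a small sketch. It is not the authors' Spark implementation or the field stream model itself: it is plain Python that tracks, per sensor, the maximum horizontal displacement over the latest time window. The class name `WindowedMaxDisplacement`, the `(dx, dy)` offset format, and the assumption that samples for each sensor arrive in timestamp order are all hypothetical choices made for illustration.

```python
from collections import defaultdict, deque
from math import hypot


class WindowedMaxDisplacement:
    """Track the maximum displacement per sensor over the latest time window.

    Hypothetical helper for illustration; assumes samples for each sensor
    arrive in non-decreasing timestamp order.
    """

    def __init__(self, window_seconds):
        self.window_seconds = window_seconds
        # sensor_id -> deque of (timestamp, displacement) in arrival order
        self.samples = defaultdict(deque)

    def add(self, sensor_id, timestamp, dx, dy):
        """Record one sample and return the sensor's current windowed maximum."""
        window = self.samples[sensor_id]
        window.append((timestamp, hypot(dx, dy)))
        # Evict samples that fell out of the window ending at the newest sample.
        while window[0][0] <= timestamp - self.window_seconds:
            window.popleft()
        return max(displacement for _, displacement in window)


def sensors_over_threshold(tracker, threshold):
    """Return the sorted ids of sensors whose windowed maximum exceeds threshold."""
    return sorted(
        sensor_id
        for sensor_id, window in tracker.samples.items()
        if window and max(d for _, d in window) > threshold
    )
```

A detector built on this idea might flag an event when several spatially neighboring sensors appear in `sensors_over_threshold` at the same time; how neighbors are chosen and what threshold applies are left open here because the abstract does not specify them.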


2017 ◽  
Vol 11 (2) ◽  
pp. 471-482 ◽  
Author(s):  
Brock Bose ◽  
Bhargav Avasarala ◽  
Srikanta Tirthapura ◽  
Yung-Yu Chung ◽  
Donald Steiner

Author(s):  
Sergio Trilles ◽  
Sven Schade ◽  
Óscar Belmonte ◽  
Joaquín Huerta

2018 ◽  
Vol 14 (10) ◽  
pp. 155014771880330 ◽  
Author(s):  
Li Cheng ◽  
Yijie Wang ◽  
Yong Zhou ◽  
Xingkong Ma

Due to the increasing arrival rate and complex relationships of behavior data streams, detecting sequential behavior anomalies efficiently and accurately has become an emerging challenge. However, most of the existing literature simply calculates the anomaly score for a segmented sequence, and little work investigates data stream segmentation and structural relationships in depth. Moreover, existing studies cannot meet efficiency requirements because of the large number of projected subsequences. In this article, we propose EADetection, an efficient and accurate sequential behavior anomaly detection approach over data streams. EADetection adopts time interval and fuzzy logic–based correlation to segment the event stream adaptively based on a rolling window. Through dynamic projection space–based fast pruning, a large number of repeated patterns are removed to improve detection efficiency. Meanwhile, EADetection calculates the anomaly score by top-k pattern–based abnormal scoring built on a directed loop graph–based storage strategy, which ensures the accuracy of detection. Specifically, we design and implement a streaming anomaly detection system based on EADetection to perform real-time detection. Extensive experiments confirm that EADetection achieves real-time detection and improves accuracy, reducing latency by 36.8% and the false positive rate by 6.4% compared with the existing approach.
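The segmentation step mentioned in this abstract can be sketched in a much simplified form. The code below uses only the time-interval criterion: it starts a new segment whenever the gap between consecutive events exceeds a threshold. It omits the fuzzy logic–based correlation, the pruning, and the scoring, so it should not be read as EADetection itself. The function name `segment_by_gap`, the `(timestamp, label)` event format, and the `max_gap` parameter are assumptions made for illustration.

```python
def segment_by_gap(events, max_gap):
    """Split time-ordered (timestamp, label) events into segments.

    A new segment starts whenever the gap between consecutive timestamps
    exceeds max_gap. Hypothetical helper; assumes events are already sorted
    by timestamp.
    """
    segments = []
    current = []
    last_timestamp = None
    for timestamp, label in events:
        # Close the running segment when the pause before this event is too long.
        if last_timestamp is not None and timestamp - last_timestamp > max_gap:
            segments.append(current)
            current = []
        current.append((timestamp, label))
        last_timestamp = timestamp
    if current:
        segments.append(current)
    return segments
```

An adaptive variant would adjust `max_gap` as the stream evolves rather than fixing it in advance; the abstract does not say how that adjustment is made, so it is left out of this sketch.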

