Online and Unsupervised Anomaly Detection for Streaming Data Using an Array of Sliding Windows and PDDs

Author(s): Lingyu Zhang, Jiabao Zhao, Wei Li

Sensors, 2019, Vol 19 (11), pp. 2451
Author(s): Mohsin Munir, Shoaib Ahmed Siddiqui, Muhammad Ali Chattha, Andreas Dengel, Sheraz Ahmed

The need for robust unsupervised anomaly detection in streaming data is increasing rapidly in the current era of smart devices, where enormous amounts of data are gathered from numerous sensors. These sensors record the internal state of a machine, the external environment, and the interaction of machines with other machines and humans. It is of prime importance to leverage this information in order to minimize machine downtime, or even avoid it completely through constant monitoring. Since each device generates a different type of streaming data, which anomaly detection technique performs best usually depends on the data type. For some types of data and use cases, statistical anomaly detection techniques work better, whereas for others, deep-learning-based techniques are preferred. In this paper, we present a novel anomaly detection technique, FuseAD, which takes advantage of both statistical and deep-learning-based approaches by fusing them together in a residual fashion. The obtained results show an increase in area under the curve (AUC) compared to state-of-the-art anomaly detection methods when FuseAD is tested on a publicly available dataset (Yahoo Webscope benchmark). These results suggest that this fusion-based technique can obtain the best of both worlds by combining the strengths of the two approaches and compensating for their weaknesses. We also perform an ablation study to quantify the contribution of the individual components of FuseAD, i.e., the statistical ARIMA model and the deep-learning-based convolutional neural network (CNN) model.
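The residual fusion idea can be illustrated with a minimal Python sketch. This is not the authors' FuseAD implementation: a least-squares AR(p) model stands in for ARIMA, the CNN is replaced by a zero-valued placeholder correction (hypothetical), the anomaly score is assumed to be the absolute one-step forecast error, and the data are a synthetic sine wave with one injected spike. The helper names (`lagged`, `fit_ar`, `forecast_ar`, `fuse`, `auc`) are our own. AUC is computed from the Mann-Whitney statistic, which equals the area under the ROC curve.

```python
import numpy as np

def lagged(series: np.ndarray, p: int) -> np.ndarray:
    """Design matrix whose row t is [series[t], ..., series[t+p-1], 1]."""
    cols = [series[i:len(series) - p + i] for i in range(p)]
    return np.column_stack(cols + [np.ones(len(series) - p)])

def fit_ar(train: np.ndarray, p: int = 3) -> np.ndarray:
    """Least-squares AR(p) fit; a simple stand-in for the ARIMA component."""
    coef, *_ = np.linalg.lstsq(lagged(train, p), train[p:], rcond=None)
    return coef

def forecast_ar(series: np.ndarray, coef: np.ndarray) -> np.ndarray:
    """One-step-ahead forecasts for series[p:], where p = len(coef) - 1."""
    return lagged(series, len(coef) - 1) @ coef

def fuse(stat_forecast: np.ndarray, correction: np.ndarray) -> np.ndarray:
    """Residual fusion: a learned model supplies only a correction term
    that is added to the statistical forecast."""
    return stat_forecast + correction

def auc(scores, labels) -> float:
    """ROC AUC via the Mann-Whitney statistic (ties count as 1/2)."""
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=bool)
    pos, neg = scores[labels], scores[~labels]
    greater = np.sum(pos[:, None] > neg[None, :])
    ties = np.sum(pos[:, None] == neg[None, :])
    return float((greater + 0.5 * ties) / (len(pos) * len(neg)))

# Toy stream (hypothetical): noisy sine wave with one spike injected at t = 200.
rng = np.random.default_rng(0)
t = np.arange(300)
series = np.sin(2 * np.pi * t / 25) + 0.02 * rng.standard_normal(300)
series[200] += 5.0

p = 3
coef = fit_ar(series[:150], p)           # train on an anomaly-free prefix
stat_fc = forecast_ar(series, coef)      # predictions for series[p:]
correction = np.zeros_like(stat_fc)      # placeholder for a learned model's output
scores = np.abs(series[p:] - fuse(stat_fc, correction))  # score = forecast error

labels = np.zeros(len(scores), dtype=bool)
labels[200 - p] = True                   # scores[j] corresponds to time j + p
print(f"AUC on the toy stream: {auc(scores, labels):.3f}")
```

Because the learned component only has to predict what the statistical forecast misses, a zero correction simply falls back to the statistical model; in a trained system the correction would come from the deep-learning component.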


2020
Author(s): Bo Zhang, Hongyu Zhang, Pablo Moscato

Complex software-intensive systems, especially distributed systems, generate logs for troubleshooting. The logs are text messages that record system events and can help engineers determine a system's runtime status. This paper proposes a novel approach named ADR (Anomaly Detection by workflow Relations) that employs the matrix nullspace to mine numerical relations from log data. The mined relations can be used for both offline and online anomaly detection and facilitate fault diagnosis. We have evaluated ADR on log data collected from two distributed systems, HDFS (Hadoop Distributed File System) and BGL (IBM Blue Gene/L supercomputer system). ADR successfully mined 87 and 669 numerical relations from the logs and used them to detect anomalies with high precision and recall. For online anomaly detection, ADR employs PSO (Particle Swarm Optimization) to find the optimal sliding window size and achieves fast anomaly detection.

The experimental results confirm that ADR is effective for both offline and online anomaly detection.
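The nullspace idea can be sketched in Python under assumptions the abstract does not fix: each row of the matrix holds the event-type counts of one normal session, and a relation is any vector v with counts @ v ≈ 0. The event types, data, helper names (`mine_relations`, `violates`), and tolerances are hypothetical. ADR's actual mining procedure (for example, any preference for sparse or integer-valued relations) and its PSO-tuned sliding windows are not reproduced here.

```python
import numpy as np

def mine_relations(counts: np.ndarray, rtol: float = 1e-10) -> np.ndarray:
    """Return a basis of the right nullspace of `counts`, one relation per column.

    counts: (n_sessions, n_event_types) matrix of event counts from normal logs.
    Every returned column v satisfies counts @ v ≈ 0, i.e. a linear relation
    among event counts that all of the given sessions obey.
    """
    # With full_matrices=True (the default), vt is (n_event_types, n_event_types).
    _, s, vt = np.linalg.svd(counts)
    rank = int(np.sum(s > rtol * s.max()))
    # Right-singular vectors beyond the numerical rank span the nullspace.
    return vt[rank:].T

def violates(session: np.ndarray, relations: np.ndarray, atol: float = 1e-6) -> bool:
    """True if a session's event counts break any mined relation."""
    return bool(np.any(np.abs(session @ relations) > atol))

# Hypothetical event types: [open, write, close]. In normal sessions every
# open is matched by a close, while the number of writes varies freely.
normal = np.array([[1, 3, 1],
                   [2, 5, 2],
                   [1, 0, 1],
                   [3, 2, 3]], dtype=float)
relations = mine_relations(normal)
print(relations.round(3))  # one relation, proportional to [1, 0, -1]
print(violates(np.array([5.0, 1.0, 5.0]), relations))  # balanced open/close
print(violates(np.array([2.0, 4.0, 1.0]), relations))  # a close is missing
```

The SVD basis spans the same space as any set of mined relations but is generally not sparse, so human-readable relations such as count(open) = count(close) may need an extra step to recover when several relations exist at once.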


2021, Vol 23 (1)
Author(s): Shoghag Panjarian, Jozef Madzo, Kelsey Keith, Carolyn M. Slater, Carmen Sapienza, ...

Abstract

Background: DNA methylation alterations have similar patterns in normal aging tissue and in cancer. In this study, we investigated breast tissue-specific age-related DNA methylation alterations and used those methylation sites to identify individuals with outlier phenotypes. Outlier phenotypes are identified by unsupervised anomaly detection algorithms and are defined as individuals whose normal-tissue, age-dependent DNA methylation levels vary dramatically from the population mean.

Methods: We generated whole-genome DNA methylation profiles (GSE160233) on purified epithelial cells and used publicly available Infinium HumanMethylation 450K array datasets (TCGA, GSE88883, GSE69914, GSE101961, and GSE74214) for discovery and validation.

Results: We found that hypermethylation in normal breast tissue is the best predictor of hypermethylation in cancer. Using unsupervised anomaly detection approaches, we found that about 10% of the individuals (39/427) across 6 DNA methylation datasets were outliers for DNA methylation. We also found significantly more outlier samples in normal tissue adjacent to cancer (24/139, 17.3%) than in normal samples (15/288, 5.2%). In addition, we found significant differences between the ages predicted from DNA methylation and the chronological ages among outliers and non-outliers. Accelerated outliers (older predicted age) were more frequent in normal tissue adjacent to cancer (14/17, 82%) than in normal samples from individuals without cancer (3/17, 18%). Furthermore, in matched samples, the epigenome of the outliers in the pre-malignant tissue was as severely altered as in cancer.

Conclusions: A subset of patients with breast cancer has severely altered epigenomes that are characterized by accelerated aging in their normal-appearing tissue. In the future, these DNA methylation sites should be studied further, for example in cell-free DNA, to determine their potential use as biomarkers for early detection of malignant transformation and preventive intervention in breast cancer.
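Flagging outlier individuals from a methylation matrix can be sketched in Python as follows. The abstract does not name the unsupervised anomaly detection algorithms that were used, so the robust z-score rule (median and median absolute deviation per site), the threshold of 3, the helper name `flag_outliers`, and the synthetic cohort are all our assumptions, shown only to illustrate how "levels that vary dramatically from the population" can be turned into a per-individual decision.

```python
import numpy as np

def flag_outliers(beta: np.ndarray, threshold: float = 3.0) -> np.ndarray:
    """Flag individuals (rows) whose methylation at the chosen CpG sites
    (columns) deviates strongly from the population.

    beta: (n_individuals, n_sites) array of methylation beta values in [0, 1].
    Returns a boolean mask with one entry per individual.
    """
    median = np.median(beta, axis=0)
    # Per-site median absolute deviation, scaled by 1.4826 so it estimates
    # the standard deviation for normally distributed data; the floor
    # avoids division by zero at invariant sites.
    mad = np.maximum(1.4826 * np.median(np.abs(beta - median), axis=0), 1e-6)
    robust_z = (beta - median) / mad
    # One score per individual: mean absolute robust z-score over all sites.
    score = np.mean(np.abs(robust_z), axis=1)
    return score > threshold

# Synthetic cohort (hypothetical): 50 individuals x 20 age-related sites,
# with individual 7 hypermethylated across every site.
rng = np.random.default_rng(1)
beta = np.clip(0.5 + 0.02 * rng.standard_normal((50, 20)), 0.0, 1.0)
beta[7] += 0.3
mask = flag_outliers(beta)
print("flagged individuals:", np.flatnonzero(mask))
```

Median and MAD are used instead of mean and standard deviation so that the outliers being searched for do not inflate the reference spread they are measured against.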

