anomaly score
Recently Published Documents



2022 ◽  
Vol 16 (4) ◽  
pp. 1-22
Author(s):  
Siddharth Bhatia ◽  
Rui Liu ◽  
Bryan Hooi ◽  
Minji Yoon ◽  
Kijung Shin ◽  
...  

Given a stream of graph edges from a dynamic graph, how can we assign anomaly scores to edges in an online manner, for the purpose of detecting unusual behavior, using constant time and memory? Existing approaches aim to detect individually surprising edges. In this work, we propose Midas, which focuses on detecting microcluster anomalies, or suddenly arriving groups of suspiciously similar edges, such as lockstep behavior, including denial of service attacks in network traffic data. We further propose Midas-F, to solve the problem by which anomalies are incorporated into the algorithm’s internal states, creating a “poisoning” effect that can allow future anomalies to slip through undetected. Midas-F introduces two modifications: (1) we modify the anomaly scoring function, aiming to reduce the “poisoning” effect of newly arriving edges; (2) we introduce a conditional merge step, which updates the algorithm’s data structures after each time tick, but only if the anomaly score is below a threshold value, also to reduce the “poisoning” effect. Experiments show that Midas-F has significantly higher accuracy than Midas. In general, the algorithms proposed in this work have the following properties: (a) they detect microcluster anomalies while providing theoretical guarantees about the false positive probability; (b) they are online, processing each edge in constant time and constant memory, and they process the data orders of magnitude faster than state-of-the-art approaches; and (c) they provide up to 62% higher area under the receiver operating characteristic curve than state-of-the-art approaches.
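The idea of scoring an edge by how sharply its current-tick count departs from its historical rate can be illustrated with a minimal sketch. This is not the authors' implementation: the class name `EdgeBurstScorer` is hypothetical, the chi-squared-style formula is an assumption chosen for illustration, and exact dictionaries stand in for the constant-memory data structures the abstract describes. The conditional merge step of Midas-F is not modelled.

```python
from collections import defaultdict


class EdgeBurstScorer:
    """Illustrative streaming edge scorer (hypothetical, not the paper's code)."""

    def __init__(self):
        self.total = defaultdict(int)    # edge -> count over all ticks seen so far
        self.current = defaultdict(int)  # edge -> count within the current tick
        self.tick = 1                    # ticks are assumed to start at 1

    def score(self, src, dst, t):
        """Update counts with edge (src, dst) at tick t and return its score."""
        if t > self.tick:
            # A new tick has started: reset the per-tick counts.
            self.current.clear()
            self.tick = t
        edge = (src, dst)
        self.current[edge] += 1
        self.total[edge] += 1
        a = self.current[edge]  # occurrences in the current tick
        s = self.total[edge]    # occurrences over all ticks
        if t == 1:
            return 0.0          # no history to compare against yet
        mean = s / t            # average occurrences per tick
        # Assumed chi-squared-style statistic: large when the current
        # tick's count is far above the edge's historical mean.
        return (a - mean) ** 2 * t * t / (s * (t - 1))


scorer = EdgeBurstScorer()
for t in range(1, 6):
    scorer.score("a", "b", t)            # one edge per tick: steady traffic
burst = [scorer.score("a", "b", 6) for _ in range(10)]
print(burst[-1])                          # score after ten edges in one tick
```

An edge that arrives once per tick scores zero, while a sudden group of identical edges within one tick drives the score up, which is the microcluster behaviour the abstract targets.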


Author(s):  
J Rafael Martínez-Galarza ◽  
Federica B Bianco ◽  
Dennis Crake ◽  
Kushal Tirumala ◽  
Ashish A Mahabal ◽  
...  

Our understanding of the Universe has profited from deliberate, targeted studies of known phenomena, as well as from serendipitous, unexpected discoveries, such as the discovery of a complex variability pattern in the direction of KIC 8462852 (Boyajian’s star). Upcoming surveys, such as the Vera C. Rubin Observatory Legacy Survey of Space and Time (LSST), will explore the parameter space of astrophysical transients at all time scales, and offer the opportunity to discover even more extreme examples of unexpected phenomena. We investigate strategies to identify novel objects and to contextualize them within large time-series data sets in order to facilitate the discovery of new classes of objects, as well as the physical interpretation of their anomalous nature. We develop a method that combines tree-based and manifold-learning algorithms for anomaly detection in order to perform two tasks: 1) identify and rank anomalous objects in a time-domain dataset; and 2) group those anomalies according to their similarity in order to identify analogs. We achieve the latter by combining an anomaly score from a tree-based method with a manifold-learning dimensionality reduction strategy. Clustering in the reduced space allows for the successful identification of anomalies and analogs. We also assess the impact of pre-processing and feature engineering schemes and investigate the astrophysical nature of the objects that our models identify as anomalous by augmenting the Kepler data with Gaia colour and luminosity information. We find that multiple models, used in combination, are a promising strategy to identify novel light curves and light curve families.


Author(s):  
Tom Finck ◽  
David Schinz ◽  
Lioba Grundl ◽  
Rami Eisawy ◽  
Mehmet Yiğitsoy ◽  
...  

Purpose: Advanced machine-learning (ML) techniques can potentially detect the entire spectrum of pathology through deviations from a learned norm. We investigated the utility of a weakly supervised ML tool to detect characteristic findings related to ischemic stroke in head CT and provide subsequent patient triage. Methods: Patients who had undergone non-enhanced head CT at a tertiary care hospital in April 2020 with either no anomalies, subacute or chronic ischemia, lacunar infarcts of the deep white matter or hyperdense vessel signs were retrospectively analyzed. Anomaly detection was performed using a weakly supervised ML classifier. Findings were displayed at the voxel level (heatmap) and pooled to an anomaly score. Thresholds for this score classified patients into i) normal, ii) inconclusive, iii) pathological. Expert-validated radiological reports were considered as ground truth. Test assessment was performed with ROC analysis; inconclusive results were pooled with pathological predictions for accuracy measurements. Results: During the investigation period 208 patients were referred for head CT, of which 111 could be included. Definite ratings into normal/pathological were feasible in 77 (69.4%) patients. Based on anomaly scores, the AUC to differentiate normal from pathological scans was 0.98 (95% CI 0.97–1.00). The sensitivity, specificity, positive and negative predictive values were 100%, 40.6%, 80.6% and 100%, respectively. Conclusion: Our study demonstrates the potential of a weakly supervised anomaly-detection tool to detect stroke findings in head CT. Definite classification into normal/pathological was made with high accuracy in > 2/3 of patients. Anomaly heatmaps further provide guidance towards pathologies, also in cases with inconclusive ratings.
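The three-way triage on a pooled anomaly score, and the pooling of inconclusive results with pathological predictions when computing accuracy measures, can be sketched as follows. The function names, the threshold values and the toy data are all hypothetical; the study's actual cut-offs are not given in the abstract.

```python
def triage(score, low, high):
    """Map an anomaly score to one of three triage classes.

    `low` and `high` are hypothetical thresholds with low <= high.
    """
    if score < low:
        return "normal"
    if score >= high:
        return "pathological"
    return "inconclusive"


def _ratio(numerator, denominator):
    # Return NaN instead of raising when a class is absent from the data.
    return numerator / denominator if denominator else float("nan")


def diagnostic_metrics(labels, truths):
    """Compute sensitivity, specificity, PPV and NPV.

    Inconclusive labels are counted as pathological predictions,
    mirroring the pooling described in the abstract.
    """
    tp = fp = tn = fn = 0
    for label, is_pathological in zip(labels, truths):
        predicted_positive = label != "normal"
        if predicted_positive and is_pathological:
            tp += 1
        elif predicted_positive:
            fp += 1
        elif is_pathological:
            fn += 1
        else:
            tn += 1
    return {
        "sensitivity": _ratio(tp, tp + fn),
        "specificity": _ratio(tn, tn + fp),
        "ppv": _ratio(tp, tp + fp),
        "npv": _ratio(tn, tn + fn),
    }


# Toy scores and ground truth, invented for illustration only.
scores = [0.1, 0.2, 0.5, 0.9, 0.4]
truths = [False, False, True, True, False]
labels = [triage(s, low=0.3, high=0.7) for s in scores]
print(labels)
print(diagnostic_metrics(labels, truths))
```

Counting inconclusive cases as positive keeps sensitivity high at the cost of specificity, which matches the general pattern of the figures reported above.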


2021 ◽  
Author(s):  
Mustafa Can Kara ◽  
Malina Majeran ◽  
Bret Peterson ◽  
Tom Wimberly ◽  
Greg Sinclair

Deepwater wells carry a high risk of sand escaping the reservoir into the production systems. Sand production is a common operational issue which results in potential equipment damage and, in turn, product contamination. Excessive sand erosion causes blockage in tubulars and cavities in downhole equipment (subsea valves, chokes, bends etc.), resulting in maintenance costs for subsea equipment that add up to millions of dollars yearly for operators. In this work, a scalable Machine Learning (ML) model with ready access to historical and real-time feeds of sensor and simulation data is built to develop a predictive solution. The deployed workflow can inform Control Room Operators before significant damage occurs. An anomaly detection architecture, a common unsupervised learning framework for maintenance analytics, is deployed. Anomaly detection models include methods within the scope of dimensionality reduction. Principal Component Analysis (PCA) and Long Short-Term Memory (LSTM) Autoencoders are deployed to tackle the problem through reconstruction of the original input. During the workflow, a threshold is calculated after batch training and passed along with anomaly error scores in real time. An alarm is triggered once the real-time anomaly score passes the threshold calculated during batch training. ML outputs are streamed in near real time to the database. In this study, deployed ML model performance is benchmarked against a Gulf of Mexico (GOM) deepwater well where sanding is known to occur often. The ML model architecture can process data that is captured by the OSI PI historian, predict anomalous sanding events in advance, and is shown to be scalable to other wells in the GOM. It is noted from this study that the streamlined ML architecture and outputs simplify exploratory data analysis and model deployment across Onshore and Offshore Business Units. In addition, sanding stakeholders are notified in advance and can take early mitigative action before significant damage to wellhead or downhole equipment occurs, instead of reacting to a possible sanding event offshore. The novelty of the utilized ML algorithm and process is in the ability to predict sanding anomalies in advance through ML batch training, infer prediction values in near real time, and scale to other assets.
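The pattern of fixing a threshold after batch training and raising an alarm when a real-time anomaly score exceeds it can be sketched as below. The abstract does not say how the threshold is derived; the mean-plus-k-standard-deviations rule, the function names and the sample values here are assumptions for illustration, and the reconstruction errors would in practice come from a PCA or LSTM autoencoder model.

```python
import statistics


def fit_threshold(train_errors, k=3.0):
    """Derive an alarm threshold from batch-training reconstruction errors.

    Assumed rule: mean + k * population standard deviation.
    """
    return statistics.mean(train_errors) + k * statistics.pstdev(train_errors)


def alarms(stream_errors, threshold):
    """Return the indices of real-time errors that exceed the threshold."""
    return [i for i, err in enumerate(stream_errors) if err > threshold]


# Invented reconstruction errors, standing in for model output.
train_errors = [1.0, 1.2, 0.8, 1.1, 0.9]
threshold = fit_threshold(train_errors)

stream_errors = [1.0, 1.3, 2.5, 0.9, 1.5]
print(threshold)
print(alarms(stream_errors, threshold))
```

Keeping the threshold fixed between retraining runs means the real-time path only needs a comparison per sample, which suits a near-real-time feed.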


Author(s):  
Ziyu Ye ◽  
Yuxin Chen ◽  
Haitao Zheng

Anomaly detection presents a unique challenge in machine learning, due to the scarcity of labeled anomaly data. Recent work attempts to mitigate such problems by augmenting training of deep anomaly detection models with additional labeled anomaly samples. However, the labeled data often does not align with the target distribution and introduces harmful bias to the trained model. In this paper, we aim to understand the effect of a biased anomaly set on anomaly detection. Concretely, we view anomaly detection as a supervised learning task where the objective is to optimize the recall at a given false positive rate. We formally study the relative scoring bias of an anomaly detector, defined as the difference in performance with respect to a baseline anomaly detector. We establish the first finite sample rates for estimating the relative scoring bias for deep anomaly detection, and empirically validate our theoretical results on both synthetic and real-world datasets. We also provide an extensive empirical study on how a biased training anomaly set affects the anomaly score function and therefore the detection performance on different anomaly classes. Our study demonstrates scenarios in which the biased anomaly set can be useful or problematic, and provides a solid benchmark for future research.
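The objective named above, recall at a given false positive rate, can be computed from anomaly scores and labels as in the following sketch. The function name and toy data are hypothetical, and the comparison of two detectors at the end is only an informal illustration of a performance difference against a baseline, not the paper's formal definition of relative scoring bias.

```python
def recall_at_fpr(scores, labels, max_fpr):
    """Recall on anomalies at a threshold that keeps FPR <= max_fpr.

    `labels` uses 1 for anomaly and 0 for normal; both classes are
    assumed to be present. Higher scores mean more anomalous.
    """
    normals = sorted((s for s, y in zip(scores, labels) if y == 0), reverse=True)
    anomalies = [s for s, y in zip(scores, labels) if y == 1]
    # Number of normal points we may flag without exceeding max_fpr.
    allowed = int(max_fpr * len(normals))
    # Flag scores strictly above the (allowed + 1)-th highest normal score,
    # so at most `allowed` normal points are flagged.
    threshold = normals[allowed] if allowed < len(normals) else float("-inf")
    flagged = sum(1 for s in anomalies if s > threshold)
    return flagged / len(anomalies)


# Invented scores from two hypothetical detectors on the same data.
labels = [0, 0, 0, 0, 1, 1, 1]
baseline = [0.1, 0.2, 0.3, 0.8, 0.9, 0.5, 0.25]
biased = [0.1, 0.2, 0.3, 0.8, 0.9, 0.28, 0.25]

r_base = recall_at_fpr(baseline, labels, max_fpr=0.25)
r_bias = recall_at_fpr(biased, labels, max_fpr=0.25)
print(r_base, r_bias, r_base - r_bias)
```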


2021 ◽  
Author(s):  
Xiangyu Song ◽  
Sunil Aryal ◽  
Kai Ming Ting ◽  
Zhen Liu ◽  
Bin He

Anomaly detection in hyperspectral images is hampered by redundant bands and by the limited ability of existing methods to exploit spectral-spatial information. In this article, we propose a novel Improved Isolation Forest (IIF) algorithm based on the assumption that anomaly pixels are more susceptible to isolation than the background pixels. The proposed IIF is a modified version of the Isolation Forest (iForest) algorithm, which addresses the poor performance of iForest in detecting local anomalies and in anomaly detection on high-dimensional data. Further, we propose a spectral-spatial anomaly detector based on IIF (SSIIFD) to make full use of global and local information, as well as spectral and spatial information. Specifically, we first apply the Gabor filter to extract spatial features, which are then employed as input to the Relative Mass Isolation Forest (ReMass-iForest) detector to obtain the spatial anomaly score. Next, original images are divided into several homogeneous regions via the Entropy Rate Segmentation (ERS) algorithm, and the preprocessed images are then employed as input to the proposed IIF detector to obtain the spectral anomaly score. Finally, we fuse the spatial and spectral anomaly scores by combining them linearly to predict anomaly pixels. The experimental results on four real hyperspectral data sets demonstrate that the proposed detector outperforms other state-of-the-art methods.
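The final fusion step, a linear combination of per-pixel spatial and spectral anomaly scores, might look like the sketch below. The min-max rescaling, the `weight` parameter and the function names are assumptions added for illustration; the abstract states only that the two scores are combined linearly.

```python
def min_max(values):
    """Rescale values to [0, 1]; a constant input maps to all zeros."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0] * len(values)
    return [(v - lo) / (hi - lo) for v in values]


def fuse_scores(spatial, spectral, weight=0.5):
    """Linearly combine two per-pixel score lists of equal length.

    `weight` (hypothetical) is the share given to the spatial score;
    the spectral score receives 1 - weight.
    """
    s = min_max(spatial)
    p = min_max(spectral)
    return [weight * a + (1.0 - weight) * b for a, b in zip(s, p)]


# Invented scores for three pixels.
spatial = [0.0, 5.0, 10.0]
spectral = [1.0, 1.0, 3.0]
print(fuse_scores(spatial, spectral))
```

Rescaling both inputs first keeps one detector's score range from dominating the sum; pixels whose fused score is highest would then be predicted as anomalies.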


2021 ◽  
Author(s):  
Ensieh Iranmehr ◽  
Ricardo Ferreira ◽  
Tim Böhnert ◽  
Paulo Freitas

Developing a system for early detection of machine damage and failures is an important challenge in industrial maintenance, since it helps to avoid additional costs and downtime. To this end, this paper uses the signal gathered by a sensing system that employs a spintronic sensor to measure the magnetic field around the machine, which reflects the machine's behaviour. By analysing and processing this signal, the paper develops a data-driven method that recognizes signal patterns and subsequently detects anomalies. A challenging task that we overcome in this paper is recognizing relevant signal patterns without any prior knowledge. The algorithm designed for this task is therefore completely unsupervised, which makes it suitable for signals gathered from other types of machines. Using both frequency-domain and time-domain information, the proposed algorithm, which combines signal processing and machine learning techniques, is able to identify relevant signal patterns efficiently. Clustering results on real data gathered by the aforementioned sensor show an accuracy of 99.38% in recognizing patterns. Furthermore, an anomaly score measure is used, and anomalies are detected according to its distribution.


2021 ◽  
Author(s):  
Zhiwei Ma ◽  
Daniel S. Reich ◽  
Sarah Dembling ◽  
Jeff H. Duyn ◽  
Alan P. Koretsky

The UK Biobank (UKB) is a large-scale epidemiological study, and its imaging component focuses on pre-symptomatic participants. Given its large sample size, rare imaging phenotypes within this unique cohort are of interest, as they are often clinically relevant and could be informative for discovering new processes and mechanisms. Identifying these rare phenotypes is often referred to as "anomaly detection", or "outlier detection". However, anomaly detection in neuroimaging has usually been applied in a supervised or semi-supervised manner for clinically defined cohorts of relatively small size. There has been much less work using anomaly detection on large unlabeled cohorts like the UKB. Here we developed a two-level anomaly screening methodology to systematically identify anomalies from ~19,000 UKB subjects. The same method was also applied to ~1,000 young healthy subjects from the Human Connectome Project (HCP). In primary screening, using ventricular, white matter, and gray matter-based imaging phenotypes derived from multimodal MRI, every subject was parameterized with an anomaly score per phenotype to quantitate the degree of abnormality. These anomaly scores were highly robust. Anomaly score distributions of the UKB cohort were all more outlier-prone than those of the HCP cohort of young adults. The approach enabled the assessment of test-retest reliability via the anomaly scores, which ranged from excellent reliability for ventricular volume, white matter lesion volume, and fractional anisotropy, to good reliability for mean diffusivity and cortical thickness. In secondary screening, the anomalies due to data collection/processing errors were eliminated. A subgroup of the remaining anomalies was radiologically reviewed, and a substantial percentage of them (UKB: 90.1%; HCP: 42.9%) had various brain pathologies such as masses, cysts, white matter lesions, infarcts, encephalomalacia, or prominent sulci. The remaining anomalies of the subgroup had unexplained causes and would be interesting for follow-up. Finally, we show that anomaly detection applied to resting-state functional connectivity did not identify any reliable anomalies, which was attributed to the confounding effects of brain-wide signal variation. Together, this study establishes an unsupervised framework for investigating rare individual imaging phenotypes within large heterogeneous cohorts.

