scholarly journals Modified Energy Statistic for Unsupervised Anomaly Detection

Author(s):  
Rupam Mukherjee

For prognostics in industrial applications, the degree of anomaly of a test point from a baseline cluster is estimated using a statistical distance metric. Among different statistical distance metrics, energy distance is an interesting concept based on Newton’s Law of Gravitation, promising simpler computation than classical distance metrics. In this paper, we review the state of the art formulations of energy distance and point out several reasons why they are not directly applicable to the anomaly-detection problem. Thereby, we propose a new energy-based metric called the P-statistic which addresses these issues, is applicable to anomaly detection and retains the computational simplicity of the energy distance. We also demonstrate its effectiveness on a real-life data-set.

Author(s):  
Hesham M. Al-Ammal

Detection of anomalies in a given data set is a vital step in several applications in cybersecurity; including intrusion detection, fraud, and social network analysis. Many of these techniques detect anomalies by examining graph-based data. Analyzing graphs makes it possible to capture relationships, communities, as well as anomalies. The advantage of using graphs is that many real-life situations can be easily modeled by a graph that captures their structure and inter-dependencies. Although anomaly detection in graphs dates back to the 1990s, recent advances in research utilized machine learning methods for anomaly detection over graphs. This chapter will concentrate on static graphs (both labeled and unlabeled), and the chapter summarizes some of these recent studies in machine learning for anomaly detection in graphs. This includes methods such as support vector machines, neural networks, generative neural networks, and deep learning methods. The chapter will reflect the success and challenges of using these methods in the context of graph-based anomaly detection.


2018 ◽  
Vol 66 (4) ◽  
pp. 291-307
Author(s):  
Daniel L. C. Mack ◽  
Gautam Biswas ◽  
Hamed Khorasgani ◽  
Dinkar Mylaraswamy ◽  
Raj Bharadwaj

AbstractFault detection and isolation schemes are designed to detect the onset of adverse events during operations of complex systems, such as aircraft, power plants, and industrial processes. In this paper, we combine unsupervised learning techniques with expert knowledge to develop an anomaly detection method to find previously undetected faults from a large database of flight operations data. The unsupervised learning technique combined with a feature extraction scheme applied to the clusters labeled as anomalous facilitates expert analysis in characterizing relevant anomalies and faults in flight operations. We present a case study using a large flight operations data set, and discuss results to demonstrate the effectiveness of our approach. Our method is general, and equally applicable to manufacturing processes and other industrial applications.


2010 ◽  
Vol 09 (06) ◽  
pp. 935-957 ◽  
Author(s):  
JUNLIN ZHOU ◽  
ALEKSANDAR LAZAREVIC ◽  
KUO-WEI HSU ◽  
JAIDEEP SRIVASTAVA ◽  
YAN FU ◽  
...  

Anomaly detection has recently become an important problem in many industrial and financial applications. Very often, the databases from which anomalies have to be found are located at multiple local sites and cannot be merged due to privacy reasons or communication overhead. In this paper, a novel general framework for distributed anomaly detection is proposed. The proposed method consists of three steps: (i) building local models for distributed data sources with unsupervised anomaly detection methods and computing quality measure of local models; (ii) transforming local unsupervised local models into sharing models; and (iii) reusing sharing models for new data and combining their results by considering both quality and diversity of them to detect anomalies in a global view. In experiments performed on synthetic and real-life large data set, the proposed distributed anomaly detection method achieved prediction performance comparable or even slightly better than the global anomaly detection algorithm applied on the data set obtained when all distributed data set were merged.


Sensors ◽  
2021 ◽  
Vol 21 (7) ◽  
pp. 2532
Author(s):  
Encarna Quesada ◽  
Juan J. Cuadrado-Gallego ◽  
Miguel Ángel Patricio ◽  
Luis Usero

Anomaly Detection research is focused on the development and application of methods that allow for the identification of data that are different enough—compared with the rest of the data set that is being analyzed—and considered anomalies (or, as they are more commonly called, outliers). These values mainly originate from two sources: they may be errors introduced during the collection or handling of the data, or they can be correct, but very different from the rest of the values. It is essential to correctly identify each type as, in the first case, they must be removed from the data set but, in the second case, they must be carefully analyzed and taken into account. The correct selection and use of the model to be applied to a specific problem is fundamental for the success of the anomaly detection study and, in many cases, the use of only one model cannot provide sufficient results, which can be only reached by using a mixture model resulting from the integration of existing and/or ad hoc-developed models. This is the kind of model that is developed and applied to solve the problem presented in this paper. This study deals with the definition and application of an anomaly detection model that combines statistical models and a new method defined by the authors, the Local Transilience Outlier Identification Method, in order to improve the identification of outliers in the sensor-obtained values of variables that affect the operations of wind tunnels. The correct detection of outliers for the variables involved in wind tunnel operations is very important for the industrial ventilation systems industry, especially for vertical wind tunnels, which are used as training facilities for indoor skydiving, as the incorrect performance of such devices may put human lives at risk. In consequence, the use of the presented model for outlier detection may have a high impact in this industrial sector. In this research work, a proof-of-concept is carried out using data from a real installation, in order to test the proposed anomaly analysis method and its application to control the correct performance of wind tunnels.


2021 ◽  
Vol ahead-of-print (ahead-of-print) ◽  
Author(s):  
Jiawei Lian ◽  
Junhong He ◽  
Yun Niu ◽  
Tianze Wang

Purpose The current popular image processing technologies based on convolutional neural network have the characteristics of large computation, high storage cost and low accuracy for tiny defect detection, which is contrary to the high real-time and accuracy, limited computing resources and storage required by industrial applications. Therefore, an improved YOLOv4 named as YOLOv4-Defect is proposed aim to solve the above problems. Design/methodology/approach On the one hand, this study performs multi-dimensional compression processing on the feature extraction network of YOLOv4 to simplify the model and improve the feature extraction ability of the model through knowledge distillation. On the other hand, a prediction scale with more detailed receptive field is added to optimize the model structure, which can improve the detection performance for tiny defects. Findings The effectiveness of the method is verified by public data sets NEU-CLS and DAGM 2007, and the steel ingot data set collected in the actual industrial field. The experimental results demonstrated that the proposed YOLOv4-Defect method can greatly improve the recognition efficiency and accuracy and reduce the size and computation consumption of the model. Originality/value This paper proposed an improved YOLOv4 named as YOLOv4-Defect for the detection of surface defect, which is conducive to application in various industrial scenarios with limited storage and computing resources, and meets the requirements of high real-time and precision.


2021 ◽  
pp. 58-60
Author(s):  
Naziru Fadisanku Haruna ◽  
Ran Vijay Kumar Singh ◽  
Samsudeen Dahiru

In This paper a modied ratio-type estimator for nite population mean under stratied random sampling using single auxiliary variable has been proposed. The expression for mean square error and bias of the proposed estimator are derived up to the rst order of approximation. The expression for minimum mean square error of proposed estimator is also obtained. The mean square error the proposed estimator is compared with other existing estimators theoretically and condition are obtained under which proposed estimator performed better. A real life population data set has been considered to compare the efciency of the proposed estimator numerically.


2021 ◽  
Author(s):  
Annette Dietmaier ◽  
Thomas Baumann

<p>The European Water Framework Directive (WFD) commits EU member states to achieve a good qualitative and quantitative status of all their water bodies.  WFD provides a list of actions to be taken to achieve the goal of good status.  However, this list disregards the specific conditions under which deep (> 400 m b.g.l.) groundwater aquifers form and exist.  In particular, deep groundwater fluid composition is influenced by interaction with the rock matrix and other geofluids, and may assume a bad status without anthropogenic influences. Thus, a new concept with directions of monitoring and modelling this specific kind of aquifers is needed. Their status evaluation must be based on the effects induced by their exploitation. Here, we analyze long-term real-life production data series to detect changes in the hydrochemical deep groundwater characteristics which might be triggered by balneological and geothermal exploitation. We aim to use these insights to design a set of criteria with which the status of deep groundwater aquifers can be quantitatively and qualitatively determined. Our analysis is based on a unique long-term hydrochemical data set, taken from 8 balneological and geothermal sites in the molasse basin of Lower Bavaria, Germany, and Upper Austria. It is focused on a predefined set of annual hydrochemical concentration values. The data range dates back to 1937. Our methods include developing threshold corridors, within which a good status can be assumed, and developing cluster analyses, correlation, and piper diagram analyses. We observed strong fluctuations in the hydrochemical characteristics of the molasse basin deep groundwater during the last decades. Special interest is put on fluctuations that seem to have a clear start and end date, and to be correlated with other exploitation activities in the region. For example, during the period between 1990 and 2020, bicarbonate and sodium values displayed a clear increase, followed by a distinct dip to below-average values and a subsequent return to average values at site F. During the same time, these values showed striking irregularities at site B. Furthermore, we observed fluctuations in several locations, which come close to disqualifying quality thresholds, commonly used in German balneology. Our preliminary results prove the importance of using long-term (multiple decades) time series analysis to better inform quality and quantity assessments for deep groundwater bodies: most fluctuations would stay undetected within a < 5 year time series window, but become a distinct irregularity when viewed in the context of multiple decades. In the next steps, a quality assessment matrix and threshold corridors will be developed, which take into account methods to identify these fluctuations. This will ultimately aid in assessing the sustainability of deep groundwater exploitation and reservoir management for balneological and geothermal uses.</p>


2020 ◽  
Vol 13 (10) ◽  
pp. 1669-1681
Author(s):  
Zijing Tan ◽  
Ai Ran ◽  
Shuai Ma ◽  
Sheng Qin

Pointwise order dependencies (PODs) are dependencies that specify ordering semantics on attributes of tuples. POD discovery refers to the process of identifying the set Σ of valid and minimal PODs on a given data set D. In practice D is typically large and keeps changing, and it is prohibitively expensive to compute Σ from scratch every time. In this paper, we make a first effort to study the incremental POD discovery problem, aiming at computing changes ΔΣ to Σ such that Σ ⊕ ΔΣ is the set of valid and minimal PODs on D with a set Δ D of tuple insertion updates. (1) We first propose a novel indexing technique for inputs Σ and D. We give algorithms to build and choose indexes for Σ and D , and to update indexes in response to Δ D. We show that POD violations w.r.t. Σ incurred by Δ D can be efficiently identified by leveraging the proposed indexes, with a cost dependent on log (| D |). (2) We then present an effective algorithm for computing ΔΣ, based on Σ and identified violations caused by Δ D. The PODs in Σ that become invalid on D + Δ D are efficiently detected with the proposed indexes, and further new valid PODs on D + Δ D are identified by refining those invalid PODs in Σ on D + Δ D. (3) Finally, using both real-life and synthetic datasets, we experimentally show that our approach outperforms the batch approach that computes from scratch, up to orders of magnitude.


Author(s):  
Juma Ibrahim ◽  
Slavko Gajin

Entropy-based network traffic anomaly detection techniques are attractive due to their simplicity and applicability in a real-time network environment. Even though flow data provide only a basic set of information about network communications, they are suitable for efficient entropy-based anomaly detection techniques. However, a recent work reported a serious weakness of the general entropy-based anomaly detection related to its susceptibility to deception by adding spoofed data that camouflage the anomaly. Moreover, techniques for further classification of the anomalies mostly rely on machine learning, which involves additional complexity. We address these issues by providing two novel approaches. Firstly, we propose an efficient protection mechanism against entropy deception, which is based on the analysis of changes in different entropy types, namely Shannon, R?nyi, and Tsallis entropies, and monitoring the number of distinct elements in a feature distribution as a new detection metric. The proposed approach makes the entropy techniques more reliable. Secondly, we have extended the existing entropy-based anomaly detection approach with the anomaly classification method. Based on a multivariate analysis of the entropy changes of multiple features as well as aggregation by complex feature combinations, entropy-based anomaly classification rules were proposed and successfully verified through experiments. Experimental results are provided to validate the feasibility of the proposed approach for practical implementation of efficient anomaly detection and classification method in the general real-life network environment.


2021 ◽  
Author(s):  
Shikha Suman ◽  
Ashutosh Karna ◽  
Karina Gibert

Hierarchical clustering is one of the most preferred choices to understand the underlying structure of a dataset and defining typologies, with multiple applications in real life. Among the existing clustering algorithms, the hierarchical family is one of the most popular, as it permits to understand the inner structure of the dataset and find the number of clusters as an output, unlike popular methods, like k-means. One can adjust the granularity of final clustering to the goals of the analysis themselves. The number of clusters in a hierarchical method relies on the analysis of the resulting dendrogram itself. Experts have criteria to visually inspect the dendrogram and determine the number of clusters. Finding automatic criteria to imitate experts in this task is still an open problem. But, dependence on the expert to cut the tree represents a limitation in real applications like the fields industry 4.0 and additive manufacturing. This paper analyses several cluster validity indexes in the context of determining the suitable number of clusters in hierarchical clustering. A new Cluster Validity Index (CVI) is proposed such that it properly catches the implicit criteria used by experts when analyzing dendrograms. The proposal has been applied on a range of datasets and validated against experts ground-truth overcoming the results obtained by the State of the Art and also significantly reduces the computational cost.


Sign in / Sign up

Export Citation Format

Share Document