Outlier Detection Methods for Uncovering of Critical Events in Historical Phasor Measurement Records

2018 ◽  
Vol 64 ◽  
pp. 08006 ◽  
Author(s):  
Kummerow André ◽  
Nicolai Steffen ◽  
Bretschneider Peter

This survey addresses the uncovering of potential critical events in mixed PMU data sets. An unsupervised procedure based on different outlier detection methods is introduced. To this end, different signal analysis techniques are used to generate features in the time and frequency domains, together with linear and non-linear dimension reduction techniques. This approach enables the exploration of critical grid dynamics in power systems without prior knowledge of existing failure patterns. Furthermore, new failure patterns can be extracted to create training data sets for online detection algorithms.
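The unsupervised procedure described above can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration, not the authors' pipeline: the synthetic frequency signal, the window length of 32 samples, the three features (standard deviation, peak-to-peak range, largest non-DC spectral magnitude), and the median/MAD score. Dimension reduction is left out to keep the example short.

```python
import cmath
import math
import random
import statistics


def window_features(window):
    """Time- and frequency-domain features for one window of samples."""
    n = len(window)
    mean = statistics.fmean(window)
    std = statistics.pstdev(window)
    peak_to_peak = max(window) - min(window)
    # Naive DFT of the mean-removed window; keep the largest non-DC magnitude.
    centered = [x - mean for x in window]
    spectrum = [
        abs(sum(c * cmath.exp(-2j * math.pi * k * t / n)
                for t, c in enumerate(centered))) / n
        for k in range(1, n // 2 + 1)
    ]
    return [std, peak_to_peak, max(spectrum)]


def robust_scores(rows):
    """Largest per-feature deviation from the median, in units of the MAD."""
    columns = list(zip(*rows))
    medians = [statistics.median(c) for c in columns]
    mads = [statistics.median(abs(v - m) for v in c) or 1e-12
            for c, m in zip(columns, medians)]
    return [max(abs(v - m) / d for v, m, d in zip(row, medians, mads))
            for row in rows]


# Hypothetical data: 20 windows of a noisy 50 Hz frequency measurement,
# with an oscillation injected into window 12.
rng = random.Random(0)
signal = [50.0 + rng.gauss(0.0, 0.01) for _ in range(20 * 32)]
for t in range(12 * 32, 13 * 32):
    signal[t] += 0.2 * math.sin(2 * math.pi * t / 8)

windows = [signal[s:s + 32] for s in range(0, len(signal), 32)]
scores = robust_scores([window_features(w) for w in windows])
most_unusual = scores.index(max(scores))
print("most unusual window:", most_unusual)
```

No labelled failure patterns are needed: the score only measures how far a window lies from the bulk of the recorded windows.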

Sensors ◽  
2021 ◽  
Vol 21 (10) ◽  
pp. 3536
Author(s):  
Jakub Górski ◽  
Adam Jabłoński ◽  
Mateusz Heesch ◽  
Michał Dziendzikowski ◽  
Ziemowit Dworakowski

Condition monitoring is an indispensable element of the operation of rotating machinery. In this article, a monitoring system for a parallel gearbox is proposed. A novelty detection approach is used to develop the condition assessment support system, which requires data collected from a healthy structure. The measured signals were processed to extract quantitative indicators sensitive to the types of damage occurring in this kind of structure. The indicator values were used to develop four different novelty detection algorithms. The presented novelty detection models operate on three principles: feature space distance, probability distribution, and input reconstruction. One of the distance-based models is adaptive, adjusting to new data arriving as a stream. The authors test the developed algorithms on experimental and simulation data with a similar distribution, using a training set consisting mainly of samples generated by the simulator. The results presented in the article demonstrate the effectiveness of the trained models on both data sets.
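Of the three principles listed above, the feature-space-distance one is the simplest to sketch. The example below is an illustrative assumption rather than any of the article's four models: healthy indicator vectors form the reference set, the threshold is 1.5 times the largest leave-one-out nearest-neighbour distance, and the "adaptive" variant simply adds accepted samples to the reference set.

```python
import math


def nearest_distance(point, reference):
    """Distance from point to its closest neighbour in the reference set."""
    return min(math.dist(point, r) for r in reference)


def fit_threshold(healthy, margin=1.5):
    """Threshold from leave-one-out nearest-neighbour distances of healthy data."""
    loo = [nearest_distance(p, healthy[:i] + healthy[i + 1:])
           for i, p in enumerate(healthy)]
    return margin * max(loo)


def is_novel(point, healthy, threshold):
    return nearest_distance(point, healthy) > threshold


def adaptive_step(point, healthy, threshold):
    """Classify a streamed sample; absorb it into the reference set if normal."""
    novel = is_novel(point, healthy, threshold)
    if not novel:
        healthy.append(point)
    return novel


# Hypothetical healthy indicators (e.g. two vibration features) on a small grid.
healthy = [(1.0 + 0.1 * i, 2.0 + 0.1 * j) for i in range(3) for j in range(3)]
threshold = fit_threshold(healthy)

print(adaptive_step((1.15, 2.05), healthy, threshold))  # close to healthy data
print(adaptive_step((3.0, 0.5), healthy, threshold))    # far from healthy data
```

Only healthy data is needed for training, which matches the novelty detection setting described in the abstract.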


2017 ◽  
Vol 29 (4) ◽  
pp. 1124-1150 ◽  
Author(s):  
Minnan Luo ◽  
Feiping Nie ◽  
Xiaojun Chang ◽  
Yi Yang ◽  
Alexander G. Hauptmann ◽  
...  

Robust principal component analysis (PCA) is one of the most important dimension-reduction techniques for handling high-dimensional data with outliers. However, most existing robust PCA methods presuppose that the mean of the data is zero and incorrectly use the average of the data as the optimal mean of robust PCA. In fact, this assumption holds only for the squared [Formula: see text]-norm-based traditional PCA. In this letter, we equivalently reformulate the objective of conventional PCA and learn the optimal projection directions by maximizing the sum of projected differences between each pair of instances based on the [Formula: see text]-norm. The proposed method is robust to outliers and also invariant to rotation. More importantly, the reformulated objective not only automatically avoids the calculation of the optimal mean and makes the assumption of centered data unnecessary, but also theoretically connects to the minimization of reconstruction error. To solve the proposed nonsmooth problem, we exploit an efficient optimization algorithm that softens the contributions from outliers by reweighting each data point iteratively. We theoretically analyze the convergence and computational complexity of the proposed algorithm. Extensive experimental results on several benchmark data sets illustrate the effectiveness and superiority of the proposed method.
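One way to see why a pairwise-difference objective can avoid computing a mean is a standard identity for the squared Euclidean norm (assumed here, since the norm symbols are missing from the abstract text): the sum of squared distances over all ordered pairs equals 2n times the sum of squared deviations from the average. The sketch below only checks this identity numerically on made-up points; it is not the letter's algorithm.

```python
def pairwise_sq_sum(points):
    """Sum of squared Euclidean distances over all ordered pairs."""
    return sum(
        sum((a - b) ** 2 for a, b in zip(p, q))
        for p in points
        for q in points
    )


def centered_sq_sum(points):
    """Sum of squared Euclidean deviations from the average point."""
    n = len(points)
    mean = [sum(c) / n for c in zip(*points)]
    return sum(sum((a - m) ** 2 for a, m in zip(p, mean)) for p in points)


# Hypothetical data: three nearby points and one far-away point.
points = [(0.0, 0.0), (2.0, 0.0), (0.0, 2.0), (10.0, 10.0)]
n = len(points)

print(pairwise_sq_sum(points))          # 1088.0
print(2 * n * centered_sq_sum(points))  # 1088.0
```

The left-hand side never refers to a mean, so an objective built from pairwise differences does not require centered data. For norms that are not squared, the average is in general no longer the optimal centre, which is the gap the abstract points to.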


2020 ◽  
Vol 13 (6) ◽  
pp. 2995-3022
Author(s):  
Sini Isokääntä ◽  
Eetu Kari ◽  
Angela Buchholz ◽  
Liqing Hao ◽  
Siegfried Schobesberger ◽  
...  

Abstract. Online analysis with mass spectrometers produces complex data sets, consisting of mass spectra with a large number of chemical compounds (ions). Statistical dimension reduction techniques (SDRTs) are able to condense complex data sets into a more compact form while preserving the information included in the original observations. The general principle of these techniques is to investigate the underlying dependencies of the measured variables by combining variables with similar characteristics into distinct groups, called factors or components. Currently, positive matrix factorization (PMF) is the most commonly exploited SDRT across a range of atmospheric studies, in particular for source apportionment. In this study, we used five different SDRTs in analysing mass spectral data from complex gas- and particle-phase measurements during a laboratory experiment investigating the interactions of gasoline car exhaust and α-pinene. Specifically, we used four factor analysis techniques, namely principal component analysis (PCA), PMF, exploratory factor analysis (EFA) and non-negative matrix factorization (NMF), as well as one clustering technique, partitioning around medoids (PAM). All SDRTs were able to resolve four to five factors from the gas-phase measurements, including an α-pinene precursor factor, two to three oxidation product factors, and a background or car exhaust precursor factor. NMF and PMF provided an additional oxidation product factor, which was not found by other SDRTs. The results from EFA and PCA were similar after applying oblique rotations. For the particle-phase measurements, four factors were discovered with NMF: one primary factor, a mixed-LVOOA factor and two α-pinene secondary-organic-aerosol-derived (SOA-derived) factors. PMF was able to separate two factors: semi-volatile oxygenated organic aerosol (SVOOA) and low-volatility oxygenated organic aerosol (LVOOA). 
PAM was not able to resolve interpretable clusters due to general limitations of clustering methods, as the high degree of fragmentation taking place in the aerosol mass spectrometer (AMS) causes different compounds formed at different stages in the experiment to be detected at the same variable. However, when preliminary analysis is needed, or isomers and mixed sources are not expected, cluster analysis may be a useful tool, as the results are simpler and thus easier to interpret. In the factor analysis techniques, any single ion generally contributes to multiple factors, although EFA and PCA try to minimize this spread. Our analysis shows that different SDRTs put emphasis on different parts of the data, and with only one technique, some interesting data properties may still stay undiscovered. Thus, validation of the acquired results, either by comparing between different SDRTs or applying one technique multiple times (e.g. by resampling the data or giving different starting values for iterative algorithms), is important, as it may protect the user from dismissing unexpected results as “unphysical”.
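As a rough illustration of how a factorization technique splits a data matrix into non-negative factors, the sketch below implements plain NMF with the classic multiplicative update rules for the squared reconstruction error. It is an assumption-laden toy, not the study's setup: the matrix is synthetic, the rank is fixed at 2, and the measurement-uncertainty weighting that PMF uses is not included.

```python
import random


def matmul(a, b):
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(m):
    return [list(r) for r in zip(*m)]


def frobenius_error(v, w, h):
    """Squared reconstruction error of V by the product W H."""
    wh = matmul(w, h)
    return sum((x - y) ** 2 for rv, rw in zip(v, wh) for x, y in zip(rv, rw))


def nmf(v, rank, iterations=200, seed=0, eps=1e-9):
    """Factorize V (time x ions) into W (time x rank) and H (rank x ions)."""
    rng = random.Random(seed)
    w = [[rng.random() + 0.1 for _ in range(rank)] for _ in v]
    h = [[rng.random() + 0.1 for _ in v[0]] for _ in range(rank)]
    for _ in range(iterations):
        # Update factor profiles H, then time series W (multiplicative rules).
        wt = transpose(w)
        num, den = matmul(wt, v), matmul(matmul(wt, w), h)
        h = [[x * a / (b + eps) for x, a, b in zip(hr, nr, dr)]
             for hr, nr, dr in zip(h, num, den)]
        ht = transpose(h)
        num, den = matmul(v, ht), matmul(w, matmul(h, ht))
        w = [[x * a / (b + eps) for x, a, b in zip(wr, nr, dr)]
             for wr, nr, dr in zip(w, num, den)]
    return w, h


# Hypothetical data: two source profiles over four ions, mixed over five times.
profiles = [[1.0, 0.0, 2.0, 0.0], [0.0, 3.0, 0.0, 1.0]]
contributions = [[5.0, 0.0], [4.0, 1.0], [3.0, 2.0], [1.0, 4.0], [0.0, 5.0]]
v = matmul(contributions, profiles)

w0, h0 = nmf(v, rank=2, iterations=0)  # random starting point only
w, h = nmf(v, rank=2)
print(frobenius_error(v, w0, h0), frobenius_error(v, w, h))
```

Changing `seed` gives different starting values, which is one of the repeat-and-compare checks the abstract recommends for iterative algorithms.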


2020 ◽  
Vol 5 (1) ◽  
pp. 1
Author(s):  
Omar Alghushairy ◽  
Raed Alsini ◽  
Terence Soule ◽  
Xiaogang Ma

Outlier detection is a statistical procedure that aims to find suspicious events or items that are different from the normal form of a dataset. It has drawn considerable interest in the field of data mining and machine learning. Outlier detection is important in many applications, including fraud detection in credit card transactions and network intrusion detection. There are two general types of outlier detection: global and local. Global outliers fall outside the normal range for an entire dataset, whereas local outliers may fall within the normal range for the entire dataset, but outside the normal range for the surrounding data points. This paper addresses local outlier detection. The best-known technique for local outlier detection is the Local Outlier Factor (LOF), a density-based technique. There are many LOF algorithms for a static data environment; however, these algorithms cannot be applied directly to data streams, which are an important type of big data. In general, local outlier detection algorithms for data streams are still deficient and better algorithms need to be developed that can effectively analyze the high velocity of data streams to detect local outliers. This paper presents a literature review of local outlier detection algorithms in static and stream environments, with an emphasis on LOF algorithms. It collects and categorizes existing local outlier detection algorithms and analyzes their characteristics. Furthermore, the paper discusses the advantages and limitations of those algorithms and proposes several promising directions for developing improved local outlier detection methods for data streams.
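The difference between global and local outliers can be made concrete with a compact sketch of the Local Outlier Factor for static data. The data set is made up for illustration, and the neighbourhood is simplified to exactly k nearest neighbours (ties broken by index), which is an assumption of this sketch; duplicate points are not handled.

```python
import math


def k_nearest(points, i, k):
    """Indices of the k nearest neighbours of points[i] (ties broken by index)."""
    others = [j for j in range(len(points)) if j != i]
    return sorted(others, key=lambda j: math.dist(points[i], points[j]))[:k]


def lof_scores(points, k=3):
    neighbours = [k_nearest(points, i, k) for i in range(len(points))]
    # k-distance: distance to the k-th nearest neighbour.
    k_dist = [math.dist(points[i], points[nbrs[-1]])
              for i, nbrs in enumerate(neighbours)]

    def local_reachability_density(i):
        reach = [max(k_dist[j], math.dist(points[i], points[j]))
                 for j in neighbours[i]]
        return len(reach) / sum(reach)

    lrd = [local_reachability_density(i) for i in range(len(points))]
    # LOF: average neighbour density divided by the point's own density.
    return [sum(lrd[j] for j in nbrs) / (len(nbrs) * lrd[i])
            for i, nbrs in enumerate(neighbours)]


# Hypothetical data: a dense cluster, a sparse cluster, and one point that
# sits 0.6 away from the dense cluster.
dense = [(0.1 * i, 0.1 * j) for i in range(3) for j in range(3)]
sparse = [(10.0 + 2.0 * i, 10.0 + 2.0 * j) for i in range(3) for j in range(3)]
points = dense + sparse + [(0.8, 0.1)]

scores = lof_scores(points, k=3)
print(scores.index(max(scores)), round(max(scores), 2))
```

The last point is closer to its neighbours (0.6) than the sparse-cluster points are to each other (2.0), so a single global distance threshold would not single it out, while its LOF is far above 1 because its neighbours live in a much denser region.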


2004 ◽  
Vol 13 (04) ◽  
pp. 801-811 ◽  
Author(s):  
CHANG-TIEN LU ◽  
DECHANG CHEN ◽  
YUFENG KOU

A spatial outlier is a spatially referenced object whose non-spatial attribute values are significantly different from the values of its neighborhood. Identification of spatial outliers can lead to the discovery of unexpected, interesting, and useful spatial patterns for further analysis. Previous work in spatial outlier detection focuses on detecting spatial outliers with a single attribute. In this paper, we propose two approaches to discover spatial outliers with multiple attributes. We formulate the multi-attribute spatial outlier detection problem in a general way, provide two effective detection algorithms, and analyze their computational complexity. In addition, using real-world census data, we demonstrate that our approaches can effectively identify local abnormality in large spatial data sets.
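The definition above can be sketched as follows: for each object, compare its attribute vector with the average over its spatial neighbours, standardize the differences per attribute, and sum the squares. This is a simplified stand-in chosen for illustration (k nearest spatial neighbours, no covariance between attributes), not either of the paper's two algorithms, and the grid data is invented.

```python
import math
import statistics


def spatial_outlier_scores(locations, attributes, k=4):
    """Score each object by how far its attributes deviate from its neighbours'."""
    n = len(locations)
    diffs = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        nbrs = sorted(others, key=lambda j: math.dist(locations[i], locations[j]))[:k]
        nbr_mean = [statistics.fmean(attributes[j][a] for j in nbrs)
                    for a in range(len(attributes[i]))]
        diffs.append([x - m for x, m in zip(attributes[i], nbr_mean)])
    # Standardize each attribute's differences, then sum the squared values.
    columns = list(zip(*diffs))
    means = [statistics.fmean(c) for c in columns]
    stds = [statistics.pstdev(c) or 1.0 for c in columns]
    return [sum(((d - m) / s) ** 2 for d, m, s in zip(row, means, stds))
            for row in diffs]


# Hypothetical 4 x 4 grid of regions with two smoothly varying attributes.
locations = [(float(i), float(j)) for i in range(4) for j in range(4)]
attributes = [[10.0 + i + j, 30.0 + i - j] for i in range(4) for j in range(4)]
attributes[6] = [30.0, 50.0]  # region at (1, 2) deviates in both attributes

scores = spatial_outlier_scores(locations, attributes)
print(scores.index(max(scores)))
```

Only the spatial coordinates decide who counts as a neighbour; only the non-spatial attributes enter the score.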


2013 ◽  
Vol 756-759 ◽  
pp. 493-496 ◽  
Author(s):  
Hai Lei Wang ◽  
Wen Bo Li ◽  
Bing Yu Sun

In this paper, a novel support vector clustering (SVC) method for outlier detection is proposed. Outlier detection algorithms have applications in several tasks, such as data mining, data preprocessing, data filter-cleaner, and time series analysis. Traditional outlier detection methods mostly model the data based on its statistical properties, and these approaches are preferred only when a large-scale data set is available. To solve this problem, we focus on establishing the context of the support vector clustering approach for outlier detection. Compared to traditional outlier detection methods, the performance of SVC is not sensitive to the selection of the required parameters. The experimental results demonstrate the efficiency of our method.


Author(s):  
Senol Emir ◽  
Hasan Dincer ◽  
Umit Hacioglu ◽  
Serhat Yuksel

In a data set, an outlier refers to a data point that is considerably different from the others. Detecting outliers provides useful application-specific insights and helps in choosing the right prediction models. Outlier detection (also known as anomaly detection or novelty detection) has been studied in statistics and machine learning for a long time. It is an essential preprocessing step of the data mining process. In this study, the outlier detection step of the data mining process is applied to identify the top 20 outlier firms. Three outlier detection algorithms are applied to fundamental analysis variables of firms listed in Borsa Istanbul for the 2011-2014 period. The results of each algorithm are presented and compared. Findings show that 15 different firms are identified by the three outlier detection methods. KCHOL and SAHOL have the greatest number of appearances among these firms, with 12 observations. By investigating the results, it is concluded that each of the three algorithms produces a different list of outlier firms due to differences in their approaches to outlier detection.


2019 ◽  
Author(s):  
Sini Isokääntä ◽  
Eetu Kari ◽  
Angela Buchholz ◽  
Liqing Hao ◽  
Siegfried Schobesberger ◽  
...  

Abstract. Online analysis with mass spectrometers produces complex data sets, consisting of mass spectra with a large number of chemical compounds (ions). Statistical dimension reduction techniques (SDRTs) are able to condense complex data sets into a more compact form while preserving the information included in the original observations. The general principle of these techniques is to investigate the underlying dependencies of the measured variables by combining variables with similar characteristics into distinct groups, called factors or components. Currently, positive matrix factorization (PMF) is the most commonly exploited SDRT across a range of atmospheric studies, in particular for source apportionment. In this study, we used 5 different SDRTs in analysing mass spectral data from complex gas- and particle-phase measurements during a laboratory experiment investigating the interactions of gasoline car exhaust and α-pinene. Specifically, we used four factor analysis techniques: principal component analysis (PCA), positive matrix factorization (PMF), exploratory factor analysis (EFA), and non-negative matrix factorization (NMF), as well as one clustering technique, partitioning around medoids (PAM). All SDRTs were able to resolve 4–5 factors from the gas-phase measurements, including an α-pinene precursor factor, 2–3 oxidation product factors and a background/car exhaust precursor factor. NMF and PMF provided an additional oxidation product factor, which was not found by other SDRTs. The results from EFA and PCA were similar after applying oblique rotations. For the particle-phase measurements, four factors were discovered with NMF and PMF: one primary factor, a mixed LVOOA factor, and two α-pinene SOA-derived factors.
PAM was not able to resolve interpretable clusters due to general limitations of clustering methods, as the high degree of fragmentation taking place in the AMS causes different compounds formed at different stages in the experiment to be detected at the same variable. However, when preliminary analysis is needed, or isomers and mixed sources are not expected, cluster analysis may be a useful tool as the results are simpler and thus easier to interpret. In the factor analysis techniques, any single ion generally contributes to multiple factors, although EFA and PCA try to minimize this spread. Our analysis shows that different SDRTs put emphasis on different parts of the data, and with only one technique some interesting data properties may still stay undiscovered. Thus, validation of the acquired results either by comparing between different SDRTs or applying one technique multiple times (e.g. by resampling the data or giving different starting values for iterative algorithms) is important as it may protect the user from dismissing unexpected results as unphysical.


2021 ◽  
Vol 11 (24) ◽  
pp. 12073
Author(s):  
Michael Heigl ◽  
Enrico Weigelt ◽  
Dalibor Fiala ◽  
Martin Schramm

Over the past couple of years, machine learning methods, especially outlier detection methods, have become established in the cybersecurity field for detecting network-based anomalies rooted in novel attack patterns. However, the ubiquity of massive, continuously generated data streams poses an enormous challenge to efficient detection schemes and demands fast, memory-constrained online algorithms that are capable of dealing with concept drift. Feature selection plays an important role in improving outlier detection by identifying noisy data that contain irrelevant or redundant features. State-of-the-art work focuses either on unsupervised feature selection for data streams or on (offline) outlier detection. Substantial requirements for combining both fields are derived and compared with existing approaches. The comprehensive review reveals a research gap in unsupervised feature selection for the improvement of outlier detection methods in data streams. Thus, a novel algorithm for Unsupervised Feature Selection for Streaming Outlier Detection, denoted as UFSSOD, is proposed, which is able to perform unsupervised feature selection for the purpose of outlier detection on streaming data. Furthermore, it is able to determine the number of top-performing features by clustering their score values. A generic concept that shows two application scenarios of UFSSOD in conjunction with off-the-shelf online outlier detection algorithms has been derived. Extensive experiments have shown that a promising feature selection mechanism for streaming data is not applicable in the field of outlier detection. Moreover, UFSSOD, as an online-capable algorithm, yields results comparable to a state-of-the-art offline method trimmed for outlier detection.


Entropy ◽  
2020 ◽  
Vol 22 (10) ◽  
pp. 1096
Author(s):  
Agnieszka Nowak-Brzezińska ◽  
Czesław Horyń

The article presents methods for both clustering and outlier detection in complex data, such as rule-based knowledge bases. What distinguishes this work from others is, first, the application of clustering algorithms to rules in domain knowledge bases and, second, the use of outlier detection algorithms to detect unusual rules in knowledge bases. The aim of the paper is to analyze the use of four algorithms for outlier detection in rule-based knowledge bases: Local Outlier Factor (LOF), Connectivity-based Outlier Factor (COF), K-MEANS, and SMALLCLUSTERS. The subject of outlier mining is very important nowadays. Outliers among If-Then rules are unusual rules, which are rare in comparison to others and should be explored by the domain expert as soon as possible. In the research, the authors use the outlier detection methods to find a given number of outliers among the rules (1%, 5%, 10%), while in small groups, the number of outliers covers no more than 5% of the rule cluster. Subsequently, the authors analyze which of seven quality indices, computed for all rules and again after removing the selected outliers, show an improvement in the quality of the rule clusters. In the experimental stage, the authors use six different knowledge bases. The best results (the cluster quality was improved most often) are achieved for two outlier detection algorithms, LOF and COF.
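Selecting a fixed share of outliers with a clustering algorithm can be sketched as follows: cluster the items with k-means, score each item by its distance to the assigned centroid, and report the top fraction. This is an illustrative assumption about how a K-MEANS-based detector could work, not the article's implementation. The points stand in for numeric rule representations, and the starting centroids are passed in explicitly to keep the run deterministic.

```python
import math


def kmeans(points, initial_centroids, iterations=20):
    """Lloyd's algorithm; returns final labels and centroids."""
    centroids = list(initial_centroids)

    def assign():
        return [min(range(len(centroids)), key=lambda c: math.dist(p, centroids[c]))
                for p in points]

    for _ in range(iterations):
        labels = assign()
        for c in range(len(centroids)):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:  # keep the old centroid if a cluster becomes empty
                centroids[c] = tuple(sum(x) / len(members) for x in zip(*members))
    return assign(), centroids


def top_outliers(points, initial_centroids, fraction):
    """Indices of the given fraction of points farthest from their centroid."""
    labels, centroids = kmeans(points, initial_centroids)
    scores = [math.dist(p, centroids[label]) for p, label in zip(points, labels)]
    count = max(1, round(fraction * len(points)))
    return sorted(range(len(points)), key=lambda i: -scores[i])[:count]


# Hypothetical data: two compact groups and one item that fits neither well.
group_a = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0), (0.5, 0.5), (0.5, 0.0)]
group_b = [(x + 10.0, y + 10.0) for x, y in group_a]
points = group_a + group_b + [(4.0, 0.5)]

outliers = top_outliers(points, [points[0], points[6]], fraction=0.05)
print(outliers)
```

With 13 items and a 5% share, one item is reported; raising `fraction` to 0.10 would return more indices, ranked by distance.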

