outlier identification
Recently Published Documents



Sensors ◽  
2021 ◽  
Vol 22 (1) ◽  
pp. 226
Author(s):  
Marek Hermansa ◽  
Michał Kozielski ◽  
Marcin Michalak ◽  
Krzysztof Szczyrba ◽  
Łukasz Wróbel ◽  
...  

In this paper, the problem of the identification of undesirable events is discussed. Such events can be poorly represented in the historical data, and it is often impossible to learn from past examples. The issue is considered in the context of two use cases in which vibration and temperature measurements collected by wireless sensors are analysed. These use cases include crushers at a coal-fired power plant and gantries in a steelworks converter. The motivation for proposing a new predictive maintenance method was the awareness, resulting from cooperation with industry, of the need for a system that works in cold start conditions and does not flood the machine operator with alarms. The proposed solution is based on methods of outlier identification. These methods are applied to the collected data after it has been transformed into a multidimensional feature vector. The novelty of the proposed solution stems from a methodology for the reduction of false positive alarms, which was applied to a system identifying undesirable events. This methodology is based on the adaptation of the system to the analysed data, the interaction with the dispatcher, and the use of an XAI (eXplainable Artificial Intelligence) method. Experiments performed on several data sets showed that the proposed method reduced false alarms by 90.25% on average relative to the stand-alone outlier detection method. The obtained results allowed the developed method to be implemented in a system operating in a real industrial facility. The conducted research may be valuable for systems with a cold start problem, where frequent alarms can lead to discouragement and disregard for the system by the user.


2021 ◽  
pp. 33-58
Author(s):  
Magy Seif El-Nasr ◽  
Truong Huy Nguyen Dinh ◽  
Alessandro Canossa ◽  
Anders Drachen

This chapter focuses on the process of cleaning data and preparing it for further processing. Specifically, the chapter discusses various techniques that you will use, including preprocessing, outlier identification, data consistency, and normalization or standardization of your data. The chapter further discusses different measurement types and which methods can be used for which types. The chapter also discusses ways to deal with issues you may encounter with inconsistent or dirty data. The chapter takes a practical approach by integrating several labs that demonstrate how you can perform these steps on real game data.


Author(s):  
Pietro Coretto

In this paper we study a finite Gaussian mixture model with an additional uniform component whose role is to catch points in the tails of the data distribution. An adaptive constraint enforces a certain level of separation between the Gaussian mixture components and the uniform component representing noise and outliers in the tail of the distribution. The latter makes the proposed tool particularly useful for robust estimation and outlier identification. A constrained ML estimator is introduced, for which existence and consistency are shown. One of the attractive features of the methodology is that the noise level is estimated from the data. We also develop an EM-type algorithm with proven convergence. Based on numerical evidence, we show how the methods developed in this paper are useful for several fundamental data analysis tasks: outlier identification, robust location-scale estimation, clustering, and density estimation.
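The idea of pairing a Gaussian component with a uniform "noise" component can be illustrated with a deliberately simplified sketch. It is not the constrained estimator from the abstract above: it assumes one-dimensional data, a single Gaussian, a uniform density fixed over the observed data range, and no adaptive separation constraint. The function name `em_gaussian_plus_uniform` and all parameter choices are hypothetical.

```python
import math


def em_gaussian_plus_uniform(xs, n_iter=100):
    """Plain EM for one 1-D Gaussian plus a uniform noise component.

    Simplifying assumptions (not taken from the abstract): the uniform
    support is fixed to the observed data range and no separation
    constraint is enforced. Returns (mu, var, pi_noise, resp), where
    resp[i] is the posterior probability that xs[i] is noise.
    """
    lo, hi = min(xs), max(xs)
    if hi == lo:
        raise ValueError("data must not be constant")
    u_density = 1.0 / (hi - lo)              # fixed uniform density
    mu = sorted(xs)[len(xs) // 2]            # start from a middle value
    var = sum((x - mu) ** 2 for x in xs) / len(xs)
    pi_noise = 0.1                           # initial noise proportion
    resp = [0.0] * len(xs)
    for _ in range(n_iter):
        # E-step: posterior probability that each point is noise.
        resp = []
        for x in xs:
            g = (1.0 - pi_noise) * math.exp(-((x - mu) ** 2) / (2.0 * var))
            g /= math.sqrt(2.0 * math.pi * var)
            u = pi_noise * u_density
            resp.append(u / (g + u))
        # M-step: weighted Gaussian updates and a new noise proportion.
        w = [1.0 - r for r in resp]
        w_sum = max(sum(w), 1e-12)
        mu = sum(wi * x for wi, x in zip(w, xs)) / w_sum
        var = sum(wi * (x - mu) ** 2 for wi, x in zip(w, xs)) / w_sum
        var = max(var, 1e-6)                 # floor to avoid collapse
        pi_noise = sum(resp) / len(xs)
    return mu, var, pi_noise, resp


sample = [9.8, 10.1, 10.0, 9.9, 10.2, 10.0, 10.1, 9.9, 30.0]
mu, var, pi_noise, resp = em_gaussian_plus_uniform(sample)
flagged = [i for i, r in enumerate(resp) if r > 0.5]
```

In this toy sample, points whose noise responsibility exceeds 0.5 are treated as outliers; the 0.5 cutoff is an illustrative choice, not a rule stated in the abstract.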


2021 ◽  
Vol 43 (3) ◽  
pp. 160-170
Author(s):  
Sangsu Park ◽  
No-Suk Park ◽  
Seong-su Kim ◽  
Gwirae Jo ◽  
Sukmin Yoon

Objectives: This study was conducted to propose a new methodology for efficiently identifying and removing various outliers that occur in data collected through automated water quality monitoring systems. Water temperature data were collected from the domestic G_water supply system, and the performance of the proposed methodology was tested on these data.
Methods: We applied the following analytical procedure to identify outliers in the water quality data. First, a normality test was performed on the collected data. If the normality condition was satisfied, the Z-score was used; if it was not satisfied, outliers were identified using the quartile method, and the limitations of the existing methodology were analyzed. Second, we decomposed the collected data into intrinsic mode functions using empirical mode decomposition and ensemble empirical mode decomposition, and then considered the occurrence of modal mixing. Finally, a group of intrinsic mode functions was selected using statistical characteristics to identify outliers. In addition, the performance of the method was verified after removing and interpolating outliers using regression analysis and Cook’s distance.
Results and Discussion: Because the water temperature data did not satisfy the normality condition, outlier identification was carried out by applying the modified quartile method. It was confirmed that outliers distributed within the seasonal component could not be identified at all. In the case of empirical mode decomposition, modal mixing occurred because of the effect of outliers. In the case of ensemble empirical mode decomposition, however, modal mixing was resolved and the distinct seasonal components were decomposed as intrinsic mode functions. The intrinsic mode functions were synthesized, and the result showed statistical correlation with the raw water temperature data. After developing a regression model using the synthesized intrinsic mode functions and the raw water temperature data and performing an outlier search based on Cook’s distances, we concluded that various outliers distributed within the seasonal component could be effectively identified.
Conclusions: For satisfactory results to be derived from statistical analysis of the data collected from the automated water quality monitoring system, outlier identification procedures are essential. However, the conventional univariate outlier search method performs poorly on data with strong inherent variability, and the outliers it finds cannot be interpolated. Conversely, the outlier identification method proposed in this study, based on ensemble empirical mode decomposition and regression analysis, shows excellent discrimination performance for outliers distributed in data with strong inherent variability. Moreover, this method reduces the analyst’s dependence on subjective judgment by presenting statistical cutoff criteria. An additional advantage is that data can be interpolated after removing outliers using intrinsic mode functions. Therefore, the outlier search and interpolation method proposed in this study is expected to have greater applicability as a more effective analysis tool than the existing univariate outlier search method.
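The two conventional univariate rules named in the Methods above, the Z-score and the quartile rule, can be sketched as follows. This is a minimal illustration, not the study's procedure: the threshold of 3, the Tukey multiplier of 1.5, the function names, and the sample values are all assumptions, and the study's "modified quartile method" may differ from the plain fences used here.

```python
import statistics


def zscore_outliers(values, threshold=3.0):
    """Indices whose absolute Z-score exceeds the threshold.

    Uses the sample mean and sample standard deviation; the
    threshold of 3 is a common convention, assumed here.
    """
    mean = statistics.fmean(values)
    sd = statistics.stdev(values)
    if sd == 0:
        return []
    return [i for i, v in enumerate(values) if abs(v - mean) / sd > threshold]


def iqr_outliers(values, k=1.5):
    """Indices outside the quartile fences [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return [i for i, v in enumerate(values) if v < lower or v > upper]


# Hypothetical water temperature readings with one spike at index 6.
temps = [12.1, 12.3, 12.2, 12.4, 12.0, 12.3, 25.0, 12.2, 12.1, 12.4]
z_flags = zscore_outliers(temps)
iqr_flags = iqr_outliers(temps)
```

On this small sample the quartile rule flags the spike, while the Z-score rule flags nothing: the spike inflates the standard deviation enough to keep its own Z-score below 3. Neither rule accounts for seasonal structure, which is the limitation the abstract reports for univariate methods.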


2021 ◽  
pp. 1-12
Author(s):  
Anjana Gosain ◽  
Sonika Dahiya

DKIFCM (Density Based Kernelized Intuitionistic Fuzzy C Means) is a newly proposed clustering algorithm based on outlier identification, kernel functions, and the intuitionistic fuzzy approach. DKIFCM is inspired by the Kernelized Intuitionistic Fuzzy C Means (KIFCM) algorithm and addresses its performance issue in the presence of outliers. It first identifies outliers based on the density of the data, and then computes clusters accurately by mapping the data to a high-dimensional feature space. The performance and effectiveness of various algorithms are evaluated on synthetic 2D data sets, such as the Diamond data sets (D10, D12, and D15) and the noisy Dunn data set, as well as on high-dimensional real-world data sets such as Fisher-Iris, Wine, and the Wisconsin Breast Cancer data set. The results of DKIFCM are compared with those of other algorithms such as Fuzzy-C-Means (FCM), Intuitionistic FCM (IFCM), Kernel-Intuitionistic FCM (KIFCM), and density-oriented FCM (DOFCM), and the performance of the proposed algorithm is found to be superior even in the presence of outliers and noise. Key advantages of DKIFCM are outlier identification, robustness to noise, and accurate centroid computation.


2021 ◽  
Vol 151 ◽  
pp. 107947
Author(s):  
Sung-Chan Jang ◽  
Hyunjong Woo ◽  
Jeong-Guk Kim ◽  
Dong-Ju Lee ◽  
Il-Sik Kang ◽  
...  

Author(s):  
Piers Steel ◽  
Sjoerd Beugelsdijk ◽  
Herman Aguinis

Meta-analyses summarize a field’s research base and are therefore highly influential. Despite their value, the standards for an excellent meta-analysis, one that is potentially award-winning, have changed in the last decade. Each step of a meta-analysis is now more formalized, from the identification of relevant articles to coding, moderator analysis, and reporting of results. What was exemplary a decade ago can be somewhat dated today. Using the award-winning meta-analysis by Stahl et al. (Unraveling the effects of cultural diversity in teams: A meta-analysis of research on multicultural work groups. Journal of International Business Studies, 41(4):690–709, 2010) as an exemplar, we adopted a multi-disciplinary approach (e.g., management, psychology, health sciences) to summarize the anatomy (i.e., fundamental components) of a modern meta-analysis, focusing on: (1) data collection (i.e., literature search and screening, coding), (2) data preparation (i.e., treatment of multiple effect sizes, outlier identification and management, publication bias), (3) data analysis (i.e., average effect sizes, heterogeneity of effect sizes, moderator search), and (4) reporting (i.e., transparency and reproducibility, future research directions). In addition, we provide guidelines and a decision-making tree for when even foundational and highly cited meta-analyses should be updated. Based on the latest evidence, we summarize what journal editors and reviewers should expect, authors should provide, and readers (i.e., other researchers, practitioners, and policymakers) should consider about meta-analytic reviews.


2020 ◽  
Vol 0 (0) ◽  
Author(s):  
Zbigniew Wiśniewski ◽  
Marek Hubert Zienkiewicz

The paper presents Msplit estimation as an alternative to methods in the class of robust M-estimation. The analysis showed that Msplit estimation is highly efficient in identifying observations encumbered by gross errors, especially those of small or moderate values, for which the classical methods of robust estimation provide unsatisfactory results. Msplit estimation also shows high robustness to single gross errors of large values. The presented analysis of the robustness of Msplit estimators is chiefly empirical and is based on the examples of a simulated levelling network and a real angular-linear network. Using the Monte Carlo method, mean success rates for outlier identification were determined and the courses of empirical influence functions were specified. The outcomes of the analysis were compared with the corresponding values achieved via selected methods of robust M-estimation.


Mathematics ◽  
2020 ◽  
Vol 8 (12) ◽  
pp. 2156
Author(s):  
Vilijandas Bagdonavičius ◽  
Linas Petkevičius

We propose a simple multiple outlier identification method for parametric location-scale and shape-scale models when the number of possible outliers is not specified. The method is based on a result giving asymptotic properties of extreme z-scores. Robust estimators of the model parameters are used to define the z-scores. An extensive simulation study was done to compare the proposed method with existing methods. For the normal family, the method is compared with the well-known Davies-Gather, Rosner’s, Hawking’s and Bolshev’s multiple outlier identification methods. The choice of an upper limit for the number of possible outliers when applying Rosner’s test is discussed. For other families, the proposed method is compared with a method generalizing the Davies-Gather method. In most situations, the new method has the highest outlier identification power in terms of masking and swamping values. We also created the R package outliersTests for the proposed test.
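The general idea of computing z-scores from robust estimators can be sketched with the median and the median absolute deviation (MAD). This is not the test proposed in the abstract, which derives its critical values from the asymptotic behaviour of extreme z-scores; the fixed cutoff of 3.5 below is a common rule of thumb and, like the function names and sample values, an assumption of this sketch.

```python
import statistics


def robust_zscores(values):
    """Z-scores built from the median and the scaled MAD.

    The factor 1.4826 makes the MAD a consistent estimator of the
    standard deviation under a normal model.
    """
    med = statistics.median(values)
    mad = statistics.median(abs(v - med) for v in values)
    if mad == 0:
        raise ValueError("MAD is zero; robust z-scores are undefined")
    scale = 1.4826 * mad
    return [(v - med) / scale for v in values]


def flag_outliers(values, cutoff=3.5):
    """Indices whose absolute robust z-score exceeds the cutoff."""
    return [i for i, z in enumerate(robust_zscores(values)) if abs(z) > cutoff]


# Hypothetical sample with two gross errors at indices 6 and 7.
sample = [9.8, 10.1, 10.0, 9.9, 10.2, 10.0, 30.0, 31.0]
flags = flag_outliers(sample)
```

Because the median and MAD are barely affected by the two large values, both are flagged here. A mean-and-standard-deviation z-score would be inflated by the outliers themselves, which is the masking effect the abstract refers to.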

