The utility of multivariate outlier detection techniques for data quality evaluation in large studies: an application within the ONDRI project

Knowledge extraction, data mining, e-learning, and web application platforms use heterogeneous and distributed data. The proliferation of these multifaceted platforms poses many challenges, such as high scalability, the coexistence of complex similarity metrics, and the need for data quality evaluation. In this study, an extended, complete formal taxonomy and a set of algorithms for detecting and correcting contextual data quality anomalies were developed and implemented on structured data. Our methods detected and corrected more data anomalies than existing taxonomy techniques, and also highlighted a drawback of the Support Vector Machine (SVM). The proposed techniques will therefore be relevant to the detection and correction of errors in large contextual data (Big Data).

Outlier Detection for Sensor Systems (ODSS): A MATLAB Macro for Evaluating Microphone Sensor Data Quality

Sensors ◽

10.3390/s17102329 ◽

2017 ◽

Vol 17 (10) ◽

pp. 2329 ◽

Cited By ~ 2

Author(s):

Robert Vasta ◽

Ian Crandell ◽

Anthony Millican ◽

Leanna House ◽

Eric Smith

Keyword(s):

Data Quality ◽

Outlier Detection ◽

Sensor Data ◽

Sensor Systems

Using Machine Learning for Dependable Outlier Detection in Environmental Monitoring Systems

ACM Transactions on Cyber-Physical Systems ◽

10.1145/3445812 ◽

2021 ◽

Vol 5 (3) ◽

pp. 1-30

Author(s):

Gonçalo Jesus ◽

António Casimiro ◽

Anabela Oliveira

Keyword(s):

Machine Learning ◽

Environmental Monitoring ◽

Data Quality ◽

Outlier Detection ◽

Prediction Models ◽

Sensor Data ◽

Natural Phenomenon ◽

Monitoring Systems ◽

Data Errors ◽

Redundant Data

Sensor platforms used in environmental monitoring applications are often subject to harsh environmental conditions while monitoring complex phenomena. Therefore, designing dependable monitoring systems is challenging given the external disturbances affecting sensor measurements. Even the apparently simple task of outlier detection in sensor data becomes a hard problem, amplified by the difficulty of distinguishing true data errors due to sensor faults from deviations due to natural phenomena, which can look like data errors. Existing solutions for runtime outlier detection typically assume that the physical processes can be accurately modeled, or that outliers consist of large deviations that are easily detected and filtered by appropriate thresholds. Other solutions assume that it is possible to deploy multiple sensors providing redundant data to support voting-based techniques. In this article, we propose a new methodology for dependable runtime detection of outliers in environmental monitoring systems, aiming to increase data quality by treating these outliers. We propose the use of machine learning techniques to model each sensor's behavior, exploiting the existence of correlated data provided by other related sensors. Using these models, along with knowledge of processed past measurements, it is possible to obtain accurate estimations of the observed environment parameters and build failure detectors that use these estimations. When a failure is detected, these estimations also allow one to correct the erroneous measurements and hence improve the overall data quality. Our methodology not only allows one to distinguish truly abnormal measurements from deviations due to complex natural phenomena, but also allows the quantification of each measurement's quality, which is relevant from a dependability perspective.
We apply the methodology to real datasets from a complex aquatic monitoring system, measuring temperature and salinity parameters, through which we illustrate the process for building the machine learning prediction models using a technique based on Artificial Neural Networks, denoted ANNODE (ANN Outlier Detection). From this application, we also observe the effectiveness of our ANNODE approach for accurate outlier detection in harsh environments. We then validate these positive results by comparing ANNODE with state-of-the-art solutions for outlier detection. The results show that ANNODE improves on existing solutions in outlier detection accuracy.
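The general idea described in this abstract (estimate a sensor's reading from a correlated sensor, flag readings that stray too far from the estimate, and substitute the estimate for flagged readings) can be illustrated with a minimal sketch. This is not ANNODE: a one-variable least-squares line stands in for the neural network, and the function names, the threshold factor `k`, the fault-free training window `train_n`, and the sample readings are all assumptions made for illustration.

```python
from statistics import mean, median


def fit_line(x, y):
    """Ordinary least-squares fit of y ~ slope * x + intercept."""
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, my - slope * mx


def detect_and_correct(neighbor, target, train_n, k=5.0):
    """Flag target readings that disagree with a neighbor-based estimate.

    Assumption: the first `train_n` samples are fault-free. They are used
    to fit the model and to set a residual scale (median absolute residual).
    A reading is flagged when its residual exceeds k times that scale, and
    flagged readings are replaced by the model estimate.
    """
    slope, intercept = fit_line(neighbor[:train_n], target[:train_n])
    estimate = [slope * x + intercept for x in neighbor]
    resid = [abs(t - e) for t, e in zip(target, estimate)]
    scale = median(resid[:train_n]) or 1e-9  # guard against a zero scale
    flags = [r > k * scale for r in resid]
    corrected = [e if f else t for t, e, f in zip(target, estimate, flags)]
    return flags, corrected


# Hypothetical readings: the target roughly follows 2 * neighbor + 1,
# with a fault injected at index 6.
neighbor = [10.0, 10.5, 11.0, 11.5, 12.0, 12.5, 13.0, 13.5]
target = [21.1, 21.9, 23.0, 24.1, 24.9, 26.0, 35.0, 28.1]
flags, corrected = detect_and_correct(neighbor, target, train_n=5)
```

A fixed residual threshold like this cannot by itself separate sensor faults from genuine natural events, which is the harder problem the abstract emphasizes; the sketch only shows the estimate, compare, and correct loop.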

Research on Data Quality Evaluation Model for Wide-area Distributed Power Quality Monitoring System

2019 9th International Conference on Power and Energy Systems (ICPES) ◽

10.1109/icpes47639.2019.9105487 ◽

2019 ◽

Author(s):

Xu Siyao ◽

Zhou Gang ◽

Yang Qiang ◽

Xie Shanyi ◽

Wang Xin ◽

...

Keyword(s):

Data Quality ◽

Power Quality ◽

Monitoring System ◽

Quality Evaluation ◽

Evaluation Model ◽

Quality Monitoring ◽

Wide Area ◽

Distributed Power ◽

Power Quality Monitoring ◽

Quality Evaluation Model

Multivariate Outlier Detection in Postprocessing of Multi-temporal PS-InSAR Results using Deep Learning

Procedia Computer Science ◽

10.1016/j.procs.2021.01.326 ◽

2021 ◽

Vol 181 ◽

pp. 1146-1153

Author(s):

Pedro Aguiar ◽

António Cunha ◽

Matus Bakon ◽

Antonio M. Ruiz-Armenteros ◽

Joaquim J. Sousa

Keyword(s):

Deep Learning ◽

Outlier Detection ◽

Multivariate Outlier Detection ◽

Multi Temporal

Multivariate Outlier Detection With High-Breakdown Estimators

Journal of the American Statistical Association ◽

10.1198/jasa.2009.tm09147 ◽

2010 ◽

Vol 105 (489) ◽

pp. 147-156 ◽

Cited By ~ 76

Author(s):

Andrea Cerioli

Keyword(s):

Outlier Detection ◽

Multivariate Outlier Detection

Acceleration data quality assessment for bridge structural health monitoring via statistical and deep-learning approach

10.2749/ghent.2021.0555 ◽

2021 ◽

Author(s):

Huaqiang Zhong ◽

Limin Sun ◽

José Turmo ◽

Ye Xia

Keyword(s):

Structural Health Monitoring ◽

Data Quality ◽

Health Monitoring ◽

Structural Damage ◽

Quality Evaluation ◽

Evaluation Method ◽

Operating Conditions ◽

Monitoring Data ◽

Statistical Characteristics ◽

Structural Health

In recent years, safety and comfort problems in bridges have not been uncommon, and the operating conditions of in-service bridges have received widespread attention. Many key long-span bridges have been fitted with structural health monitoring systems and have collected massive amounts of data. Monitoring data is the basis of structural damage identification and performance evaluation, so analyzing and evaluating its quality is of great significance. This paper takes the acceleration monitoring data of the main girder and arch rib of a long-span arch bridge as the research object, analyzes and summarizes the statistical characteristics of the data, identifies six abnormal data conditions, and proposes a data quality evaluation method based on a convolutional neural network. The paper computes hourly frequency statistics of the bridge's acceleration vibration amplitude for December 2018. To highlight the end effect of the frequency statistics, the whole is amplified and used as the network input for training and data quality evaluation. The results are good, and the approach provides a new method for structural monitoring data quality evaluation and abnormal data elimination.
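As a rough illustration of the hourly frequency statistics mentioned above, the sketch below counts acceleration amplitudes per bin for each hour and then rescales the counts. The abstract does not specify the bin edges or how the statistics are "amplified", so the edges, the `log1p` scaling, the function names, and the sample values here are all assumptions, and the convolutional network itself is not shown.

```python
import math
from bisect import bisect_right
from collections import defaultdict


def hourly_amplitude_counts(samples, edges):
    """Count |acceleration| per amplitude bin for each hour.

    samples: iterable of (hour_index, acceleration) pairs.
    edges:   ascending bin edges; bin i covers [edges[i], edges[i + 1]).
    Amplitudes at or above the last edge are ignored.
    """
    n_bins = len(edges) - 1
    counts = defaultdict(lambda: [0] * n_bins)
    for hour, accel in samples:
        i = bisect_right(edges, abs(accel)) - 1
        if 0 <= i < n_bins:
            counts[hour][i] += 1
    return dict(counts)


def amplify(row):
    """Assumed rescaling: log(1 + count) makes sparse tail bins more visible."""
    return [math.log1p(c) for c in row]


# Hypothetical samples: (hour index, acceleration).
samples = [(0, 0.01), (0, -0.02), (0, 0.15), (1, 0.30), (1, -0.05)]
edges = [0.0, 0.1, 0.2, 0.4]
counts = hourly_amplitude_counts(samples, edges)
features = {hour: amplify(row) for hour, row in counts.items()}
```

Each hour's rescaled row could then serve as one input vector to a classifier that labels the hour as normal or as one of the abnormal conditions.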
