The utility of multivariate outlier detection techniques for data quality evaluation in large studies: an application within the ONDRI project

2019
Vol 19 (1)
Author(s):
Kelly M. Sunderland
Derek Beaton
Julia Fraser
Donna Kwan
...
2021
Vol 25 (4)
pp. 763-787
Author(s):
Alladoumbaye Ngueilbaye
Hongzhi Wang
Daouda Ahmat Mahamat
Ibrahim A. Elgendy
Sahalu B. Junaidu

Knowledge extraction, data mining, e-learning, and web application platforms use heterogeneous and distributed data. The proliferation of these multifaceted platforms raises many challenges, such as high scalability, the coexistence of complex similarity metrics, and the requirement of data quality evaluation. In this study, an extended, complete formal taxonomy and several algorithms for detecting and correcting contextual data quality anomalies were developed and implemented on structured data. Our methods detected and corrected more data anomalies than existing taxonomy techniques, and also highlighted the drawbacks of the Support Vector Machine (SVM). The proposed techniques will therefore be relevant to detecting and correcting errors in large contextual data (Big Data).


Sensors
2017
Vol 17 (10)
pp. 2329
Author(s):
Robert Vasta
Ian Crandell
Anthony Millican
Leanna House
Eric Smith

2021
Vol 5 (3)
pp. 1-30
Author(s):
Gonçalo Jesus
António Casimiro
Anabela Oliveira

Sensor platforms used in environmental monitoring applications are often subject to harsh environmental conditions while monitoring complex phenomena. Designing dependable monitoring systems is therefore challenging, given the external disturbances affecting sensor measurements. Even the apparently simple task of outlier detection in sensor data becomes a hard problem, amplified by the difficulty of distinguishing true data errors due to sensor faults from deviations due to natural phenomena, which look like data errors. Existing solutions for runtime outlier detection typically assume that the physical processes can be accurately modeled, or that outliers consist of large deviations that are easily detected and filtered by appropriate thresholds. Other solutions assume that it is possible to deploy multiple sensors providing redundant data to support voting-based techniques. In this article, we propose a new methodology for dependable runtime detection of outliers in environmental monitoring systems, aiming to increase data quality by detecting and treating such outliers. We propose the use of machine learning techniques to model each sensor's behavior, exploiting the existence of correlated data provided by other related sensors. Using these models, along with knowledge of processed past measurements, it is possible to obtain accurate estimates of the observed environmental parameters and to build failure detectors that use these estimates. When a failure is detected, the estimates also allow one to correct the erroneous measurements and hence improve overall data quality. Our methodology not only allows one to distinguish truly abnormal measurements from deviations due to complex natural phenomena, but also allows the quality of each measurement to be quantified, which is relevant from a dependability perspective.
We apply the methodology to real datasets from a complex aquatic monitoring system that measures temperature and salinity, and use them to illustrate the process of building the machine learning prediction models with a technique based on Artificial Neural Networks, denoted ANNODE (ANN Outlier Detection). This application also shows the effectiveness of the ANNODE approach for accurate outlier detection in harsh environments. We then validate these positive results by comparing ANNODE with state-of-the-art solutions for outlier detection. The results show that ANNODE improves on existing solutions in terms of outlier detection accuracy.
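The core idea in this abstract (estimate each sensor's reading from a correlated sensor, flag readings that deviate too far from the estimate, and substitute the estimate for flagged values) can be illustrated with a minimal sketch. This is not ANNODE: a one-variable least-squares line stands in for the neural network, past measurements are ignored, and the data, the function names `fit_line` and `detect_and_correct`, and the threshold of four residual standard deviations are all assumptions made for illustration.

```python
from statistics import mean, stdev


def fit_line(x, y):
    """Least-squares fit of y ~ a*x + b (a simple stand-in for the ANN model)."""
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    a = sxy / sxx
    return a, my - a * mx


def detect_and_correct(neighbor, target, model, threshold):
    """Flag target readings far from the model estimate and replace them."""
    a, b = model
    flags, corrected = [], []
    for n, t in zip(neighbor, target):
        estimate = a * n + b
        is_outlier = abs(t - estimate) > threshold
        flags.append(is_outlier)
        corrected.append(estimate if is_outlier else t)
    return flags, corrected


# Hypothetical clean training data from two correlated sensors.
train_neighbor = [10.0, 11.0, 12.0, 13.0, 14.0, 15.0]
train_target = [10.4, 11.6, 12.4, 13.6, 14.4, 15.6]

model = fit_line(train_neighbor, train_target)
residuals = [t - (model[0] * n + model[1])
             for n, t in zip(train_neighbor, train_target)]
threshold = 4 * stdev(residuals)  # assumed rule of thumb, not from the paper

# New readings; the second target value is an injected fault.
new_neighbor = [12.5, 13.5, 14.5]
new_target = [13.0, 25.0, 15.0]
flags, corrected = detect_and_correct(new_neighbor, new_target, model, threshold)
print(flags)
print(corrected)
```

The residual between the reading and the estimate could also serve as a rough per-measurement quality score, in the spirit of the quantification the abstract mentions.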


2021
Vol 181
pp. 1146-1153
Author(s):
Pedro Aguiar
António Cunha
Matus Bakon
Antonio M. Ruiz-Armenteros
Joaquim J. Sousa

2021
Author(s):
Huaqiang Zhong
Limin Sun
José Turmo
Ye Xia

In recent years, safety and comfort problems in bridges have not been uncommon, and the operating conditions of in-service bridges have received widespread attention. Many key long-span bridges have been fitted with structural health monitoring systems that have collected massive amounts of data. Monitoring data is the basis of structural damage identification and performance evaluation, so analyzing and evaluating its quality is of great significance. This paper takes the acceleration monitoring data from the main girder and arch rib of a long-span arch bridge as its research object, analyzes and summarizes the statistical characteristics of the data, identifies 6 abnormal data conditions, and proposes a data quality evaluation method based on a convolutional neural network. Frequency statistics of the bridge's acceleration vibration amplitude in December 2018 are computed hour by hour. To highlight the end effect of the frequency statistics, the whole is amplified and used as network input for training and data quality evaluation. The results are good. The approach provides a new method for evaluating structural monitoring data quality and eliminating abnormal data.
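The hourly frequency statistics described above can be pictured as a normalized amplitude histogram per hour of acceleration samples. The sketch below shows only that feature-extraction step, under assumptions: the bin edges, the sample values, and the function name `amplitude_histogram` are hypothetical, and the amplification step and the convolutional network itself are not specified in the abstract, so they are left out.

```python
from bisect import bisect_right


def amplitude_histogram(samples, bin_edges):
    """Relative frequency of |acceleration| per bin for one hour of samples.

    bin_edges must be ascending and start at 0; amplitudes beyond the last
    edge are counted in the last bin.
    """
    counts = [0] * (len(bin_edges) - 1)
    for s in samples:
        i = bisect_right(bin_edges, abs(s)) - 1
        i = min(i, len(counts) - 1)  # clamp overflow into the last bin
        counts[i] += 1
    total = len(samples)
    if total == 0:
        return [0.0] * len(counts)
    return [c / total for c in counts]


# Hypothetical bin edges and one hour of (very few) samples.
edges = [0.0, 0.1, 0.2, 0.3]
hour_samples = [0.05, -0.05, 0.15, -0.25, 0.9, 0.0, 0.12, -0.02]
hist = amplitude_histogram(hour_samples, edges)
print(hist)

# A sensor stuck at zero puts all of its mass in the first bin.
stuck = amplitude_histogram([0.0] * 4, edges)
print(stuck)
```

A vector like `hist` for each hour is the kind of fixed-length input a classifier could be trained on to separate normal hours from abnormal ones.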

