Using Machine Learning for Dependable Outlier Detection in Environmental Monitoring Systems

2021 ◽  
Vol 5 (3) ◽  
pp. 1-30
Author(s):  
Gonçalo Jesus ◽  
António Casimiro ◽  
Anabela Oliveira

Sensor platforms used in environmental monitoring applications are often subject to harsh environmental conditions while monitoring complex phenomena. Designing dependable monitoring systems is therefore challenging, given the external disturbances affecting sensor measurements. Even the apparently simple task of outlier detection in sensor data becomes a hard problem, amplified by the difficulty of distinguishing true data errors due to sensor faults from deviations due to natural phenomena, which look like data errors. Existing solutions for runtime outlier detection typically assume that the physical processes can be accurately modeled, or that outliers consist of large deviations that are easily detected and filtered by appropriate thresholds. Other solutions assume that it is possible to deploy multiple sensors providing redundant data to support voting-based techniques. In this article, we propose a new methodology for dependable runtime detection of outliers in environmental monitoring systems, aiming to increase data quality by treating detected outliers. We propose the use of machine learning techniques to model the behavior of each sensor, exploiting the existence of correlated data provided by other related sensors. Using these models, along with knowledge of processed past measurements, it is possible to obtain accurate estimates of the observed environmental parameters and to build failure detectors that use these estimates. When a failure is detected, the estimates also make it possible to correct the erroneous measurements and hence improve overall data quality. Our methodology not only distinguishes truly abnormal measurements from deviations due to complex natural phenomena, but also quantifies the quality of each measurement, which is relevant from a dependability perspective. We apply the methodology to real datasets from a complex aquatic monitoring system, measuring temperature and salinity, and illustrate the process of building the machine learning prediction models using a technique based on Artificial Neural Networks, denoted ANNODE (ANN Outlier Detection). From this application, we also observe the effectiveness of the ANNODE approach for accurate outlier detection in harsh environments. We then validate these positive results by comparing ANNODE with state-of-the-art solutions for outlier detection. The results show that ANNODE improves on existing solutions in outlier detection accuracy.
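
A minimal sketch of the core idea, assuming nothing about the paper's actual network architecture or thresholds: a small neural network is trained to predict one sensor's reading from correlated neighboring sensors and recent history, and a measurement is flagged (and replaced by the estimate) when the prediction residual exceeds a threshold. All names, the window length, and the threshold rule are illustrative assumptions, not the ANNODE configuration.

```python
# Illustrative sketch of ANN-based outlier detection from correlated sensors.
# Architecture, features, and threshold are assumptions for demonstration,
# not the configuration used in the ANNODE paper.
import numpy as np
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(0)

# Synthetic example: the target sensor correlates with two neighbors.
t = np.linspace(0, 20, 2000)
neighbor1 = np.sin(t) + 0.05 * rng.standard_normal(t.size)
neighbor2 = np.sin(t + 0.3) + 0.05 * rng.standard_normal(t.size)
target = 0.6 * neighbor1 + 0.4 * neighbor2 + 0.05 * rng.standard_normal(t.size)

# Features: correlated sensors plus the previous target measurement.
X = np.column_stack([neighbor1[1:], neighbor2[1:], target[:-1]])
y = target[1:]

model = MLPRegressor(hidden_layer_sizes=(16,), max_iter=2000, random_state=0)
model.fit(X[:1500], y[:1500])

# Flag measurements whose residual exceeds k standard deviations of the
# training residuals; replace flagged values with the model's estimate.
resid = y[:1500] - model.predict(X[:1500])
threshold = 4.0 * resid.std()

estimates = model.predict(X[1500:])
outliers = np.abs(y[1500:] - estimates) > threshold
corrected = np.where(outliers, estimates, y[1500:])  # correction step
print(f"flagged {outliers.sum()} of {outliers.size} measurements")
```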

Author(s):  
Claudia C. Gutiérrez Rodríguez ◽  
Sylvie Servigne

With continuing technological improvements, sensor infrastructures now support many current and promising environmental applications. Environmental monitoring systems built on such sensors remove geographical, temporal, and other restraints while increasing both the coverage and the quality of our understanding of the real world. However, a central issue for such applications is the uncertainty of the data coming from sensors, which may affect experts' decisions. In this paper, the authors address this problem with an approach dedicated to providing environmental monitoring applications and their users with data quality information.


Sensors ◽  
2017 ◽  
Vol 17 (10) ◽  
pp. 2329 ◽  
Author(s):  
Robert Vasta ◽  
Ian Crandell ◽  
Anthony Millican ◽  
Leanna House ◽  
Eric Smith

Author(s):  
Negin Yousefpour ◽  
Steve Downie ◽  
Steve Walker ◽  
Nathan Perkins ◽  
Hristo Dikanski

Bridge scour is a challenge throughout the U.S.A. and other countries. Despite the scale of the issue, there is still a substantial lack of robust methods for scour prediction to support reliable, risk-based management and decision making. Throughout the past decade, the use of real-time scour monitoring systems has gained increasing interest among state departments of transportation across the U.S.A. This paper introduces three distinct methodologies for scour prediction using advanced artificial intelligence (AI)/machine learning (ML) techniques based on real-time scour monitoring data. The scour monitoring data comprised riverbed and river stage elevation time series at bridge piers gathered from various sources. Deep learning algorithms showed promise in predicting bed elevation and water level variations up to a week in advance. Ensemble neural networks proved successful in predicting the maximum upcoming scour depth, using the sensor data observed at the onset of a scour episode together with bridge pier, flow, and riverbed characteristics. In addition, two common empirical scour models were calibrated against the observed sensor data using Bayesian inference, showing significant improvement in prediction accuracy. Overall, this paper introduces a novel approach for scour risk management by integrating emerging AI/ML algorithms with real-time monitoring systems for early scour forecasting.
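
To illustrate the time-series side of such an approach, the sketch below frames week-ahead riverbed-elevation prediction as a supervised sliding-window problem and fits a small LSTM. The window length, horizon, architecture, and synthetic data are assumptions for demonstration, not the configurations reported in the paper.

```python
# Illustrative sketch: week-ahead bed-elevation forecasting with an LSTM.
# Window, horizon, and architecture are assumptions for demonstration.
import numpy as np
import tensorflow as tf

rng = np.random.default_rng(0)

# Synthetic stand-in for hourly riverbed-elevation and river-stage series.
n = 5000
stage = np.sin(np.linspace(0, 60, n)) + 0.1 * rng.standard_normal(n)
bed = -0.5 * stage + 0.05 * rng.standard_normal(n)

window, horizon = 168, 168  # one week of hourly input, one week ahead
features = np.column_stack([bed, stage])

X = np.stack([features[i : i + window] for i in range(n - window - horizon)])
y = bed[window + horizon :]

model = tf.keras.Sequential([
    tf.keras.Input(shape=(window, 2)),
    tf.keras.layers.LSTM(32),
    tf.keras.layers.Dense(1),
])
model.compile(optimizer="adam", loss="mse")
model.fit(X[:4000], y[:4000], epochs=5, batch_size=64, verbose=0)

pred = model.predict(X[4000:], verbose=0).ravel()
print("test MAE:", np.abs(pred - y[4000:]).mean())
```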


Author(s):  
Manmohan Singh Yadav ◽  
Shish Ahamad

Environmental disasters like flooding, earthquakes, etc. cause catastrophic effects all over the world. WSN-based techniques have become popular in susceptibility modelling of such disasters due to their strength and efficiency in predicting such threats. This paper demonstrates a machine learning-based approach to predicting outliers in sensor data, with bagging, boosting, random subspace, SVM, and KNN based frameworks for outlier prediction using WSN data. First, the database, collected from 14 sensor motes and containing outliers due to intrusion, is preprocessed. Subsequently, a segmented database is created from sensor pairs. Finally, the data entropy is calculated and used as a feature to determine the presence of outliers with the different approaches. Results show that the KNN model has the highest prediction capability for outlier assessment.
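
A minimal sketch of the entropy-as-feature idea, assuming a windowed segmentation and a KNN classifier; the window size, binning, and synthetic labels are illustrative assumptions, not the paper's exact pipeline.

```python
# Illustrative sketch: entropy of windowed sensor readings as a feature for a
# KNN outlier classifier. Window size, binning, and labels are assumptions.
import numpy as np
from scipy.stats import entropy
from sklearn.neighbors import KNeighborsClassifier

rng = np.random.default_rng(0)

def window_entropy(values, bins=10):
    """Shannon entropy of a histogram over one window of readings."""
    counts, _ = np.histogram(values, bins=bins)
    return entropy(counts + 1e-9)  # smooth to avoid log(0)

# Synthetic windows: clean windows are narrow; windows perturbed as if by
# intrusion are more dispersed and hence higher-entropy.
clean = [rng.normal(0, 0.1, 50) for _ in range(200)]
dirty = [np.append(rng.normal(0, 0.1, 45), rng.uniform(-3, 3, 5))
         for _ in range(200)]

X = np.array([[window_entropy(w)] for w in clean + dirty])
y = np.array([0] * 200 + [1] * 200)

idx = rng.permutation(len(X))
train, test = idx[:300], idx[300:]

knn = KNeighborsClassifier(n_neighbors=5).fit(X[train], y[train])
print("accuracy:", knn.score(X[test], y[test]))
```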


2019 ◽  
Author(s):  
Tomer Sagi ◽  
Nitzan Shmueli ◽  
Bruce Friedman ◽  
Ruth Bergman

BACKGROUND: Public Electronic Medical Record (EMR) datasets are a goldmine for vendors and researchers seeking to develop analytics designed to assist caregivers in monitoring, diagnosing, and treating patients. Both complex machine-learning-based tools, which require copious amounts of data to train, and a simple trend graph presented in a patient-centered dashboard are sensitive to noise.

OBJECTIVE: We aim to systematically explore data errors in MIMIC-III, as a representative of secondary-use datasets, and the impact of these errors on downstream analytics.

METHODS: We discuss the unique challenge of accounting for a specific patient's medical condition and personal characteristics, such as age, weight, and gender, when identifying data errors from only a few measurements per patient. To do so, we examine the prevalence and manifestations of errors in one of the most popular public medical research databases, MIMIC-III. We then evaluate how these errors impact visual analytics, the score-based sepsis analytics SOFA and qSOFA, and a machine-learning-based sepsis predictor.

RESULTS: We find a variety of error patterns in MIMIC-III and highlight effective methods to find them. All analytics are found to be sensitive to sporadic errors. Visual analytics are severely impacted, limiting their usefulness in the presence of error. qSOFA and SOFA suffer score changes of +1 (of 3) and +2.3-4 (of 15), respectively. The sepsis predictor suffers a 0.01-0.3 score change compared to a median score of 0.08.

CONCLUSIONS: The use of statistical methods to detect data errors is limited to high-throughput scenarios and large data aggregations. There is a dearth of medical guidelines and error-detection practices to support the rule-based systems required to keep analytics safe and trustworthy in low-volume scenarios. Analytics developers should test their software's sensitivity to error on public datasets. The medical informatics community should improve support for medical data-quality endeavors by creating guidelines for plausible values and analytics robustness to error, and by collecting real-world dirty datasets that contain errors as they appear in normal EMR use.
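
The rule-based checking the conclusions call for can be as simple as per-signal plausible-value ranges. The sketch below applies such ranges to flag implausible vitals; the field names and bounds are illustrative assumptions for demonstration only, not clinical guidance and not drawn from the paper.

```python
# Illustrative rule-based plausible-value check for EMR vitals. The field
# names and bounds are assumptions for demonstration, not clinical guidance.
PLAUSIBLE_RANGES = {
    "heart_rate_bpm": (20, 300),
    "temperature_c": (25.0, 45.0),
    "resp_rate_per_min": (4, 80),
}

def flag_implausible(record: dict) -> list[str]:
    """Return the names of fields falling outside their plausible range."""
    flags = []
    for field, (lo, hi) in PLAUSIBLE_RANGES.items():
        value = record.get(field)
        if value is not None and not (lo <= value <= hi):
            flags.append(field)
    return flags

# Example: a temperature of 98.6 suggests Fahrenheit entered as Celsius,
# one unit-confusion error pattern such checks can surface.
print(flag_implausible({"heart_rate_bpm": 72, "temperature_c": 98.6}))
# -> ['temperature_c']
```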


Author(s):  
Nripesh Trivedi

In this paper, characteristics of the data obtained from the sensors used in the OpenSense project are identified in order to build a data-oriented approach. The approach applies the Class Outliers: Distance-Based (CODB) and Hoeffding tree algorithms, from which machine learning models are built to detect outliers in a sensor data stream. The approach presented in this paper may be used for developing methodologies for data-oriented outlier detection.
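
A minimal sketch of the Hoeffding-tree half of this approach on a labeled stream, using the `river` library as the tooling (our assumption; the paper does not name its implementation). CODB has no common off-the-shelf implementation, so it is omitted here.

```python
# Illustrative sketch: streaming outlier classification with a Hoeffding tree
# via the `river` library (a tooling assumption, not the paper's setup).
from river import tree
import random

random.seed(0)
model = tree.HoeffdingTreeClassifier()

def synthetic_stream(n=2000):
    """Yield (features, is_outlier) pairs mimicking a labeled sensor stream."""
    for _ in range(n):
        if random.random() < 0.05:  # occasional outlier reading
            yield {"value": random.uniform(5, 10)}, True
        else:
            yield {"value": random.gauss(0, 1)}, False

correct = total = 0
for x, y in synthetic_stream():
    y_pred = model.predict_one(x)   # test-then-train evaluation
    if y_pred is not None:
        correct += int(y_pred == y)
        total += 1
    model.learn_one(x, y)

print(f"prequential accuracy: {correct / total:.3f}")
```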


2021 ◽  
Author(s):  
Otoniel José Campos Escobar ◽  
Peter Baumann

Multi-dimensional arrays (also known as raster data, gridded data, or datacubes) are key, if not essential, in many science and engineering domains. In the case of Earth sciences, a significant amount of the data produced falls into the category of array data, and the volume produced daily in this field is huge, making it hard for researchers to analyze it and retrieve valuable insight from it. 1-D sensor data, 2-D satellite imagery, 3-D x/y/t image time series and x/y/z subsurface voxel data, and 4-D x/y/z/t atmospheric and ocean data often amount to dozens of terabytes every day, and the rate is only expected to increase in the future. In response, Array Database systems were specifically designed and built to provide modeling, storage, and processing support for multi-dimensional arrays. They offer a declarative query language for flexible data retrieval, and some, e.g., rasdaman, provide federation processing and standards-based query capabilities compliant with OGC standards such as WCS, WCPS, and WMS. However, despite these advances, the gap between efficient information retrieval and the actual application of this data remains very broad, especially in the domains of artificial intelligence (AI) and machine learning (ML).

In this contribution, we present the state of the art in performing ML through Array Databases. First, a motivating example is introduced from the Deep Rain project, which aims at enhancing rainfall prediction accuracy in mountainous areas by implementing ML code on top of an Array Database. Deep Rain also explores novel methods for training prediction models by implementing server-side ML processing inside the database. A brief introduction to the Array Database rasdaman used in this project is also provided, featuring its standards-based query capabilities and the scalable federation processing features required for rainfall data processing. Next, the workflow approach for ML and Array Databases employed in the Deep Rain project is described in detail, listing the benefits of using an Array Database with declarative query language capabilities in the machine learning pipeline. A concrete use case illustrates step by step how these tools integrate. Then, an alternative approach is presented where ML is done inside the Array Database using user-defined functions (UDFs). Finally, a detailed comparison between the UDF and workflow approaches is presented, explaining their challenges and benefits.
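
To make the workflow approach concrete: a client-side ML pipeline would typically pull training data out of rasdaman with a WCPS query issued through a WCS ProcessCoverages request. The sketch below is hypothetical; the endpoint URL, the coverage name AvgLandTemp, and the coordinates are illustrative assumptions, not artifacts of the Deep Rain project.

```python
# Hypothetical sketch of the workflow approach: fetch a time series from a
# rasdaman server via WCPS, then hand it to an ML pipeline. The endpoint,
# coverage name, and coordinates are illustrative assumptions.
import requests
import numpy as np

ENDPOINT = "https://example.org/rasdaman/ows"  # hypothetical server

# WCPS: a 1-D temperature time series at one location, encoded as CSV.
wcps = """
for $c in (AvgLandTemp)
return encode($c[Lat(53.08), Long(8.80), ansi("2014-01":"2014-12")], "csv")
"""

response = requests.get(
    ENDPOINT,
    params={
        "service": "WCS",
        "version": "2.0.1",
        "request": "ProcessCoverages",
        "query": wcps,
    },
    timeout=60,
)
response.raise_for_status()

# Parse the CSV payload (rasdaman may wrap values in braces or quotes)
# into a NumPy array ready for model training.
text = response.text.strip().strip('"{}')
series = np.array([float(v) for v in text.split(",")])
print(series.shape)
```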

