Supplementary material to "Gaussian Process regression model for dynamically calibrating a wireless low-cost particulate matter sensor network in Delhi"

Abstract. Wireless low-cost particulate matter sensor networks (WLPMSNs) are transforming air quality monitoring by providing particulate matter (PM) information at finer spatial and temporal resolutions. However, large-scale WLPMSN calibration and maintenance remain a challenge. The manual labor involved in initial calibration by collocation and routine recalibration is intensive. The transferability of the calibration models determined from initial collocation to new deployment sites is questionable, as calibration factors typically vary with the urban heterogeneity of operating conditions and aerosol optical properties. Furthermore, low-cost sensors can drift or degrade over time. This study presents a simultaneous Gaussian process regression (GPR) and simple linear regression pipeline to calibrate and monitor dense WLPMSNs on the fly by leveraging all available reference monitors across an area without resorting to pre-deployment collocation calibration. We evaluated our method for Delhi, where the PM2.5 measurements of all 22 regulatory reference and 10 low-cost nodes were available for 59 valid days between 1 January and 31 March 2018 (PM2.5 averaged 138±31 µg m−3 among 22 reference stations), using a leave-one-out cross-validation (CV) over the 22 reference nodes. We showed that our approach can achieve an overall 30 % prediction error (RMSE: 33 µg m−3) at a 24 h scale and that it is robust, as underscored by the small variability in the GPR model parameters and in the model-produced calibration factors for the low-cost nodes among the 22-fold CV. Of the 22 reference stations, high-quality predictions were observed for those stations whose PM2.5 means were close to the Delhi-wide mean (i.e., 138±31 µg m−3), and relatively poor predictions were observed for those nodes whose means differed substantially from the Delhi-wide mean (particularly on the lower end).
We also observed washed-out local variability in PM2.5 across the 10 low-cost sites after calibration using our approach, which stands in marked contrast to the true wide variability across the reference sites. These observations revealed that our proposed technique (and geostatistical techniques more generally) requires high spatial homogeneity in the pollutant concentrations to be fully effective. We further demonstrated that our algorithm's performance is insensitive to training window size: the mean prediction error rate and the standard error of the mean (SEM) for the 22 reference stations remained consistent at ∼30 % and ∼3 %–4 %, respectively, as the training window was extended in increments of 2 d. The markedly low requirement of our algorithm for training data allows the models to be kept nearly up to date in the field, thus realizing the algorithm's full potential for dynamically surveilling large-scale WLPMSNs by detecting malfunctioning low-cost nodes and tracking the drift with little latency. Our algorithm showed similarly stable 26 %–34 % mean prediction errors and ∼3 %–7 % SEMs over the sampling period when pre-trained on the current week's data and predicting 1 week ahead, and therefore it is suitable for online calibration. Simulations conducted using our algorithm suggest that, in addition to dynamic calibration, the algorithm can also be adapted for automated monitoring of large-scale WLPMSNs. In these simulations, the algorithm was able to differentiate malfunctioning low-cost nodes (owing to either hardware failure or the heavy influence of local sources) within a network by identifying aberrant model-generated calibration factors (i.e., slopes close to zero and intercepts close to the Delhi-wide mean of true PM2.5). The algorithm was also able to track the drift of low-cost nodes to within 4 % error for all the simulation scenarios.
The simulation results showed that ∼20 reference stations are optimal for our solution in Delhi and confirmed that low-cost nodes can extend the spatial precision of a network by decreasing the extent of pure interpolation among only reference stations. Our solution has substantial implications for reducing the amount of manual labor for the calibration and surveillance of extensive WLPMSNs, improving the spatial comprehensiveness of PM evaluation, and enhancing the accuracy of WLPMSNs.
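
The leave-one-out evaluation over reference stations described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes a squared-exponential kernel, a constant prior mean, fixed placeholder hyperparameters, and synthetic station coordinates and PM2.5 values, and it covers only the reference-station interpolation step, not the joint calibration of the low-cost nodes. All function names are hypothetical.

```python
import numpy as np


def rbf_kernel(a, b, length_scale, signal_var):
    """Squared-exponential covariance between two sets of site coordinates."""
    sq_dist = np.sum((a[:, None, :] - b[None, :, :]) ** 2, axis=-1)
    return signal_var * np.exp(-0.5 * sq_dist / length_scale**2)


def gpr_predict(x_train, y_train, x_test,
                length_scale=10.0, signal_var=900.0, noise_var=100.0):
    """GP posterior mean at x_test with a constant prior mean.

    The hyperparameter defaults are placeholders, not fitted values.
    """
    prior_mean = y_train.mean()
    k_train = rbf_kernel(x_train, x_train, length_scale, signal_var)
    k_train += noise_var * np.eye(len(x_train))
    k_cross = rbf_kernel(x_test, x_train, length_scale, signal_var)
    weights = np.linalg.solve(k_train, y_train - prior_mean)
    return prior_mean + k_cross @ weights


def leave_one_out(coords, pm):
    """Predict each station from all the others and return the predictions."""
    preds = np.empty(len(pm))
    for i in range(len(pm)):
        keep = np.arange(len(pm)) != i
        preds[i] = gpr_predict(coords[keep], pm[keep], coords[i:i + 1])[0]
    return preds


# Synthetic example: five stations on a km grid, one day of PM2.5 (ug m^-3).
coords = np.array([[0.0, 0.0], [5.0, 0.0], [0.0, 5.0], [5.0, 5.0], [2.5, 2.5]])
pm = np.array([120.0, 150.0, 130.0, 160.0, 140.0])
preds = leave_one_out(coords, pm)
rmse = float(np.sqrt(np.mean((preds - pm) ** 2)))
print(preds.round(1), round(rmse, 1))
```

In practice the kernel hyperparameters would be estimated from the data rather than fixed. A perfectly homogeneous field is reproduced exactly by this sketch, which is consistent with the abstract's observation that the approach works best when concentrations are spatially homogeneous.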

Gaussian Process regression model for dynamically calibrating a wireless low-cost particulate matter sensor network in Delhi

DOI: 10.5194/amt-2019-55

Year: 2019

Author(s): Tongshu Zheng, Michael H. Bergin, Ronak Sutaria, Sachchida N. Tripathi, Robert Caldow, ...

Keyword(s): Particulate Matter, Gaussian Process, Large Scale, Low Cost, Gaussian Process Regression, Operating Conditions, Local Source, Model Parameters, Manual Labor, Source Contributions

Abstract. Wireless low-cost particulate matter sensor networks (WLPMSNs) are transforming air quality monitoring by providing PM information at finer spatial and temporal resolutions. However, large-scale WLPMSN calibration and maintenance remain a challenge for three reasons: the manual labor involved in initial calibration by collocation and routine recalibration is intensive; the transferability of the calibration models determined from initial collocation to new deployment sites is questionable, as calibration factors typically vary with the urban heterogeneity of operating conditions and aerosol optical properties; and low-cost sensors can drift or degrade over time. This study presents a simultaneous Gaussian Process regression (GPR) and simple linear regression pipeline to calibrate and monitor dense WLPMSNs on the fly by leveraging all available reference monitors across an area without resorting to pre-deployment collocation calibration. We evaluated our method for Delhi, where the PM2.5 measurements of all 22 regulatory reference and 10 low-cost nodes were available on 59 valid days from 1 January 2018 to 31 March 2018 (PM2.5 averaged 138 ± 31 μg m−3 among 22 reference stations), using a leave-one-out cross-validation (CV) over the 22 reference nodes. We showed that our approach can achieve an overall 30 % prediction error (RMSE: 33 μg m−3) at a 24 h scale and is robust, as underscored by the small variability in the GPR model parameters and in the model-produced calibration factors for the low-cost nodes among the 22-fold CV. We found that the accuracy of our calibrations depends on the degree of homogeneity of PM concentrations and decreases with increasing local source contributions.
As a by-product of dynamic calibration, our algorithm can be adapted for automated large-scale WLPMSN monitoring. Simulations demonstrated that it can differentiate malfunctioning or singular low-cost nodes within a network via model-generated calibration factors, with the aberrant nodes having slopes close to 0 and intercepts close to the global mean of true PM2.5, and that it can track the drift of low-cost nodes to within 4 % error for all the simulation scenarios. The simulation results showed that ~20 reference stations are optimum for our solution in Delhi and confirmed that low-cost nodes can extend the spatial precision of a network by decreasing the extent of pure interpolation among only reference stations. Our solution has substantial implications in reducing the amount of manual labor for the calibration and surveillance of extensive WLPMSNs, improving the spatial comprehensiveness of PM evaluation, and enhancing the accuracy of WLPMSNs.
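
The malfunction check described in the abstract (slopes close to 0 and intercepts close to the network-wide mean) can be sketched as follows. This is an illustrative reading, not the authors' code: it assumes the calibration factors come from an ordinary least-squares fit of a model estimate of true PM2.5 against a node's raw readings, and the function names, tolerance thresholds, and data are hypothetical.

```python
def fit_calibration(raw, estimate):
    """Ordinary least squares fit: estimate ~ slope * raw + intercept."""
    n = len(raw)
    mean_x = sum(raw) / n
    mean_y = sum(estimate) / n
    sxx = sum((x - mean_x) ** 2 for x in raw)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(raw, estimate))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def is_aberrant(slope, intercept, network_mean,
                slope_tol=0.1, intercept_rel_tol=0.1):
    """Flag a node whose fit ignores its readings and returns the mean.

    Both tolerances are assumed values chosen for illustration only.
    """
    near_zero_slope = abs(slope) < slope_tol
    near_mean_intercept = (
        abs(intercept - network_mean) < intercept_rel_tol * network_mean
    )
    return near_zero_slope and near_mean_intercept


# Synthetic data: daily raw node readings and model estimates of true PM2.5.
network_mean = 138.0
healthy = fit_calibration([100.0, 150.0, 200.0, 120.0],
                          [60.0, 85.0, 110.0, 70.0])
broken = fit_calibration([10.0, 20.0, 10.0, 20.0],
                         [130.0, 146.0, 146.0, 130.0])

print(healthy, is_aberrant(*healthy, network_mean))
print(broken, is_aberrant(*broken, network_mean))
```

A node whose raw readings carry no information about the estimated concentrations ends up with a flat fit centred on the network mean, which is the signature this check looks for.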

Generalized Gaussian Process Regression Model for Non-Gaussian Functional Data

Journal of the American Statistical Association

DOI: 10.1080/01621459.2014.889021

Year: 2014

Vol 109 (507), pp. 1123-1133

Cited By ~ 17

Author(s): Bo Wang, Jian Qing Shi

Keyword(s): Regression Model, Gaussian Process, Functional Data, Gaussian Process Regression, Non Gaussian

Parametric lower bound for nonlinear filtering based on Gaussian process regression model

2017 20th International Conference on Information Fusion (Fusion)

DOI: 10.23919/icif.2017.8009640

Year: 2017

Cited By ~ 1

Author(s): Yuxin Zhao, Carsten Fritsche, Fredrik Gunnarsson

Keyword(s): Lower Bound, Regression Model, Gaussian Process, Nonlinear Filtering, Gaussian Process Regression

Anomaly Detection in Video Surveillance via Gaussian Process

International Journal of Pattern Recognition and Artificial Intelligence

DOI: 10.1142/s0218001415550113

Year: 2015

Vol 29 (06), pp. 1555011

Cited By ~ 8

Author(s): Nannan Li, Xinyu Wu, Huiwen Guo, Dan Xu, Yongsheng Ou, ...

Keyword(s): Anomaly Detection, Regression Model, Gaussian Process, Video Surveillance, Gaussian Process Regression, Bayesian Regression, Current Frame, Motion Patterns, Online Clustering, Efficient Calculation

In this paper, we propose a new approach for anomaly detection in video surveillance. This approach is based on a nonparametric Bayesian regression model built upon Gaussian process priors. It establishes a set of basis vectors describing motion patterns from low-level features via online clustering, and then constructs a Gaussian process regression model to approximate the distribution of motion patterns in kernel space. We analyze different anomaly measure criteria derived from the Gaussian process regression model and compare their performance. To reduce false detections caused by crowd occlusion, we use supplementary information from previous frames to assist in anomaly detection for the current frame. In addition, we address the problem of hyperparameter tuning and discuss an efficient calculation method to reduce computational overhead. The approach is verified on published anomaly detection datasets and compared with other existing methods. The experimental results demonstrate that it can detect various anomalies efficiently and accurately.
