Gaussian Process regression model for dynamically calibrating a wireless low-cost particulate matter sensor network in Delhi
Abstract. Wireless low-cost particulate matter sensor networks (WLPMSNs) are transforming air quality monitoring by providing PM information at finer spatial and temporal resolutions; however, large-scale WLPMSN calibration and maintenance remain a challenge because the manual labor involved in initial calibration by collocation and routine recalibration is intensive, the transferability of the calibration models determined from initial collocation to new deployment sites is questionable as calibration factors typically vary with urban heterogeneity of operating conditions and aerosol optical properties, and the stability of low-cost sensors can develop drift or degrade over time. This study presents a simultaneous Gaussian Process regression (GPR) and simple linear regression pipeline to calibrate and monitor dense WLPMSNs on the fly by leveraging all available reference monitors across an area without resorting to pre-deployment collocation calibration. We evaluated our method for Delhi where the PM2.5 measurements of all 22 regulatory reference and 10 low-cost nodes were available in 59 valid days from 1 January 2018 to 31 March 2018 (PM2.5 averaged 138 ± 31 μg m−3 among 22 reference stations) using a leave-one-out cross-validation (CV) over the 22 reference nodes. We showed that our approach can achieve an overall 30 % prediction error (RMSE: 33 μg m−3) at a 24 h scale and is robust as underscored by the small variability in the GPR model parameters and in the model-produced calibration factors for the low-cost nodes among the 22-fold CV. We revealed that the accuracy of our calibrations depends on the degree of homogeneity of PM concentrations, and decreases with increasing local source contributions. As by-products of dynamic calibration, our algorithm can be adapted for automated large-scale WLPMSN monitoring as simulations proved its capability of differentiating malfunctioning or singular low-cost nodes within a network via model-generated calibration factors with the aberrant nodes having slopes close to 0 and intercepts close to the global mean of true PM2.5 and of tracking the drift of low-cost nodes accurately within 4 % error for all the simulation scenarios. The simulation results showed that ~20 reference stations are optimum for our solution in Delhi and confirmed that low-cost nodes can extend the spatial precision of a network by decreasing the extent of pure interpolation among only reference stations. Our solution has substantial implications in reducing the amount of manual labor for the calibration and surveillance of extensive WLPMSNs, improving the spatial comprehensiveness of PM evaluation, and enhancing the accuracy of WLPMSNs.