Abstract. Low-cost sensors (LCSs) are an appealing solution to the problem of spatial
resolution in air quality measurement, but they do not yet match the
analytical performance of regulatory reference methods. Individual
sensors can be susceptible to analytical cross-interferences, show random
signal variability and drift over short, medium and long timescales. To
overcome some of these limitations, we use a clustering approach that takes
the instantaneous median signal from six identical electrochemical sensors to
minimize random drifts and inter-sensor differences. We report here on a
low-power analytical device (< 200 W) that comprises clusters of sensors for
NO2, Ox, CO and total volatile organic compounds (VOCs) and that also
measures supporting parameters such as water vapour and temperature.
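As a minimal sketch of the clustering idea, assuming the six sensor signals are already time-aligned in an array of shape (time steps, sensors), the instantaneous median is taken across the sensors at each time step. The function name `cluster_median` and the choice to skip missing (NaN) readings are illustrative assumptions, not the authors' implementation.

```python
import numpy as np


def cluster_median(signals):
    """Instantaneous median across a cluster of identical sensors.

    signals: array-like of shape (n_times, n_sensors), e.g. n_sensors = 6.
    Assumption: a missing reading is stored as NaN and is ignored, so the
    median is taken over the sensors still reporting at that time step.
    """
    signals = np.asarray(signals, dtype=float)
    if signals.ndim != 2:
        raise ValueError("expected a 2-D array of shape (n_times, n_sensors)")
    # Median along the sensor axis gives one cluster value per time step.
    return np.nanmedian(signals, axis=1)
```

For example, a single time step with readings [10, 11, 9, 50, 10, 12] yields a cluster value of 10.5, whereas the mean (17) would be pulled upward by the one outlying sensor. This robustness to a single drifting or faulty sensor is why a median, rather than a mean, suits this purpose.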
The device was tested in the field against reference monitors, collecting
ambient air pollution data in Beijing, China. Clustered NO2 and Ox sensor
data were compared against the reference methods using calibrations derived
from factory settings, from in-field simple linear regression (SLR) and from
three machine learning (ML) algorithms. The parametric supervised ML
algorithms, boosted regression trees (BRTs) and boosted linear regression
(BLR), and the non-parametric technique, Gaussian process (GP), used all
available sensor data to improve the estimates of NO2 and Ox. In all cases,
ML produced estimates closer to the reference measurements than SLR alone.
In combination, sensor clustering and ML generated sensor data whose quality
approached that of regulatory measurements (as judged by RMSE) while
retaining a very substantial cost and power advantage.
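The in-field SLR calibration and the RMSE comparison can be sketched as follows. The sketch assumes co-located cluster-median and reference time series that have been split into a training period and a later evaluation period. All helper names are hypothetical. The ML calibrations (BRT, BLR, GP) are not reproduced here; they would replace `fit_slr`/`apply_slr` with a multivariate model that uses all sensor channels.

```python
import numpy as np


def fit_slr(sensor, reference):
    """Least-squares fit of reference ≈ slope * sensor + intercept."""
    slope, intercept = np.polyfit(np.asarray(sensor, dtype=float),
                                  np.asarray(reference, dtype=float), deg=1)
    return float(slope), float(intercept)


def apply_slr(sensor, slope, intercept):
    """Convert a raw (cluster-median) signal to a calibrated concentration."""
    return slope * np.asarray(sensor, dtype=float) + intercept


def rmse(estimate, reference):
    """Root-mean-square error between calibrated estimates and reference data."""
    diff = np.asarray(estimate, dtype=float) - np.asarray(reference, dtype=float)
    return float(np.sqrt(np.mean(diff ** 2)))


def calibrate_and_score(train_x, train_ref, test_x, test_ref):
    """Fit SLR on a training period, then report RMSE on a held-out period."""
    slope, intercept = fit_slr(train_x, train_ref)
    return rmse(apply_slr(test_x, slope, intercept), test_ref)
```

Scoring on a period not used for fitting matters because an SLR calibrated and evaluated on the same data would understate the error caused by drift. The same `rmse` function can score each calibration (factory, SLR or ML) on equal terms.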