Evaluating and improving the reliability of gas-phase sensor system calibrations across new locations for ambient measurements and personal exposure monitoring

2019 ◽  
Vol 12 (8) ◽  
pp. 4211-4239 ◽  
Author(s):  
Sharad Vikram ◽  
Ashley Collier-Oxandale ◽  
Michael H. Ostertag ◽  
Massimiliano Menarini ◽  
Camron Chermak ◽  
...  

Abstract. Advances in ambient environmental monitoring technologies are enabling concerned communities and citizens to collect data to better understand their local environment and potential exposures. These mobile, low-cost tools make it possible to collect data with increased temporal and spatial resolution, providing data on a large scale with unprecedented levels of detail. This type of data has the potential to empower people to make personal decisions about their exposure and support the development of local strategies for reducing pollution and improving health outcomes. However, calibration of these low-cost instruments has been a challenge. Often, a sensor package is calibrated via field calibration. This involves colocating the sensor package with a high-quality reference instrument for an extended period and then applying machine learning or another model-fitting technique, such as multiple linear regression, to develop a calibration model for converting raw sensor signals to pollutant concentrations. Although this method helps to correct for the effects of ambient conditions (e.g., temperature) and cross sensitivities with nontarget pollutants, there is a growing body of evidence that calibration models can overfit to a given location or set of environmental conditions on account of the incidental correlation between pollutant levels and environmental conditions, including diurnal cycles. As a result, a sensor package trained at a field site may provide less reliable data when moved, or transferred, to a different location. This is a potential concern for applications seeking to perform monitoring away from regulatory monitoring sites, such as personal mobile monitoring or high-resolution monitoring of a neighborhood. We performed experiments confirming that transferability is indeed a problem and show that it can be improved by collecting data from multiple regulatory sites and building a calibration model that leverages data from a more diverse data set. 
We deployed three sensor packages to each of three sites with reference monitors (nine packages total) and then rotated the sensor packages through the sites over time. Two sites were in San Diego, CA, with a third outside of Bakersfield, CA, offering varying environmental conditions, general air quality composition, and pollutant concentrations. When compared to prior single-site calibration, the multisite approach exhibits better model transferability for a range of modeling approaches. Our experiments also reveal that random forest is especially prone to overfitting and confirm prior results that transfer is a significant source of both bias and standard error. Linear regression, on the other hand, although it exhibits relatively high error, does not degrade much in transfer. Bias dominated in our experiments, suggesting that transferability might be easily increased by detecting and correcting for bias. Also, given that many monitoring applications involve the deployment of many sensor packages based on the same sensing technology, there is an opportunity to leverage the availability of multiple sensors at multiple sites during calibration to lower the cost of training and better tolerate transfer. We contribute a new neural network architecture model termed split-NN that splits the model into two stages, in which the first stage corrects for sensor-to-sensor variation and the second stage uses the combined data of all the sensors to build a model for a single sensor package. The split-NN modeling approach outperforms multiple linear regression, traditional two- and four-layer neural networks, and random forest models. Depending on the training configuration, compared to random forest the split-NN method reduced error 0 %–11 % for NO2 and 6 %–13 % for O3.
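The observation that bias dominated the transfer error can be illustrated with a small error decomposition. The following is a minimal sketch on synthetic numbers, not the authors' evaluation code: the function name `decompose_error`, the 4 ppb offset, and the noise level are all hypothetical, and "standard error" is interpreted here as the standard deviation of the prediction error, which is an assumption. It relies only on the identity RMSE² = bias² + spread² (with the population standard deviation), so removing the mean bias leaves the spread as the remaining error.

```python
import numpy as np

def decompose_error(predicted, reference):
    """Split calibration error into mean bias and the spread around that bias."""
    error = np.asarray(predicted, dtype=float) - np.asarray(reference, dtype=float)
    bias = error.mean()                    # systematic offset
    spread = error.std()                   # population std of the error (ddof=0)
    rmse = np.sqrt(np.mean(error ** 2))    # total error
    return bias, spread, rmse

# Hypothetical transfer-site data: a calibrated sensor that reads about 4 ppb high.
rng = np.random.default_rng(0)
reference = rng.uniform(10.0, 60.0, size=500)
predicted = reference + 4.0 + rng.normal(0.0, 1.5, size=500)

bias, spread, rmse = decompose_error(predicted, reference)

# Subtracting the estimated bias leaves only the spread component of the error.
_, _, rmse_corrected = decompose_error(predicted - bias, reference)
print(f"bias={bias:.2f} spread={spread:.2f} rmse={rmse:.2f} corrected={rmse_corrected:.2f}")
```

In this toy setting the reference values are known, so the bias can be estimated directly; at a real transfer site without a reference monitor, the bias would have to be detected by some other means, which this sketch does not address.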

2019 ◽  
Author(s):  
Sharad Vikram ◽  
Ashley Collier-Oxandale ◽  
Michael Ostertag ◽  
Massimiliano Menarini ◽  
Camron Chermak ◽  
...  

Abstract. Advances in ambient environmental monitoring technologies are enabling concerned communities and citizens to collect data to better understand their local environment and potential exposures. These mobile, low-cost tools make it possible to collect data with increased temporal and spatial resolution, providing data on a large scale with unprecedented levels of detail. This type of data has the potential to empower people to make personal decisions about their exposure and support the development of local strategies for reducing pollution and improving health outcomes. However, calibration of these low-cost instruments has been a challenge. Often, a sensor package is calibrated via field calibration. This involves colocating the sensor package with a high-quality reference instrument for an extended period and then applying machine learning or another model-fitting technique, such as multiple linear regression, to develop a calibration model for converting raw sensor signals to pollutant concentrations. Although this method helps to correct for the effects of ambient conditions (e.g., temperature) and cross-sensitivities with non-target pollutants, there is a growing body of evidence that calibration models can overfit to a given location or set of environmental conditions on account of the incidental correlation between pollutant levels and environmental conditions, including diurnal cycles. As a result, a sensor package trained at a field site may provide less reliable data when moved, or transferred, to a different location. This is a potential concern for applications seeking to perform monitoring away from regulatory monitoring sites, such as personal mobile monitoring or high-resolution monitoring of a neighborhood. We performed experiments confirming that transferability is indeed a problem and show that it can be improved by collecting data from multiple regulatory sites and building a calibration model that leverages data from a more diverse dataset. 
We deployed three sensor packages to each of three sites with reference monitors (nine packages total) and then rotated the sensor packages through the sites over time. Two sites were in San Diego, CA, with a third outside of Bakersfield, CA, offering varying environmental conditions, general air quality composition, and pollutant concentrations. When compared to prior single-site calibration, the multi-site approach exhibits better model transferability for a range of modeling approaches. Our experiments also reveal that random forest is especially prone to overfitting and confirm prior results that transfer is a significant source of both bias and standard error. Bias dominated in our experiments, suggesting that transferability might be easily increased by detecting and correcting for bias. Also, given that many monitoring applications involve the deployment of many sensor packages based on the same sensing technology, there is an opportunity to leverage the availability of multiple sensors at multiple sites during calibration. We contribute a new neural network architecture model termed split-NN that splits the model into two stages, in which the first stage corrects for sensor-to-sensor variation and the second stage uses the combined data of all the sensors to build a model for a single sensor package. The split-NN modeling approach outperforms multiple linear regression, traditional two- and four-layer neural networks, and random forest models.


2017 ◽  
Vol 41 (6) ◽  
pp. 648-664 ◽  
Author(s):  
Sérgio Henrique Godinho Silva ◽  
Anita Fernanda dos Santos Teixeira ◽  
Michele Duarte de Menezes ◽  
Luiz Roberto Guimarães Guilherme ◽  
Fatima Maria de Souza Moreira ◽  
...  

ABSTRACT Determination of soil properties helps in the correct management of soil fertility. The portable X-ray fluorescence spectrometer (pXRF) has been recently adopted to determine total chemical element contents in soils, allowing soil property inferences. However, these studies are still scarce in Brazil and other countries. The objectives of this work were to predict soil properties using pXRF data, comparing stepwise multiple linear regression (SMLR) and random forest (RF) methods, as well as mapping and validating soil properties. 120 soil samples were collected at three depths and submitted to laboratory analyses. pXRF was used in the samples and total element contents were determined. From pXRF data, SMLR and RF were used to predict soil laboratory results, reflecting soil properties, and the models were validated. The best method was used to spatialize soil properties. Using SMLR, models had high values of R² (≥0.8), however the highest accuracy was obtained in RF modeling. Exchangeable Ca, Al, Mg, potential and effective cation exchange capacity, soil organic matter, pH, and base saturation had adequate adjustment and accurate predictions with RF. Eight out of the 10 soil properties predicted by RF using pXRF data had CaO as the most important variable helping predictions, followed by P2O5, Zn and Cr. Maps generated using RF from pXRF data had high accuracy for six soil properties, reaching R2 up to 0.83. pXRF in association with RF can be used to predict soil properties with high accuracy at low cost and time, besides providing variables aiding digital soil mapping.


Author(s):  
Nebojša M. Jurišević ◽  
Dušan R. Gordić ◽  
Vladimir Vukašinović ◽  
Arso M. Vukicevic ◽  
...  

Preschool buildings are among the biggest water consumers in the public buildings sector, so efficient management of their water consumption could yield considerable savings in city budgets. The aim of this study was twofold: 1) to assess the prognostic performance of 21 parameters that influence water consumption and 2) to assess the performance of two different approaches (statistical and machine learning-based), covering six predictive models, for estimating water consumption from the observed parameters. The data set was collected from all public preschool buildings in the city of Kragujevac, Serbia, over a three-year period. The top-performing statistical model was Multiple Linear Regression, while the best machine learning method was Random Forest. In particular, Random Forest achieved the best overall performance, while Multiple Linear Regression showed the same precision as Random Forest for buildings that consume more than 200 m3/month. Both methods were found to provide satisfactory estimates, leaving potential users to choose between better performance (Random Forest) and usability (Multiple Linear Regression).


Sensors ◽  
2021 ◽  
Vol 21 (1) ◽  
pp. 256
Author(s):  
Pengfei Han ◽  
Han Mei ◽  
Di Liu ◽  
Ning Zeng ◽  
Xiao Tang ◽  
...  

Pollutant gases, such as CO, NO2, O3, and SO2, affect human health, and low-cost sensors are an important complement to regulatory-grade instruments in pollutant monitoring. Previous studies focused on one or several species, while comprehensive assessments of multiple sensors remain limited. We conducted a 12-month field evaluation of four Alphasense sensors in Beijing and used single linear regression (SLR), multiple linear regression (MLR), random forest regressor (RFR), and neural network (long short-term memory (LSTM)) methods to calibrate and validate the measurements with nearby reference measurements from national monitoring stations. In terms of performance, the sensors ranked CO > O3 > NO2 > SO2 for both the coefficient of determination (R2) and the root mean square error (RMSE). The MLR did not increase the R2 after considering the temperature and relative humidity influences compared with the SLR (with R2 remaining at approximately 0.6 for O3 and 0.4 for NO2). However, the RFR and LSTM models significantly increased the O3, NO2, and SO2 performances, with the R2 increasing from 0.3–0.5 to >0.7 for O3 and NO2, and the RMSE decreasing from 20.4 to 13.2 ppb for NO2. For the SLR, there were relatively larger biases, while the LSTMs maintained a mean relative bias close to zero (e.g., <5% for O3 and NO2), indicating that these sensors combined with the LSTMs are suitable for hot spot detection. We highlight that the performance of LSTM is better than that of random forest and linear methods. This study assessed four electrochemical air quality sensors and different calibration models, and the methodology and results can benefit assessments of other low-cost sensors.


2021 ◽  
Author(s):  
Daniel Westervelt ◽  
Celeste McFarlane ◽  
Faye McNeill ◽  
R (Subu) Subramanian ◽  
Mike Giordano ◽  
...  

There is a severe lack of air pollution data around the world. This includes large portions of low- and middle-income countries (LMICs), as well as rural areas of wealthier nations, as monitors tend to be located in large metropolises. Low-cost sensors (LCS) for measuring air pollution and identifying sources offer a possible path forward to remedy the lack of data, though significant knowledge gaps and caveats remain regarding the accurate application and interpretation of such devices.
The Clean Air Monitoring and Solutions Network (CAMS-Net) establishes an international network of networks that unites scientists, decision-makers, city administrators, citizen groups, the private sector, and other local stakeholders in co-developing new methods and best practices for real-time air quality data collection, data sharing, and solutions for air quality improvements. CAMS-Net brings together at least 32 multidisciplinary member networks from North America, Europe, Africa, and India. The project establishes a mechanism for international collaboration, builds technical capacity, shares knowledge, and trains the next generation of air quality practitioners and advocates, including domestic and international graduate students and postdoctoral researchers.
Here we present some preliminary research accelerated through the CAMS-Net project. Specifically, we present LCS calibration methodology for several co-locations in LMICs (Accra, Ghana; Kampala, Uganda; Nairobi, Kenya; Addis Ababa, Ethiopia; and Kolkata, India), in which reference BAM-1020 PM2.5 monitors were placed side-by-side with LCS. We demonstrate that both simple multiple linear regression calibration methods for bias-correcting LCS and more complex machine learning methods can reduce bias in LCS to close to zero, while increasing correlation. For example, in Kampala, raw PurpleAir PM2.5 data are strongly correlated with the BAM-1020 PM2.5 (r² = 0.88), but have a mean bias of approximately 12 μg m⁻³. Two calibration models, multiple linear regression and a random forest approach, decrease mean bias from 12 μg m⁻³ to -1.84 μg m⁻³ or less and improve the r² from 0.88 to 0.96. We find similar performance in several other regions of the world. Location-specific calibration of low-cost sensors is necessary in order to obtain useful data, since sensor performance is closely tied to environmental conditions such as relative humidity. This work is a first step towards developing a database of region-specific correction factors for low-cost sensors, which are exploding in popularity globally and have the potential to close the air pollution data gap, especially in resource-limited countries.
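A multiple linear regression calibration of the kind described above can be sketched in a few lines. This is an illustrative example only, with synthetic data rather than measurements from any of the co-location sites: the assumed sensor response (a gain, an offset, and a humidity-dependent inflation), the choice of relative humidity and temperature as covariates, and the helper names `mean_bias` and `r_squared` are all assumptions made for the sketch.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 1000

# Synthetic stand-ins (not study data): reference PM2.5, relative humidity, temperature.
reference = rng.uniform(5.0, 80.0, size=n)
rh = rng.uniform(30.0, 95.0, size=n)
temp = rng.uniform(15.0, 32.0, size=n)

# Assumed raw sensor response: scaled, offset, and inflated at high humidity.
raw = 1.3 * reference + 0.3 * rh + 2.0 + rng.normal(0.0, 3.0, size=n)

def mean_bias(pred, ref):
    """Average signed difference between prediction and reference."""
    return float(np.mean(pred - ref))

def r_squared(pred, ref):
    """Squared Pearson correlation between prediction and reference."""
    return float(np.corrcoef(pred, ref)[0, 1] ** 2)

# Design matrix: intercept, raw signal, and the two environmental covariates.
X = np.column_stack([np.ones(n), raw, rh, temp])

# Fit on the first half of the co-location period, evaluate on the second half.
train, test = slice(0, n // 2), slice(n // 2, n)
coef, *_ = np.linalg.lstsq(X[train], reference[train], rcond=None)
calibrated = X[test] @ coef

print("raw:        bias", mean_bias(raw[test], reference[test]),
      "r2", r_squared(raw[test], reference[test]))
print("calibrated: bias", mean_bias(calibrated, reference[test]),
      "r2", r_squared(calibrated, reference[test]))
```

Evaluating on a held-out part of the co-location period, as done here, is one simple way to check that the reduction in bias is not just a property of the fitting data.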


2017 ◽  
Vol 37 (1) ◽  
pp. 109 ◽  
Author(s):  
Yohanita Maulina Akbar ◽  
Dr. Rudiati Evi Masithoh ◽  
Nafis Khuriyati

In this research, a multiple linear regression (MLR) model was used to predict the Brix and pH of bananas from RGB and Lab color values. The banana samples varied in color and ripeness, from unripe to ripe. RGB and Lab values were measured non-destructively using a color meter, while the internal quality parameters Brix and pH were determined destructively using conventional laboratory methods. Multivariate analysis was done using The Unscrambler® X 10.3 (CAMO AS, Oslo, Norway; trial version). Results showed that the MLR calibration model was able to predict the Brix and pH of bananas from RGB and Lab color values. Validation data were then used to test the selected models. For Brix, validation of the MLR models based on RGB and Lab values gave coefficients of determination (R2) of 0.8 and 0.84, respectively, between observed and predicted data. For pH, the models based on RGB and Lab values gave R2 values of 0.71 and 0.79, respectively.


2018 ◽  
Vol 11 (6) ◽  
pp. 3717-3735 ◽  
Author(s):  
Alessandro Bigi ◽  
Michael Mueller ◽  
Stuart K. Grange ◽  
Grazia Ghermandi ◽  
Christoph Hueglin

Abstract. Low-cost sensors for measuring atmospheric pollutants are experiencing an increase in popularity worldwide among practitioners, academia and environmental agencies, and a large amount of data from these devices is being delivered to the public. Nevertheless, their behaviour, performance and reliability are not yet fully investigated and understood. In the present study we investigate the medium-term performance of a set of NO and NO2 electrochemical sensors in Switzerland using three different regression algorithms within a field calibration approach. In order to mimic a realistic application of these devices, the sensors were initially co-located at a rural regulatory monitoring site for a 4-month calibration period, and subsequently deployed for 4 months at two distant regulatory urban sites in traffic and urban background conditions, where the performance of the calibration algorithms was explored. The applied algorithms were Multivariate Linear Regression, Support Vector Regression and Random Forest; these were tested, along with the sensors, in terms of generalisability, selectivity, drift, uncertainty, bias, noise and suitability for spatially mapping intra-urban pollution gradients with hourly resolution. Results from the deployment at the urban sites show a better performance of the non-linear algorithms (Support Vector Regression and Random Forest), achieving RMSE < 5 ppb, R2 between 0.74 and 0.95 and MAE between 2 and 4 ppb. The combined use of both NO and NO2 sensor output in the estimate of each pollutant showed some contribution by the NO sensor to the NO2 estimate and vice versa. All algorithms exhibited a drift ranging between 5 and 10 ppb for Random Forest and 15 ppb for Multivariate Linear Regression at the end of the deployment. The lowest concentration correctly estimated, with a 25 % relative expanded uncertainty, was ca. 15–20 ppb and was provided by the non-linear algorithms. 
As an assessment of the suitability of the tested sensors for a targeted application, the probability of resolving hourly concentration differences in cities was investigated. It was found that NO concentration differences of 5–10 ppb (8–10 for NO2) can reliably be detected (90 % confidence), depending on the air pollution level. The findings of this study, although derived from a specific sensor type and sensor model, are based on a flexible methodology and have extensive potential for exploring the performance of other low-cost sensors that differ in their target pollutant and sensing technology.


2012 ◽  
Vol 51 (01) ◽  
pp. 39-44 ◽  
Author(s):  
K. Matsuoka ◽  
K. Yoshino

Summary. Objectives: The aim of this study is to present a method of assessing psychological tension that is optimized to every individual on the basis of heart rate variability (HRV) data which, to eliminate the influence of inter-individual variability, are measured over a long period during daily life. Methods: HRV and body accelerations were recorded from nine normal subjects for two months of normal daily life. Fourteen HRV indices were calculated from the HRV data in the 512 seconds prior to the time of every mental tension level report. Data to be analyzed were limited to those with body accelerations of 30 mG (0.294 m/s2) and lower. Further, the differences from the reference values in the same time zone were calculated for both the mental tension score (Δtension) and the HRV index values (ΔHRVI). A multiple linear regression model that estimates Δtension from the scores for the principal components of ΔHRVI was then constructed for each individual. The data were divided into a training data set and a test data set in accordance with the twofold cross-validation method. Multiple linear regression coefficients were determined using the training data set, and the generalization capability of the optimized model was checked using the test data set. Results: The subjects' mean Pearson correlation coefficient was 0.52 with the training data set and 0.40 with the test data set. The subjects' mean coefficient of determination was 0.28 with the training data set and 0.11 with the test data set. Conclusion: We proposed a method of assessing psychological tension that is optimized to every individual based on HRV data measured over a long period of daily life.
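The per-individual modelling steps in the Methods (principal component scores of the HRV index differences, a multiple linear regression on those scores, and twofold cross-validation) can be sketched as follows. This is a hedged illustration on random synthetic data, not the authors' pipeline: the number of retained components (5), the standardization step, the random fold assignment, and the function names `pca_scores` and `fit_predict` are assumptions, since the abstract does not specify them.

```python
import numpy as np

def pca_scores(train_X, test_X, n_components):
    """Project both sets onto principal axes estimated from the training set only."""
    mean, std = train_X.mean(axis=0), train_X.std(axis=0)
    train_Z, test_Z = (train_X - mean) / std, (test_X - mean) / std
    _, _, vt = np.linalg.svd(train_Z, full_matrices=False)
    axes = vt[:n_components].T          # columns are the leading principal axes
    return train_Z @ axes, test_Z @ axes

def fit_predict(train_S, train_y, test_S):
    """Least-squares regression with intercept; returns train fit and test prediction."""
    A = np.column_stack([np.ones(len(train_S)), train_S])
    coef, *_ = np.linalg.lstsq(A, train_y, rcond=None)
    B = np.column_stack([np.ones(len(test_S)), test_S])
    return A @ coef, B @ coef

# Synthetic data for one hypothetical subject: 14 index differences per report.
rng = np.random.default_rng(2)
n, n_indices = 200, 14
d_hrvi = rng.normal(size=(n, n_indices))
d_tension = 0.6 * d_hrvi[:, 0] - 0.4 * d_hrvi[:, 3] + rng.normal(size=n)

# Twofold cross-validation: each half serves once as training and once as test set.
order = rng.permutation(n)
folds = [order[: n // 2], order[n // 2:]]
train_r, test_r = [], []
for k in (0, 1):
    test_idx, train_idx = folds[k], folds[1 - k]
    tr_S, te_S = pca_scores(d_hrvi[train_idx], d_hrvi[test_idx], n_components=5)
    fit_tr, pred_te = fit_predict(tr_S, d_tension[train_idx], te_S)
    train_r.append(np.corrcoef(fit_tr, d_tension[train_idx])[0, 1])
    test_r.append(np.corrcoef(pred_te, d_tension[test_idx])[0, 1])

print("mean train r:", np.mean(train_r), "mean test r:", np.mean(test_r))
```

Estimating the standardization and principal axes from the training half only keeps the test half untouched by the fitting step, which is the point of reporting separate training and test correlations.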


2011 ◽  
Vol 106 (6) ◽  
pp. 3216-3229 ◽  
Author(s):  
L. Hu ◽  
M. Liang ◽  
A. Mouraux ◽  
R. G. Wise ◽  
Y. Hu ◽  
...  

Across-trial averaging is a widely used approach to enhance the signal-to-noise ratio (SNR) of event-related potentials (ERPs). However, across-trial variability of ERP latency and amplitude may contain physiologically relevant information that is lost by across-trial averaging. Hence, we aimed to develop a novel method that uses 1) wavelet filtering (WF) to enhance the SNR of ERPs and 2) a multiple linear regression with a dispersion term (MLRd) that takes into account shape distortions to estimate the single-trial latency and amplitude of ERP peaks. Using simulated ERP data sets containing different levels of noise, we provide evidence that, compared with other approaches, the proposed WF+MLRd method yields the most accurate estimate of single-trial ERP features. When applied to a real laser-evoked potential data set, the WF+MLRd approach provides reliable estimation of single-trial latency, amplitude, and morphology of ERPs and thereby allows performing meaningful correlations at single-trial level. We obtained three main findings. First, WF significantly enhances the SNR of single-trial ERPs. Second, MLRd effectively captures and measures the variability in the morphology of single-trial ERPs, thus providing an accurate and unbiased estimate of their peak latency and amplitude. Third, intensity of pain perception significantly correlates with the single-trial estimates of N2 and P2 amplitude. These results indicate that WF+MLRd can be used to explore the dynamics between different ERP features, behavioral variables, and other neuroimaging measures of brain activity, thus providing new insights into the functional significance of the different brain processes underlying the brain responses to sensory stimuli.

