Input-Adaptive Proxy for Black Carbon as a Virtual Sensor

Sensors ◽  
2019 ◽  
Vol 20 (1) ◽  
pp. 182 ◽  
Author(s):  
Pak Lun Fung ◽  
Martha A. Zaidan ◽  
Salla Sillanpää ◽  
Anu Kousa ◽  
Jarkko V. Niemi ◽  
...  

Missing data has been a challenge in air quality measurement. In this study, we develop an input-adaptive proxy, which selects its input variables from other air quality variables based on their correlation coefficients with the output variable. The proxy uses an ordinary least squares regression model with robust optimization and limits the input variables to a maximum of three to avoid overfitting. The adaptive proxy learns from the data set and generates the best model, evaluated by the adjusted coefficient of determination (adjR2). In case of missing data in the input variables, the proposed adaptive proxy then uses the second-best model, and so on, until all the missing data gaps are filled. As a case study, we estimated black carbon (BC) concentration by using the input-adaptive proxy at two sites in Helsinki, which represent street canyon and urban background scenarios, respectively. Accumulation mode, traffic counts, nitrogen dioxide and lung deposited surface area are found as input variables in the top-ranked models. In contrast to the traditional proxy, which gives 20–80% of data, the input-adaptive proxy manages to give full continuous BC estimation. The newly developed adaptive proxy also gives generally accurate BC (street canyon: adjR2 = 0.86–0.94; urban background: adjR2 = 0.74–0.91) depending on the season and day of the week. Due to its flexibility and reliability, the adaptive proxy can be further extended to estimate other air quality parameters. It can also act as an air quality virtual sensor in support of on-site measurements in the future.
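The fallback behaviour described in this abstract can be illustrated with a minimal sketch. It assumes the candidate models have already been fitted and ranked; the robust regression step is not reproduced here. The names `CANDIDATES`, `is_missing` and `estimate_bc`, the input names, and all coefficients are hypothetical and chosen only for illustration; they are not values from the study.

```python
import math

# Hypothetical, already-fitted candidates: (adjR2, intercept, {input name: slope}).
# Each model uses at most three inputs. All numbers are made up for illustration.
CANDIDATES = [
    (0.92, 0.10, {"no2": 0.03, "ldsa": 0.02}),
    (0.88, 0.15, {"no2": 0.04}),
    (0.80, 0.20, {"traffic": 0.001}),
]


def is_missing(value):
    """Treat None and NaN as missing measurements."""
    return value is None or (isinstance(value, float) and math.isnan(value))


def estimate_bc(row, candidates=CANDIDATES):
    """Return (estimate, rank of the model used), or (None, None) if no model applies.

    Models are tried in descending order of adjR2; the first one whose
    inputs are all present in `row` produces the estimate.
    """
    ranked = sorted(candidates, key=lambda c: c[0], reverse=True)
    for rank, (_, intercept, slopes) in enumerate(ranked, start=1):
        if all(not is_missing(row.get(name)) for name in slopes):
            estimate = intercept + sum(s * row[name] for name, s in slopes.items())
            return estimate, rank
    return None, None
```

For example, a row with a missing `ldsa` value skips the top-ranked model and is estimated by the second-ranked one, which needs only `no2`.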

2020 ◽  
Author(s):  
Pak L Fung ◽  
Martha A Zaidan ◽  
Salla Sillanpää ◽  
Anu Kousa ◽  
Jarkko V Niemi ◽  
...  

Urban air pollution has been a global challenge, and continuous air quality measurement is important to understand the nature of the problem. However, missing data has often been an issue in air quality measurement. In this study, we presented a modified method to impute missing data by input-adaptive proxy. We used black carbon (BC) concentration data from the Mäkelänkatu traffic site (TR) and the Kumpula urban background site (BG) in Helsinki, Finland in 2017–2018 as training sets. The input-adaptive proxy selected its input variables from other air quality variables based on their Pearson correlation coefficients with BC. In order to avoid overfitting, this proxy used a least squares model with a bisquare weighting function and allowed a maximum of three input variables. The generated models were then evaluated and ranked by adjusted coefficient of determination (adjR2), mean absolute error and root mean square error. BC concentration was first estimated by the best model. In case of missing data in the input variables of the best model, the input-adaptive proxy then used the second-best model, and so on, until all the missing data gaps were filled.

The input-adaptive proxy managed to fill 100% of the missing data gaps, while the traditional proxy filled only 20–80% of missing BC data. Furthermore, the overall performance of the input-adaptive proxy is reliable both in TR (adjR2 = 0.86–0.94) and in BG (adjR2 = 0.74–0.91). TR has a generally better regression performance because the level of BC can be mostly explained by traffic count, nitrogen oxides and accumulation mode. In contrast, the source of BC in BG is more heterogeneous, including traffic emission and residential combustion, and the concentration of BC is influenced by meteorological parameters; therefore, the rule of including a maximum of three input variables might lead to the lower adjR2. The proxy works slightly better on workdays than on weekends at both sites. In TR, the proxy works similarly in all seasons, while in BG, the proxy performance is better in winter and autumn than in the other seasons. The simplicity, full coverage and high reliability of the input-adaptive proxy make it well suited to estimating other air quality parameters. Moreover, it can act as an air quality virtual sensor alongside on-site instruments.
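The three ranking metrics named in this abstract can be computed as in the following sketch, which uses the standard textbook definitions. The function name `regression_scores` and its argument names are assumptions made for illustration, not part of the published method.

```python
import math


def regression_scores(observed, predicted, n_inputs):
    """Return adjusted R2, MAE and RMSE for paired observed/predicted values.

    n_inputs is the number of input variables in the model (at most three
    in the proxy described above); it penalises R2 in the adjusted form.
    """
    n = len(observed)
    if n != len(predicted):
        raise ValueError("observed and predicted must have the same length")
    if n <= n_inputs + 1:
        raise ValueError("need more observations than model parameters")

    mean_obs = sum(observed) / n
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    if ss_tot == 0:
        raise ValueError("observed values are constant; R2 is undefined")

    r2 = 1.0 - ss_res / ss_tot
    adj_r2 = 1.0 - (1.0 - r2) * (n - 1) / (n - n_inputs - 1)
    mae = sum(abs(o - p) for o, p in zip(observed, predicted)) / n
    rmse = math.sqrt(ss_res / n)
    return {"adj_r2": adj_r2, "mae": mae, "rmse": rmse}
```

Ranking candidate models then amounts to sorting them by `adj_r2` in descending order, with `mae` and `rmse` as complementary error measures.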


Author(s):  
Ahmad R. Alsaber ◽  
Jiazhu Pan ◽  
Adeeba Al-Hurban 

In environmental research, missing data are often a challenge for statistical modeling. This paper addressed some advanced techniques to deal with missing values in a data set measuring air quality using a multiple imputation (MI) approach. MCAR, MAR, and NMAR missing data techniques are applied to the data set. Five missing data levels are considered: 5%, 10%, 20%, 30%, and 40%. The imputation method used in this paper is an iterative imputation method, missForest, which is related to the random forest approach. Air quality data sets were gathered from five monitoring stations in Kuwait, aggregated to a daily basis. Logarithm transformation was carried out for all pollutant data, in order to normalize their distributions and to minimize skewness. We found high levels of missing values for NO2 (18.4%), CO (18.5%), PM10 (57.4%), SO2 (19.0%), and O3 (18.2%) data. Climatological data (i.e., air temperature, relative humidity, wind direction, and wind speed) were used as control variables for better estimation. The results show that the MAR technique had the lowest RMSE and MAE. We conclude that MI using the missForest approach has a high level of accuracy in estimating missing values. MissForest had the lowest imputation error (RMSE and MAE) among the other imputation methods and, thus, can be considered to be appropriate for analyzing air quality data.
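Evaluating an imputation method at fixed missing-data levels requires first removing known values from a complete series. The sketch below shows only the simplest case, missing completely at random (MCAR), where every position is equally likely to be removed. The function name `mask_mcar` and the seed handling are assumptions for illustration; the MAR and NMAR mechanisms and missForest itself are not reproduced here.

```python
import random


def mask_mcar(values, level, seed=0):
    """Return a copy of `values` with a fraction `level` replaced by None.

    Positions are drawn uniformly at random, independent of the data,
    which is what makes the missingness MCAR.
    """
    if not 0 <= level < 1:
        raise ValueError("level must be in [0, 1)")
    rng = random.Random(seed)  # fixed seed so the mask is reproducible
    n_missing = round(level * len(values))
    missing_idx = set(rng.sample(range(len(values)), n_missing))
    return [None if i in missing_idx else v for i, v in enumerate(values)]
```

Calling `mask_mcar(series, 0.20)` on a complete series gives a test set in which 20% of the values are hidden, so imputed values can be compared with the originals.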


Author(s):  
Lars Gidhagen ◽  
Patricia Krecl ◽  
Admir Créso Targino ◽  
Gabriela Polezer ◽  
Ricardo H. M. Godoi ◽  
...  

Abstract. Data on airborne fine particle (PM2.5) emissions and concentrations in cities are valuable for traffic and air quality managers, urban planners, health practitioners, researchers, and ultimately for legislators and decision makers. Emissions and ambient concentrations of PM2.5 and black carbon (BC) were assessed in the city of Curitiba, southern Brazil. The methodology combined a month-long monitoring campaign with both fixed and mobile instruments, development of emission inventories, and dispersion model simulations on different scales. The mean urban background PM2.5 concentrations during the campaign were 7.3 μg m−3 in Curitiba city center, but three- to fourfold higher (25.3 μg m−3) in a residential area on the city’s outskirts, indicating the presence of local sources, possibly linked to biomass combustion. BC concentrations seemed to be more uniformly distributed over the city, with mean urban background concentrations around 2 μg m−3, half of which were due to local traffic emissions. Higher mean BC concentrations (3–5 μg m−3) were found along busy roads. The dispersion modeling also showed high PM2.5 and BC concentrations along the heavily transited ring road. However, the lack of in situ data over these peripheral areas prevented the verification of the model output. The vehicular emission factors for PM2.5 and BC from the literature were found not to be suitable for Curitiba’s fleet and needed to be adjusted. The integrated approach of this study can be implemented in other cities, as long as an open data policy and close cooperation among regional and municipal authorities and academia can be achieved.


2020 ◽  
Vol 9 (2) ◽  
pp. 755-763
Author(s):  
Shamihah Muhammad Ghazali ◽  
Norshahida Shaadan ◽  
Zainura Idrus

Missing values often occur in data sets across various research areas. This has been recognized as a data quality problem because missing values could affect the performance of analysis results. To overcome the problem, the incomplete data set needs to be treated or replaced using an imputation method. Thus, the pattern of missing values must be explored beforehand to determine a suitable method. This paper discusses the application of data visualisation as a smart technique for missing data exploration, aiming to increase understanding of missing data behaviour, which includes the missing data mechanism (MCAR, MAR and MNAR) and the distribution pattern of missingness in terms of percentage as well as gap size. This paper presents the application of several data visualisation tools from five R packages (visdat, VIM, ggplot2, Amelia and UpSetR) for data missingness exploration. As an illustration, based on an air quality data set in Malaysia, several graphics were produced and discussed to illustrate the contribution of the visualisation tools in providing input and insight on the pattern of data missingness. The results show that missing values in the air quality data set of the chosen sites in Malaysia behave as missing at random (MAR), with a small percentage of missingness, and do contain long gaps.


2021 ◽  
Vol 21 (2) ◽  
pp. 1173-1189
Author(s):  
Krista Luoma ◽  
Jarkko V. Niemi ◽  
Minna Aurela ◽  
Pak Lun Fung ◽  
Aku Helin ◽  
...  

Abstract. In this study, we present results from 12 years of black carbon (BC) measurements at 14 sites around the Helsinki metropolitan area (HMA) and at one background site outside the HMA. The main local sources of BC in the HMA are traffic and residential wood combustion in fireplaces and sauna stoves. All BC measurements were conducted optically, and therefore we refer to the measured BC as equivalent BC (eBC). Measurement stations were located in different environments that represented traffic environment, detached housing area, urban background, and regional background. The measurements of eBC were conducted from 2007 through 2018; however, the times and the lengths of the time series varied at each site. The largest annual mean eBC concentrations were measured at the traffic sites (from 0.67 to 2.64 µg m−3) and the lowest at the regional background sites (from 0.16 to 0.48 µg m−3). The annual mean eBC concentrations at the detached housing and urban background sites varied from 0.64 to 0.80 µg m−3 and from 0.42 to 0.68 µg m−3, respectively. The clearest seasonal variation was observed at the detached housing sites where residential wood combustion increased the eBC concentrations during the cold season. Diurnal variation in eBC concentration in different urban environments depended clearly on the local sources that were traffic and residential wood combustion. The dependency was not as clear for the typically measured air quality parameters, which were here NOx concentration and mass concentration of particles smaller than 2.5 µm in diameter (PM2.5). At four sites which had at least a 4-year-long time series available, the eBC concentrations had statistically significant decreasing trends that varied from −10.4 % yr−1 to −5.9 % yr−1. Compared to trends determined at urban and regional background sites, the absolute trends decreased fastest at traffic sites, especially during the morning rush hour. 
Relative long-term trends in eBC and NOx were similar, and their concentrations decreased more rapidly than that of PM2.5. The results indicated that especially emissions from traffic have decreased in the HMA during the last decade. This shows that air pollution control, new emission standards, and a newer fleet of vehicles had an effect on air quality.


2016 ◽  
Vol 28 (1) ◽  
pp. 22-42 ◽  
Author(s):  
David Priilaid

Purpose – This paper aims to understand how a fast-moving luxury good like whisky is typically positioned within South Africa’s discounted retail environment and how this positioning could be improved. In so doing, this paper introduces an econometric valuation model to establish the relative efficacy of contending extrinsic cues in the explanation of whisky prices. Design/methodology/approach – An ordinary least squares regression model is developed from a data set of 122 whiskies drawn from the 2014 festive-season catalogues of two large South African discount retailers. In estimating the whisky pricing function, the hedonic contribution of the following input variables is estimated: age in respect of blended whiskies and single premium malts, in-store supply, claims of retail exclusivity, branding, country-of-origin and packaging formats. Findings – Age effects as they relate to single malts and mass-produced grain whiskies offer the greatest explanation of price, while scarcity effects are observed, along with claims of retail exclusivity, which are found to reduce product value significantly. Country-of-origin and packaging, however, have low to negligible effects. Originality/value – To producers and marketers of whisky, these findings offer insight as to which extrinsic factors could be better amplified, modified or excised if the product is to be optimally positioned. Implications are explored.


2013 ◽  
Vol 594-595 ◽  
pp. 889-895 ◽  
Author(s):  
M.N. Noor ◽  
A.S. Yahaya ◽  
N.A. Ramli ◽  
Abdullah Mohd Mustafa Al Bakri

The presence of missing values in statistical survey data is an important issue to deal with. These data usually contain missing values due to many factors such as machine failures, changes in the siting of monitors, routine maintenance and human error. Incomplete data sets usually cause bias due to differences between observed and unobserved data. Therefore, it is important to ensure that the data analyzed are of high quality. A straightforward approach to deal with this problem is to ignore the missing data and to discard the incomplete cases from the data set. This approach is generally not valid for time-series prediction, in which the value of a system typically depends on the historical time data of the system. One approach that is commonly used for the treatment of missing items is the adoption of an imputation technique. This paper discusses three interpolation methods: linear, quadratic and cubic. A total of 8577 observations of PM10 data for a year were used to compare the three methods when fitting the Gamma distribution. The goodness-of-fit was assessed using three performance indicators: mean absolute error (MAE), root mean squared error (RMSE) and coefficient of determination (R2). The results show that the linear interpolation method provides a very good fit to the data.
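Linear interpolation, the simplest of the three methods compared in this abstract, can be sketched as follows. The sketch assumes an evenly spaced series in which missing values are marked as `None`; the function name `fill_linear` is hypothetical, and leaving leading and trailing gaps unfilled is a design choice of this sketch, not something stated in the paper.

```python
def fill_linear(series):
    """Fill interior gaps (None) by linear interpolation between neighbours.

    Gaps at the start or end of the series have only one known neighbour,
    so they are left as None rather than extrapolated.
    """
    filled = list(series)
    known = [i for i, v in enumerate(filled) if v is not None]
    # Walk over consecutive pairs of known points and fill what lies between.
    for left, right in zip(known, known[1:]):
        span = right - left
        delta = filled[right] - filled[left]
        for i in range(left + 1, right):
            filled[i] = filled[left] + delta * (i - left) / span
    return filled
```

Given `[10.0, None, None, 16.0, None]`, the two interior gaps are filled on the straight line between 10.0 and 16.0, while the trailing gap stays missing.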


2019 ◽  
Vol 19 (17) ◽  
pp. 11199-11212 ◽  
Author(s):  
Ana Stojiljkovic ◽  
Mari Kauhaniemi ◽  
Jaakko Kukkonen ◽  
Kaarle Kupiainen ◽  
Ari Karppinen ◽  
...  

Abstract. We have numerically evaluated how effective selected potential measures would be for reducing the impact of road dust on ambient air particulate matter (PM10). The selected measures included a reduction of the use of studded tyres on light-duty vehicles and a reduction of the use of salt or sand for traction control. We have evaluated these measures for a street canyon located in central Helsinki for four years (2007–2009 and 2014). Air quality measurements were conducted in the street canyon for two years, 2009 and 2014. Two road dust emission models, NORTRIP (NOn-exhaust Road TRaffic Induced Particle emissions) and FORE (Forecasting Of Road dust Emissions), were applied in combination with the Operational Street Pollution Model (OSPM), a street canyon dispersion model, to compute the street increments of PM10 (i.e. the fraction of PM10 concentration originating from traffic emissions at the street level) within the street canyon. The predicted concentrations were compared with the air quality measurements. Both road dust emission models reproduced the seasonal variability of the PM10 concentrations fairly well but under-predicted the annual mean values. It was found that the largest reductions of concentrations could potentially be achieved by reducing the fraction of vehicles that use studded tyres. For instance, a 30 % decrease in the number of vehicles using studded tyres would result in an average decrease in the non-exhaust street increment of PM10 from 10 % to 22 %, depending on the model used and the year considered. Modelled contributions of traction sand and salt to the annual mean non-exhaust street increment of PM10 ranged from 4 % to 20 % for the traction sand and from 0.1 % to 4 % for the traction salt. The results presented here can be used to support the development of optimal strategies for reducing high springtime particulate matter concentrations originating from road dust.


2021 ◽  
Vol 11 (1) ◽  
Author(s):  
Ruolan Zeng ◽  
Jiyong Deng ◽  
Limin Dang ◽  
Xinliang Yu

Abstract. A three-descriptor quantitative structure–activity/toxicity relationship (QSAR/QSTR) model was developed for the skin permeability of a sufficiently large data set consisting of 274 compounds, by applying support vector machine (SVM) together with genetic algorithm. The optimal SVM model possesses a coefficient of determination R2 of 0.946 and root mean square (rms) error of 0.253 for the training set of 139 compounds, and an R2 of 0.872 and rms of 0.302 for the test set of 135 compounds. Compared with other models reported in the literature, our SVM model shows better statistical performance in a model that deals with more samples in the test set. Therefore, applying an SVM algorithm to develop a nonlinear QSAR model for skin permeability was achieved.

