Meteorology-driven variability of air pollution (PM1) revealed with explainable machine learning

2021 ◽  
Vol 21 (5) ◽  
pp. 3919-3948
Author(s):  
Roland Stirnberg ◽  
Jan Cermak ◽  
Simone Kotthaus ◽  
Martial Haeffelin ◽  
Hendrik Andersen ◽  
...  

Abstract. Air pollution, in particular high concentrations of particulate matter smaller than 1 µm in diameter (PM1), continues to be a major health problem, and meteorology is known to substantially influence atmospheric PM concentrations. However, the scientific understanding of the ways in which complex interactions of meteorological factors lead to high-pollution episodes is inconclusive. In this study, a novel, data-driven approach based on empirical relationships is used to characterize and better understand the meteorology-driven component of PM1 variability. A tree-based machine learning model is set up to reproduce concentrations of speciated PM1 at a suburban site southwest of Paris, France, using meteorological variables as input features. The model is able to capture the majority of occurring variance of mean afternoon total PM1 concentrations (coefficient of determination (R2) of 0.58), with model performance depending on the individual PM1 species predicted. Based on the models, an isolation and quantification of individual, season-specific meteorological influences for process understanding at the measurement site is achieved using SHapley Additive exPlanation (SHAP) regression values. Model results suggest that winter pollution episodes are often driven by a combination of shallow mixed layer heights (MLHs), low temperatures, low wind speeds, or inflow from northeastern wind directions. Contributions of MLHs to the winter pollution episodes are quantified to be on average ∼5 µg/m3 for MLHs below 500 m a.g.l. Temperatures below freezing initiate formation processes and increase local emissions related to residential heating, amounting to a contribution to predicted PM1 concentrations of as much as ∼9 µg/m3. Northeasterly winds are found to contribute ∼5 µg/m3 to predicted PM1 concentrations (combined effects of u- and v-wind components), by advecting particles from source regions, e.g. central Europe or the Paris region.
Meteorological drivers of unusually high PM1 concentrations in summer are temperatures above ∼25 °C (contributions of up to ∼2.5 µg/m3), dry spells of several days (maximum contributions of ∼1.5 µg/m3), and wind speeds below ∼2 m/s (maximum contributions of ∼3 µg/m3), which cause a lack of dispersion. High-resolution case studies are conducted showing a large variability of processes that can lead to high-pollution episodes. The identification of these meteorological conditions that increase air pollution could help policy makers to adapt policy measures, issue warnings to the public, or assess the effectiveness of air pollution measures.
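
The idea behind the SHAP attribution described in this abstract can be illustrated with a minimal sketch that computes exact Shapley values by enumerating feature coalitions. This is not the authors' code and not the SHAP library: the toy model `toy_pm1`, its thresholds and step sizes, the sample `x`, and the `baseline` conditions are all hypothetical, and a real tree-based model would normally be explained with a dedicated tree algorithm rather than brute-force enumeration.

```python
from itertools import combinations
from math import factorial


def shapley_values(model, x, baseline):
    """Exact Shapley values for one sample by enumerating feature coalitions.

    Features outside a coalition are replaced by their baseline value.
    Cost grows exponentially with the number of features, so this is only
    practical for a handful of inputs.
    """
    n = len(x)
    features = range(n)

    def value(coalition):
        # Coalition features come from x, all others from the baseline.
        mixed = [x[i] if i in coalition else baseline[i] for i in features]
        return model(mixed)

    phi = [0.0] * n
    for i in features:
        others = [j for j in features if j != i]
        for size in range(n):
            # Shapley weight for coalitions of this size.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                phi[i] += weight * (value(s | {i}) - value(s))
    return phi


def toy_pm1(features):
    """Hypothetical toy model: PM1 rises under a shallow MLH and below freezing."""
    mlh_m, temp_c = features
    pm = 5.0
    if mlh_m < 500:
        pm += 5.0  # illustrative step size, not a fitted value
    if temp_c < 0:
        pm += 9.0  # illustrative step size, not a fitted value
    return pm


x = [300.0, -2.0]          # shallow mixed layer, below freezing
baseline = [1200.0, 10.0]  # assumed reference conditions
phi = shapley_values(toy_pm1, x, baseline)
```

By construction the per-feature contributions in `phi` sum to the difference between the model output at `x` and at `baseline`, which is the additivity property that makes such values readable as contributions in µg/m3.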

2020 ◽  
Author(s):  
Roland Stirnberg ◽  
Jan Cermak ◽  
Simone Kotthaus ◽  
Martial Haeffelin ◽  
Hendrik Andersen ◽  
...  

Abstract. Air pollution, in particular high concentrations of particulate matter smaller than 1 µm in diameter (PM1), continues to be a major health problem, and meteorology is known to substantially contribute to atmospheric PM concentrations. However, the scientific understanding of the complex mechanisms leading to high pollution episodes is inconclusive, as the effects of meteorological variables are not easy to separate and quantify. In this study, a novel, data-driven approach based on empirical relationships is used to characterise the role of meteorology on atmospheric concentrations of PM1. A tree-based machine learning model is set up to reproduce concentrations of speciated PM1 at a suburban site southwest of Paris, France, using meteorological variables as input features. The contributions of each meteorological feature to modeled PM1 concentrations are quantified using SHapley Additive exPlanation (SHAP) regression values. Meteorological contributions to PM1 concentrations are analysed in selected high-resolution case studies, contrasting season-specific processes. Model results suggest that winter pollution episodes are often driven by a combination of shallow mixed layer heights (MLH), low temperatures, low wind speeds or inflow from northeastern wind directions. Contributions of MLHs to the winter pollution episodes are quantified to be on average ~ 5 µg/m³ for MLHs below 500 m agl. Temperatures below freezing initiate formation processes and increase local emissions related to residential heating, amounting to a contribution of as much as ~ 9 µg/m³. Northeasterly winds are found to contribute ~ 5 µg/m³ to total PM1 concentrations (combined effects of u- and v-wind components), by advecting particles from source regions, e.g. central Europe or the Paris region. However, in calm conditions (i.e. wind speeds


2019 ◽  
Vol 9 (19) ◽  
pp. 4069 ◽  
Author(s):  
Huixiang Liu ◽  
Qing Li ◽  
Dongbing Yu ◽  
Yu Gu

Air pollution has become an important environmental issue in recent decades. Forecasts of air quality play an important role in warning people about air pollution and in controlling it. We used support vector regression (SVR) and random forest regression (RFR) to build regression models for predicting the Air Quality Index (AQI) in Beijing and the nitrogen oxides (NOX) concentration in an Italian city, based on two publicly available datasets. The root-mean-square error (RMSE), correlation coefficient (r), and coefficient of determination (R2) were used to evaluate the performance of the regression models. Experimental results showed that the SVR-based model performed better in the prediction of the AQI (RMSE = 7.666, R2 = 0.9776, and r = 0.9887), and the RFR-based model performed better in the prediction of the NOX concentration (RMSE = 83.6716, R2 = 0.8401, and r = 0.9180). This work also illustrates that combining machine learning with air quality prediction is an efficient and convenient way to solve some related environmental problems.
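
The three evaluation metrics named in this abstract can be written out in a few lines. The sketch below uses common textbook definitions, which is an assumption: the abstract does not state which definition of R2 was used. The sample values in `y_true` and `y_pred` are made up for illustration.

```python
from math import sqrt


def rmse(y_true, y_pred):
    """Root-mean-square error between observations and predictions."""
    n = len(y_true)
    return sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n)


def pearson_r(y_true, y_pred):
    """Pearson correlation coefficient r."""
    n = len(y_true)
    mean_t = sum(y_true) / n
    mean_p = sum(y_pred) / n
    cov = sum((t - mean_t) * (p - mean_p) for t, p in zip(y_true, y_pred))
    var_t = sum((t - mean_t) ** 2 for t in y_true)
    var_p = sum((p - mean_p) ** 2 for p in y_pred)
    return cov / sqrt(var_t * var_p)


def r_squared(y_true, y_pred):
    """Coefficient of determination, defined here as 1 - SS_res / SS_tot."""
    mean_t = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_t) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Made-up values: predictions are perfectly correlated but biased by +0.5.
y_true = [1.0, 2.0, 3.0, 4.0]
y_pred = [1.5, 2.5, 3.5, 4.5]

error = rmse(y_true, y_pred)
corr = pearson_r(y_true, y_pred)
r2 = r_squared(y_true, y_pred)
```

With this definition a constant bias lowers R2 while leaving r untouched, so the two metrics agree (R2 equal to r squared) only in special cases such as a least-squares linear fit evaluated on its own training data.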


2020 ◽  
Author(s):  
Nicola Bodini ◽  
Mike Optis

Abstract. The extrapolation of wind speeds measured at a meteorological mast to wind turbine hub heights is a key component in a bankable wind farm energy assessment and a significant source of uncertainty. Industry-standard methods for extrapolation include the power law and logarithmic profile. The emergence of machine-learning applications in wind energy has led to several studies demonstrating substantial improvements in vertical extrapolation accuracy in machine-learning methods over these conventional power law and logarithmic profile methods. In all cases, these studies assess relative model performance at a measurement site where, critically, the machine-learning algorithm requires knowledge of the hub-height wind speeds in order to train the model. This prior knowledge provides fundamental advantages to the site-specific machine-learning model over the power law and log profile, which, by contrast, are not highly tuned to hub-height measurements but rather can generalize to any site. Furthermore, there is no practical benefit in applying a machine-learning model at a site where hub-height winds are known; rather, its performance at nearby locations (i.e., across a wind farm site) without hub-height measurements is of most practical interest. To more fairly and practically compare machine-learning-based extrapolation to standard approaches, we implemented a round-robin extrapolation model comparison, in which a random forest machine-learning model is trained and evaluated at different sites and then compared against the power law and logarithmic profile. We consider 20 months of lidar and sonic anemometer data collected at four sites between 50–100 kilometers apart in the central United States. We find that the random forest outperforms the standard extrapolation approaches, especially when incorporating surface measurements as inputs to include the influence of atmospheric stability. 
When compared at a single site (the traditional comparison approach), the machine-learning improvement in mean absolute error was 28 % and 23 % over the power law and logarithmic profile, respectively. Using the round-robin approach proposed here, this improvement drops to 19 % and 14 %, respectively. These latter values better represent practical model performance, and we conclude that round-robin validation should be the standard for machine-learning-based, wind-speed extrapolation methods.
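
For reference, the two industry-standard methods named in this abstract can be sketched as follows. The forms used are the common textbook ones for neutral conditions; the default shear exponent `alpha`, the roughness length `z0`, and the example heights and wind speed are assumptions for illustration and are not taken from the study.

```python
from math import log


def power_law(u_ref, z_ref, z, alpha=1.0 / 7.0):
    """Power-law extrapolation: u(z) = u_ref * (z / z_ref) ** alpha."""
    return u_ref * (z / z_ref) ** alpha


def shear_exponent(u_low, z_low, u_high, z_high):
    """Shear exponent alpha estimated from wind speeds at two heights."""
    return log(u_high / u_low) / log(z_high / z_low)


def log_profile(u_ref, z_ref, z, z0=0.1):
    """Neutral logarithmic profile: u(z) = u_ref * ln(z / z0) / ln(z_ref / z0)."""
    return u_ref * log(z / z0) / log(z_ref / z0)


# Hypothetical example: 8 m/s measured at 10 m, extrapolated to 100 m.
u_mast = 8.0
u_hub_power = power_law(u_mast, z_ref=10.0, z=100.0)
u_hub_log = log_profile(u_mast, z_ref=10.0, z=100.0)
```

Neither function needs measurements at the target height, which is the property the abstract contrasts with a site-trained machine-learning model.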


2021 ◽  
Author(s):  
Mike Optis ◽  
Nicola Bodini ◽  
Mithu Debnath ◽  
Paula Doubrawa

Abstract. Accurate characterization of the offshore wind resource has been hindered by a sparsity of wind speed observations that span offshore wind turbine rotor-swept heights. Although public availability of floating lidar data is increasing, most offshore wind speed observations continue to come from buoy-based and satellite-based near-surface measurements. The aim of this study is to develop and validate novel vertical extrapolation methods that can accurately estimate wind speed time series across rotor-swept heights using these near-surface measurements. We contrast the conventional logarithmic profile against three novel approaches: a logarithmic profile with a long-term stability correction, a single-column model, and a machine-learning model. These models are developed and validated using 1 year of observations from two floating lidars deployed in U.S. Atlantic offshore wind energy areas. We find that the machine-learning model significantly outperforms all other models across all stability regimes, seasons, and times of day. Machine-learning model performance is considerably improved by including the air-sea temperature difference, which provides some accounting for offshore atmospheric stability. Finally, we find no degradation in machine-learning model performance when tested 83 km from its training location, suggesting promising future applications in extrapolating 10-m wind speeds from spatially resolved satellite-based wind atlases.


Stroke ◽  
2020 ◽  
Vol 51 (Suppl_1) ◽  
Author(s):  
Emily Kogan ◽  
Erik Sjoeland ◽  
Dejan Milentijevic ◽  
Jennifer H Lin ◽  
Mark Alberts

Introduction: The National Institutes of Health Stroke Scale (NIHSS) scores are often not readily available in structured claims databases. We have previously demonstrated that a machine learning model can be used to determine proxies for NIHSS scores. Our current work focuses on creating a model applicable across different databases to validate our approach and enable further outcome studies. Methods: We identified 1,415 eligible hospital-admitted patients in the Optum® de-identified Integrated Claims-EMR database who were diagnosed with ischemic or hemorrhagic stroke, or a transient ischemic attack and had NIHSS scores in medical notes. These patients were split into a training (N=1,192) set for model development and a hold-out test (N=223) set to evaluate model performance. Furthermore, model performance was externally validated using the 286 eligible stroke patients in IBM’s Claims-EMR database (CED). Potential predictors for stroke severity included relevant procedures, diagnoses, patient demographics, and information about the patient hospital stay. Results: The optimal model, a random forest model, achieved a coefficient of determination (R2) between the actual and predicted NIHSS scores in the hold-out Optum dataset of 0.48 and of 0.42 in the secondary CED dataset. The final model incorporated a total of 47 predictors. The strongest predictors included transient ischemic attack diagnosis, length of hospital stay, critical care procedures, patient age, and hemiplegia diagnosis. Conclusion: This study shows that machine learning can be used to determine proxies for NIHSS scores across different real-world databases. Ultimately, this will enable large claims-based outcome studies involving stroke severity to improve our understanding of how stroke severity affects healthcare utilization, total cost of care, and the financial impact on the larger community.


2021 ◽  
Author(s):  
Paul D Rosero-Montalvo ◽  
Vivian F López-Batista ◽  
Ricardo Arciniega-Rocha ◽  
Diego H Peluffo-Ordóñez

Abstract Air pollution is a current concern for citizens and government entities. In urban settings, monitoring and analysing it is challenging, mainly because of the variability of pollution-related factors. The present work therefore describes the development of a wireless sensor network that uses machine learning techniques to classify environments in the city of Ibarra into three types: high pollution, medium pollution, and no noticeable contamination. To achieve this goal, signal smoothing, prototype selection, feature analysis, and a comparison of classification algorithms are performed. The main result is a classification performance of 95% together with a significant reduction of noisy data.


2020 ◽  
Author(s):  
Binjie Chen ◽  
Yi Lin ◽  
Jinsong Deng ◽  
Zheyu Li ◽  
Li Dong ◽  
...  

Abstract. Background: Identifying spatiotemporal characteristics of daily fine particulate matter (PM2.5) concentrations is essential for assessing air quality. Exposure analysis can help understand the environmental health impact on human beings and provide basic information for appropriate decision making. This study aimed to estimate daily PM2.5 concentrations and analyze the resident exposure level in the economically developed Yangtze River Delta (YRD) from 2016 to 2018. Methods: An integrated method incorporating satellite-based aerosol optical depth (AOD), machine learning models, and multi-time meteorological parameters was developed. Ten-fold cross validation (CV) was implemented to evaluate the model performance. Results: Compared to the models with daily means of meteorological fields, the models with multi-time meteorological parameters had higher CV R2 and lower CV root mean square error (RMSE) values. The model with the best performance achieved sample- (site-) based CV R2 values of 0.88 (0.88) and RMSE values of 10.33 (10.35) µg/m3. The YRD region was seriously polluted (exceeding the World Health Organization (WHO) Interim Target (IT)-1 standard of 35 µg/m3) during our study period, especially in Jiangsu Province, but with an improving trend. The residents in Zhejiang Province suffered the least from exposure, with 39 days (4% of the total days) characterized as over polluted (daily average > 75 µg/m3) in our study period. Air pollution in Shanghai Municipality improved the most from 2016 to 2018. Conclusions: With the advantages of high accuracy and high resolution (daily and 0.01°×0.01° resolutions), the proposed method can help explore the effect of air pollution on human health spatiotemporally and guide environmental policy planning.
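
The difference between sample-based and site-based ten-fold cross validation can be sketched as below. The interpretation is an assumption, since the abstract does not define the terms: here "sample-based" shuffles individual records into folds, while "site-based" holds out all records of a monitoring site together. The site labels and record counts are made up.

```python
import random


def sample_based_folds(n_samples, k=10, seed=0):
    """Shuffle record indices and deal them into k folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]


def site_based_folds(site_ids, k=10, seed=0):
    """Assign whole sites to folds so no site is split across folds."""
    sites = sorted(set(site_ids))
    random.Random(seed).shuffle(sites)
    site_to_fold = {site: i % k for i, site in enumerate(sites)}
    folds = [[] for _ in range(k)]
    for index, site in enumerate(site_ids):
        folds[site_to_fold[site]].append(index)
    return folds


# Hypothetical data: 100 records from 20 monitoring sites.
site_ids = [f"site{i % 20}" for i in range(100)]

by_sample = sample_based_folds(len(site_ids), k=10)
by_site = site_based_folds(site_ids, k=10)

# Each fold serves once as the test set; the remaining nine form the training set.
test_fold = by_site[0]
train_indices = [i for fold in by_site[1:] for i in fold]
```

Under this reading, site-based folds test how well a model transfers to locations it has never seen, whereas sample-based folds can place records from the same site in both training and test sets.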


2021 ◽  
Vol 6 (3) ◽  
pp. 935-948
Author(s):  
Mike Optis ◽  
Nicola Bodini ◽  
Mithu Debnath ◽  
Paula Doubrawa

Abstract. Accurate characterization of the offshore wind resource has been hindered by a sparsity of wind speed observations that span offshore wind turbine rotor-swept heights. Although public availability of floating lidar data is increasing, most offshore wind speed observations continue to come from buoy-based and satellite-based near-surface measurements. The aim of this study is to develop and validate novel vertical extrapolation methods that can accurately estimate wind speed time series across rotor-swept heights using these near-surface measurements. We contrast the conventional logarithmic profile against three novel approaches: a logarithmic profile with a long-term stability correction, a single-column model, and a machine-learning model. These models are developed and validated using 1 year of observations from two floating lidars deployed in US Atlantic offshore wind energy areas. We find that the machine-learning model significantly outperforms all other models across all stability regimes, seasons, and times of day. Machine-learning model performance is considerably improved by including the air–sea temperature difference, which provides some accounting for offshore atmospheric stability. Finally, we find no degradation in machine-learning model performance when tested 83 km from its training location, suggesting promising future applications in extrapolating 10 m wind speeds from spatially resolved satellite-based wind atlases.


Author(s):  
Man Tat Lei ◽  
Joana Monjardino ◽  
Luisa Mendes ◽  
David Gonçalves ◽  
Francisco Ferreira

Statistical methods such as multiple linear regression (MLR) and classification and regression tree (CART) analysis were used to build prediction models for the levels of pollutant concentrations in Macao using historical meteorological and air quality data for three periods: (i) from 2013 to 2016, (ii) from 2015 to 2018, and (iii) from 2013 to 2018. The variables retained by the models were identical for nitrogen dioxide (NO2), particulate matter (PM10), and PM2.5, but not for ozone (O3). Air pollution data from 2019 were used for validation purposes. The model for the 2013 to 2018 period was the one that performed best in prediction of the next-day concentration levels in 2019, with a high coefficient of determination (R2) between predicted and observed daily average concentrations (between 0.78 and 0.89 for all pollutants), and low root mean square error (RMSE), mean absolute error (MAE), and biases (BIAS). To understand if the prediction model was robust to extreme variations in pollutant concentrations, a test was performed under the circumstances of a high pollution episode for PM2.5 and O3 during 2019, and the low pollution episode during the period of implementation of the preventive measures for the COVID-19 pandemic. Regarding the high pollution episode, the period of the Chinese National Holiday of 2019 was selected, in which high concentration levels were identified for PM2.5 and O3, with peaks of daily concentration exceeding 55 μg/m3 and 400 μg/m3, respectively. The 2013 to 2018 model successfully predicted this high pollution episode with high coefficients of determination (of 0.92 for PM2.5 and 0.82 for O3). The low pollution episode for PM2.5 and O3 was identified during the 2020 COVID-19 pandemic period, with daily concentrations as low as 2 μg/m3 for PM2.5 and 50 μg/m3 for O3.
The 2013 to 2018 model successfully predicted the low pollution episode for PM2.5 and O3 with a high coefficient of determination (0.86 and 0.84, respectively). Overall, the results demonstrate that the statistical forecast model is robust and able to correctly reproduce extreme air pollution events of both high and low concentration levels.


2020 ◽  
Vol 5 (2) ◽  
pp. 489-501 ◽  
Author(s):  
Nicola Bodini ◽  
Mike Optis

Abstract. The extrapolation of wind speeds measured at a meteorological mast to wind turbine rotor heights is a key component in a bankable wind farm energy assessment and a significant source of uncertainty. Industry-standard methods for extrapolation include the power-law and logarithmic profiles. The emergence of machine-learning applications in wind energy has led to several studies demonstrating substantial improvements in vertical extrapolation accuracy in machine-learning methods over these conventional power-law and logarithmic profile methods. In all cases, these studies assess relative model performance at a measurement site where, critically, the machine-learning algorithm requires knowledge of the rotor-height wind speeds in order to train the model. This prior knowledge provides fundamental advantages to the site-specific machine-learning model over the power-law and log profiles, which, by contrast, are not highly tuned to rotor-height measurements but rather can generalize to any site. Furthermore, there is no practical benefit in applying a machine-learning model at a site where winds at the heights relevant for wind energy production are known; rather, its performance at nearby locations (i.e., across a wind farm site) without rotor-height measurements is of most practical interest. To more fairly and practically compare machine-learning-based extrapolation to standard approaches, we implemented a round-robin extrapolation model comparison, in which a random-forest machine-learning model is trained and evaluated at different sites and then compared against the power-law and logarithmic profiles. We consider 20 months of lidar and sonic anemometer data collected at four sites between 50 and 100 km apart in the central United States. We find that the random forest outperforms the standard extrapolation approaches, especially when incorporating surface measurements as inputs to include the influence of atmospheric stability. 
When compared at a single site (the traditional comparison approach), the machine-learning improvement in mean absolute error was 28 % and 23 % over the power-law and logarithmic profiles, respectively. Using the round-robin approach proposed here, this improvement drops to 20 % and 14 %, respectively. These latter values better represent practical model performance, and we conclude that round-robin validation should be the standard for machine-learning-based wind speed extrapolation methods.
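
The round-robin comparison described in this abstract can be sketched as a loop over train and test sites. This is one reading of the abstract, not the authors' code: the sketch assumes a model is fit at one site and scored at every other site. The site names, the samples, and the stand-in "model" (a single speed-up ratio instead of a random forest) are hypothetical.

```python
def mean_absolute_error(y_true, y_pred):
    """Mean absolute error between observations and predictions."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def round_robin_scores(data, fit, predict):
    """Train at each site in turn and score the model at every other site.

    data maps a site name to a list of (input, target) pairs.
    Returns a dict keyed by (train_site, test_site) with the MAE.
    """
    scores = {}
    for train_site, train_samples in data.items():
        model = fit(train_samples)
        for test_site, test_samples in data.items():
            if test_site == train_site:
                continue  # skip the single-site comparison
            y_true = [y for _, y in test_samples]
            y_pred = [predict(model, x) for x, _ in test_samples]
            scores[(train_site, test_site)] = mean_absolute_error(y_true, y_pred)
    return scores


def fit_ratio(samples):
    """Stand-in model: mean ratio of upper-level to near-surface wind speed."""
    return sum(y / x for x, y in samples) / len(samples)


def predict_ratio(ratio, x):
    return ratio * x


# Hypothetical (near-surface, upper-level) wind speed pairs in m/s.
data = {
    "site_A": [(5.0, 6.0), (8.0, 9.6)],
    "site_B": [(5.0, 6.5), (10.0, 13.0)],
    "site_C": [(4.0, 4.8)],
}
scores = round_robin_scores(data, fit_ratio, predict_ratio)
```

Averaging `scores` over all pairs gives an error estimate for sites without upper-level measurements, which is the quantity the abstract argues is of most practical interest.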

