Air Quality Forecasts Improved by Combining Data Assimilation and Machine Learning with Satellite AOD

Author(s):  
Seunghee Lee ◽  
Seohui Park ◽  
Myong‐In Lee ◽  
Ganghan Kim ◽  
Jungho Im ◽  
...  
2019 ◽  
Author(s):  
Julien Brajard ◽  
Alberto Carrassi ◽  
Marc Bocquet ◽  
Laurent Bertino

Abstract. A novel method based on the combination of data assimilation and machine learning is introduced. The new hybrid approach has a two-fold aim: (i) emulating hidden, possibly chaotic, dynamics and (ii) predicting their future states. The method alternates between a data assimilation step, here an ensemble Kalman filter, and a neural network. Data assimilation is used to optimally combine a surrogate model with sparse, noisy data. The resulting analysis is spatially complete and can thus be used as a training set by the neural network to update the surrogate model. The two steps are then repeated iteratively. Numerical experiments have been carried out using the chaotic 40-variable Lorenz 96 model, demonstrating both convergence and statistical skill. The skill metrics include the short-term forecast skill out to two Lyapunov times, the retrieval of positive Lyapunov exponents, and the power density spectrum. The sensitivity of the method to critical setup parameters is also presented: forecast skill decreases smoothly with increased observational noise but drops abruptly if less than half of the model domain is observed. The synergy demonstrated with a low-dimensional system is encouraging for more sophisticated dynamics and motivates further investigation into merging data assimilation and machine learning.
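To make the data assimilation step more concrete, here is a minimal sketch of the standard Lorenz 96 equations together with one ensemble Kalman filter analysis step that observes every other variable. Several choices are assumptions and are not taken from the abstract: the forcing F = 8 and time step 0.05 are only the conventional values, the filter is the stochastic (perturbed-observation) variant, no localization or inflation is applied, and all function names are hypothetical.

```python
import numpy as np

def lorenz96_tendency(x, forcing=8.0):
    """Lorenz 96: dx_i/dt = (x_{i+1} - x_{i-2}) * x_{i-1} - x_i + F, periodic in i."""
    return (np.roll(x, -1) - np.roll(x, 2)) * np.roll(x, 1) - x + forcing

def rk4_step(x, dt=0.05, forcing=8.0):
    """Advance the state by one fourth-order Runge-Kutta step."""
    k1 = lorenz96_tendency(x, forcing)
    k2 = lorenz96_tendency(x + 0.5 * dt * k1, forcing)
    k3 = lorenz96_tendency(x + 0.5 * dt * k2, forcing)
    k4 = lorenz96_tendency(x + dt * k3, forcing)
    return x + dt * (k1 + 2.0 * k2 + 2.0 * k3 + k4) / 6.0

def enkf_analysis(ensemble, obs, obs_idx, obs_std, rng):
    """Stochastic EnKF analysis (assumed variant) for directly observed variables.

    ensemble: (n_state, n_members) forecast ensemble
    obs:      (n_obs,) observed values at state indices obs_idx
    """
    n_members = ensemble.shape[1]
    anomalies = ensemble - ensemble.mean(axis=1, keepdims=True)
    hx = ensemble[obs_idx, :]              # observation operator applied to members
    hx_anom = anomalies[obs_idx, :]
    p_ht = anomalies @ hx_anom.T / (n_members - 1)    # P H^T
    s = hx_anom @ hx_anom.T / (n_members - 1)         # H P H^T
    s += obs_std**2 * np.eye(len(obs_idx))            # add observation error covariance R
    gain = np.linalg.solve(s, p_ht.T).T               # K = P H^T S^{-1} (S is symmetric)
    noise = obs_std * rng.standard_normal((len(obs_idx), n_members))
    perturbed_obs = obs[:, None] + noise              # one perturbed copy per member
    return ensemble + gain @ (perturbed_obs - hx)

# Illustrative setup: 40 variables, 50 members, half of the domain observed.
rng = np.random.default_rng(0)
n_state, n_members = 40, 50
truth = 8.0 + rng.standard_normal(n_state)
for _ in range(100):                       # spin up toward the attractor
    truth = rk4_step(truth)
obs_idx = np.arange(0, n_state, 2)
obs_std = 1.0
obs = truth[obs_idx] + obs_std * rng.standard_normal(obs_idx.size)
forecast = truth[:, None] + 2.0 * rng.standard_normal((n_state, n_members))
analysis = enkf_analysis(forecast, obs, obs_idx, obs_std, rng)
```

The analysis ensemble covers all 40 variables even though only 20 are observed, which is the property that lets it serve as a spatially complete training set for the neural network.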


Author(s):  
Julien Brajard ◽  
Alberto Carrassi ◽  
Marc Bocquet ◽  
Laurent Bertino

In recent years, machine learning (ML) has been proposed to devise data-driven parametrizations of unresolved processes in dynamical numerical models. In most cases, the ML training leverages high-resolution simulations to provide a dense, noiseless target state. Our goal is to go beyond the use of high-resolution simulations and train ML-based parametrizations using direct data, in the realistic scenario of noisy and sparse observations. The algorithm proposed in this work is a two-step process. First, data assimilation (DA) techniques are applied to estimate the full state of the system from a truncated model. The unresolved part of the truncated model is viewed as a model error in the DA system. In a second step, ML is used to emulate the unresolved part, yielding a predictor of model error given the state of the system. Finally, the ML-based parametrization model is added to the physical core truncated model to produce a hybrid model. The DA component of the proposed method relies on an ensemble Kalman filter, while the ML parametrization is represented by a neural network. The approach is applied to the two-scale Lorenz model and to MAOOAM, a reduced-order coupled ocean-atmosphere model. We show that in both cases, the hybrid model yields forecasts with better skill than the truncated model. Moreover, the attractor of the system is significantly better represented by the hybrid model than by the truncated model. This article is part of the theme issue ‘Machine learning for weather and climate modelling’.
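The hybrid construction described above (truncated model plus a learned model-error term) can be illustrated with a deliberately small toy problem. Everything in this sketch is an assumption made for illustration: the scalar dynamics, the noise level standing in for analysis error, and the use of a quadratic least-squares fit (`np.polyfit`) in place of the neural network used in the paper.

```python
import numpy as np

def truncated_step(x):
    """Hypothetical truncated model: resolves only the linear part."""
    return 0.9 * x

def true_step(x):
    """Hypothetical full dynamics, including the unresolved quadratic term."""
    return 0.9 * x - 0.05 * x**2

rng = np.random.default_rng(0)

# Stand-in for the DA output: noisy estimates of consecutive states.
x_now = rng.uniform(-2.0, 2.0, 500)
analysis_now = x_now + 0.01 * rng.standard_normal(x_now.size)
analysis_next = true_step(x_now) + 0.01 * rng.standard_normal(x_now.size)

# Model error seen by the DA system: analysis minus truncated forecast.
model_error = analysis_next - truncated_step(analysis_now)

# Swapped-in learner: quadratic polynomial instead of a neural network.
coeffs = np.polyfit(analysis_now, model_error, deg=2)

def hybrid_step(x):
    """Truncated physical core plus the learned model-error correction."""
    return truncated_step(x) + np.polyval(coeffs, x)

# Compare one-step forecast errors on clean test states.
x_test = np.linspace(-2.0, 2.0, 101)
rmse_truncated = np.sqrt(np.mean((truncated_step(x_test) - true_step(x_test)) ** 2))
rmse_hybrid = np.sqrt(np.mean((hybrid_step(x_test) - true_step(x_test)) ** 2))
print(f"truncated RMSE: {rmse_truncated:.4f}, hybrid RMSE: {rmse_hybrid:.4f}")
```

The point of the design is that the learner never sees the true dynamics. It is trained only on the mismatch between the analysis and the truncated forecast, so it can be fitted from noisy, DA-derived states.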


Atmosphere ◽  
2021 ◽  
Vol 12 (1) ◽  
pp. 91
Author(s):  
Santiago Lopez-Restrepo ◽  
Andres Yarce ◽  
Nicolás Pinel ◽  
O.L. Quintero ◽  
Arjo Segers ◽  
...  

The use of low-cost air quality networks has been increasing in recent years to study urban pollution dynamics. Here we evaluate the operational low-cost network of the Aburrá Valley against the official monitoring network. The results show that the low-cost PM2.5 measurements are very close to those observed by the official network. Additionally, the low-cost network allows a higher spatial representation of the concentrations across the valley. We integrate low-cost observations with the chemical transport model Long Term Ozone Simulation-European Operational Smog (LOTOS-EUROS) using data assimilation. Two different configurations of the low-cost network were assimilated: the whole low-cost network (255 sensors), and a high-quality selection using just the sensors with a correlation factor greater than 0.8 with respect to the official network (115 sensors). The official stations were also assimilated to assess the impact of the denser low-cost network on model performance. Both simulations assimilating the low-cost network outperform the model without assimilation and the model assimilating the official network. The capability to issue warnings for pollution events is also improved by assimilating the low-cost network with respect to the other simulations. Finally, the simulation using the high-quality configuration has lower error values than the one using the complete low-cost network, showing that it is essential to consider the quality and location of sensors and not just their total number. Our results suggest that with the current advances in low-cost sensors, it is possible to improve model performance with low-cost network data assimilation.
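The high-quality selection can be sketched as a simple correlation filter. This is an illustration under stated assumptions: the "correlation factor" is taken to be the Pearson correlation, each sensor is compared against a single reference time series, and the function name and toy data are hypothetical rather than taken from the study.

```python
import numpy as np

def select_high_quality(low_cost, reference, threshold=0.8):
    """Return indices of sensors whose Pearson correlation with the
    reference series exceeds the threshold.

    low_cost:  array of shape (n_sensors, n_times)
    reference: array of shape (n_times,)
    """
    selected = []
    for i, series in enumerate(low_cost):
        r = np.corrcoef(series, reference)[0, 1]
        if r > threshold:
            selected.append(i)
    return selected

# Toy data: three hypothetical sensors against one reference station.
reference = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
low_cost = np.array([
    2.0 * reference + 1.0,        # biased but perfectly correlated (r = 1)
    [5.0, 1.0, 4.0, 2.0, 3.0],    # poorly correlated (r = -0.3)
    [1.0, 2.0, 3.0, 5.0, 4.0],    # well correlated (r = 0.9)
])
print(select_high_quality(low_cost, reference))  # [0, 2]
```

Note that a correlation filter keeps the first sensor despite its bias and scale error, so a selection of this kind says nothing about calibration. Whether the study applies a separate bias correction is not stated in the abstract.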


2020 ◽  
Vol 10 (24) ◽  
pp. 9151
Author(s):  
Yun-Chia Liang ◽  
Yona Maimury ◽  
Angela Hsiang-Ling Chen ◽  
Josue Rodolfo Cuevas Juarez

Air, an essential natural resource, has been compromised in terms of quality by economic activities. Considerable research has been devoted to predicting instances of poor air quality, but most studies are limited by insufficient longitudinal data, making it difficult to account for seasonal and other factors. Several prediction models have been developed using an 11-year dataset collected by Taiwan’s Environmental Protection Administration (EPA). Machine learning methods, including adaptive boosting (AdaBoost), artificial neural network (ANN), random forest, stacking ensemble, and support vector machine (SVM), produce promising results for air quality index (AQI) level predictions. A series of experiments using datasets for three different regions found that, among the best-performing methods (stacking ensemble, AdaBoost, and random forest), the stacking ensemble delivers consistently superior performance for R² and RMSE, while AdaBoost provides the best results for MAE.
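For reference, the three scores named above can be computed as in the following sketch. The formulas are the standard definitions; the function name and the four-point example are hypothetical and are not data from the study.

```python
import numpy as np

def regression_scores(y_true, y_pred):
    """Return (R^2, RMSE, MAE) using the standard definitions."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    residuals = y_true - y_pred
    ss_res = np.sum(residuals**2)                      # residual sum of squares
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)     # total sum of squares
    r2 = 1.0 - ss_res / ss_tot
    rmse = np.sqrt(np.mean(residuals**2))
    mae = np.mean(np.abs(residuals))
    return r2, rmse, mae

# Hypothetical AQI values and predictions.
r2, rmse, mae = regression_scores([3, 5, 7, 9], [2, 5, 8, 10])
print(f"R2={r2:.2f}, RMSE={rmse:.3f}, MAE={mae:.2f}")  # R2=0.85, RMSE=0.866, MAE=0.75
```

Because RMSE squares the residuals, it penalizes occasional large errors more heavily than MAE does, which is one way a model can rank best on RMSE while another ranks best on MAE.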

