A machine learning approach to the observation operator for satellite radiance data assimilation

Author(s):  
Jianyu Liang ◽  
Koji Terasaki ◽  
Takemasa Miyoshi

<p>The ‘observation operator’ is essential in data assimilation (DA): it derives the model equivalent of the observations from the model variables. For satellite radiance observations, it is usually based on a complex radiative transfer model (RTM) with a bias correction procedure. As a result, it usually takes time to start using data from a new satellite after launch. Here we take advantage of the recent rapid development of machine learning (ML), which is good at finding complex relationships within data. ML can potentially serve as the ‘observation operator’, revealing the relationships between the model variables and the observations without knowledge of their physical relationships. In this study, we test this idea with a numerical weather prediction system composed of the Nonhydrostatic Icosahedral Atmospheric Model (NICAM) and the Local Ensemble Transform Kalman Filter (LETKF). We focus on the satellite microwave brightness temperature (BT) from the Advanced Microwave Sounding Unit-A (AMSU-A). Conventional observations and AMSU-A data were assimilated every 6 hours. The reference DA system employed an observation operator based on RTTOV and an online bias correction method.</p><p>We used this reference system to generate 1 month of data to train the ML model. Since the reference system includes running a physically based RTM, we implicitly used information from the RTM to train the ML model in this study, although in future research we will explore methods that do not use the RTM. The ML model is an artificial neural network with 5 fully connected layers. The input of the ML model includes the NICAM model variables and the predictors for bias correction, and the output is the corresponding satellite BT in 3 channels from 5 satellites. Next, we ran the DA cycle for the same month of the following year to test the performance of the ML model. Two experiments were conducted. 
The control experiment (CTRL) was performed with the reference system. In the test experiment (TEST), the ML model was used as the observation operator, with no separate bias correction procedure, since the training data include the biased differences between the model and the observations. The results showed no significant bias in the BT simulated by the ML model. Using the ECMWF global atmospheric reanalysis (ERA-Interim) as a benchmark to evaluate the analysis accuracy, the global-mean RMSE, bias, and ensemble spread for temperature in TEST are 2% higher, 4% higher, and 1% lower, respectively, than those in CTRL. The result is encouraging, since it shows that our ML model can emulate the RTM. The limitation of our study is that we rely on the physically based RTM in the reference DA system, which is used for training the ML model. This is a first, preliminary result. We are currently considering other methods to train the ML model without using the RTM at all.</p>
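
The network described above can be pictured with a minimal sketch of the forward pass: a network with 5 fully connected layers that maps model variables plus bias-correction predictors to simulated BT. The abstract does not give layer widths, activation functions, input sizes, or whether one network serves all satellites, so the sizes, the tanh activation, the random initial weights, and the names `init_mlp` and `observation_operator` are all assumptions made for illustration. No training is shown.

```python
import math
import random

def init_mlp(layer_sizes, seed=0):
    """Return one (weights, biases) pair per fully connected layer (random, untrained)."""
    rng = random.Random(seed)
    params = []
    for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:]):
        scale = 1.0 / math.sqrt(n_in)
        weights = [[rng.uniform(-scale, scale) for _ in range(n_in)]
                   for _ in range(n_out)]
        biases = [0.0] * n_out
        params.append((weights, biases))
    return params

def observation_operator(params, model_state, bias_predictors):
    """Map model variables plus bias predictors to simulated brightness temperatures."""
    x = list(model_state) + list(bias_predictors)
    for i, (weights, biases) in enumerate(params):
        x = [sum(w * v for w, v in zip(row, x)) + b
             for row, b in zip(weights, biases)]
        if i < len(params) - 1:
            # Hidden layers use tanh (an assumption); the output layer is linear.
            x = [math.tanh(v) for v in x]
    return x

# Hypothetical sizes: 20 model variables, 4 bias predictors, 3 output channels.
N_STATE, N_PRED, N_CHANNELS = 20, 4, 3
# Six sizes define five fully connected layers.
params = init_mlp([N_STATE + N_PRED, 32, 32, 32, 32, N_CHANNELS])
simulated_bt = observation_operator(params, [0.1] * N_STATE, [1.0] * N_PRED)
```

Feeding the bias predictors in alongside the model variables is what lets a single mapping absorb the bias correction, which matches the statement that TEST needs no separate bias correction step.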

2019 ◽  
Vol 19 (15) ◽  
pp. 10009-10026 ◽  
Author(s):  
Jianbing Jin ◽  
Hai Xiang Lin ◽  
Arjo Segers ◽  
Yu Xie ◽  
Arnold Heemink

Abstract. Data assimilation algorithms rely on a basic assumption of an unbiased observation error. However, the presence of inconsistent measurements with nontrivial biases or inseparable baselines is unavoidable in practice. Assimilation analysis might diverge from reality since the data assimilation itself cannot distinguish whether the differences between model simulations and observations are due to the biased observations or model deficiencies. Unfortunately, modeling of observation biases or baselines which show strong spatiotemporal variability is a challenging task. In this study, we report how data-driven machine learning can be used to perform observation bias correction for data assimilation through a real application, which is the dust emission inversion using PM10 observations. PM10 observations are considered unbiased; however, a bias correction is necessary if they are used as a proxy for dust during dust storms since they actually represent a sum of dust particles and non-dust aerosols. Two observation bias correction methods have been designed in order to use PM10 measurements as a proxy for the dust storm loads under severe dust conditions. The first one is the conventional chemistry transport model (CTM) that simulates life cycles of non-dust aerosols. The other one is the machine-learning model that describes the relations between the regular PM10 and other air quality measurements. The latter is trained on 2 years of historical samples. The machine-learning-based non-dust model is shown to be in better agreement with observations than the CTM. The dust emission inversion tests have been performed by assimilating either the raw measurements or the dust observations bias-corrected with either the CTM or the machine-learning model. The emission field, surface dust concentration, and forecast skill are evaluated. The worst case is when we directly assimilate the original observations. 
The forecasts driven by the a posteriori emission in this case even result in larger errors than the reference prediction. This shows the necessity of bias correction in data assimilation. The best results are obtained when using the machine-learning model for bias correction, with the existing measurements used more precisely and the resulting forecasts close to reality.
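
Because PM10 is described as the sum of dust and non-dust aerosols, the bias correction amounts to subtracting an estimate of the non-dust part from the raw measurement. The sketch below illustrates only that subtraction. The function name `bias_corrected_dust`, the clipping at zero, and the numbers are assumptions made up for illustration, not values or code from the paper. In the study, the non-dust estimate would come from either the CTM or the machine-learning model.

```python
def bias_corrected_dust(pm10_obs, non_dust_estimate):
    """Subtract the estimated non-dust aerosol load from raw PM10 to get a dust proxy."""
    if len(pm10_obs) != len(non_dust_estimate):
        raise ValueError("observation and estimate lists must have the same length")
    # Clipping at zero is an assumption: a dust concentration cannot be negative.
    return [max(obs - bg, 0.0) for obs, bg in zip(pm10_obs, non_dust_estimate)]

# Made-up PM10 readings and non-dust estimates, both in micrograms per cubic metre.
pm10 = [850.0, 420.0, 95.0]
non_dust = [110.0, 100.0, 120.0]
dust_proxy = bias_corrected_dust(pm10, non_dust)
print(dust_proxy)  # [740.0, 320.0, 0.0]
```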


2019 ◽  
Author(s):  
Jin Jianbing ◽  
Lin Hai Xiang ◽  
Segers Arjo ◽  
Xie Yu ◽  
Heemink Arnold

Abstract. Data assimilation algorithms rely on a basic assumption of an unbiased observation error. However, the presence of inconsistent measurements with nontrivial biases or inseparable baselines is unavoidable in practice. Assimilation analysis might diverge from reality, since the data assimilation itself cannot distinguish whether the differences between model simulations and observations are due to the biased observations or model deficiencies. Unfortunately, modeling of observation biases or baselines which show strong spatiotemporal variability is a challenging task. In this study, we report how data-driven machine learning can be used to perform observation bias correction for data assimilation through a real application, which is the dust emission inversion using PM10 observations. PM10 observations are considered unbiased; however, a bias correction is necessary if they are used as a proxy for dust during dust storms since they actually represent a sum of dust particles and non-dust aerosols. Two observation bias correction methods have been designed in order to use PM10 measurements as a proxy for the dust storm loads under severe dust conditions. The first one is the conventional chemical transport model (CTM) that simulates life cycles of non-dust aerosols. The other one is the machine learning model that describes the relations between the regular PM10 and other air quality measurements. The latter is trained on two years of historical samples. The machine-learning-based non-dust model is shown to be in better agreement with observations than the CTM. The dust emission inversion tests have been performed by assimilating either the raw measurements or the dust observations bias-corrected with either the CTM or the machine learning model. The emission field, surface dust concentration, and forecast skill are evaluated. The worst case is when we directly assimilate the original observations. 
The forecasts driven by the posterior emission in this case even result in larger errors than the reference prediction. This shows the necessity of bias correction in data assimilation. The best results are obtained when using a machine learning model for bias correction, with the existing measurements used more precisely and the resulting forecasts close to reality.


Author(s):  
R. Meenal ◽  
Prawin Angel Michael ◽  
D. Pamela ◽  
E. Rajasekaran

Complex numerical climate models pose a major challenge for scientists in weather prediction, especially for tropical systems. This paper presents the importance of weather prediction using machine learning (ML) techniques. Many researchers have recently suggested that machine learning models can produce sensible weather predictions despite having no precise knowledge of atmospheric physics. In this work, global solar radiation (GSR) in MJ/m²/day and wind speed in m/s are predicted for Tamil Nadu, India, using a random forest ML model. The random forest ML model is validated with measured wind and solar radiation data collected from IMD, Pune. Its predictions are compared with those of statistical regression models and an SVM ML model. Overall, the random forest model has the lowest error, with an MSE of 0.750 and an R² score of 0.97, and its predictions are more accurate than those of the regression models and the SVM model. This study thus removes the need for an expensive measuring instrument at every potential location to acquire solar radiation and wind speed data.
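
For readers unfamiliar with the two reported metrics, the sketch below computes MSE and the R² score from their standard definitions. The sample values are made up for illustration and are not the paper's data. The function names mirror common usage but are defined here rather than imported from a library.

```python
def mean_squared_error(y_true, y_pred):
    """Average of the squared differences between measured and predicted values."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 minus residual over total sum of squares."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot

# Made-up daily solar radiation values in MJ/m²/day.
measured = [18.0, 20.0, 22.0, 24.0]
predicted = [18.5, 19.5, 22.5, 23.5]
mse = mean_squared_error(measured, predicted)
r2 = r2_score(measured, predicted)
```

A lower MSE and an R² closer to 1 both indicate predictions that track the measurements more closely, which is the sense in which the random forest model is reported as the most accurate.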


2018 ◽  
Author(s):  
Steen Lysgaard ◽  
Paul C. Jennings ◽  
Jens Strabo Hummelshøj ◽  
Thomas Bligaard ◽  
Tejs Vegge

A machine learning model is used as a surrogate fitness evaluator in a genetic algorithm (GA) optimization of the atomic distribution of Pt-Au nanoparticles. The machine learning accelerated genetic algorithm (MLaGA) yields a 50-fold reduction of required energy calculations compared to a traditional GA.


Author(s):  
Dhilsath Fathima.M ◽  
S. Justin Samuel ◽  
R. Hari Haran

Aim: This work develops an improved and robust machine learning model for predicting myocardial infarction (MI), which could have substantial clinical impact. Objectives: This paper explains how to build a machine-learning-based computer-aided analysis system for early and accurate prediction of MI, using the Framingham Heart Study dataset for validation and evaluation. The proposed computer-aided analysis model will support medical professionals in predicting myocardial infarction proficiently. Methods: The proposed model uses mean imputation to fill in missing values in the dataset, then applies principal component analysis (PCA) to extract the optimal features and enhance the performance of the classifiers. After PCA, the reduced features are partitioned into a training dataset and a testing dataset: 70% of the data are given as input to four widely used classifiers (support vector machine, k-nearest neighbor, logistic regression, and decision tree) for training, and the remaining 30% are used to evaluate the output of the machine learning model with the following performance metrics: confusion matrix, classifier accuracy, precision, sensitivity, F1-score, and AUC-ROC curve. Results: The outputs of the classifiers are evaluated using these performance measures. We observed that logistic regression provides higher accuracy than the k-NN, SVM, and decision tree classifiers, and that PCA performs well as a feature extraction method for enhancing the performance of the proposed model. From these analyses, we conclude that logistic regression has a good mean accuracy and standard deviation of accuracy compared with the other three algorithms. The AUC-ROC curves of the classifiers (Figures 4 and 5) show that logistic regression exhibits a good AUC-ROC score, around 70%, compared to the k-NN and decision tree algorithms. 
Conclusion: From the result analysis, we infer that the proposed machine learning model can act as an optimal decision-making system for predicting acute myocardial infarction at an earlier stage than existing machine-learning-based prediction models. It is capable of predicting the presence of acute myocardial infarction in humans from heart disease risk factors, helping to decide when to start lifestyle modification and medical treatment to prevent heart disease.
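
Two of the preprocessing steps named in the Methods, mean imputation and the 70/30 split, can be sketched in a few lines. This is a minimal illustration under stated assumptions: the helper names `mean_impute` and `train_test_split`, the use of `None` for missing values, the shuffling with a fixed seed, and the toy rows are all hypothetical. PCA and the classifiers are omitted.

```python
import random

def mean_impute(rows):
    """Replace None entries in each column with the mean of that column's observed values."""
    n_cols = len(rows[0])
    means = []
    for j in range(n_cols):
        observed = [r[j] for r in rows if r[j] is not None]
        means.append(sum(observed) / len(observed))
    return [[means[j] if r[j] is None else r[j] for j in range(n_cols)]
            for r in rows]

def train_test_split(rows, train_fraction=0.7, seed=0):
    """Shuffle a copy of the rows and cut it into training and testing parts."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    cut = round(train_fraction * len(shuffled))
    return shuffled[:cut], shuffled[cut:]

# Toy data: ten rows, two features, with the second feature missing in two rows.
raw = [[float(i), None if i % 5 == 0 else float(2 * i)] for i in range(10)]
imputed = mean_impute(raw)
train_rows, test_rows = train_test_split(imputed)
```

Imputing before splitting keeps the sketch short. A stricter setup would compute the column means on the training rows only, so that no information from the test rows leaks into training.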


Author(s):  
Dhaval Patel ◽  
Shrey Shrivastava ◽  
Wesley Gifford ◽  
Stuart Siegel ◽  
Jayant Kalagnanam ◽  
...  
