A comparative assessment of the uncertainties of global surface ocean CO<sub>2</sub> estimates using a machine-learning ensemble (CSIR-ML6 version 2019a) – have we hit the wall?

2019 ◽  
Vol 12 (12) ◽  
pp. 5113-5136 ◽  
Author(s):  
Luke Gregor ◽  
Alice D. Lebehot ◽  
Schalk Kok ◽  
Pedro M. Scheel Monteiro

Abstract. Over the last decade, advanced statistical inference and machine learning have been used to fill the gaps in sparse surface ocean CO2 measurements (Rödenbeck et al., 2015). The estimates from these methods have been used to constrain seasonal, interannual and decadal variability in sea–air CO2 fluxes and the drivers of these changes (Landschützer et al., 2015, 2016; Gregor et al., 2018). However, it is also becoming clear that these methods are converging towards a common bias and root mean square error (RMSE) boundary: “the wall”, which suggests that pCO2 estimates are now limited by both data gaps and scale-sensitive observations. Here, we analyse this problem by introducing a new gap-filling method, an ensemble average of six machine-learning models (CSIR-ML6 version 2019a, Council for Scientific and Industrial Research – Machine Learning ensemble with Six members), where each model is constructed with a two-step clustering-regression approach. The ensemble average is then statistically compared to well-established methods. The ensemble average, CSIR-ML6, has an RMSE of 17.16 µatm and bias of 0.89 µatm when compared to a test dataset kept separate from training procedures. However, when validating our estimates with independent datasets, we find that our method improves only incrementally on other gap-filling methods. We investigate the differences between the methods to understand the extent of the limitations of gap-filling estimates of pCO2. We show that disagreement between methods in the South Atlantic, southeastern Pacific and parts of the Southern Ocean is too large to interpret the interannual variability with confidence. We conclude that improvements in surface ocean pCO2 estimates will likely be incremental with the optimisation of gap-filling methods by (1) the inclusion of additional clustering and regression variables (e.g. eddy kinetic energy), (2) increasing the sampling resolution and (3) successfully incorporating pCO2 estimates from alternate platforms (e.g. floats, gliders) into existing machine-learning approaches.
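The evaluation step in the abstract (comparing an ensemble average of member estimates against a withheld test set via RMSE and bias) can be sketched as below; all values and member names are invented for illustration and are not the paper's results:

```python
import math

def rmse(pred, obs):
    """Root mean square error between predictions and observations."""
    return math.sqrt(sum((p - o) ** 2 for p, o in zip(pred, obs)) / len(obs))

def bias(pred, obs):
    """Mean signed difference (prediction minus observation)."""
    return sum(p - o for p, o in zip(pred, obs)) / len(obs)

# Hypothetical held-out pCO2 observations (µatm) and estimates from two
# illustrative ensemble members; a real ensemble would have six members
# spanning a gridded field.
obs      = [360.0, 372.5, 355.1, 381.3]
member_a = [358.2, 375.0, 353.0, 384.1]
member_b = [362.5, 370.1, 357.9, 379.0]

# The ensemble estimate is the member-wise mean at each point.
ensemble = [(a + b) / 2 for a, b in zip(member_a, member_b)]

print(round(rmse(ensemble, obs), 2), round(bias(ensemble, obs), 2))
```

In this toy case the averaging cancels opposing member errors, so the ensemble RMSE is lower than either member's, which is the usual motivation for ensemble averaging.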




2015 ◽  
Vol 7 (4) ◽  
pp. 1554-1575 ◽  
Author(s):  
Steve D. Jones ◽  
Corinne Le Quéré ◽  
Christian Rödenbeck ◽  
Andrew C. Manning ◽  
Are Olsen

2020 ◽  
Author(s):  
Seulchan Lee ◽  
Hyunho Jeon ◽  
Jongmin Park ◽  
Minha Choi

As the importance of Soil Moisture (SM) has been recognized in various fields, including agricultural practice, natural hazards and climate prediction, ground-based SM sensors such as Frequency Domain Reflectometry (FDR) and Time Domain Reflectometry (TDR) are widely used. However, gaps in in-situ SM data remain unavoidable, owing not only to sensor failure or low voltage supply but also to environmental conditions. Since accurate and continuous SM data are essential for these applications, gaps in the data should be handled properly. In this study, we propose a physically based gap-filling method for a mountainous region in which in-situ SM measurements and a flux tower are located. The method is developed using only in-situ SM and precipitation data, and exploits the characteristic behaviour of SM: it increases rapidly with precipitation and decreases asymptotically afterward. Past SM data are used to build Look-Up Tables (LUTs) that contain the amount and speed of the increase and decrease of SM, with and without precipitation, respectively. Based on the developed LUTs, gaps are filled successively from where each gap starts. We also introduce a machine-learning-based gap-filling framework for comparison. Ancillary data from the flux tower (e.g. net radiation, relative humidity) were used as training input over the same period as in the physically based method, and the trained models were then used to fill the gaps. We found that both proposed methods can fill gaps in in-situ SM reasonably well, capturing the characteristics of SM variation. The comparison indicates that the physically based gap-filling method is accurate and efficient when limited information is available, and is also suitable for prediction purposes.
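The LUT-based fill described above can be sketched as follows. This is an illustrative reading with invented variable names and toy data, not the authors' code, and it collapses the paper's "amount and speed" of SM change into a single mean per-step increment keyed on precipitation:

```python
# Learn typical per-step soil moisture (SM) change under wet and dry
# conditions from past data, then walk forward through each gap starting
# from its last valid value.
def build_lut(sm, precip):
    """Mean SM change per time step, keyed by whether it rained at that step."""
    deltas = {True: [], False: []}
    for t in range(1, len(sm)):
        if sm[t] is not None and sm[t - 1] is not None:
            deltas[precip[t] > 0].append(sm[t] - sm[t - 1])
    return {wet: sum(d) / len(d) for wet, d in deltas.items() if d}

def fill_gaps(sm, precip, lut):
    """Fill None entries successively from the start of each gap."""
    filled = list(sm)
    for t in range(1, len(filled)):
        if filled[t] is None and filled[t - 1] is not None:
            filled[t] = filled[t - 1] + lut[precip[t] > 0]
    return filled

# toy training series (volumetric SM and precipitation, arbitrary units)
lut = build_lut([0.20, 0.19, 0.25, 0.24, 0.23], [0, 0, 8, 0, 0])
# a two-step gap: dry at the first missing step, rain at the second
filled = fill_gaps([0.30, None, None, 0.33], [0, 0, 6, 0], lut)
print(filled)
```

The successive fill means each reconstructed value seeds the next step, which matches the abstract's point that the method needs only SM and precipitation records.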


2020 ◽  
Author(s):  
Jake Stamell ◽  
Rea R. Rustagi ◽  
Lucas Gloege ◽  
Galen A. McKinley

Abstract. Using the Large Ensemble Testbed, a collection of 100 members from four independent Earth system models, we test three general-purpose Machine Learning (ML) approaches to understand their strengths and weaknesses in statistically reconstructing full-coverage surface ocean pCO2 from sparse in situ data. To apply the Testbed, we sample the full-field model pCO2 as real-world pCO2 collected from 1982–2016 for each ensemble member. We then use ML approaches to reconstruct the full field and compare with the original model full-field pCO2 to assess reconstruction skill. We use feed-forward neural network (NN), XGBoost (XGB), and random forest (RF) approaches to perform the reconstructions. Our baseline is the NN, since this approach has previously been shown to be a successful method for pCO2 reconstruction. The XGB and RF allow us to test tree-based approaches. We perform comparisons to a test set, which consists of 20% of the real-world sampled data that are withheld from training. Statistical comparisons with this test set are equivalent to those which could be derived using real-world data. Unique to the Testbed is that it allows for comparison to all the "unseen" points to which the ML algorithms extrapolate. When compared to the test set, XGB and RF both perform better than NN based on a suite of regression metrics. However, when compared to the unseen data, degradation of performance is large with XGB and even larger with RF. Degradation is comparatively small with NN, indicating a greater ability to generalize. Despite its larger degradation, in the final comparison to unseen data, XGB slightly outperforms NN and greatly outperforms RF, with the lowest mean bias and more consistent performance across Testbed members. All three approaches perform best in the open ocean and for seasonal variability, but performance drops off at longer time scales and in regions of low sampling, such as the Southern Ocean and coastal zones. For decadal variability, all methods overestimate the amplitude of variability and have moderate skill in reconstruction of phase. For this timescale, the greater ability of the NN to generalize allows it to slightly outperform XGB. Taking into account all comparisons, we find XGB to be best able to reconstruct surface ocean pCO2 from the limited available data.
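The test-set-versus-unseen comparison central to this abstract can be summarised with a simple degradation metric. The sketch below uses invented placeholder RMSE values (not the paper's results) chosen only to mirror the qualitative ordering described above:

```python
# Relative increase in error when moving from the withheld 20% test set
# to the "unseen" points the method must extrapolate to.
def degradation(rmse_test, rmse_unseen):
    """Fractional RMSE increase from test set to unseen points."""
    return (rmse_unseen - rmse_test) / rmse_test

# (RMSE on test set, RMSE on unseen points) in µatm; illustrative numbers
scores = {
    "NN":  (20.0, 22.0),   # weaker on the test set, degrades least
    "XGB": (16.0, 21.0),   # strong on the test set, degrades more
    "RF":  (15.0, 24.0),   # strongest on the test set, degrades most
}
for method, (test_rmse, unseen_rmse) in scores.items():
    print(method, round(degradation(test_rmse, unseen_rmse), 2))
```

With numbers of this shape, a method can win on the withheld test set yet lose on the unseen field, which is exactly why the Testbed's access to the full model truth is informative.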


2019 ◽  
Vol 70 (3) ◽  
pp. 214-224 ◽  
Author(s):  
Bui Ngoc Dung ◽  
Manh Dzung Lai ◽  
Tran Vu Hieu ◽  
Nguyen Binh T. H.

Video surveillance is an emerging research field in intelligent transport systems. This paper presents techniques that use machine learning and computer vision for vehicle detection and tracking. First, machine-learning approaches using Haar-like features and the AdaBoost algorithm for vehicle detection are presented. Second, approaches to detect vehicles using background subtraction based on a Gaussian Mixture Model and to track vehicles using optical flow and multiple Kalman filters are given. The method has the advantage of distinguishing and tracking multiple vehicles individually. The experimental results demonstrate the high accuracy of the method.
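The per-vehicle tracking step rests on the Kalman filter. Below is a minimal scalar Kalman filter for one coordinate of a vehicle centroid, an illustrative sketch with invented noise settings rather than the paper's implementation (a full tracker would run one such filter per state component and per vehicle, with data association between detections and tracks):

```python
# Minimal scalar Kalman filter with a random-walk motion model.
class ScalarKalman:
    def __init__(self, x0, process_var=1e-2, meas_var=0.25):
        self.x = x0              # state estimate (e.g. centroid x in pixels)
        self.p = 1.0             # estimate variance
        self.q = process_var     # motion-model noise
        self.r = meas_var        # detector noise

    def step(self, z):
        # predict: variance grows by the process noise
        self.p += self.q
        # correct: blend prediction and measurement z via the Kalman gain
        k = self.p / (self.p + self.r)
        self.x += k * (z - self.x)
        self.p *= 1 - k
        return self.x

# smooth a run of noisy centroid detections for one vehicle
f = ScalarKalman(10.2)
estimates = [f.step(z) for z in [11.1, 9.8, 10.5, 10.4]]
print([round(e, 2) for e in estimates])
```

Each update is a convex blend of the prediction and the new detection, so the track stays smooth even when individual detections jitter.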


2017 ◽  
Author(s):  
Sabrina Jaeger ◽  
Simone Fulle ◽  
Samo Turk

Inspired by natural language processing techniques, we here introduce Mol2vec, an unsupervised machine learning approach to learn vector representations of molecular substructures. Similarly to Word2vec models, where vectors of closely related words are in close proximity in the vector space, Mol2vec learns vector representations of molecular substructures that point in similar directions for chemically related substructures. Compounds can finally be encoded as vectors by summing the vectors of the individual substructures and can, for instance, be fed into supervised machine learning approaches to predict compound properties. The underlying substructure vector embeddings are obtained by training an unsupervised machine learning approach on a so-called corpus of compounds that consists of all available chemical matter. The resulting Mol2vec model is pre-trained once, yields dense vector representations and overcomes drawbacks of common compound feature representations such as sparseness and bit collisions. The prediction capabilities are demonstrated on several compound property and bioactivity data sets and compared with results obtained for Morgan fingerprints as a reference compound representation. Mol2vec can easily be combined with ProtVec, which employs the same Word2vec concept on protein sequences, resulting in a proteochemometric approach that is alignment-independent and can thus also easily be used for proteins with low sequence similarities.
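The compound-encoding step described above (summing substructure vectors) can be sketched as follows; the substructure identifiers and the toy 4-dimensional embeddings are invented for illustration, whereas real Mol2vec vectors are learned from a large corpus and are typically much higher-dimensional:

```python
# Encode a compound as the element-wise sum of its substructure vectors.
def encode_compound(substructures, embeddings, dim=4):
    vec = [0.0] * dim
    for sub in substructures:
        # unknown substructures contribute a zero vector
        for i, x in enumerate(embeddings.get(sub, [0.0] * dim)):
            vec[i] += x
    return vec

# hypothetical embeddings for two Morgan-style substructure "words"
emb = {
    "sub_a": [0.1, 0.0, 0.2, 0.0],
    "sub_b": [0.0, 0.3, 0.1, 0.0],
}
print(encode_compound(["sub_a", "sub_b"], emb))
```

The resulting compound vector is dense and fixed-length, which is what makes it directly usable as input to a downstream supervised model.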

