A comparative assessment of the uncertainties of global surface ocean CO<sub>2</sub> estimates using a machine-learning ensemble (CSIR-ML6 version 2019a) – have we hit the wall?

2019 ◽  
Vol 12 (12) ◽  
pp. 5113-5136 ◽  
Author(s):  
Luke Gregor ◽  
Alice D. Lebehot ◽  
Schalk Kok ◽  
Pedro M. Scheel Monteiro

Abstract. Over the last decade, advanced statistical inference and machine learning have been used to fill the gaps in sparse surface ocean CO2 measurements (Rödenbeck et al., 2015). The estimates from these methods have been used to constrain seasonal, interannual and decadal variability in sea–air CO2 fluxes and the drivers of these changes (Landschützer et al., 2015, 2016; Gregor et al., 2018). However, it is also becoming clear that these methods are converging towards a common bias and root mean square error (RMSE) boundary: “the wall”, which suggests that pCO2 estimates are now limited by both data gaps and scale-sensitive observations. Here, we analyse this problem by introducing a new gap-filling method, an ensemble average of six machine-learning models (CSIR-ML6 version 2019a, Council for Scientific and Industrial Research – Machine Learning ensemble with Six members), where each model is constructed with a two-step clustering-regression approach. The ensemble average is then statistically compared to well-established methods. The ensemble average, CSIR-ML6, has an RMSE of 17.16 µatm and bias of 0.89 µatm when compared to a test dataset kept separate from training procedures. However, when validating our estimates with independent datasets, we find that our method improves only incrementally on other gap-filling methods. We investigate the differences between the methods to understand the extent of the limitations of gap-filling estimates of pCO2. We show that disagreement between methods in the South Atlantic, southeastern Pacific and parts of the Southern Ocean is too large to interpret the interannual variability with confidence. We conclude that improvements in surface ocean pCO2 estimates will likely be incremental with the optimisation of gap-filling methods by (1) the inclusion of additional clustering and regression variables (e.g. eddy kinetic energy), (2) increasing the sampling resolution and (3) successfully incorporating pCO2 estimates from alternate platforms (e.g. floats, gliders) into existing machine-learning approaches.
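The evaluation step in the abstract (comparing an ensemble average of member estimates against a withheld test set via RMSE and bias) can be sketched as below; all values and member names are invented for illustration and are not the paper's results:

```python
import math

def rmse(pred, obs):
    """Root mean square error between predictions and observations."""
    return math.sqrt(sum((p - o) ** 2 for p, o in zip(pred, obs)) / len(obs))

def bias(pred, obs):
    """Mean signed difference (prediction minus observation)."""
    return sum(p - o for p, o in zip(pred, obs)) / len(obs)

# Hypothetical held-out pCO2 observations (µatm) and estimates from two
# illustrative ensemble members; a real ensemble would have six members
# spanning a gridded field.
obs      = [360.0, 372.5, 355.1, 381.3]
member_a = [358.2, 375.0, 353.0, 384.1]
member_b = [362.5, 370.1, 357.9, 379.0]

# The ensemble estimate is the member-wise mean at each point.
ensemble = [(a + b) / 2 for a, b in zip(member_a, member_b)]

print(round(rmse(ensemble, obs), 2), round(bias(ensemble, obs), 2))
```

In this toy case the averaging cancels opposing member errors, so the ensemble RMSE is lower than either member's, which is the usual motivation for ensemble averaging.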




2015 ◽  
Vol 7 (4) ◽  
pp. 1554-1575 ◽  
Author(s):  
Steve D. Jones ◽  
Corinne Le Quéré ◽  
Christian Rödenbeck ◽  
Andrew C. Manning ◽  
Are Olsen

2020 ◽  
Author(s):  
Seulchan Lee ◽  
Hyunho Jeon ◽  
Jongmin Park ◽  
Minha Choi

As the importance of Soil Moisture (SM) has been recognized in various fields, including agricultural practice, natural hazards and climate prediction, ground-based SM sensors such as Frequency Domain Reflectometry (FDR) and Time Domain Reflectometry (TDR) are widely used. However, gaps in in-situ SM data remain unavoidable, owing not only to sensor failure or low voltage supply but also to environmental conditions. Since accurate and continuous SM data are essential for these applications, gaps in the data should be handled properly. In this study, we propose a physically based gap-filling method for a mountainous region in which in-situ SM measurements and a flux tower are located. The method is developed using only in-situ SM and precipitation data, and exploits the characteristic behaviour of SM: it increases rapidly with precipitation and decreases asymptotically afterward. Past SM data are used to build Look-Up Tables (LUTs) that contain the amount and speed of the increase and decrease of SM, with and without precipitation, respectively. Based on the developed LUTs, gaps are filled successively from where each gap starts. We also introduce a machine-learning-based gap-filling framework for comparison. Ancillary data from the flux tower (e.g. net radiation, relative humidity) were used as training input over the same period as in the physically based method, and the trained models were then used to fill the gaps. We found that both proposed methods can fill gaps in in-situ SM reasonably well, capturing the characteristics of SM variation. The comparison indicates that the physically based gap-filling method is accurate and efficient when limited information is available, and is also suitable for prediction purposes.
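The LUT-based fill described above can be sketched as follows. This is an illustrative reading with invented variable names and toy data, not the authors' code, and it collapses the paper's "amount and speed" of SM change into a single mean per-step increment keyed on precipitation:

```python
# Learn typical per-step soil moisture (SM) change under wet and dry
# conditions from past data, then walk forward through each gap starting
# from its last valid value.
def build_lut(sm, precip):
    """Mean SM change per time step, keyed by whether it rained at that step."""
    deltas = {True: [], False: []}
    for t in range(1, len(sm)):
        if sm[t] is not None and sm[t - 1] is not None:
            deltas[precip[t] > 0].append(sm[t] - sm[t - 1])
    return {wet: sum(d) / len(d) for wet, d in deltas.items() if d}

def fill_gaps(sm, precip, lut):
    """Fill None entries successively from the start of each gap."""
    filled = list(sm)
    for t in range(1, len(filled)):
        if filled[t] is None and filled[t - 1] is not None:
            filled[t] = filled[t - 1] + lut[precip[t] > 0]
    return filled

# toy training series (volumetric SM and precipitation, arbitrary units)
lut = build_lut([0.20, 0.19, 0.25, 0.24, 0.23], [0, 0, 8, 0, 0])
# a two-step gap: dry at the first missing step, rain at the second
filled = fill_gaps([0.30, None, None, 0.33], [0, 0, 6, 0], lut)
print(filled)
```

The successive fill means each reconstructed value seeds the next step, which matches the abstract's point that the method needs only SM and precipitation records.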


2020 ◽  
Author(s):  
Jake Stamell ◽  
Rea R. Rustagi ◽  
Lucas Gloege ◽  
Galen A. McKinley

Abstract. Using the Large Ensemble Testbed, a collection of 100 members from four independent Earth system models, we test three general-purpose Machine Learning (ML) approaches to understand their strengths and weaknesses in statistically reconstructing full-coverage surface ocean pCO2 from sparse in situ data. To apply the Testbed, we sample the full-field model pCO2 as real-world pCO2 collected from 1982–2016 for each ensemble member. We then use ML approaches to reconstruct the full field and compare with the original model full-field pCO2 to assess reconstruction skill. We use feed-forward neural network (NN), XGBoost (XGB), and random forest (RF) approaches to perform the reconstructions. Our baseline is the NN, since this approach has previously been shown to be a successful method for pCO2 reconstruction. The XGB and RF allow us to test tree-based approaches. We perform comparisons to a test set, which consists of 20% of the real-world sampled data that are withheld from training. Statistical comparisons with this test set are equivalent to those which could be derived using real-world data. Unique to the Testbed is that it allows for comparison to all the "unseen" points to which the ML algorithms extrapolate. When compared to the test set, XGB and RF both perform better than NN based on a suite of regression metrics. However, when compared to the unseen data, degradation of performance is large with XGB and even larger with RF. Degradation is comparatively small with NN, indicating a greater ability to generalize. Despite its larger degradation, in the final comparison to unseen data, XGB slightly outperforms NN and greatly outperforms RF, with the lowest mean bias and more consistent performance across Testbed members. All three approaches perform best in the open ocean and for seasonal variability, but performance drops off at longer time scales and in regions of low sampling, such as the Southern Ocean and coastal zones. For decadal variability, all methods overestimate the amplitude of variability and have moderate skill in reconstruction of phase. For this timescale, the greater ability of the NN to generalize allows it to slightly outperform XGB. Taking into account all comparisons, we find XGB to be best able to reconstruct surface ocean pCO2 from the limited available data.
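The test-set-versus-unseen comparison central to this abstract can be summarised with a simple degradation metric. The sketch below uses invented placeholder RMSE values (not the paper's results) chosen only to mirror the qualitative ordering described above:

```python
# Relative increase in error when moving from the withheld 20% test set
# to the "unseen" points the method must extrapolate to.
def degradation(rmse_test, rmse_unseen):
    """Fractional RMSE increase from test set to unseen points."""
    return (rmse_unseen - rmse_test) / rmse_test

# (RMSE on test set, RMSE on unseen points) in µatm; illustrative numbers
scores = {
    "NN":  (20.0, 22.0),   # weaker on the test set, degrades least
    "XGB": (16.0, 21.0),   # strong on the test set, degrades more
    "RF":  (15.0, 24.0),   # strongest on the test set, degrades most
}
for method, (test_rmse, unseen_rmse) in scores.items():
    print(method, round(degradation(test_rmse, unseen_rmse), 2))
```

With numbers of this shape, a method can win on the withheld test set yet lose on the unseen field, which is exactly why the Testbed's access to the full model truth is informative.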


2019 ◽  
Vol 70 (3) ◽  
pp. 214-224 ◽  
Author(s):  
Bui Ngoc Dung ◽  
Manh Dzung Lai ◽  
Tran Vu Hieu ◽  
Nguyen Binh T. H.

Video surveillance is an emerging research field in intelligent transport systems. This paper presents techniques that use machine learning and computer vision for vehicle detection and tracking. First, machine-learning approaches using Haar-like features and the AdaBoost algorithm for vehicle detection are presented. Second, approaches to detect vehicles using background subtraction based on a Gaussian Mixture Model and to track vehicles using optical flow and multiple Kalman filters are given. The method has the advantage of distinguishing and tracking multiple vehicles individually. The experimental results demonstrate the high accuracy of the method.
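The per-vehicle tracking step rests on the Kalman filter. Below is a minimal scalar Kalman filter for one coordinate of a vehicle centroid, an illustrative sketch with invented noise settings rather than the paper's implementation (a full tracker would run one such filter per state component and per vehicle, with data association between detections and tracks):

```python
# Minimal scalar Kalman filter with a random-walk motion model.
class ScalarKalman:
    def __init__(self, x0, process_var=1e-2, meas_var=0.25):
        self.x = x0              # state estimate (e.g. centroid x in pixels)
        self.p = 1.0             # estimate variance
        self.q = process_var     # motion-model noise
        self.r = meas_var        # detector noise

    def step(self, z):
        # predict: variance grows by the process noise
        self.p += self.q
        # correct: blend prediction and measurement z via the Kalman gain
        k = self.p / (self.p + self.r)
        self.x += k * (z - self.x)
        self.p *= 1 - k
        return self.x

# smooth a run of noisy centroid detections for one vehicle
f = ScalarKalman(10.2)
estimates = [f.step(z) for z in [11.1, 9.8, 10.5, 10.4]]
print([round(e, 2) for e in estimates])
```

Each update is a convex blend of the prediction and the new detection, so the track stays smooth even when individual detections jitter.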


2017 ◽  
Author(s):  
Sabrina Jaeger ◽  
Simone Fulle ◽  
Samo Turk

Inspired by natural language processing techniques, we here introduce Mol2vec, an unsupervised machine learning approach to learn vector representations of molecular substructures. Similarly to Word2vec models, where vectors of closely related words are in close proximity in the vector space, Mol2vec learns vector representations of molecular substructures that point in similar directions for chemically related substructures. Compounds can finally be encoded as vectors by summing the vectors of the individual substructures and can, for instance, be fed into supervised machine learning approaches to predict compound properties. The underlying substructure vector embeddings are obtained by training an unsupervised machine learning approach on a so-called corpus of compounds that consists of all available chemical matter. The resulting Mol2vec model is pre-trained once, yields dense vector representations and overcomes drawbacks of common compound feature representations such as sparseness and bit collisions. The prediction capabilities are demonstrated on several compound property and bioactivity data sets and compared with results obtained for Morgan fingerprints as a reference compound representation. Mol2vec can easily be combined with ProtVec, which employs the same Word2vec concept on protein sequences, resulting in a proteochemometric approach that is alignment-independent and can thus also easily be used for proteins with low sequence similarities.
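The compound-encoding step described above (summing substructure vectors) can be sketched as follows; the substructure identifiers and the toy 4-dimensional embeddings are invented for illustration, whereas real Mol2vec vectors are learned from a large corpus and are typically much higher-dimensional:

```python
# Encode a compound as the element-wise sum of its substructure vectors.
def encode_compound(substructures, embeddings, dim=4):
    vec = [0.0] * dim
    for sub in substructures:
        # unknown substructures contribute a zero vector
        for i, x in enumerate(embeddings.get(sub, [0.0] * dim)):
            vec[i] += x
    return vec

# hypothetical embeddings for two Morgan-style substructure "words"
emb = {
    "sub_a": [0.1, 0.0, 0.2, 0.0],
    "sub_b": [0.0, 0.3, 0.1, 0.0],
}
print(encode_compound(["sub_a", "sub_b"], emb))
```

The resulting compound vector is dense and fixed-length, which is what makes it directly usable as input to a downstream supervised model.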

