The sensitivity of pCO2 reconstructions in the Southern Ocean to sampling scales: a semi-idealized model sampling and reconstruction approach

2022 ◽  
Author(s):  
Laique Merlin Djeutchouang ◽  
Nicolette Chang ◽  
Luke Gregor ◽  
Marcello Vichi ◽  
Pedro Manuel Scheel Monteiro

Abstract. The Southern Ocean is a complex system yet is sparsely sampled in both space and time. These factors raise questions about the confidence in present sampling strategies and associated machine learning (ML) reconstructions. Previous studies have not yielded a clear understanding of the origin of uncertainties and biases in reconstructions of the partial pressure of carbon dioxide (pCO2) at the ocean surface (pCO2ocean). Here, we examine these questions by investigating the sensitivity of pCO2ocean reconstruction uncertainties and biases across a series of semi-idealized observing system simulation experiments (OSSEs) that simulate the spatio-temporal sampling scales of surface ocean pCO2 in ways comparable to existing ocean CO2 observing platforms (Ship, Waveglider, Carbon-float, Saildrone). These experiments sampled a high-spatial-resolution (±10 km) coupled physical and biogeochemical model (NEMO-PISCES) within a sub-domain representative of the Sub-Antarctic and Polar Frontal Zones in the Southern Ocean. The reconstructions were done using a two-member ensemble approach (ML2) consisting of two ML methods: (1) a feed-forward neural network and (2) gradient boosting machines. With the baseline observations being those from the simulated ships, which mimic observations in the Surface Ocean CO2 Atlas (SOCAT), we applied the ML2 ensemble to each of the scale-sampling scenarios to reconstruct pCO2ocean over the full sub-domain and assessed the reconstruction skill through a statistical comparison of the reconstructed and model domain-mean pCO2ocean. The analysis shows that uncertainties and biases in pCO2ocean reconstructions are very sensitive to both the spatial and the temporal scales of pCO2 sampling in the model domain.
The four key findings from our investigation are as follows: (1) improving ML-based pCO2 reconstructions in the Southern Ocean requires simultaneous high-resolution observations of the meridional gradients and the seasonal cycle (sampling intervals < 3 days) of pCO2ocean; (2) Saildrones stand out as the optimal platforms to address these requirements simultaneously; (3) Wavegliders with hourly-to-daily resolution in pseudo-mooring mode improve on Carbon-floats (10-day sampling period), which suggests that aliasing from the floats' low temporal sampling frequency has a greater negative impact on their uncertainties, biases, and reconstruction means; and (4) the present summer bias in seasonal sampling in SOCAT data in the Southern Ocean may explain a significant winter bias in the reconstructed seasonal cycle of pCO2ocean.
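The two-member ensemble idea can be illustrated with a minimal, self-contained sketch (all values below are hypothetical; it stands in for the FFNN and GBM members with precomputed predictions and scores the ensemble mean against the model "truth", as the OSSE comparison does):

```python
import math

def ensemble_mean(pred_a, pred_b):
    """Point-wise mean of two members' predictions (the ML2 idea)."""
    return [(a + b) / 2.0 for a, b in zip(pred_a, pred_b)]

def bias_and_rmse(pred, truth):
    """Reconstruction bias and RMSE against the model 'truth'."""
    errs = [p - t for p, t in zip(pred, truth)]
    bias = sum(errs) / len(errs)
    rmse = math.sqrt(sum(e * e for e in errs) / len(errs))
    return bias, rmse

# Hypothetical pCO2 values (uatm) on four grid cells:
truth = [360.0, 365.0, 370.0, 375.0]   # model domain "truth"
ffnn  = [358.0, 366.0, 372.0, 374.0]   # member 1: neural network
gbm   = [362.0, 363.0, 369.0, 377.0]   # member 2: gradient boosting
bias, rmse = bias_and_rmse(ensemble_mean(ffnn, gbm), truth)
```

Averaging the two members tends to cancel uncorrelated member errors, which is the usual motivation for a small ensemble like ML2.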

2019 ◽  
Vol 116 (40) ◽  
pp. 19887-19893 ◽  
Author(s):  
José Marcio Luna ◽  
Efstathios D. Gennatas ◽  
Lyle H. Ungar ◽  
Eric Eaton ◽  
Eric S. Diffenderfer ◽  
...  

The expansion of machine learning to high-stakes application domains such as medicine, finance, and criminal justice, where making informed decisions requires a clear understanding of the model, has increased interest in interpretable machine learning. The widely used Classification and Regression Trees (CART) have played a major role in the health sciences, owing to their simple and intuitive explanation of predictions. Ensemble methods like gradient boosting can improve the accuracy of decision trees, but at the expense of the interpretability of the generated model. Additive models, such as those produced by gradient boosting, and full-interaction models, such as CART, have largely been investigated in isolation. We show that these models exist along a spectrum, revealing previously unseen connections between the two approaches. This paper introduces a rigorous formalization of the additive tree, an empirically validated learning technique for creating a single decision tree, and shows that by varying a single parameter this method can produce models equivalent to CART or to gradient boosted stumps at the two extremes. Although the additive tree is designed primarily to provide both the interpretability and the predictive performance needed for high-stakes applications like medicine, it can also produce decision trees represented by hybrid models between CART and boosted stumps that outperform either of these approaches.
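The boosted-stumps end of the spectrum described above can be sketched in a few lines of plain Python. This is an illustrative toy (squared-loss gradient boosting with one-split trees on a 1-D dataset), not the paper's additive-tree algorithm:

```python
def fit_stump(x, r):
    """Best single-split regression stump (threshold, left mean, right mean).

    Assumes x contains at least two distinct values.
    """
    best = None
    for t in sorted(set(x)):
        left = [ri for xi, ri in zip(x, r) if xi <= t]
        right = [ri for xi, ri in zip(x, r) if xi > t]
        if not left or not right:
            continue
        lm, rm = sum(left) / len(left), sum(right) / len(right)
        sse = sum((ri - lm) ** 2 for ri in left) + sum((ri - rm) ** 2 for ri in right)
        if best is None or sse < best[0]:
            best = (sse, t, lm, rm)
    return best[1], best[2], best[3]

def boost_stumps(x, y, n_rounds=20, lr=0.5):
    """Squared-loss gradient boosting with stump base learners."""
    pred = [0.0] * len(x)
    for _ in range(n_rounds):
        residuals = [yi - pi for yi, pi in zip(y, pred)]
        t, lm, rm = fit_stump(x, residuals)
        pred = [pi + lr * (lm if xi <= t else rm) for xi, pi in zip(x, pred)]
    return pred

x = [1.0, 2.0, 3.0, 4.0]
y = [1.0, 1.0, 3.0, 3.0]
pred = boost_stumps(x, y)
```

Each round fits a stump to the current residuals and adds a shrunken copy of it; with a learning rate of 0.5, the residual on this toy dataset halves every round.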


2017 ◽  
Author(s):  
Amanda R. Fay ◽  
Nicole S. Lovenduski ◽  
Galen A. McKinley ◽  
David R. Munro ◽  
Colm Sweeney ◽  
...  

Abstract. The Southern Ocean is highly under-sampled for the purpose of assessing total carbon uptake and its variability. Since this region dominates the mean global ocean sink for anthropogenic carbon, understanding temporal change is critical. Underway measurements of pCO2 collected as part of the Drake Passage Time-series (DPT) program, which began in 2002, inform our understanding of seasonally changing air-sea gradients in pCO2, and by inference the carbon flux in this region. Here, we utilize all available pCO2 observations collected in the subpolar Southern Ocean to evaluate how the seasonal cycle, interannual variability, and long-term trends in surface ocean pCO2 in the Drake Passage region compare to those of the broader subpolar Southern Ocean. Our results indicate that the Drake Passage is representative of the broader region in both seasonality and long-term pCO2 trends, as shown by the agreement in the timing and amplitude of the seasonal cycles as well as in trend magnitudes. The high temporal density of sampling by the DPT is critical to constraining estimates of the seasonal cycle of surface pCO2 in this region, as winter data remain sparse in areas outside of the Drake Passage. Over 2002–2015, the data show that carbon uptake has strengthened, with surface ocean pCO2 trends in the Drake Passage and the broader subpolar Southern Ocean smaller than the global atmospheric trend. Analysis of spatial correlation shows Drake Passage pCO2 to be representative of pCO2 and its variability up to several hundred kilometers upstream of the region. We also compare DPT data from 2016 and early 2017 to contemporaneous pCO2 estimates from autonomous biogeochemical floats deployed as part of the Southern Ocean Carbon and Climate Observations and Modeling (SOCCOM) project, so as to highlight the opportunity for evaluating data collected on autonomous observational platforms.
Though SOCCOM floats sparsely sample the Drake Passage region for 2016–2017, their pCO2 estimates typically fall within the range of underway observations. Going forward, continuation of the Drake Passage Time-series will reduce uncertainties in Southern Ocean carbon uptake seasonality, variability, and trends, and provide an invaluable independent dataset for post-deployment quality control of sensors on autonomous floats. Together, these datasets will vastly increase our ability to monitor change in the ocean carbon sink.
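The trend comparison at the heart of this analysis reduces to fitting linear trends and comparing slopes. A minimal least-squares sketch with synthetic, hypothetical numbers (the 1.9 uatm/yr atmospheric growth rate below is illustrative only):

```python
def ols_slope(t, y):
    """Least-squares linear trend of y against time t."""
    n = len(t)
    tbar, ybar = sum(t) / n, sum(y) / n
    num = sum((ti - tbar) * (yi - ybar) for ti, yi in zip(t, y))
    den = sum((ti - tbar) ** 2 for ti in t)
    return num / den

years = list(range(2002, 2016))
# Hypothetical annual-mean surface ocean pCO2 (uatm), rising ~1.5 uatm/yr
ocean_pco2 = [355.0 + 1.5 * (yr - 2002) for yr in years]
atm_trend = 1.9  # illustrative atmospheric growth rate, uatm/yr
ocean_trend = ols_slope(years, ocean_pco2)
# Ocean trend below the atmospheric trend implies a growing air-sea gradient,
# i.e. strengthening uptake
uptake_strengthening = ocean_trend < atm_trend
```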


2018 ◽  
Author(s):  
Yingxu Wu ◽  
Mathis P. Hain ◽  
Matthew P. Humphreys ◽  
Sue Hartman ◽  
Toby Tyrrell

Abstract. Previous work has not led to a clear understanding of the causes of the spatial pattern in global surface ocean DIC, which generally increases polewards. Here, we revisit this question by investigating the drivers of the observed latitudinal gradients in surface salinity-normalized DIC (nDIC) using the Global Ocean Data Analysis Project Version 2 (GLODAPv2) database. We used the database to test three different hypotheses for the driver producing the observed increase in surface nDIC from low to high latitudes. These are: (1) sea surface temperature, through its effect on the CO2 system equilibrium constants, (2) salinity-related total alkalinity (TA), and (3) high-latitude upwelling of DIC- and TA-rich deep waters. We find that temperature and upwelling are the two major drivers. TA effects generally oppose the observed gradient, except where higher values are introduced in upwelled waters. Temperature-driven effects explain the majority of the surface nDIC latitudinal gradient (182 out of 223 μmol kg−1 in the high-latitude Southern Ocean). Upwelling, which has not previously been considered as a major driver, additionally drives a substantial latitudinal gradient. Its immediate impact, prior to any induced air–sea CO2 exchange, is to raise Southern Ocean nDIC by 208 μmol kg−1 above the average low-latitude value. However, this immediate effect is transitory. The long-term impact of upwelling (brought about by increasing TA), which would persist even if gas exchange were to return the surface ocean to the same CO2 as without upwelling, is to increase nDIC by 74 μmol kg−1 above the low-latitude average.
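The salinity normalization behind nDIC can be illustrated with the traditional form nDIC = DIC × S_ref / S, a common choice assumed here with a reference salinity of 35 (the sample values are hypothetical, not GLODAPv2 data):

```python
def salinity_normalize(dic, sal, s_ref=35.0):
    """Traditional salinity normalization: nDIC = DIC * S_ref / S."""
    return dic * s_ref / sal

# Hypothetical surface samples: (DIC in umol/kg, practical salinity)
samples = [(1950.0, 36.5),   # subtropical low latitude
           (2160.0, 34.0)]   # high-latitude Southern Ocean
ndic = [salinity_normalize(d, s) for d, s in samples]
gradient = ndic[1] - ndic[0]  # poleward increase in nDIC
```

Normalizing removes the dilution/evaporation signal so that the remaining gradient reflects temperature, alkalinity, and upwelling effects.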


2018 ◽  
Vol 15 (12) ◽  
pp. 3841-3855 ◽  
Author(s):  
Amanda R. Fay ◽  
Nicole S. Lovenduski ◽  
Galen A. McKinley ◽  
David R. Munro ◽  
Colm Sweeney ◽  
...  

Abstract. The Southern Ocean is highly under-sampled for the purpose of assessing total carbon uptake and its variability. Since this region dominates the mean global ocean sink for anthropogenic carbon, understanding temporal change is critical. Underway measurements of pCO2 collected as part of the Drake Passage Time-series (DPT) program, which began in 2002, inform our understanding of seasonally changing air–sea gradients in pCO2, and by inference the carbon flux in this region. Here, we utilize available pCO2 observations to evaluate how the seasonal cycle, interannual variability, and long-term trends in surface ocean pCO2 in the Drake Passage region compare to those of the broader subpolar Southern Ocean. Our results indicate that the Drake Passage is representative of the broader region in both seasonality and long-term pCO2 trends, as evidenced by the agreement in the timing and amplitude of the seasonal cycles as well as in trend magnitudes, both seasonally and annually. The high temporal density of sampling by the DPT is critical to constraining estimates of the seasonal cycle of surface pCO2 in this region, as winter data remain sparse in areas outside of the Drake Passage. An increase in winter data would aid in reducing uncertainty levels. On average over the period 2002–2016, the data show that carbon uptake has strengthened, with annual surface ocean pCO2 trends in the Drake Passage and the broader subpolar Southern Ocean smaller than the global atmospheric trend. Analysis of spatial correlation shows Drake Passage pCO2 to be representative of pCO2 and its variability up to several hundred kilometers away from the region. We also compare DPT data from 2016 and 2017 to contemporaneous pCO2 estimates from autonomous biogeochemical floats deployed as part of the Southern Ocean Carbon and Climate Observations and Modeling (SOCCOM) project, so as to highlight the opportunity for evaluating data collected on autonomous observational platforms.
Though SOCCOM floats sparsely sample the Drake Passage region for 2016–2017 compared to the Drake Passage Time-series, their pCO2 estimates fall within the range of underway observations given the uncertainty on the estimates. Going forward, continuation of the Drake Passage Time-series will reduce uncertainties in Southern Ocean carbon uptake seasonality, variability, and trends, and provide an invaluable independent dataset for post-deployment assessment of sensors on autonomous floats. Together, these datasets will vastly increase our ability to monitor change in the ocean carbon sink.
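The spatial-representativeness argument rests on correlating the Drake Passage pCO2 series against series from surrounding regions. A minimal Pearson-correlation sketch with hypothetical anomaly values:

```python
import math

def pearson_r(x, y):
    """Pearson correlation between two equally sampled series."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    cov = sum((a - xbar) * (b - ybar) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - xbar) ** 2 for a in x))
    sy = math.sqrt(sum((b - ybar) ** 2 for b in y))
    return cov / (sx * sy)

# Hypothetical monthly pCO2 anomalies (uatm): Drake Passage vs a nearby box
drake  = [5.0, 2.0, -3.0, -6.0, -2.0, 4.0]
nearby = [4.0, 1.0, -2.0, -5.0, -1.0, 3.0]
r = pearson_r(drake, nearby)
```

A correlation near 1 at some separation distance is what justifies treating the Drake Passage as representative of pCO2 variability out to that distance.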


2020 ◽  
Vol 12 (3) ◽  
pp. 1030 ◽  
Author(s):  
Sabrina Hempel ◽  
Julian Adolphs ◽  
Niels Landwehr ◽  
David Janke ◽  
Thomas Amon

Environmental protection efforts can only be effective in the long term with a reliable quantification of pollutant gas emissions as a first step towards mitigation. Measurement and analysis strategies must permit the accurate extrapolation of emission values. We systematically analyzed the added value of applying modern machine learning methods in the process of monitoring emissions from naturally ventilated livestock buildings to the atmosphere. We considered almost 40 weeks of hourly emission values from a naturally ventilated dairy cattle barn in Northern Germany. We compared model predictions using 27 different scenarios of temporal sampling, multiple measures of model accuracy, and eight different regression approaches. The error of the predicted emission values with the tested measurement protocols was, on average, well below 20%. Prediction sensitivity to the choice of training dataset was highest for ordinary multilinear regression. Gradient boosting and random forests provided the most accurate and robust emission value predictions, accompanied by the second-smallest model errors. Most of the highly ranked scenarios involved six measurement periods, while the scenario with the best overall performance comprised one measurement period in summer and three in the transition periods, each lasting 14 days.
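The "well below 20%" criterion is a relative-error threshold. A minimal sketch of scoring a measurement protocol by mean absolute percentage error (all emission values are hypothetical):

```python
def mape(pred, obs):
    """Mean absolute percentage error of predicted emission values."""
    return 100.0 * sum(abs(p - o) / abs(o) for p, o in zip(pred, obs)) / len(obs)

# Hypothetical hourly emission values (e.g. kg CH4/h) and model predictions
observed  = [10.0, 12.0, 8.0, 11.0]
predicted = [11.0, 11.4, 8.8, 10.45]
err = mape(predicted, observed)
acceptable = err < 20.0  # the threshold the study's protocols stayed under
```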


2020 ◽  
Vol 39 (5) ◽  
pp. 6579-6590
Author(s):  
Sandy Çağlıyor ◽  
Başar Öztayşi ◽  
Selime Sezgin

The motion picture industry is one of the largest industries worldwide and has significant importance in the global economy. Considering the high stakes and high risks in the industry, forecast models and decision support systems are gaining importance. Several attempts have been made to estimate the theatrical performance of a movie before, or at the early stages of, its release. Nevertheless, these models are mostly used for predicting domestic performance, and the industry still struggles to predict box office performance in overseas markets. In this study, the aim is to design a forecast model using different machine learning algorithms to estimate the theatrical success of US movies in Turkey. A dataset of 1559 movies is constructed from various sources. First, independent variables are grouped as pre-release, distributor type, and international distribution, based on their characteristics. The number of attendances is discretized into three classes. Four popular machine learning algorithms, namely artificial neural networks, decision tree regression, gradient boosted trees, and random forests, are employed, and the impact of each variable group is assessed by comparing model performance. The number of target classes is then increased to five and to eight, and the results are compared with previously developed models in the literature.
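Discretizing attendance into ordered classes is a simple thresholding step. A minimal sketch (the class boundaries below are hypothetical, not the paper's):

```python
def discretize(values, bounds):
    """Map continuous attendance counts to ordered class labels.

    bounds must be sorted ascending; len(bounds) + 1 classes result.
    """
    def cls(v):
        for i, b in enumerate(bounds):
            if v <= b:
                return i
        return len(bounds)
    return [cls(v) for v in values]

# Hypothetical boundaries on admissions (three classes: low / mid / high)
bounds = [50_000, 250_000]
labels = discretize([12_000, 80_000, 1_200_000], bounds)
```

Adding more boundaries is all it takes to move from the three-class to the five- and eight-class formulations.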


2019 ◽  
Vol 21 (9) ◽  
pp. 662-669 ◽  
Author(s):  
Junnan Zhao ◽  
Lu Zhu ◽  
Weineng Zhou ◽  
Lingfeng Yin ◽  
Yuchen Wang ◽  
...  

Background: Thrombin is the central protease of the vertebrate blood coagulation cascade and is closely related to cardiovascular diseases. The inhibitory constant Ki is the most significant property of thrombin inhibitors. Method: This study was carried out to predict the Ki values of thrombin inhibitors from a large dataset using machine learning methods. By exploiting its ability to find non-intuitive regularities in high-dimensional datasets, machine learning can be used to build effective predictive models. A total of 6554 descriptors were collected for each compound, and an efficient descriptor selection method was used to find the appropriate descriptors. Four different methods, including multiple linear regression (MLR), K-Nearest Neighbors (KNN), Gradient Boosting Regression Trees (GBRT) and Support Vector Machines (SVM), were implemented to build prediction models with the selected descriptors. Results: The SVM model was the best among these methods, with R2 = 0.84, MSE = 0.55 for the training set and R2 = 0.83, MSE = 0.56 for the test set. Several validation methods, such as the y-randomization test and applicability domain evaluation, were adopted to assess the robustness and generalization ability of the model. The final model shows excellent stability and predictive ability and can be employed for rapid estimation of the inhibitory constant, which is helpful for designing novel thrombin inhibitors.
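The reported R2 and MSE metrics can be computed directly from predicted and experimental values. A minimal sketch (the activity values below are hypothetical):

```python
def r2_and_mse(pred, obs):
    """Coefficient of determination and mean squared error."""
    n = len(obs)
    mean = sum(obs) / n
    sse = sum((p - o) ** 2 for p, o in zip(pred, obs))
    sst = sum((o - mean) ** 2 for o in obs)
    return 1.0 - sse / sst, sse / n

# Hypothetical predicted vs experimental activity values for four inhibitors
obs  = [6.1, 7.3, 8.0, 5.4]
pred = [6.0, 7.5, 7.8, 5.6]
r2, mse = r2_and_mse(pred, obs)
```

R2 close on the training and test sets (0.84 vs 0.83 here) is the usual sign that a model is not overfitting its training data.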


2019 ◽  
Author(s):  
Kasper Van Mens ◽  
Joran Lokkerbol ◽  
Richard Janssen ◽  
Robert de Lange ◽  
Bea Tiemens

BACKGROUND It remains a challenge to predict which treatment will work for which patient in mental healthcare. OBJECTIVE In this study we compare machine learning algorithms that predict, during treatment, which patients will not benefit from brief mental health treatment, and present trade-offs that must be considered before an algorithm can be used in clinical practice. METHODS Using an anonymized dataset containing routine outcome monitoring data from a mental healthcare organization in the Netherlands (n = 2,655), we applied three machine learning algorithms to predict treatment outcome. The algorithms were internally validated with cross-validation on a training sample (n = 1,860) and externally validated on an unseen test sample (n = 795). RESULTS The performance of the three algorithms did not differ significantly on the test set. With a default classification cut-off at 0.5 predicted probability, the extreme gradient boosting algorithm showed the highest positive predictive value (PPV) of 0.71 (0.61–0.77), with a sensitivity of 0.35 (0.29–0.41) and an area under the curve of 0.78. A trade-off can be made between PPV and sensitivity by choosing different cut-off probabilities. With a cut-off at 0.63, the PPV increased to 0.87 and the sensitivity dropped to 0.17. With a cut-off at 0.38, the PPV decreased to 0.61 and the sensitivity increased to 0.57. CONCLUSIONS Machine learning can be used to predict treatment outcomes based on routine monitoring data. This allows practitioners to choose their own trade-off between being selective and more certain versus inclusive and less certain.
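The cut-off trade-off can be reproduced in a few lines: counting true/false positives at a chosen probability cut-off yields the PPV and sensitivity. The probabilities and outcomes below are hypothetical, chosen so the stricter cut-off trades sensitivity for PPV as in the abstract:

```python
def ppv_sensitivity(probs, labels, cutoff):
    """PPV and sensitivity when predicting 'positive' at prob >= cutoff."""
    tp = sum(1 for p, l in zip(probs, labels) if p >= cutoff and l == 1)
    fp = sum(1 for p, l in zip(probs, labels) if p >= cutoff and l == 0)
    fn = sum(1 for p, l in zip(probs, labels) if p < cutoff and l == 1)
    ppv = tp / (tp + fp) if tp + fp else float("nan")
    sensitivity = tp / (tp + fn) if tp + fn else float("nan")
    return ppv, sensitivity

# Hypothetical predicted probabilities of non-benefit and true outcomes
probs  = [0.9, 0.8, 0.5, 0.45, 0.3, 0.2]
labels = [1,   1,   0,   1,    0,   0]
lenient = ppv_sensitivity(probs, labels, 0.38)  # inclusive, less certain
strict  = ppv_sensitivity(probs, labels, 0.63)  # selective, more certain
```

Raising the cut-off catches fewer of the true positives (lower sensitivity) but makes each positive prediction more trustworthy (higher PPV).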


2021 ◽  
Vol 13 (5) ◽  
pp. 1021
Author(s):  
Hu Ding ◽  
Jiaming Na ◽  
Shangjing Jiang ◽  
Jie Zhu ◽  
Kai Liu ◽  
...  

Artificial terraces are of great importance for agricultural production and soil and water conservation. Automatic high-accuracy mapping of artificial terraces is the basis of monitoring and related studies. Previous research achieved artificial terrace mapping based on high-resolution digital elevation models (DEMs) or imagery. Because contextual information is important for terrace mapping, object-based image analysis (OBIA) combined with machine learning (ML) technologies is widely used. However, the selection of an appropriate classifier is of great importance for the terrace mapping task. In this study, the performance of an integrated framework using OBIA and ML for terrace mapping was tested. A catchment, Zhifanggou, in the Loess Plateau, China, was used as the study area. First, optimized image segmentation was conducted. Then, features from the DEMs and imagery were extracted, and the correlations between the features were analyzed and ranked for classification. Finally, three commonly used ML classifiers, namely extreme gradient boosting (XGBoost), random forest (RF), and k-nearest neighbors (KNN), were used for terrace mapping. Comparison with the ground truth, as delineated by field survey, indicated that random forest performed best, with a 95.60% overall accuracy (followed by 94.16% and 92.33% for XGBoost and KNN, respectively). The influence of class imbalance and feature selection is discussed. This work provides a credible framework for mapping artificial terraces.
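The overall-accuracy comparison against the field-survey ground truth is a straightforward agreement fraction. A minimal sketch with hypothetical per-segment labels:

```python
def overall_accuracy(pred, truth):
    """Fraction of segments whose predicted class matches the ground truth."""
    return sum(p == t for p, t in zip(pred, truth)) / len(truth)

# Hypothetical per-segment labels: 1 = terrace, 0 = non-terrace
truth    = [1, 1, 0, 0, 1, 0, 1, 0, 1, 1]
rf_pred  = [1, 1, 0, 0, 1, 0, 1, 0, 1, 0]  # one disagreement
knn_pred = [1, 0, 0, 1, 1, 0, 1, 0, 1, 0]  # three disagreements
best = max([("RF", overall_accuracy(rf_pred, truth)),
            ("KNN", overall_accuracy(knn_pred, truth))],
           key=lambda kv: kv[1])
```

With imbalanced classes (terraces are usually the minority), overall accuracy should be read alongside per-class measures, which is why the study also discusses class imbalance.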

