A Bayesian approach to evaluation of soil biogeochemical models

2020 ◽  
Vol 17 (15) ◽  
pp. 4043-4057
Author(s):  
Hua W. Xie ◽  
Adriana L. Romero-Olivares ◽  
Michele Guindani ◽  
Steven D. Allison

Abstract. To make predictions about the carbon cycling consequences of rising global surface temperatures, Earth system scientists rely on mathematical soil biogeochemical models (SBMs). However, it is not clear which models have better predictive accuracy, and a rigorous quantitative approach for comparing and validating the predictions has yet to be established. In this study, we present a Bayesian approach to SBM comparison that can be incorporated into a statistical model selection framework. We compared the fits of linear and nonlinear SBMs to soil respiration data compiled in a recent meta-analysis of soil warming field experiments. Fit quality was quantified using Bayesian goodness-of-fit metrics, including the widely applicable information criterion (WAIC) and leave-one-out cross validation (LOO). We found that the linear model generally outperformed the nonlinear model at fitting the meta-analysis data set. Both WAIC and LOO computed higher overfitting risk and effective numbers of parameters for the nonlinear model compared to the linear model, conditional on the data set. Goodness of fit for both models generally improved when they were initialized with lower and more realistic steady-state soil organic carbon densities. Still, testing whether linear models offer definitively superior predictive performance over nonlinear models on a global scale will require comparisons with additional site-specific data sets of suitable size and dimensionality. Such comparisons can build upon the approach defined in this study to make more rigorous statistical determinations about model accuracy while leveraging emerging data sets, such as those from long-term ecological research experiments.
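The WAIC metric used in the study above can be sketched numerically. The following is a minimal illustration (not the authors' code) of how WAIC and its effective-number-of-parameters term are computed from a matrix of pointwise log-likelihoods, the generic output of any Bayesian sampler:

```python
import numpy as np

def waic(log_lik):
    """WAIC from an (n_samples, n_obs) matrix of pointwise log-likelihoods.

    lppd   = sum_i log( mean_s exp(log_lik[s, i]) )
    p_waic = sum_i var_s(log_lik[s, i])   (effective number of parameters)
    WAIC   = -2 * (lppd - p_waic)         (lower is better)
    """
    # log-mean-exp per observation, computed stably
    max_ll = log_lik.max(axis=0)
    lppd = np.sum(max_ll + np.log(np.mean(np.exp(log_lik - max_ll), axis=0)))
    p_waic = np.sum(np.var(log_lik, axis=0, ddof=0))
    return -2 * (lppd - p_waic), p_waic
```

LOO is typically estimated from the same log-likelihood matrix via Pareto-smoothed importance sampling, for which libraries such as `arviz` provide ready-made implementations.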



Kybernetes ◽  
2019 ◽  
Vol 48 (9) ◽  
pp. 2006-2029
Author(s):  
Hongshan Xiao ◽  
Yu Wang

Purpose: Feature space heterogeneity exists widely in various application fields of classification techniques, such as customs inspection decisions, credit scoring and medical diagnosis. This paper aims to study the relationship between feature space heterogeneity and classification performance.
Design/methodology/approach: A measurement is first developed for quantifying and identifying any significant heterogeneity that exists in the feature space of a data set. The main idea of this measurement is derived from meta-analysis. For a data set with significant feature space heterogeneity, a classification algorithm based on factor analysis and clustering is proposed to learn the data patterns, which, in turn, are used for data classification.
Findings: The proposed approach has two main advantages over previous methods. The first lies in feature transformation using orthogonal factor analysis, which yields new features free of redundancy and irrelevance. The second rests on partitioning samples to capture the feature space heterogeneity reflected by differences in factor scores. The validity and effectiveness of the proposed approach are verified on a number of benchmark data sets.
Research limitations/implications: The measurement should be used to guide the heterogeneity elimination process, which is an interesting topic for future research. In addition, developing a classification algorithm that enables scalable and incremental learning for large data sets with significant feature space heterogeneity is also an important issue.
Practical implications: Measuring and eliminating any feature space heterogeneity present in the data are important for accurate classification. This study provides a systematic approach to feature space heterogeneity measurement and elimination for better classification performance, which is favorable for applying classification techniques to real-world problems.
Originality/value: A measurement based on meta-analysis for quantifying and identifying any significant feature space heterogeneity in a classification problem is developed, and an ensemble classification framework is proposed to deal with the feature space heterogeneity and improve the classification accuracy.
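The two-stage idea above (orthogonal factor extraction, then partitioning samples on their factor scores) can be caricatured in a few lines. The sketch below is an illustrative stand-in, not the authors' algorithm: the toy data and the eigendecomposition-based factor step are assumptions for demonstration only.

```python
import numpy as np

# toy data: two latent groups that differ strongly in their factor scores
X = np.array([[0.0, 0.1], [0.1, 0.0], [0.2, 0.1], [0.1, 0.2],
              [10.0, 10.1], [10.1, 9.9], [9.9, 10.0], [10.2, 10.1]])

# 1) orthogonal factor extraction via eigendecomposition of the
#    correlation matrix (a bare-bones stand-in for factor analysis)
Z = (X - X.mean(axis=0)) / X.std(axis=0)
corr = np.corrcoef(Z, rowvar=False)
eigvals, eigvecs = np.linalg.eigh(corr)
factor = eigvecs[:, -1]            # loading vector of the top factor
scores = Z @ factor                # factor scores per sample

# 2) partition the samples by factor score to expose the heterogeneity;
#    each partition would then get its own classifier in an ensemble
labels = (scores > np.median(scores)).astype(int)
```

In the full method each partition would be handled by its own classifier, with the partitioning driven by the heterogeneity measurement rather than a simple median split.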


BMJ Open ◽  
2016 ◽  
Vol 6 (10) ◽  
pp. e011784 ◽  
Author(s):  
Anisa Rowhani-Farid ◽  
Adrian G Barnett

Objective: To quantify data sharing trends and data sharing policy compliance at the British Medical Journal (BMJ) by analysing the rate of data sharing practices, and to investigate attitudes towards and examine barriers to data sharing.
Design: Observational study.
Setting: The BMJ research archive.
Participants: 160 randomly sampled BMJ research articles from 2009 to 2015, excluding meta-analyses and systematic reviews.
Main outcome measures: Percentages of research articles that indicated the availability of their raw data sets in their data sharing statements, and of those that easily made their data sets available on request.
Results: 3 articles contained the data in the article itself. Of the remaining 157 articles, 50 (32%) indicated the availability of their data sets; 12 used publicly available data, and email requests were sent for the other 38 data sets. Only 1 publicly available data set could be accessed, and only 6 of the 38 authors contacted shared their data via email. In total, therefore, only 7 of 157 research articles shared their data sets: 4.5% (95% CI 1.8% to 9%). For the 21 clinical trials bound by the BMJ data sharing policy, the percentage shared was 24% (8% to 47%).
Conclusions: Despite the BMJ's strong data sharing policy, sharing rates are low. Possible explanations for the low rates include: the wording of the BMJ data sharing policy, which leaves room for individual interpretation and possible loopholes; our email requests ending up in researchers' spam folders; and researchers not being rewarded for sharing their data. It might be time for a more effective data sharing policy and better incentives for health and medical researchers to share their data.
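The headline estimate of 7/157 shared data sets with its 95% CI can be reproduced approximately. The paper's interval is presumably an exact binomial (Clopper-Pearson) interval; the sketch below uses the closely related Wilson score interval, which needs only elementary arithmetic:

```python
import math

def wilson_ci(successes, n, z=1.96):
    """Wilson score 95% confidence interval for a binomial proportion."""
    p = successes / n
    denom = 1 + z ** 2 / n
    center = (p + z ** 2 / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z ** 2 / (4 * n ** 2)) / denom
    return center - half, center + half

# roughly 2.2% to 8.9%, close to the quoted exact interval of 1.8% to 9%
lo, hi = wilson_ci(7, 157)
```

The small discrepancy from the quoted bounds is expected: exact binomial intervals are wider than Wilson intervals at small counts.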


2016 ◽  
Author(s):  
Dorothee C. E. Bakker ◽  
Benjamin Pfeil ◽  
Camilla S. Landa ◽  
Nicolas Metzl ◽  
Kevin M. O'Brien ◽  
...  

Abstract. The Surface Ocean CO2 Atlas (SOCAT) is a synthesis of quality-controlled fCO2 (fugacity of carbon dioxide) values for the global surface oceans and coastal seas with regular updates. Version 3 of SOCAT has 14.5 million fCO2 values from 3646 data sets covering the years 1957 to 2014. This latest version has an additional 4.4 million fCO2 values relative to version 2 and extends the record from 2011 to 2014. Version 3 also significantly increases the data availability for 2005 to 2013. SOCAT has an average of approximately 1.2 million surface water fCO2 values per year for the years 2006 to 2012. The quality and documentation of the data have improved. A new feature is the data set quality control (QC) flag of E for data from alternative sensors and platforms. The accuracy of surface water fCO2 has been defined for all data set QC flags. Automated range checking has been carried out for all data sets during their upload into SOCAT. The upgrade of the interactive Data Set Viewer (previously known as the Cruise Data Viewer) allows better interrogation of the SOCAT data collection and rapid creation of high-quality figures for scientific presentations. Automated data upload has been launched for version 4 and will enable more frequent SOCAT releases in the future. High-profile scientific applications of SOCAT include quantification of the ocean sink for atmospheric carbon dioxide and its long-term variation, detection of ocean acidification, as well as evaluation of coupled-climate and ocean-only biogeochemical models. Users of SOCAT data products are urged to acknowledge the contribution of data providers, as stated in the SOCAT Fair Data Use Statement. This ESSD (Earth System Science Data) "Living Data" publication documents the methods and data sets used for the assembly of this new version of the SOCAT data collection and compares these with those used for earlier versions of the data collection (Pfeil et al., 2013; Sabine et al., 2013; Bakker et al., 2014).


2018 ◽  
Author(s):  
Diana E Kornbrot ◽  
George J Georgiou ◽  
Mike Page

Identifying the best framework for two-choice decision-making has been a goal of psychology theory for many decades (Bohil, Szalma, & Hancock, 2015; Macmillan & Creelman, 1991). There are two main candidates: the theory of signal detectability (TSD) (Swets, Tanner Jr, & Birdsall, 1961; Thurstone, 1927), based on the normal distribution/probit function, and choice-model theory (Link, 1975; Luce, 1959), which uses the logistic distribution/logit function. A probit link function, and hence TSD, was shown to have a better Bayesian goodness of fit than the logit function for every one of eighteen diverse psychology data sets (Open-Science-Collaboration, 2015a); these conclusions were obtained using generalized linear mixed models (Lindstrom & Bates, 1990; Nelder & Wedderburn, 1972). These findings are important not only for the psychology of perceptual, cognitive and social decision-making, but for any science that uses binary proportions to measure effectiveness, as well as for the meta-analysis of such studies.
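Part of what makes the probit-versus-logit comparison delicate is that the two link functions are nearly indistinguishable after rescaling, so only large and diverse data sets can separate them. A small self-contained check of the classic 1.702 scaling (a standard textbook fact, not taken from the cited study):

```python
import math

def probit(x):
    """Standard normal CDF (inverse of the probit link)."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def logistic(x):
    """Logistic CDF (inverse of the logit link)."""
    return 1.0 / (1.0 + math.exp(-x))

# classic result: logistic(1.702 * z) approximates the normal CDF Phi(z)
# to within about 0.01 everywhere on the real line
max_gap = max(abs(probit(z) - logistic(1.702 * z))
              for z in [i / 100.0 for i in range(-600, 601)])
```

Because the residual gap of under 0.01 lives mostly in the tails, discriminating the links hinges on extreme response proportions, which is where TSD and choice-model theory make genuinely different predictions.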


Author(s):  
Zhiguo Bao ◽  
Shuyu Wang

For hedge funds, return prediction has always been a fundamental and important problem. Usually, a good return prediction model directly determines the performance of a quantitative investment strategy. However, the performance of such a model is influenced by the market style: even models trained on the same data set perform differently under different market styles. Traditional methods attempt to train a universal linear or nonlinear model on the data set to cope with different market styles. However, a linear model has limited fitting ability and is insufficient to deal with the hundreds of features in a hedge fund feature pool, while a nonlinear model risks overfitting. At the same time, changes in market style can make certain features valid or invalid, and a single linear or nonlinear model cannot handle this situation. This thesis proposes a method based on reinforcement learning that automatically discriminates market styles and selects the model that best fits the current market style from sub-models pre-trained with different categories of features to predict stock returns. Compared with the traditional method of training a return prediction model directly on the full data set, experiments show that the proposed method performs better, achieving a higher Sharpe ratio and annualized return.
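The core idea, maintaining per-style sub-models and selecting among them online, can be caricatured without any reinforcement-learning machinery. The greedy selector below is a deterministic stand-in for the learned policy, and the reward streams are made-up numbers for illustration:

```python
class GreedySelector:
    """Picks the sub-model with the best average observed reward so far.

    A stand-in for a learned selection policy: a real RL agent would also
    condition on market-style features and account for future reward.
    """
    def __init__(self, n_models):
        self.totals = [0.0] * n_models
        self.counts = [0] * n_models

    def select(self):
        # try each sub-model once before exploiting the best one
        for i, c in enumerate(self.counts):
            if c == 0:
                return i
        return max(range(len(self.totals)),
                   key=lambda i: self.totals[i] / self.counts[i])

    def update(self, i, reward):
        self.totals[i] += reward
        self.counts[i] += 1

# hypothetical per-period returns of two pre-trained sub-models
rewards = {0: [0.01, 0.02, 0.01], 1: [0.03, 0.04, 0.05]}
sel = GreedySelector(2)
for t in range(3):
    for i in (0, 1):               # observe both models each period
        sel.update(i, rewards[i][t])
chosen = sel.select()              # sub-model 1 has the higher mean return
```

The thesis replaces this static greedy rule with a policy that can switch sub-models as the detected market style changes.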


2018 ◽  
Vol 9 (3) ◽  
pp. 1-15 ◽  
Author(s):  
Michael D'Rosario ◽  
John Zeleznikow

The present article considers the importance of legal system origin in compliance with 'international soft law', or normative provisions contained in non-binding texts. The study considers key economic and governance metrics on national acceptance and implementation of the first Basle accord. Employing a data set of 70 countries, the present study considers the role of market forces and bilateral and multilateral pressures on the implementation of soft law. Little is known about the role of legal-system-structure variables as factors moderating the implementation of multilateral agreements and international soft law, such as the 1988 accord. The present study extends research in the extant literature by employing a novel estimation method: a multi-layer perceptron artificial neural network (MPANN). Consistent with earlier studies, the article identifies a significant and positive effect associated with democratic systems and the implementation of the Basle accord. However, extending upon traditional estimation techniques, the study identifies the significance of savings rates and government effectiveness in determining implementation. Notably, the method achieves a superior goodness of fit and predictive accuracy in determining implementation.
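For readers unfamiliar with the MPANN estimator, the sketch below shows the bare mechanics of a multi-layer perceptron forward pass. The weights are set by hand to compute XOR, a standard toy demonstration of why a hidden layer adds expressive power; it has nothing to do with the study's fitted model:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# one hidden layer of two units, weights chosen by hand to compute XOR:
# hidden unit 0 fires on OR, hidden unit 1 on AND, and the output unit
# fires on (OR and not AND)
W1 = np.array([[20.0, 20.0], [20.0, 20.0]])
b1 = np.array([-10.0, -30.0])
W2 = np.array([20.0, -20.0])
b2 = -10.0

def mlp(x):
    h = sigmoid(x @ W1 + b1)       # hidden-layer activations
    return sigmoid(h @ W2 + b2)    # scalar output in (0, 1)

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
preds = [round(float(mlp(x))) for x in X]   # XOR truth table
```

In practice the weights are fitted by backpropagation rather than set by hand; the nonlinearity of the hidden layer is what lets the MPANN capture interactions a logistic regression cannot.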


2019 ◽  
Vol 65 (8) ◽  
pp. 995-1005 ◽  
Author(s):  
Thomas Røraas ◽  
Sverre Sandberg ◽  
Aasne K Aarsand ◽  
Bård Støve

Abstract BACKGROUND Biological variation (BV) data have many applications for diagnosing and monitoring disease. The standard statistical approaches for estimating BV are sensitive to “noisy data” and assume homogeneity of the within-participant CV, and prior knowledge about BV is mostly ignored. The aims of this study were to develop Bayesian models to calculate BV that (a) are robust to “noisy data,” (b) allow heterogeneity in the within-participant CVs, and (c) take advantage of prior knowledge. METHODS We explored Bayesian models with different degrees of robustness, using adaptive Student t distributions instead of normal distributions and allowing for heterogeneity of the within-participant CV. Results were compared to more standard approaches using chloride and triglyceride data from the European Biological Variation Study. RESULTS Using the most robust Bayesian approach on a raw data set gave results comparable to a standard approach with outlier assessment and removal. The posterior distribution of the fitted model gives access to credible intervals for all parameters, which can be used to assess reliability. Reliable and relevant priors proved valuable for prediction. CONCLUSIONS The recommended Bayesian approach gives a clear picture of the degree of heterogeneity, and the ability to crudely estimate personal within-participant CVs can be used to explore relevant subgroups. Because BV experiments are expensive and time-consuming, prior knowledge and estimates should be considered of high value and applied accordingly. By including reliable prior knowledge, precise estimates are possible even with small data sets.
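The robustness payoff of a Student t likelihood can be seen even outside the Bayesian setting. The sketch below is illustrative, not the authors' model: it compares maximum-likelihood location estimates under normal and t(nu = 3) likelihoods on made-up data with one gross outlier, the kind of "noisy data" the abstract describes:

```python
import math

# nine well-behaved measurements plus one gross outlier
data = [4.8, 5.1, 4.9, 5.2, 5.0, 4.7, 5.3, 4.9, 5.1, 50.0]

def neg_loglik_t(mu, xs, nu=3.0):
    """Negative Student t log-likelihood (unit scale, up to a constant)."""
    return sum((nu + 1) / 2 * math.log(1 + (x - mu) ** 2 / nu) for x in xs)

# the normal MLE for location is the sample mean: dragged toward the outlier
normal_loc = sum(data) / len(data)

# the t MLE, found here by a crude grid search, stays with the bulk of the data
grid = [i / 100.0 for i in range(0, 2001)]   # candidate locations 0.00 .. 20.00
t_loc = min(grid, key=lambda mu: neg_loglik_t(mu, data))
```

The heavy tails of the t distribution bound the influence of any single observation, which is the same mechanism that makes the authors' adaptive Student t models robust without explicit outlier removal.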


mSystems ◽  
2019 ◽  
Vol 4 (6) ◽  
Author(s):  
Adrienne B. Narrowe ◽  
Mikayla A. Borton ◽  
David W. Hoyt ◽  
Garrett J. Smith ◽  
Rebecca A. Daly ◽  
...  

ABSTRACT Wetland soils are one of the largest natural contributors to the emission of methane, a potent greenhouse gas. Current accounts of microbial contributions to methane emissions from these systems emphasize the roles of acetoclastic and hydrogenotrophic methanogens, while less frequently considering methyl-group substrates (e.g., methanol and methylamines). Here, we integrated laboratory and field experiments to explore the potential for methylotrophic methanogenesis in Old Woman Creek (OWC), a temperate freshwater wetland located in Ohio, USA. We first demonstrated the capacity for methylotrophic methanogenesis in these soils using laboratory soil microcosms amended with trimethylamine. However, subsequent field porewater nuclear magnetic resonance (NMR) analyses to identify methanogenic substrates failed to detect evidence for methylamine compounds in soil porewaters, instead noting the presence of the methylotrophic substrate methanol. Accordingly, our wetland soil-derived metatranscriptomic data indicated that methanol utilization by the Methanomassiliicoccaceae was the likely source of methylotrophic methanogenesis. Methanomassiliicoccaceae relative contributions to mcrA transcripts nearly doubled with depth, accounting for up to 8% of the mcrA transcripts in 25-cm-deep soils. Longitudinal 16S rRNA amplicon and mcrA gene surveys demonstrated that Methanomassiliicoccaceae were stably present over 2 years across lateral and depth gradients in this wetland. Meta-analysis of 16S rRNA sequences similar (>99%) to OWC Methanomassiliicoccaceae in public databases revealed a global distribution, with a high representation in terrestrial soils and sediments. Together, our results demonstrate that methylotrophic methanogenesis likely contributes to methane flux from climatically relevant wetland soils.
IMPORTANCE Understanding the sources and controls on microbial methane production from wetland soils is critical to global methane emission predictions, particularly in light of changing climatic conditions. Current biogeochemical models of methanogenesis consider only acetoclastic and hydrogenotrophic sources and exclude methylotrophic methanogenesis, potentially underestimating microbial contributions to methane flux. Our multi-omic results demonstrated that methylotrophic methanogens of the family Methanomassiliicoccaceae were present and active in a freshwater wetland, with metatranscripts indicating that methanol, not methylamines, was the likely substrate under the conditions measured here. However, laboratory experiments indicated the potential for other methanogens to become enriched in response to trimethylamine, revealing the reservoir of methylotrophic methanogenesis potential residing in these soils. Collectively, our approach used coupled field and laboratory investigations to illuminate metabolisms influencing the terrestrial microbial methane cycle, thereby offering direction for increased realism in predictive process-oriented models of methane flux in wetland soils.


2016 ◽  
Vol 72 (6) ◽  
pp. 696-703 ◽  
Author(s):  
Julian Henn

An alternative measure to the goodness of fit (GoF) is developed and applied to experimental data. The alternative goodness of fit squared (aGoFs) demonstrates that the GoF regularly fails to provide evidence for the presence of systematic errors, because certain requirements are not met. These requirements are briefly discussed. It is shown that in many experimental data sets a correlation exists between the squared residuals and the variances of the observed intensities. These correlations corrupt the GoF and lead to artificially reduced values of the GoF and of the numerical value of wR(F2). Remaining systematic errors in the data sets are veiled by this mechanism. In data sets where these correlations do not appear for the entire data set, they often appear for the decile with the largest variances of observed intensities. Additionally, statistical errors for the squared goodness of fit, GoFs, and for the aGoFs are developed and applied to experimental data. These error estimates show how significantly the GoFs and aGoFs deviate from the ideal value of one.
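The diagnostic at the heart of the aGoFs argument, a correlation between squared residuals and the variances of the observed intensities, is easy to compute for any refinement output. A generic sketch with synthetic numbers (the data and variable names are illustrative, not from the paper):

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# synthetic example: squared residuals proportional to sigma^2(I_obs),
# the pattern that corrupts the GoF according to the abstract above
var_obs = [float(v) for v in range(1, 101)]
sq_resid = [2.0 * v for v in var_obs]
r = pearson(sq_resid, var_obs)   # exactly 1.0 for an affine relation
```

A strongly positive r on real data would signal exactly the variance-residual coupling that deflates the GoF and wR(F2); repeating the computation on the decile with the largest variances mirrors the paper's subgroup check.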

