A Bayesian Approach to Evaluation of Soil Biogeochemical Models

2020 ◽  
Vol 17 (15) ◽  
pp. 4043-4057
Author(s):  
Hua W. Xie ◽  
Adriana L. Romero-Olivares ◽  
Michele Guindani ◽  
Steven D. Allison

Abstract. To make predictions about the carbon cycling consequences of rising global surface temperatures, Earth system scientists rely on mathematical soil biogeochemical models (SBMs). However, it is not clear which models have better predictive accuracy, and a rigorous quantitative approach for comparing and validating the predictions has yet to be established. In this study, we present a Bayesian approach to SBM comparison that can be incorporated into a statistical model selection framework. We compared the fits of linear and nonlinear SBMs to soil respiration data compiled in a recent meta-analysis of soil warming field experiments. Fit quality was quantified using Bayesian goodness-of-fit metrics, including the widely applicable information criterion (WAIC) and leave-one-out cross validation (LOO). We found that the linear model generally outperformed the nonlinear model at fitting the meta-analysis data set. Both WAIC and LOO computed higher overfitting risk and effective numbers of parameters for the nonlinear model compared to the linear model, conditional on the data set. Goodness of fit for both models generally improved when they were initialized with lower and more realistic steady-state soil organic carbon densities. Still, testing whether linear models offer definitively superior predictive performance over nonlinear models on a global scale will require comparisons with additional site-specific data sets of suitable size and dimensionality. Such comparisons can build upon the approach defined in this study to make more rigorous statistical determinations about model accuracy while leveraging emerging data sets, such as those from long-term ecological research experiments.
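
As a rough illustration of the comparison workflow this abstract describes, the sketch below fits a linear and a saturating non-linear response model to synthetic flux data with PyMC and ranks them with ArviZ's WAIC and PSIS-LOO implementations. The model forms, priors, and data are assumptions for illustration, not the authors' actual SBMs.

```python
# Sketch: Bayesian comparison of a linear and a non-linear response model
# via WAIC and LOO. Model forms, priors, and data are illustrative
# assumptions, not the paper's actual soil biogeochemical models.
import numpy as np
import pymc as pm
import arviz as az

rng = np.random.default_rng(42)
temp = np.linspace(0, 10, 40)                             # warming (deg C)
resp = 2.0 + 0.3 * temp + rng.normal(0, 0.2, temp.size)  # synthetic CO2 flux

def fit(nonlinear: bool):
    with pm.Model():
        a = pm.Normal("a", 0, 5)
        b = pm.HalfNormal("b", 5)
        sigma = pm.HalfNormal("sigma", 1)
        if nonlinear:
            k = pm.HalfNormal("k", 10)
            mu = a + b * temp / (k + temp)  # saturating, Michaelis-Menten-like
        else:
            mu = a + b * temp               # linear response
        pm.Normal("obs", mu, sigma, observed=resp)
        # Store pointwise log-likelihood so WAIC/LOO can be computed.
        return pm.sample(1000, tune=1000, progressbar=False, random_seed=1,
                         idata_kwargs={"log_likelihood": True})

idata = {"linear": fit(False), "nonlinear": fit(True)}
print(az.compare(idata, ic="waic"))  # WAIC ranking + effective parameters
print(az.compare(idata, ic="loo"))   # PSIS-LOO ranking
```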


2006 ◽  
Vol 39 (2) ◽  
pp. 262-266 ◽  
Author(s):  
R. J. Davies

Synchrotron sources offer high-brilliance X-ray beams which are ideal for spatially and time-resolved studies. Large amounts of wide- and small-angle X-ray scattering data can now be generated rapidly, for example, during routine scanning experiments. Consequently, the analysis of the large data sets produced has become a complex and pressing issue. Even relatively simple analyses become difficult when a single data set can contain many thousands of individual diffraction patterns. This article reports on a new software application for the automated analysis of scattering intensity profiles. It is capable of batch-processing thousands of individual data files without user intervention. Diffraction data can be fitted using a combination of background functions and non-linear peak functions. To complement the batch-wise operation mode, the software includes several specialist algorithms to ensure that the results obtained are reliable. These include peak-tracking, artefact removal, function elimination and spread-estimate fitting. Furthermore, in addition to non-linear fitting, the software can calculate integrated intensities and selected orientation parameters.
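
The article describes purpose-built software; as a hedged sketch of the core operation it automates, the snippet below batch-fits a linear background plus a single Gaussian peak to a directory of intensity profiles with SciPy. The file layout, column format, and function choices are assumptions, not the reported application.

```python
# Sketch: batch fitting of 1-D scattering intensity profiles with a linear
# background plus a Gaussian peak. The file layout (two-column text files)
# is an assumption; the article's own software adds peak-tracking and
# artefact removal on top of this kind of core fit.
import glob
import numpy as np
from scipy.optimize import curve_fit

def profile(q, bg0, bg1, amp, q0, width):
    """Linear background + single Gaussian peak."""
    return bg0 + bg1 * q + amp * np.exp(-0.5 * ((q - q0) / width) ** 2)

results = {}
for path in glob.glob("patterns/*.dat"):        # one intensity profile per file
    q, intensity = np.loadtxt(path, unpack=True)
    p0 = [intensity.min(), 0.0, np.ptp(intensity),
          q[np.argmax(intensity)], 0.05 * (q.max() - q.min())]
    try:
        popt, pcov = curve_fit(profile, q, intensity, p0=p0, maxfev=5000)
        results[path] = popt                    # keep fitted parameters
    except RuntimeError:
        results[path] = None                    # flag non-converged fits
```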


2020 ◽  
Vol 24 (6 Part A) ◽  
pp. 3795-3806
Author(s):  
Predrag Zivkovic ◽  
Mladen Tomic ◽  
Vukman Bakic

Wind power assessment in complex terrain is a very demanding task. Modeling wind conditions with standard linear models does not sufficiently reproduce wind conditions in complex terrain, especially on the leeward sides of terrain slopes, primarily due to vorticity. A more complex non-linear model based on the Reynolds-averaged Navier-Stokes equations has therefore been used. Turbulence was modeled by a modified two-equation k-ε model for neutral atmospheric boundary-layer conditions, written in a general curvilinear non-orthogonal co-ordinate system. The full set of mass and momentum conservation equations as well as the turbulence model equations are solved numerically using CFD techniques. A comparison of the linear and non-linear models is presented, and considerable discrepancies in the estimated wind speeds are obtained. Estimates of annual electricity production vary by up to 30% across the model site. Even anemometer measurements taken directly at a wind turbine's site do not necessarily deliver the results needed for prediction calculations, as extrapolation of wind speed to hub height is tricky. The results of the simulation are compared by means of turbine type, quality and quantity of the wind data, and capacity factor. Finally, the estimated results are compared with data measured at 10, 30, and 50 m.
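
The abstract notes that extrapolating measured wind speed to hub height is tricky. For reference, a minimal sketch of the standard neutral-stability log-law profile commonly used for such extrapolation is shown below; the roughness length and heights are assumed values, and in complex terrain this simple profile is exactly what the non-linear RANS approach aims to improve upon.

```python
# Sketch: log-law extrapolation of measured wind speed to hub height.
# Roughness length and measurement heights are assumed values; in complex
# terrain this simple neutral-stability profile can fail, which motivates
# the non-linear CFD model described above.
import numpy as np

def log_law(u_ref, z_ref, z_target, z0):
    """Neutral log-law: u(z) = u_ref * ln(z/z0) / ln(z_ref/z0)."""
    return u_ref * np.log(z_target / z0) / np.log(z_ref / z0)

u10 = 6.2   # measured mean speed at 10 m (m/s), illustrative
z0 = 0.1    # roughness length (m), assumed open terrain
for z in (30, 50, 80):
    print(f"{z:>3} m: {log_law(u10, 10.0, z, z0):.2f} m/s")
```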


2018 ◽  
Vol 49 (6) ◽  
pp. 1788-1803 ◽  
Author(s):  
Mohammad Ebrahim Banihabib ◽  
Arezoo Ahmadian ◽  
Mohammad Valipour

Abstract In this study, to reflect the effect of large-scale climate signals on runoff, these indices are combined with rainfall (the most effective local factor in runoff) as inputs to a hybrid model. While forecasting reservoir inflows one year in advance can provide the data needed for optimal reservoir operation, reports show that models more accurate than traditional linear models (ARMA and ARIMA), incorporating all effective parameters, are still needed. Thus, hybridization of models was employed to improve the accuracy of flow forecasting, and various forecasters including large-scale climate signals were tested. This paper focuses on testing a MARMA-NARX hybrid model to enhance the accuracy of monthly inflow forecasts. Since the inflow in different periods of the year has both linear and non-linear trends, the hybrid model is proposed as a means of combining a linear model, the monthly autoregressive moving average (MARMA), with a non-linear model, the nonlinear autoregressive model with exogenous inputs (NARX), to upgrade the accuracy of flow forecasting. The results of the study showed enhanced forecasting accuracy through using the hybrid model.
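
As a hedged sketch of the general linear-plus-non-linear hybridization idea (not the paper's MARMA-NARX implementation), the snippet below fits a linear ARMA model with statsmodels and then models its residuals with a small neural network fed lagged residuals plus exogenous inputs; the lag orders, network size, and synthetic data are illustrative assumptions.

```python
# Sketch: linear-plus-non-linear hybrid forecasting in the spirit of
# MARMA-NARX. Lag orders, network size, and the exogenous inputs (rainfall,
# a climate index) are illustrative assumptions, not the paper's settings.
import numpy as np
from statsmodels.tsa.arima.model import ARIMA
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(0)
n = 240                                         # 20 years of monthly data
rain = rng.gamma(2.0, 30.0, n)                  # exogenous: monthly rainfall
signal = rng.normal(0, 1, n)                    # exogenous: climate index
inflow = 50 + 0.4 * rain + 5 * np.tanh(signal) + rng.normal(0, 5, n)

linear = ARIMA(inflow, order=(2, 0, 1)).fit()   # step 1: linear ARMA part
resid = linear.resid                            # non-linear structure left over

lags = 3                                        # step 2: NARX-style residual model
X = np.column_stack([resid[i:n - lags + i] for i in range(lags)]
                    + [rain[lags:], signal[lags:]])
y = resid[lags:]
narx = MLPRegressor(hidden_layer_sizes=(16,), max_iter=2000,
                    random_state=0).fit(X, y)

# In-sample comparison, for illustration only.
hybrid_fit = linear.fittedvalues[lags:] + narx.predict(X)
print("RMSE linear:", np.sqrt(np.mean(resid[lags:] ** 2)))
print("RMSE hybrid:", np.sqrt(np.mean((inflow[lags:] - hybrid_fit) ** 2)))
```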


Kybernetes ◽  
2019 ◽  
Vol 48 (9) ◽  
pp. 2006-2029
Author(s):  
Hongshan Xiao ◽  
Yu Wang

Purpose Feature space heterogeneity exists widely in various application fields of classification techniques, such as customs inspection decisions, credit scoring and medical diagnosis. This paper aims to study the relationship between feature space heterogeneity and classification performance. Design/methodology/approach A measurement is first developed for measuring and identifying any significant heterogeneity that exists in the feature space of a data set. The main idea of this measurement is derived from meta-analysis. For a data set with significant feature space heterogeneity, a classification algorithm based on factor analysis and clustering is proposed to learn the data patterns, which, in turn, are used for data classification. Findings The proposed approach has two main advantages over previous methods. The first advantage lies in feature transformation using orthogonal factor analysis, which results in new features without redundancy or irrelevance. The second advantage rests on partitioning samples to capture the feature space heterogeneity reflected by differences in factor scores. The validity and effectiveness of the proposed approach are verified on a number of benchmark data sets. Research limitations/implications The measurement should be used to guide the heterogeneity elimination process, which is an interesting topic for future research. In addition, developing a classification algorithm that enables scalable and incremental learning for large data sets with significant feature space heterogeneity is also an important issue. Practical implications Measuring and eliminating the feature space heterogeneity possibly existing in the data are important for accurate classification. This study provides a systematic approach to feature space heterogeneity measurement and elimination for better classification performance, which is favorable for applications of classification techniques to real-world problems. Originality/value A measurement based on meta-analysis for measuring and identifying any significant feature space heterogeneity in a classification problem is developed, and an ensemble classification framework is proposed to deal with the feature space heterogeneity and improve the classification accuracy.
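
A minimal sketch of the pipeline this abstract outlines, assuming scikit-learn estimators stand in for the paper's specific algorithm: orthogonal factor analysis for the feature transform, clustering on factor scores to partition the samples, and one classifier per partition.

```python
# Sketch: classification via factor analysis + clustering, in the spirit of
# the approach described above. Estimator choices (FactorAnalysis, KMeans,
# logistic regression) and all sizes are illustrative assumptions.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.decomposition import FactorAnalysis
from sklearn.cluster import KMeans
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=600, n_features=20, n_informative=8,
                           random_state=0)

fa = FactorAnalysis(n_components=5, random_state=0)    # orthogonal factors
scores = fa.fit_transform(X)                           # factor scores

km = KMeans(n_clusters=3, n_init=10, random_state=0)   # partition by scores
cluster = km.fit_predict(scores)

models = {}                                            # one classifier per cluster
for c in np.unique(cluster):
    mask = cluster == c
    models[c] = LogisticRegression(max_iter=1000).fit(scores[mask], y[mask])

def predict(x_new):
    s = fa.transform(x_new)                            # project to factor space
    c = km.predict(s)                                  # route to cluster's model
    return np.array([models[ci].predict(s[i:i + 1])[0]
                     for i, ci in enumerate(c)])

print("train accuracy:", np.mean(predict(X) == y))
```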


Author(s):  
Jabbar Ali Zakeri ◽  
Mosab Reza Tajalli

The existence of short-wavelength irregularities and discontinuities in the rail, such as corrugation, isolated rail joints, crossings and rail breakage, results in impact forces and an increase in wheel-rail contact force. Extreme forces in such cases can cause non-linear behavior of the ballast and pads, so employing common linear models might over- or underestimate contact forces. A 3D model of wheel and rail is developed in this paper, and by considering rail breakage, the validity of linear models against non-linear material behavior is studied. Wheel-rail interactions are studied for two common pads with high stiffness (HDPE) and low stiffness (Studded) at speeds of 20 to 160 km/h. Three behavioral patterns are considered for the developed 3D model: linear pad and ballast (LP-LB), non-linear pad and linear ballast (NLP-LB), and non-linear pad and ballast (NLP-NLB), and the results are compared. According to the results, for HDPE pads and impact forces of up to 30 tons, a linear material model yields acceptable results. Yet for Studded pads, the linear model estimates forces that are considerably lower than those estimated by the non-linear model. Moreover, the NLP-LB model overestimates pad and wheel-rail contact forces by a rather small margin compared to the NLP-NLB model, and could therefore be a suitable replacement for it. It is also observed that a reliable estimate of ballast forces requires non-linear ballast models; neither LP-LB nor NLP-LB is an acceptable replacement.
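
The paper's model is a full 3D finite-element one; purely to illustrate why linear pad models can misestimate peak forces, the sketch below integrates a single-degree-of-freedom impact with a linear versus a cubic-hardening pad. All parameter values are assumptions, not values from the paper.

```python
# Sketch: single-DOF wheel-on-pad impact with a linear vs a stiffening
# (cubic-hardening) pad. Masses, stiffnesses, damping, and impact velocity
# are illustrative assumptions, not values from the paper's 3D FE model.
import numpy as np
from scipy.integrate import solve_ivp

m = 600.0    # unsprung mass (kg), assumed
k1 = 2.0e8   # linear pad stiffness (N/m), assumed
k3 = 1.0e14  # cubic hardening coefficient (N/m^3), assumed
c = 2.0e4    # damping (N s/m), assumed
v0 = 0.5     # impact velocity at a rail joint (m/s), assumed

def rhs(t, s, cubic):
    x, v = s
    f_pad = k1 * x + (k3 * x**3 if cubic else 0.0) + c * v
    return [v, -f_pad / m]

t = np.linspace(0, 0.01, 2000)
for cubic in (False, True):
    sol = solve_ivp(rhs, (0, 0.01), [0.0, v0], t_eval=t, args=(cubic,),
                    rtol=1e-8)
    x = sol.y[0]
    peak = np.max(k1 * x + (k3 * x**3 if cubic else 0.0))  # spring force only
    label = "non-linear" if cubic else "linear    "
    print(label, f"peak pad force: {peak / 1e3:.0f} kN")
```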


BMJ Open ◽  
2016 ◽  
Vol 6 (10) ◽  
pp. e011784 ◽  
Author(s):  
Anisa Rowhani-Farid ◽  
Adrian G Barnett

Objective To quantify data sharing trends and data sharing policy compliance at the British Medical Journal (BMJ) by analysing the rate of data sharing practices, and to investigate attitudes and examine barriers towards data sharing. Design Observational study. Setting The BMJ research archive. Participants 160 randomly sampled BMJ research articles from 2009 to 2015, excluding meta-analyses and systematic reviews. Main outcome measures Percentages of research articles that indicated the availability of their raw data sets in their data sharing statements, and those that easily made their data sets available on request. Results Three articles contained the data within the article itself. 50 of the remaining 157 articles (32%) indicated the availability of their data sets. 12 used publicly available data and the remaining 38 were sent email requests to access their data sets. Only 1 publicly available data set could be accessed and only 6 of the 38 shared their data via email. Overall, only 7 of 157 research articles (4.5%, 95% CI 1.8% to 9%) shared their data sets. Among the 21 clinical trials bound by the BMJ data sharing policy, the percentage shared was 24% (95% CI 8% to 47%). Conclusions Despite the BMJ's strong data sharing policy, sharing rates are low. Possible explanations for low data sharing rates could be: the wording of the BMJ data sharing policy, which leaves room for individual interpretation and possible loopholes; that our email requests ended up in researchers' spam folders; and that researchers are not rewarded for sharing their data. It might be time for a more effective data sharing policy and better incentives for health and medical researchers to share their data.
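
The reported intervals can be reproduced with exact (Clopper-Pearson) binomial confidence intervals; a quick check using statsmodels is shown below. The count of 5 shared trials is inferred from the quoted 24% of 21.

```python
# Check of the reported sharing rates: exact (Clopper-Pearson) 95% CIs,
# matching the 4.5% (1.8% to 9%) and 24% (8% to 47%) intervals quoted above.
from statsmodels.stats.proportion import proportion_confint

low, high = proportion_confint(count=7, nobs=157, alpha=0.05, method="beta")
print(f"7/157 = {7/157:.1%}, 95% CI {low:.1%} to {high:.1%}")

# 5 of 21 trials is inferred from the quoted 24%, not stated directly.
low, high = proportion_confint(count=5, nobs=21, alpha=0.05, method="beta")
print(f"5/21 = {5/21:.0%}, 95% CI {low:.0%} to {high:.0%}")
```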


Author(s):  
Vidyullatha P ◽  
D. Rajeswara Rao

Curve fitting is one of the procedures in data analysis and is helpful for prediction analysis, showing graphically how the data points are related to one another, whether by a linear or a non-linear model. Usually, curve fitting either locates the concentration of points along the curve or is simply used to smooth the data and improve the appearance of the plot. Curve fitting examines the relationship between independent and dependent variables with the objective of characterizing a well-fitting model, finding the mathematical equation that best fits the given information. In this paper, 150 unorganized data points of environmental variables are used to develop linear and non-linear data models, which are evaluated using the three-dimensional 'Sftool' and 'Labfit' machine learning tools. In the linear model, the best estimates of the coefficients are those for which R-squared approaches one, while in the non-linear models the least chi-square is the criterion.
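
A minimal sketch of the two fit criteria named above, with NumPy/SciPy standing in for the 'Sftool' and 'Labfit' tools used in the paper: R-squared for a linear fit and chi-square for a non-linear fit, on synthetic data.

```python
# Sketch of the two criteria named above: R-squared for a linear fit and
# chi-square for a non-linear fit. The data are synthetic (150 points, to
# echo the paper's data set size), and the noise level is assumed.
import numpy as np
from scipy.optimize import curve_fit

rng = np.random.default_rng(1)
x = np.linspace(0, 5, 150)
y = 2.0 * np.exp(0.4 * x) + rng.normal(0, 0.3, x.size)

# Linear model: judge fit by R-squared approaching one.
slope, intercept = np.polyfit(x, y, 1)
resid_lin = y - (slope * x + intercept)
r2 = 1 - np.sum(resid_lin**2) / np.sum((y - y.mean())**2)

# Non-linear model: judge fit by the minimized chi-square.
popt, _ = curve_fit(lambda x, a, b: a * np.exp(b * x), x, y, p0=(1.0, 0.1))
resid_nl = y - popt[0] * np.exp(popt[1] * x)
chi2 = np.sum(resid_nl**2 / 0.3**2)   # known noise sigma assumed

print(f"linear R^2 = {r2:.3f}, non-linear chi^2 = {chi2:.1f} (n={x.size})")
```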


2005 ◽  
Vol 30 (4) ◽  
pp. 369-396 ◽  
Author(s):  
Eisuke Segawa

Multi-indicator growth models were formulated as special three-level hierarchical generalized linear models to analyze growth of a latent trait variable measured by ordinal items. Items are nested within a time point, and time points are nested within subjects. These models are special because they include a factor-analytic structure. The model can analyze not only data with item- and time-level missing observations, but also data with time points freely specified over subjects. Furthermore, features useful for longitudinal analyses were included: an "autoregressive error of degree one" structure for the trait residuals, and estimated time scores. The approach is Bayesian with Markov chain Monte Carlo, and the model is implemented in WinBUGS. The models are illustrated with two simulated data sets and one real data set with planned missing items within a scale.
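
A stripped-down sketch of a multi-indicator ordinal growth model of this kind, written in PyMC rather than WinBUGS: a linear growth curve for the latent trait, item loadings, and ordered-logistic responses. The AR(1) residual structure and estimated time scores of the paper are omitted, and all shapes and priors are illustrative assumptions.

```python
# Sketch: simplified multi-indicator growth model for ordinal items, in
# PyMC rather than WinBUGS. The AR(1) trait residuals and estimated time
# scores of the paper are omitted; data, shapes, and priors are assumed.
import numpy as np
import pymc as pm

n_subj, n_time, n_item, n_cat = 50, 4, 3, 4
rng = np.random.default_rng(0)
# Synthetic ordinal responses, shape (subjects, time points, items),
# categories coded 0..n_cat-1.
y = rng.integers(0, n_cat, size=(n_subj, n_time, n_item))
time = np.arange(n_time, dtype=float)

with pm.Model():
    alpha = pm.Normal("alpha", 0, 1, shape=n_subj)       # subject intercepts
    beta = pm.Normal("beta", 0, 1, shape=n_subj)         # subject growth rates
    eta = alpha[:, None] + beta[:, None] * time          # latent trait (subj, time)
    lam = pm.HalfNormal("lam", 1, shape=n_item)          # item loadings
    cuts = pm.Normal("cuts", 0, 2, shape=(n_item, n_cat - 1),
                     transform=pm.distributions.transforms.ordered,
                     initval=np.tile(np.linspace(-1, 1, n_cat - 1), (n_item, 1)))
    pm.OrderedLogistic("y", eta=lam * eta[..., None],    # (subj, time, item)
                       cutpoints=cuts, observed=y)
    idata = pm.sample(500, tune=500, progressbar=False)
```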


Kybernetes ◽  
2015 ◽  
Vol 44 (5) ◽  
pp. 788-805 ◽  
Author(s):  
Francisco Javier Rondan-Cataluña ◽  
Jorge Arenas-Gaitán ◽  
Patricio Esteban Ramírez-Correa

Purpose – The purpose of this paper is to provide a complete and chronological view of the evolution of the main acceptance and use of technology models, from the 1970s to the present day. Design/methodology/approach – A comparison of partial least squares (linear model) and WarpPLS (non-linear model) has been run for each technology acceptance model: TRA, TAM0, TAM1, TAM2, TAM3, UTAUT and UTAUT2. The data set collects information from mobile internet users. Findings – The authors conclude that the UTAUT2 model obtains better explanatory power than the other technology acceptance models (TAMs) in the sample of mobile internet users. Furthermore, all models have better explanatory power using non-linear relationships than the traditional linear approach. Originality/value – The vast majority of research published to date with regard to the Theory of Reasoned Action (TRA), the Technology Acceptance Model (TAM), and the Unified Theory of Acceptance and Use of Technology (UTAUT) is based on structural equation models assuming linear relationships between variables. The originality of this study is that it incorporates non-linear relationships and compares the same models using both approaches.
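
To illustrate why warped (non-linear) relationships can raise explained variance over linear ones, the sketch below fits an S-shaped relationship between two constructs both linearly and with a logistic warp. It stands in for the PLS-versus-WarpPLS contrast described above and is not either tool; the variable names and data are assumptions.

```python
# Sketch: why a non-linear (warped) link can raise explained variance over
# a linear one. An S-shaped relationship between two constructs is fit
# linearly and with a logistic warp; names and data are illustrative.
import numpy as np
from scipy.optimize import curve_fit

rng = np.random.default_rng(3)
usefulness = rng.uniform(-3, 3, 300)                    # latent score
intention = 1 / (1 + np.exp(-2 * usefulness)) + rng.normal(0, 0.05, 300)

def r2(y, yhat):
    return 1 - np.sum((y - yhat)**2) / np.sum((y - y.mean())**2)

b, a = np.polyfit(usefulness, intention, 1)             # linear fit
popt, _ = curve_fit(lambda x, k: 1 / (1 + np.exp(-k * x)),
                    usefulness, intention, p0=(1.0,))   # warped fit

print("linear R^2:", round(r2(intention, a + b * usefulness), 3))
print("warped R^2:", round(r2(intention, 1 / (1 + np.exp(-popt[0] * usefulness))), 3))
```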

