An alternative to the goodness of fit

2016 ◽  
Vol 72 (6) ◽  
pp. 696-703 ◽  
Author(s):  
Julian Henn

An alternative measure to the goodness of fit (GoF) is developed and applied to experimental data. The alternative goodness of fit squared (aGoFs) demonstrates that the GoF regularly fails to provide evidence for the presence of systematic errors, because certain requirements are not met. These requirements are briefly discussed. It is shown that in many experimental data sets a correlation exists between the squared residuals and the variance of the observed intensities. These correlations corrupt the GoF and lead to artificially reduced values of the GoF and of the numerical value of wR(F2). Remaining systematic errors in the data sets are veiled by this mechanism. In data sets where these correlations do not appear for the entire data set, they often appear for the decile of largest variances of the observed intensities. Additionally, statistical errors for the squared goodness of fit, GoFs, and for the aGoFs are developed and applied to experimental data. This measure shows how significantly the GoFs and aGoFs deviate from the ideal value of one.
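
As a rough illustration of the mechanism described above, the sketch below computes a conventional goodness of fit and the correlation between weighted squared residuals and the variances of the observed intensities; the function and variable names are placeholders, and the conventional GoF definition used here is not necessarily identical to the aGoFs introduced in the paper.

```python
import numpy as np

def goodness_of_fit(i_obs, i_calc, sigma, n_params):
    """Conventional GoF = sqrt( sum w*(I_obs - I_calc)^2 / (n - p) ) with w = 1/sigma^2."""
    w = 1.0 / sigma**2
    return np.sqrt(np.sum(w * (i_obs - i_calc)**2) / (len(i_obs) - n_params))

def residual_variance_correlation(i_obs, i_calc, sigma):
    """Pearson correlation between weighted squared residuals and sigma^2 of I_obs;
    a marked correlation is the effect said to deflate the GoF and wR(F2)."""
    sq_res = ((i_obs - i_calc) / sigma)**2
    return np.corrcoef(sq_res, sigma**2)[0, 1]
```

Checking the same correlation within the decile of largest variances follows by sorting the arrays on sigma and slicing off the top tenth before calling the second function.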

Polymers ◽  
2021 ◽  
Vol 13 (21) ◽  
pp. 3811
Author(s):  
Iosif Sorin Fazakas-Anca ◽  
Arina Modrea ◽  
Sorin Vlase

This paper proposes a new method for calculating the monomer reactivity ratios for binary copolymerization based on the terminal model. The original optimization method combines a numerical integration algorithm with an optimization algorithm based on k-nearest-neighbour non-parametric regression. The calculation method has been tested on simulated and experimental data sets at low (<10%), medium (10–35%) and high conversions (>40%), yielding reactivity ratios in good agreement with the usual methods such as intersection, Fineman–Ross, reverse Fineman–Ross, Kelen–Tüdös, extended Kelen–Tüdös and the error-in-variables method. The experimental data sets used in this comparative analysis are the copolymerization of 2-(N-phthalimido)ethyl acrylate with 1-vinyl-2-pyrrolidone for low conversion, the copolymerization of isoprene with glycidyl methacrylate for medium conversion and the copolymerization of N-isopropylacrylamide with N,N-dimethylacrylamide for high conversion. The possibility of estimating experimental errors from a single experimental data set of n data points is also demonstrated.
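
The abstract benchmarks the proposed k-nearest-neighbour optimization against classical linearization schemes. For orientation, a minimal sketch of one of those baselines, the Fineman–Ross linearization, is given below (the proposed integration/k-NN method itself is not reproduced); variable names are illustrative.

```python
import numpy as np

def fineman_ross(x, y):
    """Classical Fineman-Ross estimate of reactivity ratios (r1, r2).

    x : monomer feed ratios [M1]/[M2]
    y : copolymer composition ratios m1/m2 (low-conversion approximation)
    Rearranging the Mayo-Lewis equation gives G = r1 * H - r2,
    with G = x*(y - 1)/y and H = x**2 / y.
    """
    x, y = np.asarray(x, float), np.asarray(y, float)
    G = x * (y - 1.0) / y
    H = x**2 / y
    r1, neg_r2 = np.polyfit(H, G, 1)   # slope = r1, intercept = -r2
    return r1, -neg_r2
```

With low-conversion feed ratios x and copolymer composition ratios y, fineman_ross(x, y) returns the (r1, r2) estimates that newer methods are typically compared against.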


2017 ◽  
Author(s):  
Alexander P. Browning ◽  
Scott W. McCue ◽  
Rachelle N. Binny ◽  
Michael J. Plank ◽  
Esha T. Shah ◽  
...  

Abstract Collective cell spreading takes place in spatially continuous environments, yet it is often modelled using discrete lattice-based approaches. Here, we use data from a series of cell proliferation assays with a prostate cancer cell line to calibrate a spatially continuous individual-based model (IBM) of collective cell migration and proliferation. The IBM explicitly accounts for crowding effects by modifying the rate of movement, the direction of movement and the rate of proliferation through pair-wise interactions. Taking a Bayesian approach, we estimate the free parameters in the IBM using rejection sampling on three separate, independent experimental data sets. Since the posterior distributions for each experiment are similar, we perform simulations with parameters sampled from a new posterior distribution generated by combining the three data sets. To explore the predictive power of the calibrated IBM, we forecast the evolution of a fourth experimental data set. Overall, we show how to calibrate a lattice-free IBM to experimental data, and our work highlights the importance of interactions between individuals. Despite great care taken to distribute cells as uniformly as possible experimentally, we find evidence of significant spatial clustering over short distances, suggesting that standard mean-field models could be inappropriate.
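
A minimal sketch of Bayesian calibration by rejection sampling (ABC-style), of the kind described above, is shown below; `simulate`, the summary statistics, the prior and the tolerance are placeholders and do not reproduce the authors' IBM or their exact acceptance criterion.

```python
import numpy as np

def abc_rejection(simulate, observed_summary, prior_sample, n_draws=10_000, tol=0.1):
    """Approximate Bayesian computation by rejection: keep prior draws whose simulated
    summary statistics land within `tol` of the observed ones.
    `simulate(theta)` stands in for a run of the individual-based model."""
    accepted = []
    for _ in range(n_draws):
        theta = prior_sample()                                   # draw from the prior
        dist = np.linalg.norm(simulate(theta) - observed_summary)
        if dist < tol:
            accepted.append(theta)
    return np.array(accepted)                                    # approximate posterior samples
```

Posteriors obtained from separate experiments can then be compared, or the data sets pooled and the sampler re-run, as in the combined-posterior step described in the abstract.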


2018 ◽  
Vol 233 (9-10) ◽  
pp. 689-694 ◽  
Author(s):  
Julian Henn

Abstract For the evaluation of data sets from dynamic structure crystallography, it may be helpful to predict expected $R = I_{\mathrm{ON}}/I_{\mathrm{OFF}}$-based agreement factors from the observed intensities and their corresponding standard uncertainties with laser ON and with laser OFF. The predicted R factors serve three purposes: (i) they indicate which data sets are suitable and promising for further evaluation, (ii) they give a reference R value for the case of absence of systematic errors in the data, and (iii) they can be compared to the corresponding predicted F2-based R factors. For point (ii) it is essential that the standard uncertainties from the experiment are adequate, i.e. they should adequately describe the noise in the observed intensities and must not be systematically over- or underestimated for a part of the data or for the whole data set. It may be this requirement which is currently the largest obstacle to further progress in the field of dynamic structure crystallography.
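
As a small worked example of the ingredients involved, the sketch below propagates standard uncertainties into the intensity ratio R = I_ON/I_OFF using first-order error propagation for uncorrelated errors; whether this matches the paper's exact definition of the predicted R factors is an assumption.

```python
import numpy as np

def ratio_with_uncertainty(i_on, sig_on, i_off, sig_off):
    """R = I_ON / I_OFF with first-order error propagation for uncorrelated errors:
    sigma_R = |R| * sqrt((sig_on/I_on)^2 + (sig_off/I_off)^2)."""
    r = i_on / i_off
    sig_r = np.abs(r) * np.sqrt((sig_on / i_on)**2 + (sig_off / i_off)**2)
    return r, sig_r
```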


2010 ◽  
Vol 75 (4) ◽  
pp. 483-495 ◽  
Author(s):  
Slavica Eric ◽  
Marko Kalinic ◽  
Aleksandar Popovic ◽  
Halid Makic ◽  
Elvisa Civic ◽  
...  

Aqueous solubility is an important factor influencing several aspects of the pharmacokinetic profile of a drug. Numerous publications present different methodologies for developing reliable computational models for the prediction of solubility from structure. The quality of such models can be significantly affected by the accuracy of the experimental solubility data employed. In this work, the importance of the accuracy of the experimental solubility data used for model training was investigated. Three data sets were used as training sets: Data Set 1, containing solubility data collected from various literature sources according to a few selection criteria (n = 319); Data Set 2, created by substituting 28 values from Data Set 1 with experimental data determined uniformly in one laboratory (n = 319); and Data Set 3, created by adding to Data Set 2 a further 56 compounds for which solubility was also determined under uniform conditions in the same laboratory (n = 375). The selection of the most significant descriptors was performed by the heuristic method, using one-parameter and multi-parameter analysis. The correlations between the most significant descriptors and solubility were established using multi-linear regression analysis (MLR) for all three investigated data sets. Notable differences were observed between the equations corresponding to the different data sets, suggesting that models updated with new experimental data need to be additionally optimized. It was shown that the inclusion of uniform experimental data consistently leads to an improvement in the correlation coefficients. These findings support an emerging consensus that improving the reliability of solubility prediction requires data sets containing many diverse compounds whose solubility was measured under standardized conditions.
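
A minimal sketch of the MLR step described above is given below, assuming a precomputed descriptor matrix; the descriptor choice and the heuristic selection step are not reproduced, and all names are illustrative.

```python
import numpy as np

def fit_mlr(descriptors, log_solubility):
    """Ordinary least-squares fit logS ~ b0 + b1*x1 + ... + bk*xk.
    `descriptors` is an (n_compounds, k) matrix of precomputed descriptor values."""
    X = np.column_stack([np.ones(len(descriptors)), descriptors])  # add intercept column
    coef, *_ = np.linalg.lstsq(X, log_solubility, rcond=None)
    pred = X @ coef
    ss_res = np.sum((log_solubility - pred)**2)
    ss_tot = np.sum((log_solubility - np.mean(log_solubility))**2)
    return coef, 1.0 - ss_res / ss_tot                             # coefficients and R^2
```

Refitting the same model on each of the three training sets and comparing the coefficients and R^2 values mirrors the comparison reported in the abstract.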


Author(s):  
Özlem Türkşen ◽  
Suna Ertunç

Beta-glucan (BG) has positive health effects for mammals. However, natural BG sources contain only limited amounts of it, and BG production involves stringent procedures with low productivity. Economical production of BG therefore requires improving the BG production steps. In this study, the aim is to improve the BG content during the first step of BG production, the microorganism growth step, by obtaining the optimal levels of the additive materials (EDTA, CaCl2 and Sorbitol). For this purpose, experimental data sets with replicated response measures (RRM) are obtained at specific levels of EDTA, CaCl2 and Sorbitol. Fuzzy modelling, a flexible modelling approach, is applied to the experimental data set because of its small size and the difficulty of satisfying probabilistic modelling assumptions. The predicted fuzzy function is obtained with the fuzzy least squares approach. To obtain the optimal values of EDTA, CaCl2 and Sorbitol, the predicted fuzzy function is maximized using a multi-objective optimization (MOO) approach. Using the optimal values of EDTA, CaCl2 and Sorbitol, the uncertainty of the predicted BG content is evaluated from an economic perspective.
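
The paper's fuzzy least squares model and multi-objective optimization are not reproduced here; as a much-simplified crisp stand-in, the sketch below fits an ordinary quadratic response surface to the mean replicated responses and maximizes it over the three additive levels. Everything in it (design matrix, bounds, objective) is an illustrative assumption.

```python
import numpy as np
from scipy.optimize import minimize

def fit_quadratic_surface(X, y):
    """Fit y ~ intercept + linear + pure quadratic terms of the three additive levels.
    X is an (n_runs, 3) matrix of EDTA, CaCl2 and Sorbitol levels; y is mean BG content."""
    Z = np.column_stack([np.ones(len(X)), X, X**2])
    coef, *_ = np.linalg.lstsq(Z, y, rcond=None)
    return coef

def maximize_surface(coef, bounds):
    """Maximize the fitted surface over the additive levels within the given bounds."""
    f = lambda x: -(coef[0] + coef[1:4] @ x + coef[4:7] @ x**2)
    x0 = np.array([(lo + hi) / 2 for lo, hi in bounds])
    res = minimize(f, x0, bounds=bounds)
    return res.x, -res.fun            # optimal additive levels, predicted BG content
```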


Author(s):  
Guri Feten ◽  
Trygve Almøy ◽  
Are H. Aastveit

Gene expression microarray experiments generate data sets with multiple missing expression values. In some cases, analysis of gene expression requires a complete matrix as input. Either genes with missing values can be removed, or the missing values can be replaced using prediction. We propose six imputation methods. A comparative study of the methods was performed on data from mice and data from the bacterium Enterococcus faecalis, and a linear mixed model was used to test for differences between the methods. The study showed that the predictive capability of the methods depends on the data, so the ideal choice of method and number of components differs from one data set to another. For data with correlation structure, methods based on K-nearest neighbours performed best, while for data without correlation structure, using the gene average was preferable.
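
Two of the simpler strategies compared above can be sketched as follows: gene-average (row-mean) imputation and K-nearest-neighbour imputation. The scikit-learn KNNImputer is used here as a convenient stand-in for the authors' own K-nearest-neighbour variants; the matrix orientation (genes as rows) and the choice of k are assumptions.

```python
import numpy as np
from sklearn.impute import KNNImputer

def impute_gene_average(expr):
    """Replace missing values in each gene (row) by that gene's mean across arrays."""
    expr = expr.copy()
    row_means = np.nanmean(expr, axis=1)
    idx = np.where(np.isnan(expr))
    expr[idx] = row_means[idx[0]]
    return expr

def impute_knn(expr, k=10):
    """K-nearest-neighbour imputation: each missing value is filled from the k most
    similar genes (rows), similarity measured on the commonly observed arrays."""
    return KNNImputer(n_neighbors=k).fit_transform(expr)
```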


Author(s):  
James Simek ◽  
Jed Ludlow ◽  
Phil Tisovec

In-line inspection (ILI) tools using the magnetic flux leakage (MFL) technique are the most common type used for performing metal loss surveys worldwide. Based on the robust and proven MFL technique, these tools have been shown to operate reliably in the extremely harsh environments of transmission pipelines. In addition to metal loss, MFL tools are capable of identifying a broad range of pipeline features. Most MFL surveys to date have used tools employing axially oriented magnetizers, capable of detecting and quantifying many categories of volumetric metal loss features. For certain classes of axially oriented features, however, MFL tools using axially oriented fields have encountered difficulty in detection and subsequent quantification. To address features in these categories, tools employing circumferentially or transversely oriented fields have been designed and placed into service, enabling enhanced detection and sizing of axially oriented features. In most cases, multiple surveys are required, as current tools cannot collect both data sets concurrently. Applying the magnetic field in an oblique direction enables detection of axially oriented features and may be used simultaneously with an axially oriented tool. Building on previous research in adapting circumferential or transverse designs for in-line service, the concept of an oblique field magnetizer is presented. Models developed to demonstrate the technique are discussed, together with experimental data supporting the concept. Efforts involved in implementing an oblique magnetizer, including magnetic models of field profiles used to determine magnetizer configurations and sensor locations, are presented. Experimental results detail the response of the system to a full range of metal loss features, supplementing the modelling in an effort to determine the effects of variables introduced by differences in magnetic properties and velocity. The experimental data include extremely narrow axially oriented features, many of which are not detected or identified within the axial data set. Experimental and field verification results for detection accuracy are described in comparison with an axial field tool.


2015 ◽  
Vol 31 (1) ◽  
pp. 541-564 ◽  
Author(s):  
Clinton M. Wood ◽  
Brady R. Cox

This paper describes two large, high-quality experimental data sets of ground motions collected with locally dense arrays of seismometers deployed on steep mountainous terrain with varying slope angles and topographic features. These data sets were collected in an area of central-eastern Utah that experiences frequent and predictable mining-induced seismicity, as a means to study the effects of topography on small-strain seismic ground motions. The data sets are freely available through the George E. Brown, Jr. Network for Earthquake Engineering Simulation data repository (NEEShub.org) under the DOI numbers 10.4231/D34M9199S and 10.4231/D3Z31NN4J. This paper documents the data collection efforts and the metadata necessary for utilizing the data sets, as well as the availability of supporting data (e.g., high-resolution digital elevation models). The paper offers a brief summary of analyses conducted on the data sets thus far, in addition to ideas about how these data sets may be used in future studies related to topographic effects and mining seismicity.


2010 ◽  
Vol 62 (4) ◽  
pp. 875-882 ◽  
Author(s):  
A. Dembélé ◽  
J.-L. Bertrand-Krajewski ◽  
B. Barillon

Regression models are among the most frequently used models to estimate pollutant event mean concentrations (EMC) in wet weather discharges in urban catchments. Two main questions dealing with the calibration of EMC regression models are investigated: i) the sensitivity of the models to the size and content of the data sets used for their calibration, and ii) the change in modelling results when models are re-calibrated as data sets grow and change over time with newly collected experimental data. Based on an experimental data set of 64 rain events monitored in a densely urbanised catchment, four TSS EMC regression models (two log-linear and two linear models) with two or three explanatory variables have been derived and analysed. Model calibration with the iteratively re-weighted least squares (IRLS) method is less sensitive and leads to more robust results than the ordinary least squares method. Three calibration options have been investigated: two options accounting for the chronological order of the observations, and one option using random samples of events from the whole available data set. Results obtained with the best-performing non-linear model clearly indicate that the model is highly sensitive to the size and content of the data set used for its calibration.
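
A minimal sketch of the calibration comparison mentioned above, ordinary least squares versus an iteratively re-weighted (robust) least squares fit, is shown below using statsmodels with a Huber norm; the explanatory variables and the exact IRLS weighting scheme used in the paper are assumptions.

```python
import statsmodels.api as sm

def calibrate_emc_model(X, log_emc):
    """Fit log(EMC) against the explanatory variables with OLS and with IRLS.
    X holds the two or three explanatory variables for each monitored rain event."""
    Xc = sm.add_constant(X)
    ols = sm.OLS(log_emc, Xc).fit()
    irls = sm.RLM(log_emc, Xc, M=sm.robust.norms.HuberT()).fit()  # iteratively re-weighted LS
    return ols.params, irls.params
```

Re-running the fit on chronologically growing subsets of events, or on random subsets, reproduces the kind of sensitivity analysis the abstract describes.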


2006 ◽  
Vol 17 (09) ◽  
pp. 1313-1325 ◽  
Author(s):  
NIKITA A. SAKHANENKO ◽  
GEORGE F. LUGER ◽  
HANNA E. MAKARUK ◽  
JOYSREE B. AUBREY ◽  
DAVID B. HOLTKAMP

This paper considers a set of shock physics experiments that investigate how materials respond to extremes of deformation, pressure and temperature when exposed to shock waves. Due to the complexity and cost of these tests, the available experimental data set is often very sparse. A support vector machine (SVM) technique for regression is used to estimate velocity measurements from the underlying experiments. Owing to its good generalization performance, the SVM method successfully interpolates the experimental data. The analysis of the resulting velocity surface provides more information on the physical phenomena of the experiment. Additionally, the estimated data can be used to identify outlier data sets, as well as to increase understanding of the other data from the experiment.
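
A minimal sketch of SVM regression used to interpolate a sparse velocity data set is given below, using scikit-learn's SVR with an RBF kernel; the input parametrization and hyperparameters are illustrative assumptions, not the authors' settings.

```python
from sklearn.svm import SVR
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import make_pipeline

def fit_velocity_surface(inputs, velocity):
    """Interpolate sparse velocity measurements with an RBF-kernel support vector regressor.
    `inputs` holds the experiment parameters (e.g. time and position); values are illustrative."""
    model = make_pipeline(StandardScaler(), SVR(kernel="rbf", C=10.0, epsilon=0.01))
    return model.fit(inputs, velocity)
```

Predicting on a dense grid of inputs then yields the estimated velocity surface, and experiments whose measurements deviate strongly from it can be flagged as potential outliers, as the abstract suggests.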

