Information content in data sets: A review of methods for interrogation and model comparison

2018 ◽  
Vol 26 (3) ◽  
pp. 423-452 ◽  
Author(s):  
H. Thomas Banks ◽  
Michele L. Joyner

Abstract. In this review we discuss methodology to ascertain the amount of information in given data sets with respect to the determination of model parameters with desired levels of uncertainty. We do this in the context of least squares (ordinary, weighted, iteratively reweighted weighted or "generalized", etc.) based inverse problem formulations. The ideas are illustrated with several examples of interest in the biological and environmental sciences.
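A minimal sketch of the kind of least squares inverse problem the abstract describes, not an example from the review itself: the logistic model, synthetic data, and noise level below are hypothetical. Parameters are fitted by ordinary least squares and their standard errors read off a Jacobian-based covariance estimate, which is exactly the quantity that determines how much information a data set carries about each parameter.

```python
# Minimal sketch (not from the paper): ordinary least squares fit of a
# hypothetical logistic growth model, with parameter standard errors taken
# from the Jacobian-based covariance estimate.
import numpy as np
from scipy.optimize import least_squares

def logistic(t, K, r, x0):
    """Logistic growth curve x(t) = K / (1 + ((K - x0) / x0) * exp(-r t))."""
    return K / (1.0 + ((K - x0) / x0) * np.exp(-r * t))

# Synthetic data: "true" parameters plus observation noise.
rng = np.random.default_rng(0)
t = np.linspace(0.0, 10.0, 40)
theta_true = (17.5, 0.7, 0.1)
y = logistic(t, *theta_true) + rng.normal(scale=0.5, size=t.size)

def residuals(theta):
    return logistic(t, *theta) - y

fit = least_squares(residuals, x0=[10.0, 1.0, 1.0])

# Covariance of the estimator: sigma^2 * (J^T J)^{-1}, with sigma^2 estimated
# from the residual sum of squares.
dof = t.size - fit.x.size
sigma2 = 2.0 * fit.cost / dof          # fit.cost = 0.5 * sum(residuals^2)
cov = sigma2 * np.linalg.inv(fit.jac.T @ fit.jac)
stderr = np.sqrt(np.diag(cov))

for name, est, se in zip(("K", "r", "x0"), fit.x, stderr):
    print(f"{name} = {est:.3f} +/- {se:.3f}")
```

If a data set cannot pin a parameter down to the desired uncertainty by this route, more data, differently designed data, or a reformulated (e.g. weighted or generalized) least squares criterion is needed, which is the question the review takes up.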

2018 ◽  
Vol 612 ◽  
pp. A70 ◽  
Author(s):  
J. Olivares ◽  
E. Moraux ◽  
L. M. Sarro ◽  
H. Bouy ◽  
A. Berihuete ◽  
...  

Context. Membership analyses of the DANCe and Tycho + DANCe data sets provide the largest and least contaminated sample of Pleiades candidate members to date. Aims. We aim to reassess the different proposals for the number surface density of the Pleiades in the light of the new and most complete list of candidate members, and to infer the parameters of the most adequate model. Methods. We compute the Bayesian evidence and Bayes factors for variations of the classical radial models. These include elliptical symmetry and luminosity segregation. As a by-product of the model comparison, we obtain posterior distributions for each set of model parameters. Results. We find that the model comparison results depend on the spatial extent of the region used for the analysis. For a circle of 11.5 parsecs around the cluster centre (the most homogeneous and complete region), we find no compelling reason to abandon King’s model, although the Generalised King model introduced here has slightly better fitting properties. Furthermore, we find strong evidence against radially symmetric models when compared to the elliptic extensions. Finally, we find that including mass segregation in the form of luminosity segregation in the J band is strongly supported in all our models. Conclusions. We have put the question of the projected spatial distribution of the Pleiades cluster on a solid probabilistic framework, and inferred its properties using the most exhaustive and least contaminated list of Pleiades candidate members available to date. Our results suggest, however, that this sample may still lack about 20% of the expected number of cluster members. Therefore, this study should be revised when the completeness and homogeneity of the data can be extended beyond the 11.5 parsec limit. Such a study will allow for a more precise determination of the Pleiades spatial distribution, its tidal radius, ellipticity, number of objects, and total mass.
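A schematic of the model comparison machinery the abstract refers to, not the authors' code or cluster models: the toy example below computes Bayesian evidences for two one-parameter models of the same synthetic data by direct quadrature and forms the Bayes factor.

```python
# Minimal sketch (toy example, not the Pleiades analysis): Bayesian evidence
# Z = integral of likelihood * prior over the parameter, computed by direct
# quadrature for two competing one-parameter models, then compared via the
# log Bayes factor ln(Z1 / Z2).
import numpy as np
from scipy import stats
from scipy.integrate import simpson

rng = np.random.default_rng(1)
data = rng.normal(loc=0.3, scale=1.0, size=50)   # synthetic observations

def log_evidence(loglike, prior_pdf, grid):
    """Marginal likelihood over a 1-D parameter grid."""
    logl = np.array([loglike(p) for p in grid])
    integrand = np.exp(logl - logl.max()) * prior_pdf(grid)
    return logl.max() + np.log(simpson(integrand, x=grid))

# Model 1: Gaussian with unknown mean mu (sigma fixed at 1), Gaussian prior.
grid_mu = np.linspace(-3.0, 3.0, 2001)
z1 = log_evidence(lambda mu: stats.norm(mu, 1.0).logpdf(data).sum(),
                  stats.norm(0.0, 2.0).pdf, grid_mu)

# Model 2: zero-mean Gaussian with unknown sigma, half-normal prior.
grid_sig = np.linspace(0.05, 5.0, 2001)
z2 = log_evidence(lambda s: stats.norm(0.0, s).logpdf(data).sum(),
                  stats.halfnorm(scale=2.0).pdf, grid_sig)

print("ln Bayes factor (model 1 vs model 2):", z1 - z2)
```

In the paper the evidence integrals run over many parameters and require dedicated samplers rather than quadrature, but the decision rule is the same: the model with the larger evidence is preferred, and the Bayes factor quantifies how strongly.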


Geophysics ◽  
2001 ◽  
Vol 66 (1) ◽  
pp. 21-24 ◽  
Author(s):  
Sven Treitel ◽  
Larry Lines

Geophysicists have been working on solutions to the inverse problem since the dawn of our profession. An interpreter infers subsurface properties on the basis of observed data sets, such as seismograms or potential field recordings. A rough model of the process that produces the recorded data resides within the interpreter’s brain; the interpreter then uses this rough mental model to reconstruct subsurface properties from the observed data. In modern parlance, the inference of subsurface properties from observed data is identified with the solution of a so‐called “inverse problem.” In contrast, the “forward problem” consists of the determination of the data that would be recorded for a given subsurface configuration and under the assumption that given laws of physics hold. Until the early 1960s, geophysical inversion was carried out almost exclusively within the geophysicist’s brain. Since then, we have learned to make the geophysical inversion process much more quantitative and versatile by recourse to a growing body of theory, along with the computer power to reduce this theory to practice. We should point out the obvious, however, namely that no theory and no computer algorithm can presumably replace the ultimate arbiter who decides whether the results of an inversion make sense or nonsense: the geophysical interpreter. Perhaps our descendants writing a future third Millennium review article can report that a machine has been solving the inverse problem without a human arbiter. For the time being, however, what might be called “unsupervised geophysical inversion” remains but a dream.


1996 ◽  
Vol 26 (4) ◽  
pp. 590-600 ◽  
Author(s):  
Katherine L. Bolster ◽  
Mary E. Martin ◽  
John D. Aber

Further evaluation of near infrared reflectance spectroscopy as a method for the determination of nitrogen, lignin, and cellulose concentrations in dry, ground, temperate forest woody foliage is presented. A comparison is made between two regression methods, stepwise multiple linear regression and partial least squares regression. The partial least squares method showed consistently lower standard error of calibration and higher R2 values with first and second difference equations. The first difference partial least squares regression equation resulted in standard errors of calibration of 0.106%, with an R2 of 0.97 for nitrogen, 1.613% with an R2 of 0.88 for lignin, and 2.103% with an R2 of 0.89 for cellulose. The four most highly correlated wavelengths in the near infrared region, and the chemical bonds represented, are shown for each constituent and both regression methods. Generalizability of both methods for prediction of protein, lignin, and cellulose concentrations on independent data sets is discussed. Prediction accuracy for independent data sets and species from other sites was increased using partial least squares regression, but was poor for sample sets containing tissue types or laboratory-measured concentration ranges beyond those of the calibration set.
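The sketch below uses hypothetical spectra rather than the authors' foliage data; it shows only the partial least squares calibration step that the abstract compares with stepwise regression: fit a PLS model on first-difference spectra and report the standard error of calibration (SEC) and R².

```python
# Minimal sketch (synthetic spectra, not the authors' data): partial least
# squares calibration on first-difference spectra, reporting the standard
# error of calibration (SEC) and R^2.
import numpy as np
from sklearn.cross_decomposition import PLSRegression
from sklearn.metrics import r2_score

rng = np.random.default_rng(2)
n_samples, n_wavelengths = 120, 300

# Hypothetical NIR spectra whose shape depends on the constituent
# concentration (e.g. percent nitrogen), plus a sloping baseline and noise.
concentration = rng.uniform(0.5, 3.0, n_samples)
wl = np.linspace(0.0, 1.0, n_wavelengths)
spectra = (np.outer(concentration, np.exp(-((wl - 0.4) ** 2) / 0.01))
           + np.outer(np.ones(n_samples), 0.5 * wl)
           + rng.normal(scale=0.02, size=(n_samples, n_wavelengths)))

# "First difference" spectra: discrete derivative along the wavelength axis.
X = np.diff(spectra, axis=1)

n_factors = 5
pls = PLSRegression(n_components=n_factors)
pls.fit(X, concentration)
predicted = pls.predict(X).ravel()

sec = np.sqrt(np.sum((predicted - concentration) ** 2)
              / (n_samples - n_factors - 1))
print(f"SEC = {sec:.3f} %, R^2 = {r2_score(concentration, predicted):.3f}")
```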


2014 ◽  
Vol 2014 ◽  
pp. 1-12 ◽  
Author(s):  
Baiyu Wang

This paper investigates the numerical solution of a class of one-dimensional inverse parabolic problems using the moving least squares approximation; the inverse problem is the determination of an unknown source term depending on time. The collocation method is used for solving the equation; some numerical experiments are presented and discussed to illustrate the stability and high efficiency of the method.
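A minimal sketch of the moving least squares ingredient only, one-dimensional function approximation rather than the paper's collocation scheme for the inverse source problem: at each evaluation point a low-order polynomial is fitted by weighted least squares, with weights that decay with distance from that point.

```python
# Minimal sketch: 1-D moving least squares (MLS) approximation. At each
# evaluation point a quadratic is fitted by weighted least squares, with
# Gaussian weights centred on that point. This is only the MLS building
# block, not the paper's collocation scheme for the inverse parabolic problem.
import numpy as np

def mls_approximate(x_nodes, f_nodes, x_eval, support=0.2):
    basis = lambda x: np.column_stack([np.ones_like(x), x, x ** 2])
    result = np.empty_like(x_eval)
    for k, xe in enumerate(x_eval):
        w = np.exp(-((x_nodes - xe) / support) ** 2)   # weight function
        B = basis(x_nodes)
        # Weighted normal equations: (B^T W B) c = B^T W f
        BtW = B.T * w
        coeffs = np.linalg.solve(BtW @ B, BtW @ f_nodes)
        result[k] = basis(np.array([xe]))[0] @ coeffs
    return result

# Scattered samples of a smooth function, reconstructed on a fine grid.
x_nodes = np.sort(np.random.default_rng(3).uniform(0.0, 1.0, 25))
f_nodes = np.sin(2.0 * np.pi * x_nodes)
x_eval = np.linspace(0.0, 1.0, 101)
approx = mls_approximate(x_nodes, f_nodes, x_eval)
print("max error:", np.abs(approx - np.sin(2.0 * np.pi * x_eval)).max())
```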


2016 ◽  
Vol 34 (3) ◽  
Author(s):  
Fernando De Oliveira Marin ◽  
Orlando Camargo Rodríguez ◽  
Luiz Gallisa Guimarães ◽  
Carlos Eduardo Parente Ribeiro

ABSTRACT. This paper discusses the estimation of sound speed perturbations in a shallow water waveguide from measurements of modal travel times. The formulation of the inverse problem is reduced to a least squares solution, and it is highlighted that the choice of discretization of the set of model parameters is of fundamental importance. In the... Keywords: shallow water, tomography, orthogonal functions.
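A schematic of the linear least squares step described in the abstract, not the authors' waveguide model: modal travel time perturbations are written as a linear function of the discretized sound speed perturbation, and the system is solved in the least squares sense with a small damping term, since the sensitivity of the solution to the chosen discretization is the issue the paper highlights.

```python
# Minimal sketch: damped linear least squares inversion of travel time
# perturbations, d = G m, for a discretized sound speed perturbation m.
# G, d, and the damping value are illustrative, not the paper's waveguide model.
import numpy as np

rng = np.random.default_rng(4)
n_times, n_params = 30, 10          # travel time measurements, model cells

G = rng.normal(size=(n_times, n_params))        # sensitivity kernel (toy)
m_true = np.sin(np.linspace(0.0, np.pi, n_params))
d = G @ m_true + rng.normal(scale=0.01, size=n_times)

# Damped (Tikhonov) least squares: minimize ||G m - d||^2 + eps^2 ||m||^2,
# solved by stacking a scaled identity under G.
eps = 0.05
G_aug = np.vstack([G, eps * np.eye(n_params)])
d_aug = np.concatenate([d, np.zeros(n_params)])
m_est, *_ = np.linalg.lstsq(G_aug, d_aug, rcond=None)

print("rms model error:", np.sqrt(np.mean((m_est - m_true) ** 2)))
```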


1985 ◽  
Vol 39 (3) ◽  
pp. 463-470 ◽  
Author(s):  
Yong-Chien Ling ◽  
Thomas J. Vickers ◽  
Charles K. Mann

A study has been made to compare the effectiveness of thirteen methods of spectroscopic background correction in quantitative measurements. These include digital filters, least-squares fitting, and cross-correlation, as well as peak area and height measurements. Simulated data sets with varying S/N and degrees of background curvature were used. The results were compared with the results of corresponding treatments of Raman spectra of dimethyl sulfone, sulfate, and bisulfate. The range of variation of the simulated sets was greater than was possible with the experimental data, but where conditions were comparable, the agreement between them was good. This supports the conclusion that the simulations were valid. Best results were obtained by a least-squares fit with the use of simple polynomials to generate the background correction. Under the conditions employed, limits of detection were about 80 ppm for dimethyl sulfone and sulfate and 420 ppm for bisulfate.
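A small illustration of the approach the study found to work best, least-squares fitting of a simple polynomial to the background: the sketch below uses a synthetic spectrum rather than the paper's Raman data, fits a low-order polynomial to channels flagged as background, and measures the peak area on the corrected trace.

```python
# Minimal sketch (synthetic spectrum, not the paper's Raman data): background
# correction by least squares fitting of a simple polynomial to channels
# outside the peak region, then peak area measurement on the corrected trace.
import numpy as np

rng = np.random.default_rng(5)
x = np.arange(500)
background = 50.0 + 0.1 * x - 1e-4 * x ** 2           # curved baseline
peak = 40.0 * np.exp(-((x - 250) / 8.0) ** 2)         # analyte band
spectrum = background + peak + rng.normal(scale=1.0, size=x.size)

# Channels assumed to contain only background (peak region excluded).
bg_mask = (x < 200) | (x > 300)
coeffs = np.polyfit(x[bg_mask], spectrum[bg_mask], deg=2)
corrected = spectrum - np.polyval(coeffs, x)

peak_area = corrected[(x >= 200) & (x <= 300)].sum()
print(f"estimated peak area: {peak_area:.1f} (true ~ {peak.sum():.1f})")
```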


Mathematics ◽  
2020 ◽  
Vol 8 (1) ◽  
pp. 135 ◽  
Author(s):  
Ahmed Z. Afify ◽  
Osama Abdo Mohamed

In this paper, we study a new flexible three-parameter exponential distribution called the extended odd Weibull exponential distribution, which can have constant, decreasing, increasing, bathtub, upside-down bathtub, and reversed-J shaped hazard rates, and right-skewed, left-skewed, symmetrical, and reversed-J shaped densities. Some mathematical properties of the proposed distribution are derived. The model parameters are estimated via eight frequentist estimation methods, namely the maximum likelihood, least squares, weighted least squares, maximum product of spacings, Cramér-von Mises, percentile, Anderson-Darling, and right-tail Anderson-Darling estimators. Extensive simulations are conducted to compare the performance of these estimation methods for small and large samples. Four practical data sets from the fields of medicine, engineering, and reliability are analyzed, demonstrating the usefulness and flexibility of the proposed distribution.
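The sketch below sets up two of the eight estimation methods named in the abstract, maximum likelihood and least squares on the empirical CDF, using an ordinary two-parameter Weibull as a stand-in; the extended odd Weibull exponential density itself is not reproduced here.

```python
# Minimal sketch: two of the eight estimation methods compared in the paper,
# maximum likelihood and least squares on the empirical CDF, applied to an
# ordinary two-parameter Weibull as a stand-in distribution.
import numpy as np
from scipy import stats
from scipy.optimize import minimize

rng = np.random.default_rng(6)
data = stats.weibull_min(c=1.8, scale=2.0).rvs(size=200, random_state=rng)

def neg_log_likelihood(theta):
    c, scale = theta
    return -stats.weibull_min(c=c, scale=scale).logpdf(data).sum()

def cdf_least_squares(theta):
    c, scale = theta
    x = np.sort(data)
    ecdf = (np.arange(1, x.size + 1) - 0.5) / x.size   # plotting positions
    return np.sum((stats.weibull_min(c=c, scale=scale).cdf(x) - ecdf) ** 2)

mle = minimize(neg_log_likelihood, x0=[1.0, 1.0],
               bounds=[(0.05, None), (0.05, None)])
lse = minimize(cdf_least_squares, x0=[1.0, 1.0],
               bounds=[(0.05, None), (0.05, None)])
print("MLE estimate (shape, scale):", mle.x)
print("LSE estimate (shape, scale):", lse.x)
```

The simulation comparisons reported in the paper amount to repeating fits like these over many synthetic samples and tabulating bias and mean squared error for each method.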


2021 ◽  
Vol 50 (3) ◽  
pp. 77-105
Author(s):  
Devendra Kumar ◽  
Mazen Nassar ◽  
Ahmed Z. Afify ◽  
Sanku Dey

A new continuous four-parameter lifetime distribution is introduced by compounding the distribution of the maximum of a sequence of independent, identically distributed exponentiated Lomax random variables with a zero-truncated Poisson random variable; it is called the complementary exponentiated Lomax Poisson (CELP) distribution. The new distribution exhibits decreasing and upside-down bathtub shaped densities, and it can model lifetime data with decreasing, increasing, and upside-down bathtub shaped failure rates. The new distribution has a number of well-known lifetime distributions as special sub-models, such as the Lomax-zero truncated Poisson, exponentiated Pareto-zero truncated Poisson, and Pareto-zero truncated Poisson distributions. A comprehensive account of the mathematical and statistical properties of the new distribution is presented. The model parameters are estimated by the methods of maximum likelihood, least squares, weighted least squares, percentiles, maximum product of spacings, and Cramér-von Mises, and the resulting estimators are compared in a Monte Carlo simulation study. We illustrate the performance of the proposed distribution by means of two real data sets; both data sets show that the new distribution is more appropriate than the transmuted Lomax, beta exponentiated Lomax, McDonald Lomax, Kumaraswamy Lomax, Weibull Lomax, Burr X Lomax, and Lomax distributions.
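A minimal sketch of the maximum product of spacings method listed in the abstract, applied to a plain Lomax stand-in rather than the CELP distribution itself: the estimator maximizes the mean log spacing of the fitted CDF evaluated at the ordered sample.

```python
# Minimal sketch of maximum product of spacings (MPS) estimation, using a
# simple Lomax (Pareto II) stand-in rather than the CELP distribution:
# maximize the mean log of the spacings of the fitted CDF at the ordered data.
import numpy as np
from scipy import stats
from scipy.optimize import minimize

rng = np.random.default_rng(7)
data = np.sort(stats.lomax(c=2.5, scale=1.5).rvs(size=300, random_state=rng))

def neg_mps(theta):
    c, scale = theta
    u = stats.lomax(c=c, scale=scale).cdf(data)
    spacings = np.diff(np.concatenate([[0.0], u, [1.0]]))
    spacings = np.clip(spacings, 1e-12, None)      # guard against ties
    return -np.mean(np.log(spacings))

fit = minimize(neg_mps, x0=[1.0, 1.0], bounds=[(0.05, None), (0.05, None)])
print("MPS estimates (shape, scale):", fit.x)
```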


2021 ◽  
Vol 20 ◽  
pp. 135-146
Author(s):  
B. Hossieni ◽  
M. Afshari ◽  
M. Alizadeh ◽  
H. Karamikabir

In many applied areas there is a clear need for extended forms of the well-known distributions. The new distributions are more flexible for modelling real data that present a high degree of skewness and kurtosis, with each extension addressing a particular shortcoming of the classical distributions. In this paper, a new two-parameter Generalized Odd Gamma distribution, called the GOGaU distribution, is introduced and the fitting capability of this model is investigated. Some structural properties of the new distribution are obtained. Several methods, including maximum likelihood estimators, Bayesian estimators (posterior mean and maximum a posteriori), least squares estimators, weighted least squares estimators, Cramér-von Mises estimators, and Anderson-Darling and right-tailed Anderson-Darling estimators, are discussed for estimating the model parameters. The importance and flexibility of the new model are also illustrated empirically by means of two real data sets. For the simulations, the Stan and JAGS software packages were used, for which the GOGaU JAGS module was built.
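A minimal sketch of the Bayesian estimators mentioned in the abstract (posterior mean and MAP), not the paper's GOGaU JAGS module: a short random-walk Metropolis sampler applied to a two-parameter gamma stand-in.

```python
# Minimal sketch (not the paper's GOGaU JAGS module): random-walk Metropolis
# sampling of the posterior of a two-parameter gamma stand-in, returning the
# posterior mean and a sample-based maximum a posteriori (MAP) estimate.
import numpy as np
from scipy import stats

rng = np.random.default_rng(8)
data = stats.gamma(a=2.0, scale=1.5).rvs(size=150, random_state=rng)

def log_posterior(theta):
    a, scale = theta
    if a <= 0 or scale <= 0:
        return -np.inf
    log_prior = (stats.halfnorm(scale=10.0).logpdf(a)
                 + stats.halfnorm(scale=10.0).logpdf(scale))
    return stats.gamma(a=a, scale=scale).logpdf(data).sum() + log_prior

theta = np.array([1.0, 1.0])
logp = log_posterior(theta)
samples, logps = [], []
for _ in range(20000):
    proposal = theta + rng.normal(scale=0.1, size=2)
    logp_new = log_posterior(proposal)
    if np.log(rng.uniform()) < logp_new - logp:    # Metropolis accept/reject
        theta, logp = proposal, logp_new
    samples.append(theta.copy())
    logps.append(logp)

samples = np.array(samples[5000:])                 # drop burn-in
logps = np.array(logps[5000:])
print("posterior mean:", samples.mean(axis=0))
print("MAP (best sampled point):", samples[np.argmax(logps)])
```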


Author(s):  
Douglas L. Dorset

The quantitative use of electron diffraction intensity data for the determination of crystal structures represents the pioneering achievement in the electron crystallography of organic molecules, an effort largely begun by B. K. Vainshtein and his co-workers. However, despite numerous representative structure analyses yielding results consistent with X-ray determination, this entire effort was viewed with considerable mistrust by many crystallographers. This was no doubt due to the rather high crystallographic R-factors reported for some structures and, more importantly, the failure to convince many skeptics that the measured intensity data were adequate for ab initio structure determinations. We have recently demonstrated the utility of these data sets for structure analyses by direct phase determination based on the probabilistic estimate of three- and four-phase structure invariant sums. Examples include the structure of diketopiperazine using Vainshtein's 3D data, a similar 3D analysis of the room temperature structure of thiourea, and a zonal determination of the urea structure, the latter also based on data collected by the Moscow group.

