statistical model selection
Recently Published Documents


TOTAL DOCUMENTS

61
(FIVE YEARS 10)

H-INDEX

11
(FIVE YEARS 2)

2022 ◽  
Author(s):  
Daniele Fanelli

Scientists' ability to integrate diverse forms of evidence and evaluate how well they can explain and predict phenomena, in other words, $\textit{to know how much they know}$, struggles to keep pace with technological innovation. Central to the challenge of extracting knowledge from data is the need to develop a metric of knowledge itself. A candidate metric of knowledge, $K$, was recently proposed by the author. This essay further advances and integrates that proposal by developing a methodology to measure its key variable, symbolized with the Greek letter $\tau$ ("tau"). It will be shown how a $\tau$ can represent the description of any phenomenon, any theory to explain it, and any methodology to study it, allowing the knowledge about that phenomenon to be measured with $K$. To illustrate potential applications, the essay calculates $\tau$ and $K$ values for: logical syllogisms and proofs, mathematical calculations, empirical quantitative knowledge, statistical model selection problems (including how to correct for "forking paths" and "P-hacking" biases), randomised controlled experiments, reproducibility and replicability, qualitative analyses via process tracing, and mixed quantitative and qualitative evidence. Whilst preliminary in many respects, these results suggest that $K$ theory offers a meaningful understanding of knowledge, one that makes testable metascientific predictions and that may be used to analyse and integrate qualitative and quantitative evidence to tackle complex problems.


2021 ◽  
Vol 8 (10) ◽  
Author(s):  
Sean T. Vittadello ◽  
Michael P. H. Stumpf

In many scientific and technological contexts, we have only a poor understanding of the structure and details of appropriate mathematical models. We therefore often need to compare different models. With available data, we can use formal statistical model selection to compare and contrast the ability of different mathematical models to describe such data. There is, however, a lack of rigorous methods to compare different models a priori. Here, we develop and illustrate two such approaches that allow us to compare model structures in a systematic way by representing models as simplicial complexes. Using well-developed concepts from simplicial algebraic topology, we define a distance between models based on their simplicial representations. Employing persistent homology with a flat filtration provides alternative representations of the models as persistence intervals, which capture model structure and from which the model distances are also obtained. We then expand on this measure of model distance to study the concept of model equivalence and determine the conceptual similarity of models. We apply our methodology for model comparison to demonstrate an equivalence between a positional-information model and a Turing-pattern model from developmental biology, constituting a novel observation for two classes of models that were previously regarded as unrelated.
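The persistence-based distance developed in the paper requires the authors' full construction, which is not reproduced here. As a rough sketch of the underlying idea of comparing models through their simplicial representations, the example below encodes two hypothetical model structures as simplicial complexes (the downward closures of their interaction terms) and computes a simple Jaccard dissimilarity between the resulting simplex sets. All model names and interaction structures are invented, and the Jaccard measure is a stand-in for, not a reconstruction of, the distance defined by Vittadello and Stumpf.

```python
# Toy comparison of two models via simplicial representations.
# The Jaccard dissimilarity below is an illustrative stand-in for the
# persistent-homology-based distance developed in the paper.
from itertools import combinations

def simplicial_complex(interactions):
    """Downward closure: every nonempty subset of each interaction is a simplex."""
    simplices = set()
    for face in interactions:
        for k in range(1, len(face) + 1):
            simplices.update(frozenset(s) for s in combinations(face, k))
    return simplices

def model_distance(model_a, model_b):
    """Jaccard dissimilarity between the two simplex sets."""
    A, B = simplicial_complex(model_a), simplicial_complex(model_b)
    return 1.0 - len(A & B) / len(A | B)

# Hypothetical interaction structures of two developmental-biology-style models.
positional_info = [("morphogen", "receptor"), ("receptor", "target_gene")]
turing_pattern = [("activator", "inhibitor"), ("activator", "target_gene")]
print(model_distance(positional_info, turing_pattern))
```

A distance of 0 would indicate identical simplicial structure; larger values indicate structurally more dissimilar models.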


2021 ◽  
Vol 15 (1) ◽  
pp. 219-232
Author(s):  
Meysam Mohammadpour ◽  
Hossein Bevrani ◽  
Reza Arabi Belaghi ◽  
...  

Entropy ◽  
2020 ◽  
Vol 22 (12) ◽  
pp. 1400
Author(s):  
Kateřina Hlaváčková-Schindler ◽  
Claudia Plant

The heterogeneous graphical Granger model (HGGM) for causal inference among processes with distributions from an exponential family is efficient in scenarios where the number of time observations is much greater than the number of time series, normally by several orders of magnitude. In the case of "short" time series, however, inference in the HGGM often suffers from overestimation. To remedy this, we use the minimum message length (MML) principle to determine the causal connections in the HGGM. Minimum message length, as a Bayesian information-theoretic method for statistical model selection, applies Occam's razor in the following way: even when models are equal in their fit accuracy to the observed data, the one generating the most concise explanation of the data is more likely to be correct. Based on the dispersion coefficient of the target time series and on the initial maximum likelihood estimates of the regression coefficients, we propose a minimum message length criterion to select the subset of time series causally connected with each target time series, and derive its form for various exponential distributions. We propose two algorithms to find this subset: a genetic-type algorithm (HMMLGA) and exHMML. We demonstrate the superiority of both algorithms in synthetic experiments with respect to the comparison methods LiNGAM, HGGM and the statistical framework of Granger causality (SFGC). In the real-data experiments, we used the methods to discriminate between the pregnancy and labour phases using electrohysterogram data of Icelandic mothers from the PhysioNet database. We further analysed Austrian climatological measurements and their temporal interactions in rainy- and sunny-day scenarios. In both experiments, the results of HMMLGA had the most realistic interpretation with respect to the comparison methods. We provide our code in Matlab. To the best of our knowledge, this is the first work using the MML principle for causal inference in the HGGM.
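The HMMLGA and exHMML algorithms themselves are not reproduced here. As a minimal sketch of the underlying idea, selecting the causally connected series for one target by minimizing a code-length criterion, the example below runs a greedy forward search over candidate predictor series for a single Poisson target, scoring each subset with a BIC-style penalty as a simple surrogate for the MML criterion. The synthetic data, the Poisson choice, and the greedy search are all assumptions made for illustration, not the authors' method.

```python
# Greedy subset selection of predictor series for one Poisson target,
# scored with a BIC-style code-length penalty as a simple MML surrogate.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
T, p = 60, 5                      # "short" series: few time points, several candidates
X = rng.normal(size=(T, p))       # lagged candidate predictor series (already aligned)
y = rng.poisson(np.exp(0.4 * X[:, 0] - 0.3 * X[:, 2]))  # target driven by series 0 and 2

def code_length(subset):
    """Negative log-likelihood plus a parameter-count penalty (BIC-like surrogate)."""
    design = sm.add_constant(X[:, sorted(subset)]) if subset else np.ones((T, 1))
    fit = sm.GLM(y, design, family=sm.families.Poisson()).fit()
    return -fit.llf + 0.5 * design.shape[1] * np.log(T)

selected, best = set(), code_length(set())
while True:
    candidates = {j: code_length(selected | {j}) for j in set(range(p)) - selected}
    if not candidates:
        break
    j_best = min(candidates, key=candidates.get)
    if candidates[j_best] >= best:   # stop when no series shortens the "message"
        break
    selected.add(j_best)
    best = candidates[j_best]

print("selected predictor series:", sorted(selected), "code length:", round(best, 2))
```

With the seed above, the search should tend to recover series 0 and 2; the point of the sketch is only that adding a series must pay for its extra parameters by improving the fit.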


Author(s):  
Brian Mandikiana

To shed light on the demand for private tutoring, this paper presents new evidence for the case of Qatar. The household demand for private tutoring is estimated with a double-hurdle model, using a sample of 1226 participants from the 2012 Qatar Education Survey. Using statistical model selection criteria, the Cragg model is preferred overall for establishing the demand for private tutoring in Qatar. The findings show that the nationality of parents, the mother's educational background, the grade the student attends, and the type of school attended have a significant influence on both the likelihood of using private tutoring and the amount spent. These findings suggest that, without monitoring, access to high-quality education will be unequal. In particular, students from well-off families will benefit the most from additional hours of education and build an advantage that could eventually lead to the creation of an unequal society.
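The Qatar Education Survey microdata is not available here, so the sketch below only illustrates the two-part structure of a Cragg-style hurdle model on synthetic data: a probit for whether a household uses private tutoring at all, and a regression on the (log) amount among users. The covariates and coefficients are invented, and the second hurdle is approximated with OLS rather than the truncated normal regression of the full Cragg model.

```python
# Simplified two-part sketch of a Cragg-style double-hurdle model on synthetic data.
# Hurdle 1: probit for participation. Hurdle 2: regression on the amount, users only
# (the full Cragg model uses a truncated normal here, which OLS only approximates).
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(1)
n = 1226
mother_edu = rng.integers(0, 4, n)            # hypothetical covariates
grade = rng.integers(1, 13, n)
private_school = rng.integers(0, 2, n)
Xmat = sm.add_constant(np.column_stack([mother_edu, grade, private_school]))

# Synthetic participation and spending, just to make the example runnable.
use = (Xmat @ np.array([-1.5, 0.4, 0.05, 0.6]) + rng.normal(size=n) > 0).astype(int)
log_amount = Xmat @ np.array([4.0, 0.2, 0.03, 0.3]) + rng.normal(scale=0.5, size=n)

participation = sm.Probit(use, Xmat).fit(disp=0)             # hurdle 1
amount = sm.OLS(log_amount[use == 1], Xmat[use == 1]).fit()  # hurdle 2 (approximation)

print("participation coefficients:", participation.params.round(2))
print("amount coefficients:      ", amount.params.round(2))
```

The separation into two hurdles is what lets a covariate affect the decision to purchase tutoring and the amount spent differently, which is the feature the model selection exercise in the paper exploits.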


Author(s):  
Jan Sprenger ◽  
Stephan Hartmann

Is simplicity a virtue of a good scientific theory, and are simpler theories more likely to be true or predictively successful? If so, how much should simplicity count vis-à-vis predictive accuracy? We address this question using Bayesian inference, focusing on the context of statistical model selection and an interpretation of simplicity via the degrees of freedom of a model. We rebut claims that the epistemic value of simplicity can be proven by pointing to its particular role in Bayesian model selection strategies (e.g., the BIC or the MML). Instead, we show that Bayesian inference in the context of model selection is usually done in a philosophically eclectic, instrumental fashion that is more attuned to practical applications than to philosophical foundations. Thus, these techniques cannot justify a particular "appropriate weight of simplicity in model selection".
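As a concrete illustration of how a complexity penalty enters such model selection strategies, the snippet below compares a linear and a fifth-degree polynomial fit to the same synthetic data via BIC, $-2\ln L + k\ln n$, where $k$ counts free parameters. The data and candidate models are invented and the example is not tied to the specific arguments of the paper; it only shows the trade-off between fit and degrees of freedom that the paper discusses.

```python
# How BIC trades fit against simplicity: two nested polynomial regressions
# on the same synthetic data, penalized by their number of free parameters.
import numpy as np

rng = np.random.default_rng(2)
n = 50
x = np.linspace(-1, 1, n)
y = 1.0 + 2.0 * x + rng.normal(scale=0.3, size=n)   # the truth is linear

def bic(degree):
    coeffs = np.polyfit(x, y, degree)
    resid = y - np.polyval(coeffs, x)
    sigma2 = resid.var()                             # Gaussian MLE of the noise variance
    loglik = -0.5 * n * (np.log(2 * np.pi * sigma2) + 1)
    k = degree + 2                                   # polynomial coefficients + variance
    return -2 * loglik + k * np.log(n)               # BIC = -2 ln L + k ln n

for d in (1, 5):
    print(f"degree {d}: BIC = {bic(d):.1f}")          # the simpler model should win
```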


2019 ◽  
Author(s):  
Jay I. Myung ◽  
Mark A. Pitt ◽  
Danielle Navarro

Smith and Minda (1998, 2002) argued that the response scaling parameter γ in the exemplar-based generalized context model (GCM) makes the model unnecessarily complex and allows it to mimic the behavior of a prototype model. We evaluated this criticism in two ways. First, we estimated the complexity of the GCM with and without the γ parameter and also compared its complexity to that of a prototype model. Next, we assessed the extent to which the models mimic each other, using two experimental designs (Nosofsky & Zaki, 2002, Experiment 3; Smith & Minda, 1998, Experiment 2), chosen because these designs are thought to differ in the degree to which they can discriminate the models. The results show that γ can increase the complexity of the GCM, but this complexity does not necessarily allow mimicry. Furthermore, if statistical model selection methods such as minimum description length are adopted as the measure of model performance, the models will be highly discriminable, irrespective of design.
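For readers unfamiliar with the role of γ, the sketch below implements the standard GCM response rule in which summed similarities to each category's exemplars are raised to the power γ before being normalized; larger γ makes responding more deterministic. The stimulus coordinates, the sensitivity parameter c, and the γ values are made up for illustration and are not taken from the designs analysed above.

```python
# Generalized context model (GCM) response rule with the response-scaling
# parameter gamma; all parameter values and exemplars are illustrative.
import numpy as np

def gcm_prob_A(probe, exemplars_A, exemplars_B, c=2.0, gamma=1.0, r=1.0):
    """P(respond A | probe): summed similarity to each category, response-scaled by gamma."""
    def summed_similarity(exemplars):
        dists = np.sum(np.abs(exemplars - probe) ** r, axis=1) ** (1.0 / r)
        return np.sum(np.exp(-c * dists))            # exponential similarity gradient
    sA = summed_similarity(exemplars_A) ** gamma
    sB = summed_similarity(exemplars_B) ** gamma
    return sA / (sA + sB)

A = np.array([[0.1, 0.2], [0.2, 0.1]])   # hypothetical category A exemplars
B = np.array([[0.8, 0.9], [0.9, 0.8]])   # hypothetical category B exemplars
probe = np.array([0.3, 0.3])
for g in (1.0, 4.0):                      # larger gamma -> more extreme choice probabilities
    print(f"gamma={g}: P(A) = {gcm_prob_A(probe, A, B, gamma=g):.3f}")
```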

