How many data clusters are in the Galaxy data set?

Author(s):  
Bettina Grün ◽  
Gertraud Malsiner-Walli ◽  
Sylvia Frühwirth-Schnatter

Abstract: In model-based clustering, the Galaxy data set is often used as a benchmark data set to study the performance of different modeling approaches. Aitkin (Stat Model 1:287–304) compares maximum likelihood and Bayesian analyses of the Galaxy data set and expresses reservations about the Bayesian approach because the prior assumptions imposed remain rather obscure while playing a major role in the results obtained and conclusions drawn. The aim of the paper is to address Aitkin’s concerns about the Bayesian approach by shedding light on how the specified priors influence the number of estimated clusters. We perform a sensitivity analysis of different prior specifications for the mixture of finite mixtures model, i.e., the mixture model where a prior on the number of components is included. We use an extensive set of different prior specifications in a full factorial design and assess their impact on the estimated number of clusters for the Galaxy data set. Results highlight the interaction effects of the prior specifications and provide insights into which prior specifications are recommended to obtain a sparse clustering solution. A simulation study with artificial data provides further empirical evidence to support the recommendations. A clear understanding of the impact of the prior specifications removes restraints preventing the use of Bayesian methods due to the complexity of selecting suitable priors. Also, the regularizing properties of the priors may be intentionally exploited to obtain a suitable clustering solution meeting prior expectations and needs of the application.
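For orientation, a mixture model with a prior on the number of components can be written generically as below. This is only a sketch in assumed notation: the symbols $K$, $\eta_k$, $\theta_k$, $\gamma_K$ and the component density $f$ are not taken from the abstract, and the specific priors compared in the paper are not reproduced here.

```latex
\begin{align*}
K &\sim p(K), \\
(\eta_1,\dots,\eta_K) \mid K &\sim \mathrm{Dirichlet}(\gamma_K,\dots,\gamma_K), \\
\theta_k \mid K &\overset{\text{iid}}{\sim} p(\theta), \qquad k = 1,\dots,K, \\
y_i \mid K, \eta, \theta &\overset{\text{iid}}{\sim} \sum_{k=1}^{K} \eta_k \, f(y_i \mid \theta_k), \qquad i = 1,\dots,n.
\end{align*}
```

In this notation, the number of data clusters is commonly taken to be the number of components to which at least one observation is assigned, which can be smaller than $K$. That is one way to see why the priors on $K$ and on the weights jointly affect the estimated number of clusters.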

2018 ◽  
Vol 36 (1) ◽  
pp. 1
Author(s):  
Eduardo Campana BARBOSA ◽  
Carlos Henrique Osório SILVA ◽  
Moysés NASCIMENTO ◽  
Fabyano Fonseca e SILVA ◽  
Valéria Paula Rodrigues MINIM ◽  
...  

This paper presents a Bayesian counterpart to the frequentist multinomial logit model used in choice-based conjoint analysis. The analysis used choice data from 144 consumers who assessed eight samples of light strawberry-flavored yogurt, with the information on three ingredients (sugar, fat and protein) varied in a full factorial design. The results and inferences obtained by the Bayesian approach are presented in terms of estimates of the main effects of the attributes, the choice probabilities and the choice ratios. The frequentist results are also reported and discussed. The Bayesian analysis gave results similar to the frequentist analysis and allowed the construction of credible intervals for the choice probabilities and choice ratios, so that these quantities could be compared statistically. In practical terms, the most likely choice was the light strawberry-flavored yogurt labeled "0% sugar", "0% fat" and "bioactive proteins enriched".


2010 ◽  
Vol 39 (2) ◽  
pp. 419-424
Author(s):  
Robson Marcelo Rossi ◽  
Elias Nunes Martins ◽  
Terezinha Aparecida Guedes ◽  
Clóves Cabreira Jobim

This paper presents the Bayesian approach as an alternative to the classical analysis of nonlinear models for ruminal degradation data. The data set was obtained from a Latin square experimental design, established for testing the ruminal degradation of dry matter, crude protein and neutral detergent fiber of three silages: elephant grass (Pennisetum purpureum Schum) with bacterial inoculant or enzyme-bacterial inoculant and corn silage (Zea mays L.). The incubation times were 0, 2, 6, 12, 24, 48, 72 and 96 hours. The parameter estimates of the equations fitted by the two methods differed only slightly, but the Bayesian approach made it possible to compare the estimates correctly. This is not the case with the frequentist methodology, whose applicability is much more restricted because it requires a larger number of assumptions.


2019 ◽  
Author(s):  
Mathew Hardy ◽  
Tom Griffiths

Bayesian models that optimally integrate prior probabilities with observations have successfully explained many aspects of human cognition. Research on decision-making under risk, however, is usually done through laboratory tasks that attempt to remove the effect of prior knowledge on choice. We ran a large online experiment in which risky options paid out according to the distribution of Democratic and Republican voters in US congressional districts to test the effects of manipulating prior probabilities on participants’ choices. We find evidence that people’s risk preferences are appropriately influenced by prior probabilities, and discuss how the study of risky choice can be integrated into the Bayesian approach to studying cognition.


2020 ◽  
Vol 497 (1) ◽  
pp. 210-228
Author(s):  
J Sánchez ◽  
C W Walter ◽  
H Awan ◽  
J Chiang ◽  
S F Daniel ◽  
...  

ABSTRACT Data Challenge 1 (DC1) is the first synthetic data set produced by the Rubin Observatory Legacy Survey of Space and Time (LSST) Dark Energy Science Collaboration (DESC). DC1 is designed to develop and validate data reduction and analysis and to study the impact of systematic effects that will affect the LSST data set. DC1 is comprised of r-band observations of 40 deg² to 10 yr LSST depth. We present each stage of the simulation and analysis process: (a) generation, by synthesizing sources from cosmological N-body simulations in individual sensor-visit images with different observing conditions; (b) reduction using a development version of the LSST Science Pipelines; and (c) matching to the input cosmological catalogue for validation and testing. We verify that testable LSST requirements pass within the fidelity of DC1. We establish a selection procedure that produces a sufficiently clean extragalactic sample for clustering analyses and we discuss residual sample contamination, including contributions from inefficiency in star–galaxy separation and imperfect deblending. We compute the galaxy power spectrum on the simulated field and conclude that: (i) survey properties have an impact of 50 per cent of the statistical uncertainty for the scales and models used in DC1; (ii) a selection to eliminate artefacts in the catalogues is necessary to avoid biases in the measured clustering; and (iii) the presence of bright objects has a significant impact (2σ–6σ) in the estimated power spectra at small scales (ℓ > 1200), highlighting the impact of blending in studies at small angular scales in LSST.


2020 ◽  
Vol 63 (1) ◽  
pp. 26-40
Author(s):  
Brian T. McCann

Decision making requires managers to constantly estimate the probability of uncertain outcomes and update those estimates in light of new information. This article provides guidance to managers on how they can improve that process by more explicitly adopting a Bayesian approach. Clear understanding and application of the Bayesian approach leads to more accurate probability estimates, resulting in better informed decisions. More importantly, adopting a Bayesian approach, even informally, promises to improve the quality of managerial thinking, analysis, and decisions in a variety of additional ways.
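The update step described above can be sketched with Bayes' rule for a single yes/no hypothesis. This is a minimal illustration only: the function name `bayes_update` and all numbers are hypothetical and do not come from the article.

```python
def bayes_update(prior: float,
                 p_evidence_given_h: float,
                 p_evidence_given_not_h: float) -> float:
    """Return P(H | E) from P(H), P(E | H) and P(E | not H) via Bayes' rule."""
    numerator = p_evidence_given_h * prior
    # Total probability of the evidence under both hypotheses.
    evidence = numerator + p_evidence_given_not_h * (1.0 - prior)
    if evidence == 0.0:
        raise ValueError("evidence has zero probability under both hypotheses")
    return numerator / evidence


# Hypothetical example: a manager initially gives a product launch a 30%
# chance of success. A positive pilot result is assumed to occur 80% of the
# time for launches that succeed and 40% of the time for launches that fail.
prior = 0.30
posterior = bayes_update(prior, 0.80, 0.40)
print(round(posterior, 3))  # 0.462
```

With these assumed inputs, the positive pilot raises the estimate from 30% to about 46%, noticeably less than the 80% figure alone might suggest, because the starting estimate was low and the pilot is also fairly likely to be positive for launches that fail.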


2019 ◽  
Author(s):  
Beatriz Mello ◽  
Qiqing Tao ◽  
Sudhir Kumar

Abstract: Concurrent molecular dating of population and species divergences is essential in many biological investigations, including phylogeography, phylodynamics, and species delimitation studies. Multiple sequence alignments used in these investigations frequently consist of both intra- and inter-species samples (mixed samples). As a result, the phylogenetic trees contain inter-species, inter-population, and within-population divergences. To date these sequence divergences, Bayesian relaxed clock methods are often employed, but they assume the same tree prior for both inter- and intra-species branching processes and require specification of a clock model for branch rates (independent vs. autocorrelated rates models). We evaluated the impact of using the same tree prior on the Bayesian divergence time estimates by analyzing computer-simulated datasets. We also examined the effect of the assumption of independence of evolutionary rate variation among branches when the branch rates are autocorrelated. The Bayesian approach with Skyline-coalescent tree priors generally produced excellent molecular dates, with some tree priors (e.g., Yule) performing the best when evolutionary rates were autocorrelated and lineage sorting was incomplete. We compared the performance of the Bayesian approach with a non-Bayesian method, RelTime, which does not require specification of a tree prior or selection of a clock model. We found that RelTime performed as well as the Bayesian approach, and when the clock model was mis-specified, RelTime performed slightly better. These results suggest that the computationally efficient RelTime approach is also suitable for analyzing datasets containing both population and species variation.


Crisis ◽  
2018 ◽  
Vol 39 (1) ◽  
pp. 27-36 ◽  
Author(s):  
Kuan-Ying Lee ◽  
Chung-Yi Li ◽  
Kun-Chia Chang ◽  
Tsung-Hsueh Lu ◽  
Ying-Yeh Chen

Abstract. Background: We investigated the age at exposure to parental suicide and the risk of subsequent suicide completion in young people. The impact of parental and offspring sex was also examined. Method: Using a cohort study design, we linked Taiwan's Birth Registry (1978–1997) with Taiwan's Death Registry (1985–2009) and identified 40,249 children who had experienced maternal suicide (n = 14,431), paternal suicide (n = 26,887), or the suicide of both parents (n = 281). Each exposed child was matched to 10 children of the same sex and birth year whose parents were still alive. This yielded a total of 398,081 children for our non-exposed cohort. A Cox proportional hazards model was used to compare the suicide risk of the exposed and non-exposed groups. Results: Compared with the non-exposed group, offspring who were exposed to parental suicide were 3.91 times (95% confidence interval [CI] = 3.10–4.92) more likely to die by suicide after adjusting for baseline characteristics. The risk of suicide seemed to be lower in older male offspring (HR = 3.94, 95% CI = 2.57–6.06), but higher in older female offspring (HR = 5.30, 95% CI = 3.05–9.22). Stratified analyses based on parental sex revealed similar patterns as the combined analysis. Limitations: As only register-based data were used, we were not able to explore the impact of variables not contained in the data set, such as the role of mental illness. Conclusion: Our findings suggest a prominent elevation in the risk of suicide among offspring who lost their parents to suicide. The risk elevation differed according to the sex of the afflicted offspring as well as to their age at exposure.

