Towards an improved label noise proportion estimation in small data: a Bayesian approach

Author(s):  
Jakramate Bootkrajang ◽  
Jeerayut Chaijaruwanich

2019 ◽  
Vol 65 (8) ◽  
pp. 995-1005 ◽  
Author(s):  
Thomas Røraas ◽  
Sverre Sandberg ◽  
Aasne K Aarsand ◽  
Bård Støve

Abstract

BACKGROUND: Biological variation (BV) data have many applications in diagnosing and monitoring disease. The standard statistical approaches for estimating BV are sensitive to “noisy data” and assume homogeneity of the within-participant CV. Prior knowledge about BV is mostly ignored. The aims of this study were to develop Bayesian models for calculating BV that (a) are robust to “noisy data,” (b) allow heterogeneity in the within-participant CVs, and (c) take advantage of prior knowledge.

METHODS: We explored Bayesian models with different degrees of robustness, using adaptive Student t distributions in place of normal distributions and allowing for heterogeneity of the within-participant CV. Results were compared to more standard approaches using chloride and triglyceride data from the European Biological Variation Study.

RESULTS: Using the most robust Bayesian approach on a raw data set gave results comparable to a standard approach with outlier assessment and removal. The posterior distribution of the fitted model gives access to credible intervals for all parameters, which can be used to assess reliability. Reliable and relevant priors proved valuable for prediction.

CONCLUSIONS: The recommended Bayesian approach gives a clear picture of the degree of heterogeneity, and the ability to crudely estimate personal within-participant CVs can be used to explore relevant subgroups. Because BV experiments are expensive and time-consuming, prior knowledge and estimates should be considered of high value and applied accordingly. By including reliable prior knowledge, precise estimates are possible even with small data sets.
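The robustness gain from replacing a normal likelihood with a Student t likelihood can be illustrated with a minimal sketch. This is not the study's hierarchical BV model and uses made-up measurements and an assumed fixed within-participant SD, not EuBIVAS data; it simply shows how a heavy-tailed likelihood downweights an outlying replicate when estimating a participant mean via a grid-approximated posterior:

```python
import numpy as np
from scipy import stats

# Illustrative replicate measurements for one participant; the last value
# is an outlier. Values are invented, not from the study's data sets.
data = np.array([100.0, 101.5, 99.2, 100.8, 99.5, 140.0])

# Grid approximation of the posterior over the participant mean (flat prior),
# with an assumed fixed within-participant SD of 2.0.
grid = np.linspace(80, 150, 2801)
sigma = 2.0

def posterior_mean(loglik_fn):
    # Sum log-likelihoods over observations for each candidate mean,
    # normalise, and return the posterior expectation of the mean.
    ll = np.array([loglik_fn(data, m).sum() for m in grid])
    w = np.exp(ll - ll.max())
    w /= w.sum()
    return float((grid * w).sum())

# Normal likelihood: the estimate is pulled strongly towards the outlier.
mu_normal = posterior_mean(lambda x, m: stats.norm.logpdf(x, m, sigma))

# Student t likelihood (3 degrees of freedom): heavy tails make the
# outlier nearly uninformative, so the estimate stays near the bulk.
mu_robust = posterior_mean(lambda x, m: stats.t.logpdf(x, df=3, loc=m, scale=sigma))

print(mu_normal, mu_robust)
```

With the normal likelihood the posterior mean tracks the sample mean (≈106.8), while the t likelihood keeps it near the five consistent replicates (≈100), mirroring the abstract's point that the robust model on raw data behaves like a standard analysis after outlier removal.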


2020 ◽  
Author(s):  
Laetitia Zmuda ◽  
Charlotte Baey ◽  
Paolo Mairano ◽  
Anahita Basirat

It is well known that individuals can identify novel words in a stream of an artificial language using statistical dependencies. While the underlying computations are thought to be similar from one stream to another (e.g., transitional probabilities between syllables), performance is not. According to the “linguistic entrenchment” hypothesis, this is because individuals have prior knowledge about the co-occurrence of elements in speech, which intervenes during verbal statistical learning. Previous studies focused on task performance. The goal of the current study is to examine the extent to which prior knowledge impacts metacognition (i.e., the ability to evaluate one’s own cognitive processes). Participants were exposed to two different artificial languages. Using a fully Bayesian approach, we estimated an unbiased measure of metacognitive efficiency and compared the two languages in terms of task performance and metacognition. While task performance was higher in one of the languages, metacognitive efficiency was similar in both. In addition, a model assuming no correlation between the two languages accounted for our results better than a model in which correlations were introduced. We discuss the implications of our findings for the computations that underlie the interaction between input and prior knowledge during verbal statistical learning.
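The segmentation computation the abstract alludes to, cutting a continuous syllable stream where forward transitional probability dips, can be sketched as follows. The three trisyllabic "words" and the threshold are illustrative assumptions, not the languages or analysis used in the study:

```python
import random
from collections import Counter

# Toy artificial language: three made-up trisyllabic words, concatenated
# into a continuous syllable stream with no pauses, as in verbal
# statistical learning experiments.
words = [("tu", "pi", "ro"), ("go", "la", "bu"), ("bi", "da", "ku")]

random.seed(42)
stream = [s for _ in range(200) for s in random.choice(words)]

# Forward transitional probability: TP(b | a) = count(a, b) / count(a).
pair_counts = Counter(zip(stream, stream[1:]))
syll_counts = Counter(stream[:-1])
tp = {pair: n / syll_counts[pair[0]] for pair, n in pair_counts.items()}

# Within-word transitions are deterministic here (TP = 1.0), while
# transitions across a word boundary are ~1/3, so cutting the stream
# at TP dips recovers the words.
segments, current = [], [stream[0]]
for a, b in zip(stream, stream[1:]):
    if tp[(a, b)] < 0.9:  # assumed boundary threshold
        segments.append("".join(current))
        current = []
    current.append(b)
segments.append("".join(current))

print(sorted(set(segments)))
```

Running this recovers exactly the three words of the toy lexicon; the study's point is that even when such computations are matched across streams, learners' performance and metacognitive judgments need not be.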

