Are Multi-Armed Bandits Susceptible to Peeking?

2018 ◽  
Vol 21 (1) ◽  
pp. 95-104
Author(s):  
Markus Loecher

Abstract: A standard method to evaluate new features and changes to, e.g., websites is A/B testing. A common pitfall in performing A/B testing is the habit of looking at a test while it’s running, then stopping early. Due to the implicit multiple testing, the p-value is no longer trustworthy and usually too small. We investigate the claim that Bayesian methods, unlike frequentist tests, are immune to this “peeking” problem. We demonstrate that two regularly used measures, namely posterior probability and value remaining, are severely affected by repeated testing. We further show a strong dependence on the prior probability of the parameters of interest.
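The frequentist side of the peeking problem described in this abstract can be illustrated with a small simulation. The following is only a sketch under stated assumptions, not the paper's method: it runs A/A experiments (both arms share the same true conversion rate), applies a pooled two-proportion z-test at every interim look, and compares how often any look falls below alpha with how often the final look alone does. The function names, sample sizes, peek interval, and conversion rate are all hypothetical choices made for illustration.

```python
import math
import random


def z_test_p_value(successes_a, successes_b, n):
    """Two-sided p-value of a pooled two-proportion z-test, n samples per arm."""
    pooled = (successes_a + successes_b) / (2 * n)
    se = math.sqrt(2 * pooled * (1 - pooled) / n)
    if se == 0:
        return 1.0  # no variation observed, so no evidence of a difference
    z = (successes_a - successes_b) / n / se
    return math.erfc(abs(z) / math.sqrt(2))


def simulate(n_experiments=500, n_per_arm=1000, peek_every=100,
             rate=0.1, alpha=0.05, seed=0):
    """Return (false-positive rate with peeking, false-positive rate at the end).

    Both arms have the same true rate, so every "significant" result is a
    false positive. n_per_arm should be a multiple of peek_every so that the
    final look is also one of the peeks.
    """
    rng = random.Random(seed)
    flagged_any_peek = 0
    flagged_final = 0
    for _ in range(n_experiments):
        a = b = 0
        flagged = False
        for i in range(1, n_per_arm + 1):
            a += rng.random() < rate
            b += rng.random() < rate
            if i % peek_every == 0 and z_test_p_value(a, b, i) < alpha:
                flagged = True  # an analyst who peeks would have stopped here
        flagged_any_peek += flagged
        flagged_final += z_test_p_value(a, b, n_per_arm) < alpha
    return flagged_any_peek / n_experiments, flagged_final / n_experiments


if __name__ == "__main__":
    peeking, fixed = simulate()
    print(f"false positives with peeking: {peeking:.3f}, at final look only: {fixed:.3f}")
```

Because the final look is itself one of the peeks, the peeking rate can never be lower than the fixed-horizon rate; the simulation only shows by how much it is inflated for the chosen settings.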

2015 ◽  
Vol 105 (11) ◽  
pp. 1400-1407 ◽  
Author(s):  
L. V. Madden ◽  
D. A. Shah ◽  
P. D. Esker

The P value (significance level) is possibly the most widely used, and also misused, quantity in data analysis. P has been heavily criticized on philosophical and theoretical grounds, especially from a Bayesian perspective. In contrast, a properly interpreted P has been strongly defended as a measure of evidence against the null hypothesis, H0. We discuss the meaning of P and null-hypothesis statistical testing, and present some key arguments concerning their use. P is the probability of observing data as extreme as, or more extreme than, the data actually observed, conditional on H0 being true. However, P is often mistakenly equated with the posterior probability that H0 is true conditional on the data, which can lead to exaggerated claims about the effect of a treatment, experimental factor or interaction. Fortunately, a lower bound for the posterior probability of H0 can be approximated using P and the prior probability that H0 is true. When one is completely uncertain about the truth of H0 before an experiment (i.e., when the prior probability of H0 is 0.5), the posterior probability of H0 is much higher than P, which means that one needs P values lower than typically accepted for statistical significance (e.g., P = 0.05) for strong evidence against H0. When properly interpreted, we support the continued use of P as one component of a data analysis that emphasizes data visualization and estimation of effect sizes (treatment effects).
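The abstract does not say which lower bound it uses. As a hedged illustration, the sketch below assumes the widely cited calibration of Sellke, Bayarri and Berger, in which the Bayes factor in favor of H0 is bounded below by -e · P · ln(P) for P < 1/e. The function name `min_posterior_h0` is hypothetical.

```python
import math


def min_posterior_h0(p_value, prior_h0=0.5):
    """Approximate lower bound on P(H0 | data) given a P value and a prior.

    Assumes the -e * p * ln(p) bound on the Bayes factor for H0, which is
    only valid for 0 < p < 1/e. This is an illustrative choice of bound.
    """
    if not 0.0 < p_value < 1.0 / math.e:
        raise ValueError("the bound requires 0 < p_value < 1/e")
    if not 0.0 < prior_h0 < 1.0:
        raise ValueError("prior_h0 must lie strictly between 0 and 1")
    bayes_factor_bound = -math.e * p_value * math.log(p_value)
    prior_odds = prior_h0 / (1.0 - prior_h0)
    posterior_odds = prior_odds * bayes_factor_bound
    return posterior_odds / (1.0 + posterior_odds)


if __name__ == "__main__":
    for p in (0.05, 0.01, 0.001):
        print(p, round(min_posterior_h0(p), 3))
```

Under this assumed bound and a prior of 0.5, P = 0.05 corresponds to a posterior probability of H0 of at least roughly 0.29, which is consistent with the abstract's point that the posterior probability of H0 is much higher than P.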


2021 ◽  
Vol 4 ◽  
Author(s):  
Markus Loecher

The connection between optimal stopping times of American Options and multi-armed bandits is the subject of active research. This article investigates the effects of optional stopping in a particular class of multi-armed bandit experiments, which randomly allocates observations to arms proportional to the Bayesian posterior probability that each arm is optimal (Thompson sampling). The interplay between optional stopping and prior mismatch is examined. We propose a novel partitioning of regret into peri/post testing. We further show a strong dependence of the parameters of interest on the assumed prior probability density.
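The allocation rule mentioned in this abstract (Thompson sampling) can be sketched for Bernoulli rewards as follows. This is a minimal illustration, assuming independent Beta priors on each arm's success rate; the function name, the uniform Beta(1, 1) default prior, and the example success rates are hypothetical and not taken from the article. Drawing one sample from each arm's posterior and playing the arm with the largest draw selects each arm with probability equal to the posterior probability that it is optimal.

```python
import random


def thompson_sampling(true_rates, n_rounds=2000, prior=(1.0, 1.0), seed=0):
    """Run Beta-Bernoulli Thompson sampling; return (pulls, successes) per arm."""
    rng = random.Random(seed)
    k = len(true_rates)
    pulls = [0] * k
    successes = [0] * k
    for _ in range(n_rounds):
        # One posterior draw per arm: Beta(prior_a + wins, prior_b + losses).
        draws = [
            rng.betavariate(prior[0] + successes[i],
                            prior[1] + pulls[i] - successes[i])
            for i in range(k)
        ]
        arm = max(range(k), key=draws.__getitem__)
        reward = rng.random() < true_rates[arm]  # simulated Bernoulli outcome
        pulls[arm] += 1
        successes[arm] += reward
    return pulls, successes


if __name__ == "__main__":
    pulls, successes = thompson_sampling([0.1, 0.6])
    print("pulls per arm:", pulls)
    print("successes per arm:", successes)
```

Changing the `prior` argument is one simple way to explore the prior mismatch the abstract refers to, since the allocation depends on the assumed prior as well as on the observed data.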


Author(s):  
Eitel J.M. Lauria

Bayesian methods provide a probabilistic approach to machine learning. The Bayesian framework allows us to make inferences from data using probability models for values we observe and about which we want to draw some hypotheses. Bayes theorem provides the means of calculating the probability of a hypothesis (posterior probability) based on its prior probability, the probability of the observations and the likelihood that the observational data fit the hypothesis.
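For reference, Bayes' theorem as described in this abstract can be written out as below. The symbols are chosen here for illustration: h denotes the hypothesis and D the observed data.

```latex
% Posterior = likelihood x prior / probability of the observations
P(h \mid D) = \frac{P(D \mid h)\, P(h)}{P(D)}
```

Here P(h) is the prior probability of the hypothesis, P(D | h) is the likelihood of the data under the hypothesis, P(D) is the probability of the observations, and P(h | D) is the posterior probability.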


1975 ◽  
Vol 41 (2) ◽  
pp. 475-478 ◽  
Author(s):  
An-Yen Liu

All the necessary prior probabilities for making a Bayesian estimation of the posterior probability were provided to subjects along with a cover story. The prior probabilities, but not the cover story, were non-systematically varied to provide 12 situations. Bayesian solutions of these probability problems indicated p values ranging from .4 to .9, but subjects estimated a p value around .7 across all conditions. For all conditions except one, the modal estimate was identical to one of the prior probabilities. The high degree of similarity found among the dominating prior probability and the median and modal estimates of posterior probability within a condition seems to indicate the existence of a specific information effect in probability estimation.


2011 ◽  
Vol 2011 ◽  
pp. 1-5 ◽  
Author(s):  
Thomas Z. Fahidy

Bayesian methods stem from the principle of linking prior probability and conditional probability (likelihood) to posterior probability via Bayes' rule. The posterior probability is an updated (improved) version of the prior probability of an event, through the likelihood of finding empirical evidence if the underlying assumptions (hypothesis) are valid. In the absence of a frequency distribution for the prior probability, Bayesian methods have been found more satisfactory than distribution-based techniques. The paper illustrates the utility of Bayes' rule in the analysis of electrocatalytic reactor performance by means of four numerical examples involving a catalytic oxygen cathode, hydrogen evolution on a synthetic metal, the reliability of a device testing the quality of an electrocatalyst, and the range of Tafel slopes exhibited by an electrocatalyst.


2013 ◽  
Vol 45 (2) ◽  
pp. 79-88 ◽  
Author(s):  
Virginia M. Miller ◽  
Tanya M. Petterson ◽  
Elysia N. Jeavons ◽  
Abhinita S. Lnu ◽  
David N. Rider ◽  
...  

Menopausal hormone treatment (MHT) may limit progression of cardiovascular disease (CVD) but poses a thrombosis risk. To test targeted candidate gene variation for association with subclinical CVD defined by carotid artery intima-media thickness (CIMT) and coronary artery calcification (CAC), 610 women participating in the Kronos Early Estrogen Prevention Study (KEEPS), a clinical trial of MHT to prevent progression of CVD, were genotyped for 13,229 single nucleotide polymorphisms (SNPs) within 764 genes from anticoagulant, procoagulant, fibrinolytic, or innate immunity pathways. According to linear regression, proportion of European ancestry correlated negatively, but age at enrollment and pulse pressure correlated positively with CIMT. Adjusting for these variables, two SNPs, one on chromosome 2 for MAP4K4 gene (rs2236935, β = 0.037, P value = 2.36 × 10^−6) and one on chromosome 5 for IL5 gene (rs739318, β = 0.051, P value = 5.02 × 10^−5), associated positively with CIMT; two SNPs on chromosome 17 for CCL5 (rs4796119, β = −0.043, P value = 3.59 × 10^−5; rs2291299, β = −0.032, P value = 5.59 × 10^−5) correlated negatively with CIMT; only rs2236935 remained significant after correcting for multiple testing. Using logistic regression, when we adjusted for waist circumference, two SNPs (rs11465886, IRAK2, chromosome 3, OR = 3.91, P value = 1.10 × 10^−4; and rs17751769, SERPINA1, chromosome 14, OR = 1.96, P value = 2.42 × 10^−4) associated positively with a CAC score of >0 Agatston unit; one SNP (rs630014, ABO, OR = 0.51, P value = 2.51 × 10^−4) associated negatively; none remained significant after correcting for multiple testing. Whether these SNPs associate with CIMT and CAC in women randomized to MHT remains to be determined.


Breast Care ◽  
2016 ◽  
Vol 11 (4) ◽  
pp. 240-246 ◽  
Author(s):  
Ute Berndt ◽  
Bernd Leplow ◽  
Robby Schoenfeld ◽  
Tilmann Lantzsch ◽  
Regina Grosse ◽  
...  

Introduction: It is generally accepted that estrogens play a protective role in cognitive function. Therefore, it can be expected that subtotal estrogen deprivation following aromatase inhibition will alter cognitive performance. Methods: In a cross-sectional study we investigated 80 postmenopausal women with breast cancer. Memory and spatial cognition were compared across 4 treatment groups: tamoxifen only (TAM, n = 22), aromatase inhibitor only (AI, n = 22), TAM followed by AI ('SWITCH group', n = 15), and patients with local therapy (LT) only (surgery and radiation, n = 21). Duration of the 2 endocrine monotherapy arms prior to the assessment ranged from 1 to 3 years. The 'SWITCH group' received 2-3 years TAM followed by at least 1 year and at most 3 years of AI. Memory and spatial cognition were investigated as planned comparisons. Investigations of processing speed, attention, executive function, visuoconstruction and self-perception of memory were exploratory. Results: With regard to general memory, AI patients performed significantly worse than the LT group (p = 0.013). Significant differences in verbal memory did not remain significant after p-value correction for multiple testing. We found no significant differences concerning spatial cognition between the groups. Conclusion: AI treatment alone significantly impairs general memory compared to the LT group.


Stroke ◽  
2021 ◽  
Vol 52 (Suppl_1) ◽  
Author(s):  
Sarah E Wetzel-Strong ◽  
Shantel M Weinsheimer ◽  
Jeffrey Nelson ◽  
Ludmila Pawlikowska ◽  
Dewi Clark ◽  
...  

Objective: Circulating plasma protein profiling may aid in the identification of cerebrovascular disease signatures. This study aimed to identify circulating angiogenic and inflammatory proteins that may serve as biomarkers to differentiate sporadic brain arteriovenous malformation (bAVM) patients from other conditions with brain AVMs, including hereditary hemorrhagic telangiectasia (HHT) patients. Methods: The Quantibody Human Angiogenesis Array 1000 (Raybiotech) is an ELISA multiplex panel that was used to assess the levels of 60 proteins related to angiogenesis and inflammation in heparin plasma samples from 13 sporadic unruptured bAVM patients (69% male, mean age 51 years) and 37 patients with HHT (40% male, mean age 47 years, n=19 (51%) with bAVM). The Quantibody Q-Analyzer tool was used to calculate biomarker concentrations based on the standard curve for each marker, and log-transformed marker levels were evaluated for associations between disease states using a multivariable interval regression model adjusted for age, sex, ethnicity and collection site. Statistical significance was based on Bonferroni correction for multiple testing of 60 biomarkers (P < 8.3 × 10^−4). Results: Circulating levels of two plasma proteins differed significantly between sporadic bAVM and HHT patients: PDGF-BB (P = 2.6 × 10^−4, PI = 3.37, 95% CI: 1.76-6.46) and CCL5 (P = 6.0 × 10^−6, PI = 3.50, 95% CI: 2.04-6.03). When considering markers with a nominal p-value of less than 0.01, MMP1 and angiostatin levels also differed between patients with sporadic bAVM and HHT. Markers with nominal p-values less than 0.05 when comparing sporadic brain AVM and HHT patients also included angiostatin, IL2, VEGF, GRO, CXCL16, ITAC, and TGFB3. Among HHT patients, the circulating levels of UPAR and IL6 were elevated in patients with documented bAVMs when considering markers with nominal p-values less than 0.05.
Conclusions: This study identified differential expression of two promising plasma biomarkers that differentiate sporadic bAVMs from patients with HHT. Furthermore, this study allowed us to evaluate markers that are associated with the presence of bAVMs in HHT patients, which may offer insight into mechanisms underlying bAVM pathophysiology.
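The Bonferroni threshold quoted in this abstract can be reproduced with a few lines of arithmetic. The sketch below assumes a family-wise alpha of 0.05, which the abstract does not state explicitly but which matches the reported cutoff when divided by the 60 biomarkers. The helper name `bonferroni_threshold` is hypothetical; the two P values are the ones reported in the Results.

```python
def bonferroni_threshold(alpha, n_tests):
    """Per-test significance threshold under Bonferroni correction."""
    return alpha / n_tests


# P values reported in the abstract for the two significant proteins.
reported = {"PDGF-BB": 2.6e-4, "CCL5": 6.0e-6}

threshold = bonferroni_threshold(0.05, 60)  # alpha = 0.05 is an assumption
significant = [name for name, p in reported.items() if p < threshold]

if __name__ == "__main__":
    print(f"threshold = {threshold:.1e}")
    print("below threshold:", significant)
```

With these inputs the threshold rounds to 8.3 × 10^−4, and both reported P values fall below it.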


2021 ◽  
Vol 5 ◽  
pp. 205
Author(s):  
Maria C. Magnus ◽  
Diana D. S. Ferreira ◽  
Maria Carolina Borges ◽  
Kate Tilling ◽  
Deborah A. Lawlor ◽  
...  

Background: Several studies have found that women who are overweight or obese have an increased risk of miscarriage. There is also some evidence of associations of other aspects of cardiometabolic health, including blood pressure and lipids, with miscarriage risk, although these have not been examined to the same extent as body-mass index (BMI). Methods: Our objective was to investigate the risk of miscarriage according to pre-pregnancy cardiometabolic health. We examined pre-pregnancy levels of BMI, blood pressure, fasting insulin and metabolite profile at age 18 and risk of miscarriage by age 24. The study included adult female offspring in the Avon Longitudinal Study of Parents and Children with a pregnancy between 18 and 24 years of age (n=434 for BMI and blood pressure; n=265 for metabolites). We used log-binomial regression to calculate adjusted associations between cardiometabolic health measures and miscarriage. Results: The overall risk of miscarriage was 22%. The adjusted relative risks for miscarriage were 0.96 (95% CI: 0.92-1.00) for BMI (per unit increase), 0.98 (0.96-1.00) for systolic blood pressure, and 1.00 (0.97-1.04) for diastolic blood pressure (per 1 mmHg increase). Total cholesterol, total lipids and phospholipids in HDL-cholesterol were associated with increased likelihood of miscarriage, but none of the p-values for the metabolites were below the corrected threshold for multiple testing (p-value ≤0.003). Conclusions: Our findings indicate no strong evidence to support a relationship between pre-pregnancy cardiometabolic health and risk of miscarriage in young, healthy women who became pregnant before age 24. Future studies that can evaluate this question in samples with a wider age range are needed.


2021 ◽  
Author(s):  
Ronald J Yurko ◽  
Kathryn Roeder ◽  
Bernie Devlin ◽  
Max G'Sell

In genome-wide association studies (GWAS), it has become commonplace to test millions of SNPs for phenotypic association. Gene-based testing can improve power to detect weak signal by reducing multiple testing and pooling signal strength. While such tests account for linkage disequilibrium (LD) structure of SNP alleles within each gene, current approaches do not capture LD of SNPs falling in different nearby genes, which can induce correlation of gene-based test statistics. We introduce an algorithm to account for this correlation. When a gene's test statistic is independent of others, it is assessed separately; when test statistics for nearby genes are strongly correlated, their SNPs are agglomerated and tested as a locus. To provide insight into SNPs and genes driving association within loci, we develop an interactive visualization tool to explore localized signal. We demonstrate our approach in the context of weakly powered GWAS for autism spectrum disorder, which is contrasted to more highly powered GWAS for schizophrenia and educational attainment. To increase power for these analyses, especially those for autism, we use adaptive p-value thresholding (AdaPT), guided by high-dimensional metadata modeled with gradient boosted trees, highlighting when and how it can be most useful. Notably our workflow is based on summary statistics.

