Reasonable Doubt: Experimental Detection of Job‐Level Employment Discrimination

Econometrica ◽  
2021 ◽  
Vol 89 (2) ◽  
pp. 765-792
Author(s):  
Patrick Kline ◽  
Christopher Walters

This paper develops methods for detecting discrimination by individual employers using correspondence experiments that send fictitious resumes to real job openings. We establish identification of higher moments of the distribution of job‐level callback rates as a function of the number of resumes sent to each job and propose shape‐constrained estimators of these moments. Applying our methods to three experimental data sets, we find striking job‐level heterogeneity in the extent to which callback probabilities differ by race or sex. Estimates of higher moments reveal that while most jobs barely discriminate, a few discriminate heavily. These moment estimates are then used to bound the share of jobs that discriminate and the posterior probability that each individual job is engaged in discrimination. In a recent experiment manipulating racially distinctive names, we find that at least 85% of jobs that contact both of two white applications and neither of two black applications are engaged in discrimination. To assess the potential value of our methods for regulators, we consider the accuracy of decision rules for investigating suspicious callback behavior in various experimental designs under a simple two‐type model that rationalizes the experimental data. Though we estimate that only 17% of employers discriminate on the basis of race, we find that an experiment sending 10 applications to each job would enable detection of 7–10% of discriminatory jobs while yielding Type I error rates below 0.2%. A minimax decision rule acknowledging partial identification of the distribution of callback rates yields only slightly fewer investigations than a Bayes decision rule based on the two‐type model. These findings suggest illegal labor market discrimination can be reliably monitored with relatively small modifications to existing correspondence designs.
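The posterior statements in this abstract rest on a Bayes computation over job types. The sketch below is purely illustrative: the 17% prior share of discriminators is taken from the abstract, but the callback probabilities (p_white_d, p_black_d, p_nondisc) are invented for the example and are not the paper's estimates.

```python
# Prior share of discriminatory jobs, from the abstract's estimate.
prior_disc = 0.17
# Hypothetical callback rates (NOT the paper's estimates): discriminatory jobs
# call back white applications at p_white_d and black applications at p_black_d;
# non-discriminatory jobs call back all applications at p_nondisc.
p_white_d, p_black_d = 0.40, 0.01
p_nondisc = 0.08

# Likelihood of observing "both white applications contacted, neither black
# application contacted" under each job type (independent callbacks).
lik_disc = p_white_d**2 * (1 - p_black_d)**2
lik_non = p_nondisc**2 * (1 - p_nondisc)**2

posterior = prior_disc * lik_disc / (prior_disc * lik_disc + (1 - prior_disc) * lik_non)
print(f"P(discriminates | 2 white callbacks, 0 black callbacks) = {posterior:.2f}")
```

With these invented rates the posterior lands near 0.86, in the spirit of the abstract's "at least 85%" bound; a different choice of rates would move it.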

2017 ◽  
Vol 284 (1851) ◽  
pp. 20161850 ◽  
Author(s):  
Nick Colegrave ◽  
Graeme D. Ruxton

A common approach to the analysis of experimental data across much of the biological sciences is test-qualified pooling. Here non-significant terms are dropped from a statistical model, effectively pooling the variation associated with each removed term with the error term used to test hypotheses (or estimate effect sizes). This pooling is carried out only if a statistical test of the term in the more complicated model provides motivation for the simplification; hence the pooling is test-qualified. In pooling, the researcher increases the degrees of freedom of the error term with the aim of increasing statistical power to test their hypotheses of interest. Despite this approach being widely adopted and explicitly recommended by some of the most widely cited statistical textbooks aimed at biologists, we argue here that (except in highly specialized circumstances that we identify) the hoped-for improvement in statistical power will be small or non-existent, and the reliability of the statistical procedures is likely to be much reduced, with type I error rates deviating from their nominal levels. We thus call for greatly reduced use of test-qualified pooling across experimental biology, more careful justification of any use that continues, and a different philosophy for the initial selection of statistical models in the light of this change in procedure.
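The distortion can be checked directly with a toy simulation. The sketch below (our own illustration, not code from the paper; all names and parameters invented) pools a two-way interaction into the error term only when its F test is non-significant, then records how often the main-effect test rejects under a true global null.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf
from statsmodels.stats.anova import anova_lm

rng = np.random.default_rng(1)

def one_rep(a=2, b=3, n=3, alpha=0.05):
    # Balanced two-way layout with no true effects anywhere (global null).
    df = pd.DataFrame({
        "A": np.repeat([f"a{i}" for i in range(a)], b * n),
        "B": np.tile(np.repeat([f"b{j}" for j in range(b)], n), a),
        "y": rng.normal(size=a * b * n),
    })
    full = anova_lm(smf.ols("y ~ A * B", data=df).fit())
    if full.loc["A:B", "PR(>F)"] > alpha:
        # Interaction non-significant: refit without it, pooling its sum of
        # squares and degrees of freedom into error (test-qualified pooling).
        reduced = anova_lm(smf.ols("y ~ A + B", data=df).fit())
        return reduced.loc["A", "PR(>F)"] < alpha
    return full.loc["A", "PR(>F)"] < alpha

rate = np.mean([one_rep() for _ in range(2000)])
print(f"empirical type I error for the test of A: {rate:.3f} (nominal 0.05)")
```

Comparing the printed rate with the nominal 0.05 shows the kind of deviation the authors warn about for the conditional procedure.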


2014 ◽  
Vol 53 (05) ◽  
pp. 343-343

We have to report marginal changes in the empirical type I error rates for the cut-offs 2/3 and 4/7 of Tables 4, 5, and 6 of the paper “Influence of Selection Bias on the Test Decision – A Simulation Study” by M. Tamm, E. Cramer, L. N. Kennes, N. Heussen (Methods Inf Med 2012; 51: 138–143). In a small number of cases, the floating-point representation of numeric values in SAS led to incorrect categorization due to a representation error in the computed differences. We corrected the simulation by using the round function of SAS in the calculation process, with the same seeds as before. For Table 4, the value for the cut-off 2/3 changes from 0.180323 to 0.153494. For Table 5, the value for the cut-off 4/7 changes from 0.144729 to 0.139626 and the value for the cut-off 2/3 changes from 0.114885 to 0.101773. For Table 6, the value for the cut-off 4/7 changes from 0.125528 to 0.122144 and the value for the cut-off 2/3 changes from 0.099488 to 0.090828. The sentence on p. 141, “E.g. for block size 4 and q = 2/3 the type I error rate is 18% (Table 4).”, has to be replaced by “E.g. for block size 4 and q = 2/3 the type I error rate is 15.3% (Table 4).”. All changes were minor, smaller than 0.03, and do not affect the interpretation of the results or our recommendations.
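The underlying issue is the familiar floating-point comparison problem. A minimal sketch (in Python rather than SAS, with invented values) of how a representation error in a computed quantity can flip a categorization against a cut-off such as 2/3, and how rounding before comparison avoids it:

```python
# The cut-off and the computed quantity are mathematically equal to 2/3, but
# their binary floating-point representations differ in the last bit.
cutoff = 2 / 3              # stored as 0.66666666666666663...
estimate = 1 - 1 / 3        # stored as 0.66666666666666674...

print(estimate > cutoff)    # True: a spurious "exceeds the cut-off" categorization
# Rounding both sides (here to 10 decimals) before comparing removes the
# artifact, analogous to the SAS round() correction described above.
print(round(estimate, 10) > round(cutoff, 10))   # False
```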


2021 ◽  
pp. 001316442199489
Author(s):  
Luyao Peng ◽  
Sandip Sinharay

Wollack et al. (2015) suggested the erasure detection index (EDI) for detecting fraudulent erasures for individual examinees. Wollack and Eckerly (2017) and Sinharay (2018) extended the index of Wollack et al. (2015) to suggest three EDIs for detecting fraudulent erasures at the aggregate or group level. This article follows up on the research of Wollack and Eckerly (2017) and Sinharay (2018) and suggests a new aggregate-level EDI by incorporating the empirical best linear unbiased predictor from the literature on linear mixed-effects models (e.g., McCulloch et al., 2008). A simulation study shows that the new EDI has greater power than the indices of Wollack and Eckerly (2017) and Sinharay (2018). In addition, the new index has satisfactory Type I error rates. A real data example is also included.
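As a rough illustration of the building block involved (not the authors' implementation; simulated data and statsmodels stand in for their setup), the sketch below fits a random-intercept model and extracts the empirical best linear unbiased predictors (EBLUPs) of group-level effects, which an aggregate-level index could then standardize and flag.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(0)
n_groups, n_per = 20, 30
group = np.repeat(np.arange(n_groups), n_per)
true_effects = rng.normal(0.0, 0.5, n_groups)            # latent group-level shifts
erasures = 2.0 + true_effects[group] + rng.normal(0.0, 1.0, group.size)

df = pd.DataFrame({"erasures": erasures, "group": group})
fit = smf.mixedlm("erasures ~ 1", df, groups=df["group"]).fit()

# EBLUPs: predicted random intercepts, one per group; unusually large values
# would flag groups with suspiciously high erasure counts.
eblups = {g: re.iloc[0] for g, re in fit.random_effects.items()}
top3 = sorted(eblups.items(), key=lambda kv: -kv[1])[:3]
print("largest EBLUPs (group, value):", [(g, round(v, 3)) for g, v in top3])
```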


2001 ◽  
Vol 26 (1) ◽  
pp. 105-132 ◽  
Author(s):  
Douglas A. Powell ◽  
William D. Schafer

The robustness literature for the structural equation model was synthesized following the method of Harwell, which employs meta-analysis as developed by Hedges and Vevea. The study focused on the explanation of empirical Type I error rates for six principal classes of estimators: two that assume multivariate normality (maximum likelihood and generalized least squares), elliptical estimators, two distribution-free estimators (asymptotic and others), and latent projection. Generally, the chi-square tests for overall model fit were found to be sensitive to non-normality and to model size for all estimators (with the possible exception of the elliptical estimators with respect to model size and the latent projection techniques with respect to non-normality). The asymptotic distribution-free (ADF) and latent projection techniques were also found to be sensitive to sample size. Distribution-free methods other than ADF showed, in general, much less sensitivity to all factors considered.
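For readers unfamiliar with the machinery, the following sketch (toy numbers, not the study's data) pools empirical Type I error rates across hypothetical simulation studies with a standard random-effects meta-analysis on the logit scale, in the spirit of Hedges and Vevea, using a method-of-moments between-study variance estimate.

```python
import numpy as np

# Hypothetical per-study rejection counts out of n Monte Carlo replications.
rejections = np.array([62, 71, 48, 120, 55])
n_reps = np.array([1000, 1000, 1000, 2000, 1000])

p = rejections / n_reps
y = np.log(p / (1 - p))                          # logit of empirical rejection rate
v = 1 / rejections + 1 / (n_reps - rejections)   # approximate variance of the logit

# Method-of-moments (DerSimonian-Laird) estimate of between-study variance.
w = 1 / v
q = np.sum(w * (y - np.sum(w * y) / w.sum()) ** 2)
c = w.sum() - np.sum(w**2) / w.sum()
tau2 = max(0.0, (q - (len(y) - 1)) / c)

w_star = 1 / (v + tau2)                          # random-effects weights
mu = np.sum(w_star * y) / w_star.sum()
rate = 1 / (1 + np.exp(-mu))                     # back-transform to a proportion
print(f"pooled empirical Type I error rate: {rate:.4f} (nominal 0.05)")
```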


2021 ◽  
Vol 34 (1) ◽  
pp. 79-88
Author(s):  
Dean Radin ◽  
Helané Wahbeh ◽  
Leena Michel ◽  
Arnaud Delorme

An experiment we conducted from 2012 to 2013, which had not been previously reported, was designed to explore possible psychophysical effects resulting from the interaction of a human mind with a quantum system. Participants focused their attention toward or away from the slits in a double-slit optical system to see if the interference pattern would be affected. Data were collected from 25 people in individual half-hour sessions; each person repeated the test ten times for a total of 250 planned sessions. “Sham” sessions designed to mimic the experimental sessions without observers present were run immediately before and after as controls. Based on the planned analysis, no evidence for a psychophysical effect was found. Because this experiment differed in two essential ways from similar, previously reported double-slit experiments, two exploratory analyses were developed, one based on a simple spectral analysis of the interference pattern and the other based on fringe visibility. For the experimental data, the outcome supported a pattern of results predicted by a causal psychophysical effect, with the spectral metric resulting in a 3.4 sigma effect (p = 0.0003), and the fringe visibility metric resulting in 7 of 22 fringes tested above 2.3 sigma after adjustment for type I error inflation, with one of those fringes at 4.3 sigma above chance (p = 0.00001). The same analyses applied to the sham data showed uniformly null outcomes. Other analyses exploring the potential that these results were due to mundane artifacts, such as fluctuations in temperature or vibration, showed no evidence of such influences. Future studies using the same protocols and analytical methods will be required to determine if these exploratory results are idiosyncratic or reflect a genuine psychophysical influence.
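The two exploratory metrics are standard optics quantities. A brief sketch (synthetic pattern and invented parameters; not the authors' analysis code) of both, computed on a simulated double-slit intensity profile:

```python
import numpy as np

x = np.linspace(-1, 1, 2048)                   # detector coordinate (arbitrary units)
envelope = np.sinc(3 * x) ** 2                 # single-slit diffraction envelope
pattern = envelope * (1 + 0.9 * np.cos(40 * np.pi * x)) / 2   # fringes, visibility 0.9

# Spectral metric: power at the dominant fringe spatial frequency from an FFT.
spectrum = np.abs(np.fft.rfft(pattern))
fringe_bin = np.argmax(spectrum[1:]) + 1       # strongest non-DC component
print("fringe spectral power:", round(float(spectrum[fringe_bin]), 3))

# Fringe visibility: V = (I_max - I_min) / (I_max + I_min) near the center,
# which should roughly recover the 0.9 visibility set above.
center = pattern[900:1150]
v = (center.max() - center.min()) / (center.max() + center.min())
print("fringe visibility:", round(float(v), 3))
```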


2019 ◽  
Vol 14 (2) ◽  
pp. 399-425 ◽  
Author(s):  
Haolun Shi ◽  
Guosheng Yin

2014 ◽  
Vol 38 (2) ◽  
pp. 109-112 ◽  
Author(s):  
Daniel Furtado Ferreira

Sisvar is a statistical analysis system widely used by the scientific community to perform statistical analyses and to produce scientific results and conclusions. Its wide adoption is due to it being accurate, precise, simple, and robust. Among its many analysis options, one that is not so widely used is multiple comparison via bootstrap approaches. This paper aims to review this subject and to show some advantages of using Sisvar to perform such analyses to compare treatment means. Tests like Dunnett, Tukey, Student-Newman-Keuls, and Scott-Knott can alternatively be performed by bootstrap methods, showing greater power and better control of experimentwise type I error rates under non-normal, asymmetric, platykurtic, or leptokurtic distributions.
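As a generic illustration of the bootstrap route to experimentwise error control (not Sisvar's specific algorithm; data and parameters invented), the sketch below calibrates all pairwise comparisons of treatment means against the bootstrap null distribution of the maximum |t| statistic.

```python
import numpy as np

rng = np.random.default_rng(2)
groups = [rng.exponential(2.0, 10) for _ in range(4)]   # 4 treatments, skewed toy data

def max_abs_t(samples):
    # Largest pairwise |t| over all treatment pairs (Welch-style standard errors).
    means = [s.mean() for s in samples]
    ses = [s.std(ddof=1) / np.sqrt(len(s)) for s in samples]
    return max(abs(means[i] - means[j]) / np.hypot(ses[i], ses[j])
               for i in range(len(samples)) for j in range(i))

# Center each group at zero to impose the null of equal means, then resample
# to build the null distribution of the max |t| statistic.
centered = [s - s.mean() for s in groups]
null = [max_abs_t([rng.choice(c, c.size, replace=True) for c in centered])
        for _ in range(2000)]
crit = float(np.quantile(null, 0.95))          # experimentwise 5% critical value

print(f"observed max |t|: {max_abs_t(groups):.3f}, bootstrap critical value: {crit:.3f}")
```

Because the critical value is calibrated on the resampled data rather than a normal-theory reference distribution, the procedure retains experimentwise control under the skewed and heavy- or light-tailed distributions mentioned above.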

