A New Multinomial Accuracy Measure for Polling Bias

2014 ◽  
Vol 22 (1) ◽  
pp. 31-44 ◽  
Author(s):  
Kai Arzheimer ◽  
Jocelyn Evans

In this article, we propose a polling accuracy measure for multi-party elections based on a generalization of Martin, Traugott, and Kennedy's two-party predictive accuracy index. Treating polls as random samples of a voting population, we first estimate an intercept-only multinomial logit model to provide proportionate odds measures of each party's share of the vote, and thereby derive both unweighted and weighted averages of these values as a summary index of poll accuracy. We then propose measures for significance testing and run a series of simulations to assess possible bias from the resulting folded normal distribution across different sample sizes, finding that bias is small even for polls with small samples. We apply our measure to the 2012 French presidential election polls to demonstrate its applicability in tracking overall polling performance across time and polling organizations. Finally, we demonstrate the practical value of our measure by using it as a dependent variable in an explanatory model of polling accuracy, testing the different possible sources of bias in the French data.
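The core idea can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' exact index: it treats the multi-party summary as an unweighted average of absolute per-party log odds ratios of poll share versus vote share, with zero indicating a perfectly accurate poll.

```python
import math

def party_log_odds_ratio(poll_share, vote_share):
    """Log odds ratio of a party's poll share against its actual vote share.
    Zero means the poll matched the result exactly for that party."""
    poll_odds = poll_share / (1.0 - poll_share)
    vote_odds = vote_share / (1.0 - vote_share)
    return math.log(poll_odds / vote_odds)

def multiparty_accuracy(poll, vote):
    """Unweighted average of absolute per-party log odds ratios:
    a simple multi-party summary of poll accuracy (0 = perfect poll)."""
    assert abs(sum(poll) - 1.0) < 1e-9 and abs(sum(vote) - 1.0) < 1e-9
    return sum(abs(party_log_odds_ratio(p, v))
               for p, v in zip(poll, vote)) / len(poll)
```

A poll whose shares equal the election result scores exactly zero; any deviation in either direction raises the index, which is what makes significance testing against a folded normal natural.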

2016 ◽  
Vol 41 (5) ◽  
pp. 472-505 ◽  
Author(s):  
Elizabeth Tipton ◽  
Kelly Hallberg ◽  
Larry V. Hedges ◽  
Wendy Chan

Background: Policy makers and researchers are frequently interested in understanding how effective a particular intervention may be for a specific population. One approach is to assess the degree of similarity between the sample in an experiment and the population. Another approach is to combine information from the experiment and the population to estimate the population average treatment effect (PATE). Method: Several methods for assessing the similarity between a sample and a population currently exist, as well as methods for estimating the PATE. In this article, we investigate properties of six of these methods and statistics in the small sample sizes common in education research (i.e., 10–70 sites), evaluating the utility of rules of thumb developed from observational studies in the generalization case. Result: In small random samples, large differences between the sample and population can arise simply by chance, and many of the statistics commonly used in generalization are a function of both sample size and the number of covariates being compared. The rules of thumb developed in observational studies (which are commonly applied in generalization) are much too conservative given the small sample sizes found in generalization. Conclusion: This article implies that sharp inferences to large populations from small experiments are difficult even with probability sampling. Features of random samples should be kept in mind when evaluating the extent to which results from experiments conducted on nonrandom samples might generalize.
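The chance-difference point is easy to demonstrate by simulation. The sketch below is an illustration, not the authors' exact procedure: it computes a standardized mean difference (SMD) for one covariate and measures how often purely random samples of 10 sites exceed the 0.25 rule-of-thumb threshold borrowed from observational studies.

```python
import math
import random

def smd(sample, population):
    """Standardized mean difference for one covariate, scaled by the
    population standard deviation."""
    m_s = sum(sample) / len(sample)
    m_p = sum(population) / len(population)
    sd_p = math.sqrt(sum((x - m_p) ** 2 for x in population)
                     / (len(population) - 1))
    return (m_s - m_p) / sd_p

def chance_exceedance(population, n_sites, threshold=0.25,
                      trials=2000, seed=1):
    """Fraction of purely random samples of n_sites whose |SMD| exceeds
    the rule-of-thumb threshold, by chance alone."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        sample = [rng.choice(population) for _ in range(n_sites)]
        if abs(smd(sample, population)) > threshold:
            hits += 1
    return hits / trials
```

With a standard-normal population and 10 sites, roughly four in ten random samples exceed the 0.25 cutoff purely by chance, illustrating why thresholds calibrated on large observational studies mislead at these sample sizes.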


Author(s):  
Ping Liu ◽  
Mengchu Xie ◽  
Jing Bian ◽  
Huishan Li ◽  
Liangliang Song

Incorporating safety risk into the design process is one of the most effective ways to enhance the safety of metro station construction. Accordingly, the concept of Design for Safety (DFS) has attracted much attention. However, most current research overlooks the risk-prediction process in the application of DFS. Therefore, this paper proposes a hybrid risk-prediction framework to enhance the effectiveness of DFS in practice. Firstly, 12 influencing factors related to the safety risk of metro construction are identified through a literature review and an analysis of construction safety management codes. Then, a structured interview is used to collect safety risk cases from metro construction projects. Next, a support vector machine (SVM) model tuned by particle swarm optimization (PSO) is presented to predict safety risk in metro construction, in which a multi-class SVM prediction model with an improved binary tree is designed. The results show that the average accuracy on the test sets is 85.26%, and that the PSO–SVM model has high predictive accuracy for non-linear relationships and small samples. Finally, the proposed framework is applied to a case study of metro station construction. The prediction results show that the PSO–SVM model is applicable and reasonable for safety risk prediction. This research also identifies the most important influencing factors for reducing the safety risk of metro station construction, providing a guideline for safety risk prediction in metro construction during the design process.
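The PSO half of the hybrid can be sketched generically. This is a minimal stand-alone particle swarm optimizer, not the paper's implementation: in the paper's setting the objective would be cross-validated SVM error over hyperparameters such as (C, gamma), and the improved binary-tree multi-class SVM is not reproduced here; any black-box objective works in the sketch.

```python
import random

def pso_minimize(objective, bounds, n_particles=20, iters=60, seed=0):
    """Minimal particle swarm optimization. Each particle remembers its
    personal best position; the swarm shares a global best, and velocities
    are pulled toward both."""
    rng = random.Random(seed)
    dim = len(bounds)
    pos = [[rng.uniform(lo, hi) for lo, hi in bounds]
           for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]
    pbest_val = [objective(p) for p in pos]
    g = min(range(n_particles), key=lambda i: pbest_val[i])
    gbest, gbest_val = pbest[g][:], pbest_val[g]
    w, c1, c2 = 0.7, 1.5, 1.5  # inertia and acceleration coefficients
    for _ in range(iters):
        for i in range(n_particles):
            for d in range(dim):
                vel[i][d] = (w * vel[i][d]
                             + c1 * rng.random() * (pbest[i][d] - pos[i][d])
                             + c2 * rng.random() * (gbest[d] - pos[i][d]))
                # clamp positions to the search bounds
                pos[i][d] = min(max(pos[i][d] + vel[i][d],
                                    bounds[d][0]), bounds[d][1])
            val = objective(pos[i])
            if val < pbest_val[i]:
                pbest[i], pbest_val[i] = pos[i][:], val
                if val < gbest_val:
                    gbest, gbest_val = pos[i][:], val
    return gbest, gbest_val
```

Swapping in a cross-validation error as `objective` turns this directly into hyperparameter tuning of the kind the paper describes.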


2019 ◽  
pp. 089443931988844
Author(s):  
Ranjith Vijayakumar ◽  
Mike W.-L. Cheung

Machine learning methods have become very popular in diverse fields due to their focus on predictive accuracy, but little work has been conducted on how to assess the replicability of their findings. We introduce and adapt replication methods advocated in psychology to the aims and procedural needs of machine learning research. In Study 1, we illustrate these methods with an empirical data set, assessing the replication success of a predictive accuracy measure, namely R², on the cross-validated and test sets of the samples. We introduce three replication aims. First, tests of inconsistency examine whether single replications have successfully rejected the original study. Rejection is supported if the 95% confidence interval (CI) of the R² difference estimate between replication and original does not contain zero. Second, tests of consistency help support claims of successful replication. We can decide a priori on a region of equivalence, where population values of the difference estimates are considered equivalent for substantive reasons. The 90% CI of a difference estimate lying fully within this region supports replication. Third, we show how to combine replications to construct meta-analytic intervals for better precision of predictive accuracy measures. In Study 2, R² is reduced from the original in a subset of replication studies to examine the ability of the replication procedures to distinguish true replications from nonreplications. We find that when combining studies sampled from the same population to form meta-analytic intervals, random-effects methods perform best for cross-validated measures, while fixed-effects methods work best for test measures. Among machine learning methods, regression was comparable to many complex methods, while the support vector machine performed most reliably across a variety of scenarios. Social scientists who use machine learning to model empirical data can use these methods to enhance the reliability of their findings.
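The three decision rules translate almost directly into code. The sketch below is a simplified illustration assuming a difference estimate with a known standard error; the authors' procedures for obtaining that standard error from cross-validated and test-set R² are not reproduced here, and the fixed-effects pooling shown is the standard inverse-variance formula.

```python
import math

def ci(diff, se, z):
    """Symmetric normal-theory confidence interval."""
    return diff - z * se, diff + z * se

def inconsistent(diff, se):
    """Test of inconsistency: the 95% CI of the R^2 difference between
    replication and original excludes zero (original rejected)."""
    lo, hi = ci(diff, se, 1.96)
    return lo > 0 or hi < 0

def consistent(diff, se, eq=0.05):
    """Test of consistency: the 90% CI lies fully inside a pre-registered
    region of equivalence (-eq, +eq), supporting replication."""
    lo, hi = ci(diff, se, 1.645)
    return -eq < lo and hi < eq

def fixed_effects_combine(estimates, ses):
    """Inverse-variance (fixed-effects) pooling of several replications,
    giving a meta-analytic estimate with a tighter standard error."""
    weights = [1.0 / s ** 2 for s in ses]
    pooled = sum(w * e for w, e in zip(weights, estimates)) / sum(weights)
    pooled_se = math.sqrt(1.0 / sum(weights))
    return pooled, pooled_se
```

Note that the same difference estimate can fail the inconsistency test and pass the consistency test; the equivalence region is what lets a small, substantively negligible difference still count as a successful replication.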


2015 ◽  
Vol 63 (3) ◽  
pp. 228-234 ◽  
Author(s):  
Juraj Parajka ◽  
Ralf Merz ◽  
Jon Olav Skøien ◽  
Alberto Viglione

Abstract Direct interpolation of daily runoff observations to ungauged sites is an alternative to hydrological model regionalisation. Such estimation is particularly important in small headwater basins, which are characterized by sparse hydrological and climate observations but often large spatial variability. The main objective of this study is to evaluate the predictive accuracy of top-kriging interpolation driven by different numbers of stations (i.e. station densities) in the input dataset. The idea is to interpolate daily runoff for different station densities in Austria and to evaluate the minimum number of stations needed for accurate runoff predictions. Top-kriging efficiency is tested with ten different random samples at each of ten station densities. Predictive accuracy is evaluated by ordinary and full-sample cross-validation. The methodology is tested using 555 gauges with daily observations in the period 1987-1997. The results of the cross-validation indicate that, in Austria, top-kriging interpolation is superior to hydrological model regionalisation if station density exceeds approximately 2 stations per 1000 km² (175 stations in Austria). The average median Nash-Sutcliffe cross-validation efficiency is larger than 0.7 for densities above 2.4 stations/1000 km². For such densities, the variability of runoff efficiency over the ten random samples is very small. Lower runoff efficiency is found for low station densities (less than 1 station/1000 km²) and in some smaller headwater basins.
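The Nash-Sutcliffe efficiency used to score the cross-validation is a standard measure and worth stating precisely; the top-kriging interpolation itself is beyond a short sketch.

```python
def nash_sutcliffe(observed, simulated):
    """Nash-Sutcliffe efficiency: 1 minus the ratio of residual variance
    to the variance of the observations. 1 = perfect prediction;
    0 = no better than always predicting the observed mean;
    negative = worse than the mean."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - s) ** 2 for o, s in zip(observed, simulated))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot
```

An efficiency above 0.7, as reported here for the denser networks, means the interpolated series removes at least 70% of the error variance of a climatological-mean predictor.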


1989 ◽  
Vol 26 (1) ◽  
pp. 56-68 ◽  
Author(s):  
David S. Bunch ◽  
Richard R. Batsell

Marketing researchers use the multinomial logit (MNL) model to analyze discrete choice, and estimate parameters either by maximum likelihood (ML) or minimum logit chi square (MLCS). Some controversy persists, however, over which is better. Review articles in marketing recommend ML over MLCS, but the statistics literature suggests that MLCS should be preferred. No studies have directly compared the performance of ML and MLCS in a marketing context. The authors assess the relative performance of ML, MLCS, and three other candidate estimators for MNL marketing applications involving repeated-measures datasets collected by means of multiple-subset designs. In contrast to most previous findings in the statistics literature, the results strongly support the use of ML. ML is found to outperform the other estimators on a variety of point estimation, predictive accuracy, and statistical inference criteria, and ML test statistics are found to attain their asymptotic behavior even for datasets involving relatively few replications.
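For readers unfamiliar with MLCS, Berkson's estimator is just weighted least squares on empirical logits. The sketch below is a simplified binary-logit version with one covariate (the article's setting is multinomial, which this does not reproduce): proportions from grouped, replicated observations are turned into logits and regressed on x with weights n·p·(1-p).

```python
import math

def mlcs_binary_logit(x, successes, trials):
    """Berkson's minimum logit chi-square for a grouped binary logit:
    weighted least squares of the empirical logits on x,
    with weights n_j * p_j * (1 - p_j). Returns (intercept, slope)."""
    logits, weights = [], []
    for s, n in zip(successes, trials):
        p = s / n
        logits.append(math.log(p / (1.0 - p)))
        weights.append(n * p * (1.0 - p))
    # weighted least squares for intercept a and slope b
    sw = sum(weights)
    xbar = sum(w * xi for w, xi in zip(weights, x)) / sw
    ybar = sum(w * yi for w, yi in zip(weights, logits)) / sw
    b = (sum(w * (xi - xbar) * (yi - ybar)
             for w, xi, yi in zip(weights, x, logits))
         / sum(w * (xi - xbar) ** 2 for w, xi in zip(weights, x)))
    a = ybar - b * xbar
    return a, b
```

MLCS requires every group to have a proportion strictly between 0 and 1, which is one practical reason repeated-measures designs with few replications per cell can favor ML instead.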


2021 ◽  
Vol 10 (9) ◽  
pp. 597
Author(s):  
Chaitanya Joshi ◽  
Sophie Curtis-Ham ◽  
Clayton D’Ath ◽  
Deane Searle

A literature review of the important trends in predictive crime modeling and the existing measures of accuracy was undertaken. It highlighted the need for robust, comprehensive, and independent evaluation, and the need to include complementary measures for a more complete assessment. We develop a new measure called the penalized predictive accuracy index (PPAI), propose the use of an expected utility function to combine multiple measures, and propose the use of the average logarithmic score, which measures accuracy differently from existing measures. The measures are illustrated using hypothetical examples. We illustrate how the PPAI can identify the best model for a given problem, and how the expected utility measure can combine different measures in the way most appropriate for the problem at hand. It is important to develop measures that empower practitioners to input the choices and preferences most appropriate for their problem and to combine multiple measures; the measures proposed here go some way towards providing this ability, though further development along these lines is needed.
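For context, the measure PPAI extends is the standard predictive accuracy index (PAI) used in hotspot evaluation. The sketch below shows only the base PAI; the penalty term that distinguishes PPAI follows the paper and is not reproduced here.

```python
def pai(hits, total_crimes, flagged_area, total_area):
    """Predictive accuracy index for crime hotspot maps: the hit rate
    (share of crimes falling inside the flagged area) divided by the
    share of the study area that was flagged. PAI = 1 matches random
    flagging; larger is better."""
    hit_rate = hits / total_crimes
    area_share = flagged_area / total_area
    return hit_rate / area_share
```

PAI's known weakness, which motivates penalized variants, is that very small flagged areas can produce large PAI values from a handful of lucky hits.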


1933 ◽  
Vol 23 (1) ◽  
pp. 6-17 ◽  
Author(s):  
T. Eden ◽  
F. Yates

Summary
1. Previous work on the validity of the t and z tests on non-normal distributions is described. The question as to whether these tests, which are all on small samples from theoretical distributions, are really apposite is discussed.
2. The necessity of making a practical test with actual data which shall comply with the usual conditions obtaining in agricultural experiments is urged.
3. A practical test has been made on a skew distribution obtained from the observation of 256 height measurements on wheat. The distribution of the values of R. A. Fisher's z from a thousand random samples has been obtained and found to agree satisfactorily with the theoretical distribution.
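The randomization procedure in point 3 can be sketched in modern terms. This is an illustrative reconstruction, not Eden and Yates's computation: it uses synthetic normal data in place of the 256 wheat height measurements, and each draw shuffles the data into equal groups and computes Fisher's z, half the log of the between- to within-group mean square ratio.

```python
import math
import random

def fisher_z_from_random_split(data, k, rng):
    """One randomization draw: split the data into k equal groups at
    random and compute Fisher's z = 0.5 * ln(MS_between / MS_within).
    Repeating this many times gives the empirical z distribution to
    compare against the theoretical one."""
    vals = data[:]
    rng.shuffle(vals)
    n = len(vals) // k
    groups = [vals[i * n:(i + 1) * n] for i in range(k)]
    grand = sum(vals[:k * n]) / (k * n)
    means = [sum(g) / n for g in groups]
    ms_between = n * sum((m - grand) ** 2 for m in means) / (k - 1)
    ms_within = (sum(sum((x - m) ** 2 for x in g)
                     for g, m in zip(groups, means)) / (k * (n - 1)))
    return 0.5 * math.log(ms_between / ms_within)
```

Under random assignment both mean squares estimate the same variance, so the empirical z values cluster near the center of the theoretical z distribution, which is the agreement the paper reports for its skew wheat data.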


Tripodos ◽  
2020 ◽  
pp. 69-84
Author(s):  
Spencer Kimball

Is there a way to make pre-election polls more accurate? This paper tests some of the most popular methods of allocating 'undecided' voters, based on the underlying theory that allocating undecided voters will improve both the public's expectations of election results and pollsters' claims about accuracy. The polling literature states that the most popular methods of incorporating undecided voters are asking a "leaner" question that follows a ballot test question, or allocating the undecided proportionally to their vote preference. Both methods were used in this study, along with a third option, an even allocation (essentially no allocation) of undecided voters. The study incorporates n=54 pre-election polls conducted in 20 different states between October 26 and November 4, 2018, to compare the three allocation methods. The comparison includes an Absolute Error test (deviation between poll results and election results; Mosteller et al., 1949), a Statistical Accuracy test (absolute error compared with the poll's margin of error; Kimball, 2017), and a Predictive Accuracy test (did the poll predict the actual election winner?). The study found no significant difference between the accuracy of polls that allocated undecided voters and those that did not (χ2 (2, N=161)=.200, p=.905), suggesting that allocating undecided voters neither detracts from nor adds to the reliability and validity of a pre-election poll. Keywords: undecided voter, pre-election polling, poll accuracy, allocation of undecided voter, political communication.
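The two allocation rules and the margin-based error measure can be sketched directly. The shares below are toy numbers, and `margin_error` is one common Mosteller-style variant (error on the top-two margin), not necessarily the exact measure scored in the study.

```python
def allocate_undecided(poll, undecided, method="proportional"):
    """Allocate the undecided share among candidates.
    'proportional' splits it in proportion to each candidate's share;
    'even' splits it equally, which leaves every pairwise margin
    unchanged (hence 'essentially no allocation')."""
    if method == "proportional":
        total = sum(poll.values())
        return {c: s + undecided * s / total for c, s in poll.items()}
    if method == "even":
        return {c: s + undecided / len(poll) for c, s in poll.items()}
    raise ValueError(method)

def margin_error(poll, result, a, b):
    """Mosteller-style absolute error: difference between the poll margin
    and the election margin for the top two candidates."""
    return abs((poll[a] - poll[b]) - (result[a] - result[b]))
```

Because even allocation preserves margins while proportional allocation widens the leader's margin, comparing the two against the certified result is exactly the three-way test the study runs.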

