sequential importance sampling
Recently Published Documents


Total documents: 75 (five years: 9)
H-index: 18 (five years: 1)

2020 ◽ Vol 42 (4) ◽ pp. A2062-A2087
Author(s): F. Wagner ◽ J. Latz ◽ I. Papaioannou ◽ E. Ullmann

2019
Author(s): Kai Fricke ◽ Philipp Herzberg

Large behavioral datasets collect ecologically valid data on user behavior, but seldom provide further data on the users themselves. To gain further insight into the participants, small subsamples can be collected that assess variables of interest, such as personality characteristics. However, such samples are often biased by self-selection. Removing this bias is often hard because no demographic data are available and only the distributions of (multiple) continuous target variables in the population are known from the large dataset. In two studies, we examined de-biasing methods with respect to arbitrary continuous variables. In the first study, we examined an artificial dataset with full information on the population and found that sequential importance sampling and single-nearest-neighbour matching were successful in removing bias from subsamples. In the second study, we put this method into practice and examined the relationship between personality characteristics and music preferences in a sample of Spotify users, with respect to the musical preference distributions of one million users of the Million Song Dataset. Our results show that sequential importance sampling is a promising way to remove bias from samples when the distribution of relevant variables in the population is known.


Author(s): Ruriko Yoshida ◽ Hisayuki Hara ◽ Patrick M. Saluke

Logistic regression is one of the most popular classification models in data science, and in general it is easy to use. However, asymptotic methods cannot be applied to conduct a goodness-of-fit test when the dataset is sparse. In that case, exact conditional inference has to be conducted via a sampler, such as Markov Chain Monte Carlo (MCMC) or Sequential Importance Sampling (SIS). In this chapter, the authors investigate the rejection rate of the SIS procedure on multiple logistic regression models with categorical covariates. Using tools from algebra, they show that in general SIS can have a very high rejection rate, even when Linear Integer Programming (IP) is applied to compute the support of the marginal distribution for each variable. More specifically, the semigroup generated by the columns of the design matrix for a multiple logistic regression has infinitely many “holes.” They end with an application of a hybrid scheme of MCMC and SIS to NUN study data on Alzheimer's disease.


PAMM ◽ 2018 ◽ Vol 18 (1)
Author(s): Max Ehre ◽ Iason Papaioannou ◽ Daniel Straub
