Supplementing Small Probability Samples with Nonprobability Samples: A Bayesian Approach

2019 ◽  
Vol 35 (3) ◽  
pp. 653-681 ◽  
Author(s):  
Joseph W. Sakshaug ◽  
Arkadiusz Wiśniowski ◽  
Diego Andres Perez Ruiz ◽  
Annelies G. Blom

Abstract Carefully designed probability-based sample surveys can be prohibitively expensive to conduct. As such, many survey organizations have shifted away from using expensive probability samples in favor of less expensive, but possibly less accurate, nonprobability web samples. However, their lower costs and abundant availability make them a potentially useful supplement to traditional probability-based samples. We examine this notion by proposing a method of supplementing small probability samples with nonprobability samples using Bayesian inference. We consider two semi-conjugate informative prior distributions for linear regression coefficients based on nonprobability samples, one accounting for the distance between maximum likelihood coefficients derived from parallel probability and non-probability samples, and the second depending on the variability and size of the nonprobability sample. The method is evaluated in comparison with a reference prior through simulations and a real-data application involving multiple probability and nonprobability surveys fielded simultaneously using the same questionnaire. We show that the method reduces the variance and mean-squared error (MSE) of coefficient estimates and model-based predictions relative to probability-only samples. Using actual and assumed cost data we also show that the method can yield substantial cost savings (up to 55%) for a fixed MSE.
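
As an illustration of the core idea, the following Python sketch combines a small probability sample with an informative normal prior built from a parallel nonprobability sample. It is a simplification under stated assumptions (the residual variance `sigma2` is plugged in rather than given its own prior, and `tau` is a hypothetical prior-inflation factor), not the authors' implementation.

```python
# Minimal sketch: use the nonprobability sample to centre an informative
# normal prior on the regression coefficients, then update it with the
# probability-sample likelihood (normal-normal conjugate update with a
# plugged-in residual variance).
import numpy as np

def posterior_coefficients(X_p, y_p, X_np, y_np, sigma2=1.0, tau=1.0):
    """Posterior mean and covariance of linear regression coefficients."""
    # OLS fit on the nonprobability sample -> prior mean and covariance
    beta_np, *_ = np.linalg.lstsq(X_np, y_np, rcond=None)
    V_np = sigma2 * np.linalg.inv(X_np.T @ X_np)
    prior_prec = np.linalg.inv(tau * V_np)       # tau inflates the prior variance

    # Likelihood contribution from the probability sample
    lik_prec = X_p.T @ X_p / sigma2
    post_cov = np.linalg.inv(prior_prec + lik_prec)
    post_mean = post_cov @ (prior_prec @ beta_np + X_p.T @ y_p / sigma2)
    return post_mean, post_cov
```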

2016 ◽  
Vol 2 (11) ◽  
Author(s):  
William Stewart

For modern linkage studies involving many small families, Stewart et al. (2009)[1] introduced an efficient estimator of disease gene location that averages location estimates from random subsamples of the dense SNP data. Their estimator has lower mean squared error than competing estimators and yields narrower confidence intervals (CIs) as well. However, when the number of families is small and the pedigree structure is large (possibly extended), the computational feasibility and statistical properties of this estimator are not known. We use simulation and real data to show that (1) for this extremely important but often overlooked study design, CIs based on the averaged estimator are narrower than CIs based on a single subsample, and (2) the reduction in CI length is proportional to the square root of the expected Monte Carlo error. As a proof of principle, we applied the estimator to the dense SNP data of four large, extended, specific language impairment (SLI) pedigrees, and reduced the single-subsample CI by 18%. In summary, confidence intervals based on this estimator should minimize re-sequencing costs beneath linkage peaks and reduce the number of candidate genes to investigate.
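
A minimal sketch of the subsample-averaging idea, not the authors' code: `estimate_location` is a hypothetical placeholder for a single-subsample linkage estimator, and the returned interval reflects only the Monte Carlo error of the average, whereas the paper's CIs also account for sampling variability.

```python
# Average a location estimate over B random marker subsamples and report a
# normal-approximation interval for the Monte Carlo error of that average.
import numpy as np

def averaged_location(markers, estimate_location, n_subsample, B=50, seed=1):
    rng = np.random.default_rng(seed)
    estimates = []
    for _ in range(B):
        sub = rng.choice(markers, size=n_subsample, replace=False)
        estimates.append(estimate_location(sub))   # one subsample's estimate
    estimates = np.asarray(estimates)
    mean = estimates.mean()
    se = estimates.std(ddof=1) / np.sqrt(B)        # Monte Carlo error of the mean
    return mean, (mean - 1.96 * se, mean + 1.96 * se)
```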


2021 ◽  
Vol 5 (1) ◽  
pp. 192-199
Author(s):  
Ronald Onyango ◽  
Brian Oduor ◽  
Francis Odundo ◽  
...  

The present study proposes a generalized mean estimator for a sensitive variable using a non-sensitive auxiliary variable in the presence of measurement errors based on the Randomized Response Technique (RRT). Expressions for the bias and mean squared error of the proposed estimator are derived up to the first order of approximation. Furthermore, the optimum conditions and minimum mean squared error for the proposed estimator are determined. The efficiency of the proposed estimator is studied both theoretically and numerically using simulated and real data sets. The numerical study reveals that the use of the Randomized Response Technique (RRT) in a survey contaminated with measurement errors increases the variances and mean squared errors of estimators of the finite population mean.
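
For illustration only, the sketch below applies additive scrambling (a simple RRT variant) with a ratio-type adjustment based on a non-sensitive auxiliary variable whose population mean is assumed known; it is not the paper's generalized estimator and it ignores measurement error.

```python
# Additive-scrambling RRT: respondents report z = y + s with E[s] = 0, so the
# scrambled mean estimates the sensitive mean; a ratio adjustment uses the
# auxiliary variable x with known population mean X_bar.
import numpy as np

def rrt_ratio_mean(y, x, X_bar, scramble_sd=1.0, seed=0):
    rng = np.random.default_rng(seed)
    z = y + rng.normal(0.0, scramble_sd, size=y.size)  # scrambled responses
    return z.mean() * (X_bar / x.mean())                # ratio-type estimate

rng = np.random.default_rng(42)
x = rng.gamma(4.0, 2.0, size=500)            # auxiliary variable, mean 8
y = 2.0 + 0.5 * x + rng.normal(0, 1, 500)    # sensitive variable
print(rrt_ratio_mean(y, x, X_bar=8.0))       # close to y.mean() on average
```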


2020 ◽  
Author(s):  
Ali Ghazizadeh ◽  
Frederic Ambroggi

Abstract Peri-event time histograms (PETH) are widely used to study correlations between experimental events and neuronal firing. The accuracy of the firing rate estimate obtained from a PETH depends on the choice of binsize. We show that the optimal binsize for a PETH depends on factors such as the number of trials and the temporal dynamics of the firing rate. These factors argue against the use of a one-size-fits-all binsize when making PETHs for an inhomogeneous population of neurons. Here we propose a binsize selection method by adapting the Akaike Information Criterion (AIC). Simulations show that optimal binsizes estimated by AIC closely match the optimal binsizes obtained using mean squared error (MSE). Furthermore, using real data, we find that optimal binning improves the detection of responses and their dynamics. Together, our analysis strongly supports optimal binning of PETHs and proposes a computationally efficient method for this optimization based on the AIC approach to model selection.
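
A hedged sketch of the binsize-selection idea: each candidate binsize is scored by the AIC of a piecewise-constant Poisson rate model fitted to the pooled spike times, with one free rate per bin. The exact AIC formulation in the paper may differ.

```python
# Score a candidate PETH binsize by 2k - 2*loglik, where k is the number of
# bins and loglik is the Poisson-process log-likelihood of the spike times
# under the bin-wise maximum likelihood rates.
import numpy as np

def aic_for_binsize(spike_times, n_trials, t_start, t_stop, binsize):
    edges = np.arange(t_start, t_stop + binsize, binsize)
    counts, _ = np.histogram(spike_times, bins=edges)     # pooled over trials
    lam = counts / (n_trials * binsize)                   # rate MLE per bin
    nz = counts > 0
    loglik = np.sum(counts[nz] * np.log(lam[nz])) - lam.sum() * n_trials * binsize
    k = len(counts)                                        # one free rate per bin
    return 2 * k - 2 * loglik

def best_binsize(spike_times, n_trials, t_start, t_stop, candidates):
    return min(candidates,
               key=lambda b: aic_for_binsize(spike_times, n_trials,
                                             t_start, t_stop, b))
```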


2021 ◽  
pp. 096228022110342
Author(s):  
Denis Talbot ◽  
Awa Diop ◽  
Mathilde Lavigne-Robichaud ◽  
Chantal Brisson

Background The change in estimate is a popular approach for selecting confounders in epidemiology. It is recommended in epidemiologic textbooks and articles over significance testing of coefficients, but concerns have been raised about its validity. Few simulation studies have been conducted to investigate its performance. Methods An extensive simulation study was conducted to compare different implementations of the change in estimate method. The implementations were also compared when estimating the association of body mass index with diastolic blood pressure in the PROspective Québec Study on Work and Health. Results All methods were liable to introduce substantial bias and to produce confidence intervals that included the true effect much less often than expected in at least some scenarios. Overall, mixed results were obtained regarding the accuracy of the estimators, as measured by the mean squared error. No implementation adequately differentiated confounders from non-confounders. In the real data analysis, none of the implementations decreased the estimated standard error. Conclusion Based on these results, it is questionable whether change in estimate methods are beneficial in general, considering their limited ability to improve the precision of estimates without introducing bias, and their inability to yield valid confidence intervals or to identify true confounders.
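
For concreteness, one common implementation of the change-in-estimate approach (backward deletion with a 10% threshold) might look like the sketch below; it assumes statsmodels and a pandas DataFrame with named exposure and candidate confounder columns, and it is not any of the specific implementations compared in the paper.

```python
# Backward-deletion change in estimate: repeatedly drop the candidate whose
# removal changes the exposure coefficient the least, as long as that change
# stays below the threshold (here 10%).
import statsmodels.api as sm

def change_in_estimate(df, outcome, exposure, candidates, threshold=0.10):
    kept = list(candidates)
    while kept:
        full = sm.OLS(df[outcome], sm.add_constant(df[[exposure] + kept])).fit()
        changes = {}
        for c in kept:
            reduced_vars = [exposure] + [v for v in kept if v != c]
            reduced = sm.OLS(df[outcome], sm.add_constant(df[reduced_vars])).fit()
            changes[c] = abs((reduced.params[exposure] - full.params[exposure])
                             / full.params[exposure])
        weakest = min(changes, key=changes.get)
        if changes[weakest] < threshold:   # dropping it barely moves the estimate
            kept.remove(weakest)
        else:
            break
    return kept
```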


Biometrika ◽  
2017 ◽  
Vol 104 (4) ◽  
pp. 845-861 ◽  
Author(s):  
Takamichi Baba ◽  
Takayuki Kanemori ◽  
Yoshiyuki Ninomiya

Summary For marginal structural models, which play an important role in causal inference, we consider a model selection problem within a semiparametric framework using inverse-probability-weighted estimation or doubly robust estimation. In this framework, the modelling target is a potential outcome that may be missing, so there is no classical information criterion. We define a mean squared error for treating the potential outcome and derive an asymptotically unbiased estimator of it as a $C_{p}$ criterion using an ignorable treatment assignment condition. Simulations show that the proposed criterion outperforms a conventional one by providing smaller squared errors and higher frequencies of selecting the true model in all the settings considered. Moreover, in a real-data analysis we found a clear difference between the two criteria.
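
As background for the setting, the sketch below shows plain inverse-probability-weighted estimation of a point-treatment marginal structural model; the proposed $C_{p}$ criterion itself is not reproduced, and the array names (`a`, `L`, `y`) are assumptions.

```python
# Inverse-probability weighting for a point-treatment MSM: estimate the
# propensity score, weight each unit by the inverse probability of its
# observed treatment, then fit the weighted marginal outcome model.
import numpy as np
from sklearn.linear_model import LinearRegression, LogisticRegression

def ipw_msm(a, L, y):
    ps = LogisticRegression(max_iter=1000).fit(L, a).predict_proba(L)[:, 1]
    w = np.where(a == 1, 1.0 / ps, 1.0 / (1.0 - ps))   # inverse-probability weights
    msm = LinearRegression().fit(a.reshape(-1, 1), y, sample_weight=w)
    # Under the usual identification assumptions: intercept ~ E[Y^0],
    # slope ~ average treatment effect.
    return msm.intercept_, msm.coef_[0]
```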


2020 ◽  
Vol 16 (4) ◽  
pp. 387-412
Author(s):  
Lafaiet Silva ◽  
Nádia Félix Silva ◽  
Thierson Rosa

Purpose This study aims to analyze Kickstarter data along with social media data from a data mining perspective. Kickstarter is a crowdfunding platform that is increasingly being adopted as a source for achieving the viability of projects. Despite its importance and growth in adoption, the success rate of crowdfunding campaigns was 47% in 2017, and it has decreased over the years. A way of increasing the chances of success of campaigns would be to predict, using machine learning techniques, whether a campaign will be successful. By applying classification models, it is possible to estimate whether or not a campaign will achieve success, and by applying regression models, the authors can forecast the amount of money to be funded. Design/methodology/approach The authors propose a solution in two phases, namely, launching and campaigning. As a result, the models are better suited to each point in time of a campaign's life cycle. Findings The authors produced a static predictor capable of classifying the campaigns with an accuracy of 71%. The regression method for phase one achieved a root mean squared error of 6.45. The dynamic classifier was able to achieve 85% accuracy before 10% of the campaign duration had elapsed, the equivalent of 3 days for a campaign of 30 days in length. At this same point in time, it achieved a forecasting performance of 2.5 root mean squared error. Originality/value The authors carry out this research presenting the results with a set of real data from a crowdfunding platform. The results are discussed according to the existing literature. This provides a comprehensive review, detailing important research directions for advancing this field of literature.
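
As a rough illustration of the launch-phase (static) classifier, a scikit-learn pipeline might look like the sketch below; the feature names are assumptions, not the authors' feature set.

```python
# Static launch-phase classifier: one-hot encode categorical campaign
# attributes, pass numeric attributes through, and fit a random forest.
from sklearn.compose import ColumnTransformer
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder

numeric = ["goal", "duration_days", "description_length"]   # assumed columns
categorical = ["category", "country"]                        # assumed columns

model = Pipeline([
    ("prep", ColumnTransformer([
        ("num", "passthrough", numeric),
        ("cat", OneHotEncoder(handle_unknown="ignore"), categorical),
    ])),
    ("clf", RandomForestClassifier(n_estimators=200, random_state=0)),
])

# Typical usage, given a feature frame X and success labels y:
# X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)
# model.fit(X_train, y_train); print(model.score(X_test, y_test))
```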


2017 ◽  
Vol 39 (3) ◽  
pp. 539 ◽  
Author(s):  
Ricardo Puziol Oliveira ◽  
Josmar Mazucheli ◽  
Jorge Alberto Achcar

Methods for generating a probability function from a probability density function have been widely used in recent years. In general, the discretization process produces probability functions that can rival traditional distributions used in the analysis of count data, such as the geometric, Poisson and negative binomial distributions. In this paper, using the method based on an infinite series, we study an alternative discrete Lindley distribution to those studied in Gomez (2011) and Bakouch (2014). For both distributions, a simulation study is carried out to examine the bias and mean squared error of the maximum likelihood estimators of the parameters, as well as the coverage probability and the width of the confidence intervals. For the discrete Lindley distribution obtained by the infinite series method we present the analytical expression for the bias reduction of the maximum likelihood estimator. Some examples using real data from the literature show the potential of these distributions.
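
A minimal sketch of the infinite-series discretization applied to the Lindley density f(x) = theta^2/(theta+1) * (1+x) * exp(-theta*x): the probability function is p(k) = f(k) / sum_j f(j), with the infinite series truncated numerically here.

```python
# Discrete Lindley pmf by the infinite-series method: evaluate the continuous
# density at the non-negative integers and normalize by the (truncated) sum.
import numpy as np

def discrete_lindley_pmf(k, theta, j_max=10_000):
    f = lambda x: theta**2 / (theta + 1) * (1 + x) * np.exp(-theta * x)
    norm = f(np.arange(j_max + 1)).sum()     # truncated infinite series
    return f(np.asarray(k)) / norm

pmf = discrete_lindley_pmf(np.arange(10), theta=0.8)
print(pmf, pmf.sum())   # first ten probabilities; the omitted tail is tiny
```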


2012 ◽  
Vol 61 (2) ◽  
pp. 277-290 ◽  
Author(s):  
Ádám Csorba ◽  
Vince Láng ◽  
László Fenyvesi ◽  
Erika Michéli

Nowadays there is a growing demand for the development and application of technologies and methods that enable rapid, cost-effective and environmentally friendly soil data collection and evaluation. Reflectance spectroscopy, which is based on reflectance measurements in the visible (VIS) and near-infrared (NIR) range of the electromagnetic spectrum (350–2500 nm), meets these needs. Given that the reflectance spectrum recorded on soils is very rich in information and that numerous soil constituents have characteristic spectral "fingerprints" in the studied range, a single curve makes it possible to determine a large number of key soil parameters simultaneously. In this paper we present the first steps of a methodological development, based on reflectance spectroscopy, aimed at determining soil composition. We built and tested predictive models based on multivariate statistical methods (partial least squares regression, PLSR) for estimating the organic carbon and CaCO3 content of soils. Testing of the models showed that the procedure gave high R2 values for both soil parameters [R2(organic carbon) = 0.815; R2(CaCO3) = 0.907]. The root mean squared error (RMSE) values indicating the accuracy of the estimation were moderate for both parameters [RMSE(organic carbon) = 0.467; RMSE(CaCO3) = 3.508], which can be improved considerably by standardizing the reflectance measurement protocols. Based on our investigations we concluded that the combined application of reflectance spectroscopy and multivariate chemometric methods yields a rapid and cost-effective method of data collection and evaluation.
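
A hedged sketch of the PLSR calibration described above, using scikit-learn; the array names and the choice of ten components are assumptions, and the paper's spectral preprocessing is omitted.

```python
# Calibrate a PLSR model mapping VIS-NIR reflectance spectra to a soil
# property (e.g. organic carbon) and report hold-out R^2 and RMSE.
import numpy as np
from sklearn.cross_decomposition import PLSRegression
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import train_test_split

def fit_plsr(spectra, soil_property, n_components=10, seed=0):
    X_tr, X_te, y_tr, y_te = train_test_split(spectra, soil_property,
                                              test_size=0.3, random_state=seed)
    pls = PLSRegression(n_components=n_components).fit(X_tr, y_tr)
    pred = pls.predict(X_te).ravel()
    rmse = np.sqrt(mean_squared_error(y_te, pred))
    return pls, r2_score(y_te, pred), rmse
```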


Author(s):  
Nadia Hashim Al-Noor ◽  
Shurooq A.K. Al-Sultany

In real situations, observations and measurements are not exact numbers but are more or less imprecise, also called fuzzy. In this paper, we therefore use approximate non-Bayesian computational methods to estimate the inverse Weibull parameters and reliability function from fuzzy data. The maximum likelihood and moment estimates are obtained as non-Bayesian estimates. The maximum likelihood estimators are derived numerically using two iterative techniques, namely the Newton-Raphson and Expectation-Maximization techniques. In addition, the estimates of the parameters and of the reliability function are compared numerically through a Monte Carlo simulation study in terms of their mean squared error and integrated mean squared error values, respectively.
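
For orientation, the sketch below fits the inverse Weibull distribution with density f(x) = a*b*x^(-b-1)*exp(-a*x^(-b)) to crisp (non-fuzzy) data by numerical maximum likelihood; the paper's fuzzy-data likelihood and the Newton-Raphson/EM derivations are not reproduced.

```python
# Numerical MLE for the inverse Weibull: minimize the negative log-likelihood
# over log-parameters (to keep a, b positive) and return a reliability function.
import numpy as np
from scipy.optimize import minimize

def inv_weibull_mle(x):
    x = np.asarray(x, dtype=float)

    def nll(params):
        a, b = np.exp(params)                       # positivity via log scale
        return -(len(x) * (np.log(a) + np.log(b))
                 - (b + 1) * np.log(x).sum()
                 - a * np.sum(x ** (-b)))

    res = minimize(nll, x0=np.zeros(2), method="Nelder-Mead")
    a_hat, b_hat = np.exp(res.x)
    reliability = lambda t: 1.0 - np.exp(-a_hat * t ** (-b_hat))  # R(t) = 1 - F(t)
    return a_hat, b_hat, reliability
```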

