Interval Estimation of Bivariate Correlations With Missing Data on Both Variables: A Bayesian Approach

1997 ◽  
Vol 22 (4) ◽  
pp. 407-424 ◽  
Author(s):  
Alan L. Gross

The posterior distribution of the bivariate correlation (ρxy) is analytically derived given a data set consisting of N1 cases measured on both x and y, N2 cases measured only on x, and N3 cases measured only on y. The posterior distribution is shown to be a function of the subsample sizes, the sample correlation (rxy) computed from the N1 complete cases, a set of four statistics which measure the extent to which the missing data are not missing completely at random, and the specified prior distribution for ρxy. A sampling study suggests that in small (N = 20) and moderate (N = 50) sized samples, posterior Bayesian interval estimates will dominate maximum likelihood based estimates in terms of coverage probability and expected interval widths when the prior distribution for ρxy is simply uniform on (0, 1). The advantage of the Bayesian method when more informative priors based on beta densities are employed is not as consistent.
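As a rough illustration of what a posterior interval for a correlation looks like, the sketch below evaluates a posterior for ρ on a grid under a uniform prior. It is not the analytic derivation described in the abstract: it uses only complete pairs, ignores the cases measured on a single variable, and assumes standardized bivariate normal data with known zero means and unit variances. The function names `rho_posterior_grid` and `equal_tailed_interval` and the simulated data are hypothetical.

```python
import numpy as np

def rho_posterior_grid(x, y, lo=0.0, hi=1.0, n_grid=2001):
    """Grid posterior for rho under a uniform prior on (lo, hi).

    Assumption: x and y are standardized and treated as bivariate normal
    with known zero means and unit variances; only complete pairs are used.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n = x.size
    sxx, syy, sxy = np.sum(x * x), np.sum(y * y), np.sum(x * y)
    # Drop the endpoints so that 1 - rho**2 stays strictly positive.
    rho = np.linspace(lo, hi, n_grid + 2)[1:-1]
    one_minus = 1.0 - rho ** 2
    log_post = (-0.5 * n * np.log(one_minus)
                - (sxx - 2.0 * rho * sxy + syy) / (2.0 * one_minus))
    post = np.exp(log_post - log_post.max())  # subtract max for stability
    post /= post.sum()                        # probability mass on the grid
    return rho, post

def equal_tailed_interval(rho, post, level=0.95):
    """Equal-tailed interval read off the cumulative grid mass."""
    cdf = np.cumsum(post)
    alpha = (1.0 - level) / 2.0
    lower = rho[np.searchsorted(cdf, alpha)]
    upper = rho[np.searchsorted(cdf, 1.0 - alpha)]
    return lower, upper

# Hypothetical data: 50 complete pairs simulated with true correlation 0.6.
rng = np.random.default_rng(0)
z = rng.multivariate_normal([0.0, 0.0], [[1.0, 0.6], [0.6, 1.0]], size=50)
x = (z[:, 0] - z[:, 0].mean()) / z[:, 0].std()
y = (z[:, 1] - z[:, 1].mean()) / z[:, 1].std()
rho, post = rho_posterior_grid(x, y)
print(equal_tailed_interval(rho, post))
```

The default support (0, 1) follows the uniform prior stated in the abstract; passing `lo=-1.0` would cover negative correlations as well.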

2021 ◽  
Vol 10 (3) ◽  
pp. 413-422
Author(s):  
Nur Azizah ◽  
Sugito Sugito ◽  
Hasbi Yasin

Hospital service facilities cannot be separated from queuing events. Queues are an unavoidable part of life, but they can be minimized with a good system. The purpose of this study was to assess the queuing system at Dr. Kariadi. The Bayesian method is used to combine previous research with this research in order to obtain new information. The sample distribution and prior distribution obtained from previous studies are combined with the sample likelihood function to obtain a posterior distribution. After calculating the posterior distribution, it was found that the queuing model in the outpatient installation at Dr. Kariadi Semarang is (G/G/c): (GD/∞/∞), where each polyclinic has met steady-state conditions and the busy rate is greater than the idle rate, so the queuing system at Dr. Kariadi is categorized as good, except in the internal medicine polyclinic.
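The steady-state condition mentioned in the abstract can be checked with the standard utilization formula for a c-server queue, ρ = λ / (cμ), which must be below 1. The sketch below is a minimal illustration under that assumption. The rates are made-up numbers, not values from the study, and reading "busier than idle" as ρ > 0.5 is an interpretation rather than something the abstract states.

```python
def utilization(arrival_rate, service_rate, servers):
    """Utilization rho = lambda / (c * mu) of a c-server queue."""
    if arrival_rate < 0 or service_rate <= 0 or servers < 1:
        raise ValueError("rates must be positive and servers >= 1")
    return arrival_rate / (servers * service_rate)

def is_steady_state(arrival_rate, service_rate, servers):
    """A multi-server queue is stable only when rho < 1."""
    return utilization(arrival_rate, service_rate, servers) < 1.0

def busier_than_idle(arrival_rate, service_rate, servers):
    """Assumed reading: busy share rho exceeds idle share 1 - rho."""
    return utilization(arrival_rate, service_rate, servers) > 0.5

# Hypothetical polyclinic: 12 patients/hour, 3 doctors serving 5/hour each.
rho = utilization(12.0, 5.0, 3)
print(rho, is_steady_state(12.0, 5.0, 3), busier_than_idle(12.0, 5.0, 3))
```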


2022 ◽  
Vol 2022 ◽  
pp. 1-8
Author(s):  
Xiaoying Lv ◽  
Ruonan Zhao ◽  
Tongsheng Su ◽  
Liyun He ◽  
Rui Song ◽  
...  

Objective. To explore the optimal fitting path of missing data of the Scale to make the fitting data close to the real situation of patients’ data. Methods. Based on the complete data set of the SDS of 507 patients with stroke, the data simulation sets of Missing Completely at Random (MCAR), Missing at Random (MAR), and Missing Not at Random (MNAR) were constructed by R software, respectively, with missing rates of 5%, 10%, 15%, 20%, 25%, 30%, 35%, and 40% under three missing mechanisms. Mean substitution (MS), random forest regression (RFR), and predictive mean matching (PMM) were used to fit the data. Root mean square error (RMSE), the width of 95% confidence intervals (95% CI), and Spearman correlation coefficient (SCC) were used to evaluate the fitting effect and determine the optimal fitting path. Results. When dealing with the problem of missing data in scales, the optimal fitting path is as follows. ① Under the MCAR mechanism, when the missing proportion is less than 20%, the MS method is the most convenient; when the missing proportion is greater than 20%, the RFR algorithm is the best fitting method. ② Under the MAR mechanism, when the missing proportion is less than 35%, the MS method is the most convenient. When the missing proportion is greater than 35%, RFR has a better correlation. ③ Under the MNAR mechanism, RFR is the best data fitting method, especially when the missing proportion is greater than 30%. In reality, when the missing proportion is small, the complete case deletion method is the most commonly used, but the RFR algorithm can greatly expand the application scope of samples and save the cost of clinical research when the missing proportion is less than 30%. The best way to deal with missing data should be based on the missing mechanism and proportion of the actual data, balancing the statistical analysis ability of the research team, the effectiveness of the method, and the understanding of readers.
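The simulation design described above (delete values from complete data, impute, score the imputations by RMSE) can be sketched for the simplest case, MCAR with mean substitution. This is a minimal illustration with NumPy, not the study's R code. The synthetic item scores, the helper names, and the 20% rate are assumptions chosen for the example.

```python
import numpy as np

def make_mcar(values, missing_rate, rng):
    """Blank out each cell independently with probability missing_rate."""
    values = np.asarray(values, dtype=float).copy()
    mask = rng.random(values.shape) < missing_rate
    values[mask] = np.nan
    return values, mask

def mean_substitute(values):
    """Replace each missing cell with the observed mean of its column."""
    filled = values.copy()
    col_means = np.nanmean(filled, axis=0)
    rows, cols = np.where(np.isnan(filled))
    filled[rows, cols] = col_means[cols]
    return filled

def rmse_on_missing(truth, filled, mask):
    """RMSE computed only over the cells that were deleted."""
    diff = truth[mask] - filled[mask]
    return float(np.sqrt(np.mean(diff ** 2)))

# Hypothetical complete data: 200 respondents, 20 items scored 1 to 4.
rng = np.random.default_rng(0)
truth = rng.integers(1, 5, size=(200, 20)).astype(float)
holed, mask = make_mcar(truth, 0.20, rng)
filled = mean_substitute(holed)
print(rmse_on_missing(truth, filled, mask))
```

MAR and MNAR masks would need the deletion probability to depend on observed or unobserved values respectively, which this sketch does not attempt.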


Author(s):  
Ahmad R. Alsaber ◽  
Jiazhu Pan ◽  
Adeeba Al-Hurban 

In environmental research, missing data are often a challenge for statistical modeling. This paper addressed some advanced techniques to deal with missing values in a data set measuring air quality using a multiple imputation (MI) approach. MCAR, MAR, and NMAR missing data techniques are applied to the data set. Five missing data levels are considered: 5%, 10%, 20%, 30%, and 40%. The imputation method used in this paper is an iterative imputation method, missForest, which is related to the random forest approach. Air quality data sets were gathered from five monitoring stations in Kuwait, aggregated to a daily basis. Logarithm transformation was carried out for all pollutant data, in order to normalize their distributions and to minimize skewness. We found high levels of missing values for NO2 (18.4%), CO (18.5%), PM10 (57.4%), SO2 (19.0%), and O3 (18.2%) data. Climatological data (i.e., air temperature, relative humidity, wind direction, and wind speed) were used as control variables for better estimation. The results show that the MAR technique had the lowest RMSE and MAE. We conclude that MI using the missForest approach has a high level of accuracy in estimating missing values. MissForest had the lowest imputation error (RMSE and MAE) among the other imputation methods and, thus, can be considered to be appropriate for analyzing air quality data.


Entropy ◽  
2021 ◽  
Vol 23 (8) ◽  
pp. 934
Author(s):  
Yuxuan Zhang ◽  
Kaiwei Liu ◽  
Wenhao Gui

For the purpose of improving the statistical efficiency of estimators in life-testing experiments, generalized Type-I hybrid censoring has lately been implemented by guaranteeing that experiments only terminate after a certain number of failures appear. With the wide applications of bathtub-shaped distribution in engineering areas and the recently introduced generalized Type-I hybrid censoring scheme, considering that there is no work coalescing this certain type of censoring model with a bathtub-shaped distribution, we consider the parameter inference under generalized Type-I hybrid censoring. First, estimations of the unknown scale parameter and the reliability function are obtained under the Bayesian method based on LINEX and squared error loss functions with a conjugate gamma prior. The comparison of estimations under the E-Bayesian method for different prior distributions and loss functions is analyzed. Additionally, Bayesian and E-Bayesian estimations with two unknown parameters are introduced. Furthermore, to verify the robustness of the estimations above, the Monte Carlo method is introduced for the simulation study. Finally, the application of the discussed inference in practice is illustrated by analyzing a real data set.
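To make the two loss functions concrete, the sketch below computes Bayes estimates of a positive parameter θ when its posterior is a gamma distribution with shape `a` and rate `b`. The abstract does not give the posterior for the censored bathtub-shaped model, so the gamma posterior and its parameter values here are assumptions. Under squared error loss the Bayes estimate is the posterior mean a/b. Under LINEX loss with parameter c it is −(1/c)·ln E[exp(−cθ)], which for a gamma posterior equals (a/c)·ln(1 + c/b) when c > −b.

```python
import math

def bayes_sel_gamma(a, b):
    """Squared error loss: posterior mean of Gamma(shape=a, rate=b)."""
    if a <= 0 or b <= 0:
        raise ValueError("shape and rate must be positive")
    return a / b

def bayes_linex_gamma(a, b, c):
    """LINEX loss with parameter c: -(1/c) * ln E[exp(-c * theta)].

    For Gamma(shape=a, rate=b), E[exp(-c * theta)] = (b / (b + c)) ** a,
    which is finite only when c > -b.
    """
    if a <= 0 or b <= 0:
        raise ValueError("shape and rate must be positive")
    if c == 0 or c <= -b:
        raise ValueError("c must be nonzero and greater than -b")
    return (a / c) * math.log1p(c / b)

# Hypothetical posterior parameters.
a, b = 5.0, 2.0
print(bayes_sel_gamma(a, b))         # posterior mean
print(bayes_linex_gamma(a, b, 1.0))  # positive c penalizes overestimation
```

With c > 0 the LINEX estimate sits below the posterior mean, and it approaches the posterior mean as c tends to 0.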


BJPsych Open ◽  
2018 ◽  
Vol 4 (6) ◽  
pp. 486-491 ◽  
Author(s):  
Christine Cocker ◽  
Helen Minnis ◽  
Helen Sweeting

Background. Routine screening to identify mental health problems in English looked-after children has been conducted since 2009 using the Strengths and Difficulties Questionnaire (SDQ). Aims. To investigate the degree to which data collection achieves screening aims (identifying scale of problem, having an impact on mental health) and the potential analytic value of the data set. Method. Department for Education data (2009–2017) were used to examine: aggregate, population-level trends in SDQ scores in 4/5- to 16/17-year-olds; representativeness of the SDQ sample; attrition in this sample. Results. Mean SDQ scores (around 50% ‘abnormal’ or ‘borderline’) were stable over 9 years. Levels of missing data were high (25–30%), as was attrition (28% retained for 4 years). Cross-sectional SDQ samples were not representative and longitudinal samples were biased. Conclusions. Mental health screening appears justified and the data set has research potential, but the English screening programme falls short because of missing data and inadequate referral routes for those with difficulties. Declaration of interest. None.


2013 ◽  
Vol 2013 ◽  
pp. 1-13 ◽  
Author(s):  
Helena Mouriño ◽  
Maria Isabel Barão

Missing-data problems are extremely common in practice. To achieve reliable inferential results, we need to take into account this feature of the data. Suppose that the univariate data set under analysis has missing observations. This paper examines the impact of selecting an auxiliary complete data set, whose underlying stochastic process is to some extent interdependent with the former, to improve the efficiency of the estimators for the relevant parameters of the model. The Vector AutoRegressive (VAR) Model has proved to be an extremely useful tool in capturing the dynamics of bivariate time series. We propose maximum likelihood estimators for the parameters of the VAR(1) Model based on a monotone missing-data pattern. The estimators’ precision is also derived. Afterwards, we compare the bivariate modelling scheme with its univariate counterpart. More precisely, the univariate data set with missing observations will be modelled by an AutoRegressive Moving Average (ARMA(2,1)) Model. We will also analyse the behaviour of the AutoRegressive Model of order one, AR(1), due to its practical importance. We focus on the mean value of the main stochastic process. By simulation studies, we conclude that the estimator based on the VAR(1) Model is preferable to those derived from the univariate context.


1979 ◽  
Vol 4 (1) ◽  
pp. 41-58 ◽  
Author(s):  
Thomas R. Knapp

This paper is an attempt to illustrate the generality of incidence sampling for estimating a parameter whose estimator preserves the unbiasedness of generalized symmetric means, a property which the sample covariance possesses but which the sample correlation coefficient does not. The problem of missing data is also addressed.


Entropy ◽  
2018 ◽  
Vol 20 (9) ◽  
pp. 642 ◽  
Author(s):  
Erlandson Saraiva ◽  
Adriano Suzuki ◽  
Luis Milan

In this paper, we study the performance of Bayesian computational methods to estimate the parameters of a bivariate survival model based on the Ali–Mikhail–Haq copula with marginal distributions given by Weibull distributions. The estimation procedure was based on Markov chain Monte Carlo (MCMC) algorithms. We present three versions of the Metropolis–Hastings algorithm: Independent Metropolis–Hastings (IMH), Random Walk Metropolis (RWM) and Metropolis–Hastings with a natural-candidate generating density (MH). Since the creation of a good candidate generating density in IMH and RWM may be difficult, we also describe how to update a parameter of interest using the slice sampling (SS) method. A simulation study was carried out to compare the performances of the IMH, RWM and SS. A comparison was made using the sample root mean square error as an indicator of performance. Results obtained from the simulations show that the SS algorithm is an effective alternative to the IMH and RWM methods when simulating values from the posterior distribution, especially for small sample sizes. We also applied these methods to a real data set.
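For readers unfamiliar with the Random Walk Metropolis variant named above, here is a minimal one-parameter sketch. It is a generic textbook form of the algorithm, not the authors' implementation. The target used in the example is a toy standard normal log density rather than the copula model's posterior, and the step size, seed, and function names are arbitrary choices.

```python
import numpy as np

def random_walk_metropolis(log_post, start, step, n_iter, rng):
    """Random Walk Metropolis for a scalar parameter.

    log_post: function returning the log posterior up to a constant.
    step: standard deviation of the symmetric normal proposal.
    """
    draws = np.empty(n_iter)
    current = float(start)
    current_lp = log_post(current)
    accepted = 0
    for i in range(n_iter):
        proposal = current + step * rng.normal()
        proposal_lp = log_post(proposal)
        # Symmetric proposal, so the acceptance ratio is the posterior ratio.
        if np.log(rng.random()) < proposal_lp - current_lp:
            current, current_lp = proposal, proposal_lp
            accepted += 1
        draws[i] = current
    return draws, accepted / n_iter

def toy_log_post(theta):
    """Toy target (assumption): standard normal log density, up to a constant."""
    return -0.5 * theta ** 2

rng = np.random.default_rng(1)
draws, rate = random_walk_metropolis(toy_log_post, 3.0, 2.4, 20000, rng)
kept = draws[2000:]  # discard burn-in
print(kept.mean(), kept.std(), rate)
```

The proposal scale `step` is the tuning difficulty the abstract alludes to: too small and the chain moves slowly, too large and most proposals are rejected.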


2020 ◽  
Vol 9 (4) ◽  
pp. 495-504
Author(s):  
Lifana Nugraeni ◽  
Sugito Sugito ◽  
Dwi Ispriyanti

Along with the times, transportation has progressed. Regarding means of transportation, one of the phenomena easily encountered in everyday life is the queue at public transportation facilities. One such queue is the train queue at Semarang Tawang Station. The number of trains passing through the station can make train service at the station busy. This study aims to determine whether the train service system of Semarang Tawang Station is good. This can be assessed with queuing methods, by determining the distributions of the arrival and service patterns to obtain a queuing system model and a system performance standard. In this study, the distributions of arrival patterns and service patterns are determined by calculating the posterior distribution using the Bayesian method. The Bayesian method was chosen because it can combine the sample distribution in the current study with previous information on the same case. The prior distribution and the likelihood function are the elements needed to obtain the posterior distribution. The distribution of arrival patterns and service patterns obtained from previous information follows the Poisson distribution. Based on the calculation of the posterior distribution, the result shows that the distribution of the arrival pattern is a discrete uniform distribution and the distribution of the service pattern is a Poisson distribution. The result shows that the train service system at Semarang Tawang Station has the model (Uniform Discrete / Gamma / 7: GD / ∞ / ∞) and has good service based on the performance values obtained.

