EM algorithm for longitudinal data with non-ignorable missing values: An application to health data

2016 ◽  
Vol 27 (2) ◽  
pp. 133-142
Author(s):  
Radia Taisir ◽  
M Ataharul Islam

Longitudinal studies involve repeated observations over time on the same experimental units, and missingness may occur in a non-ignorable fashion. For such longitudinal missing data, a Markov model may be used for the binary response, together with a suitable non-response model for the missing portion of the data. The primary interest is to estimate the effects of covariates on the binary response. A similar model for such incomplete longitudinal data exists, in which the regression parameters are estimated by a likelihood method that sums over all possible values of the missing responses. In this paper, we propose an expectation-maximization (EM) algorithm for estimating the regression parameters that is computationally simple and produces estimates about as efficient as those of the existing, more complex method. The existing and proposed estimation methods are compared by analysing the Health and Retirement Study (HRS) data of the United States.

Bangladesh J. Sci. Res. 27(2): 133-142, December-2014
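The EM idea described above can be sketched in miniature. The toy below assumes a plain logistic model for a binary response and, for simplicity, ignorable missingness rather than the paper's Markov/non-response formulation; the E-step fills each missing response with its conditional expectation and the M-step refits the model to the completed data. All parameter values and the simulated data are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data: binary response depending on one covariate; 30% of responses
# are then made missing (ignorably, for simplicity).
n = 500
x = rng.normal(size=n)
X = np.column_stack([np.ones(n), x])
beta_true = np.array([-0.5, 1.0])
y = rng.binomial(1, 1 / (1 + np.exp(-X @ beta_true))).astype(float)
miss = rng.random(n) < 0.3
y_obs = np.where(miss, np.nan, y)

def fit_logistic(X, y, iters=25):
    """Newton-Raphson for logistic regression; accepts fractional responses."""
    beta = np.zeros(X.shape[1])
    for _ in range(iters):
        p = 1 / (1 + np.exp(-X @ beta))
        beta = beta + np.linalg.solve(X.T @ (X * (p * (1 - p))[:, None]),
                                      X.T @ (y - p))
    return beta

# EM: the E-step replaces each missing response by its conditional expectation
# under the current parameters; the M-step refits the model.
beta = np.zeros(2)
for _ in range(50):
    p = 1 / (1 + np.exp(-X @ beta))
    y_fill = np.where(miss, p, y_obs)   # E-step
    beta = fit_logistic(X, y_fill)      # M-step

print(beta)  # close to beta_true
```

At convergence the filled-in values contribute zero score, so the EM estimate coincides with the observed-data MLE in this simplified setting.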

2019 ◽  
Vol 20 (2) ◽  
pp. 148-170 ◽  
Author(s):  
Jayabrata Biswas ◽  
Kiranmoy Das

There is a rich literature on the analysis of longitudinal data with missing values. However, the analysis becomes complex for a semi-continuous (zero-inflated) longitudinal response with missingness. In this article, we propose a partially varying-coefficients regression model for analysing such data. We use a two-part model: in the first part we propose a latent dynamic model to account for a ‘zero’ or a ‘non-zero’ response, and in the second part we use another dynamic model for estimating the mean trajectories of the non-zero responses. The two dynamic models are linked through subject-specific random effects. The missing covariates are imputed repeatedly based on their respective posterior predictive distributions, and the missing responses are imputed using the working model under different identifying restrictions. We analyse data from the Health and Retirement Study (HRS) for aged individuals and develop a dynamic model for predicting out-of-pocket medical expenditures (OOPME) containing excess zeros. The operating characteristics of the proposed model are investigated through extensive simulation studies.
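A minimal simulation of the two-part idea, with a shared subject-specific random effect linking the zero/non-zero part and the non-zero mean trajectory. All parameter values are illustrative, not the fitted HRS model.

```python
import numpy as np

rng = np.random.default_rng(4)

n_subj, n_time = 300, 5
b = rng.normal(scale=0.8, size=n_subj)        # shared subject-specific effect
p_nonzero = 1 / (1 + np.exp(-(-0.5 + b)))     # part 1: P(non-zero response)

y = np.zeros((n_subj, n_time))
for t in range(n_time):
    nonzero = rng.random(n_subj) < p_nonzero
    # part 2: log-normal mean trajectory for the non-zero responses,
    # sharing the same random effect b
    y[nonzero, t] = np.exp(1.0 + 0.1 * t + 0.5 * b[nonzero]
                           + 0.3 * rng.normal(size=nonzero.sum()))

zero_frac = (y == 0).mean()
print(zero_frac)  # roughly 0.6 of the responses are exact zeros
```

Because b enters both parts, subjects who are more likely to record a non-zero response also tend to have larger non-zero amounts, which is the linkage the article exploits.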


2019 ◽  
Vol 29 (5) ◽  
pp. 1288-1304 ◽  
Author(s):  
Tsung-I Lin ◽  
Wan-Lun Wang

Multivariate longitudinal data arising in medical studies often exhibit complex features such as censored responses, intermittent missing values, and atypical or outlying observations. The multivariate-t linear mixed model (MtLMM) has been recognized as a powerful tool for robust modeling of multivariate longitudinal data in the presence of potential outliers or fat-tailed noise. This paper presents a generalization of the MtLMM, called the MtLMM-CM, to properly adjust for censorship due to detection limits of the assay and for missingness embodied within multiple outcome variables recorded at irregular occasions. An expectation conditional maximization either (ECME) algorithm is developed to compute parameter estimates using the maximum likelihood (ML) approach. The asymptotic standard errors of the ML estimators of the fixed effects are obtained by inverting the empirical information matrix according to Louis' method. Techniques for the estimation of random effects and the imputation of missing responses are also investigated. The proposed methodology is illustrated on two real-world examples from HIV/AIDS studies and a simulation study under a variety of scenarios.
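The robustness that the t distribution buys can be seen in a one-dimensional toy: EM for the location and scale of a univariate t with known degrees of freedom, where the E-step weights automatically downweight outliers. This is only a sketch of the mechanism, not the multivariate ECME algorithm of the paper, and the data and degrees of freedom are made up.

```python
import numpy as np

rng = np.random.default_rng(5)

# 95% clean N(0,1) observations plus 5% gross outliers
x = np.concatenate([rng.normal(0.0, 1.0, 950), rng.normal(0.0, 10.0, 50)])
nu = 4.0                                       # degrees of freedom, held fixed

mu, s2 = x.mean(), x.var()                     # start from the non-robust fit
for _ in range(200):
    w = (nu + 1) / (nu + (x - mu) ** 2 / s2)   # E-step: latent scale weights
    mu = np.sum(w * x) / np.sum(w)             # CM-step: weighted location
    s2 = np.mean(w * (x - mu) ** 2)            # CM-step: weighted scale
print(mu, np.sqrt(s2))  # location near 0; scale far below the raw SD
```

Observations far from the current fit receive small weights, so the outliers barely influence the final location and scale, which is the same mechanism that makes the MtLMM robust.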


Biostatistics ◽  
2014 ◽  
Vol 15 (4) ◽  
pp. 731-744 ◽  
Author(s):  
Cong Xu ◽  
Paul D. Baines ◽  
Jane-Ling Wang

Joint modeling of survival and longitudinal data has been studied extensively in the recent literature. The likelihood approach is one of the most popular estimation methods employed within the joint modeling framework. Typically, the parameters are estimated by maximum likelihood, with computation performed by the expectation-maximization (EM) algorithm. However, one drawback of this approach is that standard error (SE) estimates are not automatically produced when using the EM algorithm. Many procedures have been proposed for obtaining the asymptotic covariance matrix of the parameters, but they are designed for settings where the number of parameters is small. In the joint modeling context, however, there may be an infinite-dimensional parameter, the baseline hazard function, which greatly complicates the problem, so the existing methods cannot be readily applied. The profile likelihood and bootstrap methods overcome the difficulty to some extent; however, they can be computationally intensive. In this paper, we propose two new methods for SE estimation using the EM algorithm that allow for more efficient computation of the SEs of a subset of parametric components in a semiparametric or high-dimensional parametric model. The precision and computation time are evaluated through a thorough simulation study. We conclude with an application of our SE estimation method to the analysis of an HIV clinical trial dataset.
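The finite-difference route to SEs that such methods build on can be illustrated on a toy model. Below, a right-censored exponential survival model (a stand-in for illustration, not the paper's joint model) has a closed-form MLE, and the SE is recovered by numerically differentiating the observed log-likelihood; the hazard and censoring values are invented.

```python
import numpy as np

rng = np.random.default_rng(1)

# Right-censored exponential survival data: true hazard 0.5, censoring at c=2.
n, lam_true, c = 1000, 0.5, 2.0
t = rng.exponential(1 / lam_true, n)
event = t <= c                      # True if the event was observed
obs = np.minimum(t, c)              # observed (possibly censored) time

def loglik(lam):
    # events contribute log(lam) - lam*t; censored observations contribute -lam*c
    return event.sum() * np.log(lam) - lam * obs.sum()

lam_hat = event.sum() / obs.sum()   # closed-form MLE

# SE from the numerically differentiated observed information (central difference)
h = 1e-4
info = -(loglik(lam_hat + h) - 2 * loglik(lam_hat) + loglik(lam_hat - h)) / h**2
se = 1 / np.sqrt(info)
print(lam_hat, se)  # se matches the analytic lam_hat / sqrt(number of events)
```

With one scalar parameter this is trivial; the paper's contribution is making this kind of computation efficient when only a subset of a semiparametric model's components is of interest.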


2020 ◽  
Vol 2020 ◽  
pp. 1-11
Author(s):  
Ferhat Bingöl

Wind farm siting relies on in situ measurements and statistical analysis of the wind distribution. The current statistical methods are based on distribution functions, and the one known to provide the best fit to the nature of the wind is the Weibull distribution function. It is relatively straightforward to parameterize wind resources with the Weibull function if the distribution fits what the function represents, but the estimation becomes complicated if the wind distribution is diverse in terms of speed and direction. In this study, data from a 101 m meteorological mast were used to test several estimation methods. The available data display seasonal variations, with low wind speeds in different seasons and the effects of moderately complex surroundings. The results show that the maximum likelihood method is much more successful than the industry-standard WAsP method when diverse winds with a high proportion of low wind speeds occur.
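Weibull parameterization by maximum likelihood reduces to a one-dimensional fixed-point problem in the shape k, after which the scale A follows in closed form. A sketch on synthetic wind speeds (the k = 2, A = 8 m/s values are illustrative only, not the mast data of the study):

```python
import numpy as np

rng = np.random.default_rng(2)

# Synthetic wind speeds from a Weibull(shape k=2, scale A=8 m/s) distribution.
k_true, A_true = 2.0, 8.0
v = A_true * rng.weibull(k_true, size=5000)

def weibull_mle(x, iters=200, tol=1e-10):
    """Maximum-likelihood fit of the two-parameter Weibull: fixed-point
    iteration (damped for stability) on the shape k; the scale A then
    follows in closed form."""
    lx = np.log(x)
    k = 1.0
    for _ in range(iters):
        xk = x ** k
        k_new = 1.0 / (np.sum(xk * lx) / np.sum(xk) - lx.mean())
        k_next = 0.5 * (k + k_new)      # damped update
        if abs(k_next - k) < tol:
            break
        k = k_next
    A = np.mean(x ** k) ** (1.0 / k)
    return k, A

k_hat, A_hat = weibull_mle(v)
print(k_hat, A_hat)  # close to 2.0 and 8.0
```

On real mast data the fit would be computed per direction sector and season, which is where the diversity the study describes makes simple moment-based methods struggle.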


2002 ◽  
Vol 53 (5) ◽  
pp. 869 ◽  
Author(s):  
Richard McGarvey ◽  
Andrew H. Levings ◽  
Janet M. Matthews

The growth of Australian giant crabs, Pseudocarcinus gigas, has not previously been studied. A tagging program was undertaken in four Australian states where the species is subject to commercial exploitation. Fishers reported a recapture sample of 1372 females and 383 males from commercial harvest, of which 190 females and 160 males had moulted at least once. Broad-scale modes of growth increment were readily identified and interpreted as 0, 1 and 2 moults during time at large. Single-moult increments were normally distributed for six of seven data sets. Moult increments were constant with length for males and declined slowly for three of four female data sets. Seasonality of moulting in South Australia was inferred from monthly proportions captured with newly moulted shells. Female moulting peaked strongly in winter (June and July); male moulting peaked in summer (November and December). Intermoult period estimates for P. gigas varied from 3 to 4 years for juvenile males and females (80–120 mm carapace length, CL), with rapid lengthening in time between moulting events to approximately 7 years for females and 4.5 years for males at the legal minimum length of 150 mm CL. New moulting growth estimation methods include a generalization of the anniversary method for estimating intermoult period that uses (rather than rejects) most capture–recapture data and a multiple likelihood method for assigning recaptures to their most probable number of moults during time at large.
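The idea of assigning each recapture to its most probable number of moults can be sketched with a simple maximum-likelihood rule, assuming normally distributed single-moult increments. The mu, sigma and measurement-error values below are hypothetical, not the fitted P. gigas estimates.

```python
import numpy as np

# Assign a recapture to 0, 1 or 2 moults by maximum likelihood, assuming a
# single-moult increment of Normal(mu, sigma) mm and a measurement-error sd
# of sd0 for the zero-moult class (all numbers hypothetical).
mu, sigma, sd0 = 12.0, 3.0, 1.0

def norm_pdf(x, m, s):
    return np.exp(-0.5 * ((x - m) / s) ** 2) / (s * np.sqrt(2 * np.pi))

def assign_moults(increment):
    """Return the most probable moult count (0, 1 or 2) for a growth increment."""
    like = [
        norm_pdf(increment, 0.0, sd0),                      # no moult
        norm_pdf(increment, mu, sigma),                     # one moult
        norm_pdf(increment, 2 * mu, np.sqrt(2) * sigma),    # two moults
    ]
    return int(np.argmax(like))

print([assign_moults(d) for d in (0.5, 11.0, 26.0)])  # → [0, 1, 2]
```

The two-moult variance is taken as the sum of two independent single-moult variances; the broad-scale increment modes reported above are what make this classification well separated in practice.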


Author(s):  
Caio Ribeiro ◽  
Alex A. Freitas

Longitudinal datasets from human ageing studies usually have a high volume of missing data, and one way to handle missing values in a dataset is to replace them with estimations. However, there are many methods for estimating missing values, and no single method is best for all datasets. In this article, we propose a data-driven missing value imputation approach that performs a feature-wise selection of the best imputation method, using known information in the dataset to rank the five methods we selected based on their estimation error rates. We evaluated the proposed approach in two sets of experiments: a classifier-independent scenario, where we compared the applicability and error rates of each imputation method; and a classifier-dependent scenario, where we compared the predictive accuracy of Random Forest classifiers generated from datasets prepared with each imputation method and with a baseline approach of doing no imputation (letting the classification algorithm handle the missing values internally). Based on the results from both sets of experiments, we conclude that the proposed data-driven missing value imputation approach generally produced models with more accurate estimates of the missing data and better-performing classifiers on longitudinal datasets of human ageing. We also observed that imputation methods devised specifically for longitudinal data gave very accurate estimates. This reinforces the idea that using the temporal information intrinsic to longitudinal data is a worthwhile endeavour for machine learning applications, and that this can be achieved through the proposed data-driven approach.
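The feature-wise selection idea can be sketched as follows: mask entries whose true values are known, score each candidate imputation method by its error on the masked entries, and keep the winner per feature. The sketch compares only two simple methods, mean imputation and last-observation-carried-forward, on synthetic autocorrelated data (the article itself ranks five methods on real ageing datasets).

```python
import numpy as np

rng = np.random.default_rng(3)

# Synthetic longitudinal feature: 4 waves per subject, strongly autocorrelated.
n_subj, n_wave = 200, 4
base = rng.normal(size=(n_subj, 1))
X = base + 0.3 * rng.normal(size=(n_subj, n_wave))

def impute_mean(col, miss):
    out = col.copy()
    out[miss] = col[~miss].mean()
    return out

def impute_locf(mat, wave, miss):
    # last observation carried forward (wave 0 falls back to the mean)
    out = mat[:, wave].copy()
    if wave == 0:
        out[miss] = out[~miss].mean()
    else:
        out[miss] = mat[miss, wave - 1]
    return out

# Mask 20% of the known wave-3 entries and score each method on them.
wave = 3
probe = rng.random(n_subj) < 0.2
truth = X[probe, wave]
err_mean = np.mean(np.abs(impute_mean(X[:, wave], probe)[probe] - truth))
err_locf = np.mean(np.abs(impute_locf(X, wave, probe)[probe] - truth))
best = 'locf' if err_locf < err_mean else 'mean'
print(err_mean, err_locf, best)  # LOCF wins on autocorrelated data
```

Because consecutive waves are highly correlated here, the longitudinal method wins, mirroring the article's observation that imputation methods exploiting temporal structure tend to be the most accurate.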

