Ensemble estimation and variable selection with semiparametric regression models

Summary: We consider scenarios in which the likelihood function for a semiparametric regression model factors into separate components, with an efficient estimator of the regression parameter available for each component. An optimal weighted combination of the component estimators, named an ensemble estimator, may be employed as an overall estimate of the regression parameter, and may be fully efficient under uncorrelatedness conditions. This approach is useful when the full likelihood function may be difficult to maximize, but the components are easy to maximize. It covers settings where the nuisance parameter may be estimated at different rates in the component likelihoods. As a motivating example we consider proportional hazards regression with prospective doubly censored data, in which the likelihood factors into a current status data likelihood and a left-truncated right-censored data likelihood. Variable selection is important in such regression modelling, but the applicability of existing techniques is unclear in the ensemble approach. We propose ensemble variable selection using the least squares approximation technique on the unpenalized ensemble estimator, followed by ensemble re-estimation under the selected model. The resulting estimator has the oracle property such that the set of nonzero parameters is successfully recovered and the semiparametric efficiency bound is achieved for this parameter set. Simulations show that the proposed method performs well relative to alternative approaches. Analysis of an AIDS cohort study illustrates the practical utility of the method.
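
The weighted combination described in the summary can be illustrated with a minimal sketch. It assumes the component estimators are uncorrelated, in which case the classical optimal weights are the inverse covariance matrices; the weights used in the paper itself may differ. The function name `ensemble_estimate` and all numerical values (standing in for current status and left-truncated right-censored component estimates) are hypothetical.

```python
import numpy as np

def ensemble_estimate(estimates, covariances):
    """Combine uncorrelated component estimators by inverse-covariance weighting.

    Assumption: the components are uncorrelated, so the minimum-variance
    linear combination weights each estimate by its precision matrix.
    """
    precisions = [np.linalg.inv(np.atleast_2d(V)) for V in covariances]
    total_precision = sum(precisions)
    combined_cov = np.linalg.inv(total_precision)
    weighted_sum = sum(P @ np.atleast_1d(b) for P, b in zip(precisions, estimates))
    return combined_cov @ weighted_sum, combined_cov

# Hypothetical component estimates of a two-dimensional regression parameter.
beta_cs = np.array([0.40, -0.10])    # e.g. from a current status likelihood
cov_cs = np.diag([0.04, 0.09])
beta_ltrc = np.array([0.60, 0.10])   # e.g. from a left-truncated right-censored likelihood
cov_ltrc = np.diag([0.01, 0.09])

beta_hat, cov_hat = ensemble_estimate([beta_cs, beta_ltrc], [cov_cs, cov_ltrc])
print(beta_hat)
print(np.diag(cov_hat))
```

The first coordinate is pulled toward the more precise component, while the second, where both components are equally precise, is a simple average.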

Semiparametric analysis of failure time data with complex structures

10.32469/10355/57411 ◽

2016 ◽

Author(s):

Yeqian Liu

Keyword(s):

Regression Analysis ◽

Censored Data ◽

Failure Time ◽

Current Status Data ◽

Current Status ◽

Failure Time Data ◽

Interval Censored Data ◽

Time Data ◽

Right Censored Data ◽

Interval Censored

[ACCESS RESTRICTED TO THE UNIVERSITY OF MISSOURI AT AUTHOR'S REQUEST.] Failure time data arise in many fields including biomedical studies and industrial life testing. Right-censored failure time data are often observed from a cohort of prevalent cases that are subject to length-biased sampling; such data are termed length-biased and right-censored data. Interval-censored failure time data arise when the failure time of interest in a survival study is not exactly observed but known only to fall within some interval or window. One area that often produces such data is medical studies with periodic follow-ups, in which the medical condition of interest, such as the onset of a disease, is only known to occur between two adjacent examination times. An important special case of interval-censored data is current status data, which arise when each study subject is observed only once and the only information available is whether the failure event of interest has occurred or not by the observation time. Sometimes we also refer to current status data as case I interval-censored data and the general case as case II interval-censored data. Semiparametric regression analysis of both right-censored and interval-censored failure time data has recently attracted a great deal of attention. Many procedures have been proposed for their regression analysis under various models. However, in many settings, the population includes a cured (nonsusceptible) subpopulation, and only individuals in the susceptible subpopulation will go on to experience the event. Since classical survival models implicitly assume that all individuals will eventually experience the event of interest, they cannot be used in such contexts. They would in fact lead to incorrect results such as, among others, an overestimation of the survival of the non-cured subjects.
The research in this dissertation focuses on the statistical analysis of right-censored data with length-biased sampling and of interval-censored data with a cured subgroup in the presence of potential dependent censoring and measurement errors. Chapter 1 describes specific examples of right-censored and interval-censored failure time data and reviews the literature on some important topics, including nonparametric and semiparametric estimation and regression analysis in the presence of length-biased sampling and of a cured subgroup, respectively. Chapter 2 discusses regression analysis of length-biased and right-censored data with partially linear varying effects. For this problem, we consider quantile regression analysis of right-censored and length-biased data and present a semiparametric varying coefficient partially linear model. For estimation of regression parameters, a three-stage procedure that makes use of the inverse probability weighted technique is developed, and the asymptotic properties of the resulting estimators are established. In addition, the approach allows the censoring variable to depend on covariates, while most of the existing methods assume independence between censoring variables and covariates. A simulation study is conducted and suggests that the proposed approach works well in practical situations. An illustrative example is also provided. Chapter 3 considers regression analysis of current status data in the presence of a cured subgroup and dependent censoring. For this problem, we develop a sieve maximum likelihood estimation approach with the use of latent variables and Bernstein polynomials. For the determination of the proposed estimators, an EM algorithm is developed, and the asymptotic properties of the estimators are established. An extensive simulation study conducted to assess the finite sample performance of the method indicates that it performs well in practical situations.
An illustrative example using a data set from a tumor toxicological study is provided. Chapter 4 considers regression analysis of interval-censored data in the presence of a cured subgroup and the case where one or more explanatory variables in the model are subject to measurement errors. These errors should be taken into account in the estimation of the model to avoid biased estimates. A general approach that exists in the literature is the SIMEX algorithm, a simulation-based method that allows one to estimate the effect of measurement error on the bias of the estimators and to reduce this bias. We extend the SIMEX approach to the mixture cure model with interval-censored data. A comprehensive simulation study and a real data application are provided. Several directions for future research are discussed in Chapter 5.
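
The SIMEX idea mentioned in the abstract can be sketched on a much simpler problem than the one treated in the dissertation. The example below swaps in ordinary least squares with one error-prone covariate, not the mixture cure model with interval-censored data. It assumes the measurement error standard deviation `sigma_u` is known and uses a quadratic extrapolant; the helper names `ols_slope` and `simex_slope` and the simulated data are hypothetical.

```python
import numpy as np

def ols_slope(x, y):
    """Least squares slope of y on x (with an intercept)."""
    xc = x - x.mean()
    return float(xc @ (y - y.mean()) / (xc @ xc))

def simex_slope(w, y, sigma_u, lambdas=(0.5, 1.0, 1.5, 2.0), n_sim=100, seed=0):
    """SIMEX sketch: add extra error, track the slope, extrapolate to lambda = -1."""
    rng = np.random.default_rng(seed)
    lam_grid = [0.0]
    slopes = [ols_slope(w, y)]  # naive estimate at lambda = 0
    for lam in lambdas:
        # Simulation step: inflate the error variance by a factor (1 + lam).
        sims = [
            ols_slope(w + np.sqrt(lam) * sigma_u * rng.standard_normal(w.size), y)
            for _ in range(n_sim)
        ]
        lam_grid.append(lam)
        slopes.append(np.mean(sims))
    # Extrapolation step: quadratic in lambda, evaluated at lambda = -1 (no error).
    coef = np.polyfit(lam_grid, slopes, deg=2)
    return float(np.polyval(coef, -1.0))

# Simulated data with true slope 1 and a covariate observed with error.
rng = np.random.default_rng(1)
n = 5000
x = rng.standard_normal(n)
y = 1.0 * x + 0.5 * rng.standard_normal(n)
sigma_u = 0.5
w = x + sigma_u * rng.standard_normal(n)  # error-prone version of x

naive = ols_slope(w, y)
corrected = simex_slope(w, y, sigma_u)
print(naive, corrected)
```

In this setup the naive slope is attenuated toward zero, and the extrapolated value moves back toward the true slope. The quadratic extrapolant reduces the bias but does not remove it entirely.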

Bi-level variable selection in semiparametric transformation mixture cure models for right-censored data

Communications in Statistics - Simulation and Computation ◽

10.1080/03610918.2021.1926499 ◽

2021 ◽

pp. 1-20

Author(s):

Jingjing Wu ◽

Xuewen Lu ◽

Wenyan Zhong

Keyword(s):

Variable Selection ◽

Censored Data ◽

Right Censored Data ◽

Cure Models ◽

Semiparametric Transformation ◽

Level Variable

Semiparametric regression of clustered current status data

Journal of Applied Statistics ◽

10.1080/02664763.2018.1564022 ◽

2019 ◽

Vol 46 (10) ◽

pp. 1724-1737

Author(s):

Yanqin Feng ◽

Shurong Lin ◽

Yang Li

Keyword(s):

Semiparametric Regression ◽

Current Status Data ◽

Current Status ◽

Status Data

Nonparametric estimation for a current status right‐censored data model

Statistica Neerlandica ◽

10.1111/stan.12180 ◽

2019 ◽

Vol 73 (4) ◽

pp. 475-495 ◽

Cited By ~ 1

Author(s):

Sergey V. Malov

Keyword(s):

Censored Data ◽

Nonparametric Estimation ◽

Data Model ◽

Current Status ◽

Right Censored Data ◽

A Current

Statistical analysis of bivariate interval-censored failure time data

10.32469/10355/47078 ◽

2015 ◽

Author(s):

Qingning Zhou

Keyword(s):

Statistical Analysis ◽

Censored Data ◽

Failure Time ◽

Current Status Data ◽

Current Status ◽

Interval Censored Data ◽

Time Data ◽

Failure Times ◽

Interval Censored ◽

Status Data

[ACCESS RESTRICTED TO THE UNIVERSITY OF MISSOURI AT REQUEST OF AUTHOR.] Interval-censored failure time data arise when the failure time of interest in a survival study is not exactly observed but known only to fall within some interval. One area that often produces such data is medical studies with periodic follow-ups, in which the medical condition of interest, such as the onset of a disease, is only known to occur between two adjacent examination times. An important special case of interval-censored data is current status data, which arise when each study subject is observed only once and the only information available is whether the failure event of interest has occurred or not by the observation time. The areas that often yield such data include tumorigenicity experiments and cross-sectional studies. Sometimes we refer to current status data as case I interval-censored data, and the general case as case II interval-censored data. The analysis of both case I and case II interval-censored data has recently attracted a great deal of attention, and many procedures have been proposed for various issues related to it. However, there are still a number of problems that remain unsolved or lack approaches that are simpler, more efficient and apply to more general situations compared to the existing ones. This is especially the case for multivariate interval-censored data, which arise if there are multiple failure times of interest and all of them suffer interval censoring. This dissertation focuses on the statistical analysis of bivariate interval-censored data, including regression analysis, model selection and estimation of the association between failure times.
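
The definition of current status (case I interval-censored) data given in the abstract can be made concrete with a small sketch. The function name `to_current_status` and the numbers are hypothetical. In a real study the failure times are never observed; they appear here only to show what information survives.

```python
import numpy as np

def to_current_status(failure_times, observation_times):
    """Reduce failure times to current status data.

    Each subject is observed once at time C. The only information kept is
    delta = 1 if the failure has occurred by C, else 0. Equivalently, the
    failure time is known to lie in (0, C] when delta = 1 (left-censored)
    and in (C, inf) when delta = 0 (right-censored).
    """
    t = np.asarray(failure_times, dtype=float)
    c = np.asarray(observation_times, dtype=float)
    delta = (t <= c).astype(int)
    left = np.where(delta == 1, 0.0, c)      # lower endpoint of the interval
    right = np.where(delta == 1, c, np.inf)  # upper endpoint of the interval
    return delta, left, right

# Three hypothetical subjects: latent failure times and single observation times.
delta, left, right = to_current_status([2.0, 5.0, 1.5], [3.0, 4.0, 1.5])
print(delta)   # failure indicator at the observation time
print(left)    # interval lower endpoints
print(right)   # interval upper endpoints
```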

Variable selection in the high-dimensional continuous generalized linear model with current status data

Journal of Applied Statistics ◽

10.1080/02664763.2013.840271 ◽

2013 ◽

Vol 41 (3) ◽

pp. 467-483 ◽

Cited By ~ 15

Author(s):

Guo-Liang Tian ◽

Mingqiu Wang ◽

Lixin Song

Keyword(s):

Variable Selection ◽

Linear Model ◽

Generalized Linear Model ◽

Current Status Data ◽

Current Status ◽

High Dimensional ◽

Status Data

Sieve Maximum Likelihood Estimator for Semiparametric Regression Models With Current Status Data

Journal of the American Statistical Association ◽

10.1198/016214504000000313 ◽

2004 ◽

Vol 99 (466) ◽

pp. 346-356 ◽

Cited By ~ 54

Author(s):

Hongqi Xue ◽

K. F. Lam ◽

Guoying Li

Keyword(s):

Maximum Likelihood ◽

Maximum Likelihood Estimator ◽

Regression Models ◽

Semiparametric Regression ◽

Current Status Data ◽

Current Status ◽

Likelihood Estimator ◽

Status Data

SEMIPARAMETRIC REGRESSION ESTIMATES BASED ON SOME TRANSFORMATION TECHNIQUES FOR RIGHT-CENSORED DATA

Eskişehir Technical University Journal of Science and Technology A - Applied Sciences and Engineering ◽

10.18038/estubtda.632694 ◽

2019 ◽

Author(s):

Ersin YILMAZ ◽

Dursun Aydın

Keyword(s):

Censored Data ◽

Semiparametric Regression ◽

Transformation Techniques ◽

Right Censored Data

Statistical analysis of clustered or multivariate interval-censored failure time data

10.32469/10355/70687 ◽

2018 ◽

Author(s):

Chenchen Ma

Keyword(s):

Censored Data ◽

Failure Time ◽

Observation Time ◽

Current Status Data ◽

Current Status ◽

Failure Time Data ◽

Interval Censored Data ◽

Time Data ◽

Interval Censored ◽

Status Data

[ACCESS RESTRICTED TO THE UNIVERSITY OF MISSOURI AT AUTHOR'S REQUEST.] Interval-censored failure time data are a type of failure time data that often occur in, among other settings, clinical trials with periodic follow-ups. In this case, the failure time of interest cannot be observed exactly, but is known to be greater than the last observation time at which the failure has not occurred and less than or equal to the first observation time at which the failure has been observed to occur. Current status data are a special case of interval-censored data that we sometimes refer to as case I interval-censored data. This type of censoring means that each subject is observed only once for the status of occurrence of the failure of interest, so the survival time is either left- or right-censored. Chapter 4 considers the same problem as that in Chapter 3. There are multivariate current status data with informative censoring. We propose to use vine copulas to describe the dependence among the failure times of interest and the censoring time. The proposed estimators are shown to be strongly consistent, and the asymptotic normality and efficiency of the regression parameter estimator are established. To assess the finite sample performance of the presented methodology, an extensive simulation study is performed and suggests that the method works well in practical situations. Finally, the proposed approach is applied to a tumorigenicity experiment. Several directions for future research are discussed in Chapter 5.

Consistency of the Semi-parametric MLE under the Cox Model with Right-Censored Data

The Open Mathematics, Statistics and Probability Journal ◽

10.2174/2666148902010010021 ◽

2020 ◽

Vol 10 (1) ◽

pp. 21-27

Author(s):

Qiqing Yu

Keyword(s):

Maximum Likelihood ◽

Censored Data ◽

Cox Regression ◽

Likelihood Function ◽

Cox Model ◽

Likelihood Estimator ◽

Right Censored Data ◽

Information Inequality ◽

Leibler Information ◽

Consistency Proofs

Objective: We studied the consistency of the semi-parametric maximum likelihood estimator (SMLE) under the Cox regression model with right-censored (RC) data. Methods: Consistency proofs of the MLE are often based on the Shannon-Kolmogorov inequality, which requires finite E(lnL), where L is the likelihood function. Results: The results of this study show that one property of the semi-parametric MLE (SMLE) is established. Conclusion: Under the Cox model with RC data, E(lnL) may not exist. We used the Kullback-Leibler information inequality in our proof.
