Statistical Methods in Medical Research
Latest Publications

Total documents: 2561 (five years: 461)
H-index: 87 (five years: 5)
Published by: SAGE Publications
ISSN: 0962-2802 (print); 1477-0334 (online)

2022, pp. 096228022110417
Author(s): Kian Wee Soh, Thomas Lumley, Cameron Walker, Michael O’Sullivan

In this paper, we present a new model averaging technique that can be applied in medical research. The dataset is first partitioned by the values of its categorical explanatory variables. Then, for each partition, a model average is determined by minimising some form of squared error, such as the leave-one-out cross-validation error. From our asymptotic optimality study and the results of simulations, we demonstrate, under several high-level assumptions and modelling conditions, that this model averaging procedure may outperform jackknife model averaging, which is a well-established technique. We also present an example where a cross-validation procedure does not work (that is, a zero-valued cross-validation error is obtained) when determining the weights for model averaging.
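To make the weighting step concrete, here is a minimal sketch (not the authors' code; the data, candidate models, and variable names are invented for illustration) that chooses model-averaging weights within a single partition by minimising the leave-one-out cross-validation squared error over a simplex of weights:

```python
# Minimal sketch: LOO-CV model-averaging weights within one partition.
# Illustrative only -- all data are simulated.
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)
n = 100
X = rng.normal(size=(n, 3))
y = X @ np.array([1.0, 0.5, 0.0]) + rng.normal(size=n)

# Candidate nested linear models: the first k columns of X.
candidates = [X[:, :k] for k in (1, 2, 3)]

def loo_predictions(Z, y):
    """Leave-one-out fitted values for OLS via the hat-matrix shortcut."""
    H = Z @ np.linalg.solve(Z.T @ Z, Z.T)
    resid = y - H @ y
    return y - resid / (1.0 - np.diag(H))

P = np.column_stack([loo_predictions(Z, y) for Z in candidates])

def cv_error(w):
    return np.mean((y - P @ w) ** 2)

m = P.shape[1]
res = minimize(cv_error, np.full(m, 1.0 / m),
               bounds=[(0, 1)] * m,
               constraints={"type": "eq", "fun": lambda w: w.sum() - 1.0})
print("model-averaging weights:", res.x.round(3))
```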


2022, pp. 096228022110651
Author(s): Mohammed Baragilly, Brian Harvey Willis

Tailored meta-analysis uses setting-specific knowledge of the test-positive rate and disease prevalence to constrain the possible values for a test's sensitivity and specificity. The constrained region is used to select the studies relevant to the setting for meta-analysis using an unconstrained bivariate random effects model (BRM). However, sometimes there may be no studies to aggregate, or the summary estimate may lie outside the plausible or “applicable” region. These shortcomings may potentially be overcome by incorporating the constraints into the BRM to produce a constrained model. Using a penalised likelihood approach, we developed an optimisation algorithm based on coordinate ascent and Newton-Raphson iteration to fit a constrained bivariate random effects model (CBRM) for meta-analysis. Using numerical examples based on simulation studies and real datasets, we compared its performance with the BRM in terms of bias, mean squared error and coverage probability. We also determined the ‘closeness’ of the estimates to their true values using the Euclidean and Mahalanobis distances. The CBRM produced estimates which in the majority of cases had lower absolute mean bias and greater coverage probability than the BRM. The estimated sensitivities and specificities from the CBRM were, in general, closer to the true values than those from the BRM. For the two real datasets, the CBRM produced estimates which were in the applicable region, in contrast to the BRM. When combining setting-specific data with test accuracy meta-analysis, a constrained model is more likely than an unconstrained model to yield a plausible estimate of the sensitivity and specificity in the practice setting.
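To give a flavour of the penalised-likelihood idea, the following heavily simplified sketch (illustrative assumptions throughout: the random effects are suppressed, the applicable region is reduced to box constraints, and all counts are invented) penalises pooled sensitivity and specificity estimates that leave the region:

```python
# Simplified sketch of a penalised likelihood with an "applicable
# region": binomial likelihood for pooled sensitivity/specificity,
# plus a quadratic penalty for leaving box constraints.
import numpy as np
from scipy.optimize import minimize
from scipy.special import expit

# Per-study counts: (true positives, diseased n, true negatives, non-diseased n).
tp = np.array([45, 30, 60]); n1 = np.array([50, 40, 70])
tn = np.array([80, 55, 90]); n0 = np.array([100, 60, 100])

# Assumed applicable region for (sensitivity, specificity).
sens_lo, sens_hi = 0.70, 0.95
spec_lo, spec_hi = 0.75, 0.95

def neg_penalised_loglik(theta, lam=1e3):
    sens, spec = expit(theta)          # logit-scale parameters
    ll = (np.sum(tp * np.log(sens) + (n1 - tp) * np.log(1 - sens))
          + np.sum(tn * np.log(spec) + (n0 - tn) * np.log(1 - spec)))
    pen = (max(0, sens_lo - sens) ** 2 + max(0, sens - sens_hi) ** 2
           + max(0, spec_lo - spec) ** 2 + max(0, spec - spec_hi) ** 2)
    return -ll + lam * pen

res = minimize(neg_penalised_loglik, x0=np.zeros(2), method="Nelder-Mead")
print("pooled (sens, spec):", expit(res.x).round(3))
```

In the paper itself, the penalty is combined with coordinate ascent and Newton-Raphson iteration on the full bivariate random-effects likelihood.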


2022, pp. 096228022110651
Author(s): Mireille E Schnitzer, Steve Ferreira Guerra, Cristina Longo, Lucie Blais, Robert W Platt

Many studies seek to evaluate the effects of potentially harmful pregnancy exposures during specific gestational periods. We consider an observational pregnancy cohort in which pregnant individuals can initiate medication usage or become exposed to a drug at various times during their pregnancy. An important statistical challenge is how to define and estimate exposure effects when pregnancy loss or delivery can occur over time. Without proper consideration, the results of a standard analysis may be vulnerable to selection bias, immortal time bias, and time-dependent confounding. In this study, we apply the “target trials” framework of Hernán and Robins to define effects based on the counterfactual approach often used in causal inference. These effects are defined relative to a hypothetical randomized trial of timed pregnancy exposures in which delivery may precede, and thus potentially interrupt, exposure initiation. We describe specific implementations of inverse probability weighting, G-computation, and Targeted Maximum Likelihood Estimation to estimate the effects of interest. We demonstrate the performance of all estimators using simulated data and show that a standard implementation of inverse probability weighting is biased. We then apply our proposed methods to a pharmacoepidemiology study to evaluate the potentially time-dependent effect of exposure to inhaled corticosteroids on birthweight in pregnant people with mild asthma.
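As a simplified illustration of one of the estimators discussed, here is a minimal G-computation sketch for a single time-point exposure on simulated data (it deliberately omits the time-varying exposure structure and censoring by delivery that the paper handles; all column names and coefficients are invented):

```python
# Minimal G-computation sketch: fit an outcome regression, then average
# predictions under "everyone exposed" vs "everyone unexposed".
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(1)
n = 2000
df = pd.DataFrame({"conf": rng.normal(size=n)})              # a confounder
df["exposed"] = rng.binomial(1, 1 / (1 + np.exp(-df["conf"])))
df["birthweight"] = (3400 - 120 * df["exposed"] + 80 * df["conf"]
                     + rng.normal(0, 300, n))

fit = smf.ols("birthweight ~ exposed + conf", data=df).fit()

def mean_outcome_under(a):
    """Standardised mean outcome had everyone received exposure level a."""
    return fit.predict(df.assign(exposed=a)).mean()

print("G-computation estimate:", mean_outcome_under(1) - mean_outcome_under(0))
```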


2021, pp. 096228022110649
Author(s): Sean M Devlin, Alexia Iasonos, John O’Quigley

Many clinical trials incorporate stopping rules to terminate early if the clinical question under study can be answered with a high degree of confidence. While common in later-stage trials, these rules are rarely implemented in dose-escalation studies, due in part to the relatively smaller sample size of these designs. However, this paper shows that, even with a small sample size, easily implementable stopping rules can terminate dose escalation early with minimal loss of accuracy in maximum tolerated dose estimation. These stopping rules are developed for settings where the goal is to identify one or two dose levels, as the maximum tolerated dose and co-maximum tolerated dose. In oncology, this latter goal is frequently considered when the study includes dose-expansion cohorts, which are used to further estimate and compare the safety and efficacy of one or two dose levels. As study protocols do not typically halt accrual between escalation and expansion, early termination is of clinical importance: it either allows additional patients to be treated as part of the dose-expansion cohort, yielding more precise estimates of the study endpoints, or allows an overall reduction in the total sample size.
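As a toy example of how such a rule might operate (an illustrative Beta-Binomial rule, not the design proposed in the paper; the toxicity window and threshold are assumed values):

```python
# Toy stopping rule: with a Beta(1, 1) prior on the current dose's
# toxicity rate, stop escalation once the posterior probability that
# the rate lies in an acceptable window exceeds a threshold.
from scipy.stats import beta

target_lo, target_hi = 0.16, 0.33   # acceptable toxicity window (assumed)
threshold = 0.60                    # required posterior confidence (assumed)

def stop_escalation(n_toxicities, n_treated):
    post = beta(1 + n_toxicities, 1 + n_treated - n_toxicities)
    p_in_window = post.cdf(target_hi) - post.cdf(target_lo)
    return p_in_window > threshold

print(stop_escalation(3, 12))   # e.g. 3/12 toxicities at the current dose
```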


2021, pp. 096228022110239
Author(s): Shaun R Seaman, Anne Presanis, Christopher Jackson

Time-to-event data are right-truncated if only individuals who have experienced the event by a certain time can be included in the sample. For example, we may be interested in estimating the distribution of time from onset of disease symptoms to death and only have data on individuals who have died. This may be the case, for example, at the beginning of an epidemic. Right truncation causes the distribution of times to event in the sample to be biased towards shorter times compared to the population distribution, and appropriate statistical methods should be used to account for this bias. This article is a review of such methods, particularly in the context of an infectious disease epidemic, such as COVID-19. We consider methods for estimating the marginal time-to-event distribution and compare their efficiencies. (Non-)identifiability of the distribution is an important issue with right-truncated data, particularly at the beginning of an epidemic, and this is discussed in detail. We also review methods for estimating the effects of covariates on the time to event. An illustration of the application of many of these methods is provided, using data on individuals who had died with COVID-19 by 5 April 2020.
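The core correction reviewed here can be sketched directly: each observed delay t with onset time s contributes f(t)/F(c − s) to the likelihood, conditioning on the event having occurred by the truncation time c. A minimal sketch with simulated gamma-distributed delays (all values assumed):

```python
# Minimal sketch: maximum conditional likelihood for right-truncated
# onset-to-death delays, with contribution f(t) / F(c - s) per case.
import numpy as np
from scipy.stats import gamma
from scipy.optimize import minimize

rng = np.random.default_rng(2)
c = 60.0                                    # truncation date (days)
onset = rng.uniform(0, c, 5000)
delay = gamma(a=2.0, scale=5.0).rvs(5000, random_state=rng)
seen = onset + delay <= c                   # only deaths by time c observed
s, t = onset[seen], delay[seen]

def neg_loglik(params):
    a, scale = np.exp(params)               # keep parameters positive
    d = gamma(a=a, scale=scale)
    return -np.sum(d.logpdf(t) - d.logcdf(c - s))

res = minimize(neg_loglik, x0=np.log([1.0, 1.0]), method="Nelder-Mead")
print("estimated (shape, scale):", np.exp(res.x).round(2))
```

Fitting the same gamma model without the logcdf term reproduces the bias towards shorter times that the review describes.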


2021, pp. 096228022110651
Author(s): Chao Li, Ye Shen, Qian Xiao, Stephen L Rathbun, Hui Huang, ...

Cocaine addiction is an important public health problem worldwide. Cognitive-behavioral therapy is a counseling intervention for supporting cocaine-dependent individuals through recovery and relapse prevention. It may reduce patients’ cocaine use by improving their motivation and enabling them to recognize risky situations. To study the effect of cognitive-behavioral therapy on cocaine dependence, self-reported cocaine use data, together with urine test results, were collected at the Primary Care Center of Yale-New Haven Hospital. The outcomes are binary, comprising both daily self-reported drug use and weekly urine test results. Generalized estimating equations are widely used to analyze binary data with repeated measures. However, because of significant self-report bias in these data, a direct application of the generalized estimating equations approach may not be valid. In this paper, we propose a novel mean-corrected generalized estimating equations approach for analyzing longitudinal binary outcomes subject to reporting bias. The mean-corrected generalized estimating equations provide consistent and asymptotically normally distributed estimators given the true contamination probabilities. In this study, the accurate weekly urine test results are used to detect contamination. The superior performance of the proposed method is illustrated by both simulation studies and real data analysis.
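The mean-correction idea can be illustrated in its simplest form, a single marginal proportion with a known under-reporting probability (the paper embeds the correction within generalized estimating equations with covariates; all numbers here are invented):

```python
# Minimal sketch: if users deny use with probability pi, the observed
# mean obeys E[Y*] = mu * (1 - pi), so dividing by (1 - pi) removes
# the reporting bias. Simulated data only.
import numpy as np

rng = np.random.default_rng(3)
n, mu_true, pi = 5000, 0.4, 0.25              # pi: P(report "no use" | used)
truth = rng.binomial(1, mu_true, n)
reported = truth * rng.binomial(1, 1 - pi, n) # under-reported outcome

naive = reported.mean()
corrected = naive / (1 - pi)
print(f"naive: {naive:.3f}  corrected: {corrected:.3f}  truth: {mu_true}")
```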


2021, pp. 096228022110651
Author(s): Robert Challen, Ellen Brooks-Pollock, Krasimira Tsaneva-Atanasova, Leon Danon

The serial interval of an infectious disease, commonly interpreted as the time between the onset of symptoms in sequentially infected individuals within a chain of transmission, is a key epidemiological quantity involved in estimating the reproduction number. The serial interval is closely related to other key quantities, including the incubation period, the generation interval (the time between sequential infections), and time delays between infection and the observations associated with monitoring an outbreak, such as confirmed cases, hospital admissions, and deaths. Estimates of these quantities are often based on small data sets from early contact tracing and are subject to considerable uncertainty, which is especially true for early coronavirus disease 2019 data. In this paper, we estimate these key quantities in the context of coronavirus disease 2019 for the UK, including a meta-analysis of early estimates of the serial interval. We estimate distributions for the serial interval with a mean of 5.9 (95% CI 5.2; 6.7) and SD of 4.1 (95% CI 3.8; 4.7) days (empirical distribution), the generation interval with a mean of 4.9 (95% CI 4.2; 5.5) and SD of 2.0 (95% CI 0.5; 3.2) days (fitted gamma distribution), and the incubation period with a mean of 5.2 (95% CI 4.9; 5.5) and SD of 5.5 (95% CI 5.1; 5.9) days (fitted log-normal distribution). We quantify the impact of the uncertainty surrounding the serial interval, generation interval, incubation period, and time delays on the subsequent estimation of the reproduction number, when pragmatic and more formal approaches are taken. These estimates place empirical bounds on the estimates of most relevant model parameters and are expected to contribute to modeling coronavirus disease 2019 transmission.
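For readers who want to reuse these point estimates, the following sketch parameterises distributions with the quoted means and standard deviations by moment matching (point estimates only; the paper reports full uncertainty intervals):

```python
# Moment-matched distributions from the point estimates quoted above.
import numpy as np
from scipy.stats import gamma, lognorm

# Generation interval: gamma with mean 4.9, SD 2.0 days.
m, s = 4.9, 2.0
gi = gamma(a=(m / s) ** 2, scale=s ** 2 / m)

# Incubation period: log-normal with mean 5.2, SD 5.5 days.
m, s = 5.2, 5.5
sigma2 = np.log(1 + (s / m) ** 2)
ip = lognorm(s=np.sqrt(sigma2), scale=m / np.exp(sigma2 / 2))

print(gi.mean(), gi.std())   # 4.9, 2.0
print(ip.mean(), ip.std())   # 5.2, 5.5
```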


2021, pp. 096228022110654
Author(s): Ashwini Joshi, Angelika Geroldinger, Lena Jiricka, Pralay Senchaudhuri, Christopher Corcoran, ...

Poisson regression can be challenging with sparse data, in particular with certain data constellations where maximum likelihood estimates of regression coefficients do not exist. This paper provides a comprehensive evaluation of methods that give finite regression coefficients when maximum likelihood estimates do not exist, including Firth’s general approach to bias reduction, exact conditional Poisson regression, and a Bayesian estimator using weakly informative priors that can be obtained via data augmentation. Furthermore, we include in our evaluation a new proposal for a modification of Firth’s approach, improving its performance for predictions without compromising its attractive bias-correcting properties for regression coefficients. We illustrate the issue of the nonexistence of maximum likelihood estimates with a dataset arising from the recent outbreak of COVID-19 and an example from implant dentistry. All methods are evaluated in a comprehensive simulation study under a variety of realistic scenarios, evaluating their performance for prediction and estimation. To conclude, while exact conditional Poisson regression may be confined to small data sets only, both the modification of Firth’s approach and the Bayesian estimator are universally applicable solutions with attractive properties for prediction and estimation. While the Bayesian method needs specification of prior variances for the regression coefficients, the modified Firth approach does not require any user input.
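As a minimal sketch of the penalty at the heart of Firth's approach (not the paper's implementation, and without its proposed modification), the Jeffreys-prior penalised Poisson log-likelihood can be maximised directly:

```python
# Minimal sketch: Firth-type penalised Poisson regression. The penalty
# 0.5 * log|X'WX| with W = diag(mu) keeps coefficient estimates finite
# even in sparse-data constellations. Simulated data only.
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(4)
n = 30
X = np.column_stack([np.ones(n), rng.binomial(1, 0.5, n)])
y = rng.poisson(np.exp(0.5 + 0.8 * X[:, 1]))

def neg_penalised_loglik(beta):
    eta = X @ beta
    mu = np.exp(eta)
    loglik = np.sum(y * eta - mu)                   # Poisson log-likelihood
    _, logdet = np.linalg.slogdet(X.T @ (mu[:, None] * X))
    return -(loglik + 0.5 * logdet)                 # Jeffreys-prior penalty

res = minimize(neg_penalised_loglik, x0=np.zeros(2), method="BFGS")
print("penalised estimates:", res.x.round(3))
```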


2021, pp. 096228022110605
Author(s): Luigi Lavazza, Sandro Morasca

Receiver Operating Characteristic curves have been widely used to represent the performance of diagnostic tests. The corresponding area under the curve, widely used to evaluate their performance quantitatively, has been criticized in several respects. Several proposals have been introduced to improve on the area under the curve by taking into account only specific regions of the Receiver Operating Characteristic space, that is, the plane to which Receiver Operating Characteristic curves belong. For instance, a region of interest can be delimited by setting specific thresholds for the true positive rate or the false positive rate. Different ways of setting the borders of the region of interest may result in completely different, even opposing, evaluations. In this paper, we present a method to define a region of interest in a rigorous and objective way, and compute a partial area under the curve that can be used to evaluate the performance of diagnostic tests. The method was originally conceived in the Software Engineering domain to evaluate the performance of methods that estimate the defectiveness of software modules. We compare this method with previous proposals. Our method allows the definition of regions of interest by setting acceptability thresholds on any kind of performance metric, and not just the false positive rate and true positive rate: for instance, the region of interest can be determined by requiring that φ (also known as the Matthews Correlation Coefficient) is above a given threshold. We also show how to delimit the region of interest corresponding to acceptable costs, whenever the individual costs of false positives and false negatives are known. Finally, we demonstrate the effectiveness of the method by applying it to the Wisconsin Breast Cancer Data. We provide Python and R packages supporting the presented method.
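A minimal sketch of the partial-area idea (illustrative only, not the authors' Python/R packages): trace an ROC curve, keep the points where the Matthews Correlation Coefficient exceeds a chosen threshold, and integrate the curve over the retained false-positive-rate range:

```python
# Minimal sketch: partial AUC over the region where MCC > 0.3.
# Simulated scores; the 0.3 threshold is an arbitrary example.
import numpy as np

rng = np.random.default_rng(5)
scores = np.r_[rng.normal(1, 1, 500), rng.normal(0, 1, 500)]
labels = np.r_[np.ones(500), np.zeros(500)]
P, N = labels.sum(), (1 - labels).sum()

tpr, fpr, mcc = [], [], []
for thr in np.unique(scores)[::-1]:          # descending thresholds
    pred = scores >= thr
    tp = float((pred & (labels == 1)).sum())
    fp = float((pred & (labels == 0)).sum())
    fn, tn = P - tp, N - fp
    tpr.append(tp / P)
    fpr.append(fp / N)
    denom = np.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc.append((tp * tn - fp * fn) / denom if denom > 0 else 0.0)

tpr, fpr, mcc = map(np.asarray, (tpr, fpr, mcc))
keep = mcc > 0.3                             # region of interest
print("partial AUC:", round(np.trapz(tpr[keep], fpr[keep]), 3))
```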


2021, pp. 096228022110651
Author(s): Miao-Yu Tsai, Chia-Ni Sun, Chao-Chun Lin

For longitudinal overdispersed Poisson data sets, estimators of the intra-, inter-, and total concordance correlation coefficients through variance components have been proposed. However, biased estimators of quadratic forms have been used in concordance correlation coefficient estimation. In addition, the generalized estimating equations approach has been used to estimate agreement for longitudinal normal data but not for longitudinal overdispersed Poisson data. This paper therefore proposes a modified variance component approach to develop unbiased estimators of the concordance correlation coefficient for longitudinal overdispersed Poisson data. Further, indices of intra-, inter-, and total agreement are also developed through generalized estimating equations, taking into account the correlation structure of longitudinal count repeated measurements. Simulation studies are conducted to compare the performance of the modified variance component and generalized estimating equations approaches for longitudinal Poisson and overdispersed Poisson data sets. An application to a corticospinal diffusion tensor tractography study is used for illustration. In conclusion, the modified variance component approach performs outstandingly well, with small mean square errors and nominal 95% coverage rates. The generalized estimating equations approach offers flexibility in the assumed correlation structure for repeated measurements and produces satisfactory concordance correlation coefficient estimates.
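For orientation, the sketch below computes Lin's concordance correlation coefficient for paired counts from two raters (the paper generalises this to intra-, inter-, and total agreement via variance components and generalized estimating equations; the data are simulated):

```python
# Minimal sketch: Lin's concordance correlation coefficient for
# paired count data with a shared latent rate per subject.
import numpy as np

rng = np.random.default_rng(6)
rate = rng.gamma(5.0, 1.0, 200)      # shared latent rate per subject
rater1 = rng.poisson(rate)
rater2 = rng.poisson(rate)

def ccc(x, y):
    mx, my = x.mean(), y.mean()
    cov = ((x - mx) * (y - my)).mean()
    return 2 * cov / (x.var() + y.var() + (mx - my) ** 2)

print("CCC:", round(ccc(rater1, rater2), 3))
```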

