Statistical Methods in Medical Research
Latest Publications

Published by Sage Publications

1477-0334, 0962-2802

2022 ◽  
pp. 096228022110417
Kian Wee Soh ◽  
Thomas Lumley ◽  
Cameron Walker ◽  
Michael O’Sullivan

In this paper, we present a new model averaging technique that can be applied in medical research. The dataset is first partitioned by the values of its categorical explanatory variables. Then for each partition, a model average is determined by minimising some form of squared errors, which could be the leave-one-out cross-validation errors. From our asymptotic optimality study and the results of simulations, we demonstrate under several high-level assumptions and modelling conditions that this model averaging procedure may outperform jackknife model averaging, which is a well-established technique. We also present an example where a cross-validation procedure does not work (that is, a zero-valued cross-validation error is obtained) when determining the weights for model averaging.
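As a rough illustration of choosing model-averaging weights by minimising leave-one-out squared errors, the sketch below averages two candidate models (an intercept-only mean and a simple straight line). It is a generic jackknife-style weight selection, not the partitioned procedure proposed in the paper; the function names and the choice of candidate models are assumptions made for the example.

```python
def fit_mean(xs, ys):
    # Candidate model 1: predict the training mean everywhere.
    m = sum(ys) / len(ys)
    return lambda x: m

def fit_line(xs, ys):
    # Candidate model 2: ordinary least squares straight line.
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx if sxx > 0 else 0.0
    return lambda x: my + slope * (x - mx)

def loo_predictions(fit, xs, ys):
    # Refit the model n times, each time predicting the held-out point.
    preds = []
    for i in range(len(xs)):
        model = fit(xs[:i] + xs[i + 1:], ys[:i] + ys[i + 1:])
        preds.append(model(xs[i]))
    return preds

def loo_weight(xs, ys):
    # Weight w on the mean model (1 - w on the line) that minimises
    # sum((y - w*p1 - (1 - w)*p2)**2), restricted to [0, 1].
    p1 = loo_predictions(fit_mean, xs, ys)
    p2 = loo_predictions(fit_line, xs, ys)
    denom = sum((a - b) ** 2 for a, b in zip(p1, p2))
    if denom == 0:
        return 0.5  # both models predict identically; any weight is optimal
    w = sum((y - b) * (a - b) for y, a, b in zip(ys, p1, p2)) / denom
    return min(1.0, max(0.0, w))
```

With only two candidates the objective is a one-dimensional quadratic in w, so clipping the unconstrained minimiser to [0, 1] gives the constrained optimum. With more candidates a quadratic programme over the weight simplex would be needed.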

2022 ◽  
pp. 096228022110651
Mohammed Baragilly ◽  
Brian Harvey Willis

Tailored meta-analysis uses setting-specific knowledge of the test positive rate and disease prevalence to constrain the possible values for a test's sensitivity and specificity. The constrained region is used to select those studies relevant to the setting for meta-analysis using an unconstrained bivariate random effects model (BRM). However, sometimes there may be no studies to aggregate, or the summary estimate may lie outside the plausible or “applicable” region. Potentially these shortcomings may be overcome by incorporating the constraints in the BRM to produce a constrained model. Using a penalised likelihood approach we developed an optimisation algorithm based on co-ordinate ascent and Newton-Raphson iteration to fit a constrained bivariate random effects model (CBRM) for meta-analysis. Using numerical examples based on simulation studies and real datasets we compared its performance with the BRM in terms of bias, mean squared error and coverage probability. We also determined the ‘closeness’ of the estimates to their true values using the Euclidean and Mahalanobis distances. The CBRM produced estimates which in the majority of cases had lower absolute mean bias and greater coverage probability than the BRM. The estimated sensitivities and specificities for the CBRM were, in general, closer to the true values than those for the BRM. For the two real datasets, the CBRM produced estimates which were in the applicable region, in contrast to the BRM. When combining setting-specific data with test accuracy meta-analysis, a constrained model is more likely to yield a plausible estimate for the sensitivity and specificity in the practice setting than an unconstrained model.

2022 ◽  
pp. 096228022110651
Mireille E Schnitzer ◽  
Steve Ferreira Guerra ◽  
Cristina Longo ◽  
Lucie Blais ◽  
Robert W Platt

Many studies seek to evaluate the effects of potentially harmful pregnancy exposures during specific gestational periods. We consider an observational pregnancy cohort where pregnant individuals can initiate medication usage or become exposed to a drug at various times during their pregnancy. An important statistical challenge involves how to define and estimate exposure effects when pregnancy loss or delivery can occur over time. Without proper consideration, the results of standard analysis may be vulnerable to selection bias, immortal time bias, and time-dependent confounding. In this study, we apply the “target trials” framework of Hernán and Robins in order to define effects based on the counterfactual approach often used in causal inference. This effect is defined relative to a hypothetical randomized trial of timed pregnancy exposures where delivery may precede and thus potentially interrupt exposure initiation. We describe specific implementations of inverse probability weighting, G-computation, and Targeted Maximum Likelihood Estimation to estimate the effects of interest. We demonstrate the performance of all estimators using simulated data and show that a standard implementation of inverse probability weighting is biased. We then apply our proposed methods to a pharmacoepidemiology study to evaluate the potentially time-dependent effect of exposure to inhaled corticosteroids on birthweight in pregnant people with mild asthma.

2021 ◽  
pp. 096228022110649
Sean M Devlin ◽  
Alexia Iasonos ◽  
John O’Quigley

Many clinical trials incorporate stopping rules to terminate early if the clinical question under study can be answered with a high degree of confidence. While common in later-stage trials, these rules are rarely implemented in dose escalation studies, due in part to the relatively smaller sample size of these designs. However, even with a small sample size, this paper shows that easily implementable stopping rules can terminate dose-escalation early with minimal loss to the accuracy of maximum tolerated dose estimation. These stopping rules are developed when the goal is to identify one or two dose levels, as the maximum tolerated dose and co-maximum tolerated dose. In oncology, this latter goal is frequently considered when the study includes dose-expansion cohorts, which are used to further estimate and compare the safety and efficacy of one or two dose levels. As study protocols do not typically halt accrual between escalation and expansion, early termination is of clinical importance as it either allows for additional patients to be treated as part of the dose expansion cohort to obtain more precise estimates of the study endpoints or allows for an overall reduction in the total sample size.

2021 ◽  
pp. 096228022110239
Shaun R Seaman ◽  
Anne Presanis ◽  
Christopher Jackson

Time-to-event data are right-truncated if only individuals who have experienced the event by a certain time can be included in the sample. For example, we may be interested in estimating the distribution of time from onset of disease symptoms to death and only have data on individuals who have died. This may be the case, for example, at the beginning of an epidemic. Right truncation causes the distribution of times to event in the sample to be biased towards shorter times compared to the population distribution, and appropriate statistical methods should be used to account for this bias. This article is a review of such methods, particularly in the context of an infectious disease epidemic, like COVID-19. We consider methods for estimating the marginal time-to-event distribution, and compare their efficiencies. (Non-)identifiability of the distribution is an important issue with right-truncated data, particularly at the beginning of an epidemic, and this is discussed in detail. We also review methods for estimating the effects of covariates on the time to event. An illustration of the application of many of these methods is provided, using data on individuals who had died with coronavirus disease by 5 April 2020.
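The bias towards shorter times can be seen in a small simulation. The sketch below is illustrative only: the uniform onset times, exponential delays, and parameter values are assumptions chosen for simplicity, not taken from the article. Individuals are included in the sample only if their event occurs before the data cutoff.

```python
import random

def simulate_right_truncation(n=20000, mean_delay=20.0, cutoff=30.0, seed=1):
    rng = random.Random(seed)
    all_delays, observed = [], []
    for _ in range(n):
        onset = rng.uniform(0.0, cutoff)            # symptom onset time
        delay = rng.expovariate(1.0 / mean_delay)   # onset-to-event time
        all_delays.append(delay)
        if onset + delay <= cutoff:                 # event seen before cutoff
            observed.append(delay)
    population_mean = sum(all_delays) / len(all_delays)
    sample_mean = sum(observed) / len(observed)
    return population_mean, sample_mean

if __name__ == "__main__":
    pop_mean, obs_mean = simulate_right_truncation()
    print(f"population mean delay: {pop_mean:.1f}")
    print(f"right-truncated sample mean delay: {obs_mean:.1f}")
```

The naive sample mean understates the population mean because long delays are less likely to finish before the cutoff, which is the bias the reviewed methods are designed to correct.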

2021 ◽  
pp. 096228022110651
Chao Li ◽  
Ye Shen ◽  
Qian Xiao ◽  
Stephen L Rathbun ◽  
Hui Huang ◽  

Cocaine addiction is an important public health problem worldwide. Cognitive-behavioral therapy is a counseling intervention for supporting cocaine-dependent individuals through recovery and relapse prevention. It may reduce patients’ cocaine use by improving their motivation and enabling them to recognize risky situations. To study the effect of cognitive-behavioral therapy on cocaine dependence, the self-reported cocaine use with urine test data were collected at the Primary Care Center of Yale-New Haven Hospital. The outcomes are binary, including both daily self-reported drug use and weekly urine test results. To date, generalized estimating equations have been widely used to analyze binary data with repeated measures. However, due to the existence of significant self-report bias in the self-reported cocaine use with urine test data, a direct application of the generalized estimating equations approach may not be valid. In this paper, we propose a novel mean corrected generalized estimating equations approach for analyzing longitudinal binary outcomes subject to reporting bias. The mean corrected generalized estimating equations can provide consistent and asymptotically normally distributed estimators under true contamination probabilities. In the self-reported cocaine use with urine test study, accurate weekly urine test results are used to detect contamination. The superior performance of the proposed method is illustrated by both simulation studies and real data analysis.

2021 ◽  
pp. 096228022110651
Robert Challen ◽  
Ellen Brooks-Pollock ◽  
Krasimira Tsaneva-Atanasova ◽  
Leon Danon

The serial interval of an infectious disease, commonly interpreted as the time between the onset of symptoms in sequentially infected individuals within a chain of transmission, is a key epidemiological quantity involved in estimating the reproduction number. The serial interval is closely related to other key quantities, including the incubation period, the generation interval (the time between sequential infections), and time delays between infection and the observations associated with monitoring an outbreak such as confirmed cases, hospital admissions, and deaths. Estimates of these quantities are often based on small data sets from early contact tracing and are subject to considerable uncertainty, which is especially true for early coronavirus disease 2019 data. In this paper, we estimate these key quantities in the context of coronavirus disease 2019 for the UK, including a meta-analysis of early estimates of the serial interval. We estimate distributions for the serial interval with a mean of 5.9 (95% CI 5.2; 6.7) and SD 4.1 (95% CI 3.8; 4.7) days (empirical distribution), the generation interval with a mean of 4.9 (95% CI 4.2; 5.5) and SD 2.0 (95% CI 0.5; 3.2) days (fitted gamma distribution), and the incubation period with a mean of 5.2 (95% CI 4.9; 5.5) and SD 5.5 (95% CI 5.1; 5.9) days (fitted log-normal distribution). We quantify the impact of the uncertainty surrounding the serial interval, generation interval, incubation period, and time delays on the subsequent estimation of the reproduction number, when pragmatic and more formal approaches are taken. These estimates place empirical bounds on the estimates of most relevant model parameters and are expected to contribute to modeling coronavirus disease 2019 transmission.
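To reuse the reported means and SDs in a simulation, they first have to be converted into distribution parameters. The sketch below does this by moment matching for the gamma and log-normal families. It is a convenience conversion under the standard shape/scale and log-scale parameterisations, which are assumptions here; the authors' fitted parameters may have been obtained and reported differently.

```python
import math

def gamma_from_mean_sd(mean, sd):
    # Gamma(shape k, scale s): mean = k*s, variance = k*s**2.
    shape = (mean / sd) ** 2
    scale = sd ** 2 / mean
    return shape, scale

def lognormal_from_mean_sd(mean, sd):
    # If log(X) ~ Normal(mu, sigma**2), then
    # mean = exp(mu + sigma**2 / 2) and
    # variance = (exp(sigma**2) - 1) * mean**2.
    sigma2 = math.log(1.0 + (sd / mean) ** 2)
    mu = math.log(mean) - sigma2 / 2.0
    return mu, math.sqrt(sigma2)

if __name__ == "__main__":
    # Point estimates quoted in the abstract (days).
    print("generation interval (gamma):", gamma_from_mean_sd(4.9, 2.0))
    print("incubation period (log-normal):", lognormal_from_mean_sd(5.2, 5.5))
```

The conversion uses only the point estimates; propagating the quoted confidence intervals would require resampling the mean and SD rather than plugging in single values.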

2021 ◽  
pp. 096228022110654
Ashwini Joshi ◽  
Angelika Geroldinger ◽  
Lena Jiricka ◽  
Pralay Senchaudhuri ◽  
Christopher Corcoran ◽  

Poisson regression can be challenging with sparse data, in particular with certain data constellations where maximum likelihood estimates of regression coefficients do not exist. This paper provides a comprehensive evaluation of methods that give finite regression coefficients when maximum likelihood estimates do not exist, including Firth’s general approach to bias reduction, exact conditional Poisson regression, and a Bayesian estimator using weakly informative priors that can be obtained via data augmentation. Furthermore, we include in our evaluation a new proposal for a modification of Firth’s approach, improving its performance for predictions without compromising its attractive bias-correcting properties for regression coefficients. We illustrate the issue of the nonexistence of maximum likelihood estimates with a dataset arising from the recent outbreak of COVID-19 and an example from implant dentistry. All methods are evaluated in a comprehensive simulation study under a variety of realistic scenarios, evaluating their performance for prediction and estimation. To conclude, while exact conditional Poisson regression may be confined to small data sets only, both the modification of Firth’s approach and the Bayesian estimator are universally applicable solutions with attractive properties for prediction and estimation. While the Bayesian method needs specification of prior variances for the regression coefficients, the modified Firth approach does not require any user input.
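The simplest case where the maximum likelihood estimate does not exist is an intercept-only Poisson model in which every count is zero: the estimated log rate would be log(0). The sketch below contrasts this with Firth's penalised likelihood, which for this special case adds half the log of the Fisher information to the log-likelihood and yields the closed form log((sum of counts + 0.5) / n). It illustrates Firth's general approach in the intercept-only case only, not the modified version proposed in the paper; the function names are hypothetical.

```python
import math

def poisson_intercept_mle(counts):
    # Maximises sum(y)*b - n*exp(b); the solution is b = log(mean(y)).
    total = sum(counts)
    if total == 0:
        return None  # log(0): no finite maximum likelihood estimate
    return math.log(total / len(counts))

def poisson_intercept_firth(counts):
    # Penalised log-likelihood: sum(y)*b - n*exp(b) + 0.5*log(n*exp(b)).
    # Setting the derivative to zero gives exp(b) = (sum(y) + 0.5) / n.
    return math.log((sum(counts) + 0.5) / len(counts))
```

With four observations that are all zero, the penalised estimate is log(0.5 / 4), a finite value, whereas the unpenalised estimate diverges to minus infinity.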

2021 ◽  
pp. 096228022110605
Luigi Lavazza ◽  
Sandro Morasca

Receiver Operating Characteristic curves have been widely used to represent the performance of diagnostic tests. The corresponding area under the curve, widely used to evaluate their performance quantitatively, has been criticized in several respects. Several proposals have been introduced to improve on the area under the curve by taking into account only specific regions of the Receiver Operating Characteristic space, that is, the plane to which Receiver Operating Characteristic curves belong. For instance, a region of interest can be delimited by setting specific thresholds for the true positive rate or the false positive rate. Different ways of setting the borders of the region of interest may result in completely different, even opposing, evaluations. In this paper, we present a method to define a region of interest in a rigorous and objective way, and compute a partial area under the curve that can be used to evaluate the performance of diagnostic tests. The method was originally conceived in the Software Engineering domain to evaluate the performance of methods that estimate the defectiveness of software modules. We compare this method with previous proposals. Our method allows the definition of regions of interest by setting acceptability thresholds on any kind of performance metric, and not just false positive rate and true positive rate: for instance, the region of interest can be determined by imposing that [Formula: see text] (also known as the Matthews Correlation Coefficient) is above a given threshold. We also show how to delimit the region of interest corresponding to acceptable costs, whenever the individual cost of false positives and false negatives is known. Finally, we demonstrate the effectiveness of the method by applying it to the Wisconsin Breast Cancer Data. We provide Python and R packages supporting the presented method.
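The idea of delimiting a region of interest by a metric threshold can be sketched as follows: each point on a Receiver Operating Characteristic curve, together with the numbers of actual positives and negatives, determines a confusion matrix, from which the Matthews Correlation Coefficient can be computed and compared with a threshold. This is a minimal illustration and not the authors' packages; the function names, and the convention of returning 0 when the coefficient is undefined, are assumptions.

```python
import math

def mcc(tp, fp, fn, tn):
    # Matthews Correlation Coefficient from confusion-matrix counts.
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0  # assumed convention when a margin is empty
    return (tp * tn - fp * fn) / denom

def points_in_region(roc_points, n_pos, n_neg, min_mcc):
    # Keep the (fpr, tpr) points whose implied MCC reaches the threshold.
    kept = []
    for fpr, tpr in roc_points:
        tp = tpr * n_pos
        fn = n_pos - tp
        fp = fpr * n_neg
        tn = n_neg - fp
        if mcc(tp, fp, fn, tn) >= min_mcc:
            kept.append((fpr, tpr))
    return kept
```

Because the coefficient depends on the class balance, the same curve can yield different regions of interest for different values of n_pos and n_neg.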

2021 ◽  
pp. 096228022110651
Miao-Yu Tsai ◽  
Chia-Ni Sun ◽  
Chao-Chun Lin

For longitudinal overdispersed Poisson data sets, estimators of the intra-, inter-, and total concordance correlation coefficient through variance components have been proposed. However, biased estimators of quadratic forms are used in concordance correlation coefficient estimation. In addition, the generalized estimating equations approach has been used in estimating agreement for longitudinal normal data but not for longitudinal overdispersed Poisson data. Therefore, this paper proposes a modified variance component approach to develop unbiased estimators of the concordance correlation coefficient for longitudinal overdispersed Poisson data. Further, indices of intra-, inter-, and total agreement through generalized estimating equations are also developed, considering the correlation structure of longitudinal repeated count measurements. Simulation studies are conducted to compare the performance of the modified variance component and generalized estimating equation approaches for longitudinal Poisson and overdispersed Poisson data sets. An application to a corticospinal diffusion tensor tractography study is used for illustration. In conclusion, the modified variance component approach performs well, with small mean square errors and nominal 95% coverage rates. The generalized estimating equation approach offers flexibility in the assumed correlation structure for repeated measurements and produces satisfactory concordance correlation coefficient estimation results.
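For orientation, the basic concordance correlation coefficient for two sets of paired measurements is 2·cov(x, y) / (var(x) + var(y) + (mean(x) − mean(y))²). The sketch below computes this simple paired version using moments with divisor n. It is not the variance component or generalized estimating equation estimator for longitudinal overdispersed Poisson data developed in the paper; the function name is hypothetical.

```python
def concordance_correlation(x, y):
    # Paired-sample concordance correlation coefficient (moments use 1/n).
    # Undefined (division by zero) if both samples are constant and equal.
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    vx = sum((a - mx) ** 2 for a in x) / n
    vy = sum((b - my) ** 2 for b in y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y)) / n
    return 2.0 * cov / (vx + vy + (mx - my) ** 2)
```

Unlike the Pearson correlation, the coefficient penalises systematic shifts: two raters whose readings differ by a constant offset are perfectly correlated but have a concordance below 1.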
