Consequences of Ignoring Clustering in Linear Regression

2020
Author(s):
Georgia Ntani
Hazel Inskip
Clive Osmond
David Coggon

Abstract Background Clustering of observations is a common phenomenon in epidemiological and clinical research. Previous studies have highlighted the importance of using multilevel analysis to account for such clustering, but in practice, methods ignoring clustering are often used. We used simulated data to explore the circumstances in which failure to account for clustering in linear regression analysis could lead to importantly erroneous conclusions. Methods We simulated data following the random-intercept model specification under different scenarios of clustering of a continuous outcome and a single continuous or binary explanatory variable. We fitted random-intercept (RI) and cluster-unadjusted ordinary least squares (OLS) models and compared the derived estimates of effect, as quantified by regression coefficients, and their estimated precision. We also assessed the extent to which coverage by 95% confidence intervals and rates of Type I error were appropriate. Results We found that effects estimated from OLS linear regression models that ignored clustering were on average unbiased. The precision of effect estimates from the OLS model was overestimated when both the outcome and explanatory variable were continuous. By contrast, in linear regression with a binary explanatory variable, in most circumstances, the precision of effects was somewhat underestimated by the OLS model. The magnitude of bias, both in point estimates and their precision, increased with greater clustering of the outcome variable, and was influenced also by the amount of clustering in the explanatory variable. The cluster-unadjusted model resulted in poor coverage rates by 95% confidence intervals and high rates of Type I error, especially when the explanatory variable was continuous. Conclusions In this study we identified situations in which an OLS regression model is more likely to mislead statistical inference, namely when the explanatory variable is continuous and its intraclass correlation coefficient is higher than 0.01. Situations in which statistical inference is less likely to be affected have also been identified.
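
As an illustration of the data-generating process described in the Methods, below is a minimal sketch of simulating clustered data under a random-intercept specification with a continuous outcome and a continuous explanatory variable. All parameter values and variable names are illustrative assumptions, not the settings used in the study.

```python
import numpy as np

rng = np.random.default_rng(42)

# Illustrative parameters (not the study's actual settings)
n_clusters, cluster_size = 50, 20
beta0, beta1 = 1.0, 0.5          # "true" intercept and slope
sigma_u, sigma_e = 1.0, 1.0      # between-cluster and within-cluster SDs

cluster = np.repeat(np.arange(n_clusters), cluster_size)
u = rng.normal(0, sigma_u, n_clusters)           # random intercepts
x = rng.normal(0, 1, n_clusters * cluster_size)  # continuous explanatory variable
e = rng.normal(0, sigma_e, n_clusters * cluster_size)

# Random-intercept data-generating model: y_ij = b0 + b1*x_ij + u_j + e_ij
y = beta0 + beta1 * x + u[cluster] + e

# Outcome ICC implied by the variance components
icc_y = sigma_u**2 / (sigma_u**2 + sigma_e**2)
print(f"Implied outcome ICC: {icc_y:.2f}")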

2021
Vol 21 (1)
Author(s):
Georgia Ntani
Hazel Inskip
Clive Osmond
David Coggon

Abstract Background Clustering of observations is a common phenomenon in epidemiological and clinical research. Previous studies have highlighted the importance of using multilevel analysis to account for such clustering, but in practice, methods ignoring clustering are often employed. We used simulated data to explore the circumstances in which failure to account for clustering in linear regression could lead to importantly erroneous conclusions. Methods We simulated data following the random-intercept model specification under different scenarios of clustering of a continuous outcome and a single continuous or binary explanatory variable. We fitted random-intercept (RI) and ordinary least squares (OLS) models and compared effect estimates with the “true” value that had been used in simulation. We also assessed the relative precision of effect estimates, and explored the extent to which coverage by 95% confidence intervals and Type I error rates were appropriate. Results We found that effect estimates from both types of regression model were on average unbiased. However, deviations from the “true” value were greater when the outcome variable was more clustered. For a continuous explanatory variable, they tended also to be greater for the OLS than the RI model, and when the explanatory variable was less clustered. The precision of effect estimates from the OLS model was overestimated when the explanatory variable varied more between than within clusters, and was somewhat underestimated when the explanatory variable was less clustered. The cluster-unadjusted model gave poor coverage rates by 95% confidence intervals and high Type I error rates when the explanatory variable was continuous. With a binary explanatory variable, coverage rates by 95% confidence intervals and Type I error rates deviated from nominal values when the outcome variable was more clustered, but the direction of the deviation varied according to the overall prevalence of the explanatory variable, and the extent to which it was clustered. Conclusions In this study we identified circumstances in which application of an OLS regression model to clustered data is more likely to mislead statistical inference. The potential for error is greatest when the explanatory variable is continuous, and the outcome variable more clustered (intraclass correlation coefficient is ≥ 0.01).
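
A sketch of the model comparison described above, fitting the cluster-unadjusted OLS model and a random-intercept model to the simulated data from the earlier sketch (it reuses y, x, and cluster from that block). statsmodels' MixedLM stands in for the random-intercept model; this is an illustrative setup, not the authors' code.

```python
import pandas as pd
import statsmodels.formula.api as smf

df = pd.DataFrame({"y": y, "x": x, "cluster": cluster})

# Cluster-unadjusted OLS model
ols_fit = smf.ols("y ~ x", data=df).fit()

# Random-intercept (mixed) model with cluster as the grouping factor
ri_fit = smf.mixedlm("y ~ x", data=df, groups=df["cluster"]).fit()

# Compare slope estimates and their standard errors
print(f"OLS: b1 = {ols_fit.params['x']:.3f}, SE = {ols_fit.bse['x']:.3f}")
print(f"RI : b1 = {ri_fit.params['x']:.3f}, SE = {ri_fit.bse['x']:.3f}")
```

Repeating this over many simulated data sets, and varying the clustering of x and y, is the kind of exercise from which the coverage and Type I error comparisons above are derived.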


2019
Vol 188 (7)
pp. 1345-1354
Author(s):
Anusha M Vable
Mathew V Kiang
M Maria Glymour
Joseph Rigdon
Emmanuel F Drabo
...  

Abstract Matching methods are assumed to reduce the likelihood of a biased inference compared with ordinary least squares (OLS) regression. Using simulations, we compared inferences from propensity score matching, coarsened exact matching, and unmatched covariate-adjusted OLS regression to identify which methods, in which scenarios, produced unbiased inferences at the expected type I error rate of 5%. We simulated multiple data sets and systematically varied common support, discontinuities in the exposure and/or outcome, exposure prevalence, and analytical model misspecification. Matching inferences were often biased in comparison with OLS, particularly when common support was poor; when analysis models were correctly specified and common support was poor, the type I error rate was 1.6% for propensity score matching (statistically inefficient), 18.2% for coarsened exact matching (high), and 4.8% for OLS (expected). Our results suggest that when estimates from matching and OLS are similar (i.e., confidence intervals overlap), OLS inferences are unbiased more often than matching inferences; however, when estimates from matching and OLS are dissimilar (i.e., confidence intervals do not overlap), matching inferences are unbiased more often than OLS inferences. This empirical “rule of thumb” may help applied researchers identify situations in which OLS inferences may be unbiased as compared with matching inferences.
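
A hedged sketch of the kind of comparison described: covariate-adjusted OLS versus 1:1 nearest-neighbour propensity score matching on simulated data with a null exposure effect. The data-generating model, matching rule, and estimator below are simplified illustrations, not the simulation design of the paper.

```python
import numpy as np
import statsmodels.api as sm
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n = 2000

# Simulate a confounder, a binary exposure, and an outcome with zero true effect
z = rng.normal(size=n)
p_exposure = 1 / (1 + np.exp(-(-1.0 + 0.8 * z)))
a = rng.binomial(1, p_exposure)
y = 0.0 * a + 0.5 * z + rng.normal(size=n)   # true exposure effect = 0

# Covariate-adjusted OLS
X_ols = sm.add_constant(np.column_stack([a, z]))
ols_fit = sm.OLS(y, X_ols).fit()

# Propensity score P(exposed | z), then 1:1 nearest-neighbour matching on the score
ps = LogisticRegression().fit(z.reshape(-1, 1), a).predict_proba(z.reshape(-1, 1))[:, 1]
treated = np.where(a == 1)[0]
control = np.where(a == 0)[0]
matches = control[np.abs(ps[treated][:, None] - ps[control][None, :]).argmin(axis=1)]

# Matched-sample estimate of the exposure effect (difference in mean outcomes)
effect_matched = y[treated].mean() - y[matches].mean()
print(f"OLS effect estimate:     {ols_fit.params[1]:.3f}")
print(f"Matched effect estimate: {effect_matched:.3f}")
```

Running such a simulation repeatedly, and checking how often each method rejects the (true) null, gives empirical type I error rates of the sort reported above.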


Stats
2020
Vol 3 (1)
pp. 40-55
Author(s):
Sergio Perez-Melo
B. M. Golam Kibria

Ridge regression is a popular method to solve the multicollinearity problem for both linear and non-linear regression models. This paper studied forty different ridge regression t-type tests of the individual coefficients of a linear regression model. A simulation study was conducted to evaluate the performance of the proposed tests with respect to their empirical sizes and powers under different settings. Our simulation results demonstrated that many of the proposed tests have type I error rates close to the 5% nominal level and, among those, all tests except one show a considerable gain in power over the standard ordinary least squares (OLS) t-type test. It was observed from our simulation results that seven tests based on some ridge estimators performed better than the rest in terms of achieving higher power gains while maintaining a 5% nominal size.
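
For context, a generic ridge t-type test of an individual coefficient has the broad form studied here: the ridge estimate divided by an estimated standard error. The sketch below is one simple possibility (fixed ridge constant k, OLS-style residual degrees of freedom), not any of the forty specific tests evaluated in the paper.

```python
import numpy as np

def ridge_t_test(X, y, k):
    """t-type statistics for individual coefficients of a ridge regression
    with ridge constant k (a generic form; the paper studies many variants)."""
    n, p = X.shape
    XtX = X.T @ X
    W = np.linalg.inv(XtX + k * np.eye(p))   # (X'X + kI)^{-1}
    beta_ridge = W @ X.T @ y
    resid = y - X @ beta_ridge
    sigma2 = resid @ resid / (n - p)         # one simple residual-variance estimate
    cov_ridge = sigma2 * W @ XtX @ W         # Var(beta_ridge) for fixed k
    se = np.sqrt(np.diag(cov_ridge))
    return beta_ridge / se

# Example with near-collinear predictors
rng = np.random.default_rng(1)
n, p = 100, 4
X = rng.normal(size=(n, p))
X[:, 3] = X[:, 0] + 0.05 * rng.normal(size=n)   # induce near-collinearity
y = X @ np.array([1.0, 0.5, 0.0, 0.0]) + rng.normal(size=n)

print(ridge_t_test(X, y, k=1.0))
```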


Author(s):  
Luan L. Lee
Miguel G. Lizarraga
Natanael R. Gomes
Alessandro L. Koerich

This paper describes a prototype for Brazilian bankcheck recognition. The description is divided into three topics: bankcheck information extraction, digit amount recognition, and signature verification. In bankcheck information extraction, our algorithms provide signature and digit amount images free of background patterns and bankcheck printed information. In digit amount recognition, we dealt with digit amount segmentation and implemented a complete numeral character recognition system involving image processing, feature extraction, and neural classification. In signature verification, we designed and implemented a static signature verification system suitable for banking and commercial applications. Our signature verification algorithm is capable of detecting simple, random, and skilled forgeries. The proposed automatic bankcheck recognition prototype was tested extensively on real bankcheck data as well as simulated data, with the following performance results: for skilled forgeries, 4.7% equal error rate; for random forgeries, zero Type I error and 7.3% Type II error; for bankcheck numerals, 92.7% correct recognition rate.
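
The recognition pipeline itself cannot be reproduced in a few lines, but the reported error metrics can be illustrated. The sketch below computes, for hypothetical verification scores, the false rejection rate (Type I), false acceptance rate (Type II), and equal error rate under one common convention; the scores and threshold grid are invented for illustration and are not the prototype's data.

```python
import numpy as np

def verification_error_rates(genuine_scores, forgery_scores, thresholds):
    """Sweep a decision threshold and report false rejection (Type I),
    false acceptance (Type II) and the equal error rate (EER)."""
    genuine_scores = np.asarray(genuine_scores)
    forgery_scores = np.asarray(forgery_scores)
    frr = np.array([(genuine_scores < t).mean() for t in thresholds])   # Type I
    far = np.array([(forgery_scores >= t).mean() for t in thresholds])  # Type II
    eer_idx = np.argmin(np.abs(frr - far))
    return frr, far, (frr[eer_idx] + far[eer_idx]) / 2

# Illustrative similarity scores (higher = more likely genuine)
rng = np.random.default_rng(2)
genuine = rng.normal(0.8, 0.10, 500)
forgery = rng.normal(0.5, 0.15, 500)
frr, far, eer = verification_error_rates(genuine, forgery, np.linspace(0, 1, 101))
print(f"Equal error rate: {eer:.3f}")
```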


Biometrika
2019
Vol 106 (3)
pp. 651-651
Author(s):
Yang Liu
Wei Sun
Alexander P Reiner
Charles Kooperberg
Qianchuan He

Summary Genetic pathway analysis has become an important tool for investigating the association between a group of genetic variants and traits. With dense genotyping and extensive imputation, the number of genetic variants in biological pathways has increased considerably and sometimes exceeds the sample size $n$. Conducting genetic pathway analysis and statistical inference in such settings is challenging. We introduce an approach that can handle pathways whose dimension $p$ could be greater than $n$. Our method can be used to detect pathways that have nonsparse weak signals, as well as pathways that have sparse but stronger signals. We establish the asymptotic distribution for the proposed statistic and conduct theoretical analysis on its power. Simulation studies show that our test has correct Type I error control and is more powerful than existing approaches. An application to a genome-wide association study of high-density lipoproteins demonstrates the proposed approach.
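
The authors' test statistic is not reproduced here; as a simple point of reference only, the sketch below implements a generic quadratic-form (sum of squared marginal scores) pathway test with a permutation p-value, which can be applied when the number of variants p exceeds the sample size n. All names and simulated settings are illustrative assumptions.

```python
import numpy as np

def pathway_score_test(G, y, n_perm=1000, seed=0):
    """Quadratic-form (sum of squared marginal scores) test of association
    between a trait y and a group of variants G (n x p, possibly p > n),
    with a permutation p-value. A generic illustration, not the paper's statistic."""
    rng = np.random.default_rng(seed)
    y_c = y - y.mean()
    G_c = G - G.mean(axis=0)
    stat = np.sum((G_c.T @ y_c) ** 2)
    perm_stats = np.array([
        np.sum((G_c.T @ rng.permutation(y_c)) ** 2) for _ in range(n_perm)
    ])
    return stat, (1 + np.sum(perm_stats >= stat)) / (1 + n_perm)

# Example where p (number of variants) exceeds n (sample size)
rng = np.random.default_rng(3)
n, p = 100, 300
G = rng.binomial(2, 0.3, size=(n, p)).astype(float)    # genotype dosages 0/1/2
y = 0.15 * G[:, :20].sum(axis=1) + rng.normal(size=n)  # weak, non-sparse signal
stat, pval = pathway_score_test(G, y)
print(f"statistic = {stat:.1f}, permutation p-value = {pval:.3f}")
```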


2016
Vol 29 (6)
pp. 1977-1998
Author(s):  
Alexis Hannart

Abstract The present paper introduces and illustrates methodological developments intended for so-called optimal fingerprinting methods, which are frequently used in detection and attribution studies. These methods have traditionally involved three independent steps: preliminary reduction of the dimension of the data, estimation of the covariance associated with internal climate variability, and, finally, linear regression inference with associated uncertainty assessment. It is argued that such a compartmentalized treatment presents several issues; an integrated method is thus introduced to address them. The suggested approach is based on a single-piece statistical model that represents both linear regression and control runs. The unknown covariance is treated as a nuisance parameter that is eliminated by integration. This allows for the introduction of regularization assumptions. Point estimates and confidence intervals follow from the integrated likelihood. Further, it is shown that preliminary dimension reduction is not required for implementability and that computational issues associated with using the raw, high-dimensional, spatiotemporal data can be resolved quite easily. Results on simulated data show improved performance compared to existing methods with respect to both estimation error and accuracy of confidence intervals, and also highlight the need for further improvements regarding the latter. The method is illustrated on twentieth-century precipitation and surface temperature, suggesting a potentially high informational benefit of using the raw, nondimension-reduced data in detection and attribution (D&A), provided model error is appropriately built into the inference.
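
The paper's integrated, regularized approach cannot be condensed into a few lines; for orientation only, the sketch below shows the conventional generalized-least-squares fingerprinting step it builds on, with a Ledoit–Wolf shrinkage estimate of the internal-variability covariance obtained from control runs. The dimensions, simulated data, and single-pattern setup are illustrative assumptions, not the paper's method or data.

```python
import numpy as np
from sklearn.covariance import LedoitWolf

rng = np.random.default_rng(4)
n_grid, n_control = 60, 200

# Control runs: samples of internal variability used to estimate its covariance
control_runs = rng.normal(size=(n_control, n_grid))
C = LedoitWolf().fit(control_runs).covariance_   # regularized covariance estimate

# Observed pattern y and a single model-simulated response pattern x
x = rng.normal(size=n_grid)
beta_true = 0.8
y = beta_true * x + rng.multivariate_normal(np.zeros(n_grid), C)

# Generalized least squares scaling factor: beta = (x' C^{-1} x)^{-1} x' C^{-1} y
C_inv = np.linalg.inv(C)
beta_hat = (x @ C_inv @ y) / (x @ C_inv @ x)
se_beta = np.sqrt(1.0 / (x @ C_inv @ x))
print(f"beta = {beta_hat:.2f} +/- {1.96 * se_beta:.2f}")
```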


Trials
2020
Vol 21 (1)
Author(s):
Jiyu Kim
Andrea B. Troxel
Scott D. Halpern
Kevin G. Volpp
Brennan C. Kahan
...  

Abstract Introduction In a five-arm randomized clinical trial (RCT) with stratified randomization across 54 sites, we encountered low primary outcome event proportions, resulting in multiple sites with zero events either overall or in one or more study arms. In this paper, we systematically evaluated different statistical methods of accounting for center in settings with low outcome event proportions. Methods We conducted a simulation study and a reanalysis of a completed RCT to compare five popular methods of estimating an odds ratio for multicenter trials with stratified randomization by center: (i) no center adjustment, (ii) random intercept model, (iii) Mantel–Haenszel model, (iv) generalized estimating equation (GEE) with an exchangeable correlation structure, and (v) GEE with small sample correction (GEE-small sample correction). We varied the number of total participants (200, 500, 1000, 5000), number of centers (5, 50, 100), control group outcome percentage (2%, 5%, 10%), true odds ratio (1, > 1), intra-class correlation coefficient (ICC) (0.025, 0.075), and distribution of participants across the centers (balanced, skewed). Results Mantel–Haenszel methods generally performed poorly in terms of power and bias and led to the exclusion of participants from the analysis because some centers had no events. Failure to account for center in the analysis generally led to lower power and type I error rates than other methods, particularly with ICC = 0.075. GEE had an inflated type I error rate except in some settings with a large number of centers. GEE-small sample correction maintained the type I error rate at the nominal level but suffered from reduced power and convergence issues in some settings when the number of centers was small. Random intercept models generally performed well in most scenarios, except with a low event rate (i.e., 2% scenario) and small total sample size (n ≤ 500), when all methods had issues. Discussion Random intercept models generally performed best across most scenarios. GEE-small sample correction performed well when the number of centers was large. We do not recommend the use of Mantel–Haenszel, GEE, or models that do not account for center. When the expected event rate is low, we suggest that the statistical analysis plan specify an alternative method in the case of non-convergence of the primary method.
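
A sketch of two of the five compared approaches, (i) no center adjustment via ordinary logistic regression and (iv) GEE with an exchangeable correlation structure, fitted with statsmodels to simulated trial data with a low event proportion. The simulation settings below are illustrative and are not those of the paper or the trial.

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm
import statsmodels.formula.api as smf

rng = np.random.default_rng(5)
n_centers, n_per_center = 50, 20

# Simulate a center-stratified trial with a low event proportion and a center effect
center = np.repeat(np.arange(n_centers), n_per_center)
treat = np.tile([0, 1], n_centers * n_per_center // 2)   # 1:1 within centers
center_effect = rng.normal(0, 0.5, n_centers)            # between-center variation
logit_p = -2.9 + 0.4 * treat + center_effect[center]     # ~5% control event rate
y = rng.binomial(1, 1 / (1 + np.exp(-logit_p)))

df = pd.DataFrame({"y": y, "treat": treat, "center": center})

# (i) No center adjustment: ordinary logistic regression
unadj = smf.logit("y ~ treat", data=df).fit(disp=0)

# (iv) GEE with an exchangeable working correlation, clustered by center
gee = smf.gee("y ~ treat", groups="center", data=df,
              family=sm.families.Binomial(),
              cov_struct=sm.cov_struct.Exchangeable()).fit()

print(f"Unadjusted OR: {np.exp(unadj.params['treat']):.2f}")
print(f"GEE OR:        {np.exp(gee.params['treat']):.2f}")
```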


2019
Vol 17 (2)
pp. 129-137
Author(s):
Susanne May
Siobhan P Brown
Robert H Schmicker
Scott S. Emerson
Evangelyn Nkwopara
...  

Background/aims: After a new treatment is recommended as the first-line treatment for a specific indication, outcome, and population, it may be unethical to use placebo as a comparator in trials for that setting. Nevertheless, in specific circumstances, use of a placebo group might be warranted, for example, when it is believed that an active treatment may not be efficacious or cost-effective for a specific subpopulation. An example is antibiotic treatment for pneumonia, which may not be effective for many patients taking it due to the emergence of antibiotic-resistant strains or the high prevalence of viral and low prevalence of bacterial pneumonia. Methods: We explore the applicability of different design options in cases where the benefit of an established treatment is questioned, with particular emphasis on issues that arise in a low-resource setting. Using the example of a clinical trial comparing the effectiveness of placebo versus amoxicillin in treating children 2–59 months of age with fast breathing pneumonia in Lilongwe, Malawi, we discuss the pros and cons of superiority versus non-inferiority designs, an intent-to-treat versus as-treated analysis, and the use and interpretation of one- versus two-sided confidence intervals. Results: We find that a non-inferiority design using an intent-to-treat analysis is the most appropriate design and analysis option. In addition, whether one- or two-sided confidence intervals are presented can depend on the results while still maintaining the type I error rate. Conclusion: In the setting where the benefit of a previously established beneficial treatment is questioned, a non-inferiority design that includes placebo as the tested treatment option can be the most appropriate design option.
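
For illustration, here is a sketch of how a non-inferiority comparison of failure proportions against a pre-specified margin might be assessed with a one-sided 97.5% upper confidence bound (equivalently, the upper limit of a two-sided 95% Wald interval). The counts, sample sizes, and margin below are hypothetical and are not data from the Malawi trial.

```python
import numpy as np
from scipy.stats import norm

# Hypothetical counts: treatment failures out of n in each arm (not trial data)
fail_placebo, n_placebo = 35, 500
fail_amox, n_amox = 30, 500
margin = 0.05   # pre-specified non-inferiority margin for the failure-rate difference

p1, p2 = fail_placebo / n_placebo, fail_amox / n_amox
diff = p1 - p2                                  # placebo minus active (failure rates)
se = np.sqrt(p1 * (1 - p1) / n_placebo + p2 * (1 - p2) / n_amox)

# One-sided 97.5% upper bound (upper limit of a two-sided 95% CI)
upper = diff + norm.ppf(0.975) * se
print(f"difference = {diff:.3f}, upper bound = {upper:.3f}")
print("non-inferiority concluded" if upper < margin else "non-inferiority not shown")
```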


Author(s):  
Rand R. Wilcox

Hypothesis testing is an approach to statistical inference that is routinely taught and used. It is based on a simple idea: develop some relevant speculation about the population of individuals or things under study and determine whether data provide reasonably strong empirical evidence that the hypothesis is wrong. Consider, for example, two approaches to advertising a product. A study might be conducted to determine whether it is reasonable to assume that both approaches are equally effective. A Type I error is rejecting this speculation when in fact it is true. A Type II error is failing to reject when the speculation is false. A common practice is to test hypotheses with the type I error probability set to 0.05 and to declare that there is a statistically significant result if the hypothesis is rejected. There are various concerns about, limitations to, and criticisms of this approach. One criticism is the use of the term significant. Consider the goal of comparing the means of two populations of individuals. Saying that a result is significant suggests that the difference between the means is large and important. But in the context of hypothesis testing it merely means that there is empirical evidence that the means are not equal. Situations can and do arise where a result is declared significant, but the difference between the means is trivial and unimportant. Indeed, the goal of testing the hypothesis that two means are equal has been criticized based on the argument that surely the means differ at some decimal place. A simple way of dealing with this issue is to reformulate the goal. Rather than testing for equality, determine whether it is reasonable to make a decision about which group has the larger mean. The components of hypothesis-testing techniques can be used to address this issue with the understanding that the goal of testing some hypothesis has been replaced by the goal of determining whether a decision can be made about which group has the larger mean. Another aspect of hypothesis testing that has seen considerable criticism is the notion of a p-value. Suppose some hypothesis is rejected with the Type I error probability set to 0.05. This leaves open the issue of whether the hypothesis would be rejected with Type I error probability set to 0.025 or 0.01. A p-value is the smallest Type I error probability for which the hypothesis is rejected. When comparing means, a p-value reflects the strength of the empirical evidence that a decision can be made about which has the larger mean. A concern about p-values is that they are often misinterpreted. For example, a small p-value does not necessarily mean that a large or important difference exists. Another common mistake is to conclude that if the p-value is close to zero, there is a high probability of rejecting the hypothesis again if the study is replicated. The probability of rejecting again is a function of the extent that the hypothesis is not true, among other things. Because a p-value does not directly reflect the extent the hypothesis is false, it does not provide a good indication of whether a second study will provide evidence to reject it. Confidence intervals are closely related to hypothesis-testing methods. Basically, they are intervals that contain unknown quantities with some specified probability. For example, a goal might be to compute an interval that contains the difference between two population means with probability 0.95. 
Confidence intervals can be used to determine whether some hypothesis should be rejected. Clearly, confidence intervals provide useful information not provided by testing hypotheses and computing a p-value. But an argument for a p-value is that it provides a perspective on the strength of the empirical evidence that a decision can be made about the relative magnitude of the parameters of interest. For example, to what extent is it reasonable to decide whether the first of two groups has the larger mean? Even if a compelling argument can be made that p-values should be completely abandoned in favor of confidence intervals, there are situations where p-values provide a convenient way of developing reasonably accurate confidence intervals. Another argument against p-values is that because they are misinterpreted by some, they should not be used. But if this argument is accepted, it follows that confidence intervals should be abandoned because they are often misinterpreted as well. Classic hypothesis-testing methods for comparing means and studying associations assume sampling is from a normal distribution. A fundamental issue is whether nonnormality can be a source of practical concern. Based on hundreds of papers published during the last 50 years, the answer is an unequivocal Yes. Granted, there are situations where nonnormality is not a practical concern, but nonnormality can have a substantial negative impact on both Type I and Type II errors. Fortunately, there is a vast literature describing how to deal with known concerns. Results based solely on some hypothesis-testing approach have clear implications about methods aimed at computing confidence intervals. Nonnormal distributions that tend to generate outliers are one source for concern. There are effective methods for dealing with outliers, but technically sound techniques are not obvious based on standard training. Skewed distributions are another concern. The combination of what are called bootstrap methods and robust estimators provides techniques that are particularly effective for dealing with nonnormality and outliers. Classic methods for comparing means and studying associations also assume homoscedasticity. When comparing means, this means that groups are assumed to have the same amount of variance even when the means of the groups differ. Violating this assumption can have serious negative consequences in terms of both Type I and Type II errors, particularly when the normality assumption is violated as well. There is vast literature describing how to deal with this issue in a technically sound manner.
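
The closing paragraphs point toward combining bootstrap methods with robust estimators. A minimal sketch of that idea is a percentile-bootstrap comparison of 20% trimmed means, which tolerates outliers, skewness, and unequal variances; the simulated data, trimming proportion, and bootstrap settings below are illustrative choices, not a prescribed method.

```python
import numpy as np
from scipy.stats import trim_mean

def percentile_bootstrap_trimmed(x1, x2, prop=0.2, n_boot=2000, seed=0):
    """Percentile bootstrap CI and p-value for the difference in 20% trimmed means."""
    rng = np.random.default_rng(seed)
    diffs = np.array([
        trim_mean(rng.choice(x1, x1.size, replace=True), prop)
        - trim_mean(rng.choice(x2, x2.size, replace=True), prop)
        for _ in range(n_boot)
    ])
    ci = np.percentile(diffs, [2.5, 97.5])
    # Two-sided p-value: how often the bootstrap difference crosses zero
    p = 2 * min((diffs < 0).mean(), (diffs > 0).mean())
    return trim_mean(x1, prop) - trim_mean(x2, prop), ci, p

# Skewed, heteroscedastic data with occasional outliers
rng = np.random.default_rng(6)
g1 = np.exp(rng.normal(0.3, 1.0, 40))   # larger location, more spread
g2 = np.exp(rng.normal(0.0, 0.5, 40))
est, ci, p = percentile_bootstrap_trimmed(g1, g2)
print(f"difference in trimmed means = {est:.2f}, "
      f"95% CI = [{ci[0]:.2f}, {ci[1]:.2f}], p = {p:.3f}")
```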

