The Magnitude and Direction of Collider Bias for Binary Variables

2019 ◽  
Vol 8 (1) ◽  
Author(s):  
Trang Quynh Nguyen ◽  
Allan Dafoe ◽  
Elizabeth L. Ogburn

Abstract. Suppose we are interested in the effect of variable X on variable Y. If X and Y both influence, or are associated with variables that influence, a common outcome, called a collider, then conditioning on the collider (or on a variable influenced by the collider, its “child”) induces a spurious association between X and Y, which is known as collider bias. Characterizing the magnitude and direction of collider bias is crucial for understanding the implications of selection bias and for adjudicating decisions about whether to control for variables that are known to be associated with both exposure and outcome but could be either confounders or colliders. Considering a class of situations where all variables are binary, and where X and Y either are, or are respectively influenced by, two marginally independent causes of a collider, we derive the collider bias that results from (i) conditioning on specific levels of the collider or its child (on the covariance, risk difference, and, in two cases, odds ratio scales), or (ii) linear regression adjustment for the collider or its child. We also derive simple conditions that determine the sign of such bias.
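The derivations in the paper are analytic, but the qualitative effect is easy to reproduce numerically. Below is a minimal simulation sketch (not the authors' code; all probabilities and effect sizes are invented) in which two marginally independent binary causes X and Y both raise the probability of a binary collider C, and the X-Y risk difference is estimated before and after conditioning on C = 1.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1_000_000

# Two marginally independent binary causes of the collider (all probabilities invented).
x = rng.binomial(1, 0.5, n)
y = rng.binomial(1, 0.5, n)

# Binary collider: both X and Y raise P(C = 1).
c = rng.binomial(1, 0.1 + 0.4 * x + 0.4 * y)

def risk_difference(exposure, outcome):
    """P(outcome = 1 | exposure = 1) - P(outcome = 1 | exposure = 0)."""
    return outcome[exposure == 1].mean() - outcome[exposure == 0].mean()

sel = c == 1
print(f"Risk difference, marginal:           {risk_difference(x, y):+.4f}")           # approximately 0
print(f"Risk difference, conditional on C=1: {risk_difference(x[sel], y[sel]):+.4f}")  # negative (collider bias)
```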

1998 ◽  
Vol 43 (4) ◽  
pp. 411-415 ◽  
Author(s):  
David L Streiner

This article describes various indices of risk, where risk is the probability that a person will develop a specific outcome. The risk difference is the absolute difference in risks between 2 groups and can be used either to compare the outcomes of 2 groups, one of which was exposed to some genetic or environmental factor, or to see how much of an effect a treatment may have. The reciprocal of the risk difference, the number needed to treat, expresses how many patients must receive the intervention in order for 1 person to derive some benefit. Attributable risk reflects the proportion of cases due to some putative cause and indicates the number of cases that could be averted if the cause were removed. Finally, the relative risk and odds ratio reflect the relative differences between groups in achieving some outcome, either good (a cure) or bad (development of a disorder).
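As a quick worked illustration of these indices, the sketch below computes them from an invented 2 × 2 table of group by outcome; the attributable-risk formula used is the attributable fraction among the exposed, one common reading of the definition above.

```python
# Invented 2 x 2 table: group 1 (treated/exposed) vs group 2 (control/unexposed).
a, b = 30, 70    # group 1: 30 events, 70 non-events (n = 100)
c, d = 10, 90    # group 2: 10 events, 90 non-events (n = 100)

risk1 = a / (a + b)
risk2 = c / (c + d)

risk_difference = risk1 - risk2                    # absolute difference in risks
number_needed_to_treat = 1 / abs(risk_difference)  # patients treated per 1 additional event
relative_risk = risk1 / risk2                      # ratio of risks
odds_ratio = (a / b) / (c / d)                     # ratio of odds
attributable_fraction = (risk1 - risk2) / risk1    # proportion of group-1 events attributable to the exposure

print(f"Risk difference:        {risk_difference:.2f}")
print(f"Number needed to treat: {number_needed_to_treat:.0f}")
print(f"Relative risk:          {relative_risk:.1f}")
print(f"Odds ratio:             {odds_ratio:.1f}")
print(f"Attributable fraction:  {attributable_fraction:.2f}")
```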


2019 ◽  
Vol 2019 ◽  
pp. 1-7 ◽  
Author(s):  
Epaminondas Markos Valsamis ◽  
Henry Husband ◽  
Gareth Ka-Wai Chan

Introduction. In healthcare, change is usually detected by statistical techniques comparing outcomes before and after an intervention. A common problem faced by researchers is distinguishing change due to secular trends from change due to an intervention. Interrupted time-series analysis has been shown to be effective in describing trends in retrospective time-series and in detecting change, but its methods are often biased towards the point of the intervention. Binary outcomes are typically modelled by logistic regression, where the log-odds of the binary event is expressed as a function of covariates such as time, making model parameters difficult to interpret. The aim of this study was to present a technique that directly models the probability of binary events, describing change patterns using linear sections. Methods. We describe a modelling method that fits progressively more complex linear sections to the time-series of binary variables. Model fitting uses maximum likelihood optimisation, and models are compared for goodness of fit using Akaike’s Information Criterion. The best model describes the most likely change scenario. We applied this modelling technique to evaluate hip fracture patient mortality rate for a total of 2777 patients over a 6-year period, before and after the introduction of a dedicated hip fracture unit (HFU) at a Level 1 Major Trauma Centre. Results. The proposed modelling technique revealed time-dependent trends that explained how the implementation of the HFU influenced mortality rate in patients sustaining proximal femoral fragility fractures. The technique allowed modelling of the entire time-series without bias towards the point of intervention. Modelling the binary variable of interest directly, as opposed to a transformed variable, improved the interpretability of the results. Conclusion. The proposed segmented linear regression modelling technique using maximum likelihood estimation can be employed to effectively detect trends in time-series of binary variables in retrospective studies.
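The published code is not reproduced here, but the core idea, fitting the event probability directly as a piecewise-linear function of time by maximum likelihood and comparing candidate models with AIC, can be sketched roughly as follows. The data, breakpoint grid, and parameterisation are illustrative assumptions, not the published model.

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(1)

# Invented monthly data: 72 months, ~40 patients per month, binary outcome (e.g. 30-day mortality).
months = np.repeat(np.arange(72), 40)
true_p = np.where(months < 36, 0.10, 0.10 - 0.001 * (months - 36))
deaths = rng.binomial(1, true_p)

def neg_log_lik(params, t, y, breakpoint=None):
    """Binomial negative log-likelihood for a (piecewise-)linear event probability."""
    p = params[0] + params[1] * t
    if breakpoint is not None:                        # second linear section after the breakpoint
        p = p + params[2] * np.clip(t - breakpoint, 0, None)
    p = np.clip(p, 1e-6, 1 - 1e-6)                    # keep probabilities valid
    return -np.sum(y * np.log(p) + (1 - y) * np.log(1 - p))

def aic(t, y, breakpoint=None):
    k = 2 if breakpoint is None else 3                # intercept plus slope(s); a profiled breakpoint
    x0 = np.r_[y.mean(), np.zeros(k - 1)]             # would normally count as one further parameter
    res = minimize(neg_log_lik, x0, args=(t, y, breakpoint), method="Nelder-Mead")
    return 2 * k + 2 * res.fun                        # AIC = 2k - 2 log-likelihood

aic_single = aic(months, deaths)
aic_segmented = min(aic(months, deaths, bp) for bp in range(6, 66, 6))  # coarse breakpoint grid
print(f"AIC, one linear section:  {aic_single:.1f}")
print(f"AIC, two linear sections: {aic_segmented:.1f}  (lower is better)")
```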


2021 ◽  
Vol 21 (1) ◽  
Author(s):  
Yingxin Liu ◽  
Shiyu Zhou ◽  
Hongxia Wei ◽  
Shengli An

Abstract. Background. As a popular approach in the machine learning field, forests are an attractive alternative to the Cox model. Random survival forests (RSF) is the most widely used survival forests method, but it has drawbacks such as a selection bias towards covariates with many possible split points. Conditional inference forests (CIF) are known to reduce this selection bias via a two-step split procedure that implements hypothesis tests and separates variable selection from splitting, but they are computationally expensive. Random forests with maximally selected rank statistics (MSR-RF), proposed recently, appear to be a substantial improvement on RSF and CIF. Methods. In this paper we used a simulation study and a real data application to compare prediction performance and variable selection performance among three survival forests methods: RSF, CIF and MSR-RF. To evaluate variable selection performance, we combined all simulations to calculate the frequency with which the correct variables were ranked top in the variable importance measures, where a higher frequency means better selection ability. We used the Integrated Brier Score (IBS) and the c-index to measure the prediction accuracy of all three methods; the smaller the IBS value, the better the prediction. Results. Simulations show that the three forest methods differ only slightly in prediction performance. MSR-RF and RSF might perform better than CIF when there are only continuous or binary variables in the datasets. For variable selection performance, when there are multiple categorical variables in the datasets, the selection frequency of RSF seems to be lowest in most cases; MSR-RF and CIF have higher selection rates, and CIF performs well especially with the interaction term. The fact that the degree of correlation among the variables has little effect on the selection frequency indicates that all three forest methods can handle correlated data. When there are only continuous variables in the datasets, MSR-RF performs better. When there are only binary variables, RSF and MSR-RF have more advantages than CIF. When the variable dimension increases, MSR-RF and RSF seem to be more robust than CIF. Conclusions. All three methods show advantages in prediction and variable selection performance under different situations. The recently proposed MSR-RF methodology has practical value and is well worth popularizing. It is important to identify the appropriate method in real use according to the research aim and the nature of the covariates.
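RSF, CIF and MSR-RF are typically fitted with the R packages randomForestSRC, party/partykit and ranger respectively, so no attempt is made here to reproduce the comparison. The c-index used to score their predictions is, however, simple to compute directly; the sketch below uses invented toy data and assigns censoring flags at random purely for illustration.

```python
import numpy as np

def harrell_c_index(time, event, risk):
    """Harrell's concordance index for right-censored data.

    A pair (i, j) is comparable if the subject with the shorter observed time
    had an event; it is concordant if that subject also has the higher
    predicted risk. Ties in predicted risk count as 0.5.
    """
    concordant, comparable = 0.0, 0
    n = len(time)
    for i in range(n):
        for j in range(n):
            if time[i] < time[j] and event[i]:   # i fails first and is uncensored
                comparable += 1
                if risk[i] > risk[j]:
                    concordant += 1
                elif risk[i] == risk[j]:
                    concordant += 0.5
    return concordant / comparable

# Toy data: higher x means higher hazard, so x itself is a reasonable risk score.
rng = np.random.default_rng(0)
x = rng.normal(size=200)
time = rng.exponential(scale=np.exp(-x))
event = rng.random(200) < 0.7                    # censoring flags assigned at random, for illustration only
print(f"c-index: {harrell_c_index(time, event, x):.3f}")
```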


2010 ◽  
Vol 138 (2) ◽  
pp. 592-604 ◽  
Author(s):  
Jochen Bröcker

Abstract. Logistic models are studied as a tool to convert dynamical forecast information (deterministic and ensemble) into probability forecasts. A logistic model is obtained by setting the logarithmic odds (the logit) equal to a linear combination of the inputs. As with any statistical model, logistic models will suffer from overfitting if the number of inputs is comparable to the number of forecast instances. Computational approaches to avoid overfitting by regularization are discussed, and efficient techniques for model assessment and selection are presented. A logit version of the lasso (originally a linear regression technique) is discussed. In lasso models, less important inputs are identified and the corresponding coefficients are set to zero, providing an efficient and automatic model reduction procedure. For the same reason, lasso models are particularly appealing for diagnostic purposes.
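As a rough illustration of the logit-lasso idea (not the author's implementation), an L1-penalised logistic regression drives the coefficients of unimportant inputs exactly to zero. The synthetic data and regularisation strength below are arbitrary assumptions.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# Synthetic "forecast inputs": only the first two actually influence the binary event.
n, p = 500, 10
X = rng.normal(size=(n, p))
logit = 1.5 * X[:, 0] - 1.0 * X[:, 1]              # log-odds as a linear combination of inputs
y = rng.random(n) < 1 / (1 + np.exp(-logit))

# L1 (lasso) penalty: unimportant inputs receive a coefficient of exactly zero.
model = LogisticRegression(penalty="l1", solver="liblinear", C=0.1)
model.fit(X, y)
print(np.round(model.coef_.ravel(), 2))            # most entries should be 0.0
```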


2013 ◽  
Vol 166 (2) ◽  
pp. 522-523 ◽  
Author(s):  
Dániel Aradi ◽  
Laurent Bonello ◽  
Dániel Kehl ◽  
András Komócsi

2016 ◽  
Author(s):  
Marcus R. Munafò ◽  
Kate Tilling ◽  
Amy E. Taylor ◽  
David M. Evans ◽  
George Davey Smith

Abstract. Large-scale cross-sectional and cohort studies have transformed our understanding of the genetic and environmental determinants of health outcomes. However, the representativeness of these samples may be limited – either through selection into studies, or by attrition from studies over time. Here we explore the potential impact of this selection bias on results obtained from these studies, from the perspective that this amounts to conditioning on a collider (i.e., a form of collider bias). While it is acknowledged that selection bias will have a strong effect on representativeness and prevalence estimates, it is often assumed that it should not have a strong impact on estimates of associations. We argue that because selection can induce collider bias (which occurs when two variables independently influence a third variable, and that third variable is conditioned upon), selection can lead to substantially biased estimates of associations. In particular, selection related to phenotypes can bias associations with genetic variants associated with those phenotypes. In simulations, we show that even modest influences on selection into, or attrition from, a study can generate biased and potentially misleading estimates of both phenotypic and genotypic associations. Our results highlight the value of knowing which population your study sample is representative of. If the factors influencing selection and attrition are known, they can be adjusted for. For example, having DNA available on most participants in a birth cohort study offers the possibility of investigating the extent to which polygenic scores predict subsequent participation, which in turn would enable sensitivity analyses of the extent to which bias might distort estimates.
Key Messages:
Selection bias (including selective attrition) may limit the representativeness of large-scale cross-sectional and cohort studies.
This selection bias may induce collider bias (which occurs when two variables independently influence a third variable, and that variable is conditioned upon).
This may lead to substantially biased estimates of associations, including of genetic associations, even when selection/attrition is relatively modest.
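A minimal simulation in the spirit of those described, with invented effect sizes rather than the authors' parameters: a score G and an outcome Y are generated independently, both modestly influence the probability of remaining in the study, and the G-Y association is estimated in the full sample and among participants only.

```python
import numpy as np

rng = np.random.default_rng(42)
n = 200_000

g = rng.normal(size=n)                     # genetic score (e.g. a polygenic score)
y = rng.normal(size=n)                     # outcome, truly independent of g

# Both modestly influence participation / retention (illustrative effect sizes).
p_participate = 1 / (1 + np.exp(-(0.5 + 0.3 * g + 0.3 * y)))
participates = rng.random(n) < p_participate

def slope(a, b):
    """OLS slope of b on a."""
    return np.cov(a, b)[0, 1] / np.var(a, ddof=1)

print(f"Association in full sample:     {slope(g, y):+.3f}")                               # approximately 0
print(f"Association among participants: {slope(g[participates], y[participates]):+.3f}")   # biased away from 0
```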

