Statistical Methods for Analyzing Collapsibility in Regression Models

1992 ◽  
Vol 17 (1) ◽  
pp. 51-74 ◽  
Author(s):  
Clifford C. Clogg ◽  
Eva Petkova ◽  
Edward S. Shihadeh

We give a unified treatment of statistical methods for assessing collapsibility in regression problems, including some possible extensions to the class of generalized linear models. Terminology is borrowed from the contingency table area, where various methods for assessing collapsibility have been proposed. Our procedures, however, can be motivated by considering extensions, and alternative derivations, of common procedures for omitted-variable bias in linear regression. Exact tests and interval estimates with optimal properties are available for linear regression with normal errors, and asymptotic procedures follow for models with estimated weights. The methods given here can be used to compare β1 and β2 in the common setting where the response function is first modeled as Xβ1 (reduced model) and then as Xβ2 + Zγ (full model), with Z a vector of covariates omitted from the reduced model. These procedures can be used in experimental settings (X = randomly assigned treatments, Z = covariates) or in nonexperimental settings where two models viewed as alternative behavioral or structural explanations are compared (one model with X only, another model with X and Z).
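As a concrete illustration of the reduced-versus-full comparison above, here is a minimal numpy sketch (ours, not the authors'; the simulated data and coefficient values are illustrative) that fits both models and checks the classical in-sample identity relating β1, β2, and the auxiliary regression of Z on X:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 500
X = rng.normal(size=n)                 # focal regressor (e.g., treatment)
Z = 0.6 * X + rng.normal(size=n)       # covariate correlated with X
y = 1.5 * X + 0.8 * Z + rng.normal(size=n)

def ols(design, response):
    """Least-squares coefficients via numpy's lstsq."""
    return np.linalg.lstsq(design, response, rcond=None)[0]

ones = np.ones(n)
beta1 = ols(np.column_stack([ones, X]), y)      # reduced model: y ~ X
beta2 = ols(np.column_stack([ones, X, Z]), y)   # full model:    y ~ X + Z

# Omitted-variable-bias identity: the change in the X coefficient equals
# (slope of Z on X) * (coefficient of Z in the full model), exactly in-sample.
delta = ols(np.column_stack([ones, X]), Z)
print(beta1[1] - beta2[1], delta[1] * beta2[2])  # the two numbers agree
```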

2018 ◽  
Vol 30 (12) ◽  
pp. 3227-3258 ◽  
Author(s):  
Ian H. Stevenson

Generalized linear models (GLMs) have a wide range of applications in systems neuroscience describing the encoding of stimulus and behavioral variables, as well as the dynamics of single neurons. However, in any given experiment, many variables that have an impact on neural activity are not observed or not modeled. Here we demonstrate, in both theory and practice, how these omitted variables can result in biased parameter estimates for the effects that are included. In three case studies, we estimate tuning functions for common experiments in motor cortex, hippocampus, and visual cortex. We find that including traditionally omitted variables changes estimates of the original parameters and that modulation originally attributed to one variable is reduced after new variables are included. In GLMs describing single-neuron dynamics, we then demonstrate how postspike history effects can also be biased by omitted variables. Here we find that omitted variable bias can lead to mistaken conclusions about the stability of single-neuron firing. Omitted variable bias can appear in any model with confounders—where omitted variables modulate neural activity and the effects of the omitted variables covary with the included effects. Understanding how and to what extent omitted variable bias affects parameter estimates is likely to be important for interpreting the parameters and predictions of many neural encoding models.
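The mechanism is easy to reproduce in a toy simulation. The following sketch (ours, not from the paper; it assumes the statsmodels package, and all variable names are illustrative) fits a Poisson GLM with and without a correlated covariate and shows the biased estimate of the included effect:

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(1)
n = 2000
stim = rng.normal(size=n)                    # included variable (e.g., stimulus)
omitted = 0.7 * stim + rng.normal(size=n)    # unobserved covariate, correlated with stim

# Poisson spiking whose log-rate depends on both variables
rate = np.exp(0.2 + 0.5 * stim + 0.5 * omitted)
spikes = rng.poisson(rate)

X_red = sm.add_constant(stim)                                 # omits the confounder
X_full = sm.add_constant(np.column_stack([stim, omitted]))

fit_red = sm.GLM(spikes, X_red, family=sm.families.Poisson()).fit()
fit_full = sm.GLM(spikes, X_full, family=sm.families.Poisson()).fit()

# The reduced model inflates the stimulus coefficient (true value 0.5)
print(fit_red.params[1], fit_full.params[1])
```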


2016 ◽  
Vol 4 (2) ◽  
Author(s):  
Peter M. Steiner ◽  
Yongnam Kim

Causal inference with observational data frequently requires researchers to estimate treatment effects conditional on a set of observed covariates, hoping that these covariates remove or at least reduce the confounding bias. Using a simple linear (regression) setting with two confounders – one observed (X), the other unobserved (U) – we demonstrate that conditioning on the observed confounder X does not necessarily imply that the confounding bias decreases, even if X is highly correlated with U. That is, adjusting for X may increase instead of reduce the omitted variable bias (OVB). Two phenomena can cause an increase in OVB: (i) bias amplification and (ii) cancellation of offsetting biases. Bias amplification occurs because conditioning on X amplifies any remaining bias due to the omitted confounder U. Cancellation of offsetting biases is an issue whenever X and U induce biases in opposite directions such that they perfectly or partially offset each other, in which case adjusting for X inadvertently cancels the bias-offsetting effect. In this article we discuss the conditions under which adjusting for X increases OVB, and demonstrate that conditioning on X increases the imbalance in U, which turns U into an even stronger confounder. We also show that conditioning on an unreliably measured confounder can remove more bias than the corresponding reliable measure. Practical implications for causal inference are discussed.
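A self-contained simulation (the parameter values are our own choices, not the authors') makes the first phenomenon concrete: when X strongly predicts the treatment, adjusting for X shrinks the treatment's variance and amplifies the residual bias from U:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 100_000
X = rng.normal(size=n)    # observed confounder, strong predictor of treatment
U = rng.normal(size=n)    # unobserved confounder
T = 2.0 * X + U + rng.normal(size=n)                    # treatment
Y = 1.0 * T + 0.2 * X + 1.0 * U + rng.normal(size=n)    # true effect of T is 1.0

def treatment_slope(*covariates):
    """OLS coefficient of T in a regression of Y on T plus the given covariates."""
    design = np.column_stack([np.ones(n), T, *covariates])
    return np.linalg.lstsq(design, Y, rcond=None)[0][1]

print("bias without adjustment:", treatment_slope() - 1.0)   # ~ 0.23
print("bias adjusting for X:  ", treatment_slope(X) - 1.0)   # ~ 0.50 (amplified)
```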


2018 ◽  
Vol 34 (3) ◽  
pp. 323-334
Author(s):  
Nadya Mincheva ◽  
Mitko Lalev ◽  
Magdalena Oblakova ◽  
Pavlina Hristakieva

The prediction of chicks' weight before hatching is an important element of selection aimed at improving the uniformity and productivity of birds. In this regard, our goal was to develop and evaluate optimum models for such prediction in two White Plymouth Rock chicken lines, line L and line K, on the basis of incubation egg weight and egg geometry characteristics: egg maximum breadth (B), egg length (L), geometric mean diameter (Dg), egg volume (V), and egg surface area (S). A total of 280 eggs (140 from each line) laid by 40-week-old hens were randomly selected. Arithmetic means, standard deviations, and coefficients of variation of the studied parameters were determined for each line. Correlation coefficients between the weight of hatchlings and the predictors were highest for egg weight, geometric mean diameter, volume, and surface area (r = 0.731-0.779 for line L; r = 0.802-0.819 for line K). Nine linear regression models were developed and their accuracy evaluated. The regression equations of hatchling weight vs. egg length had the lowest coefficients of determination (0.175 for line K and 0.291 for line L), but when egg length and breadth entered the model together, the value increased substantially, to 0.541 and 0.665 for lines L and K, respectively. The weight of day-old chicks from line L could be predicted with higher accuracy by a model involving egg surface area in addition to egg weight (ChW = 0.513EW + 0.282S - 10.345; R² = 0.620). In line K, a more accurate prediction was attained by adding egg breadth as an additional predictor alongside egg weight (ChW = 0.587EW + 0.566B - 19.853; R² = 0.692). The study demonstrated that multiple linear regression models were more precise than simple linear models.
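The two reported prediction equations can be applied directly. A small sketch (the units are our assumption, since the abstract does not state them, and the example inputs are purely illustrative):

```python
def chick_weight_line_L(egg_weight, surface_area):
    """Line L model from the abstract: ChW = 0.513*EW + 0.282*S - 10.345 (R² = 0.620)."""
    return 0.513 * egg_weight + 0.282 * surface_area - 10.345

def chick_weight_line_K(egg_weight, breadth):
    """Line K model from the abstract: ChW = 0.587*EW + 0.566*B - 19.853 (R² = 0.692)."""
    return 0.587 * egg_weight + 0.566 * breadth - 19.853

# Illustrative values only: a 60 g egg with 70 cm² surface area (line L)
# and a 60 g egg with 44 mm maximum breadth (line K).
print(chick_weight_line_L(60.0, 70.0))
print(chick_weight_line_K(60.0, 44.0))
```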


2021 ◽  
Author(s):  
Katja Moehring

Multilevel models that combine individual and contextual factors are increasingly popular in comparative social science research; however, their application in country-comparative studies is often associated with several problems. First, most data sets utilized for multilevel modeling include only a small number (N < 30) of macro-level units, and therefore the estimated models have few degrees of freedom at the country level. If models are specified correctly with regard to the small level-2 N, only a few macro-level indicators can be controlled for; furthermore, introducing random slopes and cross-level interaction effects is then hardly possible. Consequently, (1) these models are likely to suffer from omitted variable bias in the country-level estimates, and (2) the advantages of multilevel modeling cannot be fully exploited. The fixed effects approach is a valuable alternative to conventional multilevel methods in country-comparative analyses. This method is applicable even with a small number of countries and avoids country-level omitted variable bias by controlling for country-level heterogeneity. Following common practice in panel regression analyses, the moderating effect of macro-level characteristics can also be estimated in fixed effects models by means of cross-level interaction effects. Despite its advantages, the fixed effects approach is rarely used for the analysis of cross-national data. In this paper, I compare the fixed effects approach with conventional multilevel regression models and give practical examples using data from the International Social Survey Programme (ISSP) from 2006. As it turns out, the results of both approaches regarding the effects of cross-level interactions are similar. Thus, fixed effects models can be used either as an alternative to multilevel regression models or to assess the robustness of multilevel results.
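A compact way to see the fixed effects strategy in code: the sketch below (ours, not from the paper; it assumes pandas and statsmodels, and the simulated data are purely illustrative) absorbs country-level heterogeneity with country dummies and still estimates a cross-level interaction, just as in panel practice:

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Toy cross-national data: 20 countries, an individual-level x, and a
# country-level moderator z (all names are ours, not from the paper).
rng = np.random.default_rng(3)
countries = np.repeat(np.arange(20), 200)
z = rng.normal(size=20)[countries]            # macro-level characteristic
x = rng.normal(size=countries.size)
country_effect = rng.normal(size=20)[countries]
y = 0.5 * x + 0.3 * x * z + country_effect + rng.normal(size=countries.size)
df = pd.DataFrame({"y": y, "x": x, "z": z, "country": countries})

# Fixed effects: country dummies absorb all level-2 heterogeneity, so the
# main effect of z is not identified, but the cross-level interaction x:z is.
fe = smf.ols("y ~ x + x:z + C(country)", data=df).fit()
print(fe.params[["x", "x:z"]])
```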


2020 ◽  
pp. 65-92
Author(s):  
Bendix Carstensen

This chapter evaluates regression models, focusing on the normal linear regression model, which establishes a relationship between a quantitative response variable (also called the outcome or dependent variable), assumed to be normally distributed, and one or more explanatory variables (also called regression, predictor, or independent variables), about which no distributional assumptions are made. This model is usually referred to as 'the general linear model'. The chapter then differentiates between simple linear regression and multiple regression: the term 'simple linear regression' covers the regression model with one response variable and one explanatory variable, assuming a linear relationship between the two. The chapter also discusses model formulae in R, generalized linear models, collinearity and aliasing, and logarithmic transformations.
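Since statsmodels borrows R's model-formula syntax, the chapter's distinction between simple and multiple regression, together with a logarithmic transformation, can be sketched as follows (our example, not taken from the chapter; the data are simulated):

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(4)
df = pd.DataFrame({"x1": rng.normal(size=100), "x2": rng.normal(size=100)})
df["y"] = np.exp(0.1 + 0.5 * df.x1 + 0.2 * df.x2 + 0.1 * rng.normal(size=100))

simple = smf.ols("np.log(y) ~ x1", data=df).fit()         # simple linear regression
multiple = smf.ols("np.log(y) ~ x1 + x2", data=df).fit()  # multiple regression
print(simple.params, multiple.params, sep="\n")
```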


2019 ◽  
Vol 6 ◽  
pp. 233339281989100
Author(s):  
Valentin Rousson ◽  
Jean-Benoît Rossel ◽  
Yves Eggli

We consider the nontrivial problem of estimating the repartition of health costs among different diseases in the common case where patients may have multiple diseases. To tackle this problem, we propose an iterative proportional repartition (IPR) algorithm, a nonparametric method that is simple to understand and to implement, and that (among other advantages) avoids negative cost estimates and recovers the total health cost by summing up the estimated costs of the different diseases. The method is illustrated with health cost data from Switzerland and compared in a simulation study with other methods such as linear regression and general linear models. In the case of an additive model without interactions between disease costs, a situation where the truth is clearly defined so that the methods can be compared on an objective basis, the IPR algorithm clearly outperformed the other methods with respect to estimation efficiency in all the settings considered. In the presence of interactions, the situation is more complex and deserves further investigation.
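The description above translates into a short iteration: split each patient's total cost among their diseases in proportion to the current per-disease estimates, re-estimate each disease's cost as the mean amount allocated to it, and repeat until the estimates stabilize. The following sketch is our reconstruction from that description, not the authors' code; the function and variable names are ours:

```python
import numpy as np

def ipr(costs, disease_sets, n_diseases, tol=1e-8, max_iter=1000):
    """Iterative proportional repartition (sketch): allocate each patient's
    total cost to their diseases proportionally to the current per-disease
    estimates, then update each estimate as the mean allocated cost among
    patients having that disease, until convergence."""
    est = np.ones(n_diseases)                 # initial per-disease estimates
    for _ in range(max_iter):
        alloc_sum = np.zeros(n_diseases)      # total cost allocated per disease
        alloc_cnt = np.zeros(n_diseases)      # number of patients per disease
        for cost, diseases in zip(costs, disease_sets):
            weights = est[diseases]
            alloc_sum[diseases] += cost * weights / weights.sum()
            alloc_cnt[diseases] += 1
        new_est = np.where(alloc_cnt > 0,
                           alloc_sum / np.maximum(alloc_cnt, 1), est)
        if np.max(np.abs(new_est - est)) < tol:
            return new_est
        est = new_est
    return est

# Toy example: three patients, two diseases (0 and 1); note all allocations
# are nonnegative and alloc_sum recovers the total cost by construction.
costs = np.array([100.0, 150.0, 250.0])
disease_sets = [np.array([0]), np.array([1]), np.array([0, 1])]
print(ipr(costs, disease_sets, n_diseases=2))
```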


Risks ◽  
2018 ◽  
Vol 6 (3) ◽  
pp. 71 ◽  
Author(s):  
Guojun Gan

A variable annuity is a popular life insurance product that comes with financial guarantees. Using Monte Carlo simulation to value a large variable annuity portfolio is extremely time-consuming. Metamodeling approaches have been proposed in the literature to speed up the valuation process. In metamodeling, a metamodel is first fitted to a small number of variable annuity contracts and then used to predict the values of all other contracts. However, metamodels that have been investigated in the literature are sophisticated predictive models. In this paper, we investigate the use of linear regression models with interaction effects for the valuation of large variable annuity portfolios. Our numerical results show that linear regression models with interactions are able to produce accurate predictions and can be useful additions to the toolbox of metamodels that insurance companies can use to speed up the valuation of large VA portfolios.
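In the metamodeling workflow, the regression with interactions is fitted on the small valued sample and then used to price the rest of the portfolio. A hypothetical sketch (ours, not the paper's; the contract features, names, and the stand-in valuation formula are invented for illustration, and it assumes pandas and statsmodels):

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Hypothetical features and values for a small training sample of contracts
rng = np.random.default_rng(5)
n = 400
df = pd.DataFrame({
    "age": rng.uniform(40, 70, n),
    "account": rng.uniform(50_000, 500_000, n),
    "gb_ratio": rng.uniform(0.8, 1.2, n),     # guarantee-to-account ratio
})
# Stand-in for the expensive Monte Carlo valuations of the sampled contracts
df["value"] = (0.02 * df.account * df.gb_ratio
               + 100 * df.age * df.gb_ratio + rng.normal(0, 500, n))

# Linear regression metamodel with pairwise interactions: "** 2" in the
# formula expands to all main effects plus all two-way interactions.
meta = smf.ols("value ~ (age + account + gb_ratio) ** 2", data=df).fit()

# The fitted metamodel can now price the remaining contracts cheaply
print(meta.predict(df.head()))
```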


2017 ◽  
Vol 9 (6) ◽  
pp. 106
Author(s):  
J.C.S. De Miranda

We present a methodology for estimating causal functional linear models using orthonormal tensor product expansions. More precisely, we estimate the functional parameters $\alpha$ and $\beta$ that appear in the causal functional linear regression model $$\mathcal{Y}(s)=\alpha(s)+\int_a^b\beta(s,t)\mathcal{X}(t)\mathrm{d}t+\mathcal{E}(s),$$ where $\mbox{supp } \beta \subset \mathfrak{T}$ and $\mathfrak{T}$ is the closed triangular region whose vertices are $(a,a)$, $(b,a)$ and $(b,b)$. We assume we have an independent sample $\{ (\mathcal{Y}_k,\mathcal{X}_k) : 1\le k \le N, k\in \mathbb{N}\}$ of observations, where the $\mathcal{X}_k$'s are functional covariates, the $\mathcal{Y}_k$'s are time-order-preserving functional responses, and the $\mathcal{E}_k$, $1\le k \le N$, are i.i.d. zero-mean functional noises.
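Concretely (our notation; the truncation level $m$ and the choice of basis are assumptions, not taken from the paper), the support condition makes the model causal, $$\mathcal{Y}(s)=\alpha(s)+\int_a^{s}\beta(s,t)\mathcal{X}(t)\mathrm{d}t+\mathcal{E}(s),$$ and an orthonormal tensor product expansion represents the slope kernel as $$\beta(s,t)\approx\sum_{i=1}^{m}\sum_{j=1}^{m}b_{ij}\,\varphi_i(s)\varphi_j(t), \qquad b_{ij}=\iint_{\mathfrak{T}}\beta(s,t)\,\varphi_i(s)\varphi_j(t)\,\mathrm{d}t\,\mathrm{d}s,$$ where $(\varphi_i)$ is an orthonormal basis of $L^2[a,b]$; the finitely many coefficients $b_{ij}$ are then estimated from the sample $\{(\mathcal{Y}_k,\mathcal{X}_k)\}$.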


Author(s):  
PIHNASTYI OLEH MYKHAILOVYCH ◽  
KOZHYNA OLGA SERGEYEVNA

Objectives: Prognostication of bronchial asthma severity in children by means of building two-parameter regression models. Methods: A clinical study of 70 children aged 6 to 18 years with a diagnosis of bronchial asthma was conducted. 142 factors were analyzed, and the degree of relationship among them was revealed. Single-factor regression models were used for preliminary processing of the experimental data. Results: Correlations between the observed value and the factors under study were revealed. A method of building two-parameter linear models with fair accuracy was developed. Conclusion: The suggested method of approximate two-parameter linear regression models can be used for preliminary analysis of medical research data in which the observed value depends on a large number of loosely connected factors.
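A preliminary screening of this kind can be sketched as follows (our illustration, not the authors' procedure: it ranks all two-factor linear models by in-sample R², with invented stand-in data):

```python
import numpy as np
from itertools import combinations

def best_two_factor_models(X, y, top=5):
    """Fit a two-parameter linear regression (plus intercept) for every pair
    of factors and rank the pairs by in-sample R^2."""
    n, p = X.shape
    results = []
    for i, j in combinations(range(p), 2):
        design = np.column_stack([np.ones(n), X[:, i], X[:, j]])
        coef, *_ = np.linalg.lstsq(design, y, rcond=None)
        resid = y - design @ coef
        r2 = 1 - (resid @ resid) / ((y - y.mean()) @ (y - y.mean()))
        results.append((r2, i, j))
    return sorted(results, reverse=True)[:top]

# Toy usage with random stand-in data (70 patients, 10 of many factors)
rng = np.random.default_rng(6)
X = rng.normal(size=(70, 10))
y = 0.8 * X[:, 2] + 0.5 * X[:, 7] + rng.normal(size=70)
print(best_two_factor_models(X, y))
```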

