Detecting Confounding in Multivariate Linear Models via Spectral Analysis
Abstract
We study a model in which a target variable $Y$ is correlated with a vector $\textbf{X}:=(X_1,\dots,X_d)$ of predictor variables that are potential causes of $Y$. We describe a method that infers to what extent the statistical dependences between $\textbf{X}$ and $Y$ are due to the influence of $\textbf{X}$ on $Y$ and to what extent they are due to a hidden common cause (confounder) of $\textbf{X}$ and $Y$. The method relies on concentration-of-measure results for large dimension $d$ and on an independence assumption stating that, in the absence of confounding, the vector of regression coefficients describing the influence of each $X_j$ on $Y$ typically has ‘generic orientation’ relative to the eigenspaces of the covariance matrix of $\textbf{X}$. For the special case of a scalar confounder, we show that confounding typically spoils this generic orientation in a characteristic way that can be used to quantitatively estimate the amount of confounding (subject to our idealized model assumptions).
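The notion of 'generic orientation' can be illustrated numerically. The following is a minimal sketch under assumptions that are not stated in the abstract: a population-level linear model $\textbf{X} = \textbf{E} + \textbf{b}Z$, $Y = \textbf{a}^T\textbf{X} + cZ$ with a unit-variance scalar confounder $Z$, a diagonal noise covariance, and arbitrary values for $d$, $\textbf{b}$ and $c$. The first-moment statistic and the helper `spectral_mean_deviation` are hypothetical choices made for illustration only; they are not the estimator developed in this paper. The sketch decomposes the regression vector in the eigenbasis of the covariance matrix of $\textbf{X}$ and compares the weighted mean eigenvalue with the unweighted mean, which should nearly coincide for a generically oriented vector in high dimension.

```python
import numpy as np

def spectral_mean_deviation(cov_xx, coef):
    """Relative gap between the eigenvalue mean weighted by the squared
    overlaps of `coef` with the eigenvectors of `cov_xx` and the plain
    (uniformly weighted) eigenvalue mean. Hypothetical illustrative statistic."""
    eigvals, eigvecs = np.linalg.eigh(cov_xx)
    weights = (eigvecs.T @ coef) ** 2
    weights /= weights.sum()
    generic_mean = eigvals.mean()
    return abs(weights @ eigvals - generic_mean) / generic_mean

rng = np.random.default_rng(0)
d = 200  # assumed dimension; the argument needs d to be large

# Assumed covariance of the unconfounded part E of X.
cov_ee = np.diag(np.linspace(0.1, 2.0, d))

# Causal coefficient vector a, drawn with generic (random) orientation.
a = rng.standard_normal(d)
a /= np.linalg.norm(a)

# Unconfounded case: the regression vector is a itself.
dev_unconfounded = spectral_mean_deviation(cov_ee, a)

# Confounded case: Z enters X through b and Y through c (both assumed).
b = rng.standard_normal(d)
b *= 0.5 / np.linalg.norm(b)
c = 5.0
cov_xx = cov_ee + np.outer(b, b)

# Population regression vector: cov_xx^{-1} Cov(X, Y) = a + c * cov_xx^{-1} b.
a_hat = a + c * np.linalg.solve(cov_xx, b)
dev_confounded = spectral_mean_deviation(cov_xx, a_hat)

print(f"unconfounded deviation: {dev_unconfounded:.3f}")
print(f"confounded deviation:   {dev_confounded:.3f}")
```

In this toy setting the confounding term $c\,\Sigma_{\textbf{XX}}^{-1}\textbf{b}$ puts extra weight on eigenvectors with small eigenvalues, so the weighted mean moves away from the unweighted one. A single moment is only a crude summary of this effect.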