Ensemble-based estimates of eigenvector error for empirical covariance matrices

2018 ◽  
Vol 8 (2) ◽  
pp. 289-312
Author(s):  
Dane Taylor ◽  
Juan G Restrepo ◽  
François G Meyer

Abstract Covariance matrices are fundamental to the analysis and forecast of economic, physical and biological systems. Although the eigenvalues $\{\lambda _i\}$ and eigenvectors $\{\boldsymbol{u}_i\}$ of a covariance matrix are central to such endeavours, in practice one must inevitably approximate the covariance matrix based on data with finite sample size $n$ to obtain empirical eigenvalues $\{\tilde{\lambda }_i\}$ and eigenvectors $\{\tilde{\boldsymbol{u}}_i\}$, and therefore understanding the error so introduced is of central importance. We analyse eigenvector error $\|\boldsymbol{u}_i - \tilde{\boldsymbol{u}}_i \|^2$ while leveraging the assumption that the true covariance matrix, of size $p$, is drawn from a matrix ensemble with known spectral properties. In particular, we assume that the distribution of population eigenvalues weakly converges as $p\to \infty $ to a spectral density $\rho (\lambda )$ and that the spacing between population eigenvalues is similar to that of the Gaussian orthogonal ensemble. Our approach complements previous analyses of eigenvector error that require the full set of eigenvalues to be known, which can be computationally infeasible when $p$ is large. To provide a scalable approach for uncertainty quantification of eigenvector error, we consider a fixed eigenvalue $\lambda $ and approximate the distribution of the expected square error $r= \mathbb{E}\left [\| \boldsymbol{u}_i - \tilde{\boldsymbol{u}}_i \|^2\right ]$ across the matrix ensemble for all $\boldsymbol{u}_i$ associated with $\lambda _i=\lambda $. We find, for example, that for sufficiently large matrix size $p$ and sample size $n> p$, the probability density of $r$ scales as $1/nr^2$. This power-law scaling implies that the eigenvector error is extremely heterogeneous: even if $r$ is very small for most eigenvectors, it can be large for others with non-negligible probability. We support this and further results with numerical experiments.
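The squared eigenvector error $\|\boldsymbol{u}_i - \tilde{\boldsymbol{u}}_i\|^2$ studied above is easy to probe numerically. The following NumPy sketch is only an illustration of the quantity itself, not of the authors' ensemble analysis; the spectrum, dimension $p$ and sample size $n$ are arbitrary choices for the demo.

```python
import numpy as np

rng = np.random.default_rng(0)
p, n = 50, 500                      # matrix dimension and sample size, n > p

# Population covariance with a known, well-spread spectrum.
lam = np.linspace(1.0, 5.0, p)
Q, _ = np.linalg.qr(rng.standard_normal((p, p)))
Sigma = Q @ np.diag(lam) @ Q.T

# Empirical covariance from n samples (the mean is known to be zero here).
X = rng.multivariate_normal(np.zeros(p), Sigma, size=n)
Sigma_hat = X.T @ X / n

# Population and empirical eigenvectors (eigh sorts eigenvalues ascending).
_, U = np.linalg.eigh(Sigma)
_, U_hat = np.linalg.eigh(Sigma_hat)

# Eigenvectors are only defined up to sign, so align signs before comparing.
signs = np.sign(np.sum(U * U_hat, axis=0))
err = np.sum((U - U_hat * signs) ** 2, axis=0)   # ||u_i - u_i_tilde||^2 per i
```

Since both $\boldsymbol{u}_i$ and $\tilde{\boldsymbol{u}}_i$ are unit vectors, each entry of `err` lies in $[0, 4]$; repeating the experiment over many draws of the population matrix would give an empirical view of the distribution of $r$.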

Biometrika ◽  
2019 ◽  
Vol 106 (3) ◽  
pp. 605-617
Author(s):  
H S Battey

Summary We develop a theory of covariance and concentration matrix estimation on any given or estimated sparsity scale when the matrix dimension is larger than the sample size. Nonstandard sparsity scales are justified when such matrices are nuisance parameters, distinct from interest parameters, which should always have a direct subject-matter interpretation. The matrix logarithmic and inverse scales are studied as special cases, with the corollary that a constrained optimization-based approach is unnecessary for estimating a sparse concentration matrix. It is shown through simulations that for large unstructured covariance matrices, there can be appreciable advantages to estimating a sparse approximation to the log-transformed covariance matrix and converting the conclusions back to the scale of interest.
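The log-scale estimation idea described above can be sketched as: take the matrix logarithm of a covariance estimate, sparsify it, and exponentiate back. The function below is a minimal hard-thresholding sketch of that pipeline, not Battey's estimator; the threshold `tau` and the toy data are illustrative assumptions.

```python
import numpy as np

def sparse_log_covariance(S, tau):
    """Hard-threshold the matrix logarithm of a covariance estimate S
    at level tau, then map the result back to the covariance scale."""
    w, V = np.linalg.eigh(S)                   # S symmetric positive definite
    L = V @ np.diag(np.log(w)) @ V.T           # matrix logarithm of S
    L = np.where(np.abs(L) >= tau, L, 0.0)     # entrywise hard thresholding
    L = (L + L.T) / 2                          # restore exact symmetry
    w2, V2 = np.linalg.eigh(L)
    return V2 @ np.diag(np.exp(w2)) @ V2.T     # matrix exponential of L

# Toy illustration with a well-conditioned empirical covariance matrix.
rng = np.random.default_rng(0)
X = rng.standard_normal((200, 8))
S = X.T @ X / 200
S_tilde = sparse_log_covariance(S, tau=0.05)
```

A practical attraction of this route, as the summary notes, is that the result is automatically a valid covariance matrix (symmetric positive definite), so no constrained optimization is needed to enforce positive definiteness.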


Metrika ◽  
2019 ◽  
Vol 83 (2) ◽  
pp. 243-254
Author(s):  
Mathias Lindholm ◽  
Felix Wahl

Abstract In the present note we consider general linear models where the covariates may be either random or non-random, and where the only restrictions on the error terms are that they are independent and have finite fourth moments. For this class of models we analyse the variance parameter estimator. In particular, we obtain finite-sample bounds for the variance of the variance parameter estimator which are independent of covariate information, regardless of whether the covariates are random or not. For the case with random covariates this immediately yields bounds on the unconditional variance of the variance estimator, a situation which in general is analytically intractable. The situation with random covariates is illustrated in an example where a certain vector autoregressive model, which appears naturally in insurance mathematics, is analysed. Further, the obtained bounds are sharp in the sense that both the lower and the upper bound converge to the same asymptotic limit when scaled with the sample size. Using the derived bounds it is simple to show convergence in mean square of the variance parameter estimator for both random and non-random covariates. Moreover, the derivation of the bounds for the above general linear model is based on a lemma which applies in greater generality. This is illustrated by applying the same techniques to a class of mixed effects models.
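The object bounded above, the unconditional variance of the variance parameter estimator under random covariates, can be explored by simulation. The sketch below uses the standard residual-variance estimator $\hat{\sigma}^2 = \mathrm{RSS}/(n-k)$ and redraws the covariates on each replication; the model dimensions, coefficients and noise level are illustrative assumptions, not values from the paper.

```python
import numpy as np

rng = np.random.default_rng(1)
n, k, sigma2 = 200, 3, 2.0          # sample size, no. of covariates, true variance
beta = np.array([1.0, -0.5, 2.0])   # arbitrary true coefficients

def variance_estimator(X, y):
    """Unbiased residual-variance estimator sigma^2_hat = RSS / (n - k)."""
    beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta_hat
    return resid @ resid / (X.shape[0] - X.shape[1])

# Random covariates: each replication redraws X, so the spread of the
# estimates reflects the *unconditional* variance of the estimator.
ests = []
for _ in range(500):
    X = np.column_stack([np.ones(n), rng.standard_normal((n, k - 1))])
    y = X @ beta + rng.normal(scale=np.sqrt(sigma2), size=n)
    ests.append(variance_estimator(X, y))
```

The empirical variance of `ests` is the quantity the paper's finite-sample bounds bracket; for Gaussian errors it should be close to the classical value $2\sigma^4/(n-k)$.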

