leverage scores
Recently Published Documents


Total documents: 8 (five years: 2)
H-index: 2 (five years: 1)
2021, Vol 42 (3), pp. 1199-1228
Author(s): Aleksandros Sobczyk, Efstratios Gallopoulos

Author(s): Aritra Bose, Myson C. Burch, Agniva Chowdhury, Peristera Paschou, Petros Drineas

Abstract: Genome-wide association studies (GWAS) have been extensively used to estimate the signed effects of trait-associated alleles. Recent independent studies failed to replicate the strong evidence of selection for height across Europe, implying shortcomings in standard population stratification correction approaches. Here, we present CluStrat, a stratification correction algorithm for complex population structure that leverages the linkage disequilibrium (LD)-induced distances between individuals. CluStrat performs agglomerative hierarchical clustering using the Mahalanobis distance and then applies sketching-based randomized ridge regression on the genotype data to obtain the association statistics. With the growing size of data, computing and storing the genome-wide covariance matrix is a non-trivial task. We get around this overhead by computing the genetic relationship matrix (GRM) directly using a connection between statistical leverage scores and the Mahalanobis distance. We test CluStrat on a large simulation study of discrete and admixed, arbitrarily structured sub-populations, identifying two- to three-fold more true causal variants than Principal Component (PC) based stratification correction methods, at the cost of slightly more spurious associations. Applying CluStrat to WTCCC2 Parkinson’s disease (PD) data, we identified loci mapped to a host of genes associated with PD, such as BACH2, MAP2, NR4A2, SLC11A1, and UNC5C, to name a few.

Availability and Implementation: CluStrat source code and user manual are available at: https://github.com/aritra90/CluStrat
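
The connection between leverage scores and the Mahalanobis distance mentioned in this abstract can be illustrated on a toy example. The sketch below is not CluStrat's implementation; it only checks the standard identity that, for a column-centered data matrix with n rows and full column rank, the squared Mahalanobis distance of row i from the mean equals (n - 1) times its leverage score. The function names and the five-point data set are hypothetical, and the code is restricted to two columns so that it needs only the standard library.

```python
import math

def center_columns(rows):
    """Subtract each column's mean so the data are centered at the origin."""
    n, p = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(p)]
    return [[r[j] - means[j] for j in range(p)] for r in rows]

def leverage_scores(rows):
    """Row leverage scores h_i = ||Q[i, :]||^2, with Q from Gram-Schmidt."""
    n, p = len(rows), len(rows[0])
    q_cols = []
    for j in range(p):
        v = [rows[i][j] for i in range(n)]
        # Remove the components along the orthonormal columns found so far.
        for q in q_cols:
            proj = sum(a * b for a, b in zip(q, v))
            v = [a - proj * b for a, b in zip(v, q)]
        norm = math.sqrt(sum(a * a for a in v))
        if norm < 1e-12:
            raise ValueError("columns are linearly dependent")
        q_cols.append([a / norm for a in v])
    return [sum(q[i] ** 2 for q in q_cols) for i in range(n)]

def mahalanobis_sq_2d(rows):
    """Squared Mahalanobis distance of each centered 2-D row from the origin."""
    n = len(rows)
    sxx = sum(x * x for x, _ in rows) / (n - 1)
    syy = sum(y * y for _, y in rows) / (n - 1)
    sxy = sum(x * y for x, y in rows) / (n - 1)
    det = sxx * syy - sxy ** 2
    # x^T S^{-1} x, using the closed-form inverse of the 2x2 covariance S.
    return [(syy * x * x - 2 * sxy * x * y + sxx * y * y) / det for x, y in rows]

# Hypothetical toy data: five samples, two features.
data = center_columns([[1, 2], [2, 1], [3, 5], [4, 3], [6, 7]])
h = leverage_scores(data)
d2 = mahalanobis_sq_2d(data)
n = len(data)
```

Here `d2[i]` agrees with `(n - 1) * h[i]` up to rounding, so the distances can be read off an orthonormal basis of the data without forming or inverting the covariance matrix. In practice a numerical library's QR or SVD routine would replace the hand-written Gram-Schmidt step.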


2018, Vol 7 (3), pp. 581-604
Author(s): Armin Eftekhari, Michael B Wakin, Rachel A Ward

Abstract: Leverage scores, loosely speaking, reflect the importance of the rows and columns of a matrix. Ideally, given the leverage scores of a rank-$r$ matrix $M\in \mathbb{R}^{n\times n}$, that matrix can be reliably completed from just $O(rn\log^{2} n)$ samples if the samples are chosen randomly from a non-uniform distribution induced by the leverage scores. In practice, however, the leverage scores are often unknown a priori. As such, the sample complexity in uniform matrix completion (using uniform random sampling) increases to $O(\eta(M)\cdot rn\log^{2} n)$, where $\eta(M)$ is the largest leverage score of $M$. In this paper, we propose a two-phase algorithm called MC2 for matrix completion: in the first phase, the leverage scores are estimated based on uniform random samples, and then in the second phase the matrix is resampled non-uniformly based on the estimated leverage scores and then completed. For well-conditioned matrices, the total sample complexity of MC2 is no worse than uniform matrix completion, and for certain classes of well-conditioned matrices (namely, reasonably coherent matrices whose leverage scores exhibit mild decay), MC2 requires substantially fewer samples. Numerical simulations suggest that the algorithm outperforms uniform matrix completion in a broad class of matrices and, in particular, is much less sensitive to the condition number than our theory currently requires.
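
To make the idea of a sampling distribution "induced by the leverage scores" concrete, here is a minimal sketch for a rank-1 matrix. It is not the MC2 algorithm: the scores are computed from the known factors rather than estimated from samples, and the rule that entry (i, j) is drawn with probability proportional to the sum of its row and column scores is an assumption chosen for illustration, since the abstract does not state the exact distribution. The scores are normalized to sum to the rank, which is one of several conventions in use. All names and the example vectors are hypothetical.

```python
import random

def rank1_leverage_scores(u, v):
    """Row and column leverage scores of the rank-1 matrix M = u v^T.

    With rank 1 the singular vectors are u/||u|| and v/||v||, so the
    scores are the squared entries of those unit vectors.
    """
    nu = sum(a * a for a in u)
    nv = sum(b * b for b in v)
    return [a * a / nu for a in u], [b * b / nv for b in v]

def sampling_probabilities(row_scores, col_scores):
    """Assumed rule: p_ij is proportional to row_scores[i] + col_scores[j]."""
    weights = [[ri + cj for cj in col_scores] for ri in row_scores]
    total = sum(map(sum, weights))
    return [[w / total for w in row] for row in weights]

def sample_entries(probs, m, seed=0):
    """Draw m entry positions (with replacement) from the distribution."""
    rng = random.Random(seed)
    cells = [(i, j) for i in range(len(probs)) for j in range(len(probs[0]))]
    flat = [probs[i][j] for i, j in cells]
    return rng.choices(cells, weights=flat, k=m)

# Hypothetical example: one "spiky" row dominates the row leverage scores.
u = [10.0, 1.0, 1.0, 1.0]
v = [1.0, 1.0, 1.0, 1.0]
row_scores, col_scores = rank1_leverage_scores(u, v)
probs = sampling_probabilities(row_scores, col_scores)
samples = sample_entries(probs, m=6)
```

Under this rule the entries of the first row are sampled more often than the others, whereas uniform sampling would treat all sixteen entries alike. That is the intuition behind the dependence of uniform sampling on the largest leverage score.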


2015, Vol 36 (3), pp. 1143-1163
Author(s): John T. Holodnak, Ilse C. F. Ipsen, Thomas Wentworth
