Abstract

We extend the approximation-theoretic technique of optimal recovery to the setting of imputing missing values in clustered data, specifically for non-negative matrix factorization (NMF), and develop an implementable algorithm. Under certain geometric conditions, we prove tight upper bounds on the NMF relative error; these are the first bounds of this type for missing values. We also give probabilistic bounds under the same geometric assumptions. Experiments on image data and biological data show that this theoretically grounded technique performs as well as or better than other imputation techniques that account for local structure.