## Polynomial Time Algorithm for Column-Row Based Relative-Error Low-Rank Matrix Approximation

### Authors: Petros Drineas, Michael W. Mahoney and S. Muthukrishnan

ABSTRACT

Given an $m \times n$ matrix $A$ and an integer $k$ less than the rank of $A$, the best rank-$k$ approximation to $A$ with respect to the Frobenius norm is $A_k$, which is obtained by truncating the Singular Value Decomposition (SVD) of $A$. While $A_k$ is routinely used in data analysis, it is difficult to interpret in terms of the {\em original data}, namely the rows and columns of $A$, which come from the application domain.
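For concreteness, here is a minimal NumPy sketch of computing $A_k$ by truncating the SVD; the matrix and the value of $k$ are arbitrary illustrations, not taken from the paper.

```python
import numpy as np

# Illustrative only: best rank-k approximation A_k via truncated SVD,
# and its Frobenius-norm error. A and k are arbitrary placeholders.
rng = np.random.default_rng(0)
A = rng.standard_normal((100, 60))
k = 5

U, s, Vt = np.linalg.svd(A, full_matrices=False)
A_k = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]   # keep top k singular triplets

err = np.linalg.norm(A - A_k, "fro")
# Eckart-Young: the error equals the norm of the discarded singular values.
assert np.isclose(err, np.sqrt(np.sum(s[k:] ** 2)))
```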

In this paper, we address the problem of obtaining low-rank approximations that are directly expressible in terms of the original rows and columns of $A$. Our main results are as follows. We present a randomized algorithm that determines a set $C$ of columns, whose size is polynomial in $k$, $\log(1/\delta)$, and $1/\varepsilon$, such that the matrix $A'$ expressly written in terms of $C$ satisfies $\|A-A'\|_F \leq (1+\varepsilon) \|A-A_k\|_F$ with probability at least $1-\delta$. This is the first polynomial time algorithm for low-rank matrix approximation that gives relative error guarantees; all previously known methods, including the seminal work of Frieze, Kannan, and Vempala~\cite{FKV98}, yield approximations with a large additive error term of $\varepsilon \|A\|_F$, and are improved by our result. We further extend this result to obtain a randomized algorithm that determines a set $C$ of columns and a set $R$ of rows, each of size polynomial in $k$, $\log(1/\delta)$, and $1/\varepsilon$, such that the matrix $A'$ expressly written in terms of $C$ and $R$ satisfies $\|A-A'\|_F \leq (1+\varepsilon) \|A-A_k\|_F$ with probability at least $1-\delta$. This is again the first polynomial time algorithm of this form with relative error guarantees; previously, even the existence of such $C$ and $R$ was not known.
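To illustrate what "expressly written in terms of $C$ (and $R$)" means, the sketch below uses two standard constructions: projecting $A$ onto the span of $C$, and a column-row product $CUR$ with $U = C^+ A R^+$. These are common ways to tie the sampled columns and rows together, not necessarily the paper's exact construction, and the index sets below are arbitrary placeholders rather than the output of the sampling algorithm.

```python
import numpy as np

# Sketch: approximations built directly from sampled columns C and rows R.
# The index sets cols/rows are hypothetical placeholders.
rng = np.random.default_rng(1)
A = rng.standard_normal((100, 60))
cols = [0, 7, 12, 33, 41]          # hypothetical sampled column indices
rows = [2, 5, 19, 50, 57]          # hypothetical sampled row indices

C = A[:, cols]
R = A[rows, :]

# Column-based approximation: project A onto the column span of C.
A_cx = C @ np.linalg.pinv(C) @ A

# Column-row approximation: a small linking matrix U ties C and R together.
U = np.linalg.pinv(C) @ A @ np.linalg.pinv(R)
A_cur = C @ U @ R
```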

All our algorithms employ random sampling; however, rather than sampling rows and columns of $A$ according to their norms, as in prior work, we carefully use information from the top few singular vectors of $A$ to determine the sampling probabilities. This technique was recently introduced in~\cite{DMM06} for the $\ell_2$ regression problem, and is significantly extended here. Our algorithms are quite simple, running in time on the order of that needed to compute the SVD of $A$, and are likely to be useful in practical applications.
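The sketch below conveys the flavor of this kind of sampling: columns are drawn with probabilities derived from the top-$k$ right singular vectors (their "leverage scores") rather than from column norms. The sample size $c$ is a placeholder; the precise probabilities, sample sizes, and guarantees are those established in the paper, not by this sketch.

```python
import numpy as np

# Sketch: bias column sampling using the top-k right singular vectors of A.
# The sample size c is a placeholder; the paper bounds it polynomially in
# k, 1/eps, and log(1/delta).
rng = np.random.default_rng(2)
A = rng.standard_normal((100, 60))
k, c = 5, 20

_, _, Vt = np.linalg.svd(A, full_matrices=False)
Vk = Vt[:k, :]                       # top-k right singular vectors, (k, n)

# Leverage scores: squared column norms of Vk, normalized to a distribution
# (they sum to k because the rows of Vk are orthonormal).
probs = np.sum(Vk ** 2, axis=0) / k

cols = rng.choice(A.shape[1], size=c, replace=True, p=probs)
C = A[:, cols]
A_approx = C @ np.linalg.pinv(C) @ A  # approximation built from sampled columns
```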

Paper Available at: ftp://dimacs.rutgers.edu/pub/dimacs/TechnicalReports/TechReports/2006/2006-04.ps.gz