Tuesday, May 22, 2012

Running Least Squares

or "Incremental Least Squares"

If you want to find the Least Squares solution to a linear algebra problem (\(\textbf{A} \vec{y} = \vec{b}\)), you probably would use the formula

$$\hat{y} = (\textbf{A}^\intercal \textbf{A})^{-1} \textbf{A}^\intercal \vec{b},$$

where \(\hat{y}\) is the closest possible value to \(y\). If you need to do this for many points, the operation could become expensive, both in time and memory. What you instead could do is this:

Imagine you have calculated the approximation for some number of samples. You had \(\textbf{A}\), \(\vec{b}\) and \(\vec{y}\), and calculated \(\hat{y}\).

Now you want to add another point to the calculations. Instead of recalculating the transpose, inverse and multiplications, you can add one more row to the formula.

$$\hat{y}' = \left( \begin{bmatrix}\textbf{A} \\ \vec{r}^\intercal\end{bmatrix}^\intercal \begin{bmatrix}\textbf{A} \\ \vec{r}^\intercal\end{bmatrix} \right)^{-1} \begin{bmatrix}\textbf{A} \\ \vec{r}^\intercal\end{bmatrix}^\intercal \begin{bmatrix}\vec{b} \\ z\end{bmatrix}$$ which expands into $$\hat{y}' = \left( \textbf{A}^\intercal \textbf{A} + \vec{r} \vec{r}^\intercal \right)^{-1} \left( \textbf{A}^\intercal \vec{b} + \vec{r} z \right)$$

Here \(\vec{r}\) is a vector of the input variables (e.g. \(\begin{bmatrix}x \\ y\end {bmatrix}\)), and \(z\) is the new target.

You might now see where this is going.

  • All you need to keep in memory, are the matrix \(\textbf{M} = \textbf{A}^\intercal \textbf{A}\) and vector \( \vec{v} = \textbf{A}^\intercal \vec{b} \).
  • Whenever you want to add another point, you increment \(\textbf{M}\) with \(\vec{r} \vec{r}^\intercal\), and \(\vec{v}\) with \(\vec{r} z\).
  • When you want to get the result, you calculate \(\hat{y} = \textbf{M}^{-1} \vec{v}\). This operation will require at least two samples.
  • \(\textbf{M}\) and \(\vec{v}\) both start off with only zero-entries.

The size of \(\textbf{M}\) is \(n \times n\), and \(\vec{v}\) is \(n\), where \(n\) is the number of coefficients in the system (the width of the matrix \(\textbf{A}\).


Shengping Zhang said...

Hi, you did not consider that the direct computation of the inverse of (A^TA)^{-1} is very expensive. Usually QR factorization is used to reduce the computational cost. If QR is used, how to incrementally compute the least square solution?

Markus Jarderot said...

A^TA is just a N x N matrix, where N is the number of variables. Usually, the number of variables are small.

If you are using a large number of variables (or if you consider 4 x 4 large), I don't think there is any easy way to a "running" QR factorization.

Ben Jackson said...

I'm surprised you didn't invoke Sherman-Morrison to get your incremental M^-1