Tuesday, May 22, 2012

Running Least Squares

or "Incremental Least Squares"

If you want to find the Least Squares solution to a linear algebra problem (\(\textbf{A} \vec{y} = \vec{b}\)), you probably would use the formula

$$\hat{y} = (\textbf{A}^\intercal \textbf{A})^{-1} \textbf{A}^\intercal \vec{b},$$

where \(\hat{y}\) is the closest possible value to \(y\). If you need to do this for many points, the operation could become expensive, both in time and memory. What you instead could do is this:

Imagine you have calculated the approximation for some number of samples. You had \(\textbf{A}\), \(\vec{b}\) and \(\vec{y}\), and calculated \(\hat{y}\).

Now you want to add another point to the calculations. Instead of recalculating the transpose, inverse and multiplications, you can add one more row to the formula.

$$\hat{y}' = \left( \begin{bmatrix}\textbf{A} \\ \vec{r}^\intercal\end{bmatrix}^\intercal \begin{bmatrix}\textbf{A} \\ \vec{r}^\intercal\end{bmatrix} \right)^{-1} \begin{bmatrix}\textbf{A} \\ \vec{r}^\intercal\end{bmatrix}^\intercal \begin{bmatrix}\vec{b} \\ z\end{bmatrix}$$ which expands into $$\hat{y}' = \left( \textbf{A}^\intercal \textbf{A} + \vec{r} \vec{r}^\intercal \right)^{-1} \left( \textbf{A}^\intercal \vec{b} + \vec{r} z \right)$$

Here \(\vec{r}\) is a vector of the input variables (e.g. \(\begin{bmatrix}x \\ y\end {bmatrix}\)), and \(z\) is the new target.

You might now see where this is going.

  • All you need to keep in memory, are the matrix \(\textbf{M} = \textbf{A}^\intercal \textbf{A}\) and vector \( \vec{v} = \textbf{A}^\intercal \vec{b} \).
  • Whenever you want to add another point, you increment \(\textbf{M}\) with \(\vec{r} \vec{r}^\intercal\), and \(\vec{v}\) with \(\vec{r} z\).
  • When you want to get the result, you calculate \(\hat{y} = \textbf{M}^{-1} \vec{v}\). This operation will require at least two samples.
  • \(\textbf{M}\) and \(\vec{v}\) both start off with only zero-entries.

The size of \(\textbf{M}\) is \(n \times n\), and \(\vec{v}\) is \(n\), where \(n\) is the number of coefficients in the system (the width of the matrix \(\textbf{A}\).