Tuesday, May 22, 2012

Running Least Squares

or "Incremental Least Squares"

If you want to find the Least Squares solution to a linear algebra problem ($$\textbf{A} \vec{y} = \vec{b}$$), you probably would use the formula

$$\hat{y} = (\textbf{A}^\intercal \textbf{A})^{-1} \textbf{A}^\intercal \vec{b},$$

where $$\hat{y}$$ is the closest possible value to $$y$$. If you need to do this for many points, the operation could become expensive, both in time and memory. What you instead could do is this:

Imagine you have calculated the approximation for some number of samples. You had $$\textbf{A}$$, $$\vec{b}$$ and $$\vec{y}$$, and calculated $$\hat{y}$$.

Now you want to add another point to the calculations. Instead of recalculating the transpose, inverse and multiplications, you can add one more row to the formula.

$$\hat{y}' = \left( \begin{bmatrix}\textbf{A} \\ \vec{r}^\intercal\end{bmatrix}^\intercal \begin{bmatrix}\textbf{A} \\ \vec{r}^\intercal\end{bmatrix} \right)^{-1} \begin{bmatrix}\textbf{A} \\ \vec{r}^\intercal\end{bmatrix}^\intercal \begin{bmatrix}\vec{b} \\ z\end{bmatrix}$$ which expands into $$\hat{y}' = \left( \textbf{A}^\intercal \textbf{A} + \vec{r} \vec{r}^\intercal \right)^{-1} \left( \textbf{A}^\intercal \vec{b} + \vec{r} z \right)$$

Here $$\vec{r}$$ is a vector of the input variables (e.g. $$\begin{bmatrix}x \\ y\end {bmatrix}$$), and $$z$$ is the new target.

You might now see where this is going.

• All you need to keep in memory, are the matrix $$\textbf{M} = \textbf{A}^\intercal \textbf{A}$$ and vector $$\vec{v} = \textbf{A}^\intercal \vec{b}$$.
• Whenever you want to add another point, you increment $$\textbf{M}$$ with $$\vec{r} \vec{r}^\intercal$$, and $$\vec{v}$$ with $$\vec{r} z$$.
• When you want to get the result, you calculate $$\hat{y} = \textbf{M}^{-1} \vec{v}$$. This operation will require at least two samples.
• $$\textbf{M}$$ and $$\vec{v}$$ both start off with only zero-entries.

The size of $$\textbf{M}$$ is $$n \times n$$, and $$\vec{v}$$ is $$n$$, where $$n$$ is the number of coefficients in the system (the width of the matrix $$\textbf{A}$$.