## Tuesday, May 22, 2012

### Running Least Squares

or "Incremental Least Squares"

If you want to find the Least Squares solution to a linear algebra problem ($$\textbf{A} \vec{y} = \vec{b}$$), you would probably use the formula

$$\hat{y} = (\textbf{A}^\intercal \textbf{A})^{-1} \textbf{A}^\intercal \vec{b},$$

where $$\hat{y}$$ is the coefficient vector that brings $$\textbf{A} \hat{y}$$ as close as possible to $$\vec{b}$$. If you need to do this for many points, the operation can become expensive in both time and memory. What you could do instead is this:
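As a minimal sketch of the direct formula, here is the normal-equations solution in NumPy for a hypothetical line fit (the data points are made up for illustration), checked against NumPy's own least-squares solver:

```python
import numpy as np

# Hypothetical data: fit z = c0 + c1*x to a few noisy points.
x = np.array([0.0, 1.0, 2.0, 3.0])
b = np.array([1.1, 1.9, 3.2, 3.9])

# Design matrix A: one row per sample, columns [1, x].
A = np.column_stack([np.ones_like(x), x])

# Normal-equations solution: y_hat = (A^T A)^{-1} A^T b,
# computed with a solve instead of an explicit inverse.
y_hat = np.linalg.solve(A.T @ A, A.T @ b)

# Should agree with NumPy's built-in least-squares solver.
reference, *_ = np.linalg.lstsq(A, b, rcond=None)
```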

Imagine you have calculated the approximation for some number of samples. You had $$\textbf{A}$$, $$\vec{b}$$ and $$\vec{y}$$, and calculated $$\hat{y}$$.

Now you want to add another point to the calculations. Instead of recalculating the transpose, inverse and multiplications, you can add one more row to the formula.

$$\hat{y}' = \left( \begin{bmatrix}\textbf{A} \\ \vec{r}^\intercal\end{bmatrix}^\intercal \begin{bmatrix}\textbf{A} \\ \vec{r}^\intercal\end{bmatrix} \right)^{-1} \begin{bmatrix}\textbf{A} \\ \vec{r}^\intercal\end{bmatrix}^\intercal \begin{bmatrix}\vec{b} \\ z\end{bmatrix}$$ which expands into $$\hat{y}' = \left( \textbf{A}^\intercal \textbf{A} + \vec{r} \vec{r}^\intercal \right)^{-1} \left( \textbf{A}^\intercal \vec{b} + \vec{r} z \right)$$
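The expansion above can be checked numerically. This sketch (with made-up data) verifies that appending the row $$\vec{r}^\intercal$$ to $$\textbf{A}$$ adds the rank-one term $$\vec{r} \vec{r}^\intercal$$ to $$\textbf{A}^\intercal \textbf{A}$$, and $$\vec{r} z$$ to $$\textbf{A}^\intercal \vec{b}$$:

```python
import numpy as np

# Hypothetical data: a small system and one new sample (r, z).
A = np.array([[1.0, 0.0], [1.0, 1.0], [1.0, 2.0]])
b = np.array([1.0, 2.0, 3.1])
r = np.array([1.0, 3.0])   # new input row
z = 4.2                    # new target

# Stack the new row/target onto the system.
A2 = np.vstack([A, r])
b2 = np.append(b, z)

# The stacked products equal the rank-one updates.
M_stacked = A2.T @ A2
M_updated = A.T @ A + np.outer(r, r)
v_stacked = A2.T @ b2
v_updated = A.T @ b + r * z
```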

Here $$\vec{r}$$ is a vector of the input variables (e.g. $$\begin{bmatrix}x \\ y\end {bmatrix}$$), and $$z$$ is the new target.

You might now see where this is going.

• All you need to keep in memory are the matrix $$\textbf{M} = \textbf{A}^\intercal \textbf{A}$$ and the vector $$\vec{v} = \textbf{A}^\intercal \vec{b}$$.
• Whenever you want to add another point, you increment $$\textbf{M}$$ with $$\vec{r} \vec{r}^\intercal$$, and $$\vec{v}$$ with $$\vec{r} z$$.
• When you want to get the result, you calculate $$\hat{y} = \textbf{M}^{-1} \vec{v}$$. This requires $$\textbf{M}$$ to be invertible, which takes at least $$n$$ linearly independent samples.
• $$\textbf{M}$$ and $$\vec{v}$$ both start off with only zero entries.

The size of $$\textbf{M}$$ is $$n \times n$$, and $$\vec{v}$$ has length $$n$$, where $$n$$ is the number of coefficients in the system (the width of the matrix $$\textbf{A}$$).
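The whole scheme fits in a short class. This is a sketch in NumPy (the class name and data are hypothetical), keeping only $$\textbf{M}$$ and $$\vec{v}$$ between samples:

```python
import numpy as np

class RunningLeastSquares:
    """Running least squares, storing only M = A^T A and v = A^T b."""

    def __init__(self, n):
        self.M = np.zeros((n, n))  # both start off as all zeros
        self.v = np.zeros(n)

    def add(self, r, z):
        """Add one sample: input vector r (length n) and target z."""
        r = np.asarray(r, dtype=float)
        self.M += np.outer(r, r)   # M += r r^T
        self.v += r * z            # v += r z

    def solve(self):
        """Current estimate y_hat = M^{-1} v (M must be invertible)."""
        return np.linalg.solve(self.M, self.v)

# Usage: fit z = c0 + c1*x one sample at a time.
fit = RunningLeastSquares(2)
for xi, zi in [(0.0, 1.1), (1.0, 1.9), (2.0, 3.2), (3.0, 3.9)]:
    fit.add([1.0, xi], zi)
coeffs = fit.solve()
```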

Shengping Zhang said...

Hi, you did not consider that the direct computation of the inverse of (A^TA)^{-1} is very expensive. Usually QR factorization is used to reduce the computational cost. If QR is used, how to incrementally compute the least square solution?

Markus Jarderot said...

A^TA is just a N x N matrix, where N is the number of variables. Usually, the number of variables is small.

If you are using a large number of variables (or if you consider 4 x 4 large), I don't think there is any easy way to a "running" QR factorization.

Ben Jackson said...

I'm surprised you didn't invoke Sherman-Morrison to get your incremental M^-1.
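For completeness, the Sherman-Morrison formula mentioned in the comment lets you keep $$\textbf{M}^{-1}$$ itself up to date under the rank-one update $$\textbf{M} + \vec{r} \vec{r}^\intercal$$, avoiding a fresh inversion each time. A sketch with made-up numbers:

```python
import numpy as np

# Hypothetical starting point: some invertible M = A^T A.
M = np.array([[4.0, 1.0], [1.0, 3.0]])
M_inv = np.linalg.inv(M)
r = np.array([1.0, 2.0])   # new input row

# Sherman-Morrison:
# (M + r r^T)^{-1} = M^{-1} - (M^{-1} r)(r^T M^{-1}) / (1 + r^T M^{-1} r)
Mr = M_inv @ r
M_inv_new = M_inv - np.outer(Mr, Mr) / (1.0 + r @ Mr)
```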