
Least Squares


(ns morelinear.leastsquares
  (:require [clojure.core.matrix :refer :all]
            [clojure.core.matrix.linear :as linear]
            [morelinear.gauss :as gauss]
            [morelinear.householder :as householder]))
(set-current-implementation :vectorz) 

The AᵀAx = Aᵀb form

Now that we have several ways to solve square linear systems we need to extend Ax = b to the overdetermined case where we have more linear equations than parameters. This situation comes up constantly, generally when you only have a few parameters you want to solve for but a lot of redundant measurements.

\begin{equation} \begin{bmatrix} a_{11} & a_{12}\\ a_{21} & a_{22}\\ a_{31} & a_{32}\\ a_{41} & a_{42}\\ \vdots & \vdots\\ \end{bmatrix} \begin{bmatrix} x_1\\ x_2\\ \end{bmatrix} = \begin{bmatrix} y_1\\ y_2\\ \vdots\\ \end{bmatrix} \end{equation}

If the measurements were ideal then the rows of A would not be independent and you would be able to find an explicit solution through Gaussian elimination. But in the general case (with the addition of noise and rounding errors) no explicit solution will exist for A_skinny x = b. You could try to find a set of independent equations and throw away the extra equations to make a square matrix, but that is throwing information away.

Since there is no x for which A_skinny x = b holds true we instead solve a "close" system A_skinny x = b_close which does have a solution. In other words we want to find an x that will give us a b_close which is as close as possible to b. When we say "close", what we mean mathematically is that we want to minimize the sum of the differences between b_close and b, i.e. sum-of-all-values(b_close - b).

The problem is that sums don't express very easily in matrix form. Fortunately we do have a mechanism which is really close: the inner product. The inner product of a vector x with itself, xᵀx, is the sum of the squares of the values of x. While this is not equivalent, it doesn't change the solution because the minimum stays the minimum. So instead of sum-of-all-values(b_close - b) we can work with (b_close - b)ᵀ(b_close - b) and are guaranteed the same solution.
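
Spelled out, the inner product form is exactly the sum of the squared differences between the two vectors:

\begin{equation} (b_{close}-b)^{T}(b_{close}-b) = \sum_{i}(b_{close,i}-b_{i})^{2} \end{equation}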

If we plug in our equation for b_close we get (A_skinny x - b)ᵀ(A_skinny x - b).

\begin{equation} (A_{skinny}x-b)^{T}(A_{skinny}x-b) \\ ((A_{skinny}x)^{T}-b^{T})(A_{skinny}x-b) \\ (x^{T}A_{skinny}^{T}-b^{T})(A_{skinny}x-b) \\ x^{T}A_{skinny}^{T}A_{skinny}x -x^{T}A_{skinny}^{T}b -b^{T}A_{skinny}x +b^{T}b \end{equation}

As we learn in calculus, minimizing a function is done by taking its derivative with respect to the parameter we are changing (here that's x), setting it equal to zero and then solving for that parameter (because the minimum point has zero slope). What pages 226-227 show is that the derivative of our difference equation gives us the equation AᵀAx = Aᵀb. The right hand side Aᵀb is a vector, and in fact the whole equation is in the form Ax = b which we know how to solve (again, solving for x here). Also note that AᵀA is always square and, as long as the columns of A are independent, nonsingular - and because A was skinny it's actually smaller than A.
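
For reference, the derivative step looks like this (writing A for A_skinny; the two middle terms are equal scalars so they combine into -2xᵀAᵀb):

\begin{equation} \frac{\partial}{\partial x}\left( x^{T}A^{T}Ax - 2x^{T}A^{T}b + b^{T}b \right) = 2A^{T}Ax - 2A^{T}b = 0 \quad \Rightarrow \quad A^{T}Ax = A^{T}b \end{equation}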

The LU case

Direct method

We take the product AᵀA and the product Aᵀb and solve AᵀAx = Aᵀb just like we solve an Ax = b system.

(defn lu-direct
  "Solve ATA=ATb directly"
  [linear-system output-vector]
  (let [AT (transpose linear-system)
        ATA (mmul AT linear-system)
        ATb (mmul AT output-vector)]
    (gauss/solve-with-lu ATA ATb)))
;; missile problem from page 232
(let [A [[1.0 0.0 0.0]
         [1.0 0.25 0.0625]
         [1.0 0.50 0.25]
         [1.0 0.75 0.5625]
         [1.0 1.00 1.0]]
      b [0.000 0.008 0.015 0.019 0.02]
      x (lu-direct A b)]
  (pm x)
  (pm (mmul A x)))
[-0.000 0.040 -0.019]
[-0.000 0.009 0.015 0.019 0.020]
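
As a quick aside (a sketch, not part of the original text; it assumes the current backend supports linear/norm from the namespace already required above), we can put a number on how "close" b_close = Ax is to b by taking the length of the residual:

(let [A [[1.0 0.0 0.0]
         [1.0 0.25 0.0625]
         [1.0 0.50 0.25]
         [1.0 0.75 0.5625]
         [1.0 1.00 1.0]]
      b [0.000 0.008 0.015 0.019 0.02]
      x (lu-direct A b)]
  ;; length of the residual A·x - b ; the least squares solution is
  ;; the x that makes this (or rather its square) as small as possible
  (linear/norm (sub (mmul A x) b)))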

The numerically stable solution

;; [-0.000 0.040 -0.019]
;; [-2.2857142857142797E-4
;;  0.008514285714285717
;;  0.01482857142857143
;;  0.018714285714285718
;;  0.020171428571428573]

Unfortunately, while the previous solution is very simple, it does involve a matrix multiplication and therefore has some numerical issues. A nice trick is presented in Exercise 4.6.9 with the equation

\begin{equation} \begin{bmatrix} I_{m \times m} & A\\ A^{T} & 0_{n \times n}\\ \end{bmatrix} \begin{bmatrix} x_1\\ x_2\\ \end{bmatrix} = \begin{bmatrix} b\\ 0\\ \end{bmatrix} \end{equation}

This special block matrix has the property that it will always be square no matter what shape A is. If you multiply out the block matrices you will get two equations:

\begin{equation} \begin{bmatrix} x_1+Ax_{2}\\ A^{T}x_{1}\\ \end{bmatrix} = \begin{bmatrix} b\\ 0\\ \end{bmatrix} \end{equation} \begin{equation} x_1+Ax_{2} = b \end{equation} \begin{equation} A^{T}x_{1} = 0 \end{equation}

Note how the second equation tells us that x₁ is in N(Aᵀ), the nullspace of Aᵀ.

Now we multiply both sides of our first equation by Aᵀ:

\begin{equation} A^{T}(x_1+Ax_{2}) = A^{T}b \end{equation} \begin{equation} A^{T}x_1+A^{T}Ax_{2} = A^{T}b \end{equation} \begin{equation} A^{T}Ax_{2} = A^{T}b \end{equation}

We just used the fact that x₁ is in the nullspace of Aᵀ to drop the first term, and now we see that x₂ is also a solution to the least squares equation.
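
As a side observation (not in the original text, but it follows directly from the first block equation), x₁ is just the residual of the least squares fit:

\begin{equation} x_1 = b - Ax_2 \end{equation}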

So we can take any method that works for solving square matrices, use it on our jumbo matrix, and get a solution for the least squares problem, all without ever computing AᵀA.

(defn- jumbo-matrix
  "Use the big block matrix method to compute the least squares solution"
  [input-matrix]
  (let [num-rows (row-count input-matrix)
        num-columns (column-count input-matrix)
        identity (identity-matrix num-rows)
        AT (transpose input-matrix)
        zeroes (zero-matrix num-columns
                            num-columns)]
    (join-along 0
                (join-along 1
                            identity
                            input-matrix)
                (join-along 1
                            AT
                            zeroes))))

(defn- pad-with-zeroes
  "Takes the INPUT-VECTOR and make it longer with some zeroes padded on the end"
  [input-vector number-of-zeroes]
  (join-along 0
              input-vector
              (zero-vector number-of-zeroes)))

(defn lu-jumbo
  "Solve the least squares problem by running LU/Gaussian elimination
  on the square block matrix [[I A] [A^T 0]] with the zero-padded
  output vector [b 0]"
  [input-matrix output-vector]
  ;; the solution of the block system is [x1 x2] stacked together;
  ;; the least squares solution x2 is the last `column-count` values,
  ;; so we pull it out with `subvector`
  (subvector (gauss/solve-with-lu (jumbo-matrix input-matrix)
                                  (pad-with-zeroes output-vector
                                                   (column-count input-matrix)))
             (row-count input-matrix)
             (column-count input-matrix)))
;; missile problem from page 232 but redone with the big matrix method
(let [A [[1.0 0.0 0.0]
         [1.0 0.25 0.0625]
         [1.0 0.50 0.25]
         [1.0 0.75 0.5625]
         [1.0 1.00 1.0]]
      b [0.000 0.008 0.015 0.019 0.02]
      x (lu-jumbo A b)]
  (pm x)
  (pm (mmul A x)))
[-0.000 0.040 -0.019]
[-0.000 0.009 0.015 0.019 0.020]

(let [A [[1.000    0.000       0.000]
         [1.000  250.000   62500.000]
         [1.000  500.000  250000.000]
         [1.000  750.000  562500.000]
         [1.000 1000.000 1000000.000]]
      b [0.0 8.0 15.0 19.0 20.0]
      x (lu-jumbo A b)]
  (pm x)
  (pm (mmul A x)))
[-7.456 0.048 -0.000]
[-7.456 2.643 9.071 11.828 10.914]

TODO This should be numerically stable but it's not. Need to double-check everything again. Maybe compare results to elinear.
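
One possible sanity check (a sketch on my part, assuming the :vectorz backend implements clojure.core.matrix.linear/least-squares, which is already required above as linear) is to run the badly scaled system through the library's own least squares routine and compare:

;; compare lu-jumbo against the library's least squares solver on the
;; poorly scaled system (assumes the backend implements least-squares)
(let [A [[1.000    0.000       0.000]
         [1.000  250.000   62500.000]
         [1.000  500.000  250000.000]
         [1.000  750.000  562500.000]
         [1.000 1000.000 1000000.000]]
      b [0.0 8.0 15.0 19.0 20.0]]
  (pm (lu-jumbo A b))
  (pm (linear/least-squares A b)))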

The QR case

With the QR decomposition we similarly get to avoid computing the AᵀA product. Since A = QR we can rewrite AᵀAx = Aᵀb:

\begin{equation} A^{T}Ax=A^{T}b \\ (QR)^{T}QRx = (QR)^{T}b \\ R^{T}Q^{T}QRx = R^{T}Q^{T}b \end{equation}

Now since Qᵀ is equivalent to Q⁻¹ (see: Householder transform) this simplifies further to:

\begin{equation} R^{T}Rx = R^{T}Q^{T}b \\ Rx = Q^{T}b \\ \end{equation}

We can cancel the Rᵀ on both sides because R has independent columns whenever A does. The R matrix is upper triangular so the left side solves directly through back substitution, while the right side evaluates to some column vector.

(defn householder-qr-long
  ""
  [A b]
  (let [R (mutable A)
        Q (householder/qr! R)
        QT (transpose Q)
        QTb (mmul QT b)]
    (gauss/backward-substitution R QTb)))

However when we test the result…

;; missile problem from page 232
(let [A [[1.0 0.0 0.0]
         [1.0 0.25 0.0625]
         [1.0 0.50 0.25]
         [1.0 0.75 0.5625]
         [1.0 1.00 1.0]]
      b [0.000 0.008 0.015 0.019 0.02]
      x (householder-qr-long A b)]
  (pm x)
  (pm (mmul A x)))
[0.003 0.016 0.005]
[0.003 0.007 0.012 0.017 0.023]

The values are in the right direction but clearly much more off the mark than in the previous solutions. The trick, as mentioned in the Householder transform section, is to reduce A and b simultaneously with reflectors and skip forming Q entirely.

(defn householder-qr
  ""
  [A b]
  (householder/reduction A b)
  (gauss/backward-substitution A b))
;; missile problem from page 232
(let [A [[1.0 0.0 0.0]
         [1.0 0.25 0.0625]
         [1.0 0.50 0.25]
         [1.0 0.75 0.5625]
         [1.0 1.00 1.0]]
      R (mutable A)
      b (mutable [0.000 0.008 0.015 0.019 0.02])
      x (householder-qr R b)]
  (pm x)
  (pm (mmul A x)))
[-0.000 0.040 -0.019]
[-0.000 0.009 0.015 0.019 0.020]

And now the results closely match what we got before.