Least Squares, Minimum Norm, Pseudo-Inverse Solutions

May 4, 2025 | Tech Math

Preface

For a system of linear "equations" (which may not actually hold with equality) $$ Ax \backsimeq b $$ where $ A\in\mathbb{R}^{m \times n}, x\in \mathbb{R}^n, b \in \mathbb{R}^m $, there exist "solutions" (which may not be exact solutions) that are optimal in some sense and widely used in engineering. This post presents some results and relevant references.

Least Squares Solution

When $ m \gt n $ and $\text{rank}(A) = n$, $A$ is a tall matrix with full column rank; there are more equations than unknowns, the system is over-determined, and in general no exact solution exists. In this case, one looks for a solution such that $$ \min_x || Ax - b || $$

Such a solution is called a least squares solution, in the sense that the norm of the residual $Ax-b$ is minimized. It is unique and is given by $$ x^{\text{LS}} = (A^T A)^{-1} A^T b $$
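As a numerical sketch (not from the post; NumPy is used only for illustration), the closed-form least squares solution can be computed from the normal equations and cross-checked against a library solver:

```python
import numpy as np

# Tall full-column-rank system: m = 4 equations, n = 2 unknowns.
rng = np.random.default_rng(0)
A = rng.standard_normal((4, 2))
b = rng.standard_normal(4)

# Least squares solution via the normal equations: x = (A^T A)^{-1} A^T b.
x_ls = np.linalg.solve(A.T @ A, A.T @ b)

# Cross-check against NumPy's dedicated least squares solver.
x_ref, *_ = np.linalg.lstsq(A, b, rcond=None)
assert np.allclose(x_ls, x_ref)

# Sanity check: the residual A x - b is orthogonal to the column space of A.
assert np.allclose(A.T @ (A @ x_ls - b), 0)
```

The orthogonality check at the end is exactly the geometric content of the normal equations: $A^T(Ax^{\text{LS}} - b) = 0$.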

For its derivation and proof, see this Zhihu post and Wikipedia Least Squares. For matrix derivative computation and results, see Wikipedia Matrix Calculus.

Minimum Norm (Least Norm) Solution

When $ m \lt n $ and $\text{rank}(A) = m$, $A$ is a fat matrix with full row rank; there are fewer equations than unknowns, the system is under-determined, and infinitely many solutions exist. In this case, one typically looks for a solution that $$ \min ||x|| $$ $$ \text{s.t.} \,\,\, Ax = b $$

Such a solution is called a minimum norm solution, or least norm solution, in the sense that it is an exact solution of the system and its norm is minimized. It is unique and is given by $$ x^{\text{MN}} = A^T(AA^T)^{-1} b $$ and $$ x^{\text{MN}} \perp \mathcal{N}(A) $$
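A numerical sketch of these properties (again NumPy, used only for illustration): the formula above yields an exact solution, it is orthogonal to the null space of $A$, and perturbing it along the null space keeps it a solution while increasing its norm.

```python
import numpy as np

# Fat full-row-rank system: m = 2 equations, n = 4 unknowns.
rng = np.random.default_rng(1)
A = rng.standard_normal((2, 4))
b = rng.standard_normal(2)

# Minimum norm solution: x = A^T (A A^T)^{-1} b.
x_mn = A.T @ np.linalg.solve(A @ A.T, b)

# It is an exact solution of the system ...
assert np.allclose(A @ x_mn, b)

# ... and it is orthogonal to the null space of A
# (a null-space basis: the last n - m right singular vectors of A).
_, _, Vt = np.linalg.svd(A)
N = Vt[2:].T
assert np.allclose(N.T @ x_mn, 0)

# Any other solution x_mn + N c is also exact but has a larger norm,
# since its null-space component is orthogonal to x_mn.
c = rng.standard_normal(2)
x_other = x_mn + N @ c
assert np.allclose(A @ x_other, b)
assert np.linalg.norm(x_other) > np.linalg.norm(x_mn)
```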

For its derivation and proof, see this Zhihu post and Stanford Engineering Lecture Notes.

Pseudo-Inverse (Minimum Norm Least Squares) Solution

When $\text{rank}(A) \lt \min(m,n)$, $A$ is rank-deficient; an exact solution may or may not exist. If solutions exist, there are infinitely many; otherwise, there is none. In this case, one typically looks for a solution that $$ \min ||x|| $$ $$ \text{s.t.} \,\,\, x \in \arg \min ||Ax-b|| $$

Such a solution is called a pseudo-inverse solution, or minimum norm least squares solution, in the sense that the residual $Ax-b$ is minimized and, among all minimizers, the norm of $x$ is minimized. It is unique and is given by $$ x^{\text{MNLS}} = A^{\dagger} b $$ where $$ A = U \Sigma V^T $$ is the singular value decomposition of $A$:

  • $U \in \mathbb{R}^{m\times m}$ is an orthogonal matrix ($U^{-1} = U^T$) whose columns are eigenvectors of $AA^T$,
  • $V \in \mathbb{R}^{n\times n}$ is an orthogonal matrix ($V^{-1} = V^T$) whose columns are eigenvectors of $A^TA$,
  • $\Sigma \in \mathbb{R}^{m\times n}$ is a rectangular diagonal matrix $\Sigma = \text{diag} \{ \sigma_1, \sigma_2, \cdots, \sigma_{\rho}, 0, \cdots, 0 \}$ containing the singular values of $A$,
    • where the singular values $\sigma_i = \sqrt{\lambda_i}$ for $i = 1,\cdots,\rho$ are the square roots of the non-zero eigenvalues $\lambda_i$ of $AA^T$ (equivalently, of $A^TA$),
    • and $AA^T$ and $A^TA$ share the same non-zero eigenvalues, so it is more efficient to compute the eigen-decomposition of whichever of the two is the smaller matrix,

and $$ A^{\dagger} = V \Sigma^{\dagger} U^T $$ where $$ \Sigma^{\dagger} = \text{diag} \{ \frac{1}{\sigma_1}, \frac{1}{\sigma_2}, \cdots, \frac{1}{\sigma_{\rho}}, 0, \cdots, 0 \} \in \mathbb{R}^{n \times m} $$ Mathematically, note that the pseudo-inverse reduces to the earlier formulas in the full-rank cases: $$ A^{\dagger} = \begin{cases} A^{\text{LS}} = (A^T A)^{-1} A^T & \text{rank}(A) = n \\ A^{\text{MN}} = A^T(AA^T)^{-1} & \text{rank}(A) = m \\ A^{\text{MNLS}} = V \Sigma^{\dagger} U^T & \text{rank}(A) \lt \min(m,n) \end{cases} $$
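The SVD recipe above can be sketched numerically (NumPy, for illustration only): invert the singular values above a small tolerance, leave the rest at zero, and assemble $A^{\dagger} = V \Sigma^{\dagger} U^T$.

```python
import numpy as np

# Rank-deficient matrix: m = 3, n = 2, rank 1
# (the second column is twice the first).
A = np.array([[1.0, 2.0],
              [2.0, 4.0],
              [3.0, 6.0]])
b = np.array([1.0, 0.0, 1.0])

# SVD: A = U Sigma V^T (the reduced form suffices for the pseudo-inverse).
U, s, Vt = np.linalg.svd(A, full_matrices=False)

# Invert only the singular values above a tolerance; the tolerance guards
# against round-off turning exact zeros into tiny non-zeros.
tol = max(A.shape) * np.finfo(float).eps * s.max()
s_inv = np.zeros_like(s)
s_inv[s > tol] = 1.0 / s[s > tol]

# A^dagger = V Sigma^dagger U^T, then the MNLS solution x = A^dagger b.
A_pinv = Vt.T @ np.diag(s_inv) @ U.T
x = A_pinv @ b

# Cross-check against NumPy's built-in Moore-Penrose pseudo-inverse,
# and verify the defining property A A^dagger A = A.
assert np.allclose(A_pinv, np.linalg.pinv(A))
assert np.allclose(A @ A_pinv @ A, A)
```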

For reference, see Wikipedia Moore-Penrose Inverse and Wikipedia Singular Value Decomposition. For applications, see Robot Singularity Avoidance, Analysis and Comparison and Nullspace Concepts, Robotics Applications, Implementation.

Linear Problem and Decompositions by Eigen

When solving a system of linear equations (again, an exact solution may not exist) $$ Ax=b $$ or computing a matrix inverse, there are many decompositions to choose from, each with its own pros and cons. For more information, see Eigen's Linear algebra and decompositions, Benchmark of dense decompositions, and their usage prerequisites in Catalogue of decompositions.

To solve linear least squares problems with these decompositions, there are APIs that give you the result directly, instead of you having to compute, for example, $(A^TA)^{-1}A^T$ explicitly; see Solving linear least squares systems and Inplace matrix decompositions.
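Eigen's solver APIs are C++, but the principle is language-neutral: let a decomposition's solve routine do the work instead of forming the inverse explicitly, since forming $A^TA$ squares the condition number of $A$. A NumPy stand-in sketch of the comparison (not Eigen code):

```python
import numpy as np

# A tall, well-conditioned test system.
rng = np.random.default_rng(2)
A = rng.standard_normal((6, 3))
b = rng.standard_normal(6)

# Explicitly forming (A^T A)^{-1} A^T works here, but squares the
# condition number of A, losing accuracy when A is ill-conditioned.
x_normal = np.linalg.inv(A.T @ A) @ A.T @ b

# A solver API factorizes A directly (QR or SVD internally) and solves,
# avoiding the explicit inverse entirely.
x_solver, *_ = np.linalg.lstsq(A, b, rcond=None)

# On this well-conditioned example both routes agree.
assert np.allclose(x_normal, x_solver)
```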