Model predictive control (MPC) is a popular control technique. It computes an optimal control input by taking into account the system dynamics from the current state up to a certain number of steps into the future.
This post summarizes the mathematical derivation that formulates MPC as a quadratic programming (QP) problem and provides its implementation in C++ for future reference. For more information, see 2002@Bartlett , 2007@Camacho , 2019@Kim , and 2021@Yu .
A discrete time linear system can be described by the following state space equation $$ \label{eq:discrete-time} x(k+1) = A_k \, x(k) + B_k \, u(k) $$ where $x(k)$ and $u(k)$ are the state and control input, and $A_k$ and $B_k$ describe the system dynamics at the $k$-th time step.
Propagating the dynamics one more time step gives $$ \begin{split} x(k+2) &= A_{k+1} \, x(k+1) + B_{k+1} \, u(k+1) \\ &= A_{k+1} A_k \, x(k) + A_{k+1} B_k \, u(k) + B_{k+1} \, u(k+1) \end{split} $$
Following the above logic, one can derive the system states from the current time step to $h$ time steps in the future as follows $$ \label{eq:mpc-full} \begin{split} x_1 &= A_0 x_0 + B_0 u_0 \\ x_2 &= A_1 A_0 x_0 + A_1 B_0 u_0 + B_1 u_1 \\ x_3 &= A_2 A_1 A_0 x_0 + A_2 A_1 B_0 u_0 + A_2 B_1 u_1 + B_2 u_2 \\ x_4 & = A_3 A_2 A_1 A_0 x_0 + A_3 A_2 A_1 B_0 u_0 + A_3 A_2 B_1 u_1 + A_3 B_2 u_2 + B_3 u_3 \\ &\vdots \\ x_h &= \prod^{0}_{i=h-1} A_i x_0 + \prod^{1}_{i=h-1}A_i B_0 u_0 + \prod^{2}_{i=h-1}A_i B_1 u_1 + \cdots + \prod^{h-2}_{i=h-1}A_i B_{h-3} u_{h-3} + \prod^{h-1}_{i=h-1}A_i B_{h-2} u_{h-2} + B_{h-1} u_{h-1} \end{split} $$ where $x_i$, $u_i$ stand for $x(i)$, $u(i)$ for simplicity.
Equation \eqref{eq:mpc-full} can be written as $$ \label{eq:mpc-matrix} X = A_{qp} x_0 + B_{qp} U $$ where $$ \begin{split} X &= \begin{bmatrix} x_1^T & x_2^T & \cdots & x_h^T \end{bmatrix}^T \\ U &= \begin{bmatrix} u_0^T & u_1^T & \cdots & u_{h-1}^T \end{bmatrix}^T \end{split} $$ and $$ \label{eq:aqp-bqp} A_{qp} = \begin{bmatrix} A_0 \\ A_1 A_0 \\ A_2 A_1 A_0 \\ A_3 A_2 A_1 A_0 \\ \vdots \\ \prod^{0}_{i=h-1} A_i \end{bmatrix},\quad B_{qp} = \begin{bmatrix} B_0 & 0 & 0 & 0 & \cdots & 0 \\ A_1 B_0 & B_1 & 0 & 0 & \cdots & 0 \\ A_2 A_1 B_0 & A_2 B_1 & B_2 & 0 & \cdots & 0 \\ A_3 A_2 A_1 B_0 & A_3 A_2 B_1 & A_3 B_2 & B_3 & \cdots & 0 \\ \vdots & \vdots & \vdots & \vdots & \ddots & 0 \\ \prod^{1}_{i=h-1} A_i B_0 & \prod^{2}_{i=h-1} A_i B_1 & \prod^{3}_{i=h-1} A_i B_2 & \prod^{4}_{i=h-1} A_i B_3 & \cdots & B_{h-1} \end{bmatrix} $$
Given a desired trajectory $$ X_d = \begin{bmatrix} x_{d_1}^T & x_{d_2}^T & \cdots & x_{d_h}^T \end{bmatrix}^T $$ the problem is to find an optimal control $U$, i.e., one with minimum effort, such that the tracking error $||X - X_d||$ is minimized. The objective function can then be written as $$ \label{eq:obj-func} \min_{U} f(U) = (X-X_d)^T Q (X-X_d) + U^T R U $$ where $Q=Q^T \succeq 0$ and $R=R^T \succ 0$ are the weighting matrices. In order to convert Eq. \eqref{eq:obj-func} into a standard QP problem with the control $U$ as the optimization variable, some refactoring of Eq. \eqref{eq:obj-func} is necessary.
Note the compact form of the system dynamics for the next $h$ time steps in Eq. \eqref{eq:mpc-matrix}, and define $$E=A_{qp}x_0 - X_d$$ where $E$ denotes the tracking error of the uncontrolled prediction; one can refactor Eq. \eqref{eq:obj-func} to get $$ \begin{split} f(U) &= (X-X_d)^T Q (X-X_d) + U^T R U \\[1.5ex] &= (A_{qp}x_0 - X_d + B_{qp}U )^T Q (A_{qp}x_0 - X_d + B_{qp}U ) + U^T R U \\[1.5ex] &= (E + B_{qp}U )^T Q (E + B_{qp}U ) + U^T R U \\[1.5ex] &= (B_{qp}U )^T Q (B_{qp}U) + 2 E^TQ (B_{qp}U) + U^T R U + E^T Q E \\[1.5ex] &= U^T B_{qp}^T Q B_{qp}U + 2 E^TQ B_{qp}U + U^T R U + E^T Q E \\[1.5ex] &= \frac{1}{2} U^T 2(B_{qp}^T Q B_{qp} + R) U + U^T (2E^TQ B_{qp})^T + E^T Q E \\[1.5ex] &= \frac{1}{2} U^T H U + U^T g + \text{const} \end{split} $$ where the term $E^T Q E$ is a constant independent of $U$ and can therefore be dropped without changing the minimizer, and $$ \label{eq:hessian-gradient} \begin{split} H &= 2(B_{qp}^T Q B_{qp} + R) \\[1.5ex] g &= (2E^TQ B_{qp})^T = 2 B_{qp}^T Q (A_{qp}x_0 - X_d) \end{split} $$ That is, in standard QP form, the MPC problem can be reformulated as $$ \begin{aligned} \min_{U} \quad f(U) &= \frac{1}{2} U^T H U + U^T g \\ \textrm{s.t.} \quad \underline{c} &\leq CU \leq \overline{c} \\ \underline{u} &\leq U \leq \overline{u} \end{aligned} $$ where $C$ is some constraint matrix, if applicable, and $\underline{c}, \overline{c}$ and $\underline{u}, \overline{u}$ are the lower and upper bounds of the linear constraints and of the control input, respectively. At each time step $k$, one only takes from the optimal $U$ its first block, of the size of $u_k$, as the optimal control input in the MPC sense to the system.
Referring to State Space Representation, in general, in the original problem $$x_{k+1} = A_k x_k + B_k u_k$$ the dimensions are $$ x_i\in\mathbb{R}^{s \times 1}, A_i\in\mathbb{R}^{s \times s}, B_i\in\mathbb{R}^{s \times c}, u_i\in\mathbb{R}^{c \times 1} $$ where $s$ and $c$ stand for the numbers of states and controls, respectively.
In the QP problem $$ X = A_{qp} x_0 + B_{qp} U $$ with an $h$-step horizon, the dimensions are then $$ X \in \mathbb{R}^{S \times 1}, A_{qp}\in\mathbb{R}^{S \times s}, B_{qp}\in\mathbb{R}^{S \times C}, U\in\mathbb{R}^{C \times 1}, x_0\in\mathbb{R}^{s \times 1} $$ where $S=hs$ and $C=hc$ stand for the numbers of QP states and QP controls, respectively; and $x_0$ is the initial state of each MPC iteration.
Note that the QP problem really solves for the optimal control $U$; thus, we define $$\nu = C$$ to make the intent explicit, where $\nu$ stands for the number of optimization variables, which is the same as the number of QP controls.
Then, for the Hessian and gradient $$ \begin{split} H &= 2(B_{qp}^T Q B_{qp} + R) \\[1.5ex] g &= 2 B_{qp}^T Q (A_{qp}x_0 - X_d) \end{split} $$ one has $$ Q\in\mathbb{R}^{S \times S}, R\in\mathbb{R}^{\nu \times \nu}, H\in\mathbb{R}^{\nu \times \nu}, g\in\mathbb{R}^{\nu \times 1}, X_d \in \mathbb{R}^{S \times 1} $$
In addition, for the optional inequality constraints $$ \underline{c} \leq CU \leq \overline{c} $$ one has $$ C \in \mathbb{R}^{\kappa \times \nu} $$ where $\kappa$ is the number of constraints (with a slight abuse of notation, $C$ here denotes the constraint matrix rather than the number of QP controls). When no constraints are present, one can set $$ C = \begin{bmatrix} 0 & 0 & \cdots & 0 \end{bmatrix} $$ and provide a trivial inequality constraint $$ -\epsilon \leq CU \leq \epsilon $$ to make the formulation compatible with that of standard QP solvers, where $\epsilon \in \mathbb{R}^+$ is some small number.
Recall the trivial 1-DOF unit mass system example in Numerical Integration via Runge-Kutta, $$ \label{eq:continuous-time} \dot{\mathbf{x}} = A\mathbf{x} + Bu $$ with $$ \mathbf{x} = \begin{bmatrix} \dot{x} \\ x \end{bmatrix}, \quad A = \begin{bmatrix} 0 & 0 \\ 1 & 0 \end{bmatrix}, \quad B = \begin{bmatrix} 1 \\ 0 \end{bmatrix} $$
In order to apply the discrete time MPC method to the continuous time system in \eqref{eq:continuous-time}, one needs to discretize the state space equation. Approximating the derivative with a forward difference over a sampling period $\Delta_T$ gives $$ \dot{\mathbf{x}} = A\mathbf{x} + Bu \approx \frac{\mathbf{x}_{k+1}-\mathbf{x}_k}{\Delta_T} = A_k \mathbf{x}_k + B_k u_k $$ Rearranging terms gives $$ \begin{split} \mathbf{x}_{k+1} &= (I + A_k \Delta_T)\mathbf{x}_k + B_k \Delta_T u_k \\ \mathbf{x}_{k+1} &= \mathcal{A}_k \mathbf{x}_k + \mathcal{B}_k u_k \end{split} $$ where $$ \label{eq:discretized-a-b} \begin{split} \mathcal{A}_k &= I + A_k \Delta_T \\ \mathcal{B}_k &= B_k \Delta_T \end{split} $$
In addition, the original desired target in Numerical Integration via Runge-Kutta is static. Yet, MPC problems require a desired trajectory along a prediction horizon of length $h$. For that, we can concatenate a set of $h$ static targets to get the desired trajectory $$ X_d = \begin{bmatrix} \mathbf{x}_{\text{des}}^T & \mathbf{x}_{\text{des}}^T & \cdots & \mathbf{x}_{\text{des}}^T \end{bmatrix}^T $$ With the above refactoring, the problem is set for model predictive control.
First of all, MPC requires much larger-scale matrix computation than regular control formulations that only take the current system dynamics into consideration. When implementing MPC problems, there are techniques to reduce the repetitive computation that may occur when constructing \eqref{eq:aqp-bqp} and \eqref{eq:hessian-gradient}.
For example, when $A, B$ or $\mathcal{A}, \mathcal{B}$ are fixed, as in many systems, one can first compute the following complementary matrices $$ A_{\text{qp-comp}} = \begin{bmatrix} I \\ A^1 \\ A^2 \\ \vdots \\ A^{h} \end{bmatrix}, \quad B_{\text{qp-comp}} = B_{qp}^T Q $$ where $$A_{\text{qp-comp}} \in \mathbb{R}^{\mathcal{S} \times s}$$ with $\mathcal{S} = (h+1)s$. Then, $A_{qp}$ is a block of $A_{\text{qp-comp}}$ and all products of $A$ or $\mathcal{A}$ in $B_{qp}$ can be found from $A_{\text{qp-comp}}$ without recalculation, and $$B_{\text{qp-comp}} \in \mathbb{R}^{C\times S}$$ is a common element in both $H$ and $g$.
Note also that, in order to weight position and velocity tracking errors individually, one can use $$ Q = \text{diag}\{\Theta, \Theta, \cdots, \Theta\}$$ where $$ \Theta = \text{diag}\{q_{\dot{x}}, q_{x}\} $$ with $q_{\dot{x}}$ and $q_{x}$ representing the penalties for velocity and position tracking errors, respectively.
With the above formulation and computation techniques, the C++ implementation of the trivial problem with MPC via QP is shown below.
Note that for problems with fixed matrix dimensions, like in the above example, the program would be much faster if one employs fixed-size Eigen matrices, which avoid heap memory allocation. For simplicity, the above code snippet uses dynamic-sized matrices.
Plots generated from the data produced by the above code, with the corresponding parameter selection, are shown below.
From the plots, one can observe that: (1) the tracking error and objective value go to zero without any overshoot; (2) the control effort is bounded by its limits. Hence, they validate the MPC via QP solution.
Compared with the classic control solution in Numerical Integration via Runge-Kutta, it is quite interesting to see that, with a much smaller maximum control effort (15 vs. 1), the MPC via QP controller drives the system error to zero reasonably fast without any overshoot.
In addition, observe that the desired trajectory is actually a fixed point with zero velocity and unit position. Since a larger velocity can drive the position error to zero faster, one may want to put more penalty on the position error and less on the velocity error. Yet, because the desired final velocity is zero, a large velocity conflicts with the zero-velocity final state condition. Hence, in the objective value plot, one can see two peaks. The first peak occurs when the system state is still far away from the desired destination. The system then builds up a large velocity to close the position gap quickly; during this process, the large velocity increases the objective value, producing the second peak. Once the system gets close enough to the desired position, it starts to reduce its velocity while maintaining the desired position, which finally drives the objective value to zero.
Since the discretized system dynamics in \eqref{eq:discretized-a-b} scales with the sampling time $\Delta_T$, the sampling time has characteristics similar to those of a control gain. Choosing a large sampling time produces large $\mathcal{A}_k$ and $\mathcal{B}_k$, which behaves like choosing a large control gain. However, a larger sampling time also means lower system discretization precision.
More discussions can be found in the references mentioned at the beginning of this post.