---- > [!theorem] Theorem. ([[shape-alignment procrustes problem]]) > Suppose $A,B \in \mathbb{R}^{d \times n}$ and that we would like to (perhaps improperly) rotate, translate, and scale $A$ to best align with $B$. One way to express this mathematically is by computing the matrix $\hat{A} \in \mathbb{R}^{d \times n}$ defined as $\hat{A}=\alpha Q (A- \b \lambda \b 1_{n}')+ \b \mu \b 1_{n}', (1)$ > where $Q \in \mathbb{R}^{d \times d}$ is [[orthogonal matrix|orthogonal]], $\b \lambda, \b \mu \in \mathbb{R}^{d}$ are [[vector|translation vectors]], and $\alpha \in \mathbb{R}$ is a scalar. Geometrically, $(1)$ corresponds to a translation of the [[submodule generated by a subset|column span]] of $A$ by $-\b \lambda$, rotation then scaling of the result by $\alpha Q$, then translation by $\b \mu$. However, optimizing *both* $\b \lambda$ and $\b \mu$ is redundant, since replacing $\b \lambda \to \b \lambda + \b \Delta$ and $\b \mu \to \b \mu + \alpha Q \b \Delta$ won't change $\hat{A.}$ Thus WLOG we choose $\b \lambda := \b \mu_{A} := \frac{1}{n}A \b 1_{n}=\frac{1}{n} \sum_{k=1}^{n}A_{[:,k]},$ > the [[center of mass|centroid]] of the columns of $A$. Then we optimize the remaining parameters as $(\alpha_{*}, \b \mu_{*}, Q_{*})=\arg \min_{\alpha \geq 0, \b \mu \in \mathbb{R}^{d}, Q:Q'Q=I_{d}} \|B-\alpha Q(A - \b \mu_{A} \b 1_{n}')-\b \mu \b 1_{n}'\|_{F}, \ (2)$ > where $\|\cdot\|_{F}$ is the [[Frobenius norm]]. > \ > The solution, as shown below, is $\begin{align} \b \mu_{*}= & \b \mu_{B}, \\ Q_{*}= & UV' \\ \alpha_{*}= & \frac{\text{Tr}(B_{0}A_{0}'Q_{*}')}{\text{Tr}(A_{0}A_{0}')}, \end{align}$ where $\b \mu_{B}:= \frac{1}{n}B \b 1_{n}$ is the [[center of mass|centroid]] of (the columns of) $B$ and $U\Sigma V'$ is an [[Singular Value Decomposition of a Matrix|SVD]] of $B_{0}A_{0}'$ where we have defined the "de-meaned [[matrix|matrices]]" $B_{0}:= B- \b \mu_{B} \b 1_{n}', \ \ A_{0} := A-\b \mu_{A} \b 1_{n}'.$ > [!proof]- Proof. ([[shape-alignment procrustes problem]]) > > > We will no longer use boldface for $\b \mu = \mu$. > > We proceed to show the following: > 1. For any fixed choice of $\alpha$ and $Q$, $\mu_{*}=\mu_{B}$ minimizes $(2)$. > 2. For any fixed choice of $\alpha$ and $\mu=\mu_{*}=\mu_{B}$, $Q_{*}=UV'$ minimizes $(2)$. > 3. $\alpha=\alpha_{*}$ minimizes $(2)$ when $\mu=\mu_{*}$ and $Q=Q_{*}$. > > **1.** Let $\alpha$ and $Q$ be fixed. Defining $Z:=B-\alpha Q(A-\mu_{A}\b 1_{n}')$, $(2)$ can now be written as the program $\mu_{*}= \arg \min_{\mu \in \mathbb{R}^{d}} \|Z-\mu \b 1_{n}'\|_{F}^{2} \ (3)$ > (squaring won't change the minimizer) and we want to show $\mu_{*}=\mu_{B}$. Row $j$ of $\mu \b 1_{n}'$ is just $[\mu_{j} \ \dots \ \mu_{j}]=\b 1_{n}' \mu _j$. > > Using the relationship between [[Frobenius norm]] and [[Euclidean inner product|Euclidean norm]] rewrite $(3)$ as $\begin{align} > \mu_{*}= & \arg \min_{\mu \in \mathbb{R}^{d}} \sum_{j=1}^{n} \|[Z - \mu \b 1 _{n}']_{j, :}\|_{2}^{2} \\ > = & \arg \min _{\mu \in \mathbb{R} ^{d}} \sum_{j=1}^{n} \|Z_{j, :} - \v 1_{n}' \mu _{j}\|_{2}^{2} \\ > = & \arg \min_{\mu \in \mathbb{R}^{d}} \sum_{j=1}^{n} \|\overbrace{\b 1 _{n}'}^{A} \overbrace{\mu_{j}}^{x_{j}} - \overbrace{Z_{j,:}}^{y}\|_{2}^{2}, > \end{align}$ > where we see that the program is [[additively separable]] and so it suffices to solve individually for each $u_{j}$. But that is just a collection of $n$ [[least squares]] problems arg-minimizing $\|Ax-y\|_{2}^{2}$ and so each optimal $\mu_{j}$ is given by the [[Moore-Penrose pseudoinverse]] solution $\b 1_{n}^{+}Z_{j,:}$. Since $\b 1_{n}^{}$ has [[compact svd]] $\frac{\b 1_{n} }{\sqrt{ n } }\cdot \sqrt{ n } \cdot [1]$, $\b 1_{n}^{+}=\b 1_{n}' /n$. Hence the optimal $\mu_{j}=\frac{1}{n}\sum_{k}Z_{j,k}$ and therefore $\mu_{*} = \begin{bmatrix} > \frac{1}{n}\sum_{k=1}^{d} Z_{1, k} \\ \vdots \\ \frac{1}{n} \sum_{k=1}^{d}Z_{n, k} > \end{bmatrix}= \frac{1}{n} \sum_{k=1}^{d} Z_{:, k}=\frac{1}{n} Z \b 1_{n}.$ > Finally, $\begin{align} > \mu_{*}= & \frac{1}{n}Z \b 1_{n} \\ > = & \frac{1}{n} (B-\alpha Q(A-\mu_{A}\b 1_{n}')) \b 1_{n} \\ > = & \frac{1}{n} \big( B \b 1_{n} - \alpha Q (\frac{1}{n} A \b 1 _{n} - \frac{1}{n} \mu_{A}\b 1_{n}' \b 1_{n}) \big) \\ > = & \frac{1}{n} \big( B \b 1_{n} - \alpha Q (\mu_{A} - \mu_{A}) \big) \\ > = & \frac{1}{n}B \b 1_{n} \\ > = & \mu_{B} > \end{align},$ > as desired. > > **2.** With $\mu=\mu_{*}$ and fixed $\alpha$, $(2)$ is an [[orthogonal procrustes problem]]: $\begin{align} > Q_{*}= & \arg \min_{Q:Q'Q=I} \| \textcolor{Thistle}{B}-\alpha Q(A-\mu_{A}\b 1_{n}')\textcolor{Thistle}{-\mu_{B}\b 1_{n}'}\|_{F} \\ > = & \arg \min_{Q: Q'Q=I} \|B_{0} - \alpha Q A_{0}\| \\ > = & \arg \min_{Q: Q'Q=I} \|B_{0} - QA_{0}\|, > \end{align}$ > where the final step is true due to the [[orthogonal procrustes problem|scale invariance property of the orthogonal procrustes problem]]. We know the solution is $Q_{*}=UV'$, where $U$ and $V$ are as in the theorem statement. > > **3.** Now $(2)$ looks like $\begin{align} > \alpha_{*}= & \arg \min_{\alpha \geq 0} \|B_{0} - \alpha \overbrace{UV' A_{0}}^{:= W}\|_{F}^{2} \\ > = & \arg \min _{\alpha \geq 0} \text{Tr}\big((B_{0}' - W'\alpha)(B_{0} - W\alpha)\big) \\ > = & \arg \min _{\alpha \geq 0} \text{Tr} \big( B_{0}'B_{0} -\alpha B_{0}'W - \alpha W'B_{0} + \alpha^{2} W'W \big) \\ > = & \arg \min _{\alpha \geq 0} \text{Tr}(B_{0}' B_{0}) - 2 \alpha \text{Tr}(B_{0}'W) + \alpha^{2} \text{Tr}(W'W). > \end{align}$ > The objective is a quadratic in $\alpha$ (hence differentiable and [[convex function|convex]]), we [[derivative|differentiate]] and [[characterization of extrema for differentiable convex functions|set to 0]] thus: $0 = -2 \text{Tr}(B_{0}'W) + 2 \alpha \text{Tr}(W'W)$ > from which it is clear that $\alpha_{*}=\frac{\text{Tr}(B_{0}' W)}{\text{Tr}(W'W)}=\frac{\text{Tr}(B_{0}'\overbrace{UV'}^{Q_{*}}A_{0})}{\text{Tr} (A_{0}' VU'UV' A_{0})}= \frac{\text{Tr}(B_{0}'Q_{*}A_{0})}{\text{Tr}(A_{0} A_{0}')},$ > from which the result follows from the fact the the [[trace of a matrix|trace]] of a [[matrix]] equals the trace of its [[matrix transpose|transpose]]. ---- #### ----- #### References > [!backlink] > ```dataview TABLE rows.file.link as "Further Reading" FROM [[]] FLATTEN file.tags GROUP BY file.tags as Tag > [!frontlink] > ```dataview > TABLE rows.file.link as "Further Reading" > FROM outgoing([[]]) > FLATTEN file.tags as Tag > WHERE Tag = "#definition" OR Tag = "#theorem" OR Tag = "#MOC" OR Tag = "#proposition" OR Tag = "#axiom" > GROUP BY Tag > ```