----
> [!theorem] Theorem. ([[shape-alignment procrustes problem]])
> Suppose $A,B \in \mathbb{R}^{d \times n}$ and that we would like to (perhaps improperly) rotate, translate, and scale $A$ to best align with $B$. One way to express this mathematically is by computing the matrix $\hat{A} \in \mathbb{R}^{d \times n}$ defined as $\hat{A}=\alpha Q (A- \b \lambda \b 1_{n}')+ \b \mu \b 1_{n}', (1)$
> where $Q \in \mathbb{R}^{d \times d}$ is [[orthogonal matrix|orthogonal]], $\b \lambda, \b \mu \in \mathbb{R}^{d}$ are [[vector|translation vectors]], and $\alpha \in \mathbb{R}$ is a scalar. Geometrically, $(1)$ corresponds to a translation of the [[submodule generated by a subset|column span]] of $A$ by $-\b \lambda$, rotation then scaling of the result by $\alpha Q$, then translation by $\b \mu$. However, optimizing *both* $\b \lambda$ and $\b \mu$ is redundant, since replacing $\b \lambda \to \b \lambda + \b \Delta$ and $\b \mu \to \b \mu + \alpha Q \b \Delta$ won't change $\hat{A.}$ Thus WLOG we choose $\b \lambda := \b \mu_{A} := \frac{1}{n}A \b 1_{n}=\frac{1}{n} \sum_{k=1}^{n}A_{[:,k]},$
> the [[center of mass|centroid]] of the columns of $A$. Then we optimize the remaining parameters as $(\alpha_{*}, \b \mu_{*}, Q_{*})=\arg \min_{\alpha \geq 0, \b \mu \in \mathbb{R}^{d}, Q:Q'Q=I_{d}} \|B-\alpha Q(A - \b \mu_{A} \b 1_{n}')-\b \mu \b 1_{n}'\|_{F}, \ (2)$
> where $\|\cdot\|_{F}$ is the [[Frobenius norm]].
> \
> The solution, as shown below, is $\begin{align}
\b \mu_{*}= & \b \mu_{B}, \\ Q_{*}= & UV' \\ \alpha_{*}= & \frac{\text{Tr}(B_{0}A_{0}'Q_{*}')}{\text{Tr}(A_{0}A_{0}')},
\end{align}$
where $\b \mu_{B}:= \frac{1}{n}B \b 1_{n}$ is the [[center of mass|centroid]] of (the columns of) $B$ and $U\Sigma V'$ is an [[Singular Value Decomposition of a Matrix|SVD]] of $B_{0}A_{0}'$ where we have defined the "de-meaned [[matrix|matrices]]" $B_{0}:= B- \b \mu_{B} \b 1_{n}', \ \ A_{0} := A-\b \mu_{A} \b 1_{n}'.$
> [!proof]- Proof. ([[shape-alignment procrustes problem]])
>
>
> We will no longer use boldface for $\b \mu = \mu$.
>
> We proceed to show the following:
> 1. For any fixed choice of $\alpha$ and $Q$, $\mu_{*}=\mu_{B}$ minimizes $(2)$.
> 2. For any fixed choice of $\alpha$ and $\mu=\mu_{*}=\mu_{B}$, $Q_{*}=UV'$ minimizes $(2)$.
> 3. $\alpha=\alpha_{*}$ minimizes $(2)$ when $\mu=\mu_{*}$ and $Q=Q_{*}$.
>
> **1.** Let $\alpha$ and $Q$ be fixed. Defining $Z:=B-\alpha Q(A-\mu_{A}\b 1_{n}')$, $(2)$ can now be written as the program $\mu_{*}= \arg \min_{\mu \in \mathbb{R}^{d}} \|Z-\mu \b 1_{n}'\|_{F}^{2} \ (3)$
> (squaring won't change the minimizer) and we want to show $\mu_{*}=\mu_{B}$. Row $j$ of $\mu \b 1_{n}'$ is just $[\mu_{j} \ \dots \ \mu_{j}]=\b 1_{n}' \mu _j$.
>
> Using the relationship between [[Frobenius norm]] and [[Euclidean inner product|Euclidean norm]] rewrite $(3)$ as $\begin{align}
> \mu_{*}= & \arg \min_{\mu \in \mathbb{R}^{d}} \sum_{j=1}^{n} \|[Z - \mu \b 1 _{n}']_{j, :}\|_{2}^{2} \\
> = & \arg \min _{\mu \in \mathbb{R} ^{d}} \sum_{j=1}^{n} \|Z_{j, :} - \v 1_{n}' \mu _{j}\|_{2}^{2} \\
> = & \arg \min_{\mu \in \mathbb{R}^{d}} \sum_{j=1}^{n} \|\overbrace{\b 1 _{n}'}^{A} \overbrace{\mu_{j}}^{x_{j}} - \overbrace{Z_{j,:}}^{y}\|_{2}^{2},
> \end{align}$
> where we see that the program is [[additively separable]] and so it suffices to solve individually for each $u_{j}$. But that is just a collection of $n$ [[least squares]] problems arg-minimizing $\|Ax-y\|_{2}^{2}$ and so each optimal $\mu_{j}$ is given by the [[Moore-Penrose pseudoinverse]] solution $\b 1_{n}^{+}Z_{j,:}$. Since $\b 1_{n}^{}$ has [[compact svd]] $\frac{\b 1_{n} }{\sqrt{ n } }\cdot \sqrt{ n } \cdot [1]$, $\b 1_{n}^{+}=\b 1_{n}' /n$. Hence the optimal $\mu_{j}=\frac{1}{n}\sum_{k}Z_{j,k}$ and therefore $\mu_{*} = \begin{bmatrix}
> \frac{1}{n}\sum_{k=1}^{d} Z_{1, k} \\ \vdots \\ \frac{1}{n} \sum_{k=1}^{d}Z_{n, k}
> \end{bmatrix}= \frac{1}{n} \sum_{k=1}^{d} Z_{:, k}=\frac{1}{n} Z \b 1_{n}.$
> Finally, $\begin{align}
> \mu_{*}= & \frac{1}{n}Z \b 1_{n} \\
> = & \frac{1}{n} (B-\alpha Q(A-\mu_{A}\b 1_{n}')) \b 1_{n} \\
> = & \frac{1}{n} \big( B \b 1_{n} - \alpha Q (\frac{1}{n} A \b 1 _{n} - \frac{1}{n} \mu_{A}\b 1_{n}' \b 1_{n}) \big) \\
> = & \frac{1}{n} \big( B \b 1_{n} - \alpha Q (\mu_{A} - \mu_{A}) \big) \\
> = & \frac{1}{n}B \b 1_{n} \\
> = & \mu_{B}
> \end{align},$
> as desired.
>
> **2.** With $\mu=\mu_{*}$ and fixed $\alpha$, $(2)$ is an [[orthogonal procrustes problem]]: $\begin{align}
> Q_{*}= & \arg \min_{Q:Q'Q=I} \| \textcolor{Thistle}{B}-\alpha Q(A-\mu_{A}\b 1_{n}')\textcolor{Thistle}{-\mu_{B}\b 1_{n}'}\|_{F} \\
> = & \arg \min_{Q: Q'Q=I} \|B_{0} - \alpha Q A_{0}\| \\
> = & \arg \min_{Q: Q'Q=I} \|B_{0} - QA_{0}\|,
> \end{align}$
> where the final step is true due to the [[orthogonal procrustes problem|scale invariance property of the orthogonal procrustes problem]]. We know the solution is $Q_{*}=UV'$, where $U$ and $V$ are as in the theorem statement.
>
> **3.** Now $(2)$ looks like $\begin{align}
> \alpha_{*}= & \arg \min_{\alpha \geq 0} \|B_{0} - \alpha \overbrace{UV' A_{0}}^{:= W}\|_{F}^{2} \\
> = & \arg \min _{\alpha \geq 0} \text{Tr}\big((B_{0}' - W'\alpha)(B_{0} - W\alpha)\big) \\
> = & \arg \min _{\alpha \geq 0} \text{Tr} \big( B_{0}'B_{0} -\alpha B_{0}'W - \alpha W'B_{0} + \alpha^{2} W'W \big) \\
> = & \arg \min _{\alpha \geq 0} \text{Tr}(B_{0}' B_{0}) - 2 \alpha \text{Tr}(B_{0}'W) + \alpha^{2} \text{Tr}(W'W).
> \end{align}$
> The objective is a quadratic in $\alpha$ (hence differentiable and [[convex function|convex]]), we [[derivative|differentiate]] and [[characterization of extrema for differentiable convex functions|set to 0]] thus: $0 = -2 \text{Tr}(B_{0}'W) + 2 \alpha \text{Tr}(W'W)$
> from which it is clear that $\alpha_{*}=\frac{\text{Tr}(B_{0}' W)}{\text{Tr}(W'W)}=\frac{\text{Tr}(B_{0}'\overbrace{UV'}^{Q_{*}}A_{0})}{\text{Tr} (A_{0}' VU'UV' A_{0})}= \frac{\text{Tr}(B_{0}'Q_{*}A_{0})}{\text{Tr}(A_{0} A_{0}')},$
> from which the result follows from the fact the the [[trace of a matrix|trace]] of a [[matrix]] equals the trace of its [[matrix transpose|transpose]].
----
####
-----
#### References
> [!backlink]
> ```dataview
TABLE rows.file.link as "Further Reading"
FROM [[]]
FLATTEN file.tags
GROUP BY file.tags as Tag
> [!frontlink]
> ```dataview
> TABLE rows.file.link as "Further Reading"
> FROM outgoing([[]])
> FLATTEN file.tags as Tag
> WHERE Tag = "#definition" OR Tag = "#theorem" OR Tag = "#MOC" OR Tag = "#proposition" OR Tag = "#axiom"
> GROUP BY Tag
> ```