[[Noteworthy Uses]]:: *[[Noteworthy Uses]]*
[[Proved By]]:: *[[Proved By|Crucial Dependencies]]*
----
Recall the [[optimal soft-margin linear classifier|OSM hyperplane]], parameterized by the solution $\v u ^{*}:=(\v w^{*}, b ^{*}, \b \xi^{*})$ of the [[quadratic program]] $\begin{align}& \min_{\v w, b, \b \xi} \frac{1}{2}\|\v w\|^{2} + \frac{C}{n} \sum_{i=1}^{n} \xi_{i} \\
\text{ s.t. } & y_{i}(\v w ^{\top}\v x_{i} + b) \geq 1- \xi_{i} \ \fa i \in [n], \\& \xi_{i} \geq 0 \ \ \fa i \in [n].
\end{align}$
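At the optimum each slack variable equals the hinge loss, $\xi_{i}=\max(0,\, 1-y_{i}(\langle \v w, \v x_{i} \rangle + b))$, so the primal value at any candidate $(\v w, b)$ can be computed directly. A minimal pure-Python sketch (all names hypothetical, not from this note):

```python
# Sketch: evaluate the soft-margin primal objective at a fixed (w, b).
# At the optimum the slacks are xi_i = max(0, 1 - y_i(<w, x_i> + b)),
# so the QP value equals a regularized hinge loss.

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def primal_objective(w, b, X, y, C):
    n = len(X)
    slacks = [max(0.0, 1.0 - y_i * (dot(w, x_i) + b)) for x_i, y_i in zip(X, y)]
    return 0.5 * dot(w, w) + (C / n) * sum(slacks)

# Toy data on the x-axis: only the middle point violates the margin.
X = [(2.0, 0.0), (0.5, 0.0), (-1.0, 0.0)]
y = [+1, +1, -1]
value = primal_objective((1.0, 0.0), 0.0, X, y, C=1.0)  # 0.5 + (1/3) * 0.5
```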
> [!theorem] Theorem. ([[support vector machine]])
> [[Strong duality]] holds, such that the [[optimal soft-margin linear classifier|OSM hyperplane]] can be [[kernel|kernelized]] by solving the [[dual optimization problem|dual]] $\begin{align}\max_{\b \alpha} &-\frac{1}{2} \sum_{i,j=1}^{n}\alpha_{i} \alpha_{j} y_{i} y_{j} \langle \v x_{i}, \v x_{j} \rangle + \sum_{i=1}^{n} \alpha _{i} \\\text{s.t.} & \sum_{i=1}^{n} \alpha_{i}y_{i}=0 \\
& 0 \leq \alpha_{i} \leq \frac{C}{n}, \ \ \fa i \in [n].\end{align}$ The [[classifier]] is expressed $f(\v x)=\text{sgn}\left( \sum_{i=1}^{n} \alpha^{*}_{i} y_{i} \langle \v x _{i}, \v x \rangle + b ^{*} \right).$
For any $j$ with $0 < \alpha_{j}^{*} < \frac{C}{n}$, the offset is given by $b ^{*}=y_{j}-\sum_{i=1}^{n} \alpha_{i}^{*} y_{i} \langle \v x_{i}, \v x_{j} \rangle. $
Since the data enter only through inner products, this is kernelizable: upon swapping $\langle \v x_{i}, \v x \rangle$ for $k(\v x _{i}, \v x)$, the [[classifier]] is called a **Support Vector Machine.**
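As a concrete illustration, the kernelized decision function can be sketched in pure Python; the kernels, data, and all names here are illustrative assumptions, not from this note:

```python
# Sketch of the SVM decision function
#   f(x) = sgn( sum_i alpha_i* y_i k(x_i, x) + b* ).
import math

def linear_kernel(u, v):
    return sum(a * b for a, b in zip(u, v))

def rbf_kernel(u, v, gamma=1.0):
    # Gaussian RBF kernel k(u, v) = exp(-gamma * ||u - v||^2).
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(u, v)))

def svm_predict(x, X, y, alpha, b, kernel):
    score = sum(a_i * y_i * kernel(x_i, x)
                for a_i, y_i, x_i in zip(alpha, y, X)) + b
    return 1 if score >= 0 else -1

# Toy model: two support vectors on the x-axis, alpha* = (1/2, 1/2), b* = 0.
X = [(1.0, 0.0), (-1.0, 0.0)]
y = [+1, -1]
alpha = [0.5, 0.5]
pred_lin = svm_predict((2.0, 0.0), X, y, alpha, 0.0, linear_kernel)  # +1
pred_rbf = svm_predict((-2.0, 0.0), X, y, alpha, 0.0, rbf_kernel)    # -1
```

Only the kernel evaluations change between the two predictions; the learned coefficients $\b \alpha^{*}, b^{*}$ are reused as-is.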
> [!proof]- Proof. ([[support vector machine]])
>
> This is a [[convex function|convex optimization problem]] and the constraint functions are all affine, hence [[strong duality]] holds: the solution $d ^{*}$ of the [[dual optimization problem]] will match the solution $p ^{*}$ of the [[primal optimization problem]].
>
> **First**, we form the [[Lagrangian]], remarking that there are no equality constraints $h_{i}$; just two sets of inequality constraints corresponding to [[Lagrangian|dual variables]] $\b \lambda = \begin{bmatrix}\b \alpha \\ \b \beta\end{bmatrix}$. The [[Lagrangian]] is thus $L(\v w, b, \b \xi, \b \alpha, \b \beta)= \frac{1}{2} \|\v w\|^{2} + \frac{C}{n}\sum_{i=1}^{n} \xi_{i} - \sum_{i=1}^{n} \alpha_{i}\big(y_{i}(\langle \v w , \v x_{i}\rangle + b)-1+\xi_{i}\big)-\sum_{i=1}^{n} \beta_{i}\xi_{i}$
>
> which upon grouping in terms of the three primal variables is seen to be
>
> $\begin{align} L(\v w, b, \b \xi, \b \alpha, \b \beta) =& \frac{1}{2} \|\v w\|^{2} - \sum_{i=1}^{n} \alpha_{i}y_{i} \langle \v w, \v x_{i} \rangle + \sum_{i=1}^{n} \alpha_{i} \\
> - & b \sum_{i=1}^{ n} \alpha_{i} y_{i} \\
> + & \sum_{i=1}^{n} \xi_{i} \left( \frac{C}{n} - \alpha_{i} - \beta_{i}\right).\end{align}$
>
> **Second**, we minimize the [[Lagrangian]] over the primal variables, for arbitrary [[Lagrangian|dual variables]], to obtain the [[dual optimization problem|Lagrange dual function]]. Stationarity (cf. the [[KKT Conditions]]) requires $\begin{align}
> \frac{ \partial L }{ \partial \v w } =\v w - \sum_{i=1}^{n} \alpha_{i}y_{i}\v x_{i}=0 \\
> \frac{ \partial L }{ \partial b } = \sum_{i=1}^{n} \alpha _{i} y_{i} = 0 \\
> \frac{ \partial L }{ \partial \xi _{i}} = \frac{C}{n}- \alpha_{i} - \beta_{i} = 0 \ \ \fa i \in [n]. \end{align}$
> **Third**, we solve for and plug into $L$ the optimal $\v w = \sum_{i=1}^{n} \alpha _i y_{i} \v x_{i}$, and note that if either of the latter two stationarity equations fails, then $L$ is linear in $b$ (respectively $\xi_{i}$) with a nonzero coefficient, so the infimum over the primal variables is $-\infty$. So the [[dual optimization problem|Lagrange dual function]] is
> $g(\b \alpha, \b \beta)=\begin{cases} -\frac{1}{2} \sum_{i,j=1}^{n} \alpha_{i} \alpha_{j} y_{i} y_{j} \langle \v x_{i}, \v x_{j} \rangle + \sum_{i=1}^{n} \alpha_{i} & \text{if } \sum_{i=1}^{n} \alpha_{i}y_{i}=0 \text{ and } \frac{C}{n}-\alpha_{i}-\beta_{i}=0 \ \fa i \in [n], \\ -\infty & \text{otherwise.}\end{cases}$
>
> **Fourth**, we can hence write the [[dual optimization problem]] as $\begin{align}\max_{\b \alpha, \b \beta} & -\frac{1}{2} \sum_{i,j=1}^{n}\alpha_{i} \alpha_{j} y_{i} y_{j} \langle \v x_{i}, \v x_{j} \rangle + \sum_{i=1}^{n} \alpha_{i} \\ \text{s.t. } & \sum_{i=1}^{n} \alpha_{i}y_{i}=0, \\ & \frac{C}{n}-\alpha_{i}-\beta_{i}=0, \ \ \alpha_{i} \geq 0, \ \ \beta_{i} \geq 0 \ \fa i \in [n],\end{align}$
> but since $\beta_{i}=\frac{C}{n}-\alpha_{i} \geq 0$, the variables $\b \beta$ may be eliminated in favor of the box constraint $0 \leq \alpha_{i} \leq \frac{C}{n}$, yielding the problem of the theorem statement.
>
> **Fifth**, we turn to expressing the final [[classifier]]. [[strong duality]] holds and so $(\v u ^{*}, \b \alpha^{*})$ satisfy the [[KKT Conditions]]. By the first condition, $\v w ^{*}=\sum_{i=1}^{n} \alpha_{i} ^{*} y_{i} \v x_{i}$, so $\langle \v w^{*}, \v x \rangle = \sum_{i=1}^{n} \alpha_{i}^{*} y_{i} \langle \v x_{i}, \v x \rangle$. For the offset, pick any $j$ with $0 < \alpha_{j}^{*} < \frac{C}{n}$: then $\beta_{j}^{*}=\frac{C}{n}-\alpha_{j}^{*}>0$, so [[KKT Conditions|complementary slackness]] forces $\xi_{j}^{*}=0$, while $\alpha_{j}^{*}>0$ forces the margin constraint to be active, $y_{j}(\langle \v w^{*}, \v x_{j} \rangle + b^{*})=1$. Multiplying by $y_{j}$ (using $y_{j}^{2}=1$) gives $b^{*}=y_{j}-\sum_{i=1}^{n} \alpha_{i}^{*} y_{i} \langle \v x_{i}, \v x_{j} \rangle$.
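To make the dual concrete, consider a symmetric two-point toy problem: the equality constraint pins $\alpha_{1}=\alpha_{2}=t$, and the objective collapses to a one-dimensional quadratic solvable in closed form. A hypothetical pure-Python sketch (this toy instance is not from the note):

```python
# Hypothetical toy instance of the dual: two points
#   x1 = (1, 0) with y1 = +1,   x2 = (-1, 0) with y2 = -1.
# The constraint sum_i alpha_i y_i = 0 forces alpha_1 = alpha_2 = t, and the
# objective becomes g(t) = 2t - (q/2) t^2 with q = sum_{i,j} y_i y_j <x_i, x_j>,
# maximized at t = 2/q, then clipped to the box [0, C/n].

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

X = [(1.0, 0.0), (-1.0, 0.0)]
y = [+1, -1]
C, n = 10.0, 2

q = sum(y[i] * y[j] * dot(X[i], X[j]) for i in range(n) for j in range(n))  # = 4
t = min(max(2.0 / q, 0.0), C / n)  # closed-form maximizer, clipped
alpha = [t, t]

# Recover the primal solution w* = sum_i alpha_i* y_i x_i ...
w = [sum(alpha[i] * y[i] * X[i][d] for i in range(n)) for d in range(2)]

# ... and the offset from any j with 0 < alpha_j* < C/n.
j = 0
b = y[j] - sum(alpha[i] * y[i] * dot(X[i], X[j]) for i in range(n))
```

Here $t^{*}=\frac{1}{2}$, giving $\v w^{*}=(1,0)$ and $b^{*}=0$: the maximum-margin separator between the two points, as expected.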
----
#### References
> [!backlink]
> ```dataview
> TABLE rows.file.link as "Further Reading"
> FROM [[]]
> FLATTEN file.tags
> GROUP BY file.tags as Tag
> ```

> [!frontlink]
> ```dataview
> TABLE rows.file.link as "Further Reading"
> FROM outgoing([[]])
> FLATTEN file.tags as Tag
> WHERE Tag = "#definition" OR Tag = "#theorem" OR Tag = "#MOC" OR Tag = "#proposition" OR Tag = "#axiom"
> GROUP BY Tag
> ```