[[Noteworthy Uses]]:: *[[Noteworthy Uses]]*
[[Proved By]]:: *[[Proved By|Crucial Dependencies]]*

----

Recall the [[optimal soft-margin linear classifier|OSM hyperplane]], parameterized by the solution $\v u ^{*}:=(\v w^{*}, b ^{*}, \b \xi^{*})$ of the [[quadratic program]] $\begin{align}& \min_{\v w, b, \b \xi} \frac{1}{2}\|\v w\|^{2} + \frac{C}{n} \sum_{i=1}^{n} \xi_{i} \\ \text{ s.t. } & y_{i}(\v w ^{\top}\v x_{i} + b) \geq 1- \xi_{i} \ \fa i \in [n], \\& \xi_{i} \geq 0 \ \ \fa i \in [n]. \end{align}$

> [!theorem] Theorem. ([[support vector machine]])
> [[Strong duality]] holds, such that the [[optimal soft-margin linear classifier|OSM hyperplane]] can be [[kernel|kernelized]] by solving the [[dual optimization problem|dual]] $\begin{align}\max_{\b \alpha} &-\frac{1}{2} \sum_{i,j=1}^{n}\alpha_{i} \alpha_{j} y_{i} y_{j} \langle \v x_{i}, \v x_{j} \rangle + \sum_{i=1}^{n} \alpha _{i} \\\text{s.t.} & \sum_{i=1}^{n} \alpha_{i}y_{i}=0 \\ & 0 \leq \alpha_{i} \leq \frac{C}{n}, \ \ \fa i \in [n].\end{align}$ The [[classifier]] is expressed $f(\v x)=\text{sgn}\left( \sum_{i=1}^{n} \alpha^{*}_{i} y_{i} \langle \v x _{i}, \v x \rangle + b ^{*} \right).$ For any $j$ with $0 < \alpha_{j}^{*} < \frac{C}{n}$, the offset is given by $b ^{*}=y_{j}-\sum_{i=1}^{n} \alpha_{i}^{*} y_{i} \langle \v x_{i}, \v x_{j} \rangle.$ This is clearly kernelizable: upon swapping $\langle \v x_{i}, \v x \rangle$ for $k(\v x _{i}, \v x)$ the [[classifier]] is called a **Support Vector Machine.**

> [!proof]- Proof. ([[support vector machine]])
>
> > This is a [[convex function|convex optimization problem]] and the constraint functions are all affine, hence [[strong duality]] holds: the solution $d ^{*}$ to the [[dual optimization problem]] will match the solution $p ^{*}$ to the [[primal optimization problem]].
> > **First**, we form the [[Lagrangian]], remarking that there are no $h_{i}$s; just two sets of inequality constraints corresponding to [[Lagrangian|dual variables]] $\b \lambda = \begin{bmatrix}\b \alpha \\ \b \beta\end{bmatrix}$. The [[Lagrangian]] is thus $L(\v w, b, \b \xi, \b \alpha, \b \beta)= \frac{1}{2} \|\v w\|^{2} + \frac{C}{n}\sum_{i=1}^{n} \xi_{i} - \sum_{i=1}^{n} \alpha_{i}\big(y_{i}(\langle \v w , \v x_{i}\rangle + b)-1+\xi_{i} \big)-\sum_{i=1}^{n} \beta_{i}\xi_{i}$
> > which upon grouping in terms of the three primal variables is seen to be
> > $\begin{align} L(\v w, b, \b \xi, \b \alpha, \b \beta) =& \frac{1}{2} \|\v w\|^{2} - \sum_{i=1}^{n} \alpha_{i}y_{i} \langle \v w, \v x_{i} \rangle \\
> - & b \sum_{i=1}^{ n} \alpha_{i} y_{i} \\
> + & \sum_{i=1}^{n} \xi_{i} \left( \frac{C}{n} - \alpha_{i} - \beta_{i}\right) + \sum_{i=1}^{n} \alpha_{i}.\end{align}$
> > **Second**, we minimize the [[Lagrangian]] over the primal variables, given arbitrary [[Lagrangian|dual variables]], to obtain the [[dual optimization problem|Lagrange dual function]]; by the stationarity [[KKT Conditions]] we must have $\begin{align}
> \frac{ \partial L }{ \partial \v w } =\v w - \sum_{i=1}^{n} \alpha_{i}y_{i}\v x_{i}=0 \\
> \frac{ \partial L }{ \partial b } = \sum_{i=1}^{n} \alpha _{i} y_{i} = 0 \\
> \frac{ \partial L }{ \partial \xi _{i}} = \frac{C}{n}- \alpha_{i} - \beta_{i} = 0 \ \ \fa i \in [n]. \end{align}$
> **Third**, we solve for and plug into $L$ the optimal $\v w = \sum_{i=1}^{n} \alpha _i y_{i} \v x_{i}$, and note that if we do not enforce the constraints implied by the second two equations above, the infimum over $b$ and $\b \xi$ is $-\infty$. So the [[dual optimization problem|Lagrange dual function]] is
> $g(\b \alpha, \b \beta) = -\frac{1}{2} \sum_{i,j=1}^{n} \alpha_{i} \alpha_{j} y_{i} y_{j} \langle \v x_{i}, \v x_{j} \rangle + \sum_{i=1}^{n} \alpha_{i}$
> assuming that $\sum_{i=1}^{n}\alpha _{i} y_{i}=0$ and $\frac{C}{n} - \alpha_{i}-\beta_{i}=0$ for all $i \in [n]$. If the constraints aren't satisfied it is $-\infty$.
> > **Fourth**, we can hence write the [[dual optimization problem]] as
> > $\begin{align}\max_{\b \alpha, \b \beta} & -\frac{1}{2} \sum_{i,j=1}^{n}\alpha_{i} \alpha_{j} y_{i} y_{j} \langle \v x_{i}, \v x_{j} \rangle + \sum_{i=1}^{n} \alpha_{i} \\ \text{s.t. } & \sum_{i=1}^{n} \alpha_{i} y_{i} = 0, \\ & \frac{C}{n} - \alpha_{i} - \beta_{i} = 0, \ \ \alpha_{i} \geq 0, \ \ \beta_{i} \geq 0 \ \ \fa i \in [n],\end{align}$
> > but $\b \beta$ may be eliminated: $\beta_{i} = \frac{C}{n} - \alpha_{i} \geq 0$ is equivalent to $\alpha_{i} \leq \frac{C}{n}$, yielding the problem of the theorem statement.
> > **Fifth**, we turn to expressing the final [[classifier]]. [[Strong duality]] holds and so $(\v u ^{*}, \b \alpha^{*})$ satisfy the [[KKT Conditions]]. By the first stationarity condition $\v w ^{*}=\sum_{i=1}^{n} \alpha_{i} ^{*} y_{i} \v x_{i}$, which gives the kernelizable decision function. For the offset, take any $j$ with $0 < \alpha_{j}^{*} < \frac{C}{n}$: then $\beta_{j}^{*} = \frac{C}{n} - \alpha_{j}^{*} > 0$, so complementary slackness forces $\xi_{j}^{*} = 0$ and the margin constraint is active, $y_{j}(\langle \v w^{*}, \v x_{j} \rangle + b^{*}) = 1$. Multiplying by $y_{j}$ and using $y_{j}^{2}=1$ yields $b ^{*}=y_{j}-\sum_{i=1}^{n} \alpha_{i}^{*} y_{i} \langle \v x_{i}, \v x_{j} \rangle$.

----

#### References

> [!backlink]
> ```dataview
> TABLE rows.file.link as "Further Reading"
> FROM [[]]
> FLATTEN file.tags
> GROUP BY file.tags as Tag
> ```

> [!frontlink]
> ```dataview
> TABLE rows.file.link as "Further Reading"
> FROM outgoing([[]])
> FLATTEN file.tags as Tag
> WHERE Tag = "#definition" OR Tag = "#theorem" OR Tag = "#MOC" OR Tag = "#proposition" OR Tag = "#axiom"
> GROUP BY Tag
> ```
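The dual above can be sanity-checked numerically. The following is a minimal sketch, not part of the original derivation: it optimizes the dual with a simplified SMO-style pairwise update (a heuristic solver, not the generic QP of the theorem) on a made-up, linearly separable toy dataset. The names `smo_train` and `predict`, the dataset, and the choice `C = 1.0` are all assumptions for illustration; here `C` plays the role of the box bound $\frac{C}{n}$ from the note, and the pairwise update preserves the constraint $\sum_i \alpha_i y_i = 0$ by construction.

```python
import random

random.seed(0)  # deterministic choice of the second index

def linear_kernel(a, b):
    return sum(p * q for p, q in zip(a, b))

def smo_train(X, y, C, kernel=linear_kernel, tol=1e-4, max_passes=50):
    """Simplified SMO-style coordinate ascent on the SVM dual (a sketch)."""
    n = len(X)
    alpha = [0.0] * n
    b = 0.0

    def f(x):  # decision value: sum_m alpha_m y_m k(x_m, x) + b
        return sum(alpha[m] * y[m] * kernel(X[m], x) for m in range(n)) + b

    passes = 0
    while passes < max_passes:
        changed = 0
        for i in range(n):
            E_i = f(X[i]) - y[i]
            # only touch alpha_i if it violates its KKT condition beyond tol
            if (y[i] * E_i < -tol and alpha[i] < C) or (y[i] * E_i > tol and alpha[i] > 0):
                j = random.choice([k for k in range(n) if k != i])
                E_j = f(X[j]) - y[j]
                a_i, a_j = alpha[i], alpha[j]
                # bounds keeping sum_m alpha_m y_m = 0 after the pair update
                if y[i] != y[j]:
                    L, H = max(0.0, a_j - a_i), min(C, C + a_j - a_i)
                else:
                    L, H = max(0.0, a_i + a_j - C), min(C, a_i + a_j)
                if L == H:
                    continue
                eta = 2 * kernel(X[i], X[j]) - kernel(X[i], X[i]) - kernel(X[j], X[j])
                if eta >= 0:
                    continue
                alpha[j] = min(H, max(L, a_j - y[j] * (E_i - E_j) / eta))
                if abs(alpha[j] - a_j) < 1e-7:
                    continue
                alpha[i] = a_i + y[i] * y[j] * (a_j - alpha[j])
                # offset update from the two candidate active margin constraints
                b1 = (b - E_i - y[i] * (alpha[i] - a_i) * kernel(X[i], X[i])
                      - y[j] * (alpha[j] - a_j) * kernel(X[i], X[j]))
                b2 = (b - E_j - y[i] * (alpha[i] - a_i) * kernel(X[i], X[j])
                      - y[j] * (alpha[j] - a_j) * kernel(X[j], X[j]))
                if 0 < alpha[i] < C:
                    b = b1
                elif 0 < alpha[j] < C:
                    b = b2
                else:
                    b = (b1 + b2) / 2
                changed += 1
        passes = passes + 1 if changed == 0 else 0
    return alpha, b

def predict(X, y, alpha, b, x, kernel=linear_kernel):
    s = sum(alpha[m] * y[m] * kernel(X[m], x) for m in range(len(X))) + b
    return 1 if s >= 0 else -1

# hypothetical toy dataset, separable by a line through the origin
X = [(2.0, 2.0), (3.0, 3.0), (-2.0, -2.0), (-3.0, -1.0)]
y = [1, 1, -1, -1]
alpha, b = smo_train(X, y, C=1.0)
preds = [predict(X, y, alpha, b, x) for x in X]
```

Swapping `linear_kernel` for any positive-definite kernel kernelizes the classifier exactly as in the theorem, since only inner products of the $\v x_i$ ever appear.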