Examples:: [[vector norm is convex]]
Nonexamples:: *[[Nonexamples]]*
Constructions:: *[[Constructions|Used in the construction of...]]*
Specializations:: *[[Specializations]]*
Generalizations:: *[[Generalizations]]*
Justifications and Intuition:: *[[Justifications and Intuition]]*
Properties:: [[local mins of convex functions are global mins]], [[characterization of extrema for differentiable convex functions]], [[convex image of Ax + b is a convex function]]
Sufficiencies:: *[[Sufficiencies]]*
Equivalences:: [[convex iff graph lies above all its tangents]], [[C2 function is convex iff its Hessian is everywhere PSD]]
----
- Let $X$ be a [[convex set|convex subset]] of a [[vector space]];
- Let $f: X \to \rr$.
> [!definition] Definition. ([[convex function]])
>
> We call $f$ **convex** if $f\big(t \v u + (1-t)\v v\big) \leq tf(\v u) + (1-t)f(\v v)$
> for all $\v u, \v v \in X$ and $t \in [0,1]$.

In other words, $f$ is **convex** if the image under $f$ of any line segment from $\v u$ to $\v v$ in $X$ is bounded above by the chord from $f(\v u)$ to $f(\v v)$: ![[CleanShot 2023-01-11 at [email protected]]]
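As a quick numerical sanity check (not part of the original note; the sampler, dimension, and tolerance are illustrative choices), the defining inequality can be probed at random points, for instance for the Euclidean norm, which this note lists as a convex function:

```python
import numpy as np

rng = np.random.default_rng(0)

def is_convex_on_samples(f, dim=3, trials=1000):
    """Numerically probe f(t*u + (1-t)*v) <= t*f(u) + (1-t)*f(v).

    A failure disproves convexity; passing is only evidence, not a proof.
    """
    for _ in range(trials):
        u, v = rng.normal(size=dim), rng.normal(size=dim)
        t = rng.uniform()
        if f(t * u + (1 - t) * v) > t * f(u) + (1 - t) * f(v) + 1e-12:
            return False
    return True

# The Euclidean norm is convex; its negation is not.
print(is_convex_on_samples(np.linalg.norm))                 # True
print(is_convex_on_samples(lambda x: -np.linalg.norm(x)))   # False
```

The random probe cannot certify convexity, but it is a cheap way to catch a function that is not convex.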
> [!equivalence] Equivalence on $\mathbb{R}$. (The three-slope inequality)
> Assume $X=\mathbb{R}$. A function $f:\mathbb{R} \to \mathbb{R}$ is convex if and only if for all $x_{1}<x_{2}<x_{3}$
>
> $\underbrace{ \frac{f(x_{2})-f(x_{1})}{x_{2}-x_{1}} }_{ \text{1st slope} }\leq \underbrace{ \frac{f(x_{3})-f(x_{1})}{x_{3}-x_{1}} }_{ \text{2nd slope} } \leq \underbrace{ \frac{f(x_{3})-f(x_{2})}{x_{3}-x_{2}} }_{ \text{3rd slope} }.$
>
> > [!proof]- Proof.
> > We start by writing $x_{2}$ as a convex combination of $x_{1}$ and $x_{3}$: solving $x_{2}= t x_{3} + (1-t)x_{1}$ for $t \in (0,1)$ gives $t=\frac{x_{2}-x_{1}}{x_{3}-x_{1}}$. Convexity then yields $f(x_{2}) \leq t f(x_{3}) + (1-t) f(x_{1})$,
> > so $f(x_{2})-f(x_{1}) \leq t\big(f(x_{3})-f(x_{1})\big)=\frac{f(x_{3})-f(x_{1})}{x_{3}-x_{1}}(x_{2}-x_{1})$. Dividing by $x_{2}-x_{1}>0$ gives the first inequality. To obtain the second inequality, write $x_{2}$ as a convex combination of $x_{3}$ and $x_{1}$ (i.e., switch the roles) and follow the same procedure.
> >
> > The converse is similarly straightforward (omitted).
>
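The three-slope chain can be checked numerically for a concrete convex function (a sketch outside the note proper; $f(x)=x^{2}$ and the sampling range are illustrative choices):

```python
import numpy as np

rng = np.random.default_rng(1)

def slopes(f, x1, x2, x3):
    """Return the three chord slopes for x1 < x2 < x3."""
    s1 = (f(x2) - f(x1)) / (x2 - x1)   # slope over [x1, x2]
    s2 = (f(x3) - f(x1)) / (x3 - x1)   # slope over [x1, x3]
    s3 = (f(x3) - f(x2)) / (x3 - x2)   # slope over [x2, x3]
    return s1, s2, s3

# f(x) = x**2 is convex, so s1 <= s2 <= s3 must hold for every triple.
f = lambda x: x ** 2
for _ in range(1000):
    x1, x2, x3 = np.sort(rng.uniform(-10, 10, size=3))
    s1, s2, s3 = slopes(f, x1, x2, x3)
    assert s1 <= s2 + 1e-9 and s2 <= s3 + 1e-9
```

For $f(x)=x^{2}$ the slopes are $x_{1}+x_{2} \leq x_{1}+x_{3} \leq x_{2}+x_{3}$, so the chain holds exactly.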

> [!justification]
> - A [[norm]] $\|\cdot\|$ on a [[vector space]] $V$ is a convex function. Example: the [[expectation]] function $\mathbb{E}$ in [[probability|probability]], being an $L^{1}$-[[Lp-norm|norm]], is convex (in particular, Jensen's Inequality $\varphi(\mathbb{E}X) \leq \mathbb{E}[\varphi(X)]$ holds for convex $\varphi$).
> - More generally, [[seminorm|seminorms]] are convex.
> - More generally still, [[sublinear functional|sublinear functionals]] are convex.
^justification
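Jensen's Inequality can be observed on the empirical distribution of a sample, where it holds exactly because the sample mean is a convex combination of the data points (a sketch; the distribution and the convex functions $\varphi$ are illustrative choices):

```python
import numpy as np

rng = np.random.default_rng(2)

# Empirical Jensen: for convex phi, phi(mean(X)) <= mean(phi(X)).
x = rng.exponential(scale=2.0, size=100_000)

for phi in (np.exp, np.square, np.abs):
    print(phi.__name__, phi(x.mean()) <= phi(x).mean())
```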
> [!intuition]
> Geometrically, **convexity** means that for any two points on the graph of the function, the line segment connecting them never dips below the graph.
> ![[CleanShot 2023-01-11 at [email protected]]]

> [!basicexample] Example of second-order conditions under which (strict) convexity holds.
> **Question.** Consider the function $f(\v x) = \frac{1}{2}\v x^{\top}A\v x + \v b^{\top}\v x + c,$ where $A\in \rr ^{d \times d}$ is a [[symmetric matrix]]. Derive the [[hessian|Hessian]] of $f$. Under what conditions on $A$ is $f$ [[convex function|convex]]? [[strictly convex function|Strictly convex]]?
>
> **Solution.**
> We first compute the [[derivative]] of $f$; by the [[derivative of a quadratic form]] this is $Df(\v x)= \frac{1}{2} \v x^{\top}(A+A^{\top})+\v b^{\top}$. Taking the [[gradient|gradient]] (i.e., the [[matrix transpose|transpose]]) and recalling that $A$ is a [[symmetric matrix]], so that $\frac{1}{2}(A+A^{\top})=A$, yields $\begin{align}\nabla f(\v x)= & \left( \frac{1}{2}\v x^{\top}(A+A^{\top})+\v b^{\top} \right)^{\top} \\
> = & \frac{1}{2}(A+A^{\top})\v x + \v b \\
> = & A\v x + \v b.
> \end{align}$
> Next we compute the [[hessian|Hessian]] of $f$ as follows: we have that $\nabla f(\v x)=\begin{bmatrix}b_{1}+\sum_{j=1}^{d} x_{j}A_{1j} \\
> \vdots \\
> b_{d} + \sum_{j=1}^{d} x_{j} A_{dj}
> \end{bmatrix}= \begin{bmatrix}\frac{ \partial f }{ \partial x_{1} } \\
> \vdots \\
> \frac{ \partial f }{ \partial x_{d} }
> \end{bmatrix},$
> and so it follows that $\begin{align}\nabla^{2}f(\v x) =
> & \begin{bmatrix}\frac{ \partial }{ \partial x_{1} }\left( b_{1}+\sum_{j=1}^{d}x_{j}A_{1j} \right) & \dots & \frac{ \partial }{ \partial x_{1} }\left(b_{d}+\sum_{j=1}^{d} x_{j} A_{dj}\right) \\
> \vdots & \ddots & \vdots \\
> \frac{ \partial }{ \partial x_{d} } \left( b_{1}+\sum_{j=1}^{d}x_{j}A_{1j} \right) & \dots & \frac{ \partial }{ \partial x_{d} } \left(b_{d}+\sum_{j=1}^{d} x_{j} A_{dj}\right)
> \end{bmatrix} \\
> = & A.
> \end{align}$
> By [[C2 function is convex iff its Hessian is everywhere PSD]] we see that $f$ is [[convex function|convex]] provided that $A$ is a [[positive semidefinite matrix]].
>
> By [[C2 function is strictly convex if its hessian is everywhere positive definite]] we see that $f$ is [[strictly convex function|strictly convex]] provided that $A$ is a [[positive definite matrix]].
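The closed-form gradient $\nabla f(\v x) = A\v x + \v b$ derived above can be cross-checked with central finite differences (a sketch; the dimension, seed, and the particular symmetric $A$, $\v b$, $c$ are arbitrary illustrative choices):

```python
import numpy as np

rng = np.random.default_rng(3)

d = 4
M = rng.normal(size=(d, d))
A = M + M.T                      # symmetric, but not necessarily PSD
b = rng.normal(size=d)
c = 1.5

f = lambda x: 0.5 * x @ A @ x + b @ x + c
x0 = rng.normal(size=d)

# Central differences are exact (up to rounding) for a quadratic,
# so this should match the closed form A @ x0 + b.
eps = 1e-6
grad_fd = np.array([
    (f(x0 + eps * e) - f(x0 - eps * e)) / (2 * eps)
    for e in np.eye(d)
])
print(np.allclose(grad_fd, A @ x0 + b, atol=1e-4))  # True

# Second-order condition: f is convex iff all eigenvalues of A are >= 0.
print(np.all(np.linalg.eigvalsh(A) >= 0))
```

Since the Hessian is the constant matrix $A$, the eigenvalue check settles convexity of $f$ globally.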
----
#### References
> [!backlink]
> ```dataview
> TABLE rows.file.link as "Further Reading"
> FROM [[]]
> FLATTEN file.tags as Tag
> WHERE Tag = "#definition" OR Tag = "#theorem" OR Tag = "#MOC" OR Tag = "#proposition" OR Tag = "#axiom"
> GROUP BY Tag
> ```
> [!frontlink]
> ```dataview
> TABLE rows.file.link as "Further Reading"
> FROM outgoing([[]])
> FLATTEN file.tags as Tag
> WHERE Tag = "#definition" OR Tag = "#theorem" OR Tag = "#MOC" OR Tag = "#proposition" OR Tag = "#axiom"
> GROUP BY Tag
> ```