Examples:: [[vector norm is convex]]
Nonexamples:: *[[Nonexamples]]*
Constructions:: *[[Constructions|Used in the construction of...]]*
Specializations:: *[[Specializations]]*
Generalizations:: *[[Generalizations]]*
Justifications and Intuition:: *[[Justifications and Intuition]]*
Properties:: [[local mins of convex functions are global mins]], [[characterization of extrema for differentiable convex functions]], [[convex image of Ax + b is a convex function]]
Sufficiencies:: *[[Sufficiencies]]*
Equivalences:: [[convex iff graph lies above all its tangents]], [[C2 function is convex iff its Hessian is everywhere PSD]]

----

- Let $X$ be a [[convex set|convex subset]] of a [[vector space]];
- Let $f: X \to \rr$.

> [!definition] Definition. ([[convex function]])
> We call $f$ **convex** if $f\big(t \v u + (1-t)\v v\big) \leq tf(\v u) + (1-t)f(\v v)$ for all $\v u, \v v \in X$ and $t \in [0,1]$.
> In other words, $f$ is **convex** if, along any segment from $\v u$ to $\v v$ in $X$, the values of $f$ are bounded above by the chord from $f(\v u)$ to $f(\v v)$:
> ![[CleanShot 2023-01-11 at [email protected]]]

> [!equivalence] Equivalence on $\mathbb{R}$. (The three-slope inequality)
> Assume $X=\mathbb{R}$. A function $f:\mathbb{R} \to \mathbb{R}$ is convex if and only if for all $x_{1}<x_{2}<x_{3}$
>
> $\underbrace{ \frac{f(x_{2})-f(x_{1})}{x_{2}-x_{1}} }_{ \text{1st slope} }\leq \underbrace{ \frac{f(x_{3})-f(x_{1})}{x_{3}-x_{1}} }_{ \text{2nd slope} } \leq \underbrace{ \frac{f(x_{3})-f(x_{2})}{x_{3}-x_{2}} }_{ \text{3rd slope} }.$
>
> > [!proof]- Proof.
> > We start by writing $x_{2}$ as a convex combination of $x_{1}$ and $x_{3}$. Solving $x_{2}= t x_{3} + (1-t)x_{1}$ for $t$ yields $t=\frac{x_{2}-x_{1}}{x_{3}-x_{1}} \in (0,1)$. By convexity, $f(x_{2}) \leq t f(x_{3}) + (1-t) f(x_{1})$,
> > implying $f(x_{2})-f(x_{1}) \leq tf(x_{3})-tf(x_{1})=\frac{f(x_{3})-f(x_{1})}{x_{3}-x_{1}}(x_{2}-x_{1})$. The first inequality follows.
> > To obtain the second inequality, we again write $x_{2}$ as a convex combination of $x_{1}$ and $x_{3}$ with the roles switched and follow the same procedure.
> >
> > The converse is similarly straightforward (omitted).

> [!justification]
> - A [[norm]] $\|\cdot\|$ on a [[vector space]] $V$ is a convex function. Example: the map $X \mapsto \mathbb{E}|X|$ in [[probability|probability]], being the $L^{1}$-[[Lp-norm|norm]], is convex; relatedly, Jensen's Inequality $\varphi(\mathbb{E}X) \leq \mathbb{E}[\varphi(X)]$ holds for convex $\varphi$.
> - More generally, [[seminorm|seminorms]] are convex.
> - More generally still, [[sublinear functional|sublinear functionals]] are convex. ^justification

> [!intuition]
> Geometrically, **convexity** means that for any two points on the graph of the function, the line segment connecting them never 'dips below' the graph.
> ![[CleanShot 2023-01-11 at [email protected]]]

> [!basicexample] Example of second-order conditions under which (strict) convexity holds.
> **Question.** Consider the function $f(\v x) = \frac{1}{2}\v x^{\top}A\v x + \v b^{\top}\v x + c,$ where $A\in \rr ^{d \times d}$ is a [[symmetric matrix]]. Derive the [[hessian|Hessian]] of $f$. Under what conditions on $A$ is $f$ [[convex function|convex]]? [[strictly convex function|Strictly convex]]?
>
> **Solution**. We first compute the [[derivative]] of $f$; by the [[derivative of a quadratic form]] this is $Df(\v x)= \frac{1}{2} \v x^{\top}(A+A^{\top})+\v b^{\top}$. Taking the [[gradient|gradient]] (i.e., the [[matrix transpose|transpose]]) of this yields (recall that $A$ is a [[symmetric matrix]])
> $\begin{align}\nabla f(\v x)= & \left( \frac{1}{2}\v x^{\top}(A+A^{\top})+\v b^{\top} \right)^{\top} \\ = & \frac{1}{2}(A+A^{\top})\v x + \v b \\ = & A\v x + \v b.
> \end{align}$
> Next we compute the [[hessian|Hessian]] of $f$ as follows: we have that
> $\nabla f(\v x)=\begin{bmatrix}b_{1}+\sum_{j=1}^{d} x_{j}A_{1j} \\ \vdots \\ b_{d} + \sum_{j=1}^{d} x_{j} A_{dj} \end{bmatrix}= \begin{bmatrix}\frac{ \partial f }{ \partial x_{1} } \\ \vdots \\ \frac{ \partial f }{ \partial x_{d} } \end{bmatrix},$
> and so it follows (using the symmetry of $A$) that
> $\begin{align}\nabla^{2}f(\v x) = & \begin{bmatrix}\frac{ \partial }{ \partial x_{1} }\left( b_{1}+\sum_{j=1}^{d}x_{j}A_{1j} \right) & \dots & \frac{ \partial }{ \partial x_{1} }\left(b_{d}+\sum_{j=1}^{d} x_{j} A_{dj}\right) \\ \vdots & \ddots & \vdots \\ \frac{ \partial }{ \partial x_{d} } \left( b_{1}+\sum_{j=1}^{d}x_{j}A_{1j} \right) & \dots & \frac{ \partial }{ \partial x_{d} } \left(b_{d}+\sum_{j=1}^{d} x_{j} A_{dj}\right) \end{bmatrix} \\ = & A. \end{align}$
> By [[C2 function is convex iff its Hessian is everywhere PSD]] we see that $f$ is [[convex function|convex]] provided that $A$ is a [[positive semidefinite matrix]].
> By [[C2 function is strictly convex if its hessian is everywhere positive definite]] we see that $f$ is [[strictly convex function|strictly convex]] provided that $A$ is a [[positive definite matrix]].

----

#### References

> [!backlink]
> ```dataview
> TABLE rows.file.link as "Further Reading"
> FROM [[]]
> FLATTEN file.tags as Tag
> WHERE Tag = "#definition" OR Tag = "#theorem" OR Tag = "#MOC" OR Tag = "#proposition" OR Tag = "#axiom"
> GROUP BY Tag
> ```

> [!frontlink]
> ```dataview
> TABLE rows.file.link as "Further Reading"
> FROM outgoing([[]])
> FLATTEN file.tags as Tag
> WHERE Tag = "#definition" OR Tag = "#theorem" OR Tag = "#MOC" OR Tag = "#proposition" OR Tag = "#axiom"
> GROUP BY Tag
> ```
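The second-order conditions from the quadratic example can be checked numerically: the Hessian of $f(\v x) = \frac{1}{2}\v x^{\top}A\v x + \v b^{\top}\v x + c$ is $A$ everywhere, so convexity reduces to an eigenvalue test on $A$. Below is a minimal sketch; the particular matrix `A`, vector `b`, and scalar `c` are made-up illustrative data, not from the note.

```python
import numpy as np

# Hypothetical example data: any symmetric A works here.
A = np.array([[2.0, 0.5],
              [0.5, 1.0]])   # symmetric (and, in this case, positive definite)
b = np.array([1.0, -1.0])
c = 3.0

def f(x):
    """The quadratic f(x) = (1/2) x^T A x + b^T x + c from the example."""
    return 0.5 * x @ A @ x + b @ x + c

# Second-order condition: f is convex iff A is PSD (all eigenvalues >= 0),
# and strictly convex iff A is positive definite (all eigenvalues > 0).
eigvals = np.linalg.eigvalsh(A)       # eigvalsh: eigenvalues of a symmetric matrix
is_convex = bool(np.all(eigvals >= 0))
is_strictly_convex = bool(np.all(eigvals > 0))
print("eigenvalues:", eigvals)
print("convex:", is_convex, "| strictly convex:", is_strictly_convex)

# First-principles spot check of the defining inequality
# f(t u + (1-t) v) <= t f(u) + (1-t) f(v) at random points
# (small tolerance to absorb floating-point error).
rng = np.random.default_rng(0)
for _ in range(1000):
    u, v, t = rng.standard_normal(2), rng.standard_normal(2), rng.uniform()
    assert f(t * u + (1 - t) * v) <= t * f(u) + (1 - t) * f(v) + 1e-9
```

Both checks agree, as they must: for this `A` the eigenvalue test certifies strict convexity, and the sampled chord inequality never fails.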