----
> [!theorem] Method. ([[maximum likelihood estimation]])
> **Maximum likelihood estimation** is a method for estimating the parameters of an assumed [[probability distribution]], given a [[training data|dataset]] of observations.
> \
> This is achieved by [[global extrema|maximizing]] a [[likelihood|likelihood function]] so that, under the assumed statistical model, the [[training data|observed data]] is most probable. For a concrete example, see [[maximum likelihood estimation for logistic regression]].
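In practice, when no closed form is available, the likelihood is maximized numerically. A minimal sketch (the observed values and the grid of candidate parameters are hypothetical, and the exponential log-likelihood anticipates the example below):

```python
import math

def log_likelihood(mu, data):
    # ln L(mu) = n ln(mu) - mu * sum(x) for the exponential density mu * exp(-mu * x)
    n = len(data)
    return n * math.log(mu) - mu * sum(data)

data = [0.2, 1.5, 0.7, 3.1, 0.9]  # hypothetical observations

# Crude numerical maximization: evaluate ln L on a grid of candidate rates
# and keep the best one. (Real code would use an optimizer instead.)
candidates = [k / 1000 for k in range(1, 10000)]
mu_hat = max(candidates, key=lambda mu: log_likelihood(mu, data))
print(mu_hat)  # close to n / sum(data)
```

The grid search stands in for any one-dimensional optimizer; the point is only that the estimate is the parameter making the observed data most probable.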
> [!basicexample]
Suppose we draw $n$ [[independent random variables|independent]] [[random variable|random]] real numbers in the range $[0,\infty)$ from the (properly normalized) [[exponential random variable|exponential probability density]] $P(x)=\mu e ^{-\mu x}$ with rate $\mu>0$, i.e., we have a [[random variable|random vector]] $\mathbf{X}$ whose components $X_{i} \sim \mu e^{-\mu x}$, $i \in [n]$, are [[independent random variables|independent]].
\
The [[probability distribution|probability density]] of drawing values $X_{1}=x_{1},\dots,X_{n}=x_{n}$ when $\mu$ is viewed as a parameter factorizes by independence: $\begin{align}
p(x_{1},\dots,x_{n};\mu) &= p(x_{1} ; \mu) \cdots p(x_{n} ; \mu) \\
&= \mu e ^{-\mu x_{1}} \cdots \mu e ^{-\mu x_{n}} \\
&= \mu^{n}e^{-\mu \sum_{i=1}^{n} x_{i}}.
\end{align}$
Hence the [[likelihood|likelihood function]] is $L(\mu)=\mu^{n}e^{-\mu \sum_{i=1}^{n}x_{i}}$, i.e., the joint density viewed as a function of $\mu$ for fixed data.
\
Now, we want to compute $\arg\max_{\mu} L(\mu)$. Since $\ln$ is strictly increasing, $\arg\max_{\mu} L(\mu)= \arg\max_{\mu} \ln L(\mu)$, and the logarithm is easier to differentiate.
\
$\ln L$ is [[smooth]]; hence we may apply [[C2 function is convex iff its Hessian is everywhere PSD]] (in its concave form: a non-positive second derivative implies concavity) to show that $\ln L$ is [[convex function|concave]]. Compute $\begin{align}
\frac{d}{d\mu} [\ln L] &= \frac{d}{d\mu} \left[ n \ln \mu -\mu \left( \sum_{i=1}^{n} x_{i}\right) \right] \\
&= \frac{n}{\mu}-\sum_{i=1}^{n} x_{i}, \ \ (1)
\end{align}$
and then $\frac{d^{2}}{d\mu^{2}}[\ln L]= \frac{d}{d\mu} \left[ \frac{n}{\mu}-\sum_{i=1}^{n}x_{i} \right]=-\frac{n}{\mu^{2}}.$
$\mu^{2}>0$ for all $\mu>0$, hence $-\frac{n}{\mu^{2}} < 0$ and we conclude that $\ln L$ is [[convex function|concave (down)]]. We thus know that [[characterization of extrema for differentiable convex functions|any critical point of ln L is a global maximizer]]. We compute a [[critical point]] of $\ln L$ by setting $(1)$ equal to $0$; this yields the maximum likelihood estimate $\hat{\mu}=\frac{n}{\sum_{i=1}^{n}x_{i}}$.
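The closed-form estimate can be checked by simulation. A minimal sketch (the true rate, sample size, and seed are arbitrary choices): draw exponential samples at a known rate and compare it with the estimate derived above.

```python
import random

random.seed(0)
mu_true = 2.0
n = 100_000

# Draw n independent exponential samples with rate mu_true.
data = [random.expovariate(mu_true) for _ in range(n)]

# Closed-form MLE derived above: mu_hat = n / sum(x_i)
mu_hat = n / sum(data)
print(mu_hat)  # should be close to mu_true for large n
```

For large $n$ the estimate concentrates around the true rate, consistent with the usual consistency property of maximum likelihood estimators.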
----
#### References
> [!backlink]
> ```dataview
> TABLE rows.file.link as "Further Reading"
> FROM [[]]
> FLATTEN file.tags
> GROUP BY file.tags as Tag
> ```
> [!frontlink]
> ```dataview
> TABLE rows.file.link as "Further Reading"
> FROM outgoing([[]])
> FLATTEN file.tags as Tag
> WHERE Tag = "#definition" OR Tag = "#theorem" OR Tag = "#MOC" OR Tag = "#proposition" OR Tag = "#axiom"
> GROUP BY Tag
> ```