----
> [!theorem] Method. ([[maximum likelihood estimation]])
> **Maximum likelihood estimation** is a method for estimating the parameters of an assumed [[probability distribution]], given a [[training data|dataset]] of observations.
> \
> This is achieved by [[global extrema|maximizing]] a [[likelihood|likelihood function]] so that, under the assumed statistical model, the [[training data|observed data]] is most probable. For a concrete example, see [[maximum likelihood estimation for logistic regression]].
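In practice, when no closed form is available, the likelihood is maximized numerically. A minimal sketch (the observed values and the grid of candidate parameters are hypothetical, and the exponential log-likelihood anticipates the example below):

```python
import math

def log_likelihood(mu, data):
    # ln L(mu) = n ln(mu) - mu * sum(x) for the exponential density mu * exp(-mu * x)
    n = len(data)
    return n * math.log(mu) - mu * sum(data)

data = [0.2, 1.5, 0.7, 3.1, 0.9]  # hypothetical observations

# Crude numerical maximization: evaluate ln L on a grid of candidate rates
# and keep the best one. (Real code would use an optimizer instead.)
candidates = [k / 1000 for k in range(1, 10000)]
mu_hat = max(candidates, key=lambda mu: log_likelihood(mu, data))
print(mu_hat)  # close to n / sum(data)
```

The grid search stands in for any one-dimensional optimizer; the point is only that the estimate is the parameter making the observed data most probable.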
> [!basicexample]
Suppose we draw $n$ [[independent random variables|independent]] [[random variable|random]] real numbers in the range $[0,\infty)$ from the (properly normalized) [[exponential random variable|exponential probability density]] $P(x)=\mu e ^{-\mu x}$ with rate $\mu>0$, i.e., we have a [[random variable|random vector]] $\mathbf{X}$ whose components $X_{i} \sim \mu e^{-\mu x}$, $i \in [n]$, are [[independent random variables|independent]].
\
The [[probability distribution|probability density]] of drawing values $X_{1}=x_{1},\dots,X_{n}=x_{n}$ when $\mu$ is viewed as a parameter factorizes by independence: $\begin{align}
p(x_{1},\dots,x_{n};\mu) &= p(x_{1} ; \mu) \cdots p(x_{n} ; \mu) \\
&= \mu e ^{-\mu x_{1}} \cdots \mu e ^{-\mu x_{n}} \\
&= \mu^{n}e^{-\mu \sum_{i=1}^{n} x_{i}}.
\end{align}$
Hence the [[likelihood|likelihood function]] is $L(\mu)=\mu^{n}e^{-\mu \sum_{i=1}^{n}x_{i}}$, i.e., the joint density viewed as a function of $\mu$ for fixed data.
\
Now, we want to compute $\arg\max_{\mu} L(\mu)$. Since $\ln$ is strictly increasing, $\arg\max_{\mu} L(\mu)= \arg\max_{\mu} \ln L(\mu)$, and the logarithm is easier to differentiate.
\
$\ln L$ is [[smooth]]; hence we may apply [[C2 function is convex iff its Hessian is everywhere PSD]] (in its concave form: a non-positive second derivative implies concavity) to show that $\ln L$ is [[convex function|concave]]. Compute $\begin{align}
\frac{d}{d\mu} [\ln L] &= \frac{d}{d\mu} \left[ n \ln \mu -\mu \left( \sum_{i=1}^{n} x_{i}\right) \right] \\
&= \frac{n}{\mu}-\sum_{i=1}^{n} x_{i}, \ \ (1)
\end{align}$
and then $\frac{d^{2}}{d\mu^{2}}[\ln L]= \frac{d}{d\mu} \left[ \frac{n}{\mu}-\sum_{i=1}^{n}x_{i} \right]=-\frac{n}{\mu^{2}}.$
$\mu^{2}>0$ for all $\mu>0$, hence $-\frac{n}{\mu^{2}} < 0$ and we conclude that $\ln L$ is [[convex function|concave (down)]]. We thus know that [[characterization of extrema for differentiable convex functions|any critical point of ln L is a global maximizer]]. We compute a [[critical point]] of $\ln L$ by setting $(1)$ equal to $0$; this yields the maximum likelihood estimate $\hat{\mu}=\frac{n}{\sum_{i=1}^{n}x_{i}}$.
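The closed-form estimate can be checked by simulation. A minimal sketch (the true rate, sample size, and seed are arbitrary choices): draw exponential samples at a known rate and compare it with the estimate derived above.

```python
import random

random.seed(0)
mu_true = 2.0
n = 100_000

# Draw n independent exponential samples with rate mu_true.
data = [random.expovariate(mu_true) for _ in range(n)]

# Closed-form MLE derived above: mu_hat = n / sum(x_i)
mu_hat = n / sum(data)
print(mu_hat)  # should be close to mu_true for large n
```

For large $n$ the estimate concentrates around the true rate, consistent with the usual consistency property of maximum likelihood estimators.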
----
#### References
> [!backlink]
> ```dataview
> TABLE rows.file.link as "Further Reading"
> FROM [[]]
> FLATTEN file.tags
> GROUP BY file.tags as Tag
> ```
> [!frontlink]
> ```dataview
> TABLE rows.file.link as "Further Reading"
> FROM outgoing([[]])
> FLATTEN file.tags as Tag
> WHERE Tag = "#definition" OR Tag = "#theorem" OR Tag = "#MOC" OR Tag = "#proposition" OR Tag = "#axiom"
> GROUP BY Tag
> ```