----
> [!definition] Definition. ([[variance]])
>
> Suppose $(\Omega, \mathcal{F}, \mathbb{P})$ is a [[probability|probability space]] [[Lp-norm|and]] $X \in \mathcal{L}^{2}(\mathbb{P})$ is a [[random variable]].
> - The **variance** of $X$ is $\operatorname{var}X:=\mathbb{E}[(X-\mathbb{E}X)^{2}].$
> - The **standard deviation** of $X$ is $\sigma(X)= \sqrt{ \operatorname{var }X }.$
> (Taking the square root is useful because doing so recovers the units of $X$ if it has any.)

> [!equivalence]
> One can compute[^2] $\sigma^{2}(X)=\operatorname{var }X=\mathbb{E}[X^{2}]-(\mathbb{E}X)^{2}.$
^equivalence
[^2]: Here is the computation: $\begin{align}
E\left[\big(X - E(X)\big)^2\right] &= E\big(X^2 - 2E(X)X + E^2(X)\big) \\ &= E(X^2) - 2E(X)E(X) + E^2(X) \\ &= E(X^2) - E^2(X). \end{align}$
> [!intuition]
>
> Variance measures how 'focused' a [[random variable]] $X$ is:
> - $\operatorname{var }X=0$ when $X$ is constant;
> - Small variance implies $X$ is near-constant ([[expectation|expected]] to not stray far from its mean);
> - Large variance implies $X$ is 'all over the place' ([[expectation|expected]] to deviate significantly from its mean).

> [!basicproperties]
> - $\operatorname{var }X \geq 0$
> - $\operatorname{var }X=0$ if and only if $X=\mathbb{E}X$ almost surely
> - *(Variance of sum)* We have[^1] $\text{var}(X_1+ \dots+ X_n) = \sum_{i=1}^n \text{var}(X_i) + \sum_{i \neq j} \text{cov} (X_i,X_j).$
>   If our [[random variable|random variables]] are pairwise [[independent random variables|independent]], then all the [[covariance|covariances]] vanish, leaving $\text{var}(X_1+ \dots+X_n) = \text{var}(X_1)+ \dots + \text{var}(X_n).$ (A numerical check of these identities follows the footnotes below.)
> - $\operatorname{var }(cX)=c^{2} \operatorname{var }(X)$ for $c \in \mathbb{R}$[^3]
> - The variance can also be read off the [[probability generating function]]; see the PGF section below
^properties
[^3]: Here is the computation: $\begin{align}
\text{var}(cX) &= E(c^2X^2) - E^2(cX)\\ &= c^2E(X^2) - cE(X) \cdot cE(X) \\ &= c^2E(X^2) - c^2E^2(X) \\ &= c^2 \big(E(X^2) - E^2(X)\big)\\ &= c^2 \text{var}(X). \end{align}$
[^1]: We treat $i,j$ and $j,i$ as distinct pairs: e.g., $\text{cov}(X_1, X_2)$ and $\text{cov}(X_2,X_1)$ are the same value, but they are counted as separate terms in the sum.
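Here is that numerical check: a minimal numpy sketch (not part of the original notes). Both the equivalence $\operatorname{var}X = E(X^2)-E^2(X)$ and the variance-of-sum identity hold exactly for empirical (population-style) moments, so each pair of printed values agrees up to floating-point error.
```python
import numpy as np

rng = np.random.default_rng(0)

# Two dependent samples, so that cov(X, Y) != 0.
x = rng.normal(loc=2.0, scale=3.0, size=1_000_000)
y = 0.5 * x + rng.normal(size=1_000_000)

# Equivalence: var X = E[X^2] - (E X)^2.
print(np.var(x), np.mean(x**2) - np.mean(x)**2)

# Variance of a sum: var(X + Y) = var X + var Y + 2 cov(X, Y).
# bias=True makes np.cov use the same normalization as np.var.
print(np.var(x + y),
      np.var(x) + np.var(y) + 2 * np.cov(x, y, bias=True)[0, 1])
```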
----
#analysis/probability-statistics
# Definition
Let $X: \Omega \to \mathbb{R}$ be a [[random variable]] whose [[expectation]] $E(X)$ exists. We define the **variance** of $X$ as $\text{var}(X) = E\left[\big(X - E(X)\big)^2\right].$
# Equivalence
We have $\text{var}(X) = E(X^2) - E^2(X)$.
# Variance of a Sum
We have that $\text{var}(X_1 + \dots + X_n) = \sum_{i=1}^n \text{var}(X_i) + \sum_{i \neq j} \text{cov} (X_i,X_j).$
Note that if our [[random variable]]s are pairwise [[independent random variables|independent]], then all the [[covariance]]s go away and we're left with $\text{var}(X_1 + \dots + X_n) = \text{var}(X_1) + \dots + \text{var}(X_n).$
## Remark
We treat $i,j$ and $j,i$ as distinct pairs: e.g., $\text{cov}(X_1, X_2)$ and $\text{cov}(X_2,X_1)$ are the same value, but they are counted as separate terms in the sum.
# Intuition
**Variance** measures how 'focused' a [[random variable]] is.
- Zero variance implies the RV is constant (almost surely)
- Small variance implies the RV is near-constant
- Large variance implies the RV is 'all over the place'
# Properties
- $\text{var}(X) \geq 0$.
- $\text{var}(X)=0 \ \iff$ with probability $1$, $X$ is a constant.
# Proof of Equivalence
$\begin{align}
E\left[\big(X - E(X)\big)^2\right] &= E\big(X^2 - 2E(X)X + E^2(X)\big) \\ &= E(X^2) - 2E(X)E(X) + E^2(X) \\ &= E(X^2) - E^2(X), \end{align}$
where the middle step uses linearity of [[expectation]] and the fact that $E(X)$ is a constant.
# Variance of a Scaled Random Variable
Let $X$ be a [[random variable]] whose **variance** exists. Let $c \in \mathbb{R}$. We have $\text{var}(cX) = c^2\text{var}(X).$
This is simple to derive: $\begin{align}
\text{var}(cX) &= E(c^2X^2) - E^2(cX)\\ &= c^2E(X^2) - cE(X) \cdot cE(X) \\ &= c^2E(X^2) - c^2E^2(X) \\ &= c^2 \big(E(X^2) - E^2(X)\big)\\ &= c^2 \text{var}(X). \end{align}$
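As a quick check (numpy sketch, not in the original notes), the scaling rule holds exactly for empirical moments, so the two printed values agree up to floating-point error, even for negative $c$:
```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.exponential(scale=2.0, size=1_000_000)
c = -3.0

# var(cX) = c^2 var(X), even for negative c.
print(np.var(c * x), c**2 * np.var(x))
```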
# Example
## 1 - Dice
![[CleanShot 2022-11-05 at 21.34.15.jpg]]
## 2 - Waiting Time
Let $X$ be an [[exponential random variable]] with [[density]] $f(x) = \lambda e^{-\lambda x}$ for $x>0$ and $0$ otherwise.
$\text{var}(X) = E(X^2) - E^2(X);$
$E(X^2) = \int_{0}^\infty x^2 \lambda e^{-\lambda x} \ dx;$
$E(X)= \int_{0}^\infty x \lambda e^{-\lambda x} \ dx.$
[[integral|Integrating]] by parts gives $E(X) = \frac{1}{\lambda}$ and $E(X^2) = \frac{2}{\lambda^2}$, so $\text{var}(X) = \frac{2}{\lambda^2} - \frac{1}{\lambda^2} = \frac{1}{\lambda^2}.$
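These integrals can also be done symbolically; here is a sympy sketch (not in the original notes) reproducing $E(X)=1/\lambda$, $E(X^2)=2/\lambda^2$, and hence $\text{var}(X)=1/\lambda^2$:
```python
import sympy as sp

x, lam = sp.symbols("x lambda", positive=True)
f = lam * sp.exp(-lam * x)  # exponential density on (0, oo)

EX = sp.integrate(x * f, (x, 0, sp.oo))       # 1/lambda
EX2 = sp.integrate(x**2 * f, (x, 0, sp.oo))   # 2/lambda**2

print(sp.simplify(EX2 - EX**2))  # lambda**(-2)
```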
## 3 - Coin Tossing (Example of Variance of a Sum)
Suppose we toss a fair coin $n$ times with $n$ 'reasonably large' (e.g., $n>5$). Let $X$ be the number of switches from $H$ to $T$. **What's $\text{var}(X)$?**
Let's use [[indicator random variable]] decomposition. For $1 \leq i \leq n-1$, let $X_i := \begin{cases} 1 &\text{ if the $i^{th}$ toss is H and the $(i+1)^{th}$ toss is T} \\ 0 & \text{otherwise}. \\
\end{cases} \ \ $
Then, $X = X_1 + \dots + X_{n-1},$
and $\text{var}(X) = \sum_{i=1}^{n-1} \text{var}(X_i) + \sum_{i \neq j} \text{cov}(X_i, X_j).$
$\text{var}(X_i) = E(X_i^2) - E^2(X_i) = E(X_i) - E^2(X_i)$ since $X_i^2 = X_i$. Now $P(X_i = 1)=\frac{1}{4}$, so $E(X_i) = \frac{1}{4}$ and $\text{var}(X_i) = 1/4 - 1/16 = 3/16$.
The interesting part: if $|i-j| > 1$, then $\text{cov}(X_i, X_j)=0$ by [[independent random variables|independence]], since the pairs of tosses don't overlap. So we need only consider $\text{cov}(X_i, X_{i+1}) = E(X_i X_{i+1}) - E(X_i)E(X_{i+1})$.
$X_i X_{i+1}=0$ always: $X_i=1$ requires the $(i+1)^{th}$ toss to be $T$, while $X_{i+1}=1$ requires it to be $H$.
Hence $\text{cov}(X_i, X_{i+1}) = 0 - \frac{1}{4} \cdot \frac{1}{4} = -\frac{1}{16}$. There are $2(n-2)$ such ordered pairs, so $\text{var}(X) = (n-1)\left(\frac{3}{16}\right) + 2(n-2)\left(-\frac{1}{16}\right) = \frac{3(n-1)-2(n-2)}{16} = \frac{n+1}{16}.$
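A Monte Carlo sanity check (numpy sketch, not in the original notes): simulate many runs of $n$ tosses, count the H-to-T switches in each, and compare the sample variance against $(n+1)/16$:
```python
import numpy as np

rng = np.random.default_rng(2)
n, trials = 20, 200_000

tosses = rng.integers(0, 2, size=(trials, n))  # 1 = H, 0 = T
# X_i = 1 iff toss i is H and toss i+1 is T; X is the row sum.
switches = ((tosses[:, :-1] == 1) & (tosses[:, 1:] == 0)).sum(axis=1)

print(switches.var())  # sample variance, close to...
print((n + 1) / 16)    # ...the exact answer, 1.3125 for n = 20
```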
## 4 - [[the problem of letters#With Variance]]
# Variance and [[Probability Generating Function]]
Let $X$ take values in $\{0, 1, 2, \dots\}$ with PGF $G_X(s) = \sum_{k=0}^\infty P(X=k)s^k$. Differentiating twice, $G_X''(s) = \sum_{k=2}^\infty k(k-1)P(X=k)s^{k-2},$ so $G_X''(1) = \sum_{k=2}^\infty (k^2-k)P(X=k) = E(X^2) - E(X).$
Since $E(X) = G_X'(1)$, this tells us $\text{var}(X) = E(X^2) - E^2(X) = G_X''(1) + G_X'(1) - \big(G_X'(1)\big)^2.$
![[CleanShot 2022-11-11 at 20.27.44.jpg]]
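To see the formula in action on a distribution with a known variance (a sympy sketch, not in the original notes): for $X \sim \text{Binomial}(n,p)$ the PGF is $G_X(s) = (1-p+ps)^n$, and $G_X''(1) + G_X'(1) - \big(G_X'(1)\big)^2$ recovers $np(1-p)$:
```python
import sympy as sp

s, p, n = sp.symbols("s p n", positive=True)

G = (1 - p + p * s) ** n           # PGF of Binomial(n, p)
G1 = sp.diff(G, s).subs(s, 1)      # G'(1)  = E(X) = n*p
G2 = sp.diff(G, s, 2).subs(s, 1)   # G''(1) = E(X^2) - E(X)

print(sp.simplify(G2 + G1 - G1**2))  # n*p*(1 - p), i.e. npq
```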
## Example
Toss a fair coin until we obtain two heads in a row, $HH$. In homework we computed the [[probability generating function|PGF]] here to be $G_X(s) = \frac{s^2}{4-2s-s^2} = s^2(4-2s-s^2)^{-1}.$
[[derivative|Differentiating]] twice is painful, but it gives us everything we need! ![[CleanShot 2022-11-11 at 20.30.44.jpg]]
Alternatively, we could do it by [[conditional expectation|conditioning]] on the first toss. First, get $E(X)$: ![[CleanShot 2022-11-11 at 20.31.51.jpg]]
By [[total expectation]], $E(X) = 6$.
![[CleanShot 2022-11-11 at 20.40.15.jpg]]
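The PGF route can be finished symbolically as well (a sympy sketch, not in the original notes): evaluating the derivatives of $G_X(s) = \frac{s^2}{4-2s-s^2}$ at $s=1$ recovers $E(X) = 6$, matching the conditioning argument, and gives $\text{var}(X) = G_X''(1) + G_X'(1) - \big(G_X'(1)\big)^2 = 52 + 6 - 36 = 22$:
```python
import sympy as sp

s = sp.symbols("s")
G = s**2 / (4 - 2*s - s**2)        # PGF of the waiting time for HH

G1 = sp.diff(G, s).subs(s, 1)      # G'(1)  = E(X)
G2 = sp.diff(G, s, 2).subs(s, 1)   # G''(1) = E(X^2) - E(X)

print(G1)               # 6
print(G2 + G1 - G1**2)  # 22
```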
#notFormatted
----
#### References
> [!backlink]
> ```dataview
> TABLE rows.file.link as "Further Reading"
> FROM [[]]
> FLATTEN file.tags as Tag
> WHERE Tag = "#definition" OR Tag = "#theorem" OR Tag = "#MOC" OR Tag = "#proposition" OR Tag = "#axiom"
> GROUP BY Tag
> ```
> [!frontlink]
> ```dataview
> TABLE rows.file.link as "Further Reading"
> FROM outgoing([[]])
> FLATTEN file.tags as Tag
> WHERE Tag = "#definition" OR Tag = "#theorem" OR Tag = "#MOC" OR Tag = "#proposition" OR Tag = "#axiom"
> GROUP BY Tag
> ```