----
> [!definition] Definition. ([[variance]])
>
> Suppose $(\Omega, \mathcal{F}, \mathbb{P})$ is a [[probability|probability space]] [[Lp-norm|and]] $X \in \mathcal{L}^{2}(\mathbb{P})$ is a [[random variable]].
> - The **variance** of $X$ is $\operatorname{var}X:=\mathbb{E}[(X-\mathbb{E}X)^{2}].$
> - The **standard deviation** of $X$ is $\sigma(X)= \sqrt{ \operatorname{var }X }.$
> (Taking the square root is useful because doing so recovers the units of $X$ if it has any.)

> [!equivalence]
> One can compute[^2] $\sigma^{2}(X)=\operatorname{var }X=\mathbb{E}[X^{2}]-(\mathbb{E}X)^{2}.$
^equivalence
[^2]: Here is the computation: $\begin{align}
E\left[\big(X - E(X)\big)^2\right] &= E\big(X^2 - 2E(X)X + E^2(X)\big) \\ &= E(X^2) - 2E(X)E(X) + E^2(X) \\ &= E(X^2) - E^2(X). \end{align}$
> [!intuition]
>
> Variance measures how 'focused' a [[random variable]] $X$ is:
> - $\operatorname{var }X=0$ when $X$ is constant;
> - Small variance implies $X$ is near-constant ([[expectation|expected]] to not stray far from its mean);
> - Large variance implies $X$ is 'all over the place' ([[expectation|expected]] to deviate significantly from its mean).

> [!basicproperties]
> - $\operatorname{var }X \geq 0$
> - $\operatorname{var }X=0$ if and only if $X=\mathbb{E}X$ almost surely
> - *(Variance of sum)* We have[^1] $\text{var}(X_1+ \dots+ X_n) = \sum_{i=1}^n \text{var}(X_i) + \sum_{i \neq j} \text{cov} (X_i,X_j).$
>   If our [[random variable|random variables]] are pairwise [[independent random variables|independent]], then all the [[covariance|covariances]] vanish, leaving $\text{var}(X_1+ \dots+X_n) = \text{var}(X_1)+ \dots + \text{var}(X_n).$ (A numerical check of these identities follows the footnotes below.)
> - $\operatorname{var }(cX)=c^{2} \operatorname{var }(X)$ for $c \in \mathbb{R}$[^3]
> - The variance can also be read off the [[probability generating function]]; see the PGF section below
^properties
[^3]: Here is the computation: $\begin{align}
\text{var}(cX) &= E(c^2X^2) - E^2(cX)\\ &= c^2E(X^2) - cE(X) \cdot cE(X) \\ &= c^2E(X^2) - c^2E^2(X) \\ &= c^2 \big(E(X^2) - E^2(X)\big)\\ &= c^2 \text{var}(X). \end{align}$
[^1]: We treat $i,j$ and $j,i$ as distinct pairs: e.g., $\text{cov}(X_1, X_2)$ and $\text{cov}(X_2,X_1)$ are the same value, but they are counted as separate terms in the sum.
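Here is that numerical check: a minimal numpy sketch (not part of the original notes). Both the equivalence $\operatorname{var}X = E(X^2)-E^2(X)$ and the variance-of-sum identity hold exactly for empirical (population-style) moments, so each pair of printed values agrees up to floating-point error.
```python
import numpy as np

rng = np.random.default_rng(0)

# Two dependent samples, so that cov(X, Y) != 0.
x = rng.normal(loc=2.0, scale=3.0, size=1_000_000)
y = 0.5 * x + rng.normal(size=1_000_000)

# Equivalence: var X = E[X^2] - (E X)^2.
print(np.var(x), np.mean(x**2) - np.mean(x)**2)

# Variance of a sum: var(X + Y) = var X + var Y + 2 cov(X, Y).
# bias=True makes np.cov use the same normalization as np.var.
print(np.var(x + y),
      np.var(x) + np.var(y) + 2 * np.cov(x, y, bias=True)[0, 1])
```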
----
#analysis/probability-statistics
# Definition
Let $X: \Omega \to \mathbb{R}$ be a [[random variable]] whose [[expectation]] $E(X)$ exists. We define the **variance** of $X$ as $\text{var}(X) = E\left[\big(X - E(X)\big)^2\right].$
# Equivalence
We have $\text{var}(X) = E(X^2) - E^2(X)$.
# Variance of a Sum
We have that $\text{var}(X_1 + \dots + X_n) = \sum_{i=1}^n \text{var}(X_i) + \sum_{i \neq j} \text{cov} (X_i,X_j).$
Note that if our [[random variable]]s are pairwise [[independent random variables|independent]], then all the [[covariance]]s go away and we're left with $\text{var}(X_1 + \dots + X_n) = \text{var}(X_1) + \dots + \text{var}(X_n).$
## Remark
We treat $i,j$ and $j,i$ as distinct pairs: e.g., $\text{cov}(X_1, X_2)$ and $\text{cov}(X_2,X_1)$ are the same value, but they are counted as separate terms in the sum.
# Intuition
**Variance** measures how 'focused' a [[random variable]] is.
- Zero variance implies the RV is constant (almost surely)
- Small variance implies the RV is near-constant
- Large variance implies the RV is 'all over the place'
# Properties
- $\text{var}(X) \geq 0$.
- $\text{var}(X)=0 \ \iff$ with probability $1$, $X$ is a constant.
# Proof of Equivalence
$\begin{align}
E\left[\big(X - E(X)\big)^2\right] &= E\big(X^2 - 2E(X)X + E^2(X)\big) \\ &= E(X^2) - 2E(X)E(X) + E^2(X) \\ &= E(X^2) - E^2(X), \end{align}$
where the middle step uses linearity of [[expectation]] and the fact that $E(X)$ is a constant.
# Variance of a Scaled Random Variable
Let $X$ be a [[random variable]] whose **variance** exists. Let $c \in \mathbb{R}$. We have $\text{var}(cX) = c^2\text{var}(X).$
This is simple to derive: $\begin{align}
\text{var}(cX) &= E(c^2X^2) - E^2(cX)\\ &= c^2E(X^2) - cE(X) \cdot cE(X) \\ &= c^2E(X^2) - c^2E^2(X) \\ &= c^2 \big(E(X^2) - E^2(X)\big)\\ &= c^2 \text{var}(X). \end{align}$
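As a quick check (numpy sketch, not in the original notes), the scaling rule holds exactly for empirical moments, so the two printed values agree up to floating-point error, even for negative $c$:
```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.exponential(scale=2.0, size=1_000_000)
c = -3.0

# var(cX) = c^2 var(X), even for negative c.
print(np.var(c * x), c**2 * np.var(x))
```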
# Example
## 1 - Dice
![[CleanShot 2022-11-05 at 21.34.15.jpg]]
## 2 - Waiting Time
Let $X$ be an [[exponential random variable]] with [[density]] $f(x) = \lambda e^{-\lambda x}$ for $x>0$ and $0$ otherwise.
$\text{var}(X) = E(X^2) - E^2(X);$
$E(X^2) = \int_{0}^\infty x^2 \lambda e^{-\lambda x} \ dx;$
$E(X)= \int_{0}^\infty x \lambda e^{-\lambda x} \ dx.$
[[integral|Integrating]] by parts gives $E(X) = \frac{1}{\lambda}$ and $E(X^2) = \frac{2}{\lambda^2}$, so $\text{var}(X) = \frac{2}{\lambda^2} - \frac{1}{\lambda^2} = \frac{1}{\lambda^2}.$
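These integrals can also be done symbolically; here is a sympy sketch (not in the original notes) reproducing $E(X)=1/\lambda$, $E(X^2)=2/\lambda^2$, and hence $\text{var}(X)=1/\lambda^2$:
```python
import sympy as sp

x, lam = sp.symbols("x lambda", positive=True)
f = lam * sp.exp(-lam * x)  # exponential density on (0, oo)

EX = sp.integrate(x * f, (x, 0, sp.oo))       # 1/lambda
EX2 = sp.integrate(x**2 * f, (x, 0, sp.oo))   # 2/lambda**2

print(sp.simplify(EX2 - EX**2))  # lambda**(-2)
```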
## 3 - Coin Tossing (Example of Variance of a Sum)
Suppose we toss a fair coin $n$ times with $n$ 'reasonably large' (e.g., $n>5$). Let $X$ be the number of switches from $H$ to $T$. **What's $\text{var}(X)$?**
Let's use [[indicator random variable]] decomposition. For $1 \leq i \leq n-1$, let $X_i := \begin{cases} 1 &\text{ if the $i^{th}$ toss is H and the $(i+1)^{th}$ toss is T} \\ 0 & \text{otherwise}. \\
\end{cases} \ \ $
Then, $X = X_1 + \dots + X_{n-1},$
and $\text{var}(X) = \sum_{i=1}^{n-1} \text{var}(X_i) + \sum_{i \neq j} \text{cov}(X_i, X_j).$
$\text{var}(X_i) = E(X_i^2) - E^2(X_i) = E(X_i) - E^2(X_i)$ since $X_i^2 = X_i$. Now $P(X_i = 1)=\frac{1}{4}$, so $E(X_i) = \frac{1}{4}$ and $\text{var}(X_i) = 1/4 - 1/16 = 3/16$.
The interesting part: if $|i-j| > 1$, then $\text{cov}(X_i, X_j)=0$ by [[independent random variables|independence]], since the pairs of tosses don't overlap. So we need only consider $\text{cov}(X_i, X_{i+1}) = E(X_i X_{i+1}) - E(X_i)E(X_{i+1})$.
$X_i X_{i+1}=0$ always: $X_i=1$ requires the $(i+1)^{th}$ toss to be $T$, while $X_{i+1}=1$ requires it to be $H$.
Hence $\text{cov}(X_i, X_{i+1}) = 0 - \frac{1}{4} \cdot \frac{1}{4} = -\frac{1}{16}$. There are $2(n-2)$ such ordered pairs, so $\text{var}(X) = (n-1)\left(\frac{3}{16}\right) + 2(n-2)\left(-\frac{1}{16}\right) = \frac{3(n-1)-2(n-2)}{16} = \frac{n+1}{16}.$
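A Monte Carlo sanity check (numpy sketch, not in the original notes): simulate many runs of $n$ tosses, count the H-to-T switches in each, and compare the sample variance against $(n+1)/16$:
```python
import numpy as np

rng = np.random.default_rng(2)
n, trials = 20, 200_000

tosses = rng.integers(0, 2, size=(trials, n))  # 1 = H, 0 = T
# X_i = 1 iff toss i is H and toss i+1 is T; X is the row sum.
switches = ((tosses[:, :-1] == 1) & (tosses[:, 1:] == 0)).sum(axis=1)

print(switches.var())  # sample variance, close to...
print((n + 1) / 16)    # ...the exact answer, 1.3125 for n = 20
```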
## 4 - [[the problem of letters#With Variance]]
# Variance and [[Probability Generating Function]]
Let $X$ take values in $\{0, 1, 2, \dots\}$ with PGF $G_X(s) = \sum_{k=0}^\infty P(X=k)s^k$. Differentiating twice, $G_X''(s) = \sum_{k=2}^\infty k(k-1)P(X=k)s^{k-2},$ so $G_X''(1) = \sum_{k=2}^\infty (k^2-k)P(X=k) = E(X^2) - E(X).$
Since $E(X) = G_X'(1)$, this tells us $\text{var}(X) = E(X^2) - E^2(X) = G_X''(1) + G_X'(1) - \big(G_X'(1)\big)^2.$
![[CleanShot 2022-11-11 at 20.27.44.jpg]]
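To see the formula in action on a distribution with a known variance (a sympy sketch, not in the original notes): for $X \sim \text{Binomial}(n,p)$ the PGF is $G_X(s) = (1-p+ps)^n$, and $G_X''(1) + G_X'(1) - \big(G_X'(1)\big)^2$ recovers $np(1-p)$:
```python
import sympy as sp

s, p, n = sp.symbols("s p n", positive=True)

G = (1 - p + p * s) ** n           # PGF of Binomial(n, p)
G1 = sp.diff(G, s).subs(s, 1)      # G'(1)  = E(X) = n*p
G2 = sp.diff(G, s, 2).subs(s, 1)   # G''(1) = E(X^2) - E(X)

print(sp.simplify(G2 + G1 - G1**2))  # n*p*(1 - p), i.e. npq
```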
## Example
Toss a fair coin until we obtain two heads in a row, $HH$. In homework we computed the [[probability generating function|PGF]] here to be $G_X(s) = \frac{s^2}{4-2s-s^2} = s^2(4-2s-s^2)^{-1}.$
[[derivative|Differentiating]] twice is painful, but it gives us everything we need! ![[CleanShot 2022-11-11 at 20.30.44.jpg]]
Alternatively, we could do it by [[conditional expectation|conditioning]] on the first toss. First, get $E(X)$: ![[CleanShot 2022-11-11 at 20.31.51.jpg]]
By [[total expectation]], $E(X) = 6$.
![[CleanShot 2022-11-11 at 20.40.15.jpg]]
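The PGF route can be finished symbolically as well (a sympy sketch, not in the original notes): evaluating the derivatives of $G_X(s) = \frac{s^2}{4-2s-s^2}$ at $s=1$ recovers $E(X) = 6$, matching the conditioning argument, and gives $\text{var}(X) = G_X''(1) + G_X'(1) - \big(G_X'(1)\big)^2 = 52 + 6 - 36 = 22$:
```python
import sympy as sp

s = sp.symbols("s")
G = s**2 / (4 - 2*s - s**2)        # PGF of the waiting time for HH

G1 = sp.diff(G, s).subs(s, 1)      # G'(1)  = E(X)
G2 = sp.diff(G, s, 2).subs(s, 1)   # G''(1) = E(X^2) - E(X)

print(G1)               # 6
print(G2 + G1 - G1**2)  # 22
```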
#notFormatted
----
#### References
> [!backlink]
> ```dataview
> TABLE rows.file.link as "Further Reading"
> FROM [[]]
> FLATTEN file.tags as Tag
> WHERE Tag = "#definition" OR Tag = "#theorem" OR Tag = "#MOC" OR Tag = "#proposition" OR Tag = "#axiom"
> GROUP BY Tag
> ```
> [!frontlink]
> ```dataview
> TABLE rows.file.link as "Further Reading"
> FROM outgoing([[]])
> FLATTEN file.tags as Tag
> WHERE Tag = "#definition" OR Tag = "#theorem" OR Tag = "#MOC" OR Tag = "#proposition" OR Tag = "#axiom"
> GROUP BY Tag
> ```