----
[^8]: What are the [[random variable|random variables]] $X_{1},X_{2}$ inducing such a law $\mathbb{P}_{(X_{1},X_{2})}=\mu \ltimes k$? They are the coordinate projections $X_{1}=\pi_{1}$, $X_{2}=\pi_{2}$ on the [[probability|probability space]] $(E\times E, \mathcal{E} \otimes \mathcal{E}, \mathbb{P})$, where $\mathbb{P}:=\mu \ltimes k$. Indeed, under this definition $\mathbb{P}_{(X_{1}, X_{2})}(A \times B)=\mathbb{P}(X _{1} \in A, X_{2} \in B)=\mathbb{P}(A \times B)=(\mu \ltimes k)(A \times B).$
> [!definition] Definition. ([[joint probability distribution]])
> Let $X_{i}:(\Omega, \mathcal{F}) \to (E_{i}, \mathcal{E}_{i})$, $i=1,\dots,n$, be [[random variable|random variables]] on a [[probability|probability space]] $(\Omega, \mathcal{F}, \mathbb{P})$. Their **random vector** is the map $\boldsymbol X:=(X_{1}, \dots , X_{n}) : (\Omega, \mathcal{F}) \to \left( \prod_{i=1}^{n}E_{i}, \bigotimes_{i=1}^{n} \mathcal{E}_{i} \right),$
> which is [[measurable function|measurable]] since each $X_{i}$ is. The [[probability distribution|distribution]] $\mathbb{P}_{\boldsymbol X}$ of $\boldsymbol X$ is called the **joint distribution** or **joint law** of $X_{1},\dots,X_{n}$.
>
> We call the individual distribution $\mathbb{P}_{X_{i}}=\mathbb{P}_{\boldsymbol X} \circ \pi_{i} ^{-1}$ the **$i$th marginal distribution** of $\boldsymbol X$.
>
> The canonical example of a joint law is the [[transition kernel|semidirect product]] $\mu \ltimes k$ of a [[probability|probability measure]] $\mu$ (a 'marginal') and a [[transition kernel|probability kernel]] $k$ (a 'conditional').[^8] Indeed, for *[[good measurable space|good]]* $E$ every joint law factorizes into such a form. This factorization is not unique: one can condition on either coordinate. To elaborate, suppose that $(E, \mathcal{E})$ is a [[good measurable space]], so that the [[joint probability distribution|joint law]] $\mathbb{P}_{(X_{1}, X_{2})}$ of [[random variable|random variables]] $X_{1},X_{2}:\Omega \to E$ [[transition kernel|factors]] as $\mathbb{P}_{(X_{1},X_{2})}=\mathbb{P}_{{2}} \ltimes k_{1 |2}$ and as $\mathbb{P}_{(X_{1},X_{2})}=\mathbb{P}_{{1}} \ltimes k_{2 | 1}$ for some [[transition kernel|probability kernels]] $k_{1|2}, k_{2|1}: E \times \mathcal{E} \to [0,1]$. Then for all $A,B \in \mathcal{E}$, $\mathbb{P}_{(X_{1}, X_{2})}(A \times B)= \int _{A} k_{2 |1}(x, B)\, d\mathbb{P}_{1}(x)$ and
> $\mathbb{P}_{(X_{1}, X_{2})}(A \times B)= \int _{B} k_{1 | 2}(y, A)\, d\mathbb{P}_{2}(y). $
> Note that in this case the marginals may be realized by taking the [[integral]] over all of $E$, i.e.[^3] $\mathbb{P}_{1}(A)= \int k_{1 |2}(y ,A) \, d\mathbb{P}_{2}(y)$, so that $\mathbb{P}_{1}=\big(* \xrightarrow{\mathbb{P}_{2}}(E, \mathcal{E}) \xrightarrow{k_{1 |2}}(E, \mathcal{E})\big) \text{ in } \mathsf{Stoch},$
> and $\mathbb{P}_{2}(B)= \int k_{2 |1} (x, B) \, d\mathbb{P}_{1} (x)$, so that $\mathbb{P}_{2}=\big( * \xrightarrow{\mathbb{P}_{1}}(E, \mathcal{E}) \xrightarrow{k_{2 |1}}(E, \mathcal{E})\big) \text{ in } \mathsf{Stoch}.$
> This recovers the usual "average/sum away the variable" interpretation of marginals.
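A minimal discrete sketch of these formulas, with made-up numbers: build a joint law on $E=\{0,1\}$ as $\mathbb{P}_{1} \ltimes k_{2|1}$, then recover both marginals by summing away the other variable.

```python
# Discrete sketch on E = {0, 1}: build a joint law as P_1 ⋉ k_{2|1} and
# recover the marginals by summing. All numbers are made up for illustration.
P1 = [0.25, 0.75]                  # marginal law of X1
k21 = [[0.5, 0.5],                 # k_{2|1}(0, ·): law of X2 given X1 = 0
       [0.25, 0.75]]               # k_{2|1}(1, ·): law of X2 given X1 = 1

# Semidirect product: (P_1 ⋉ k_{2|1})({x} × {y}) = P_1({x}) · k_{2|1}(x, {y})
joint = [[P1[x] * k21[x][y] for y in range(2)] for x in range(2)]

# "Sum away the other variable" recovers the marginals:
P1_rec = [sum(joint[x]) for x in range(2)]                   # recovers P1
P2 = [sum(joint[x][y] for x in range(2)) for y in range(2)]  # second marginal

print(joint)   # [[0.125, 0.125], [0.1875, 0.5625]]
print(P1_rec)  # [0.25, 0.75]
print(P2)      # [0.3125, 0.6875]
```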
[^3]: Indeed, plug in $E$ for $B$ resp. $A$, [[pushforward measure|e.g.]] $\mathbb{P}_{1}(A)=\mathbb{P}_{(X_{1}, X_{2})}\big( \pi_{1} ^{-1} (A)\big)=\mathbb{P}_{(X_{1},X_{2})}(A \times E)=\int k_{1 | 2}(y, A) \, d\mathbb{P}_{2}(y) .$
Then (either here or in [[transition kernel|probability kernel]] or in [[conditional probability]]) we want to show $k_{2 | 1}(X_{1}, B)=\mathbb{P}\big(X_{2}\in B \mid \sigma(X_{1})\big)$ [[almost-everywhere|a.s.]] for each $B \in \mathcal{E}$, and similarly for $k_{1 |2}$. Maybe not too difficult.
Slogan:
$\text{joint law}=\text{(marginal)} \ltimes (\text{kernel/conditional law})$
[[sampling from a probability distribution|Sample by]] "first sample from marginal, then from conditional".
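This two-stage (ancestral) sampling recipe can be sketched in a few lines, reusing a hypothetical discrete marginal and kernel on $E=\{0,1\}$:

```python
import random

# Draw (X1, X2) ~ P_1 ⋉ k_{2|1} by sampling the marginal first, then the
# conditional. The marginal and kernel below are illustrative only.
P1 = [0.25, 0.75]                   # marginal law of X1 on E = {0, 1}
k21 = [[0.5, 0.5], [0.25, 0.75]]    # k_{2|1}(x, ·): law of X2 given X1 = x

def sample_joint(rng=random):
    x1 = rng.choices([0, 1], weights=P1)[0]       # first: sample the marginal
    x2 = rng.choices([0, 1], weights=k21[x1])[0]  # then: sample the conditional
    return x1, x2

# Empirical frequencies approximate the joint law, e.g.
# P((X1, X2) = (1, 1)) = 0.75 * 0.75 = 0.5625.
draws = [sample_joint() for _ in range(10_000)]
freq_11 = sum(1 for d in draws if d == (1, 1)) / len(draws)
```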
> [!definition] Definition. ([[joint probability distribution]] -- very old)
>
> Loosely speaking, the definition of the **joint probability distribution** of [[random variable]]s $X_{1}, \dots, X_{n}$ is ambiguous. It usually refers to the notion of "the probability that $X_{1},\dots,X_{n}$ take on the values $x_{1},\dots,x_{n}$", often written as $P(X_{1}=x_{1} \cap \dots \cap X_{n}=x_{n}).$
> This definition will probably be refined in the future. Also see [[joint cumulative distribution function]].
> [!basicexample]
>
> For an example in [[classifier|classification problems]], see [[marginal and conditional distributions characterize the joint distribution in classification settings]].
----
#### References
> [!backlink]
> ```dataview
> TABLE rows.file.link as "Further Reading"
> FROM [[]]
> FLATTEN file.tags as Tag
> WHERE Tag = "#definition" OR Tag = "#theorem" OR Tag = "#MOC" OR Tag = "#proposition" OR Tag = "#axiom"
> GROUP BY Tag
> ```
> [!frontlink]
> ```dataview
> TABLE rows.file.link as "Further Reading"
> FROM outgoing([[]])
> FLATTEN file.tags as Tag
> WHERE Tag = "#definition" OR Tag = "#theorem" OR Tag = "#MOC" OR Tag = "#proposition" OR Tag = "#axiom"
> GROUP BY Tag
> ```