----
> [!definition] Definition. ([[Erdos-Renyi random graph model]])
> The **Erdos-Renyi random graph model** refers to one of two (closely related) models for generating [[simple graph|simple]] [[random graph|random graphs]] on $n$ nodes.
> ## Fixed number of edges $m$: $G(n,m)$
> In the $G(n,m)$ model, a [[network]] is chosen [[random variable|uniformly at random]] from the collection of all [[network]]s having $n$ labeled nodes and $m$ edges.
> ## Edges exist with [[probability distribution|probability]] $p$: $G(n,p)$
> In the $G(n,p)$ model, a [[network]] is constructed by connecting labeled nodes randomly. Each edge is included in the [[network]] with [[probability distribution|probability]] $p$, [[independent|independently]] of every other edge. That is, the probability of generating any particular graph with $n$ nodes and $m$ edges is $\overbrace{p^{m}}^{p\text{($m$ edges present)}} \times \overbrace{(1-p)^{{n \choose 2} - m}}^{p\text{(no other edges present)}}.$

> [!basicproperties] The structure of $G(n,p)$.
> 1. The [[expectation|expected]] number of edges in $G(n,p)$ is ${n \choose 2}p$;
> 2. The [[expectation|expected]] [[mean degree]] is $c:=(n-1)p$;
> 3. $G(n,p)$ has a [[binomial random variable|binomial]] [[degree distribution]] in general, which under sparsity assumptions becomes [[Poisson random variable|Poisson]] with parameter $c$ in the [[function limit]] of large $n$;
> 4. The [[clustering coefficient]] is (trivially) $C:= \frac{c}{n-1}$; [^1]
> 5. [[characterization of giant component in Erdos-Renyi random graph model]]
> 6. [[small components of Erdos-Renyi random graph model are trees]]
> 7. [[size of small components in Erdos-Renyi random graph model]] (nuanced)
> 8. [[the small-world effect in the Erdos-Renyi random graph model]]

> [!proof] Proof of Properties.
> Let us compute the [[expectation|expected]] number of edges in $G(n,p)$. Define a [[random variable]] $X=\text{``number of edges"}$; we want $\mathbb{E}[X]$.
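The per-graph probability $p^{m}(1-p)^{{n \choose 2}-m}$ from the definition above can be checked by brute force on a tiny graph. A minimal sketch in plain Python; the helper `sample_gnp` and the parameter choices are illustrative, not from the source:

```python
import random
from itertools import combinations

def sample_gnp(n, p, rng):
    # Include each of the C(n, 2) possible edges independently with probability p.
    return frozenset(e for e in combinations(range(n), 2) if rng.random() < p)

rng = random.Random(1)
n, p, trials = 3, 0.4, 200_000

# Empirical probability of one particular graph: the single edge {0, 1}.
target = frozenset([(0, 1)])
hits = sum(sample_gnp(n, p, rng) == target for _ in range(trials))

m = len(target)                   # m = 1 edge present
theory = p**m * (1 - p)**(3 - m)  # C(3, 2) = 3 possible edges
print(hits / trials, theory)      # both ≈ 0.144
```

With enough trials the empirical frequency of any fixed graph converges to the formula, since each of the ${n \choose 2}$ edge decisions is independent.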
> For each $i,j \in [n]$ define an [[indicator random variable]] $X_{ij}=\begin{cases}1 & \text{edge exists between nodes }i,j \\ 0 & \text{otherwise.}\end{cases}$
> Then $2X=\sum_{i \neq j}^{} X_{ij}$, since each edge is counted once from either endpoint. Note that this sum has $n(n-1)$ terms. Using [[expectation|linearity of expectation]] we obtain $\begin{align} 2\mathbb{E}[X]= & \mathbb{E}\left[ \sum_{i \neq j}^{} X_{ij} \right] \\ = & \sum_{i \neq j}^{} \mathbb{E} [X_{ij}] \\ = & \sum_{i \neq j}^{} (1 \cdot p + 0 \cdot (1-p)) \\ = & n(n-1)p . \end{align}$ Hence $\mathbb{E}[X]=\frac{n(n-1)}{2}p={n \choose 2}p$.
> It follows that the [[expectation|expected]] [[mean degree]] is $\frac{2 \cdot {n \choose 2}p}{n}=(n-1)p=: c$.
> Next, consider the [[degree distribution]] of $G(n,p)$. Let $i$ be a [[random variable|random]] node; we want $p_{k}$, the probability that $i$ has degree $k$, for any $k < n$. We use [[independent random variables|independence]] heavily. The [[probability distribution|probability]] that $i$ is connected to a particular set of $k$ of the other $n-1$ nodes and to none of the remainder is $p^{k}(1-p)^{n-1-k}$. The probability that $i$ is connected to *some* set of $k$ nodes is then given by the disjoint specialization of [[the exclusion rule]]: $\begin{align} p_{k} &= p(i\text{ is connected to set $A$ of $k$ nodes OR set $B$ of $k$ nodes OR}\dots) \\ &= \sum_{{n-1 \choose k}\text{ options}}^{} p^{k}(1-p)^{n-1-k} \\ &= {n-1 \choose k} p^{k}(1-p)^{n-1-k}. \end{align}$
> To see the relationship with the [[Poisson random variable|Poisson distribution]], suppose $G$ is sparse, so that $p=\frac{c}{n-1}$ becomes vanishingly small as $n$ becomes large. This allows us to write $\begin{align} \ln[(1-p)^{n-1-k}]= & (n-1-k)\ln \left( 1- \frac{c}{n-1} \right) \\ \approx & -(n-1-k) \frac{c}{n-1} \\ \approx & -c, \end{align}$ where we have [[taylor series|Taylor-expanded]] $\ln$ and both approximations become equalities as $n \to \infty$ with $k$ fixed. Exponentiating both sides, $(1-p)^{n-1-k}=e^{-c}$ in the large-$n$ [[function limit]]. Also for large $n$ we have ${n-1 \choose k} = \frac{(n-1)!}{k!(n-1-k)!} \approx \frac{(n-1)^{k}}{k!},$ and thus our binomial expression above becomes $p_{k}=\frac{(n-1)^{k}}{k!}p^{k}e^{-c}=\frac{(n-1)^{k}}{k!} \left( \frac{c}{n-1} \right)^{k} e ^{-c}=e^{-c} \frac{c^{k}}{k!}.$ This is the [[Poisson random variable|Poisson distribution]] with parameter $c$.
> Finally, let us calculate the [[clustering coefficient]] $C$ of the [[network]], which is the probability that two given neighbors of a node $i$ are themselves connected. Since all connections in $G(n,p)$ are [[independent random variables|independently made]], this is simply the probability $p=c / (n-1)$ that any two given nodes are connected.

[^1]: This is perhaps one of the most unrealistic consequences of the assumptions $G(n,p)$ makes. Real networks often have extremely high [[clustering coefficient]]s, while this value of $C$ tends to $0$ as $n\to \infty$!

> [!basicexample] Examples and Exercises.
> - (Newman 11.1) [[(no) clustering in Erdos-Renyi random graph]]
> - (Newman 11.2) [[size of small components in Erdos-Renyi random graph model|small component degree distribution in ERRG]] or [[fraction of nodes in small component of given degree in Erdos-Renyi random graph model]] (same thing)
> - (Newman 11.4) [[degree distribution within giant component of Erdos-Renyi random graph]]
> - (Newman 11.5) [[giant bicomponent in Erdos-Renyi random graph]]
> - (Newman 11.6) [[when small components imply existence of giant component]]
> - (Newman 11.7) [[directed Erdos-Renyi random graph]]
> - (Newman 11.8) [[the cascade model]]
> - [[random graph with nontrivial clustering coefficient]]

----

#### References
> [!backlink]
> ```dataview
> TABLE rows.file.link as "Further Reading"
> FROM [[]]
> FLATTEN file.tags as Tag
> WHERE Tag = "#definition" OR Tag = "#theorem" OR Tag = "#MOC" OR Tag = "#proposition" OR Tag = "#axiom"
> GROUP BY Tag
> ```

> [!frontlink]
> ```dataview
> TABLE rows.file.link as "Further Reading"
> FROM outgoing([[]])
> FLATTEN file.tags as Tag
> WHERE Tag = "#definition" OR Tag = "#theorem" OR Tag = "#MOC" OR Tag = "#proposition" OR Tag = "#axiom"
> GROUP BY Tag
> ```
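The basic properties proved above lend themselves to a quick numerical sanity check. A minimal sketch in plain Python (the sampler `gnp` and the parameter values $n=2000$, $c=4$ are illustrative choices, not from the source), comparing empirical values against properties 1-3:

```python
import random
from collections import Counter
from math import comb, exp, factorial

def gnp(n, p, rng):
    """Sample an Erdos-Renyi G(n, p) graph as a set of edges (i, j), i < j."""
    return {(i, j) for i in range(n) for j in range(i + 1, n) if rng.random() < p}

rng = random.Random(0)
n, c = 2000, 4.0
p = c / (n - 1)  # chosen so that the mean degree is c
g = gnp(n, p, rng)

# Property 1: the expected number of edges is C(n, 2) * p.
print(len(g), comb(n, 2) * p)

# Property 2: the expected mean degree is c = (n - 1) * p.
print(2 * len(g) / n, c)

# Property 3: for sparse graphs and large n, the degree distribution approaches
# Poisson(c): the fraction of nodes of degree k should be close to e^{-c} c^k / k!.
deg = Counter()
for i, j in g:
    deg[i] += 1
    deg[j] += 1
k = 4
frac_k = sum(1 for v in range(n) if deg[v] == k) / n
print(frac_k, exp(-c) * c**k / factorial(k))
```

The clustering coefficient could be checked the same way, but property 4 already predicts it is just $p \approx 0.002$ here, illustrating the footnote's point that $C \to 0$ as $n \to \infty$.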