random graph with nontrivial clustering coefficient

----- 3_cmplxsys535_hw6 > [!proposition] Proposition. ([[bernoulli random variable]]) > Assume $n$ is large. The [[clustering coefficient]] for the [[Erdos-Renyi random graph model]] tends to $0$ for large $n$. Here we construct a different [[random graph]] model with nontrivial [[clustering coefficient]] as follows: take $n$ nodes and go through each distinct trio of three nodes, of which there are ${n \choose 3}$, and with [[independent random variables|independent]] [[probability]] $p$ connect the members of the trio together using three edges to form a triangle, where $p=\frac{c}{{n-1} \choose 2}$ with $c$ a constant. In particular, we have $C=\frac{1}{2c+1}.$ > [!proof]- Proof. ([[bernoulli random variable]]) > **Part a.** Show that the [[mean degree]] of a node $i$ in this model [[network]] is $2c$. > >**Solution.** Let $X$ be a [[random variable]] corresponding to the number of triangles that $i$ participates in. For nodes $j,k$ distinct from $i$ and each other define $X_{jk}:= \begin{cases} 1 & \text{ if } i,j,k \text{ form a triangle} \\ 0 & \text{ if not.} \end{cases}$ Note that $X=\sum_{{n-1 \choose 2} \text{ terms}}^{} X_{jk}$, so that the [[expectation|expected number of triangles]] in which $i$ participates is $\mathbb{E}[X]=\sum_{{n-1 \choose 2} \text{ terms}}^{} \overbrace{\mathbb{E}[X_{jk}]}^{\textcolor{Thistle}{p}}= {n-1 \choose 2} \textcolor{Thistle}{\frac{c}{{n-1 \choose 2}}}= c.$ Now, each triangle contributes $2$ edges attached to $k_{i}$ (under suitable assumptions regarding non-simplicity and/or $n$), so the [[mean degree]] is $2c$. > **Part b.** Show that the [[degree distribution]] is $p_{k}=\begin{cases}e^{-c}c^{k / 2} & k \text{ even} \\ 0 & k \text{ odd.}\end{cases}$ > Let $T$ be a [[random variable]] that tracks the number of triangles in which the node $i$ participates. The probability $i$ is participates in (at least) a specific collection of $t$ triangles is $p^{t}$ and the probability that it participates in no other $t$ triangles is $(1-p)^{{n -1 \choose 2}-t}$. Hence the probability $i$ participates in precisely $t$ triangles is $p^{t}(1-p)^{{n-1 \choose 2}-t}$. Now the probability that $i$ participates in *any* $t$ triangles is $\sum_{ {{n \choose 3} \choose t} \text{disjoint events}}^{}p^{t}(1-p)^{{n-1 \choose 2} - t }={ {{n-1 \choose 2} \choose t}}p^{t}(1-p)^{{n-1 \choose 2} - t} = \mathbb{P}(T == t).$. > We see that $T$ obeys a [[binomial random variable|binomial distribution]], with $\mathbb{E}[T]={n-1 \choose 2}p=c$. Thus in the [[Poisson random variable|Poisson limit]] of $n \to \infty$ we obtain $\mathbb{P}(T==t)= e ^{-\mathbb{E}[T]} \frac{\mathbb{E}[T]^{t}}{t!}=\frac{e^{-c}c^{t}}{t!}.$ In the [[function limit]] of large $n$, each triangle in which $i$ participates provides 2 edges to its [[degree]], enforcing that $k$ is even (so, $p_{k}=0$ if $k$ odd) with $p_{k}=\mathbb{P}(k_{i}==k)=e^{-c} \frac{c^{k/2}}{(k/2)!}.$ **Part c.** Show that the [[clustering coefficient]] is $C=\frac{1}{2c+1}$. > We want to find $C=\frac{\text{number of triangles} \times 3}{\text{number of connected triples}}.$ Define $\begin{align} Z = & \text{``number of triangles in the whole network"} \\ Z_{i} = & \text{``number of triangles anchored at node $iquot;}, \end{align}$ note that $3Z=\sum_{i=1}^{n}Z_{i}$. We've shown $\mathbb{E}[Z_{i}]=c$; hence $\mathbb{E}[Z]=\frac{nc}{3}$. > We saw before that the expected number of triangles in which node $i$ participates is $c$. Hence the expected number of triangles in the whole [[network]] is $\frac{nc}{3}$, > Define the [[random variable]]s $\begin{align} Y = & \text{``number of connected triples in the whole network"} \\ Y_{t}= & \text{``number of connected triples from nodes anchoring $t$ triangles"}. \end{align}$ Note that $Y=\sum_{t \in \mathbb{Z}_{\geq 0}}^{}Y_{t}$. > If a single node $i$ anchors $t$ triangles, then $i$ contributes ${2t \choose 2}=t(2t-1)$ connected triples to the [[network]]. So $Y_{t}=t(2t-1)X_{t}$, where $X_{t}$ is the number of nodes anchoring $t$ triangles. > It is easy to see using [[indicator random variable|indicators]] that $\mathbb{E}[X_{t}]=n \mathbb{P}(T==t)$. $t(2t-1)$ is constant once $t$ is fixed; thus using [[expectation|linearity of expectation]]: $\mathbb{E}[Y_{t}]=t(2t-1) \mathbb{E}[X_{t}] = t(2t-1) n\mathbb{P}(T==t).$ It follows that $\mathbb{E}[Y]=\sum_{t \in \mathbb{Z}_{\geq 0}}^{} \mathbb{E}[Y_{t}]=\sum_{t \in \mathbb{Z}_{\geq 0}}^{}t(2t-1)n \mathbb{P}(T==t)= \sum_{t \in \mathbb{Z}_{\geq 0}}^{} \frac{t(2t-1)ne^{-c}c^{t}}{t!},$ which equals $c(2c+1)n$ according to WolframAlpha. > The [[clustering coefficient]] is then $C=\frac{\text{number of triangles} \times 3}{\text{number of connected triples}}=\frac{\mathbb{E}[Z] \times 3}{\mathbb{E}[Y]}=\frac{3 \times nc / 3 }{c(2c+1)n}=\frac{1}{2c+1}$ as desired. > **Part d.** Show that as a fraction of the [[network]] size, the [[expectation|expected size]] $S$ of the [[giant component]], if there is one, satisfies $S=1-e^{-cS(2-S)}$. > We proceed analogously as in the [[characterization of giant component in Erdos-Renyi random graph model]]. Suppose a [[giant component]] exists, meaning that as $n$ grows, the average size of the largest [[component of a graph|component]] grows with it, and hence the largest component occupies a constant fraction of the whole [[network]]. We will calculate said fraction in the [[function limit]] $n\to \infty$. > Let $i$ be any node in the [[network]] and denote by $u$ the probability that (uniformly chosen) $i$ is not in the [[giant component]]. For any pair of nodes $j,k$ distinct from $i$ and each other in the [[network]], we have that either >- (event 1) $j,k,i$ do not form a triangle > > - occurs with [[probability]] $1-p$. >- (event 2) $j,k,i$ *do* form a triangle but neither $j$ nor $k$ is in the [[giant component]] > > - $j,k,i$ form a triangle with [[probability]] $p$ > > - $j,k$ are both not in the [[giant component]] with [[probability]] $u^{2}$ > > - Hence (event 2) occurs with probability $pu^{2}$ > (event 1) and (event 2) are disjoint, [[the exclusion rule|so we conclude]] the probability that $i$ is not connected to the [[giant component]] via $j$ and $k$ is $1-p+pu^{2}$, and therefore the probability $u$ that $i$ is not in the [[giant component]] (not connected via *any* of the ${n-1 \choose 2}$ pairs $j,k$) is $u=(1-p + p u^{2})^{{n-1 \choose 2}}.$ Now substitute $p=\frac{c}{n-1 \choose 2}$ to get $u=(1- \frac{c}{{n-1 \choose 2}}+ \frac{c}{n-1 \choose 2}u^{2})^{{n-1 \choose 2}}.$ Now we apply $\ln$ to both sides to obtain $\begin{align}\ln u = & {n-1 \choose 2} \ln \big( 1 - \frac{c}{{n-1 \choose 2}} + \frac{c}{{n-1 \choose 2}} u ^{2} \big) \\ =& {n-1 \choose 2} \cdot ( -\frac{c}{{n-1 \choose 2}} + \frac{c}{{n-1 \choose 2}}u ^{2} \text{ Taylor approximation $\ln(1+x) \approx x$ }\\ =& -c +c u ^{2}. \end{align}$ Now exponentiation yields $u = e ^{-c(1-u^{2})} \text{ in the large-}n \text{ limit.}$ Now $S=1-u$ and therefore $\begin{align} S= & 1- e ^{-c(1-(1-S)^{2})} \\ = & 1- e ^{-c(-2S+S^{2})} \\ = & 1- e ^{-cS(2-S)}.\end{align}$ > **Part e.** At what value of $c$ does a [[giant component]] first appear? > We showed in [[characterization of giant component in Erdos-Renyi random graph model]] that we can check for a [[giant component]] phase transition by taking derivative of both sides in **d**: $1 = -(2cS-2c) e ^{-cS(2-S)},$ setting $S=0$: $1=2c$ and solving for $c$: $c=\frac{1}{2}.$ (Does this trend continue for quadruplets, quintuplets, etc.?) > **Part f.** What is the value of the [[clustering coefficient]] when the [[giant component]] fills half of the [[network]]? > Suppose $S=\frac{1}{2}$. Then $\frac{1}{2}= 1- e ^{-c(\frac{1}{2})(2-\frac{1}{2})}$, and solving for $c$ yields $c\approx 0.924$. Hence $C=\frac{1}{2c+1}\approx 0.351.$ ----- #### ---- #### References > [!backlink] > ```dataview > TABLE rows.file.link as "Further Reading" > FROM [[]] > FLATTEN file.tags as Tag > WHERE Tag = "#definition" OR Tag = "#theorem" OR Tag = "#MOC" OR Tag = "#proposition" OR Tag = "#axiom" > GROUP BY Tag > ``` > [!frontlink] > ```dataview > TABLE rows.file.link as "Further Reading" > FROM outgoing([[]]) > FLATTEN file.tags as Tag > WHERE Tag = "#definition" OR Tag = "#theorem" OR Tag = "#MOC" OR Tag = "#proposition" OR Tag = "#axiom" > GROUP BY Tag > ```