configuration model - Jacob Hume

---- > [!definition] Definition. ([[configuration model]]) > The **configuration model** is a method for generating a [[random graph]] with $n$ nodes and $m$ edges, given a [[degree|degree sequence]] $k_{1},\dots,k_{n}$ and $m$ even. The model is generated by the following algorithm: > 1. Give each node $i$ a number $k_{i}$ of **stubs** of edges (also called **half-edges**). There are $\sum_{i=1}^{n}k_{i}=2m$ stubs in total. > 2. Choose two stubs uniformly at random and connect them together to form an edge. Then choose another pair from the remaining $2m-2$ stubs, connect those, and so on until all stubs are used up. > ![[CleanShot 2023-10-31 at 22.50.59.jpg|300]] > > **Remark.** The possibility of multi-edges and self-edges is not excluded; however, their [[density of a network|density]] as a proportion of the [[network]]'s edges goes to $0$ as $n \to \infty$. > [!basicproperties] Property. (Crucial Property of the Configuration Model) > By construction, any stub in a [[configuration model]] [[network]] is matched to any other with equal [[probability]]. ^89d7a2 > [!basicproperties] The Structure of The Configuration Model. Assume $n,m$ large. > 1. Nodes $i,j$ are connected with [[probability]] $p_{ij}= \frac{k_{i}k_{j}}{2m}$. > 2. The [[expectation|expected]] number of multi-edges remains a constant $\frac{1}{2} [\frac{\langle k ^{2} \rangle - \langle k \rangle }{\langle k \rangle } ] ^{2}$ as $n\to \infty$ (hence the [[density of a network|density]] of multi-edges vanishes as $n \to \infty$). > 3. The [[expectation|expected]] number of common neighbors node $i,j$ share is $p_{ij} \frac{\langle k^{2} \rangle - \langle k \rangle }{\langle k \rangle }.$ > 4. [[the friendship paradox]] > 5. The [[excess degree distribution]] of the [[configuration model]] is $q_{k}=\frac{(k+1)p_{k+1}}{\langle k \rangle}.$ The average/[[expectation|expected]] [[excess degree]] is $\frac{\langle k^{2} \rangle - \langle k \rangle}{\langle k \rangle}=\frac{c_{2}}{c_{1}}$, where $c_{1}:=\langle k \rangle$ is the average number of neighbors ([[mean degree]]) and $c_{2}$ is the average number of second neighbors. > 6. The [[clustering coefficient]] of the [[configuration model]] is $\frac{1}{n} \frac{[\langle k \rangle^{2} - \langle k \rangle ]^{2}}{\langle k \rangle^{3} }$, which (unrealistically) vanishes as $n \to \infty$ but can be quite large due to the $\langle k \rangle^{2}$ factor. > 7. [[the configuration model is locally tree-like]] > 8. The [[expectation|expected]] number of second neighbors of a node in the [[configuration model]] is $c_{2}:= \langle k^{2} \rangle - \langle k \rangle$. > 9. The [[expectation|expected]] number of $d^{th}$ neighbors is $c_{d}=\left( \frac{c_{2}}{c_{1}} \right)^{d-1}c_{1}.$ > 10. [[characterization of giant component in the configuration model]] > 11. [[size of small components in the configuration model]] > 12. The [[probability generating function]]s $g_{0}(u)=\sum_{k}p_{k}u^{k}$ and $g_{1}(u)=\sum_{k}q_{k}u^{k}$ of the [[degree distribution]] and [[excess degree distribution]] of the [[configuration model]] are related by $g_{1}(u)=\frac{g_{0}'(u)}{g_{0}'(1)}.$ ^443b4f > [!proof] Proof of Properties. > 1. Consider nodes $i$ and $j$ with respective [[degree]] $k_{i}$ and $k_{j}$. The [[probability]] that a particular stub from $i$ connects to a particular stub from $j$ is $\frac{1}{2m-1}$. Thus the [[probability]] that said stub from $i$ connects to *any* stub from $j$ [[the exclusion rule|is]] $\sum_{\text{}k_{j} \text{ terms}} \frac{1}{2m-1}=\frac{k_{j}}{2m-1}$. Now the probability that *any* stub from $i$ connects to *any* stub from $j$ is $\sum_{k_{i}\text{ terms}} \frac{k_{j}}{2m-1}= \frac{k_{i}k_{j}}{2m-1} \approx \frac{k_{i}k_{j}}{2m}$ (we don't need to subtract off event intersections because we don't care about double-edges). >2. $1$ says the [[probability]] of an edge between $i$ and $j$ is $\frac{k_{i}k_{j}}{2m}$. The [[probability]] of a second edge is then $\frac{(k_{i}-1)(k_{j}-1)}{2m}$. The probability of both events (i.e., a multi-edge) is $\frac{k_{i}k_{j}(k_{i}-1)(k_{j}-1)}{4m^{2}}$. Now letting $X=\text{number of multi-edges in network}$ and $X_{ij}=\mathbb{1}_{i,j \text{ have multi-edge}}$ such that $2X=\sum_{i,j}X_{ij}$, we get $\begin{align} \mathbb{E}[X]:= & \frac{1}{2} \sum_{i,j} \mathbb{E}[X_{ij}] \\ = & \sum_{i,j} \frac{k_{i}k_{j}(k_{i}-1)(k_{j}-1)}{2 (2m)^{2}} \\ = & \sum_{i,j} \frac{k_{i}k_{j}(k_{i}-1)(k_{j}-1)}{2 \langle k \rangle ^{2} n^{2} } \\ = & \frac{1}{2 \langle k \rangle^{2} n ^{2} } \sum_{i} (k_{i}^{2}-k_{i}) \sum_{j} (k_{j}^{2}-k_{j}) \\ = & \frac{1}{2 \langle k \rangle^{2} } \left( \frac{1}{n} \sum_{i} k_{i}^{2} - \frac{1}{n} \sum_{i}k_{i} \right) \left( \frac{1}{n} \sum_{j}k_{j}^{2} - \frac{1}{n } \sum_{j}k_{j}\right) \\ = & \frac{1}{2} [\frac{\langle k ^{2} \rangle - \langle k \rangle }{\langle k \rangle } ] ^{2} \end{align}$ which is constant as $n \to \infty$ as desired and $\langle k \rangle,\langle k^{2} \rangle$ denote the first and second [[moment of a measurable function|moments]] of the [[degree distribution]] respectively. where we used that the [[mean degree]] $\langle k \rangle=\frac{2m}{n}$. > >3. Let $X=\text{number of common neighbors }i,j \text{ share}$. Define $X_{\ell}=\mathbb{1}_{\text{node } \ell \text{ is a common neighbor of }i,j }$, so that $X=\sum_{\ell}X_{\ell}$. The [[probability]] that $\ell$ is a connected node $i$ is $p_{i\ell}= \frac{k_{i}k_{\ell}}{2m}$ and then the [[probability]] that it is connected to node $j$ via one of its remaining stubs is $\frac{(k_{\ell}-1)k_{j}}{2m}$. So the probability that $\ell$ is a common neighbor of $i$ and $j$ is $\frac{k_{i}k_\ell(k_{\ell}-1)k_{j}}{(2m)^{2}}$. Now $\begin{align} \mathbb{E}[X] = \sum_{\ell} \mathbb{E}[X_{\ell}] \\ = & \sum_{\ell} \frac{k_{i}k_{\ell}}{2m} \ \frac{k_{j}(k_{\ell}-1)}{2m} \\ = & \frac{k_{i}k_{j}}{2m} \frac{\sum_{\ell} k_{\ell}(k_{\ell}-1)}{n \langle k \rangle } \\ = & p_{ij} \frac{\langle k^{2} \rangle - \langle k \rangle }{\langle k \rangle }. \end{align}$ In this calculation we have ignored that the probability of self-edges is different than the probability of other edges, which in $(2)$ we showed is fine given that $m,n$ are large. >4. See [[the friendship paradox]] >5. [[the friendship paradox]] tells us that the [[probability]] a node reached by following an edge from some chosen node has [[degree]] $k$ is $\frac{kp_{k}}{\langle k \rangle}$. The probability of having [[excess degree]] $k$ equals the probability of having [[degree]] $k+1$, hence $q_{k}=\frac{(k+1)p_{k+1}}{\langle k \rangle}$. We calculate the [[expectation|average]] [[excess degree]] thus: $\sum_{k=0}^{\infty} kq_{k}= \frac{1}{\langle k \rangle } k(k+1)p_{k+1} = \frac{1}{\langle k \rangle }\sum_{k=0}^{\infty}(k-1)kp_{k}= \frac{\langle k^{2} \rangle - \langle k \rangle }{\langle k \rangle }.$ >6. The [[clustering coefficient]] is the uniform probability that two neighbors of a node are also neighbors of each other. Consider a node $v$ with two neighbors $i$ and $j$. Being neighbors of $v$, $i$ and $j$ are both at the other ends of edges from $v$, and hence the number of *other* edges attached to them, which we denote $k_{i}$ and $k_{j}$ respectively are distributed as $q_{k}$ above. The probability of an edge between $i$ and $j$ is $\frac{k_{i}k_{j}}{2m}$ by $(1)$. Via [[law of total probability]]: $\begin{align} \mathbb{P}(\text{edge}(i,j)) = & \sum_{k_{i},k_{j}=0}^{\infty} \mathbb{P}(\text{edge exists}|\text{nodes }i,j \text{ have excess degrees }k_{i}, k_{j}) \mathbb{P}(k_{i}, k_{j}) \\ = & \sum_{k_{i},k_{j}=0}^{\infty} q_{k_{i}}q_{k_{j}} \frac{k_{i}k_{j}}{2m} \\ = & \frac{1}{2m} \left[ \sum_{k=0}^{\infty} kq_{k} \right]^{2} \\ = & \frac{1}{2m \langle k \rangle^{2}} \left[ \sum_{k=0}^{\infty} k(k+1)p_{k+1} \right]^{2} \\ = & \frac{1}{2m \langle k \rangle^{2} } \left[ \sum_{k=0}^{\infty}(k-1)kp_{k} \right]^{2} \\ = & \frac{1}{n} \frac{[\langle k \rangle^{2} - \langle k \rangle ]^{2}}{\langle k \rangle^{3} }, \end{align}$ where we've used that [[mean degree]] is $\frac{2m}{n}$. >7. See [[the configuration model is locally tree-like]] >8. We now compute the average number of second neighbors of a node $i$in the [[configuration model]]. Following one of the stubs from $i$ leads us to a neighbor of $i$ having [[excess degree distribution]] $q_{k}$ and expected [[excess degree]] $\frac{\langle k^{2} \rangle - \langle k \rangle}{\langle k \rangle}$. $i$ has $\langle k \rangle$ neighbors. Thus the average number of second neighbors will just be $c_{2}:= \langle k \rangle \frac{\langle k^{2} \rangle - \langle k \rangle }{\langle k \rangle } = \langle k^{2} \rangle - \langle k \rangle . $ Hence the average excess degree can be written as $\frac{c_{2}}{c_{1}}$. >9. The average number of 3rd neighbors is just the average number of second neighbors times the expected [[excess degree]] of each second neighbor, i.e., $c_{2} \times \frac{c_{2}}{c_{1}}=\frac{c_{2}^{2}}{c_{1}}$. Generalizing, the average number of $d^{th}$ neighbors is the average number of $(d-1)^{th}$ neighbors times the expected [[excess degree]] of each $(d-1)^{th}$ (which is $\frac{c_{2}}{c_{1}}$ as always); i.e., $c_{d-1} \times \frac{c_{2}}{c_{1}}= c_{d-2} \times \frac{c_{2}^{2}}{c_{1}^{2}}= \dots = c_{1} \times (\frac{c_{2}}{c_{1}})^{d-1}. $ > [!basicexample] Examples and Exercises. >- (Newman 12.1) [[number of matchings in configuration model]] >- (Newman 12.3) [[TODO]] >- (Newman 12.4) [[k-regular configuration model]] >- (Newman 12.5) [[no small components in configuration model with nodes of degree greater than 1]] >- (Newman 12.7) [[binomial random variable#Example with Generating Functions]] >- (Newman 12.8) [[newman 12.8]] >- (Newman 12.10) [[symmetric geometric degree distribution configuration model]] >- (Newman 12.11) [[exponential degree distribution configuration model]] >- (Newman 12.12) [[average degree of within giant component in configuration model exceeds average degree in network as a whole]] >- (Newman 12.16) [[weighted geometric distribution configuration model]] >- (Newman 12.17) [[configuration model of the internet]] ---- #### ---- #### References > [!backlink] > ```dataview > TABLE rows.file.link as "Further Reading" > FROM [[]] > FLATTEN file.tags as Tag > WHERE Tag = "#definition" OR Tag = "#theorem" OR Tag = "#MOC" OR Tag = "#proposition" OR Tag = "#axiom" > GROUP BY Tag > ``` > [!frontlink] > ```dataview > TABLE rows.file.link as "Further Reading" > FROM outgoing([[]]) > FLATTEN file.tags as Tag > WHERE Tag = "#definition" OR Tag = "#theorem" OR Tag = "#MOC" OR Tag = "#proposition" OR Tag = "#axiom" > GROUP BY Tag > ```