----

> [!definition] Definition. ([[scale-free network]])
> Let $G$ be a [[network]] and $K_{i}$ a [[random variable]] representing the [[degree]] of an arbitrary node $i$. Assume no nodes of $G$ have [[degree]] zero.
> 
> A **scale-free network** is a [[network]] whose [[degree distribution]] looks like a [[power-law probability distribution]] outside of the small-$k$ regime, with [[discrete random variable|mass]] $p_{k}=\P(K_{i}=k)=Ck^{-\alpha}$ whenever $k \geq k_{\min}$, where $k_{\min}$ is the smallest [[degree]] $k$ for which the [[power law]] holds. (When $k_{\min}=1$, the normalization below gives $C=\frac{1}{\zeta(\alpha)}$, i.e., $p_{k}=\frac{k^{-\alpha}}{\zeta(\alpha)}$.)
> 
> Or, equivalently, when $k \geq k_{\min}$, the [[distribution function]] looks like $P_{k}:=\P(K_{i} \geq k)=C\sum_{k'=k}^{\infty} k'^{-\alpha} \approx C \int_{k}^{\infty} k'^{-\alpha}\, dk' = \frac{C}{\alpha-1} k^{-(\alpha-1)}.$
> 
> **Remark.** In practice, the existence of a [[power law]] relationship can be readily inferred by looking for linear patterns in the log-log plot (of mass or CDF). However, $\alpha$ should *not* be estimated by [[least squares]] fitting on such a plot, as that estimator is biased. Instead, letting $N=(\# \text{ of nodes with degree greater than or equal to } k_{\min})$, one should employ the closed-form [[maximum likelihood estimation|MLE]] $\alpha= 1 + N\left( \sum_{i: k_{i}\geq k_{\min}} \ln \frac{k_{i}}{k_{\min} - \frac{1}{2}} \right)^{-1}$ with statistical error $\sigma=\sqrt{ N }\left( \sum_{i: k_{i} \geq k_{\min}} \ln \frac{k_{i}}{k_{\min} - \frac{1}{2}} \right)^{-1} = \frac{\alpha-1}{\sqrt{ N }}.$

![[CleanShot 2023-10-15 at [email protected]|300]]
![[CleanShot 2023-10-15 at [email protected]|300]]

> [!basicproperties]
> **Normalization.** Recall that we assume no nodes of $G$ have [[degree]] zero. Assume the [[power-law probability distribution|power-law distribution]] is followed for $k \geq 1$.
> Since we need to have $\sum_{k=0}^{\infty}p_{k}=\sum_{k=1}^{\infty}p_{k}=1$, we see that $C\sum_{k=1}^{\infty}k^{-\alpha}=1, \text{ i.e., } C=\frac{1}{\sum_{k=1}^{\infty}k^{-\alpha}}=\frac{1}{\zeta(\alpha)},$
> where $\zeta(\alpha)$ is the [[Riemann zeta function]].
> If we instead assume the [[power-law probability distribution|power-law distribution]] is followed for $k \geq k_{\min}>1$, we need to invoke the [[generalized zeta function]] but otherwise proceed similarly; see the text.
> 
> **Moments.** Suppose we have a [[degree distribution]] that follows a [[power law]] $p_{k}=Ck^{-\alpha}$ for $k \geq k_{\min}$. Then the $m^{th}$ [[moment of a measurable function|moment]] is $\langle k^{m} \rangle=\sum_{k=0}^{k_{\min}-1} k^{m} p_{k} + C \sum_{k=k_{\min}}^{\infty} k^{m-\alpha}.$
> The [[power law]] is a slowly varying function of $k$ for large $k$, so let's approximate the second sum by an [[integral]]: $\begin{align} \langle k^{m} \rangle \approx& \sum_{k=0}^{k_{\min}-1} k^{m}p_{k} + C \int_{k_{\min}}^{\infty}k^{m-\alpha}\, dk \\ =& \sum_{k=0}^{k_{\min}-1} k^{m}p_{k} + \frac{C}{m-\alpha+1}\big[k^{m-\alpha+1}\big]_{k_{\min}}^{\infty}. \end{align}$ The bracket is finite iff $m-\alpha+1<0$, so the $m^{th}$ [[moment of a measurable function|moment]] is finite iff $m < \alpha - 1$. Of special interest is the case $m=2$: in many real-world networks we have $\alpha \in [2,3]$, and this observation says $\langle k^{2} \rangle$ will [[diverge]] whenever $\alpha \leq 3$. Of course, all moments are finite in practice (real networks have finitely many nodes), but we will still see that this leads to very interesting behavior.
> 
> **Top-Heavy Distributions.** Read pg. 328 of the book.

> [!basicexample]
> **Exercise.**
> 
> #### (a) One of these networks is approximately scale-free, the other is not. Which is which and how can you tell?
> ![[CleanShot 2023-10-16 at 15.25.13@2x 1.jpg|300]]
> **Solution.** The LHS network is approximately scale-free; the RHS is not. This is clear because the LHS CDF appears roughly linear on a log-log plot, whereas the RHS does not.
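Both the normalization and the divergence of $\langle k^{2} \rangle$ can be seen numerically. The sketch below uses pure Python; the choice $\alpha=2.5$ is an arbitrary value in the empirically common range $(2,3)$, not taken from the text. Partial sums of $k^{-\alpha}$ converge to $\zeta(2.5)\approx 1.3415$, so $p_{k}=k^{-\alpha}/\zeta(\alpha)$ is a genuine probability mass function, while partial sums of $k^{2}p_{k}$ keep growing as the degree cutoff increases.

```python
# Numerical sanity check (pure Python). alpha = 2.5 is an arbitrary choice
# in the empirically common range (2, 3); it is not taken from the text.

alpha = 2.5

def partial_sum(exponent, cutoff):
    """Return sum_{k=1}^{cutoff} k**exponent."""
    return sum(k ** exponent for k in range(1, cutoff + 1))

# Normalization: partial sums of k^(-alpha) converge, so
# p_k = k^(-alpha) / zeta(alpha) sums to 1.
z = partial_sum(-alpha, 10 ** 6)  # numerical proxy for zeta(2.5) ~ 1.3415
print(z)

# Second moment: partial sums of k^2 * p_k = k^(2 - alpha) / zeta(alpha)
# grow roughly like sqrt(cutoff), illustrating that <k^2> diverges
# whenever alpha <= 3.
for cutoff in (10 ** 2, 10 ** 4, 10 ** 6):
    print(cutoff, partial_sum(2 - alpha, cutoff) / z)
```

Each $100\times$ increase in the cutoff multiplies the printed second-moment partial sum by roughly $10$, consistent with $\sqrt{\text{cutoff}}$ growth.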
> 
> #### (b) For the scale-free network give a rough estimate of the exponent $\alpha$ of the degree distribution.
> We know that for any $k \geq k_{\min}$, $P_{k}\approx \frac{C}{\alpha-1}k^{-(\alpha - 1)}$, hence $\log P_{k} \approx \log C - \log(\alpha-1) - (\alpha-1)\log k$. Considering the input $k=1$ and corresponding $P_{k}=1$, cancellations give $\log 1 \approx \log C - \log(\alpha - 1) - (\alpha-1)\log (1),$
> i.e., $\log C\approx \log(\alpha - 1).$
> Thus picking the input $k=10^{6}$ and corresponding $P_{k}=10^{-8}$ we see $\log 10^{-8} \approx -(\alpha - 1) \log 10^{6},$
> i.e., $-8 \approx -6(\alpha-1)$, from which we conclude $\alpha \approx \frac{7}{3}$.
> 
> #### (c) Explain why your estimate is only approximate and what source or sources of error are present. What would be a better method of estimating $\alpha$?
> The plots are cumulative [[distribution function]]s, for which successive points are correlated and hence not independent, and our estimate uses only two points read off the plot by eye. So the estimate is biased and imprecise. A better method of estimating $\alpha$ is via the [[maximum likelihood estimation|MLE]] discussed above.
> 
> **Exercise.** A particular network is believed to have a degree distribution that follows a power law for nodes of degree 10 or greater. Among a random sample of nodes in the network, the degrees of the first 20 nodes with degree 10 or greater are:
> $
> \begin{matrix}
> 16 & 17 & 10 & 26 & 13 \\
> 14 & 28 & 45 & 10 & 12 \\
> 12 & 10 & 136 & 16 & 25 \\
> 36 & 12 & 14 & 22 & 10
> \end{matrix}
> $
> Estimate the exponent $\alpha$ of the power law and the error on that estimate using the MLE above.
> 
> **Solution.** Within our representative sample, we are given that $k_{\min}=10$ and $N=20$. Straightforward computation (Python script) yields the result $2.53 \pm 0.34$.
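The "straightforward computation" can be sketched as follows; this simply applies the closed-form MLE and error formula from the remark in the definition to the 20 sampled degrees.

```python
from math import log, sqrt

# Degrees of the 20 sampled nodes with degree >= k_min = 10 (from the exercise).
degrees = [16, 17, 10, 26, 13,
           14, 28, 45, 10, 12,
           12, 10, 136, 16, 25,
           36, 12, 14, 22, 10]
k_min = 10
N = len(degrees)

# Closed-form MLE for the exponent and its statistical error:
#   alpha = 1 + N / sum_i ln(k_i / (k_min - 1/2)),  sigma = (alpha - 1) / sqrt(N).
s = sum(log(k / (k_min - 0.5)) for k in degrees)
alpha = 1 + N / s
sigma = (alpha - 1) / sqrt(N)

print(f"alpha = {alpha:.2f} +/- {sigma:.2f}")  # alpha = 2.53 +/- 0.34
```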
----

#### References

> [!backlink]
> ```dataview
> TABLE rows.file.link as "Further Reading"
> FROM [[]]
> FLATTEN file.tags as Tag
> WHERE Tag = "#definition" OR Tag = "#theorem" OR Tag = "#MOC" OR Tag = "#proposition" OR Tag = "#axiom"
> GROUP BY Tag
> ```

> [!frontlink]
> ```dataview
> TABLE rows.file.link as "Further Reading"
> FROM outgoing([[]])
> FLATTEN file.tags as Tag
> WHERE Tag = "#definition" OR Tag = "#theorem" OR Tag = "#MOC" OR Tag = "#proposition" OR Tag = "#axiom"
> GROUP BY Tag
> ```