-----
> [!proposition] Proposition. ([[in-degree distribution of Price's model is a power law]])
>
Consider a [[network]] grown using [[Price's DAG growing model]] with average citation parameter $c$ and launch constant $a$. The [[master equation]] for the evolution of the [[degree distribution|in-degree distribution]] is $\overbrace{(n+1)p_{q}(n+1)}^{\# \text{ at new time step}} = \begin{cases}
\overbrace{np_{q}(n)}^{\# \text{ previous}} + \overbrace{\frac{c(q-1+a)}{c+a}p_{q-1}(n)}^{\# \text{ gained}} - \overbrace{\frac{c(q+a)}{c+a}p_{q}(n)}^{\# \text{ lost}} & q > 0 \\
np_{0}(n) + 1 - \frac{ca}{c+a}p_{0}(n) & q=0.
\end{cases}$
Writing $p_{q}=p_{q}(\infty)$, the asymptotic ($n \to \infty$) form of the above is $p_{q} = \begin{cases}
\frac{c}{c+a} [(q-1+a)p_{q-1} - (q+a)p_{q}] & \text{ for } q > 0 \\
1 - \frac{ca}{c+a}p_{0} & \text{ for }q=0 .
\end{cases}$
The solution to this [[recurrence relation]] is $\begin{align}
p_{q} = & \begin{cases}
(1 + a / c) \frac{\Gamma(a+q)\Gamma(a+1+ a / c)}{\Gamma(a) \Gamma(a+q+2+ a / c)} & q > 0 \\
\frac{1+ a / c}{a + 1 + a /c} & q=0
\end{cases} \\
= & \frac{B(q+a, \textcolor{Thistle}{2 + a / c})}{B(a, 1 + a / c)}.
\end{align}$
Since the [[beta function]] falls off as a [[power law]] in its first argument, the in-degree distribution of the [[network]] goes as $p_{q} \sim q^{-\alpha}$ for $q \gg a$, with exponent $\alpha=\textcolor{Thistle}{2+\frac{a}{c}}$. The [[distribution function]] is $P_{q} = \sum_{q'=q}^{\infty}p_{q'} = \frac{B(q+a, 1+ a / c)}{B(a, 1 + a / c)},$
and it too falls off with a [[power law]] tail, with the smaller exponent $1 + \frac{a}{c}$.
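
As a numerical sanity check (an illustrative sketch, not part of the source; the parameters $c=3$, $a=1.5$ are arbitrary choices), the closed-form $p_{q}$ can be evaluated in log-gamma arithmetic to confirm both that it normalizes to $1$ and that its tail falls off with exponent close to $2 + a/c$:

```python
# Sanity check of the closed form p_q = B(q + a, 2 + a/c) / B(a, 1 + a/c):
# (i) the distribution should sum to ~1, and (ii) its tail should fall off
# as q^{-alpha} with alpha = 2 + a/c.
from math import lgamma, exp, log

def log_beta(x, y):
    """log B(x, y) via log-gamma, numerically stable for large arguments."""
    return lgamma(x) + lgamma(y) - lgamma(x + y)

def p(q, c, a):
    """Stationary in-degree distribution of Price's model (Yule form)."""
    return exp(log_beta(q + a, 2 + a / c) - log_beta(a, 1 + a / c))

c, a = 3.0, 1.5                       # arbitrary example parameters
total = sum(p(q, c, a) for q in range(200_000))
print(f"sum of p_q ~ {total:.4f}")    # should be close to 1

# Empirical tail exponent from a log-log secant, using q >> a:
q1, q2 = 10_000, 100_000
alpha_est = -(log(p(q2, c, a)) - log(p(q1, c, a))) / (log(q2) - log(q1))
print(f"estimated alpha ~ {alpha_est:.3f}, predicted {2 + a / c:.3f}")
```

Working in $\log \Gamma$ rather than $\Gamma$ avoids the overflow that direct evaluation would hit for large $q$.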
> [!proof]- Proof. ([[in-degree distribution of Price's model is a power law]])
> First, we write down an equation governing the [[degree distribution|distribution]] of [[degree|in-degree]] $q$, i.e., the number of citations received by papers, in terms of $c$ and $a$ in the limit of large $n$. Let $p_{q}(n)$ denote the [[degree distribution|in-degree distribution]] of the growing [[network]], i.e., the fraction of nodes having in-degree $q$ when the [[network]] contains $n$ nodes, and let us examine what happens when we append a single node.
>
> Consider one of the citations made by this new node. Following the model's definition, this citation goes to a particular node $i$ with [[probability]] $\frac{q_{i}+a}{\sum_{i=1}^{n}(q_{i}+a)}=\frac{q_{i}+a}{n \langle q \rangle + na }= \frac{q_{i}+a}{n(c+a)},$ where $q_{i}$ is the in-degree of $i$ and we used the fact that the mean in-degree $\langle q \rangle$ equals the mean out-degree $c$.
>
> Now the [[expectation|expected number of new citations]] for node $i$ upon introduction of our new paper is $c \cdot \frac{q_{i}+a}{n(c+a)}$. [^1] And there are $np_{q}(n)$ nodes with [[degree|in-degree]] $q$ in our [[network]], hence the expected number of new citations to nodes having [[degree|in-degree]] $q$ is $n p_{q}(n) \cdot c \cdot \frac{q+a}{n(c+a)}=\frac{c (q+a)}{c+a} p_{q}(n). \ \ (*)$
> Now we can write the [[master equation]] for the evolution of the in-degree distribution as follows. When we add a single new node to our network of $n$ nodes, the number of nodes in the [[network]] having in-degree $q$ will increment by $1$ per previous node with in-degree $q-1$ that receives a citation. [^2] From $(*)$ we know [[expectation|the expected]] number of such nodes is $\frac{c(q-1+a)}{c+a} p_{q-1}(n).$
> Similarly, when we add a single new node to our network of $n$ nodes, the number of nodes in the [[network]] having in-degree $q$ will decrement by $1$ per node previously having degree $q$ that receives a citation and thereby becomes a node of degree $q+1$. $(*)$ says the expected number of such nodes is $\frac{c(q+a)}{c+a} p_{q}(n).$
> The number of nodes with in-degree $q$ in the [[network]] after the addition of a single new node is $(n+1)p_{q}(n+1)$ which, putting together with the results above, is $(n+1)p_{q}(n+1) = \overbrace{np_{q}(n)}^{\# \text{ previous}} + \overbrace{\frac{c(q-1+a)}{c+a}p_{q-1}(n)}^{\# \text{ gained}} - \overbrace{\frac{c(q+a)}{c+a}p_{q}(n)}^{\# \text{ lost}}$
> for $q \neq 0$. When $q=0$ there are no nodes of lower degree that can gain an edge, so the $\# \text{ gained}$ term does not appear. On the other hand, we gain a node of degree $0$ every time a new node is added to the [[network]], meaning that $(n+1)p_{0}(n+1)=np_{0}(n) + 1 - \frac{ca}{c+a}p_{0}(n).$
> Assume that in the limit $n \to \infty$ the distribution approaches a stationary value, $\lim_{ n \to \infty }p_{q}(n)=p_{q}$. Substituting this into the [[master equation]] and letting $n \to \infty$, the asymptotic form is $p_{q} = \begin{cases}
> \frac{c}{c+a} [(q-1+a)p_{q-1} - (q+a)p_{q}] & \text{ for } q \geq 1 \\
> 1 - \frac{ca}{c+a}p_{0} & \text{ for }q=0 .
> \end{cases}$
> In the case $q=0$ it is easy to solve explicitly: $p_{0}=\frac{1+ a / c}{a + 1 + a /c}.$ In the case $q \geq 1$, rearranging gives a [[recurrence relation]] $p_{q} = \frac{q+a-1}{q+a+1+ a / c} p _{q-1}.$
> We'll solve inductively. To begin, $p_{1} = \frac{1+a-1}{a+2+ a / c}p_{0} = \frac{a}{a+2+ a / c} \cdot \frac{1 + a/c}{a+1+ a / c}.$
> Then $p_{2} = \frac{2+a-1}{a + 3 + a / c} p _{1}=\frac{a+1}{a+3+ a / c} \cdot \frac{a}{a+2+ a / c} \cdot \frac{1 + a/c}{a+1+ a / c}= \frac{(a+1)a}{(a+3+ a / c)(a+2+a / c)} \cdot \frac{(1+ a/c)}{(a+1+ a / c)}.$
> And $p_{3} = \frac{a+2}{a + 4 + a / c}p_{2} = \frac{(a+2)(a+1)a}{(a + 4 + a / c)(a + 3 + a / c)(a + 2 + a/c)} \cdot \frac{1 + a / c}{a+ 1 + a / c},$
> from which we conclude in the general case that $\begin{align}
> p_{q} = & \frac{(a+q-1)(a+q-2) \cdots a}{(a+q+1 + a / c) \cdots (a + 2 + a / c) } \cdot \frac{1 + a / c}{(a + 1 + a / c)} \\
> = & (1 + a / c) \frac{\Gamma(a+q)\Gamma(a+1+ a / c)}{\Gamma(a) \Gamma(a+q+2+ a / c)} \ (1) \\
> = & \frac{B(q+a, 2 + a / c)}{B(a, 1 + a / c)} \ (2),
> \end{align}$
> where $\Gamma$ is the [[gamma function]] and $B$ is the [[beta function]]. For $(1)$, apply $\Gamma(x+1)=x \Gamma(x)$ repeatedly: $\Gamma(a+q)/\Gamma(a)=(a+q-1)(a+q-2)\cdots a$ and $\Gamma(a+1+ a / c)/\Gamma(a+q+2+ a / c)=[(a+q+1+ a / c)\cdots(a+1+ a / c)]^{-1}$, which reproduces the product above. For $(2)$, use $B(x,y)=\frac{\Gamma(x)\Gamma(y)}{\Gamma(x+y)}$ together with $\Gamma(2+ a / c)=(1+ a / c)\Gamma(1+ a / c)$. $(2)$ is sometimes known as the [[yule distribution]]. Observe that $p_{q}$ depends on $q$ only through the first argument of the [[beta function]] in the numerator, so understanding the [[degree distribution]] amounts to understanding how the [[beta function]] behaves in that argument. For large values of its first argument we can rewrite the [[beta function]] using [[Stirling's approximation]] of the [[gamma function]], $\Gamma(x) \approx \sqrt{ 2\pi } e^{-x}x^{x - \frac{1}{2}},$
> which means that $B(x,y)= \frac{\Gamma(x) \Gamma(y)}{\Gamma(x+y) } \approx \frac{e^{-x}x^{x - \frac{1}{2}}}{e^{-(x+y)}(x+y)^{x+y - \frac{1}{2}}} \Gamma(y).$
> Now, we know $(x+y)^{x+y - \frac{1}{2}} = x^{x+y - \frac{1}{2}}\left( 1+ \frac{y}{x} \right)^{x + y - \frac{1}{2}} \approx x^{x+y - \frac{1}{2}}e^{y},$
> where the approximation [[taylor series|becomes exact]] in the limit of large $x$. Then $B(x,y) \approx \frac{e^{-x}x^{x - \frac{1}{2}}}{e^{-(x+y)}x^{x+y-\frac{1}{2}}e^{y}}\Gamma(y)=x^{-y}\Gamma(y).$
> In other words, the [[beta function]] falls off as a [[power law]] for large values of $x$, with exponent $y$. Applying this finding to $(2)$, we discover that for large values of $q$ the in-degree distribution of the [[network]] goes as $p_{q} \sim q^{-\alpha}$ when $q \gg a$, where the exponent $\alpha$ is $\alpha=2+\frac{a}{c}$.
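
The derivation can also be checked end-to-end numerically (a sketch, not from the source; the parameters $c=3$, $a=1.5$ are arbitrary): iterating the master equation one added node at a time converges to the closed-form stationary solution obtained from the recurrence.

```python
# Numeric check: iterate the master equation
#   (n+1) p_q(n+1) = n p_q(n) + (c/(c+a)) [(q-1+a) p_{q-1}(n) - (q+a) p_q(n)]
# (with its q = 0 variant), then compare against the stationary solution of
#   p_q = (q+a-1)/(q+a+1+a/c) * p_{q-1},  p_0 = (1+a/c)/(a+1+a/c).
c, a = 3.0, 1.5               # arbitrary example parameters
Q = 150                       # truncation; the update for p_q only reads
                              # p_q and p_{q-1}, so no error leaks in for q <= Q
p = [0.0] * (Q + 1)
p[0] = 1.0                    # start from a single isolated node
n = 1
for _ in range(30_000):       # grow the network one node at a time
    new = [0.0] * (Q + 1)
    new[0] = (n * p[0] + 1 - (c * a / (c + a)) * p[0]) / (n + 1)
    for q in range(1, Q + 1):
        gained = (c * (q - 1 + a) / (c + a)) * p[q - 1]
        lost = (c * (q + a) / (c + a)) * p[q]
        new[q] = (n * p[q] + gained - lost) / (n + 1)
    p = new
    n += 1

# Closed-form stationary solution via the recurrence:
exact = [(1 + a / c) / (a + 1 + a / c)]
for q in range(1, Q + 1):
    exact.append((q + a - 1) / (q + a + 1 + a / c) * exact[-1])

err = max(abs(p[q] - exact[q]) for q in range(Q + 1))
print(f"after n = {n} nodes, max |p_q(n) - p_q| = {err:.2e}")
```

The finite-$n$ deviation shrinks as the network grows, consistent with the stationarity assumption $\lim_{n \to \infty} p_q(n) = p_q$ made in the proof.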
-----
[^1]: This may be seen as follows. Let $X$ be a [[random variable]] denoting the number of *new* citations entering paper $i$ upon adding the new node. The new node cites $c$ papers. For $j \in [c]$ define $X_{j}=\mathbb{1}_{\{ \text{the }j ^{th} \text{ citation goes to paper }i \}}$. Then $X=\sum_{j=1}^{c}X_{j}$ and hence $\mathbb{E}[X]=\sum_{j=1}^{c} \mathbb{E}[X_{j}]=\sum_{j=1}^{c} \mathbb{P}(X_{j} = 1)=c \cdot \frac{q_{i}+a}{n(c+a)}$.
[^2]: We don't worry about multiedges here because their probability of existence vanishes as $n \to \infty$.
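
The linearity-of-expectation argument in footnote 1 can itself be sanity-checked by simulation (an illustrative sketch: the degree sequence below is an arbitrary fixed toy rather than a grown network, so the comparison uses the normalization $\sum_{j}(q_{j}+a)$ directly instead of $n(c+a)$, which assumes $\langle q \rangle = c$):

```python
# Monte Carlo sketch: estimate the expected number of new citations node i
# receives when one new paper making c citations arrives, and compare with
# the linearity-of-expectation value c * (q_i + a) / sum_j (q_j + a).
import random
from itertools import accumulate

random.seed(1)
n, c, a = 1000, 3, 1.5
q = [j % 11 for j in range(n)]      # arbitrary fixed in-degree sequence
weights = [qj + a for qj in q]
cum = list(accumulate(weights))     # precompute cumulative weights for speed
i = 10                              # the node we track (here q_i = 10)

trials = 200_000
hits = 0
for _ in range(trials):
    # each of the c citations targets node j with probability (q_j + a) / sum
    # (sampled with replacement; linearity of expectation holds either way)
    targets = random.choices(range(n), cum_weights=cum, k=c)
    hits += targets.count(i)

est = hits / trials
exact = c * (q[i] + a) / sum(weights)
print(f"simulated {est:.5f}, exact {exact:.5f}")
```

The estimate and the exact expectation agree up to Monte Carlo noise, as the indicator-variable decomposition predicts.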
----
#### References
> [!backlink]
> ```dataview
> TABLE rows.file.link as "Further Reading"
> FROM [[]]
> FLATTEN file.tags as Tag
> WHERE Tag = "#definition" OR Tag = "#theorem" OR Tag = "#MOC" OR Tag = "#proposition" OR Tag = "#axiom"
> GROUP BY Tag
> ```
> [!frontlink]
> ```dataview
> TABLE rows.file.link as "Further Reading"
> FROM outgoing([[]])
> FLATTEN file.tags as Tag
> WHERE Tag = "#definition" OR Tag = "#theorem" OR Tag = "#MOC" OR Tag = "#proposition" OR Tag = "#axiom"
> GROUP BY Tag
> ```