Leskovec, Kleinberg, Faloutsos, KDD 2005

From ScribbleWiki: Analysis of Social Media

Jump to: navigation, search

Contents

[edit] Graphs Over Time: Densification Laws, Shrinking Diameters and Possible Explanations

[edit] Overview

Graphs like social networks are characterized by a topology, outgoing links distribution and diameter. Approaches in literature studying real-world graphs have shown that real-world graphs obey a scale free distribution (power law) of out-degree and in-degree links. However, it is also interesting to see how graphs evolve over time. In particular, the temporal changes in graph characteristics like diameter and density (average out-degree) apart from addition and deletion of nodes. Approaches in literature assume that the density of a graph remains constant over time, while the diameter of the graph increases slowly. In contrast, this paper makes two important observations (empirically)

  1. Density-Trend: The density of links in a graph increases with time in a super-linear (and sub-quadratic) fashion. If V(t) are the number of nodes in a graph at time t, the number of edges E(t) is given by: E(t) = A * V(t)^k, where 1 < k < 2 and A is a constant.
  2. Diameter-Trend: The diameter of the the graph keeps shrinking with time.

2. has an intuitive explanation that nodes that get added to the network in due course of time can serve as intermediaries between distantly connected nodes thus shrinking the diameter. The main contributions of this paper are

  • An empirical analysis of real-world graphs confirming with the aforementioned observations.
  • Models for generating graphs that account for these observations while having scale free properties.

[edit] Empirical Analysis

[edit] Density-Trend

Analysis performed over (1) paper-citation graph (ArXiv), (2) patent citation graph, (3) autonomous systems graph and (4) author-paper affiliation graphs confirms that the number of edges in graphs are a super-linear function of the number of nodes. Therefore, as nodes get added to the graphs with time, the density of the graphs increases. The rate of increase of density varies depending on the type of graph (1.15 < k < 1.69), For example, the rate of density increase is lower for author-paper affiliation than ArXiv graph but super-linear nonetheless!).

[edit] Diameter-Trend

The diameter of a graph is defined as the maximum distance between a pair of nodes in the graph such that 90% of the nodes in the graphs have distances between them less than this value. This definition of diameter is termed as the effective diameter of the graph. Since computing shortest paths among all node pairs for computing diameter is an expensive process on real graphs, subgraphs of the graphs selected at random are used for computation. For all the datasets (1)(2)(3) and (4), the diameter of the graphs is shown to decrease with time.

[edit] Network Models

In this paper, two network-models attempting to satisfy the observed temporal trends along with basic properties like power-law distribution, are proposed:

[edit] Community guided attachment (CGA) Model

Scale free networks in the real world are characterized by a self similar structure (fractal patterns). For example a large community can in turn contain a set of sub-communities. This recursive structure can be described by a n-ary tree in which the leaves form the nodes of the graph. The probability of a connection between any two nodes (leaves) is expressed in terms of a hierarchical tree-distance which is essentially the lowest ancestor common to both the nodes. If d(X,Y) is the tree-distance between two nodes (leaves) X and Y, the probability of a link is between A and B is given by P(X,Y) = B * exp(-d(X,Y)). The graph built using this distribution satisifies the density-trend but does not obey power laws or diameter-shrinkage trend.

[edit] Forest-fire Model

The limitations of the CGA model are addressed by the Forest-fire model. If a person joins an organization, he/she has a set of initial contacts. These contacts tend to introduce him to more individuals in the organization. The person in turn can continue the process recursively with the newly introduced individuals. Forest-fire model uses this intuition for modeling a network. Specifically, the steps involved in generating graphs with this model are as follows.

  1. Let G be a network and N be the new node that wants to join network
  2. A node in G is chosen randomly (say M) and N is connected to it.
  3. A binomial random number T is generated and T links (in-links and out-links) of M are selected with a certain probability. Let m_1, m_2, ..., m_T be the nodes at the end of these links
  4. Connections are made to m_1, m_2, ...m_T from N
  5. Steps (3)-(4) are repeated recursively with m_1, m_2, ...m_T

The authors use this model to to generate synthetic graphs and analyze them. Simulation results show that the Forest-fire model not only confirms with the density increase and diameter decrease trends, but also obeys the power-law distribution that is inherent in the real-world networks.

Personal tools