Statistical Network Models

The aim of statistical analysis is to infer useful information from available data. In many settings this data can represent the interplay between some elements of a system, often called a network. An intuitive way to represent networks is with a mathematical graph, where nodes represent the entities and edges represent the present connections between those entities.

Examples of such networks have been seen in my previous posts on network flows and the small world problem. Sometimes the structure of the network is known and the aim is to infer some properties of it, such as how 'commodities' flow through it, i.e. network flows. Alternatively, we may not know the underlying network structure and would like to infer this from the data.

In order to make statistical inference on network data it is necessary to have a model to describe a general assumption on how the data was generated. Most network models are parametric and can be organised quite nicely into two groups: static and dynamic. Static models consider explaining an observed network based on one snapshot of data, while dynamic models are more concerned with examining the mechanisms which cause changes in the network over time.

Static Models

These are the models which have historically received the most attention. One of the earliest models was the Erdös-Rényi random graph model proposed by Paul Erdös and Alfréd Rényi in 1959. Assume that we have some data on a network which consists of a known number of nodes and that the network graph is complete, i.e. there exists an edge between any two nodes. In this model we assume that each edge can be in one of two states; it can be 'on', meaning that a connection exists, or it can be 'off' meaning no connection exists. The general idea is that there is some fixed parameter \(p\) which represents the probability of an edge being on between any two nodes. It is then assumed that any observed network data was generated by considering each edge independently and setting it to be on with probability \(p\).

Consider the two different observations of some network below. Under the assumptions of the Erdös-Rényi model, since A has less edges we would expect it to have a low probability \(p\), whilst since B has a lot more edges we would expect this parameter to be much higher. This method of analysis can be done statistically using either Likelihood or Bayesian inference. The independence assumption on the edges makes this procedure very straightforward, allowing estimates for the parameter \(\hat{p}\) to be found analytically. For example, in this case we find that for A we have \(\hat{p}_A=0.27\) whilst for B we have \(\hat{p}_B=0.8\).

The assumption that edges are independent is unfortunately a strong and unrealistic one. For example, if the nodes represent people and edges friendships, then if there are three nodes \(i\), \(j\), \(k\) such that there exist an edge from \(i\) to \(j\) and also an edge from \(j\) to \(k\) then it is highly likely that there is an edge from \(i\) to \(k\), since they share a mutual friend. More complex models are thus required which can capture these dependencies. For most of them, however, the procedure is the same as above; an assumption on the origins of the data is made, dependent on some parameters \(\underline{\theta} = ( \theta_1, \theta_2, \dots, \theta_d)\). One then looks to use the data to infer the values of these parameters, which in turn allows conclusions to be drawn about the features of the network

Dynamic Models

As I mentioned earlier there are other models which take a different perspective, viewing a network as a dynamic system. A strong argument can be made that many networks are dynamic in nature, in that they represent relationships between entities which are not stationary over time, and instead evolve. The aim of dynamic network models is therefore to try and discover the nature of this evolution.

The ideas of inference are essentially the same as for the static models, though the procedures used can be more complicated. Most dynamic models are also parametric, specifying a generating procedure for the data
which depends on some parameters, but also on time. With a model in place one can then similarly apply Likelihood or Bayesian techniques for parameter estimation. A recent graduate from STOR-i worked in this area during his PhD, developing a dynamic extension of the static Stochastic Block Model [2].

References

[1] Anna Goldenberg, Alice X. Zheng, Stephen E. Fienberg, and Edoardo M. Airoldi. 2010. A Survey of Statistical Network Models. Found. Trends Mach. Learn. 2, 2 (February 2010), 129-233. DOI=http://dx.doi.org/10.1561/2200000005

[2] Ludkin, M., Eckley, I. & Neal, P. Stat Comput (2018) 28: 1201. https://doi.org/10.1007/s11222-017-9788-9

George's Blog

Search This Blog