A Random Post

The world we live in is full of uncertainty. At times it can feel as though anything could happen. However, the 'randomness' we observe is often not as arbitrary as it seems, and instead exhibits patterns. This crucial observation led to the development of probability theory, which handles such uncertainty with rigorous mathematics. It has since been utilised across a wide range of disciplines, including both Statistics and Operational Research.

Making the random rigorous 

Every time you roll a dice you know that a number from 1 to 6 will come up, each equally likely. There is no way of knowing beforehand what number you will roll, so in some sense the result is 'random'. However, it is not completely random.

A crucial idea of probability theory is that of a random variable, which formalises situations like rolling a dice in a mathematically rigorous way. A random variable is described by its distribution: in technical terms, a function which takes each possible outcome (1 to 6 for a dice) and returns a value between 0 and 1, representing how likely that outcome is. In the case of a fair dice, each number has an equal probability of coming up, i.e. a probability of 1/6. However, we may have a biased dice, so that the probability of one number is higher whilst being lower for others. The only constraint on these values is that the probabilities of all the cases must sum to 1.
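
To make this concrete, here is a minimal sketch in \(\texttt{R}\) using the built-in \(\texttt{sample}\) function. The weights for the biased dice are invented purely for illustration.

```r
# A fair dice: each face has probability 1/6
fair <- rep(1/6, 6)
sum(fair)  # the probabilities must sum to 1

# A hypothetical biased dice, weighted towards 6 (weights invented)
biased <- c(0.1, 0.1, 0.1, 0.1, 0.1, 0.5)
sum(biased)  # still sums to 1

# Draw 10 rolls from each
sample(1:6, size = 10, replace = TRUE, prob = fair)
sample(1:6, size = 10, replace = TRUE, prob = biased)
```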

It's a little confusing that this is called a random variable, when it isn't really random in the usual sense of the word, and instead exhibits a certain pattern. For example, consider the video below of the so-called Galton board. The balls dropped in seem to jump around the obstacles very arbitrarily; however, their distribution, i.e. the number of balls which end up in each position, follows a very clear bell-shaped pattern. This is in fact the well-known normal distribution, and what is being observed here is the normal approximation to the binomial distribution.


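We can recreate this effect with a few lines of \(\texttt{R}\): each ball's final position is simply the number of times it bounced right, which follows a binomial distribution. This is a minimal sketch; the number of rows and balls are chosen arbitrarily.

```r
# Each ball bounces left or right at each of 12 rows of pegs,
# so its final bin is the number of rightward bounces: Binomial(12, 1/2)
set.seed(1)  # seed chosen arbitrarily, for reproducibility
bins <- rbinom(10000, size = 12, prob = 0.5)

# The counts per bin trace out the familiar bell shape
table(bins)
hist(bins, breaks = seq(-0.5, 12.5, by = 1))
```
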
Although the behaviour of the balls is random, the distribution of the values is in a sense certain. A wide variety of other uncertainties we observe in daily life also follow specific distributions. This allows us to say something concrete about the likelihood of certain things happening, such as extreme events (see previous post). This general idea forms the basis of much of inferential statistics, whereby a statistician makes some assumption on the general family to which the distribution of the data belongs, described by some parameters, and then uses the data to infer the values of these parameters, hence fully describing the distribution.

A new generation

Traditionally, random variables were just a tool for handling uncertainty mathematically. In recent times, however, vast improvements in computational technology mean that it is possible to actually generate values which follow pretty much any distribution you like. We call this random variate generation.

This is useful in simulation, a field of operational research in which physical systems, such as queues or traffic through road networks, are replicated. For example, to simulate a queue we may know that people arrive at intervals which follow some distribution. We can therefore draw values from this distribution and hence simulate people arriving. Similar ideas can be applied to the other parts of the service process, recreating the whole system. This can be used to test proposed operational set-ups, either completely new ones or alternatives to an existing one.
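
As a rough illustration, the sketch below simulates waiting times in a simple single-server queue using Lindley's recursion. The exponential interarrival and service rates are invented for illustration, not taken from any real system.

```r
set.seed(42)
n       <- 1000
arrive  <- rexp(n, rate = 1)    # interarrival times between customers
service <- rexp(n, rate = 1.25) # service times, slightly faster than arrivals

# Lindley's recursion: each customer's wait depends on the previous one's
wait <- numeric(n)
for (i in 2:n) {
  wait[i] <- max(0, wait[i - 1] + service[i - 1] - arrive[i])
}
mean(wait)  # average time spent queueing under this set-up
```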

Random variate generation also forms the basis of a very powerful statistical inference method known as Markov Chain Monte Carlo (MCMC), which helps solve the parameter inference problem highlighted above. Broadly, it allows us to sample values from the distribution of these parameters, from which we can draw conclusions about their possible values, such as which value we think is most likely and how certain we are of this.
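
To give a flavour, here is a minimal sketch of random-walk Metropolis, one of the simplest MCMC methods, inferring the mean of some normally distributed data. The data, prior and proposal step size are all invented for illustration.

```r
set.seed(7)
y <- rnorm(50, mean = 2, sd = 1)  # fake data with true mean 2

# Log-posterior: normal likelihood (known sd) times a vague normal prior
log_post <- function(mu) {
  sum(dnorm(y, mean = mu, sd = 1, log = TRUE)) +
    dnorm(mu, mean = 0, sd = 10, log = TRUE)
}

n_iter <- 5000
mu <- numeric(n_iter)
for (i in 2:n_iter) {
  proposal <- mu[i - 1] + rnorm(1, sd = 0.5)  # propose a nearby value
  # Accept with probability min(1, posterior ratio), else stay put
  if (log(runif(1)) < log_post(proposal) - log_post(mu[i - 1])) {
    mu[i] <- proposal
  } else {
    mu[i] <- mu[i - 1]
  }
}
mean(mu[-(1:500)])  # posterior mean estimate, after discarding burn-in
```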

Generating them

Rolling a dice is probably the purest way to generate random numbers. However, in many applications six values are not enough. In fact, very often we want to generate random numbers which come from continuous distributions, i.e. they could be any real number. This means we need to do something a little smarter. Thankfully, computers are there to help us.

A key idea in the generation of random variables is the probability integral transform, which essentially says there is one distribution, the uniform distribution (on the interval 0 to 1), from which any other can be generated, provided we can invert its distribution function. This means that all we really need to be able to generate are uniform random variables.
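
For example, the exponential distribution has a distribution function we can invert by hand, so exponential values can be generated directly from uniform ones, as in the sketch below.

```r
# Inverse transform sampling: push uniforms through the inverse CDF.
# For the exponential distribution, F(x) = 1 - exp(-rate * x),
# so the inverse is F^{-1}(u) = -log(1 - u) / rate.
u <- runif(10000)
x <- -log(1 - u) / 2  # exponential values with rate 2

# Compare with R's built-in generator: both means close to 1/rate = 0.5
mean(x)
mean(rexp(10000, rate = 2))
```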

This is a problem which many have looked to solve, and it has been done using applications of number theory. The numbers generated are referred to as pseudo-random numbers, since they are not truly random. They are instead produced by some iterative procedure which yields an extremely long list of numbers between 0 and 1, so long that the numbers we see do appear to be random. Any of the packages you may use to generate random variables, such as \(\texttt{R}\), will be driven by such pseudo-random number generators.
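
As a toy illustration, the sketch below implements one classic such procedure, a linear congruential generator with the well-known 'minimal standard' constants. Real packages use far more sophisticated generators (\(\texttt{R}\)'s default is the Mersenne Twister), so this is purely illustrative.

```r
lcg <- function(n, seed = 1) {
  m <- 2^31 - 1  # modulus (a Mersenne prime)
  a <- 16807     # multiplier
  out <- numeric(n)
  state <- seed
  for (i in 1:n) {
    state <- (a * state) %% m  # iterative rule: next state from current
    out[i] <- state / m        # scale to a number in (0, 1)
  }
  out
}

lcg(5)  # five pseudo-random numbers between 0 and 1
```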
