Testing Independence for Multivariate Time Series

Over the next couple of weeks I will be attending a series of lectures from researchers here at Lancaster University. They will present to me and my peers an overview of their research area and their specific interests. The topics covered are diverse in their content and applications; however, they all come under the overarching title of 'Statistics and Operational Research'. The idea is that I will get a broad overview of the diverse areas of interest here, guiding my choice of topics for a couple of reports I am due to produce in the coming months. I will be writing blog posts over the next two weeks to highlight some interesting things I have picked up from these talks. The first research area we were introduced to was time series/business analytics.

What are time series?

A time series is essentially a collection of data recorded sequentially at regular intervals. For example, this could be measurements of the temperature in your living room every second, every hour or every evening. A multivariate time series is a collection of these time series running in parallel: for example, temperature measurements of your living room, bedroom and kitchen taken over the same period. A usual way of examining such data is to look for dependencies between the observations. There are a couple of ways dependencies could arise. Firstly, the temperature in a room now may depend somehow on the temperature ten minutes before, i.e. dependence on past observations. Secondly, the temperature in the living room may depend on the temperature in the kitchen, i.e. dependence between the different time series. You may also expect a combination of the two; for example, the kitchen being hot right now might mean that the living room heats up a few minutes later.
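
To make this concrete, here is a small simulated example (made-up numbers, not real temperature data): a toy bivariate series in which the kitchen depends on its own past and the living room follows the kitchen with a three-step delay.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 500

# Kitchen: an AR(1)-style process around 20 degrees
# (dependence on its own past observations).
kitchen = np.empty(n)
kitchen[0] = 20.0
for t in range(1, n):
    kitchen[t] = 20.0 + 0.8 * (kitchen[t - 1] - 20.0) + rng.normal(0, 0.5)

# Living room: follows the kitchen with a 3-step delay
# (dependence between the two component series).
living = np.empty(n)
living[:3] = 19.0
for t in range(3, n):
    living[t] = 19.0 + 0.6 * (kitchen[t - 3] - 20.0) + rng.normal(0, 0.5)

# Stack the components column-wise to get a bivariate time series.
X = np.column_stack([kitchen, living])
print(X.shape)  # (500, 2)
```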

To analyse multivariate time series we therefore need ways of measuring these dependencies. A usual measure is the correlation/covariance matrix. These are nice in that they are relatively intuitive and they have consistent estimators (ones that get more accurate with more data) that can be used to make inferences from data. The drawback is that they only capture linear dependence, meaning they cannot pick up on more complicated relationships between observations; a good example is the relationship between temperature and energy consumption discussed in a previous post. Furthermore, since the correlation takes a value between -1 and 1, going from strongly negative (-1) to strongly positive (1), it would make sense for 0 to represent that the data are independent, which unfortunately is not the case. However, whilst attending the talk last week by Prof Konstantinos Fokianos, I was directed to some work he did with Maria Pitsillou (2018) [1]. They propose the so-called matrix auto-distance covariance/correlation functions as an alternative that alleviates this pitfall.

Examining the drawbacks of autocovariance as a measure of dependence

Let \(\mathbf{X}_t = \{X_{t;r} : t \in \mathbb{Z}, r=1, \dots , d\}\) be a \(d\)-dimensional stationary time series with mean \(\mathbf{\mu}\). The autocovariance matrix of \(\{ \mathbf{X}_t \}\), at lag \(j\), is given by

\(\Gamma(j) = \mathrm{E} \left[ (\mathbf{X}_{t} -\mathbf{\mu})(\mathbf{X}_{t+j} -\mathbf{\mu})^{'}\right] = \Big[ \gamma_{a,b}(j)\Big]_{a,b=1}^d\).
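
In practice \(\Gamma(j)\) is estimated from data. As a rough sketch of how that works (this is just the standard sample autocovariance applied to the toy array X simulated above, not code from the paper):

```python
import numpy as np

def sample_autocov(X, j):
    """Standard sample autocovariance matrix at lag j for an (n, d) array X.

    Entry (a, b) estimates Cov(X_{t;a}, X_{t+j;b})."""
    n, _ = X.shape
    Xc = X - X.mean(axis=0)              # centre each component series
    # Average the outer products (X_t - mean)(X_{t+j} - mean)' over t,
    # dividing by n (the usual convention).
    return Xc[: n - j].T @ Xc[j:] / n

# Lag-3 autocovariance of the simulated kitchen/living-room series above:
# the (kitchen, living) entry should be clearly non-zero, since the living
# room follows the kitchen three steps later.
# print(sample_autocov(X, 3))
```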

The important thing to observe is that each entry of the autocovariance matrix is a term of the form \(\gamma_{a,b}(j) = \mathrm{Cov}(X_{t;a},X_{t+j;b})\), i.e. the covariance between the \(a\)th time series at time \(t\) and the \(b\)th time series at time \(t+j\). I will now illustrate how this measure can miss some non-linear dependencies. Suppose we have some \(X \sim \mathrm{N}(0, \sigma^2)\) and define the random variable \(Y=X^2\). Clearly \(X\) and \(Y\) are dependent, since if I tell you that \(X=x\) then one must have \(Y=x^2\). However, we have the following result

\( \mathrm{Cov}(X,Y) = \mathrm{E}[ XY ] - \mathrm{E}[X] \mathrm{E}[Y] = \mathrm{E}[ X^3] - 0 \cdot \mathrm{E}[X^2] = 0, \)

since \(\mathrm{E}[X^3]=0\), the pdf of \(X\) being an even function (so \(x^3 f(x)\) is odd and integrates to zero). This counterexample shows that two random variables having zero covariance/correlation does not imply that they are independent.
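
A quick simulation backs this up numerically (the choice \(\sigma = 2\) is arbitrary): the sample covariance and correlation sit near zero even though \(Y\) is completely determined by \(X\).

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.normal(0.0, 2.0, size=100_000)   # X ~ N(0, sigma^2), here sigma = 2
y = x ** 2                               # Y is a deterministic function of X

# Both hover around zero despite the perfect dependence.
print(np.cov(x, y)[0, 1])        # close to 0, up to sampling noise
print(np.corrcoef(x, y)[0, 1])   # likewise close to 0
```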

The proposed alternative

To measure the dependence between \(X_{t;a}\) and \(X_{t+j;b}\), they propose the pairwise auto-distance covariance function \(V_{a,b}(j)\), defined through its square

\(\displaystyle V_{a,b}^2(j) = \frac{1}{\pi^2} \int_{\mathbb{R}^2} \frac{|\mathrm{Cov}(e^{iuX_{t;a}}, e^{ivX_{t+j;b}})|^2}{|u|^2|v|^2} \: dudv \).

This looks, to me at least, a lot more complicated than the ordinary autocovariance; you may or may not agree. Either way, it has a really nice property: if it is zero then \(X_{t;a}\) and \(X_{t+j;b}\) are independent (and vice versa). Fokianos and Pitsillou also go on to present estimators for these quantities, so the measure can be used in practice by calculating the estimates from the data and seeing how close they are to zero. If they are close enough to zero, there is no evidence against independence in the data! The drawback of this measure is that it assumes the time series are stationary. Roughly speaking, this means that their mean and dependence structure do not change over time (in particular, there is no long-term trend), which is quite a big assumption. Nonetheless, I thought it was a nice result.
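
To give a feel for what such an estimate might look like, here is a minimal sketch of the standard sample (squared) distance covariance, built from the double-centred distance matrices of Székely and Rizzo, applied to a lagged pair of components. This is only an illustration of the idea, not the exact estimator or test statistic from the paper, and the helper names are my own.

```python
import numpy as np

def dcov_sq(x, y):
    """Biased sample estimate of the squared distance covariance between two
    univariate samples, via double-centred pairwise distance matrices.
    Its population counterpart is zero exactly when the two variables
    are independent."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    a = np.abs(x[:, None] - x[None, :])                      # |x_k - x_l|
    b = np.abs(y[:, None] - y[None, :])                      # |y_k - y_l|
    A = a - a.mean(axis=0) - a.mean(axis=1)[:, None] + a.mean()
    B = b - b.mean(axis=0) - b.mean(axis=1)[:, None] + b.mean()
    return (A * B).mean()

def pairwise_adcv_sq(X, j, a, b):
    """Squared distance covariance between component a at time t and
    component b at time t + j, for an (n, d) array X."""
    n = X.shape[0]
    return dcov_sq(X[: n - j, a], X[j:, b])

# The X, Y = X^2 example again: the ordinary covariance was ~0,
# but the distance covariance is clearly positive.
rng = np.random.default_rng(2)
x = rng.normal(0.0, 2.0, size=1000)
print(dcov_sq(x, x ** 2))   # noticeably greater than zero
```

On the toy kitchen/living-room series from earlier, pairwise_adcv_sq(X, 3, 0, 1) should come out clearly positive, whereas at lags where nothing is going on it should sit near zero; the paper then turns quantities like these into a formal test with proper critical values.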

[1] K. Fokianos and M. Pitsillou, Testing independence for multivariate time series via the auto-distance correlation matrix, Biometrika, Volume 105, Issue 2, June 2018, Pages 337–352, https://doi.org/10.1093/biomet/asx082
