### Pairs trading

A market-neutral trading strategy used by arbitrageurs to make money from the divergence and re-convergence of prices of a pair of equities.

Key to the strategy is an analysis of the historical movements of the prices of the two stocks: the historical data of the pair are statistically tested to see if a relationship exists between them.

### Correlation

One measure of the strength of this relationship is correlation. A high correlation between the two equities in a pair indicates that they tend to move together. Knowledge from the statistical tests is combined with knowledge of the market, industry or sector to take a market position on the two securities.
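As a rough illustration, the sketch below computes the Pearson correlation between the period-on-period returns of two price series. The prices are made up, the helper names are hypothetical, and correlating returns rather than price levels is an assumption, not something the text prescribes.

```python
import math

def simple_returns(prices):
    """Period-on-period simple returns of a price series."""
    return [b / a - 1.0 for a, b in zip(prices, prices[1:])]

def pearson_correlation(xs, ys):
    """Sample Pearson correlation between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

# Hypothetical closing prices for two stocks (illustrative only)
stock_a = [100.0, 101.0, 102.5, 101.8, 103.0, 104.2]
stock_b = [50.0, 50.6, 51.2, 50.9, 51.4, 52.1]

corr = pearson_correlation(simple_returns(stock_a), simple_returns(stock_b))
print(round(corr, 3))
```

A value close to 1 says the two return series moved together over the sample; it says nothing about whether the prices share a long-term equilibrium.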

### Cointegration

A far more robust test of the relationship between a pair of equities is cointegration. Cointegration is a statistical property of two or more time series that indicates whether the series have a long-term relationship. When two series are cointegrated, their prices may move away from each other in the short term, but they will in time revert to a mean value. This mean value is identified by the cointegrating regression of the two variables.

### Residuals

Suppose a sample of investment bankers is taken and the average bonus of the group is calculated. The amount by which each individual banker’s bonus differs from this average is called a residual or prediction error. Summing these residuals necessarily equals zero.
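The arithmetic can be checked with a few lines of Python. The bonus figures below are invented for illustration.

```python
# Hypothetical bonuses for a sample of five bankers (illustrative figures)
bonuses = [120_000.0, 85_000.0, 240_000.0, 60_000.0, 95_000.0]

average = sum(bonuses) / len(bonuses)
residuals = [bonus - average for bonus in bonuses]

print(average)         # 120000.0
print(residuals)       # deviations of each bonus from the average
print(sum(residuals))  # 0.0
```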

### Normality

Since the sum total of all residual values equals zero they display, by definition, mean-reverting characteristics and oscillate around zero. For pairs traders they will also, ideally, be normally distributed *both through time and around zero*. ArbMaker has configurable tools designed to assess the degree to which this is so for price spreads – mirrors of residuals. Forecasting and trading become easier when the most promising such cases can be identified. An example is this Q-Q Normality chart:

Analysing residuals for normality and mean-reversion is a crucial aspect of assessing the tradability of a pair. Simply identifying the presence of cointegration alone is not enough to achieve consistent results.
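A generic way to produce the points of a normal Q-Q plot is sketched below. It is not ArbMaker's proprietary normality measure. The function names and the simulated sample are hypothetical, and the correlation of the Q-Q points is used here only as one simple summary of how close the points lie to a straight line.

```python
import math
import random
from statistics import NormalDist, mean, pstdev

def qq_points(values):
    """Return (theoretical, observed) standardised quantile pairs for a normal Q-Q plot."""
    n = len(values)
    mu, sigma = mean(values), pstdev(values)
    observed = sorted((v - mu) / sigma for v in values)
    std_normal = NormalDist()
    # Plotting positions (i + 0.5) / n keep the probabilities strictly inside (0, 1)
    theoretical = [std_normal.inv_cdf((i + 0.5) / n) for i in range(n)]
    return list(zip(theoretical, observed))

def qq_correlation(points):
    """Pearson correlation of the Q-Q points; values near 1 suggest normality."""
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return cov / math.sqrt(sum((x - mx) ** 2 for x in xs) * sum((y - my) ** 2 for y in ys))

# Simulated residuals drawn from a normal distribution (illustrative only)
rng = random.Random(42)
sample = [rng.gauss(0.0, 1.0) for _ in range(500)]
points = qq_points(sample)
print(round(qq_correlation(points), 3))
```

Plotting `observed` against `theoretical` gives the chart itself; residuals that are close to normal fall near the 45-degree line.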

### Tradability

In the application of cointegration to pairs trading, the residuals are analysed to determine when the prices of the pair have deviated from their long-term mean and to estimate how long the series will take to revert to it. This is, in essence, the process of gauging tradability. To assist this judgement, ArbMaker uses normality filters during the scanning process that help return the most tradeable pairs, in addition to reporting mean reversion statistics.
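One simple way to express "how far from the mean" is a z-score of the residuals. The sketch below assumes that approach. The residual values and the 1.5 threshold are hypothetical, and none of this is presented as how ArbMaker gauges tradability.

```python
from statistics import mean, pstdev

def zscores(residuals):
    """Standardise residuals: distance from the mean in standard deviations."""
    mu = mean(residuals)
    sigma = pstdev(residuals)
    return [(r - mu) / sigma for r in residuals]

# Hypothetical residuals from a cointegrating regression
residuals = [0.2, -0.1, 0.4, 1.9, 0.3, -0.5, -2.1, 0.1, -0.2, 0.0]

threshold = 1.5  # hypothetical cut-off, not a recommendation
flags = [i for i, z in enumerate(zscores(residuals)) if abs(z) > threshold]
print(flags)  # [3, 6]
```

The flagged observations are the points at which the spread is unusually wide relative to its own history.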

### Cointegrating regression

A cointegrating regression, like a simple linear regression, regresses one series on one or more others and uses output statistics from the regression to determine whether they are cointegrated. There are, however, important considerations when interpreting outputs.

Among the outputs are the mean reversion coefficient, beta (β), the coefficient of determination (R^{2}/R-squared) and t-values, all of which are used to interpret the degree and nature of the cointegration, if any.
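A minimal sketch of the regression step alone, assuming ordinary least squares with a single regressor, might look like this. The function name `ols_fit` and the price data are hypothetical; t-values and the mean reversion coefficient are not computed here, and a full cointegration test would also check the residuals for stationarity.

```python
def ols_fit(xs, ys):
    """Ordinary least squares of y on x; returns (alpha, beta, r_squared, residuals)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    beta = sxy / sxx                   # slope of the fitted line
    alpha = mean_y - beta * mean_x     # intercept
    residuals = [y - (alpha + beta * x) for x, y in zip(xs, ys)]
    ss_res = sum(r * r for r in residuals)
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    r_squared = 1.0 - ss_res / ss_tot  # share of variation in y explained by x
    return alpha, beta, r_squared, residuals

# Hypothetical prices: Y is roughly 2 + 1.15 * X plus small deviations
prices_x = [10.0, 11.0, 12.0, 13.0, 14.0]
prices_y = [13.6, 14.45, 15.85, 17.1, 18.0]

alpha, beta, r_squared, residuals = ols_fit(prices_x, prices_y)
print(round(beta, 3), round(alpha, 2), round(r_squared, 4))
```

The residuals returned here are the series that is subsequently examined for mean reversion and normality.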

### Stationarity

A prerequisite to running a cointegrating regression is testing that the time series are integrated of the same order. The order of integration of a time series tells us about the stochastic properties of the series over time. In simple terms, a stationary series has a constant mean and a constant variance over time.

Stationarity is a desirable property in analysing and forecasting time series. The order of integration of a series is the number of times it must be differenced to make it stationary. Most economic and financial series are integrated of order one, or have a single ‘unit root’ – i.e. they are differenced once to achieve stationarity.
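Differencing itself is a simple operation, sketched below. The shock values are invented: accumulating them produces a series that is integrated of order one, and differencing it once recovers the original shocks.

```python
from itertools import accumulate

def difference(series, order=1):
    """Difference a series `order` times (each pass shortens it by one point)."""
    out = list(series)
    for _ in range(order):
        out = [b - a for a, b in zip(out, out[1:])]
    return out

# Hypothetical shocks; their running total behaves like a price-type series
shocks = [0.5, -1.0, 0.25, 0.75, -0.5]
random_walk = list(accumulate(shocks))  # [0.5, -0.5, -0.25, 0.5, 0.0]

recovered = difference(random_walk)
print(recovered)  # [-1.0, 0.25, 0.75, -0.5]
```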

### Mean reversion coefficient

Also called an ‘adjustment coefficient’, it indicates the expected ‘speed’ of mean reversion per time period. Thus a mean reversion coefficient of -0.45 suggests that 45% of the current deviation from equilibrium is expected to be closed one time period from now.

Absolute mean reversion coefficient values between 0 and 1 suggest stable systems; absolute values greater than one are less stable and also suggest over-shoot. For a cointegrating equation to be valid the mean reversion coefficient should be between zero and -1. The negative sign indicates adjustment back towards equilibrium. Mean reversion coefficients greater than 0 and 1 in a stationary time series contain autocorrelation.
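One common way to estimate such a coefficient is to regress the one-period change in the residual on its previous level. That method is an assumption here, since the text does not say how ArbMaker computes it. The sketch uses a made-up residual series that closes exactly 45% of its gap each period, so the estimate should come out at -0.45.

```python
def mean_reversion_coefficient(residuals):
    """Slope from regressing the change (e_t - e_{t-1}) on the lagged level e_{t-1}, no intercept."""
    lagged = residuals[:-1]
    changes = [b - a for a, b in zip(residuals, residuals[1:])]
    return sum(l * c for l, c in zip(lagged, changes)) / sum(l * l for l in lagged)

# Hypothetical residuals: each period the deviation shrinks by 45%
residuals = [1.0]
for _ in range(5):
    residuals.append(residuals[-1] * (1 - 0.45))

coef = mean_reversion_coefficient(residuals)
print(round(coef, 2))  # -0.45
```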

### Half-life

The mean reversion coefficient is the basis for the half-life calculation in ArbMaker:

t½ = ln(2) / |mean reversion coefficient|

where ln is the natural logarithm, so ln(2) ≈ 0.693. Half-life is not a magic number, and values derived from mean reversion coefficients with an absolute value greater than 1 deserve scepticism. ArbMaker outputs the half-lives of pairs, but the quantity and timing of the cross-overs on the residual chart, in combination with ArbMaker’s proprietary normality measures, should not be ignored and are frequently more helpful.
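As a worked example, the calculation can be written as follows. The absolute value is taken so that a negative coefficient yields a positive half-life; that sign convention is an assumption, and the function name is hypothetical.

```python
import math

def half_life(mean_reversion_coefficient):
    """Half-life in time periods: ln(2) divided by the size of the coefficient."""
    if mean_reversion_coefficient == 0:
        raise ValueError("a zero coefficient implies no mean reversion")
    return math.log(2) / abs(mean_reversion_coefficient)

print(round(half_life(-0.45), 2))  # 1.54
```

With a coefficient of -0.45 the formula gives a half-life of about 1.5 time periods.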

### Beta (**β**)

The cointegrating coefficient or factor; and slope of the line. A beta of 1.15 suggests for every move of 1 in X, Y will adjust by 1.15.

### Coefficient of determination (R^{2}/R-squared)

An output of most regressions, it measures the proportion of the variation in Y that is explained by X. R-squared, however, is only one of the outputs from the cointegrating regression that must be evaluated in reaching a conclusion and taking a market position.

### Modelling time series – lags, autocorrelation and partial autocorrelation

Most data on economic and financial variables are time series. A time series is a sequence of observations of a variable measured at successive intervals in time – daily, weekly, monthly, quarterly or annual, for example. A first step in the statistical analysis of equity prices for pairs is selecting an appropriate model of the series generated over time by the economic or financial phenomenon.

### Autocorrelation

Autocorrelation is used to identify an appropriate model in studying or analysing a time series. Autocorrelation answers the question, “Is there a (linear) relationship between the value of a series now and the value of the same series one or more time periods in the past?”

Positive autocorrelation is the tendency for a given condition to persist – for example, during rainy season tomorrow is likely to be raining and wet if today is raining and wet. In equity markets a parallel might be price momentum – especially in the days after financial results. Positive reaction is followed by positive reaction, and negative by negative. But why is it important for pairs traders? This Partial Autocorrelation Function graph helps explain:

Negative autocorrelation, on the other hand, is the tendency for negative observations to be followed by positive observations and vice versa. Think of an exam situation. A student spends too much time on the first question, senses time slipping away and rushes through the second, not spending enough time to get full marks. On the third he feels he is back on track and again spends too long finishing. Come the fourth question he tries to recoup the time once more… and so on, with the cycle repeated for the entire exam.
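The sample autocorrelation at a given lag can be computed directly, as in the sketch below. The two series are invented to show the contrast: a steadily rising series gives a positive lag-one value, and a strictly alternating series gives a negative one.

```python
def autocorrelation(series, lag):
    """Sample autocorrelation of a series at the given lag (lag >= 1)."""
    n = len(series)
    mu = sum(series) / n
    denom = sum((v - mu) ** 2 for v in series)
    num = sum((series[t] - mu) * (series[t - lag] - mu) for t in range(lag, n))
    return num / denom

trending = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]          # persistence
alternating = [1.0, -1.0, 1.0, -1.0, 1.0, -1.0, 1.0, -1.0]   # reversal

print(autocorrelation(trending, 1))     # 0.625
print(autocorrelation(alternating, 1))  # -0.875
```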

### Autoregression & Lags

A time series may, among others, be modelled and forecast using a white noise model, a moving average (MA) model, an autoregressive (AR) model or a combination of an MA and an AR model (ARMA). The dominant model in ArbMaker is the AR.

An AR model is a linear regression of the current value (y_{t}) of a time series against one or more prior values of the series (y_{t-1}, y_{t-2}, etc.), with t = 1, 2, 3 … T. The values of the time series before the present are characterised as lags: the value one time period back, y_{t-1}, is referred to as lag one; that two time periods back, y_{t-2}, is lag two; and so on.

Ignoring lag selection will distort results (in the worst case materially flattering them) and lead to poor trade decisions.
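To make the idea of regressing a series on its own lag concrete, here is a minimal sketch of fitting an AR model of order one by ordinary least squares. The function name and the data are hypothetical; the series is built from y_t = 1 + 0.5·y_{t-1} with no noise, so the fit should recover those two numbers.

```python
def fit_ar1(series):
    """OLS fit of y_t = c + phi * y_{t-1}; returns (c, phi)."""
    lagged = series[:-1]   # y_{t-1}: the lag-one values
    current = series[1:]   # y_t: the values being explained
    n = len(current)
    mean_lag = sum(lagged) / n
    mean_cur = sum(current) / n
    sxx = sum((x - mean_lag) ** 2 for x in lagged)
    sxy = sum((x - mean_lag) * (y - mean_cur) for x, y in zip(lagged, current))
    phi = sxy / sxx
    c = mean_cur - phi * mean_lag
    return c, phi

# Hypothetical noise-free AR(1) series: 10, 6, 4, 3, 2.5, 2.25
series = [10.0]
for _ in range(5):
    series.append(1.0 + 0.5 * series[-1])

c, phi = fit_ar1(series)
print(round(c, 6), round(phi, 6))
```

Note that using one lag costs one observation: six data points yield only five (y_{t-1}, y_t) pairs, and each additional lag removes another.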

### Partial Autocorrelation

ArbMaker uses a partial autocorrelation calculation to determine the appropriate lag order of the AR model. If the sample autocorrelation plot indicates that an AR model may be appropriate, then the sample partial autocorrelation is calculated and plotted to identify the order of the model. The order is the number of lags to include as regressors in the model and is shown on a PACF plot at the point where the partial autocorrelations essentially become zero. Lagged pairs are generally tougher to trade because normalising their residuals means offsetting data series in the recalculation of the regressions necessary to test for cointegration. This in turn means losing observation points from the graphical representations of the relationship. For trading purposes that is not very helpful. ArbMaker thus automatically filters out pairs with higher lag orders from its returns.
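Partial autocorrelations can be derived from ordinary autocorrelations with the Durbin-Levinson recursion, sketched below. This is a generic textbook method and is not presented as ArbMaker's calculation. The function names and the 0.1 cut-off are hypothetical, and the input is the theoretical autocorrelation of an AR model of order one with coefficient 0.5, for which the partial autocorrelation should vanish after lag one.

```python
def pacf_from_acf(rho):
    """Durbin-Levinson recursion. rho[k-1] is the autocorrelation at lag k.

    Returns the partial autocorrelations for lags 1..len(rho).
    """
    pacf = []
    prev = []  # AR coefficients of the order k-1 fit
    for k in range(1, len(rho) + 1):
        if k == 1:
            phi_kk = rho[0]
        else:
            num = rho[k - 1] - sum(prev[j] * rho[k - 2 - j] for j in range(k - 1))
            den = 1.0 - sum(prev[j] * rho[j] for j in range(k - 1))
            phi_kk = num / den
        # Update the lower-order coefficients, then append the new last one
        prev = [prev[j] - phi_kk * prev[k - 2 - j] for j in range(k - 1)] + [phi_kk]
        pacf.append(phi_kk)
    return pacf

def suggest_order(pacf, cutoff):
    """Largest lag whose partial autocorrelation exceeds the cut-off in size (0 if none)."""
    significant = [lag for lag, value in enumerate(pacf, start=1) if abs(value) > cutoff]
    return max(significant) if significant else 0

# Theoretical autocorrelations of an AR(1) with coefficient 0.5: 0.5, 0.25, 0.125
acf_values = [0.5, 0.25, 0.125]
pacf_values = pacf_from_acf(acf_values)
order = suggest_order(pacf_values, cutoff=0.1)  # hypothetical cut-off
print(pacf_values, order)  # [0.5, 0.0, 0.0] 1
```

In practice the cut-off is usually tied to the sample size rather than fixed, and the autocorrelations are estimated from data, so the partial autocorrelations beyond the true order are small rather than exactly zero.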