For a long time I was wondering how to implement a partial autocorrelation function (pacf) in Python. Maybe it was the temptation to choose the fastest, freely available algorithm, like the Levinson-Durbin algorithm and its variants, but the more I tried to read the references, the more discouraged I got. Even the description of the Levinson-Durbin algorithm in the Wikipedia article I find dense.

The simplest explanation I could find and follow, and from which I wrote a Python function for the spacf, is by Gidon Eshel, "Yule-Walker Equations for the AR coefficients."

In teaching econometrics, we try to go for the simplest algorithm that works and is easiest to describe. Then we point students to references for faster algorithms. No, I have not gotten around to teaching the time series part of basic econometrics, but I will do so in the second semester.

Basically, we are given time series data, and from it we can compute the sample autocorrelation function (sacf) for various lags.

### AutoRegressive model coefficients

Assume that we have a data vector X indexed from 0 to n-1 and consider the $$AR(p)$$, AutoRegressive model of order $$p$$, equation:

$$X_{i+1}= \phi_1 X_i + \phi_2 X_{i-1} + \cdots + \phi_p X_{i-p+1} + \epsilon_{i+1}$$

where $X_{i+k}$ is the time series value with lag $k$. We can estimate the coefficients in terms of least squares. As a particular example, assume that $n = 5$ and $p = 3$. Then the model data matrix $X$ and the dependent variable $Y$, written in augmented matrix form, are

$$\left[\begin{array}{ccc|c} X_2 & X_1 & X_0 & X_3 \\ X_3 & X_2 & X_1 & X_4 \end{array}\right]$$

Note that $Y = [X_p, X_{p+1}, \ldots, X_{n-1}]$. The column vector of lag $k$ is

$X_k = [X_{p-k}, \ldots, X_{n-1-k}]$,

and the least squares normal equations are given by $(X^t X) \Phi = X^t Y$. The unknown $\Phi$'s may then be obtained from the solution $\Phi = (X^t X)^{-1} X^t Y$.
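As an illustration (a sketch of mine, not part of the post's code; `ar_design` is a hypothetical helper name), the design matrix and the dependent variable can be assembled directly from the data vector:

```python
def ar_design(X, p):
    """Build the AR(p) least-squares design matrix A and target vector Y.
    Row i of A holds the p values preceding X[p + i], most recent first,
    matching the augmented matrix layout in the n = 5, p = 3 example."""
    A = [[X[i - k] for k in range(1, p + 1)] for i in range(p, len(X))]
    Y = X[p:]
    return A, Y

# n = 5, p = 3, with placeholder data X = [0, 1, 2, 3, 4]:
A, Y = ar_design([0, 1, 2, 3, 4], 3)
# A is [[2, 1, 0], [3, 2, 1]] (lag-1, lag-2, lag-3 columns) and Y is [3, 4]
```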

### Autocorrelations

This is already implemented in the sacf function in a previous post. For convenience we repeat it here. The function expects the vector X and the lag k:

```python
def sacf(X, k=1):
    """
    Computes the sample autocorrelation coefficient
    for given lag k.
    """
    if k == 0:
        return 1.0
    flen = float(len(X))
    xbar = float(sum(X)) / flen
    D = sum([(x - xbar) ** 2 for x in X])
    N = sum([(x - xbar) * (xtpk - xbar)
             for (x, xtpk) in zip(X[:-k], X[k:])])
    return N / D
```

### Relation between autocorrelations and AR coefficients $$\Phi$$

The relation between the autocorrelations and the $$\Phi$$ coefficients of the AR(p) model is given by the Yule-Walker equations

$$r_k = \phi_1 r_{k-1} + \phi_2 r_{k-2} + \cdots + \phi_p r_{k-p}, \qquad k = 1, 2, \ldots, p,$$

where $$r_0 = 1$$ and $$r_{-k} = r_k$$,

or more compactly as $$r = R \Phi$$. The matrix $$R$$ is symmetric and of full rank, hence invertible, which allows us to solve for $$\Phi = R^{-1} r.$$
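For instance, at lag 2 the system $$r = R \Phi$$ is only 2×2 and can be solved in closed form. The sketch below is my own helper, not the post's code; by Cramer's rule it returns both coefficients, and $$\phi_2$$ is exactly the partial autocorrelation at lag 2:

```python
def yule_walker_2(r1, r2):
    """Solve [[1, r1], [r1, 1]] . [phi1, phi2] = [r1, r2] by Cramer's rule.
    phi2 equals the partial autocorrelation at lag 2."""
    det = 1.0 - r1 * r1            # determinant of the 2x2 R matrix
    phi1 = (r1 - r1 * r2) / det
    phi2 = (r2 - r1 * r1) / det
    return phi1, phi2

# Using the first two sample autocorrelations of the Danao data set:
phi1, phi2 = yule_walker_2(0.925682317386, 0.852706579655)
# phi2 is about -0.02922, matching the lag-2 pacf in the output table
```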

### Computing Partial Correlation Coefficients from autocorrelations and AR(p) coefficients

The partial autocorrelations are now obtained by the following algorithm implementing the one described by Eshel:

Given acf - autocorrelation function values up to lag maxlag.

Find pcf - corresponding partial correlation function vector.

```
set up array pcf of size maxlag + 1
pcf[0] = 1.0
pcf[1] = acf[1]
for lag from 2 up to maxlag:
    compute R
    compute r
    compute Phi = R**(-1) r
    pcf[lag] = Phi[lag]   (the last element of Phi)
return pcf
```

### Python code

And finally here is our Python code, which has taken a long time to develop:

```python
# -*- coding: utf-8 -*-
"""
file     : spacf.py
author   : dr. ernesto p. adorio
version  : 0.0.1 april 29, 2010
reference: Gidon Eshel, Yule-Walker Equations for the AR coefficients.
"""
import math

import matlib
from gausselim import Solve


def acf(X, k=1):
    """
    Computes the sample autocorrelation coefficient
    for given lag k and input data X.
    """
    if k == 0:
        return 1.0
    flen = float(len(X))
    xbar = float(sum(X)) / flen
    D = sum([(x - xbar) ** 2 for x in X])
    N = sum([(x - xbar) * (xtpk - xbar)
             for (x, xtpk) in zip(X[:-k], X[k:])])
    return N / D


def sacf(X, maxlag):
    return [acf(X, i) for i in range(maxlag + 1)]  # includes lag 0


def Rmatrix(r, maxlag):
    """
    Creates the R matrix (Yule-Walker correlation matrix).
    Input: r      - sample acf values up to maxlag.
           maxlag - maximum lag.
    """
    R = matlib.matIdentity(maxlag)
    for i in range(0, maxlag):
        for j in range(i + 1, maxlag):
            R[i][j] = R[j][i] = r[j - i]
    return R


def spacf(acf, maxlag):
    pacfs = acf[:]
    for i in range(2, maxlag + 1):
        R = Rmatrix(acf, i)
        r = acf[1:i + 1]
        Phi = Solve(R, r)   # solution of R Phi = r
        pacfs[i] = Phi[-1]  # last element of vector Phi
    return pacfs


# data from Danao's text, page 323.
X = ['772.9', '909.4', '1080.3', '1276.2', '1380.6', '1354.8', '1096.9',
     '1066.7', '1108.7', '1109', '1203.7', '1328.2', '1380', '1435.3',
     '1416.2', '1494.9', '1525.6', '1551.1', '1539.2', '1629.1', '1665.3',
     '1708.7', '1799.4', '1873.3', '1973.3', '2087.6', '2208.3', '2271.4',
     '2365.6', '2423.3', '2416.2', '2484.8', '2608.5', '2744.1', '2729.3',
     '2695', '2826.7', '2958.6', '3115.2', '3192.4', '3187.1', '3248.8',
     '3166', '3279.1', '3489.9', '3585.2', '3676.5']


def Test1(maxlag):
    """
    Prints out values of acf and pacf for the Danao data set.
    """
    Y = [float(x) for x in X]
    r = sacf(Y, maxlag)
    pacfs = spacf(r, maxlag)
    print "The autocorrelations and partial autocorrelations"
    for i, (acf, pacf) in enumerate(zip(r, pacfs)):
        print i, acf, pacf


if __name__ == "__main__":
    Test1(17)
```

Given the vector of acf values for various lags, the corresponding pacf values are computed by the `spacf` function. Here is the output from the terminal when running python spacf.py:

```
The autocorrelations and partial autocorrelations
0 1.0 1.0
1 0.925682317386 0.925682317386
2 0.852706579655 -0.0292160394675
3 0.787096604484 0.0122016750938
4 0.737850083142 0.0784204182616
5 0.697253316633 0.0358005082792
6 0.64842031925 -0.071567073034
7 0.587527096625 -0.0988314678858
8 0.519141887224 -0.0843023226042
9 0.450228026064 -0.0629615959671
10 0.384896320219 -0.0480711212172
11 0.32584304195 -0.0201806609446
12 0.273845336962 0.00621787084071
13 0.216766465976 -0.0631790415256
14 0.156888401912 -0.0477940531702
15 0.0992408085419 -0.0204290294085
16 0.0477462812535 -0.0101131483561
17 -0.00206714577028 -0.0495417448475
```

The computed pacf values match those on page 324 of Danao's text, so we are confident the implementation above is correct. In practice the maximum lag used should not exceed N/4, where N is the length of the time series data vector.

To be revised. Latex formulas will be displayed as images in a future version of this post.

The gaussian elimination routine may be found in Python: gaussian elimination which includes the Solve routine for solving linear systems $$A x = b$$.
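For readers without that post handy, here is a minimal stand-in for the Solve routine (a sketch of mine, assuming the same `Solve(A, b)` calling convention used in spacf.py, not the actual gausselim code):

```python
def Solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting.
    A is a list of row lists, b a list; returns the solution as a list."""
    n = len(b)
    M = [row[:] + [bv] for row, bv in zip(A, b)]   # augmented matrix [A | b]
    for col in range(n):
        # swap in the row with the largest pivot for numerical stability
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    # back substitution on the upper-triangular system
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(M[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (M[r][n] - s) / M[r][r]
    return x
```

For example, `Solve([[2.0, 1.0], [1.0, 3.0]], [3.0, 5.0])` returns approximately `[0.8, 1.4]`.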

We will try to implement the recursive Durbin algorithm in a future post.
