Given a time-series vector Y, and corresponding time vector T, the Dickey-Fuller test for the presence of a unit root or non-stationarity works with the following test equations (models):
basic: $$\Delta y_t = \tau y_{t-1}$$
with a constant: $$\Delta y_t = \beta + \tau + y_{t-1}$$
with constant and trend: $$\Delta y_t = \beta_1 + \beta_2 T_t + \tau y_{t-1}$$
Each model require its own particular Dickey-Fuller table of critical values which is related to but different from the t-statistical distribution.
The full Augmented Dickey-Fuller test works with the following models given a specified max lag of the differenced vector $$\Delta Y$$
basic: $$\Delta y_t = \tau y_{t-1} + \rho_1 \Delta Y_{t-1}+ \rho_2 \Delta y_{t-2} +\ldots + \rho_p \Delta y_{t-p}$$
with a constant: $$\Delta y_t = \beta + \tau y_{t-1} + \rho \Delta y_{t-1}+ \rho_2 \Delta y_{t-2} + \ldots + \rho_p \Delta y_{t-p}$$
with constant and trend: $$\Delta y_t = \beta_1 + \beta_2 T_t + \tau y_{t-1}+\rho \Delta Y_{t-1}+ \rho_2 \Delta y_{t-2} + \ldots + \rho_p \Delta y_{t-p}$$
Thus if we have to be careful in specifying the input X matrix and the independent response vector to an OLS routine.
The test statistic Dickey-Fuller test in all models is the value $${\frac{\hat{\tau}}{se (\hat{\tau})}$$ where the denominator is the standard error of the computed coefficient of the lagged term $$y_{t-1}.$$
There are published tables for the critical values of the D-F test.
@@@ Place references here!!!@@@
Unlike the well known standard distribution such as the normal, t, F and $$\chi^2$$ distributions, the critical values are obtained by simulation.
The NUll hypothesis is that $$H_0: \tau = 0$$, i.e, the series $$Y$$ is non-stationary. If the absolute values of the estimated $\tau$$ is less than the absolute value of the critical values for specified $$\alpha$$, we do not reject the Null hypothesis, otherwise we reject the Null hypothesis.
Time for an example: Let us review Danao's Dicker-Fuller test applied on the GNP data in page 323. For convenience we copy it here:
YEAR GNPR PCER PGNP
1940 772.9 502.6 0.1299
1941 909.4 531.1 0.138003
1942 1080.3 527.6 0.147181
1943 1276.2 539.9 0.150995
1944 1380.6 557.1 0.153122
1945 1354.8 592.7 0.157514
1946 1096.9 655 0.193637
1947 1066.7 666.6 0.220493
1948 1108.7 681.8 0.235952
1949 1109 695.4 0.234806
1950 1203.7 733.2 0.239512
1951 1328.2 748.7 0.251016
1952 1380 771.4 0.254783
1953 1435.3 802.5 0.258901
1954 1416.2 822.7 0.263028
1955 1494.9 873.8 0.271523
1956 1525.6 899.8 0.280676
1957 1551.1 919.7 0.290761
1958 1539.2 932.9 0.296778
1959 1629.1 979.4 0.30434
1960 1665.3 1005.1 0.309434
1961 1708.7 1025.2 0.312401
1962 1799.4 1069 0.319329
1963 1873.3 1108.4 0.323974
1964 1973.3 1170.6 0.329296
1965 2087.6 1236.4 0.337756
1966 2208.3 1298.9 0.34959
1967 2271.4 1337.7 0.359426
1968 2365.6 1405.9 0.377367
1969 2423.3 1456.7 0.397763
1970 2416.2 1492 0.420288
1971 2484.8 1538.8 0.443778
1972 2608.5 1621.9 0.464942
1973 2744.1 1689.6 0.495354
1974 2729.3 1674 0.539626
1975 2695 1711.9 0.593098
1976 2826.7 1803.9 0.6307
1977 2958.6 1883.8 0.672784
1978 3115.2 1961 0.722169
1979 3192.4 2004.4 0.785678
1980 3187.1 2000.4 0.857206
1981 3248.8 2024.2 0.939608
1982 3166 2050.7 1
1983 3279.1 2146 1.038608
1984 3489.9 2246.3 1.078827
1985 3585.2 2324.5 1.115168
1986 3676.5 2418.6 1.144703
Using the GNP and Year columns only, the model to fit in R notation is Delta(GNPR)_{t-1} ~ 1 + Year + GNPR_{t-1}. The response variable is the lagged differenced of GNPR and the independent variable are a constant, the year and the lagged GNPR. When we use our Python program, it outputs the following:
[-12794.909494518186, 6.6503686622997975, -0.098105000002848719]
[8538.0875917954763, 4.4232851542101859, 0.073658549799719808]
With drift-trend: df test_statistic -1.33188883395
The first output array are the computed estimated slope coefficients. The next array list the corresponding standard errors. The computed DF statistic matches that of Danao.
Now let us specify a max lag for the differenced responsse variable. The computed input X matrix is given by the following (we added labels for clarity):
C Year Ylag delYlag delY
1.0000 1942.0000 909.4000 34.4000 | 170.9000
1.0000 1943.0000 1080.3000 25.0000 | 195.9000
1.0000 1944.0000 1276.2000 -91.5000 | 104.4000
1.0000 1945.0000 1380.6000 -130.2000 | -25.8000
1.0000 1946.0000 1354.8000 -232.1000 | -257.9000
1.0000 1947.0000 1096.9000 227.7000 | -30.2000
1.0000 1948.0000 1066.7000 72.2000 | 42.0000
1.0000 1949.0000 1108.7000 -41.7000 | 0.3000
1.0000 1950.0000 1109.0000 94.4000 | 94.7000
1.0000 1951.0000 1203.7000 29.8000 | 124.5000
1.0000 1952.0000 1328.2000 -72.7000 | 51.8000
1.0000 1953.0000 1380.0000 3.5000 | 55.3000
1.0000 1954.0000 1435.3000 -74.4000 | -19.1000
1.0000 1955.0000 1416.2000 97.8000 | 78.7000
1.0000 1956.0000 1494.9000 -48.0000 | 30.7000
1.0000 1957.0000 1525.6000 -5.2000 | 25.5000
1.0000 1958.0000 1551.1000 -37.4000 | -11.9000
1.0000 1959.0000 1539.2000 101.8000 | 89.9000
1.0000 1960.0000 1629.1000 -53.7000 | 36.2000
1.0000 1961.0000 1665.3000 7.2000 | 43.4000
1.0000 1962.0000 1708.7000 47.3000 | 90.7000
1.0000 1963.0000 1799.4000 -16.8000 | 73.9000
1.0000 1964.0000 1873.3000 26.1000 | 100.0000
1.0000 1965.0000 1973.3000 14.3000 | 114.3000
1.0000 1966.0000 2087.6000 6.4000 | 120.7000
1.0000 1967.0000 2208.3000 -57.6000 | 63.1000
1.0000 1968.0000 2271.4000 31.1000 | 94.2000
1.0000 1969.0000 2365.6000 -36.5000 | 57.7000
1.0000 1970.0000 2423.3000 -64.8000 | -7.1000
1.0000 1971.0000 2416.2000 75.7000 | 68.6000
1.0000 1972.0000 2484.8000 55.1000 | 123.7000
1.0000 1973.0000 2608.5000 11.9000 | 135.6000
1.0000 1974.0000 2744.1000 -150.4000 | -14.8000
1.0000 1975.0000 2729.3000 -19.5000 | -34.3000
1.0000 1976.0000 2695.0000 166.0000 | 131.7000
1.0000 1977.0000 2826.7000 0.2000 | 131.9000
1.0000 1978.0000 2958.6000 24.7000 | 156.6000
1.0000 1979.0000 3115.2000 -79.4000 | 77.2000
1.0000 1980.0000 3192.4000 -82.5000 | -5.3000
1.0000 1981.0000 3187.1000 67.0000 | 61.7000
1.0000 1982.0000 3248.8000 -144.5000 | -82.8000
1.0000 1983.0000 3166.0000 195.9000 | 113.1000
1.0000 1984.0000 3279.1000 97.7000 | 210.8000
1.0000 1985.0000 3489.9000 -115.5000 | 95.3000
1.0000 1986.0000 3585.2000 -4.0000 | 91.3000
Our program returned the following slopes/st. errors:
[466.8367900006233, -0.22813001176738901, 0.020909396153443754, 0.50373782051647442]
[8182.7301885576253, 4.2390077773331614, 0.070306560324028558, 0.12685872804481083]
With drift-trend: df test_statistic 0.297403201879
The result does not match that of Danao's published result of -2.436537!! Testing this on R we get the following summary:
> fit <-lm( dely ~ 1 + year + ylag +dely1 ) > summary(fit)
Call:
lm(formula = dely ~ 1 + year + ylag + dely1)
Residuals:
Min 1Q Median 3Q Max
-192.206 -32.704 3.383 39.123 137.138
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 466.83679 8182.73019 0.057 0.954781
year -0.22813 4.23901 -0.054 0.957342
ylag 0.02091 0.07031 0.297 0.767660
dely1 0.50374 0.12686 3.971 0.000282 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 68.08 on 41 degrees of freedom
Multiple R-squared: 0.3353, Adjusted R-squared: 0.2866
F-statistic: 6.893 on 3 and 41 DF, p-value: 0.000728
Our slope(beta) and se(std.error) for ylag matches that of R. This is contrast with Danao's published output:
Std.Error
(Intercept) 124.1005 49.00592
@TREND(1940) 10.71098 4.127310
GNPR(-1) -0.168378 0.969105
D(GNPR(-1) 0.452307 0.140006
and with computed ADF test statistic of -2,436537.(Read p.331). We will find out if there are errors.
Here is the complete Python program to compute the test statistic of the DF and ADF test.
""" file adftest.py author Dr. Ernesto Adorio U.P. Clark, Pampanga ernesto.adorio@gmail.com Version 0.0.1 June 20, 2010 """ from lm import * def adftest(Y, t=None, withintercept=False, maxp=0): """ Description Computes Dickey-Fuller Unit test values. Arguments Y - input vector t - time vector, None if no trend is to be incorporated. withintercept - true if there is a constant in model. maxp maximum lag for differences. Return tau, coefficient of lagged variable Y(-1). se(tau), standard error of coefficient. n- number of data points used. """ # Form matrix of columns corresponding to delY-1 to delY-p # Be simple! n = len(Y) delyp = [0]* (maxp+1) delyp[0] = [Y[i]-Y[i-1] for i in range(1, n)] for p in range(1, maxp+1): delyp[p] = [delyp[p-1][i] - delyp[p-1][i-1] for i in range(1,len(delyp[p-1]))] # Adjust differences to same length. for p in range(0, maxp+1): delyp[p] = delyp[p][maxp-p:] # Form the X design matrix. X = [] for i in range(maxp + 1, n): x = [] if withintercept: x.append(1) if t: x.append(t[i]) x.append(Y[i-1]) for p in range(1, maxp+1): x.append(delyp[p][i-maxp-1]) X.append(x) mataugprint(X, delyp[0]) fit = ols( X, delyp[0]) ylagindex = 0 if withintercept: ylagindex+=1 if t: ylagindex+=1 print "yindex = ", ylagindex tau = fit.betas[ylagindex] se = fit.sdbetas[ylagindex] print fit.betas print fit.sdbetas return tau, se, len(X) if __name__ == "__main__": data = """YEAR GNPR PCER PGNP 1940 772.9 502.6 0.1299 1941 909.4 531.1 0.138003 1942 1080.3 527.6 0.147181 1943 1276.2 539.9 0.150995 1944 1380.6 557.1 0.153122 1945 1354.8 592.7 0.157514 1946 1096.9 655 0.193637 1947 1066.7 666.6 0.220493 1948 1108.7 681.8 0.235952 1949 1109 695.4 0.234806 1950 1203.7 733.2 0.239512 1951 1328.2 748.7 0.251016 1952 1380 771.4 0.254783 1953 1435.3 802.5 0.258901 1954 1416.2 822.7 0.263028 1955 1494.9 873.8 0.271523 1956 1525.6 899.8 0.280676 1957 1551.1 919.7 0.290761 1958 1539.2 932.9 0.296778 1959 1629.1 979.4 0.30434 1960 1665.3 1005.1 0.309434 1961 1708.7 1025.2 0.312401 1962 1799.4 1069 0.319329 1963 1873.3 1108.4 0.323974 1964 1973.3 1170.6 0.329296 1965 2087.6 1236.4 0.337756 1966 2208.3 1298.9 0.34959 1967 2271.4 1337.7 0.359426 1968 2365.6 1405.9 0.377367 1969 2423.3 1456.7 0.397763 1970 2416.2 1492 0.420288 1971 2484.8 1538.8 0.443778 1972 2608.5 1621.9 0.464942 1973 2744.1 1689.6 0.495354 1974 2729.3 1674 0.539626 1975 2695 1711.9 0.593098 1976 2826.7 1803.9 0.6307 1977 2958.6 1883.8 0.672784 1978 3115.2 1961 0.722169 1979 3192.4 2004.4 0.785678 1980 3187.1 2000.4 0.857206 1981 3248.8 2024.2 0.939608 1982 3166 2050.7 1 1983 3279.1 2146 1.038608 1984 3489.9 2246.3 1.078827 1985 3585.2 2324.5 1.115168 1986 3676.5 2418.6 1.144703 """ header = None X= [] for row in data.split("\n"): xvars = row.split() if len(xvars) == 4: if header is None: header = xvars else: cases = [float(v) for v in xvars] X.append(cases) TIME = [x[0] for x in X] GNPR = [x[1] for x in X] tau, se, n = adftest(GNPR, t=TIME, withintercept=True, maxp = 1) print "With drift-trend: df test_statistic", tau/se
We will try to find out why our results do not match with that of Danao when the lag of the differenced response is not zero.
We only saw the following reference after we wrote our code.
Additional Resources: http://econpy.googlecode.com/svn/trunk/pytrix/unitroot.py
No comments:
Post a Comment