Tuesday, June 1, 2010

Python, Econometrics: The Koyck geometrically distributed lag model, updated with Leviatan's IV method.

Version 0.0.1 jun 2
Version 0.0.2 jun 6 added Leviatan's instrumental variable method

This blog post will undergo massive revisions. Use the code at your own risk.

Consider the model with infinite number of lags:
$$Y_t = \alpha + \sum_{i= 0}^{\infty} \beta_i X_{t - 1}$$.
This model has an infinite number of unknown coefficients. A simplification
was obtained by Koyck who assumed that $$\beta _i = \beta_0 \lambda^i.$$
are geometrically distributed . An equivalent formulation for this case is
given by $$Y_t =\alpha (1 -\lambda) + \beta_0 X_t + -\lambda Y_{t-1} + (u_t - \lambda u_{t-1}) + v_t$$

and we need only to determine at most only three (?!) values: $$\alpha, \lambda$$ and \$\beta_0.$$The model matrix with a constant column is then$$Y = 1 + X|Y_{t-1}$$. Let c_0, c_1, c_2$$ be the computed parameters for this model.
Then the original parameters are obtained by the following:$\begin{array}{l}\lambda = c_2\\ \beta_0 = c_1\\ \alpha = c_0 / (1 - c_2).\\ \end{array}$

The question is, the linear model rhs contains a dependent, endogenous, stochastic lagged
variable $$Y_{t-1}$$ which violates the assumptions of least squares method.
The output of applying the least squares method will result in biased, inconsistent(when the errors are not normally distributed). Most results obtained by blindly applying least squares
exhibit high multicollinearity and serially correlated errors.

We present a Python function to generate the Koyck linear model given the
input endogenous stochastic Y vector and the independent exogenous vector X.

import random
from matlib import matzero,  matInsertConstCol, matprint

def koyckZmatrix(Y, X,  withconstcol = True):
n = len(Y)
if n != len(X):
raise ValueError, "in koyckmatrix():Y,X have different lengths!"

Z = matzero(n-1, 2)
for i in range(n-1):    # one less due to lag.
Z[i][0],  Z[i][1] = X[i+1], Y[i]
if withconstcol:
matInsertConstCol(Z,  0,  1)
return Y[1:],  Z

Y = [random.random() for i in range(10)]
X = range(10)

print "The input Y, X vectors"
for i in range(len(X)):
print i,  Y[i],  X[i]

Y,  Z = koyckZmatrix(Y, X)
print "The Input Koyck Y and Z matrix."
matprint(Z)


In spite of the results being biased and inconsistent, we can use the OLs procedure as a generator of starting values to a better scheme like Maximum Likelihood estimation.

A secon approach to solving the Koyck geometrically lagged model is the method of instrumental variables. We define an instrumental variable which is NOT correlated with the the error term of the Koyck model.

Leviatan developed the following normal equations, obtained by choosing $$X_{t-1}$$ as a proxy variable. This system is shown in page 678 of Gujarati's text on page 678. $$\begin{array}{ll}\sum {Y_t} &= n \hat{\alpha_0} + \hat{\alpha_1} \sum {X_t} + \alpha_2 \sum Y_{t-1} \\ \sum {Y_t X_t} &= \hat{\alpha_0}\sum {X_t} + \hat{\alpha_1}\sum{X_t^2} + \hat{\alpha_2} Y_{t-1}X_t \end{array}$$
$$\sum Y_t X_{t-1} &= \hat{\alpha_0}\sum X_{t-1} +\hat{\alpha_1} \sum X_{t}X_{t-1} + \hat{\alpha_2}\sum Y_{t-1} X_{t-1}$$

The following Python code sets-up the normal equations and solve for the coefficients: $$\hat{\alpha_0}, \hat{\alpha_1}, \hat{\alpha_2}.$$

from scipy import linalg

def leviatan(Y, X):
"""
Solves the Koyck distributed lag system by instrumental variable technique.

Reference: pp.678-679, Gujarati, 4e, "Basic Econometrics"

"""
#RHS vector.
b=[0.0] * 3
b[0] = sum(Y[1:])
b[1] = sum([y * x for y, x in zip(Y[1:],X[1:])])
b[2] = sum([y * x for y, x in zip(Y[1:], X[:-1])])

#Least squares matrix.
A = matzero(3,3)
A[0][0] = len(Y)-1
A[0][1] = A[1][0] = sum(X[:-1])
A[0][2] = sum(Y[1:])

A[1][0] = sum(X[:-1])
A[1][1] = sum([x*x for x in X[1:]])
A[1][2] = sum([y* x for y, x in zip(Y[:-1], X[1:])])

A[2][0] = sum ([x for  x in X[:-1]])
A[2][1] = sum ([x * xtm1        for x , xtm1        in zip(X[1:],  X[:-1])])
A[2][2] = sum ([ ytm1 * xtm1 for ytm1,  xtm1 in zip(Y[:-1], X[:-1])])

#Call linear solver
C = linalg.solve (A,  b)
return C


The code, unfortunately, has not been checked on an example. We will go back to this later and we add that the estimators returned may be consistent but still high in multicollinearity and low in aefficiency.

We used the linear solver in scipy. One can also use the code for lm.py which we described in Canned linear solver

We will present codes for the Maximum Likelihood method later...

Duh, checking the latex...
TBC