$$Y_t = \alpha_1 + \alpha_2 X_{2t} + \alpha_3 X_{3t} + u_t$$
versus
$$\log Y_t = \beta_1 + \beta_2 \log X_{2t} + \beta_3 \log X_{3t} + u_t$$
The MWD test, named after MacKinnon, White and Davidson, lets us use ordinary least squares to choose the more appropriate of the two functional forms. Our main reference is Gujarati, 4e, pp. 280-282. The method is outlined below and implemented in our Python code; if you are a beginner to Python, it may take some time to work through.
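In outline, following Gujarati: let $\hat{Y}_t$ denote the fitted values from the linear model and $\widehat{\log Y_t}$ the fitted values from the log-linear model, and form the auxiliary variables
$$Z_{1t} = \log \hat{Y}_t - \widehat{\log Y_t}, \qquad Z_{2t} = e^{\widehat{\log Y_t}} - \hat{Y}_t.$$
Re-estimate the linear model with $Z_1$ appended as an extra regressor, and the log-linear model with $Z_2$ appended. A statistically significant coefficient on $Z_1$ is evidence against the linear model, while a significant coefficient on $Z_2$ is evidence against the log-linear model.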
Our mwdtest expects only the matrix of regressors X, whose first column consists of all ones (a constant column), and the response variable Y. Every value of the model matrix and of Y must be strictly greater than 0, since the logarithmic function is defined only on the positive real numbers; a guard like the sketch below could enforce this.
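A minimal validation helper one might call before mwdtest (hypothetical; checkPositive is not part of the original file):

def checkPositive(X, Y):
    # Every regressor value past the constant column, and every response
    # value, must be strictly positive before logarithms are taken.
    for row in X:
        if any(x <= 0 for x in row[1:]):
            raise ValueError("mwdtest(): nonpositive value in regressor matrix")
    if any(y <= 0 for y in Y):
        raise ValueError("mwdtest(): nonpositive value in response vector")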
""" file mwdtest.py author Dr. Ernesto P. Adorio U.P. Clarkfield ernesto . adorio @ gmail . com description Performs the McKinnon, White and Davidson test for testing suitability between competing linear vs. log-linear models. version 0.0.1 2010.06.21 initial """ from matlib import * from lm import * from dataframe import * from math import * def mwdtest(X, Y, ztol): """ Arguments. X - matrix of regressors including constant column of 1's. Y - output vector. ztol - comparison with zero tolerance. Output: fitted linear and loglinear objects. Warning: The input X matrix may be modified by this routine. """ # Is first column of X all ones? allones = True for row in X: if (row[0] - 1) > ztol: allones = False raise ValueError, "mwd():expects first column of all ones" #linear model print "@@@ debug: linear fit model." mataugprint(X,Y) linearfit1 = ols(X,Y) Yhat = linearfit1.Yhat #loglinear model logX =[] for row in X: a = [1.0] a.extend([log(x) for x in row[1:]]) logX.append(a) logY = [log(y) for y in Y] print "@@@ debug:log linear model" mataugprint(logX, logY) logfit1 = ols(logX, logY) logYhat = logfit1.Yhat Z1 = [log(yhat) - logyhat for yhat, logyhat in zip(Yhat, logYhat)] for x, z1 in zip(X, Z1): x.append(z1) linearfit2 = ols(X, Y) Z2 = [exp(logy) - y for logy,y in zip(logYhat,Yhat)] for x, z2 in zip (logX, Z2): x.append(z2) logfit2 = ols(logX, logY) return linearfit2, logfit2 def Test(X,Y, ztol): lfit, logfit = mwdtest(X,Y, ztol) print "testing linear fit:" for i, (t, beta,pvalue) in enumerate(zip(lfit.tbetas, lfit.betas, lfit.pvalues)): print "beta(", i,")=", beta,"t=", t, "pvalue=", pvalue print "F=", lfit.F, "R^2 = ", lfit.R2 print print "testing log-linear fit:" for i, (t, beta,pvalue) in enumerate(zip(logfit.tbetas, logfit.betas, logfit.pvalues)): print "beta(", i,")=", beta,"t=", t, "pvalue=", pvalue print "F=", logfit.F, "R^2 = ", logfit.R2 print if __name__ == "__main__": #Rose demand data from Gujarati. data = """ # "Y" "X2" "X3" "X4" "X5" # "Y" = Quantity of Roses Sold, Dozens # "X2" = Average Wholesale Price of Roses, $ Per Dozen # "X3" = Average Wholesale Price of Carnations, $ Per Dozen # "X4" = Average Weekly Family Disposable Income, $ Per Week # "X5" = Trend Variable: 1, 2, 3, ... Y X2 X3 X4 X5 11484 2.26 3.49 158.11 1 9348 2.54 2.85 173.36 2 8429 3.07 4.06 165.26 3 10079 2.91 3.64 172.92 4 9240 2.73 3.21 178.46 5 8862 2.77 3.66 198.62 6 6216 3.59 3.76 186.28 7 8253 3.23 3.49 188.98 8 8038 2.60 3.13 180.49 9 7476 2.89 3.20 183.33 10 5911 3.77 3.65 181.87 11 7950 3.64 3.60 185.00 12 6134 2.82 2.94 184.00 13 5868 2.96 3.12 188.20 14 3160 4.24 3.58 175.67 15 5872 3.69 3.53 188.00 16 """ D = dataframe(data, header=True) X = matSelectCols(D.data, [1,2]) #prepend a 1 column to X model matrix. for x in X: x.insert(0, 1.0) Y = [d[0] for d in D.data] # See input data print "Input matrix." mataugprint(X, Y) ztol = 1.0e-5 Test(X,Y, ztol)
Let us compare the results with Gujarati's output on pp. 281-282.
python mwdtest.py
Input matrix.
1.0000 2.2600 3.4900 | 11484.0000
1.0000 2.5400 2.8500 | 9348.0000
1.0000 3.0700 4.0600 | 8429.0000
1.0000 2.9100 3.6400 | 10079.0000
1.0000 2.7300 3.2100 | 9240.0000
1.0000 2.7700 3.6600 | 8862.0000
1.0000 3.5900 3.7600 | 6216.0000
1.0000 3.2300 3.4900 | 8253.0000
1.0000 2.6000 3.1300 | 8038.0000
1.0000 2.8900 3.2000 | 7476.0000
1.0000 3.7700 3.6500 | 5911.0000
1.0000 3.6400 3.6000 | 7950.0000
1.0000 2.8200 2.9400 | 6134.0000
1.0000 2.9600 3.1200 | 5868.0000
1.0000 4.2400 3.5800 | 3160.0000
1.0000 3.6900 3.5300 | 5872.0000
@@@ debug: linear fit model.
1.0000 2.2600 3.4900 | 11484.0000
1.0000 2.5400 2.8500 | 9348.0000
1.0000 3.0700 4.0600 | 8429.0000
1.0000 2.9100 3.6400 | 10079.0000
1.0000 2.7300 3.2100 | 9240.0000
1.0000 2.7700 3.6600 | 8862.0000
1.0000 3.5900 3.7600 | 6216.0000
1.0000 3.2300 3.4900 | 8253.0000
1.0000 2.6000 3.1300 | 8038.0000
1.0000 2.8900 3.2000 | 7476.0000
1.0000 3.7700 3.6500 | 5911.0000
1.0000 3.6400 3.6000 | 7950.0000
1.0000 2.8200 2.9400 | 6134.0000
1.0000 2.9600 3.1200 | 5868.0000
1.0000 4.2400 3.5800 | 3160.0000
1.0000 3.6900 3.5300 | 5872.0000
@@@ debug:log linear model
1.0000 0.8154 1.2499 | 9.3487
1.0000 0.9322 1.0473 | 9.1429
1.0000 1.1217 1.4012 | 9.0394
1.0000 1.0682 1.2920 | 9.2182
1.0000 1.0043 1.1663 | 9.1313
1.0000 1.0188 1.2975 | 9.0895
1.0000 1.2782 1.3244 | 8.7349
1.0000 1.1725 1.2499 | 9.0183
1.0000 0.9555 1.1410 | 8.9919
1.0000 1.0613 1.1632 | 8.9195
1.0000 1.3271 1.2947 | 8.6846
1.0000 1.2920 1.2809 | 8.9809
1.0000 1.0367 1.0784 | 8.7216
1.0000 1.0852 1.1378 | 8.6773
1.0000 1.4446 1.2754 | 8.0583
1.0000 1.3056 1.2613 | 8.6780
testing linear fit:
beta( 0 )= 9727.56629203 t= 3.21783350732 pvalue= 0.191817531244
beta( 1 )= -3783.06271822 t= -6.33375547952 pvalue= 0.0996893146221
beta( 2 )= 2817.71671158 t= 2.83663692683 pvalue= 0.215767665862
beta( 3 )= 85.3186515319 t= 0.0207244164541 pvalue= 0.986808315114
F= 13.4410359246 R^2 = 0.770655824729
testing log-linear fit:
beta( 0 )= 9.14861065241 t= 17.0825302798 pvalue= 0.0372248169624
beta( 1 )= -1.96990658676 t= -6.41899335191 pvalue= 0.0983866546526
beta( 2 )= 1.58915441923 t= 3.07287928323 pvalue= 0.200292472074
beta( 3 )= -0.00012938402573 t= -1.6612759925 pvalue= 0.344952324446
F= 14.1664502895 R^2 = 0.779813891197
The important values are those for the last coefficient of each fit, i.e. the $Z_1$ term in the linear regression and the $Z_2$ term in the log-linear regression. Except for the published p-value for the log-linear fit, 0.1225 versus our own 0.34495, all the other values match.
REMINDER TO SELF: The lm code needs to be checked for the right computation of the p-values.
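As a quick cross-check, assuming the usual two-sided p-value from a Student t distribution with n - k = 16 - 4 = 12 degrees of freedom, SciPy (not used by mwdtest itself) essentially reproduces Gujarati's figure:

from scipy import stats

t  = -1.6612759925        # t statistic reported above for the Z2 coefficient
df = 16 - 4               # observations minus estimated coefficients
p  = 2 * stats.t.sf(abs(t), df)
print p                   # about 0.12, close to Gujarati's published 0.1225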