Showing posts with label Python. Show all posts

Sunday, August 29, 2010

Generating a numbered list of passers for PRC exams.


The PRC usually releases the results as a pdf file. We don't criticise them for this, as pdf is indeed portable. It would be much better if they also offered a bare plain text file with a numbered list of the names of passers. But who are we to complain?


Step 1. Install pdftotext if it is not available on your Linux system.

Step 2. Download the pdf file of passers from the PRC site prc.gov.ph and rename it to july-2010-NLE.pdf.

Step 3. Convert the file to a text file: pdftotext july-2010-NLE.pdf july-2010-nle-prc.txt (pdftotext takes the output file name as its second argument).

Step 4. Process the text file to output the numbered list using the following program which writes to standard output.

Here is a Python program, easily modified for other professions.

D = open("july-2010-nle-prc.txt").read()
nolastname = []
pnum= 0
for i, line in enumerate(D.split("\n")):
   line = line.strip()
   if not line:
      continue
   
   getnewline = False 
   for word in ["NURSE", "Held","Page","Page:","Released","Roll","N a m e", "Seq."]:
       if line.startswith(word):
           getnewline=True
           break
   if getnewline: continue

   firstword =""
   secondword=""
   if "," in line:
     tokens = line.split(",")
     if len(tokens) >=1:
       firstword=tokens[0]
     if len(tokens)>= 2:
       secondword = tokens[1]
   else:
     firstword = line.split()[0]
   if firstword.isdigit():
      continue
   if line.startswith("NOTHING"):
       break
       
   if line.endswith(",") or \
      line.endswith("DE") or \
      line.endswith("De") or \
      line.endswith("Dela") or \
      line.endswith("DELA") or \
      line.endswith("De la") or \
      line.endswith("DE LA"):
      nolastname.append(line)
      continue
   elif "," not in line:
      if nolastname:
         line = nolastname[0] + line
         del nolastname[0]
   pnum+=1
   print pnum, line


The problem is that I got only 37573 names of passers instead of the official 37679 passers! Fixing this will involve a painful page-by-page check, but we will do it when we have the time, so that succeeding PRC data files can be processed or transformed quickly into other formats.



We discover that the pdftotext converter chokes on the following part of the pdf file.

BORNEA, MAY ANN SAGA
BORNEA, PHILIP KLARC MANGIBUNONG
BORNEO, BERNADETTE JOSEPHINE MARIANNE
BORRAL, YASMIN INTERIOR
BORRES, ALLISON CABESADA
BORRES, CAMILLE CHRISTINE ARCILLA
BORRES, DANIELLE JOY BERGONIO
BORRES, JEAN LAPINIG
BORRICO, CARLO BRYAN CASTRO

MONTEZA

Roll of Successful Examinees in the
NURSE LICENSURE EXAMINATION
Held on JULY 3 & 4, 2010
Released on AUGUST 25, 2010

Notice the MONTEZA by itself? It should belong to BORNEO, BERNADETTE JOSEPHINE MARIANNE MONTEZA.
When I tried to cut the text from the pdf file itself, it showed a separator for MONTEZA.
Here is the original pdf file from PRC, with the MONTEZA name.



So it is the pdftotext converter that is at fault. On the other hand, Okular's export to text produced a text file that included the MONTEZA as part of Borneo's name. Thus we cannot fully automate the process, unless we study how to use the expect program.
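One way to narrow down where a converter dropped a surname, as happened with MONTEZA above, is to scan the converted text for stray single-word lines. Here is a rough sketch; the function name find_orphans and the header list are mine, not part of the original program:

```python
def find_orphans(lines):
    """Flag lines that look like a stray surname fragment: a single
    all-caps word that is not one of the known header keywords."""
    headers = ("NURSE", "Held", "Page", "Released", "Roll", "Seq.")
    orphans = []
    for i, line in enumerate(lines, 1):
        line = line.strip()
        if line and "," not in line and len(line.split()) == 1 \
           and line.isupper() and not line.startswith(headers):
            orphans.append((i, line))
    return orphans

sample = ["BORRES, JEAN LAPINIG", "", "MONTEZA", "Roll of Successful Examinees"]
print(find_orphans(sample))  # [(3, 'MONTEZA')]
```

Each flagged line is a candidate fragment to rejoin with the name just before it.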

Sunday, August 15, 2010

Statistics and Python: Creating a frequency counts array for data.

Given a list of recorded data X and a list of upper class marks ucm, we generate a list of frequency counts for the data such that each x in X is counted in freq[i] if x <= ucm[i], where i is the smallest such index.
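This counting rule can be sketched with the standard library's bisect module; the name freq_counts is mine, and the distrib function later in this post implements the same rule with a linear search:

```python
from bisect import bisect_left

def freq_counts(X, ucm):
    """Count each x in the first class i with x <= ucm[i]; the extra
    trailing slot collects values above the last upper class mark."""
    freq = [0] * (len(ucm) + 1)
    for x in X:
        freq[bisect_left(ucm, x)] += 1
    return freq

print(freq_counts([1, 2, 2, 5, 9], [2, 4, 6]))  # [3, 0, 1, 1]
```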





Our frequency counting function automatically appends an excess entry whenever x is greater than the last upper class mark. It is considered bad design if this happens, as it may denote data with outliers, and the compressed frequency count table may then not truly represent the original data.






Later we will handle the way to design the upper class marks list. We hasten to add that the introduction of computers has allowed statistical analysis of the ORIGINAL data even in the millions and has deprecated the manual frequency count table. Nevertheless, for historical curiosity and for teachers who insist that students know how to create such tables, we will write Python programs as aids in such laborious constructions.






Here is our frequency counting Python code. It expects the original list of data X and a list of upper class marks. Each upper class mark represents a class interval (ucm[i-1], ucm[i]]. A data point x falls in such a class whenever $$ucm[i-1] < x \le ucm[i]$$. For good results, all data points must fall in some class interval. The minimum value of X should fall within (ucm[0]-cw, ucm[0]], where cw is the class width = ucm[1] - ucm[0], and the maximum value of X must fall within (ucm[n-2], ucm[n-1]]. Recall that Python lists use zero-based indexing.
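A quick way to check these coverage conditions before counting (ucm_covers is my own illustrative name, not part of the module below):

```python
def ucm_covers(X, ucm):
    """True when every x falls in some class (ucm[i-1], ucm[i]],
    taking the first class to be (ucm[0]-cw, ucm[0]]."""
    cw = ucm[1] - ucm[0]
    return ucm[0] - cw < min(X) and max(X) <= ucm[-1]

print(ucm_covers([1.2, 3.9, 2.5], [2.0, 3.0, 4.0]))  # True
print(ucm_covers([0.5, 3.9], [2.0, 3.0, 4.0]))       # False
```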






"""
file    distrib.py
author  Ernesto P. Adorio
        ernesto.adorio
version 0.0.1  2010.08.16  first version for counting only.
                           this will be added to xfreqdescstats.py after
                           thorough testing.
"""

def distrib(X, ucm):
    """
    Given upper class marks, and input array of values X,
    returns a table of frequencies, which may be longer than ucm.
    """
    n = len(ucm)
    print "len(ucm) = ", n
    freq = [0] *(n+1)
    for x in X:
        if x < ucm[0]:     
           freq[0]+= 1
        elif x > ucm[n-1]: # excess frequencies!
           freq[n]+= 1
        else:
          i = 0
          while (x > ucm[i]):
             i += 1
          freq[i] += 1
    return freq

def disttable(ucm, freq):
    """
    Creates a frequency table with ucm, class representative x, frequency count,
    relative frequency, and cumulative relative frequency.
    """
    # Adjust for excess frequency counts.
    n    = len(ucm)
    cw   = ucm[1] - ucm[0]
    xrep = 0.5 * (ucm[1] + ucm[0])-cw # class rep of first class interval.
    if len(freq) > n and freq[n] > 0:
       ucm = list(ucm) + [ucm[n-1] + cw]
       n = n + 1
    totfreq = sum(freq)

    T = [[0.0]*5 for i in range(n)]
    crf = 0.0
    print "totfreq=", totfreq
    for i in range(n):
        #  class mark, x
        rf = freq[i]/float(totfreq)
        crf += rf
        T[i][0], T[i][1], T[i][2], T[i][3],T[i][4] = ucm[i], xrep, freq[i], rf, crf

        xrep += cw
    return T
 

if __name__ == "__main__":
   from scipy import *
   import scipy.stats as stat

   X = stat.norm.rvs(size= 100)
   ucm = arange(-3.0, 3.1, 0.25)
   freq = distrib(X, ucm)
   T = disttable(ucm, freq)
   print freq
   print "Frequency distribution table for example:"
   for row in T:
      print row 
    





Here is one run of the program, for 100 generated normal random numbers.

len(ucm) = 25
totfreq= 100
[0, 0, 0, 0, 0, 0, 2, 4, 4, 7, 6, 10, 8, 9, 8, 12, 6, 9, 7, 2, 2, 2, 1, 1, 0, 0]
Frequency distribution table for example:
[-3.0, -3.125, 0, 0.0, 0.0]
[-2.75, -2.875, 0, 0.0, 0.0]
[-2.5, -2.625, 0, 0.0, 0.0]
[-2.25, -2.375, 0, 0.0, 0.0]
[-2.0, -2.125, 0, 0.0, 0.0]
[-1.75, -1.875, 0, 0.0, 0.0]
[-1.5, -1.625, 2, 0.02, 0.02]
[-1.25, -1.375, 4, 0.040000000000000001, 0.059999999999999998]
[-1.0, -1.125, 4, 0.040000000000000001, 0.10000000000000001]
[-0.75, -0.875, 7, 0.070000000000000007, 0.17000000000000001]
[-0.5, -0.625, 6, 0.059999999999999998, 0.23000000000000001]
[-0.25, -0.375, 10, 0.10000000000000001, 0.33000000000000002]
[0.0, -0.125, 8, 0.080000000000000002, 0.41000000000000003]
[0.25, 0.125, 9, 0.089999999999999997, 0.5]
[0.5, 0.375, 8, 0.080000000000000002, 0.57999999999999996]
[0.75, 0.625, 12, 0.12, 0.69999999999999996]
[1.0, 0.875, 6, 0.059999999999999998, 0.76000000000000001]
[1.25, 1.125, 9, 0.089999999999999997, 0.84999999999999998]
[1.5, 1.375, 7, 0.070000000000000007, 0.91999999999999993]
[1.75, 1.625, 2, 0.02, 0.93999999999999995]
[2.0, 1.875, 2, 0.02, 0.95999999999999996]
[2.25, 2.125, 2, 0.02, 0.97999999999999998]
[2.5, 2.375, 1, 0.01, 0.98999999999999999]
[2.75, 2.625, 1, 0.01, 1.0]
[3.0, 2.875, 0, 0.0, 1.0]




The listed arrays are the ucm, class rep, freq, rf, and the crf for each class. They are not pretty printed and it is left to the reader to format the output.

I am wondering why blogger removes our br tags and the indents from our code!

Python, Statistics: Stem and Leaf Part 4

I am publishing the stem and leaf article here. Wordpress (at Digital Explorations) chokes on our latest work! (fixed) Our old stem and leaf code has been updated to include frequency counts. Here is the latest incarnation, following our last discussion in Part 3.

"""
file     stem.py
author   Ernesto Adorio, Ph.D.
         UPDEPP, UP at Clarkfield
         ernesto.adorio@gmail.com
version  0.0.4   2010.08.16 with freq. counts on stems.
"""


from   math import *


def stemleafpairs(X, stempos= 0, leafwidth=1):
    """
    X - data array
    stempos   - position of last digit of stem,
                  from decimal point.
    leafwidth - number of digits in leaves.

    Return value:
     a list of stem-leaf pairs
    """
    stem10 = pow(10, stempos)
    leaf10 = 10**leafwidth

    output = []
    for x in X:
        if stempos > 0:
           leaf, stem = modf(x * stem10)
        else:
           leaf, stem = modf(x / stem10)

        leaf = abs(leaf * leaf10)
        #print x, int(stem), round(leaf) # decomment after testing!
        output.append((int(stem), int(leaf) ))
    return output


def stemleafplot(Pairs,scale=1,sortQ=False,sep = "", stemwidth=4,leafwidth=1, withcounts= 0):
    """
    Given a list of Pairs (stem, leaf), prints it out.
    
    Arguments:
     Pairs     - list of (stem, leaf) pairs
     scale     - 1 for standard stems, 2 with upper half-stems.
     sortQ     - True if data is to be sorted and false if unsorted.
     stemwidth - printing width of stem
     leafwidth - number of digits in display of leaves.
     withcounts - 0 - no count,
                  1 - simple count.
                  2 - cumulative count.
    """
    if sortQ:
        Pairs.sort()

    minstem, minleaf = min(Pairs)
    print "minstem=", minstem
    maxstem, maxleaf = max(Pairs)
    print "maxstem=", maxstem

    # Transform list into a dictionary.
    stemleaves = {}
    if scale == 1:
      for (stem, leaf) in Pairs:
       stem = str(stem)
       if stem not in stemleaves:
          stemleaves[stem]= [leaf]
       else:
          stemleaves[stem].append(leaf)
    elif scale == 2:
       half = int("5" + "0" * (leafwidth-1)) 
       for (stem, leaf) in Pairs:
           stem = str(stem)
           if leaf >= half:
              stem = stem + "*"
           if stem in stemleaves:
               stemleaves[stem].append(leaf)
           else:
               stemleaves[stem] = [leaf]

    # the actual printing of output.
    totcount = 0
    for i in range(minstem, maxstem+1):
        key = str(i)
        count = 0
        if key in stemleaves:
           count = len(stemleaves[key])
        
        totcount += count
        if withcounts == 0:
           print "%*s|" %(stemwidth,key),
        elif withcounts == 1:
           print "%*s(%4d)|" %(stemwidth,key, count),
        elif withcounts == 2:
           print "%*s(%4d)|" %(stemwidth,key, totcount),
        if key in stemleaves:
           leaves = ""
           for leaf in stemleaves[key]:
               leaves += "%0*d" %(leafwidth, int(leaf))+sep
           print leaves
        else:
           print 

        if scale == 2:
           key = key + "*"
           print "%*s|" % (stemwidth, key),
           if key in stemleaves:
               leaves = ""
               for leaf in stemleaves[key]:
                   leaves += "%0*d" %(leafwidth, int(leaf))+sep
               print leaves
           else:
               print

def stem(X, stempos=1, scale=1,sortQ=False,sep = "", stemwidth=4,leafwidth=1, withcounts= 2):
    Pairs = stemleafpairs (X, stempos, leafwidth)
    stemleafplot(Pairs,scale=scale,sortQ=sortQ,sep = sep, stemwidth=stemwidth,leafwidth=leafwidth, withcounts= withcounts)

if __name__ =="__main__":    
   import scipy.stats as stat
   X = stat.norm.rvs(size=1000)
   stem(X)

When the above program runs with default arguments, it outputs the following:




-34( 1)| 6
-33( 1)|
-32( 2)| 1
-31( 2)|
-30( 2)|
-29( 2)|
-28( 4)| 08
-27( 4)|
-26( 5)| 4
-25( 7)| 77
-24( 10)| 422
-23( 13)| 260
-22( 19)| 694086
-21( 21)| 75
-20( 27)| 163143
-19( 28)| 5
-18( 33)| 36358
-17( 43)| 6893302148
-16( 52)| 322560070
-15( 59)| 1331906
-14( 70)| 49124950854
-13( 91)| 401253263854452827628
-12( 109)| 572452780211853243
-11( 130)| 077133985997876037339
-10( 150)| 04430092495280373731
-9( 175)| 4293411168169465118423909
-8( 190)| 818880143091624
-7( 224)| 5161410514534334310234657019034655
-6( 253)| 59607404249373790654201441626
-5( 291)| 53836210159491267372596227328789414461
-4( 329)| 52652987680976425075914021503791325718
-3( 354)| 2038247991214802390377294
-2( 396)| 520316774771912137458409442885171857745958
-1( 443)| 51714559553431520132437128841770795660895852361
0( 529)| 44036499255722225220796139361838505887904141332066633213167414864454127175508117523351
1( 569)| 5590562775582330092803462031362997581151
2( 604)| 37111480345836150687428470279322769
3( 652)| 446677471660096619850750047910077556028146374802
4( 687)| 27892444604915179645262316905991017
5( 713)| 34250730580367963818408036
6( 752)| 616035247612979989663910648454898469322
7( 791)| 548156635043052102545471356231163311052
8( 813)| 6445159984577578411322
9( 838)| 2022629067693963656545664
10( 863)| 5431874466621296418005519
11( 881)| 710747499048405915
12( 901)| 46507445580872987358
13( 917)| 1622476524280308
14( 929)| 335731472657
15( 944)| 669096613933089
16( 958)| 26144266812863
17( 968)| 6245893442
18( 972)| 5365
19( 982)| 5123787375
20( 985)| 584
21( 988)| 211
22( 990)| 00
23( 995)| 37063
24( 996)| 7
25( 997)| 7
26( 998)| 4
27( 998)|
28( 999)| 0
29( 999)|
30(1000)| 2



The code looks complicated; I may rewrite it later. I would appreciate it if someone else simplified the code!

Python, Statistics: Computing the lower, mid and upper "hinges" of a sample.

The statistical hinges of a sample are designed for easy manual computation. The midhinge of an odd number of data points is the middle value. If the number of data points is even, then the midhinge is the average of the two middle values.

The lower and upper hinges are the midhinges of the left and right subsamples with the midhinge removed, i.e., the sample has been split into two. Assume for example that [1,2,3,4,5,6,7] is the sample. The midhinge is 4, the lower hinge is the midhinge of [1,2,3], which is 2, and the upper hinge is the midhinge of [5,6,7], which is 6.

On the other hand, for the sample [1,2,3,4,5,6], the midhinge is the mean of 3 and 4, which is 3.5.
Then the lower hinge is the midhinge of the remaining left subsample [1,2,3], which is 2, and the upper hinge is the midhinge of the remaining right subsample [4,5,6], which is 5.

Here is hinges.py, a Python program for computing the hinges of a sample.

"""
file   hinges.py

author Ernesto P. Adorio
       ernesto.adorio@gmail.com
       UPDEPP(UP at Clarkfield)

version 0.0.1  August 15, 2010
"""

def midhinge(X):
    n= len(X)
    if n == 1: 
        return X[0]
    if  n == 0:
        return None
    if n % 2== 0: # even number.
        a,  b = n//2 - 1,  n//2
        return (X[a] + X[b])* 0.5
    else:
        m = (n-1)    //2
        return X[m]

def lowerhinge(X):
    n = len(X)
    if n % 2 == 0:
       return midhinge(X[:n//2])
    else:
        return midhinge(X[:(n-1)//2])

def upperhinge(X):
    n = len(X)
    if n %2 == 0:
        return midhinge(X[n//2:])
    else:
        return midhinge(X[n//2+1:])

def fivenum(X):
    """
    Five number summary for sample X. added aug.16, 2010
    """
    return (min(X), lowerhinge(X), midhinge(X), upperhinge(X), max(X))

if __name__== "__main__":
    X = range(1,  11)
    for i in range(10):
       print "X=",  X, ":","n=",len(X) ,"hinges=",  
       print lowerhinge(X),  midhinge(X), upperhinge(X)
       X=X[:-1]

When the above program runs, it outputs:

python hinges.py
X= [1, 2, 3, 4, 5, 6, 7, 8, 9, 10] : n= 10 hinges= 3 5.5 8
X= [1, 2, 3, 4, 5, 6, 7, 8, 9] : n= 9 hinges= 2.5 5 7.5
X= [1, 2, 3, 4, 5, 6, 7, 8] : n= 8 hinges= 2.5 4.5 6.5
X= [1, 2, 3, 4, 5, 6, 7] : n= 7 hinges= 2 4 6
X= [1, 2, 3, 4, 5, 6] : n= 6 hinges= 2 3.5 5
X= [1, 2, 3, 4, 5] : n= 5 hinges= 1.5 3 4.5
X= [1, 2, 3, 4] : n= 4 hinges= 1.5 2.5 3.5
X= [1, 2, 3] : n= 3 hinges= 1 2 3
X= [1, 2] : n= 2 hinges= 1 1.5 2
X= [1] : n= 1 hinges= None 1 None


Don't forget to sort the input data sample before computing its hinges!

We are confident that the program is correct, but we need independent confirmation.
Also, I would appreciate any revisions that would make the above code more compact.

TBD: fivenum will be incorporated in the next version of hinges.py. Done! But the version number
stays at 0.0.1

Descriptive Statistics in Python

I googled for descriptive statistics in Python and could not find our blog on the first page. I am sure there are scattered articles about functions for descriptive statistics, but I wanted a compact module for descriptive statistics. Of course scipy or numpy has mean, var and others, but if you want a small module for teaching statistics, and believe that Python code is actually simple to comprehend, here is descstats.py.

"""
file   descstats.py
author Ernesto P. Adorio
       UPDEPP (U.P. Clarkfield)
desc   basic stats
version 0.0.1  aug. 15, 2010
"""

from quantile import quantile
from math import sqrt,  exp,  log


def mean(X, npasses = 1):
    """
    arithmetic mean of vector X.
    Arguments:
        npasses : 1 - no correction term.
                  2 - with correction term.
    """
    n = float(len(X))
    avg = sum(X)/n
    if npasses > 1:
      return avg + sum([(x - avg) for x in X])/ n
    return avg
    
def rms(X):
    """
    root mean square average of vector X.
    """
    return sqrt(sum([x*x for x in X])/len(X))

def hm(X):
    """
    harmonic mean of vector X
    """
    if min(X)<= 0.0:
        return None
    return len(X) / sum([1.0/x for x in X])

def gm(X):
    """
    geometric mean of X
    """
    if min(X) <= 0.0:
        return None
    return exp(sum([log(x) for x in X])/len(X))


def pvar(X , npasses = 1):
    """
    Population variance of X, divisor is n.
    """
    sxsqr = sum([x*x for x in X])
    xbar  = mean(X, npasses)
    n = len(X)
    if npasses == 2:
      return sum([(x - xbar)**2 for x in X])/n
    else:
      return sxsqr/n - (xbar)**2

def svar(X, npasses = 1):
    """
    Sample variance of X, divisor is n-1.
    """
    n = len(X)
    return pvar(X, npasses) * n / (n-1.0)

def pstd(X, npasses = 1):
    """
    Population standard deviation.
    """
    return sqrt(pvar(X, npasses))

def sstd(X, npasses = 1):
    """
    Sample standard deviation.
    """
    return sqrt((svar(X, npasses)))

avg = mean
am = mean
pstddev = pstd
psdev = pstd
ssdev = sstd
sstddev = sstd


def median(X):
    """
    Median of  X
    """
    return quantile(X, 0.5)


def mad(X,  xbar = None):
    """
    Mean absolute deviation of X from mean.
    """
    if xbar is None: xbar = mean(X)
    return float(sum([abs(x-xbar) for x in X]))/len(X)
                    
madmean = mad

def madmed(X,  xmed = None):
    """
    Mean absolute deviation of X from median.
    """
    if xmed is None: xmed = median(X)
    return float(sum([abs(x- xmed) for x in X]))/len(X)
                    
def Q1(X):
    """
    First quartile of X
    """
    return quantile(X, 0.25)

def Q3(X):
    """
    Third quartile of X
    """
    return quantile(X, 0.75)

def midhinge(X):
    """
    average of first and third quartile of X. 
    """
    return 0.5 * (Q1(X) + Q3(X))

def trimean(X):
    return 0.25 * (Q1(X) + 2 * median(X) + Q3(X))

def IQR(X):
    """
    Interquartile range of X.
    """
    return Q3(X) - Q1(X)

def summary(X):
    """
    Tuple of quartiles of X.
    """
    return (min(X), Q1(X), median(X), Q3(X), max(X))



def skew(X, npasses = 1, xbar=None, sdev = None):
    """
    Skewness of X.
    """
    if xbar is None: xbar = mean(X, npasses)
    if sdev is None: sdev = sstd(X, npasses)
    S = sum([(x - xbar)**3 for x in X])/((len(X) -1.0)* sdev**3)
    return S

def kurt(X, npasses = 1, xbar = None, sdev = None):
    """
    Excess Kurtosis of X.
    """
    if xbar is None: xbar = mean(X, npasses)
    if sdev is None: sdev = sstd(X, npasses)
    K = sum([(x - xbar)**4 for x in X])/((len(X)-1.0) * sdev**4) - 3.0
    return K


if  __name__ == "__main__":
    import scipy.stats as stat
    X = stat.norm.rvs(loc = 5,  scale = 2,  size=10)
    print "mean = ",  mean(X)
    print "rms= ",  rms(X)
    print "hm=", hm(X)
    print "gm=",  gm(X)

    print "min=",  min(X)
    print "Q1 = ",  Q1(X)
    print "median=", median(X)
    print "Q3 = ",  Q3(X)
    print "max=",  max(X)
    print "skew=", skew(X)
    print "kurt =",  kurt(X)
When the above code was run, it printed the following:

python descstats.py
mean =  4.84676942423
rms=  5.03778343707
hm= 4.32905902716
gm= 4.61426927173
min= 2.0131960908
Q1 =  4.13817781222
median= 4.92637729064
Q3 =  5.47954029648
max= 7.31749490871
skew= -0.233219987989
kurt = -0.280050258646

The quantile.py may be found in the article Statistics: Computing quantiles with Python.

TBD: lower and upper hinges. Some authors define the lower and upper hinges as synonymous with Q1 and Q3 respectively, but we will check with the more authoritative R software later.

Pretty printing a vector in Python

Users of R get a nicely aligned rectangular display when printing a vector to the console. Here is a Python function to pretty print a vector of floating point real numbers.

def maxdigits(n):
    #number of decimal digits in positive integer n.
    n = abs(n) # default behaviour for negative numbers.
    d = 0
    while n:
        n = n // 10
        d += 1
    return max(d,  1)

def vecprint(X, cols = 10, printwidth=10,  decimals = 7, withcount = True, 
             offset = 0):
    """
    Pretty prints a vector with cols elements per row.
    Format for each number is  printwidth wide with decimals.
    Arguments
        X          input vector
        cols       number of elements to print per row
        printwidth number of characters per element
        decimals   number of fractional digits after the decimal point
        withcount  display the starting index of each row
        offset     0 for Python, 1 for Fortran or R.
    """
    L = len(X)
    d = maxdigits(L)
    startcount =0
    while L > 0:
        if withcount:
            print "[%*d]" % (d,  startcount+offset), 
        for j in range(min(cols,  L)):
            print "%*.*f"  % (printwidth,  decimals,  X[startcount+ j]), 
        print
        startcount  += cols
        L = L - cols

if __name__ == "__main__":
   import scipy.stats as stat
   X = stat.norm.rvs(0, 10, size= 101)

   vecprint (X,  cols = 10, printwidth = 10,  decimals = 6, withcount = True,  offset= 1)


When the above program runs, it prints out


python test.py
[  1] -12.274544  24.962005   0.898519   8.298623  -5.206765  11.369435   2.844898   5.382779   8.871070 -10.778256
[ 11]   8.993302   0.122253 -12.870449   6.865899  -0.013347  -6.294830   6.737924  -3.464933 -22.138001  -5.830818
[ 21]   4.333220   1.212900 -10.323043  -2.362759  -4.030961  -8.217033   5.081078  -1.496543  12.623560 -13.239969
[ 31]  19.963765 -27.012378   2.662621   8.417971 -11.408826 -11.900326  13.602873  -1.835010 -13.824051   1.494528
[ 41]  16.184007  -8.594634  12.703305  -3.304798   2.455711  -5.566551  -7.778173   5.475580  -4.503368  -5.652534
[ 51]  -9.810076 -13.494154  47.439976   5.627506  -1.501500 -17.052534   1.487389   8.717306   1.645366 -22.285412
[ 61]   8.923515 -14.373864  -4.206617 -13.933159   0.142266   8.084607 -16.791519   3.783785  18.676505  -6.416143
[ 71]  -1.453329  -6.619796  10.727761   1.809717   8.193551   9.959034  -3.531820 -10.767832   4.450496  -4.931875
[ 81]   1.994453   7.415965   6.156138   8.437512   3.415086   9.394266  27.906924 -16.393906 -10.795677   1.990537
[ 91] -16.822923   1.512371  10.182129  10.210412  -6.038684   7.556653   5.627985   8.988172  19.519813  -4.976217
[101]  11.222303

Tuesday, July 6, 2010

July 6, 2010 Topblogs/education blogs ranking.

This is a corrected report of the Topblog rankings for education sites. Hope we can now concentrate on writing about more varied and interesting topics from now on, so we go up in the charts!



Today Prev Chg Blog Behind Google PR Alexa
1(9,837) 1(4075) +0(5762) Thinking Made Easy 0 2 150937
2(7,841) 2(3484) +0(4357) Coolbuster - The Power of Information 1996 3 153897
3(3,428) 3(1255) +0(2173) Civil Service and PRC Board Exam Results 4413 2 1115767
4(3,028) 4(1244) +0(1784) NurseReview.Org 400 4 484278
5(2,422) 5(1035) +0(1387) PRC Board Exam Results 606 1 5079529
6(1,712) 6(909) +0(803) Smackblog 710 2 18970
7(1,236) 7(635) +0(601) Scolex Portal 476 1 719913
8 (963) 10(396) +2(567) Maikling Kuwento 273 2 8608462
9(933) 9(426) +0(507) jlapitan.com 30 5 935926
10 (759) 14(220) +4(539) linkmoko 174 3 1948160
11(758) 11(330) +0(428) Philippine Nurse 1 3 1962933
12 (580) 13(222) +1(358) Bazics NewsFeeds 178 0 3291374
13(562) 12(279) -1(283) Batangueno 18 3 54
14 (478) 15(209) +1(269) Ancient Digger 84 6 180804
15 (389) 16(179) +1(210) Philippine Nursing Directory 89 1 5797648
16 (357) 17(150) +1(207) Pilipinas atbp. 32 1 26837778
17 (343) 19(133) +2(210) digital explorations 14 2 609512
18(339) 18(138) +0(201) Mathematics and Multimedia 4 2 4
19 (321) 20(130) +1(191) NLE Results July 2010 18 1 6772082
20 (292) 21(122) +1(170) The Critical Thinker(tm) 29 4 3057049
21 (231) 23(106) +2(125) Infophalanx 61 0 14413093
22 (223) 31(81) +9(142) School Librarian In Action 8 6 1160957
23 (206) 29(83) +6(123) pinoyachievers 17 0 15432503
24 (199) 25(91) +1(108) The Philippines Top Universities and colleges 7 0 16754746
25 (197) 26(90) +1(107) Hibang 2 3 9730334
26(194) 24(99) -2(95) keeping math simple 3 2 3212730
27 (194) 33(78) +6(116) Coffeeholic writes 0 0 26175200
28(187) 28(85) +0(102) Architecture Overload 7 3 9757368
29 (184) 30(82) +1(102) mentors town 3 2 15990679
30 (181) 34(67) +4(114) Amps4u - Learn Share Download 3 0 4
31 (174) 36(58) +5(116) rizal college of laguna 7 1 20197694
32(171) 27(85) -5(86) Nursing Lectures 3 2 8971456
33(170) 32(78) -1(92) Modern Architectural Concepts 1 4 11351570
34(150) 22(107) -12(43) LOVESUPERKARMA 20 1 792117
35 (141) 38(57) +3(84) Board Exam Review 9 1 9680206
36 (140) 40(44) +4(96) College Student in Capiz 1 1 18970
37(140) 35(65) -2(75) Christian Homeschooler 0 1 5544993
38 (120) 39(47) +1(73) NurseXchange.com 20 1 2948110
39(111) 37(58) -2(53) EJPM 9 0 -1
40 (94) 41(37) +1(57) ErickJohnCuevas 17 0 -1
41 (92) 44(33) +3(59) tips ni katoto 2 1 -1
42 (78) 43(33) +1(45) What is a Progressive School? 14 3 8542449
43 (75) 46(30) +3(45) time travelling 3 3 13164954
44 (70) 48(29) +4(41) Architectural E-Books 5 2 3057158
45(69) 42(34) -3(35) PRC Board Exam Results:2 1 1 5079529
46 (66) 52(22) +6(44) Post Homework 3 0 19630419
47 (61) 49(23) +2(38) The Sh*t Detector 5 0 7497419
48(61) 45(31) -3(30) Herbal Medicines 0 0 11130497
49 (60) 58(19) +9(41) Green Architecture 1 3 8878246
50(57) 47(29) -3(28) Investing Pinoy 3 0 4333967
51(50) 51(23) +0(27) Licensure Exam for Teachers | LET Results 7 2 6411116
52 (50) 57(20) +5(30) Review Portal Philippines 0 0 971934
53 (50) 54(22) +1(28) CAFA Notes 0 3 14434260
54 (48) 55(20) +1(28) Teacher's Pet Corner 2 0 18970
55(47) 53(22) -2(25) Nursing Board Exam 1 1 4533084
56(44) 56(20) +0(24) A Teacher's Odyssey 3 3 5113779
57(44) 50(23) -7(21) littlelabster 0 1 18970
58 (32) 59(17) +1(15) Career | Male Nurse 12 3 672375
59 (31) 67(10) +8(21) Philippine Exam Results 1 1 3213934
60(30) 60(13) +0(17) Journey of a Filipino Biology Teacher 1 0 -1
61(29) 61(12) +0(17) Jojoeland 1 0 8180319
62 (28) 64(11) +2(17) Creative Hemispheres 1 0 2656164
63 (27) 70(8) +7(19) The Learning Life of Edward Palomo 1 3 4686499
64(25) 62(11) -2(14) Doc Ernie's adventure 2 0 1324299
65 (25) 68(9) +3(16) Thinking and Teaching 0 0 -1
66(25) 65(11) -1(14) Business Matters 0 0 5013826
67(23) 63(11) -4(12) A Throb of Knowledge 2 1 8788567
68(21) 66(10) -2(11) hudyaka 2 1 -1
69 (21) 94(3) +25(18) Intensive English Grammar Course 0 0 -1
70 (20) 83(5) +13(15) B.R.I.G.H.T. Art Class 1 3 26228816
71 (18) 84(5) +13(13) Filamerian Online Community 2 2 6263428
72(17) 69(9) -3(8) PRC Board Exam Results - Board and Bar Passers & Topnotchers 1 0 -1
73 (16) 74(7) +1(9) thesisworks 1 0 307
74(16) 72(8) -2(8) UNC Placement Service 0 3 26687638
75 (14) 81(5) +6(9) Scholar Bank 2 1 9361196
76(14) 76(6) +0(8) Padre Faura's Notebook 0 0 -1
77(14) 71(8) -6(6) NLE Results November 2009 0 1 -1
78 (13) 89(4) +11(9) USJR Law Mentor 1 1 -1
79(13) 79(6) +0(7) Geometric Algebra 0 2 26047978
80(12) 73(7) -7(5) Education, Development, Progress of the Learner and Personal Growth 1 0 13329741
81 (12) 98(3) +17(9) KANIYAH 0 0 307
82 (12) 108(2) +26(10) PRC EXAMS 0 1 24588304
83(12) 80(6) -3(6) LearnFookien 0 0 17814484
84 (11) 104(2) +20(9) Virtual Classroom of Prof. Lardizabal 1 0 364
85 (11) 90(4) +5(7) Trabaho Para Sayo 0 0 26840517
86 (11) 91(4) +5(7) The ECE Hub 0 1 24091006
87 (10) 101(2) +14(8) My Capiznon Bloggers Blog 1 1 46112
88(10) 75(6) -13(4) Free Software Explorations 0 0 4195146
89(10) 82(5) -7(5) Hectic Capiznon Bloggers 2009 0 0 13017180
90 (10) 92(4) +2(6) Super World Records 0 1 -1
91 (10) 93(4) +2(6) Accountancy Student 0 0 2735402
92(9) 88(4) -4(5) Astrodeus Galaxy 1 1 9492920
93(9) 77(6) -16(3) jessamyneology.ü 0 2 9380468
94 (8) 115(1) +21(7) WCSAT 1 0 -1
95(8) 95(3) +0(5) 3 TEST OF SUCCESS 0 0 18970
96 (8) 105(2) +9(6) Under the Sampaloc Tree 0 0 2020969
97(8) 96(3) -1(5) Poverty is not lack of Money; Poverty is Mismanaged Monet 0 0 18970
98(8) 97(3) -1(5) A Lifestyle of Thriftiness 0 0 18970
99 (8) 100(3) +1(5) Fscratch mobile 0 0 105322
100(7) 85(4) -15(3) EVERYTHING ABOUT EEE 1 0 6634965
101(7) 86(4) -15(3) Life, Research, Education 0 0 3770795
102 (7) 103(2) +1(5) from the mentor's desk 0 1 -1
103 (7) 128(1) +25(6) Call for Research Papers and Other Opportunities (Pinoy Style) 0 0 4996803
104(7) 78(6) -26(1) Tula, Maikling Kwento, Atbp 0 1 -1
105(6) 87(4) -18(2) Whatever with the SHADOW 1 0 -1
106(6) 99(3) -7(3) She Teaches 0 3 603664
107 (5) 120(1) +13(4) PASIG High School 1 0 -1
108 (5) 110(2) +2(3) dahildyanmynagtxt 0 0 25449246
109(4) 102(2) -7(2) Facts and Fiction 1 2 22218344
110 (4) 123(1) +13(3) Sagot ni Juan 0 0 -1
111 (4) 124(1) +13(3) teacher feature 0 2 -1
112(4) 82(5) -30(-1) Hectic Capiznon Bloggers 2009 0 0 13017180
113(4) 107(2) -6(2) Gurong Pinay 0 0 -1
114 (4) 126(1) +12(3) Manay National High School 0 0 23011396
115 (4) 127(1) +12(3) WishLily 0 1 -1
116 (4) 129(1) +13(3) www.pinoymediainsider.com 0 0 -1
117(4) 109(2) -8(2) QUANTCrunch Training Solutions & Consultancy 0 0 -1
118 (4) 130(1) +12(3) HCEC interactive 0 0 -1
119 (4) 131(1) +12(3) BASA 0 2 -1
120(4) 111(2) -9(2) servedbycairo 0 1 -1
121(4) New Green Minded 0 0 25290608
122(4) 114(2) -8(2) webblogkoto 0 0 -1
123(3) 116(1) -7(2) Emotionsblog 1 1 11380918
124(3) New Einstein 0 2 -1
125(3) 118(1) -7(2) Homeschooling in the Philippines 0 0 -1
126(3) 119(1) -7(2) A Tribute To An Angel 0 0 -1
127(3) 121(1) -6(2) APACER (named after my local USB) ala akong maicp eh.Haha :)) 0 0 -1
128(3) 106(2) -22(1) Jasmin D.Olivare 0 0 -1
129 (3) 132(1) +3(2) PRC: Philippine Board Exam Results 0 0 21352667
130 (3) 133(1) +3(2) what you see is what you get 0 0 -1
131(3) New Tropical Architecture 0 0 -1
132(3) 112(2) -20(1) Teacher Ako! 0 0 -1
133 (3) 136(1) +3(2) chaxmaster 0 0 -1
134 (3) 137(1) +3(2) litsong paksiw 0 0 -1
135(3) 113(2) -22(1) elie blogs 0 0 19161563
136(2) 117(1) -19(1) Kicking Pinay 1 2 9574182
137(2) 122(1) -15(1) Engineering 0 2 14628640
138(2) New Homeschooling Yona 0 0 22055098
139(2) 134(1) -5(1) Anna's Notebook 0 0 25251242
140(2) 135(1) -5(1) college life 0 0 18970
141(2) New freebooksonline 0 0 -1
142(2) 138(1) -4(1) FScratch™ 0 0 364
143(1) New Change of Time...Time of Change 1 0 -1
144(1) New The Blog Writes 0 0 16661821
145(1) New Tempat Belajar 0 0 24846776
146(1) New VARIATION 0 0 -1

Python: A nasty problem with encoding specification, or how to read HTML text properly.

I was wondering why my own blog site, "Doc Ernie's adventure", was not processed properly by my Python program, which should create a link labeled "Doc Ernie's adventure". The top line of my multireportgen script contains the following encoding declaration:

# -*- coding: utf-8 -*-

Yet the Python program could not read the string properly. Viewing the page source of the Topblogs page shows that our blog title is rendered as "Doc Ernie&#039;s adventure": note the apostrophe escaped as the numeric entity &#039; (ampersand, sharp sign, 039 code).
Then I went back to the Python documentation today and found other encoding values such as Latin-1 and iso-8859-15. When I replaced utf-8 with the iso-8859-15 value, the report was output properly, with the link to the blog's URL shown!
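Under Python 3 the whole fix collapses to two steps: decode the bytes with the encoding the file actually uses, then unescape the HTML entities. A sketch (the original multireportgen ran under Python 2, so `html.unescape` is a modern stand-in for whatever unescaping it did):

```python
import html

# The title as it sits in the fetched page source: the apostrophe arrives as
# the numeric entity &#039;, and the page bytes are ISO-8859-15 (Latin-9).
raw = "Doc Ernie&#039;s adventure".encode("iso-8859-15")

# Step 1: decode with the file's real encoding, not an assumed utf-8.
# Step 2: turn &#039; back into an apostrophe.
title = html.unescape(raw.decode("iso-8859-15"))
print(title)  # Doc Ernie's adventure
```

Decoding with the wrong codec is what produced the original failure; the entity unescaping is a separate, second problem.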

Incidentally I saw this message from Emacs:


Selected encoding utf-8-unix disagrees with iso-latin-9-unix specified by file contents. Really save (else edit coding cookies and try again)? (yes or no)


So it was just a matter of encoding specification! Visit the corrected Topblogs July 6 report.

Sunday, June 20, 2010

Python, Econometrics, MWD test (release 0.0.1, possible bug in p-values? 06.21): Choosing between linear vs. log-linear models

A practical problem arises when choosing between alternative model specifications such as the basic linear model:

$$Y_t = \alpha_1 + \alpha_2 X_{2t} + \alpha_3 X_{3t} + \alpha_4 X_{4t} + u_t$$

versus

$$\log Y_t = \beta_1 + \beta_2 \log X_{2t} + \beta_3 \log X_{3t} + \beta_4 \log X_{4t} + u_t$$

The MWD test, named after MacKinnon, White and Davidson, allows us to use ordinary least squares to help choose the appropriate model to fit to the data.

Our main reference is Gujarati, 4e, pp. 280-282. The outline of the method may be seen in our Python code below, though if you are a beginner in Python it will take some time to understand.
Our mwdtest expects only the matrix of regressors X, whose first column consists entirely of ones (a constant column), and the response variable Y. All values of the model matrix and of Y must be greater than 0, since the logarithmic function has the positive real numbers as its domain.
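Before the full listing, the heart of the test, constructing the auxiliary regressors Z1 and Z2, can be sketched with numpy (`lstsq` stands in for the homegrown `ols` used below, and `mwd_aux_terms` is our own name for this sketch):

```python
import numpy as np

def mwd_aux_terms(X, Y):
    """Build the MWD auxiliary regressors Z1 and Z2.

    X : (n, k) design matrix, first column all ones, entries > 0.
    Y : length-n positive response vector.
    Z1 is then appended to the linear model and Z2 to the log-linear
    model; a significant Z1 argues against the linear form, a
    significant Z2 against the log-linear form.
    """
    X = np.asarray(X, dtype=float)
    Y = np.asarray(Y, dtype=float)

    # Fitted values of the linear model Y ~ X.
    b_lin, *_ = np.linalg.lstsq(X, Y, rcond=None)
    Yhat = X @ b_lin

    # Fitted values of the log-linear model log(Y) ~ [1, log X_2, ..., log X_k].
    logX = np.column_stack([np.ones(len(X))] +
                           [np.log(X[:, j]) for j in range(1, X.shape[1])])
    b_log, *_ = np.linalg.lstsq(logX, np.log(Y), rcond=None)
    logYhat = logX @ b_log

    Z1 = np.log(Yhat) - logYhat      # ln(fitted Y) minus fitted ln Y
    Z2 = np.exp(logYhat) - Yhat      # antilog of fitted ln Y minus fitted Y
    return Z1, Z2
```

Note that `np.log(Yhat)` presumes the fitted values stay positive, which the all-positive data requirement is meant to ensure.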


"""
file     mwdtest.py
author   Dr. Ernesto P. Adorio
         U.P. Clarkfield
         ernesto . adorio @ gmail . com
description
         Performs the McKinnon, White and Davidson test for testing
         suitability  between competing linear vs. log-linear models.
version
         0.0.1  2010.06.21 initial                   
"""

from matlib    import *
from lm        import *
from dataframe import *
from math      import *


def mwdtest(X, Y, ztol):
    """
    Arguments.
      X - matrix of regressors including constant column of 1's.
      Y - output vector.
      ztol - comparison with zero tolerance.
    Output:
      fitted linear and loglinear objects.
    Warning:
      The input X matrix may be modified by this routine.
    """
    # Is the first column of X all ones?
    for row in X:
        if abs(row[0] - 1.0) > ztol:
           raise ValueError, "mwdtest(): expects a first column of all ones"

    #linear model
    print "@@@ debug: linear fit model."
    mataugprint(X,Y)
    
    linearfit1  = ols(X,Y)
    Yhat = linearfit1.Yhat

    #loglinear model
    logX =[]
    for row in X:
        a = [1.0]
        a.extend([log(x) for x in row[1:]])
        logX.append(a)
    logY = [log(y) for y in Y]
    print "@@@ debug:log linear model"
    mataugprint(logX, logY)
  
    logfit1 = ols(logX, logY)
    logYhat = logfit1.Yhat
   

    Z1 = [log(yhat) - logyhat for yhat, logyhat in zip(Yhat, logYhat)]
    for x, z1 in zip(X, Z1):
        x.append(z1)
    linearfit2 = ols(X, Y)

    Z2 = [exp(logy) - y for logy,y in zip(logYhat,Yhat)] 
    for x, z2 in zip (logX, Z2):
        x.append(z2)
    logfit2 = ols(logX, logY)
    return linearfit2, logfit2


def Test(X,Y, ztol):
    lfit, logfit = mwdtest(X,Y, ztol)
    print "testing linear fit:"
    for i, (t, beta,pvalue) in enumerate(zip(lfit.tbetas, lfit.betas, lfit.pvalues)):
        print "beta(", i,")=", beta,"t=", t, "pvalue=", pvalue
    print "F=", lfit.F, "R^2 = ", lfit.R2
    print
    print "testing log-linear fit:"
    for i, (t, beta,pvalue) in enumerate(zip(logfit.tbetas, logfit.betas, logfit.pvalues)):
        print "beta(", i,")=", beta,"t=", t, "pvalue=", pvalue
    print "F=", logfit.F, "R^2 = ", logfit.R2
    print

 
if __name__ == "__main__":
    #Rose demand data from Gujarati.
    data = """
# "Y" "X2" "X3" "X4" "X5"
# "Y" = Quantity of Roses Sold, Dozens
# "X2" = Average Wholesale Price of Roses, $ Per Dozen
# "X3" = Average Wholesale Price of Carnations, $ Per Dozen
# "X4" = Average Weekly Family Disposable Income, $ Per Week
# "X5" = Trend Variable: 1, 2, 3, ...
Y     X2   X3   X4  X5
11484 2.26 3.49 158.11 1 
9348 2.54 2.85 173.36 2 
8429 3.07 4.06 165.26 3 
10079 2.91 3.64 172.92 4 
9240 2.73 3.21 178.46 5 
8862 2.77 3.66 198.62 6 
6216 3.59 3.76 186.28 7 
8253 3.23 3.49 188.98 8 
8038 2.60 3.13 180.49 9 
7476 2.89 3.20 183.33 10 
5911 3.77 3.65 181.87 11 
7950 3.64 3.60 185.00 12 
6134 2.82 2.94 184.00 13 
5868 2.96 3.12 188.20 14 
3160 4.24 3.58 175.67 15 
5872 3.69 3.53 188.00 16 
""" 
    D = dataframe(data, header=True)
    X = matSelectCols(D.data, [1,2])
    
    #prepend  a 1 column to X model matrix.
    for x in X:
        x.insert(0, 1.0)
    Y = [d[0] for d in D.data]
 
    # See input data
    print "Input matrix."
    mataugprint(X, Y)

    ztol = 1.0e-5
    Test(X,Y, ztol)   

Let us compare the results with Gujarati's output in pp.281-282.


python mwdtest.py
Input matrix.
1.0000 2.2600 3.4900 | 11484.0000
1.0000 2.5400 2.8500 | 9348.0000
1.0000 3.0700 4.0600 | 8429.0000
1.0000 2.9100 3.6400 | 10079.0000
1.0000 2.7300 3.2100 | 9240.0000
1.0000 2.7700 3.6600 | 8862.0000
1.0000 3.5900 3.7600 | 6216.0000
1.0000 3.2300 3.4900 | 8253.0000
1.0000 2.6000 3.1300 | 8038.0000
1.0000 2.8900 3.2000 | 7476.0000
1.0000 3.7700 3.6500 | 5911.0000
1.0000 3.6400 3.6000 | 7950.0000
1.0000 2.8200 2.9400 | 6134.0000
1.0000 2.9600 3.1200 | 5868.0000
1.0000 4.2400 3.5800 | 3160.0000
1.0000 3.6900 3.5300 | 5872.0000

@@@ debug: linear fit model.
1.0000 2.2600 3.4900 | 11484.0000
1.0000 2.5400 2.8500 | 9348.0000
1.0000 3.0700 4.0600 | 8429.0000
1.0000 2.9100 3.6400 | 10079.0000
1.0000 2.7300 3.2100 | 9240.0000
1.0000 2.7700 3.6600 | 8862.0000
1.0000 3.5900 3.7600 | 6216.0000
1.0000 3.2300 3.4900 | 8253.0000
1.0000 2.6000 3.1300 | 8038.0000
1.0000 2.8900 3.2000 | 7476.0000
1.0000 3.7700 3.6500 | 5911.0000
1.0000 3.6400 3.6000 | 7950.0000
1.0000 2.8200 2.9400 | 6134.0000
1.0000 2.9600 3.1200 | 5868.0000
1.0000 4.2400 3.5800 | 3160.0000
1.0000 3.6900 3.5300 | 5872.0000

@@@ debug:log linear model
1.0000 0.8154 1.2499 | 9.3487
1.0000 0.9322 1.0473 | 9.1429
1.0000 1.1217 1.4012 | 9.0394
1.0000 1.0682 1.2920 | 9.2182
1.0000 1.0043 1.1663 | 9.1313
1.0000 1.0188 1.2975 | 9.0895
1.0000 1.2782 1.3244 | 8.7349
1.0000 1.1725 1.2499 | 9.0183
1.0000 0.9555 1.1410 | 8.9919
1.0000 1.0613 1.1632 | 8.9195
1.0000 1.3271 1.2947 | 8.6846
1.0000 1.2920 1.2809 | 8.9809
1.0000 1.0367 1.0784 | 8.7216
1.0000 1.0852 1.1378 | 8.6773
1.0000 1.4446 1.2754 | 8.0583
1.0000 1.3056 1.2613 | 8.6780

testing linear fit:
beta( 0 )= 9727.56629203 t= 3.21783350732 pvalue= 0.191817531244
beta( 1 )= -3783.06271822 t= -6.33375547952 pvalue= 0.0996893146221
beta( 2 )= 2817.71671158 t= 2.83663692683 pvalue= 0.215767665862
beta( 3 )= 85.3186515319 t= 0.0207244164541 pvalue= 0.986808315114
F= 13.4410359246 R^2 = 0.770655824729

testing log-linear fit:
beta( 0 )= 9.14861065241 t= 17.0825302798 pvalue= 0.0372248169624
beta( 1 )= -1.96990658676 t= -6.41899335191 pvalue= 0.0983866546526
beta( 2 )= 1.58915441923 t= 3.07287928323 pvalue= 0.200292472074
beta( 3 )= -0.00012938402573 t= -1.6612759925 pvalue= 0.344952324446
F= 14.1664502895 R^2 = 0.779813891197


The important values are those for the last coefficient. Except for the published p-value for the log-linear fit of 0.1225 versus our own 0.34495, all the other values match.

REMINDER TO SELF: The lm code needs to be checked for the right computation of the p-values.

Saturday, June 19, 2010

Python, Econometrics (revised 06.20.2010): Computing the Augmented Dickey-Fuller test statistic.

A time-series vector Y is non-stationary if, in the model $$Y_t = \rho Y_{t-1} + \epsilon_t$$, $$\rho = 1$$.
Given a time-series vector Y and a corresponding time vector T, the Dickey-Fuller test for the presence of a unit root, or non-stationarity, works with the following test equations (models):

basic: $$\Delta y_t = \tau y_{t-1}$$
with a constant: $$\Delta y_t = \beta + \tau y_{t-1}$$
with constant and trend: $$\Delta y_t = \beta_1 + \beta_2 T_t + \tau y_{t-1}$$

Each model requires its own particular Dickey-Fuller table of critical values, which is related to, but different from, the t distribution.

The full Augmented Dickey-Fuller test works with the following models, given a specified maximum lag p of the differenced vector $$\Delta Y$$:

basic: $$\Delta y_t = \tau y_{t-1} + \rho_1 \Delta y_{t-1} + \rho_2 \Delta y_{t-2} + \ldots + \rho_p \Delta y_{t-p}$$
with a constant: $$\Delta y_t = \beta + \tau y_{t-1} + \rho_1 \Delta y_{t-1} + \rho_2 \Delta y_{t-2} + \ldots + \rho_p \Delta y_{t-p}$$
with constant and trend: $$\Delta y_t = \beta_1 + \beta_2 T_t + \tau y_{t-1} + \rho_1 \Delta y_{t-1} + \rho_2 \Delta y_{t-2} + \ldots + \rho_p \Delta y_{t-p}$$

Thus we have to be careful in specifying the input X matrix and the response vector passed to an OLS routine.

The test statistic of the Dickey-Fuller test in all models is the value $$\frac{\hat{\tau}}{se(\hat{\tau})}$$ where the denominator is the standard error of the computed coefficient of the lagged term $$y_{t-1}$$.

There are published tables for the critical values of the D-F test.
@@@ Place references here!!!@@@

Unlike the well-known standard distributions such as the normal, t, F and $$\chi^2$$ distributions, the critical values are obtained by simulation.

The null hypothesis is $$H_0: \tau = 0$$, i.e., the series $$Y$$ is non-stationary. If the absolute value of the estimated $$\tau$$ statistic is less than the absolute value of the critical value for a specified $$\alpha$$, we do not reject the null hypothesis; otherwise we reject it.
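The steps above, for the no-lag model with constant and trend, can be sketched in a few lines of numpy (`np.linalg.lstsq` plays the role of our `ols` routine, and the name `df_tau` is ours):

```python
import numpy as np

def df_tau(Y, with_trend=True):
    """Dickey-Fuller tau statistic, no lagged differences.

    Regresses dY_t on [1, t, Y_{t-1}] (the trend column t is dropped
    when with_trend is False) and returns
    tau = coeff(Y_{t-1}) / se(coeff(Y_{t-1})).
    """
    Y = np.asarray(Y, dtype=float)
    dY = np.diff(Y)                      # Y_t - Y_{t-1}
    n = len(dY)
    cols = [np.ones(n)]
    if with_trend:
        cols.append(np.arange(1.0, n + 1.0))
    cols.append(Y[:-1])                  # lagged level Y_{t-1}
    X = np.column_stack(cols)

    beta, *_ = np.linalg.lstsq(X, dY, rcond=None)
    resid = dY - X @ beta
    sigma2 = resid @ resid / (n - X.shape[1])
    se = np.sqrt(np.diag(sigma2 * np.linalg.inv(X.T @ X)))
    k = X.shape[1] - 1                   # position of the Y_{t-1} coefficient
    return beta[k] / se[k]
```

Remember that the resulting tau is compared against the Dickey-Fuller critical values, not the usual t table.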


Time for an example: let us review Danao's Dickey-Fuller test applied to the GNP data on page 323. For convenience we copy it here:


YEAR GNPR PCER PGNP
1940 772.9 502.6 0.1299
1941 909.4 531.1 0.138003
1942 1080.3 527.6 0.147181
1943 1276.2 539.9 0.150995
1944 1380.6 557.1 0.153122
1945 1354.8 592.7 0.157514
1946 1096.9 655 0.193637
1947 1066.7 666.6 0.220493
1948 1108.7 681.8 0.235952
1949 1109 695.4 0.234806
1950 1203.7 733.2 0.239512
1951 1328.2 748.7 0.251016
1952 1380 771.4 0.254783
1953 1435.3 802.5 0.258901
1954 1416.2 822.7 0.263028
1955 1494.9 873.8 0.271523
1956 1525.6 899.8 0.280676
1957 1551.1 919.7 0.290761
1958 1539.2 932.9 0.296778
1959 1629.1 979.4 0.30434
1960 1665.3 1005.1 0.309434
1961 1708.7 1025.2 0.312401
1962 1799.4 1069 0.319329
1963 1873.3 1108.4 0.323974
1964 1973.3 1170.6 0.329296
1965 2087.6 1236.4 0.337756
1966 2208.3 1298.9 0.34959
1967 2271.4 1337.7 0.359426
1968 2365.6 1405.9 0.377367
1969 2423.3 1456.7 0.397763
1970 2416.2 1492 0.420288
1971 2484.8 1538.8 0.443778
1972 2608.5 1621.9 0.464942
1973 2744.1 1689.6 0.495354
1974 2729.3 1674 0.539626
1975 2695 1711.9 0.593098
1976 2826.7 1803.9 0.6307
1977 2958.6 1883.8 0.672784
1978 3115.2 1961 0.722169
1979 3192.4 2004.4 0.785678
1980 3187.1 2000.4 0.857206
1981 3248.8 2024.2 0.939608
1982 3166 2050.7 1
1983 3279.1 2146 1.038608
1984 3489.9 2246.3 1.078827
1985 3585.2 2324.5 1.115168
1986 3676.5 2418.6 1.144703


Using the GNPR and Year columns only, the model to fit in R notation is Delta(GNPR) ~ 1 + Year + GNPR_{t-1}. The response variable is the differenced GNPR and the independent variables are a constant, the year and the lagged GNPR. When we run our Python program, it outputs the following:

[-12794.909494518186, 6.6503686622997975, -0.098105000002848719]
[8538.0875917954763, 4.4232851542101859, 0.073658549799719808]
With drift-trend: df test_statistic -1.33188883395

The first output array holds the estimated slope coefficients; the second array lists the corresponding standard errors. The computed DF statistic matches that of Danao.

Now let us specify a max lag for the differenced response variable. The computed input X matrix is given by the following (we added labels for clarity):


C Year Ylag delYlag delY
1.0000 1942.0000 909.4000 34.4000 | 170.9000
1.0000 1943.0000 1080.3000 25.0000 | 195.9000
1.0000 1944.0000 1276.2000 -91.5000 | 104.4000
1.0000 1945.0000 1380.6000 -130.2000 | -25.8000
1.0000 1946.0000 1354.8000 -232.1000 | -257.9000
1.0000 1947.0000 1096.9000 227.7000 | -30.2000
1.0000 1948.0000 1066.7000 72.2000 | 42.0000
1.0000 1949.0000 1108.7000 -41.7000 | 0.3000
1.0000 1950.0000 1109.0000 94.4000 | 94.7000
1.0000 1951.0000 1203.7000 29.8000 | 124.5000
1.0000 1952.0000 1328.2000 -72.7000 | 51.8000
1.0000 1953.0000 1380.0000 3.5000 | 55.3000
1.0000 1954.0000 1435.3000 -74.4000 | -19.1000
1.0000 1955.0000 1416.2000 97.8000 | 78.7000
1.0000 1956.0000 1494.9000 -48.0000 | 30.7000
1.0000 1957.0000 1525.6000 -5.2000 | 25.5000
1.0000 1958.0000 1551.1000 -37.4000 | -11.9000
1.0000 1959.0000 1539.2000 101.8000 | 89.9000
1.0000 1960.0000 1629.1000 -53.7000 | 36.2000
1.0000 1961.0000 1665.3000 7.2000 | 43.4000
1.0000 1962.0000 1708.7000 47.3000 | 90.7000
1.0000 1963.0000 1799.4000 -16.8000 | 73.9000
1.0000 1964.0000 1873.3000 26.1000 | 100.0000
1.0000 1965.0000 1973.3000 14.3000 | 114.3000
1.0000 1966.0000 2087.6000 6.4000 | 120.7000
1.0000 1967.0000 2208.3000 -57.6000 | 63.1000
1.0000 1968.0000 2271.4000 31.1000 | 94.2000
1.0000 1969.0000 2365.6000 -36.5000 | 57.7000
1.0000 1970.0000 2423.3000 -64.8000 | -7.1000
1.0000 1971.0000 2416.2000 75.7000 | 68.6000
1.0000 1972.0000 2484.8000 55.1000 | 123.7000
1.0000 1973.0000 2608.5000 11.9000 | 135.6000
1.0000 1974.0000 2744.1000 -150.4000 | -14.8000
1.0000 1975.0000 2729.3000 -19.5000 | -34.3000
1.0000 1976.0000 2695.0000 166.0000 | 131.7000
1.0000 1977.0000 2826.7000 0.2000 | 131.9000
1.0000 1978.0000 2958.6000 24.7000 | 156.6000
1.0000 1979.0000 3115.2000 -79.4000 | 77.2000
1.0000 1980.0000 3192.4000 -82.5000 | -5.3000
1.0000 1981.0000 3187.1000 67.0000 | 61.7000
1.0000 1982.0000 3248.8000 -144.5000 | -82.8000
1.0000 1983.0000 3166.0000 195.9000 | 113.1000
1.0000 1984.0000 3279.1000 97.7000 | 210.8000
1.0000 1985.0000 3489.9000 -115.5000 | 95.3000
1.0000 1986.0000 3585.2000 -4.0000 | 91.3000


Our program returned the following slopes/st. errors:

[466.8367900006233, -0.22813001176738901, 0.020909396153443754, 0.50373782051647442]
[8182.7301885576253, 4.2390077773331614, 0.070306560324028558, 0.12685872804481083]
With drift-trend: df test_statistic 0.297403201879

The result does not match Danao's published result of -2.436537! Testing this in R we get the following summary:


> fit <- lm(dely ~ 1 + year + ylag + dely1)
> summary(fit)

Call:
lm(formula = dely ~ 1 + year + ylag + dely1)

Residuals:
Min 1Q Median 3Q Max
-192.206 -32.704 3.383 39.123 137.138

Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 466.83679 8182.73019 0.057 0.954781
year -0.22813 4.23901 -0.054 0.957342
ylag 0.02091 0.07031 0.297 0.767660
dely1 0.50374 0.12686 3.971 0.000282 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 68.08 on 41 degrees of freedom
Multiple R-squared: 0.3353, Adjusted R-squared: 0.2866
F-statistic: 6.893 on 3 and 41 DF, p-value: 0.000728


Our slope (beta) and se (std. error) for ylag match those of R. This is in contrast with Danao's published output:

              Estimate   Std.Error
(Intercept)   124.1005   49.00592
@TREND(1940)  10.71098   4.127310
GNPR(-1)     -0.168378   0.969105
D(GNPR(-1))   0.452307   0.140006


and with a computed ADF test statistic of -2.436537 (see p. 331). We will find out if there are errors.

Here is the complete Python program to compute the test statistic of the DF and ADF test.

"""
file   adftest.py

author Dr. Ernesto Adorio
       U.P. Clark, Pampanga
       ernesto.adorio@gmail.com

Version 0.0.1   June 20, 2010
"""    

from lm import *

def adftest(Y, t=None, withintercept=False, maxp=0):
    """
    Description
      Computes Dickey-Fuller Unit test values.
    Arguments
      Y - input vector
      t - time vector, None if no trend is to be incorporated.
      withintercept - true if there is a constant in model.
      maxp  maximum lag for differences.
    Return
      tau, coefficient of lagged variable Y(-1).
      se(tau), standard error of coefficient.
      n- number of data  points used.
    """
    # Form matrix of columns corresponding to delY-1 to delY-p
    # Be simple!
    n        = len(Y)

    delyp    = [0]* (maxp+1)    
    delyp[0] = [Y[i]-Y[i-1] for i in range(1, n)]

    for p in range(1, maxp+1):
        delyp[p] = [delyp[p-1][i] - delyp[p-1][i-1] for i in range(1,len(delyp[p-1]))]

    # Adjust differences to same length.
    for p in range(0, maxp+1):    
        delyp[p] = delyp[p][maxp-p:]

    # Form the X design matrix.
    X = []
    for i in range(maxp + 1, n):
        x = []
        if withintercept:
           x.append(1)
        if t:
           x.append(t[i])
        x.append(Y[i-1])
        for p in range(1, maxp+1):
           x.append(delyp[p][i-maxp-1])
        X.append(x)

    mataugprint(X, delyp[0])

    fit = ols( X,  delyp[0])

    ylagindex = 0
    if withintercept:
       ylagindex+=1
    if t:
       ylagindex+=1
    print "yindex = ", ylagindex        
    tau = fit.betas[ylagindex]
    se  = fit.sdbetas[ylagindex]
    print fit.betas
    print fit.sdbetas
    return tau, se, len(X)
 

if __name__ == "__main__":
    data = """YEAR    GNPR    PCER    PGNP
1940    772.9   502.6   0.1299
1941    909.4   531.1   0.138003
1942    1080.3  527.6   0.147181
1943    1276.2  539.9   0.150995
1944    1380.6  557.1   0.153122
1945    1354.8  592.7   0.157514
1946    1096.9  655     0.193637
1947    1066.7  666.6   0.220493
1948    1108.7  681.8   0.235952
1949    1109    695.4   0.234806
1950    1203.7  733.2   0.239512
1951    1328.2  748.7   0.251016
1952    1380    771.4   0.254783
1953    1435.3  802.5   0.258901
1954    1416.2  822.7   0.263028
1955    1494.9  873.8   0.271523
1956    1525.6  899.8   0.280676
1957    1551.1  919.7   0.290761
1958    1539.2  932.9   0.296778
1959    1629.1  979.4   0.30434
1960    1665.3  1005.1  0.309434
1961    1708.7  1025.2  0.312401
1962    1799.4  1069    0.319329
1963    1873.3  1108.4  0.323974
1964    1973.3  1170.6  0.329296
1965    2087.6  1236.4  0.337756
1966    2208.3  1298.9  0.34959
1967    2271.4  1337.7  0.359426
1968    2365.6  1405.9  0.377367
1969    2423.3  1456.7  0.397763
1970    2416.2  1492    0.420288
1971    2484.8  1538.8  0.443778
1972    2608.5  1621.9  0.464942
1973    2744.1  1689.6  0.495354
1974    2729.3  1674    0.539626
1975    2695    1711.9  0.593098
1976    2826.7  1803.9  0.6307
1977    2958.6  1883.8  0.672784
1978    3115.2  1961    0.722169
1979    3192.4  2004.4  0.785678
1980    3187.1  2000.4  0.857206
1981    3248.8  2024.2  0.939608
1982    3166    2050.7  1
1983    3279.1  2146    1.038608
1984    3489.9  2246.3  1.078827
1985    3585.2  2324.5  1.115168
1986    3676.5  2418.6  1.144703
    """
    header = None
    X= []
    for row in data.split("\n"):      
        xvars = row.split()
        if len(xvars) == 4:
               if header is None:
                  header = xvars
               else:
                  cases = [float(v) for v in xvars]
                  X.append(cases)    
    TIME = [x[0] for x in X]
    GNPR = [x[1] for x in X]

    tau, se, n = adftest(GNPR, t=TIME, withintercept=True, maxp = 1)
    print "With drift-trend: df test_statistic", tau/se 

We will try to find out why our results do not match Danao's when the lag of the differenced response is not zero.

We only saw the following reference after we wrote our code.

Additional Resources: http://econpy.googlecode.com/svn/trunk/pytrix/unitroot.py

Saturday, June 12, 2010

Here is Python code to get the BSP daily (Monday to Friday) key rates and other statistics. It is assumed that the web pages have been saved as HTML files. The program is written plainly so that it does the job correctly; it is neither elegant nor a speed demon. The output may be viewed at our Economics page:

Economic Indicators Page

# -*- coding: utf-8 -*-

from glob import *

def between(line,  before,  after):
       f = line.find(before)
       if f >= 0:
           s = line[f+len(before): ]
           return s[:s.find(after)]
       else:
           return ""
   
def linefeeder(s=None):
    if s:
       lines = s.split("\n")
    else:
       lines = []
    for line in lines:
        yield line


def mdy(d):
    """
    Reformats date d to form month.day.year 
    """
    day,  month, year = d.split()
    amonths = ["January", "February",'March',  "April",  "May",  "June",  "July", "August", 
                        "September",  "October",  "November",  "December"]
    return "%02d.%02d.%s" % (amonths.index(month)+1,  int(day),  year)
   
state = 0
out = []
for fname  in glob("*bsp*.html"):
    bspdata = open(fname).read()
    s = ""
    state = 0
    for line in bspdata.split("\n"):
        if "" in line:
            # NOTE: the HTML tag literals in this listing (the empty ""
            # strings) were eaten by the blog renderer.
            date = between(line, "", "")
            if date:
                s += " " + mdy(date) + ""
                state = 1
            continue
        if state == 1:
            if "keystat/day99.htm" in line:
                state = 2
            continue
        if state == 2:
            # dollar rate.
            dollarrate = between(line, "right\">", "")
            if dollarrate:
                s += "" + dollarrate + ""
                state = 3
            continue
        if state == 3:
            if "bspratesdds" in line:
                state = 4
            continue
        if state == 4:
            reporate = between(line, '', '')
            if reporate:
                s += "" + reporate + ""
                state = 5
            continue
        if state == 5:
            if "Reverse Repo Rate" in line:
                state = 6
            continue
        if state == 6:
            revreporate = between(line, "right\">", "")
            if revreporate:
                s += "" + revreporate + ""
                state = 7
            continue
        if state == 7:
            if "Inflation Rate" in line:
                state = 8
            continue
        if state == 8:
            infrate = between(line, "right\">", "")
            if infrate:
                s += "" + infrate + ""
                state = 9
            continue
        if state == 9:
            if "NEER" in line:
                state = 10
            continue
        if state == 10:
            NEER = between(line, "right\">", "").strip()
            if NEER:
                s += "" + NEER + ""
                state = 11
            continue
        if state == 11:
            if "91-day" in line:
                state = 12
            continue
        if state == 12:
            d91 = between(line, "right\">", "")
            if d91:
                s += "" + d91 + ""
                state = 13
            continue
        if state == 13:
            if "Gold" in line:
                state = 14
            continue
        if state == 14:
            gold = between(line, "right\">", "")
            if gold:
                s += "" + gold + ""
                state = 15
            continue
        if state == 15:
            if "Silver" in line:
                state = 16
            continue
        if state == 16:
            silver = between(line, "right\">", "")
            if silver:
                s += "" + silver + " "
                state = 0
                out.append(s)
                break

out.sort(reverse=True)
print out[0]
for i in range(len(out)):
    if out[i] != out[i-1]:
        print i, ":", out[i]
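The `between` helper at the top of the listing does all the real work: it grabs the substring between two delimiters. A standalone illustration (the helper is repeated here, with a hypothetical table row, so the example runs on its own):

```python
def between(line, before, after):
    """Return the text between the first `before` and the following
    `after`, or "" when `before` is absent."""
    f = line.find(before)
    if f >= 0:
        s = line[f + len(before):]
        return s[:s.find(after)]
    return ""

# Example: pulling a cell value out of an HTML table row.
row = '<td align="right">45.85</td>'
print(between(row, 'right">', '</td>'))  # 45.85
```

One quirk worth knowing: when `after` is missing, `s.find` returns -1 and the slice silently drops the last character; for this scraping job the delimiters are always present.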

We plan to make the code cleaner by using the linefeeder function, but this can wait as we have more important things to do.

Friday, June 11, 2010

Python, Econometrics, Time-Series: Burg algorithm for autoregressive modelling

Finally we got around to implementing the Burg algorithm in Python. It was not too hard; we simply translated code by Cedrick Collomb from his tutorial "Burg's Method, Algorithm and Recursion", which you can also access in a previous blog post. In honor of his clear exposition with matching C++ code, we name the Python function CollombBurg. Basically, the algorithm returns m coefficients $$a_1, a_2, \ldots, a_{m}$$ such that \[ X_i = -\sum_{k=1}^m a_{k} X_{i-k}\] (note the minus sign, matching the prediction loop in the code below). It is assumed that the maximum order m (or lag) to use is already known; otherwise the AIC criterion may be applied to find a good estimate.
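Since the AIC route is only mentioned in passing, here is a minimal order-selection sketch. Least squares stands in for the Burg fit so the snippet runs standalone, and `ar_aic` and `pick_order` are our own hypothetical names:

```python
import numpy as np

def ar_aic(x, m):
    """AIC of an order-m AR model fitted by least squares.

    Least squares is only a stand-in for the Burg fit here; substitute
    the Burg coefficients for the real computation.
    """
    x = np.asarray(x, dtype=float)
    N = len(x)
    # Column k holds the lag-(k+1) values x[i-1-k] for i = m .. N-1.
    X = np.column_stack([x[m - 1 - k: N - 1 - k] for k in range(m)])
    y = x[m:]
    a, *_ = np.linalg.lstsq(X, y, rcond=None)
    rss = float(np.sum((y - X @ a) ** 2))
    return N * np.log(rss / N + 1e-12) + 2 * m   # small floor avoids log(0)

def pick_order(x, max_m=8):
    """Order with the smallest AIC among 1..max_m."""
    return min(range(1, max_m + 1), key=lambda m: ar_aic(x, m))
```

On a single sinusoid, which satisfies an exact two-term recursion, this picks order 2, as expected.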


"""
file              CollombBurg.py

author/translator Ernesto P. Adorio,Ph.D.
                  UPDEPP (UP Clark, Pampanga)
                  ernesto.adorio@gmail.com

Version           0.0.1 jun 11, 2010 # first release.

References        Burg's Method, Algorithm and Recursion, pp. 9-11
"""
from math import cos


def  burg(m,  x):
    """
    Based on Collomb's C++ code, pp. 10-11
    Burgs Method, algorithm and recursion
      m - number of lags in autoregressive model.
      x  - data vector to approximate.
    """
    N = len(x)-1
    coeffs = [0.0] * m
    
    # initialize Ak
    Ak      = [0.0]* (m+1)
    Ak[0] = 1.0 
    # initialize f and b.
    f  = x[:]
    b = x[:]
    # Initialize Dk
    Dk = 0.0
    for j in range(0,  N+1):
        Dk += 2.0 * f[j] * f[j]
    Dk -= f[ 0 ] * f[ 0 ] + b[ N ] * b[ N ];

    #Burg recursion
    for k in range(m):
        # compute mu
        mu = 0.0;
        for n in range(0,  N-k):
            mu += f[ n + k + 1 ] * b[ n ]
        mu *= -2.0 / Dk
        
        # update Ak
        maxn = (k+1)/2 + 1
        for n in range(maxn):
            t1 = Ak[ n ] + mu * Ak[ k + 1 - n ];
            t2 = Ak[ k + 1 - n ] + mu * Ak[ n ];
            Ak[ n ] = t1;
            Ak[ k + 1 - n ] = t2;
        #update f and b
        for n in range(N-k):
            t1 = f[ n + k + 1 ] + mu * b[n]
            t2 = b[ n ] + mu * f[ n + k + 1]
            f[ n + k + 1 ] = t1;
            b[ n ] = t2
            
        #update Dk
        Dk = ( 1.0 - mu * mu ) * Dk - f[ k + 1 ] * f[ k + 1 ] - b[ N - k - 1 ] * b[ N - k - 1 ];
    
    # assign coefficients.
    for i,  a in enumerate(Ak[1:]):
         coeffs[i] =a
    return coeffs
    
    

if __name__ == "__main__":
    # data to approximate.
    original = [0.0]* 128
    for i in range(128):
        t  = cos( i * 0.01 ) + 0.75 *cos( i * 0.03 )\
            + 0.5 *cos( i * 0.05 ) + 0.25 *cos( i * 0.11 )
        original[i] = t
        print i,  t
        
    #get linear prediction coefficients.
    # using BurgAlgorithm( m, original )
    coeffs = burg(4,  original)
    
    # Linear Predict Data
    predicted = [orig for orig in original]
    m = len(coeffs)
    for i in range(m,  len(predicted)):
       predicted[i] =0.0
       for j in range(m):
           predicted[ i ] -= coeffs[ j ] * original[i-1-j]

    #Calculate and display error.
    error = 0.0
    for i in range(len(predicted)):
       print "Index: %2d / Original: %.6f / Predicted: %.6f" % (i, original[i], predicted[i] )
       delta = predicted[i] - original[i]
       error += delta * delta;
    print "Burg Approximation Error: %f\n" % error

When run, the above code printed the following comparison (the initial data listing is elided):

Index:  0 / Original: 2.500000 / Predicted: 2.500000
Index:  1 / Original: 2.497477 / Predicted: 2.497477
Index:  2 / Original: 2.489927 / Predicted: 2.489927
Index:  3 / Original: 2.477411 / Predicted: 2.477411
Index:  4 / Original: 2.460028 / Predicted: 2.460028
Index:  5 / Original: 2.437916 / Predicted: 2.437917
Index:  6 / Original: 2.411250 / Predicted: 2.411250
Index:  7 / Original: 2.380238 / Predicted: 2.380239
Index:  8 / Original: 2.345123 / Predicted: 2.345125
Index:  9 / Original: 2.306177 / Predicted: 2.306178
Index: 10 / Original: 2.263697 / Predicted: 2.263698
Index: 11 / Original: 2.218005 / Predicted: 2.218007
Index: 12 / Original: 2.169443 / Predicted: 2.169445
Index: 13 / Original: 2.118368 / Predicted: 2.118371
Index: 14 / Original: 2.065152 / Predicted: 2.065155
Index: 15 / Original: 2.010171 / Predicted: 2.010174
Index: 16 / Original: 1.953808 / Predicted: 1.953811
Index: 17 / Original: 1.896445 / Predicted: 1.896449
Index: 18 / Original: 1.838460 / Predicted: 1.838465
Index: 19 / Original: 1.780224 / Predicted: 1.780229
Index: 20 / Original: 1.722094 / Predicted: 1.722099
Index: 21 / Original: 1.664412 / Predicted: 1.664417
Index: 22 / Original: 1.607501 / Predicted: 1.607506
Index: 23 / Original: 1.551661 / Predicted: 1.551666
Index: 24 / Original: 1.497167 / Predicted: 1.497172
Index: 25 / Original: 1.444265 / Predicted: 1.444270
Index: 26 / Original: 1.393171 / Predicted: 1.393176
Index: 27 / Original: 1.344070 / Predicted: 1.344075
Index: 28 / Original: 1.297110 / Predicted: 1.297115
Index: 29 / Original: 1.252408 / Predicted: 1.252413
Index: 30 / Original: 1.210043 / Predicted: 1.210047
Index: 31 / Original: 1.170058 / Predicted: 1.170062
Index: 32 / Original: 1.132462 / Predicted: 1.132465
Index: 33 / Original: 1.097229 / Predicted: 1.097232
Index: 34 / Original: 1.064298 / Predicted: 1.064301
Index: 35 / Original: 1.033578 / Predicted: 1.033580
Index: 36 / Original: 1.004946 / Predicted: 1.004947
Index: 37 / Original: 0.978251 / Predicted: 0.978251
Index: 38 / Original: 0.953317 / Predicted: 0.953317
Index: 39 / Original: 0.929947 / Predicted: 0.929946
Index: 40 / Original: 0.907923 / Predicted: 0.907920
Index: 41 / Original: 0.887010 / Predicted: 0.887007
Index: 42 / Original: 0.866964 / Predicted: 0.866960
Index: 43 / Original: 0.847530 / Predicted: 0.847525
Index: 44 / Original: 0.828449 / Predicted: 0.828443
Index: 45 / Original: 0.809461 / Predicted: 0.809453
Index: 46 / Original: 0.790308 / Predicted: 0.790300
Index: 47 / Original: 0.770742 / Predicted: 0.770732
Index: 48 / Original: 0.750521 / Predicted: 0.750511
Index: 49 / Original: 0.729420 / Predicted: 0.729409
Index: 50 / Original: 0.707231 / Predicted: 0.707219
Index: 51 / Original: 0.683766 / Predicted: 0.683754
Index: 52 / Original: 0.658862 / Predicted: 0.658848
Index: 53 / Original: 0.632378 / Predicted: 0.632364
Index: 54 / Original: 0.604207 / Predicted: 0.604192
Index: 55 / Original: 0.574266 / Predicted: 0.574251
Index: 56 / Original: 0.542509 / Predicted: 0.542494
Index: 57 / Original: 0.508920 / Predicted: 0.508904
Index: 58 / Original: 0.473515 / Predicted: 0.473499
Index: 59 / Original: 0.436345 / Predicted: 0.436329
Index: 60 / Original: 0.397496 / Predicted: 0.397479
Index: 61 / Original: 0.357083 / Predicted: 0.357066
Index: 62 / Original: 0.315255 / Predicted: 0.315238
Index: 63 / Original: 0.272189 / Predicted: 0.272173
Index: 64 / Original: 0.228093 / Predicted: 0.228076
Index: 65 / Original: 0.183198 / Predicted: 0.183182
Index: 66 / Original: 0.137759 / Predicted: 0.137744
Index: 67 / Original: 0.092053 / Predicted: 0.092038
Index: 68 / Original: 0.046373 / Predicted: 0.046358
Index: 69 / Original: 0.001024 / Predicted: 0.001009
Index: 70 / Original: -0.043677 / Predicted: -0.043691
Index: 71 / Original: -0.087407 / Predicted: -0.087420
Index: 72 / Original: -0.129840 / Predicted: -0.129853
Index: 73 / Original: -0.170654 / Predicted: -0.170666
Index: 74 / Original: -0.209529 / Predicted: -0.209541
Index: 75 / Original: -0.246158 / Predicted: -0.246169
Index: 76 / Original: -0.280245 / Predicted: -0.280255
Index: 77 / Original: -0.311511 / Predicted: -0.311520
Index: 78 / Original: -0.339699 / Predicted: -0.339708
Index: 79 / Original: -0.364576 / Predicted: -0.364584
Index: 80 / Original: -0.385934 / Predicted: -0.385941
Index: 81 / Original: -0.403595 / Predicted: -0.403602
Index: 82 / Original: -0.417416 / Predicted: -0.417422
Index: 83 / Original: -0.427284 / Predicted: -0.427290
Index: 84 / Original: -0.433126 / Predicted: -0.433130
Index: 85 / Original: -0.434902 / Predicted: -0.434906
Index: 86 / Original: -0.432613 / Predicted: -0.432617
Index: 87 / Original: -0.426297 / Predicted: -0.426300
Index: 88 / Original: -0.416031 / Predicted: -0.416033
Index: 89 / Original: -0.401928 / Predicted: -0.401930
Index: 90 / Original: -0.384140 / Predicted: -0.384142
Index: 91 / Original: -0.362853 / Predicted: -0.362855
Index: 92 / Original: -0.338289 / Predicted: -0.338290
Index: 93 / Original: -0.310697 / Predicted: -0.310699
Index: 94 / Original: -0.280360 / Predicted: -0.280362
Index: 95 / Original: -0.247584 / Predicted: -0.247585
Index: 96 / Original: -0.212698 / Predicted: -0.212700
Index: 97 / Original: -0.176052 / Predicted: -0.176054
Index: 98 / Original: -0.138010 / Predicted: -0.138012
Index: 99 / Original: -0.098950 / Predicted: -0.098952
Index: 100 / Original: -0.059255 / Predicted: -0.059257
Index: 101 / Original: -0.019313 / Predicted: -0.019316
Index: 102 / Original: 0.020487 / Predicted: 0.020484
Index: 103 / Original: 0.059762 / Predicted: 0.059759
Index: 104 / Original: 0.098138 / Predicted: 0.098135
Index: 105 / Original: 0.135254 / Predicted: 0.135251
Index: 106 / Original: 0.170764 / Predicted: 0.170761
Index: 107 / Original: 0.204344 / Predicted: 0.204340
Index: 108 / Original: 0.235691 / Predicted: 0.235688
Index: 109 / Original: 0.264532 / Predicted: 0.264529
Index: 110 / Original: 0.290623 / Predicted: 0.290619
Index: 111 / Original: 0.313749 / Predicted: 0.313745
Index: 112 / Original: 0.333734 / Predicted: 0.333730
Index: 113 / Original: 0.350433 / Predicted: 0.350430
Index: 114 / Original: 0.363743 / Predicted: 0.363740
Index: 115 / Original: 0.373596 / Predicted: 0.373593
Index: 116 / Original: 0.379964 / Predicted: 0.379961
Index: 117 / Original: 0.382856 / Predicted: 0.382853
Index: 118 / Original: 0.382321 / Predicted: 0.382318
Index: 119 / Original: 0.378444 / Predicted: 0.378441
Index: 120 / Original: 0.371345 / Predicted: 0.371343
Index: 121 / Original: 0.361180 / Predicted: 0.361179
Index: 122 / Original: 0.348136 / Predicted: 0.348135
Index: 123 / Original: 0.332429 / Predicted: 0.332428
Index: 124 / Original: 0.314301 / Predicted: 0.314301
Index: 125 / Original: 0.294019 / Predicted: 0.294020
Index: 126 / Original: 0.271870 / Predicted: 0.271871
Index: 127 / Original: 0.248155 / Predicted: 0.248157
Burg Approximation Error: 0.000000

Thanks to Cedrick Collomb.