Top
- Introduction
- Probability mass function
- Normal Approximation
- Python Code
- Swed and Eisenhart table
Introduction
Consider the sequence ABBAAABBABABBB. A subsequence of consecutive same letters is called a run. For our example, we use vertical bars to separate each run: A|BB|AAA|BB|A|B|A|BBB consists of 8 runs. This sequence contains 6 A's and 8 B's which we denote by (m,n)= (6,8). We would be suspicious of people claiming that the sequences AAAAAAABBBBBBB or ABABABABABABAB are random. The runs test was developed precisely to test the null hypothesis whether a sample of a sequence of binary symbols is randomly generated. The minimum number of runs of two symbols $m$ of one kind and $n$ of another kind is 2. On the other hand, the maximum possible number of runs is $2\, min(m, n)$ plus 1 if both m and n are different.
The probability mass function of the distribution of runs
The distribution of the number of runs has been "derived" or explained in [Mood, Graybill and Boes,
"Introduction to Statistics" McGraw Hill, 3e, 1974] p.519-521. The probability mass function is given for any even number of runs $(z = 2k)$ as \[ prob(Z = z)=\frac{2\binom{m-1}{k-1}\binom{n-1}{k-1}}{\binom{m+n}{m}}\] and for an odd number of runs, $(z = 2k+1)$ as \[ prob(Z = z)=\frac{\binom{m-1}{k}\binom{n-1}{k-1} + \binom{m-1}{k-1} \binom{n-1}{k}}{\binom{m+n}{m}}\] Recall that $\binom{n}{r}$ is the number of combinations of $n$ things taken $r$ at a time.It can be computed as $\binom{n}{r} = \frac{n!}{r!(n-r)!}$ but this is inefficient, and instead we use the formula $\binom{n}{r} = \frac{n(n-1)\cdots (n-r+1)}{1\cdot 2\cdots r}$ It would be foolish to do by hand even with aid of a non-programmable scientific calculators, especialy when calculating the cumulative mass distribution function and is the reason we are writing Python codes.
The normal approximation to the distribution of runs
When m and n are both large, say m and n are both greater than 10, then an asymptotic normal approximation might be good enough. The mean is given by \[\mu = \frac{2mn}{m+n} +1\] and the variance would be given by \[\sigma^2 = \frac{2mn(2mn -m -n)}{(m+n)^2 (m+n-1}\]
Python codes
Here is our latest version as of April 6. It is more complete now but as usual, users beware. It has not been thoroughly debugged.
Function | Description |
nCr(n, r) | binomial coefficient |
maxruns(m,n) | maximum runs is (m,n) sequence |
pmfruns(z, m, n) | probability mass function |
cmftableruns(m, n) | cumulative mass function table |
pvalue(z, m, n, side=-1) | $p-$value of test statistic z |
normrunstest(z, m, n, alpha, side) | normal approximation test |
exactrunstest(z,m,n,alpha,side=0) | exact test for runs |
swedeisenhart | Swed and Eisenhart (m,n)cumulative distribution table |
isvalidruns(S) | return (m,n) if S is a valid runs sequence, else None |
# -*- coding: utf-8 -*-
"""
file testofruns.py
author Ernesto P. Adorio
version 0.0.1 April 5, 2010
"""
from math import sqrt
from basictests import ztest, ttest
def nCr(n, r):
"""
Computes the binomial coefficient, number of combinations of n things taken r at a time.
"""
if r == 1: return n
if r == 0 or r == n: return 1
if n < r or n < 0 or r < 0: return 0
if r > n-r: # find value of less looping.
r = n-r
f = 1
for i in range(1, r+1):
f *= (n - i+1)
f /= i
return f
def maxruns(m,n):
# maximum runs with m symbols of the first kind and n symbols of the second.kind.
return 2 * min(m, n) + (1 if m!=n else 0)
def pmfruns(z, m, n):
"""
Computes the probability mass function for the distribution of runs.
"""
if z < 2:
raise ValueError, "in pmfruns, z must satisfy z >= 2"
if z % 2 ==0: #even z.
k = z // 2
return 2.0 * nCr(m-1, k-1) * nCr (n-1, k-1)/ nCr(m+n, m)
else : #odd z
k = (z-1) // 2
return float(nCr(m-1, k) * nCr(n-1, k-1) + nCr(m-1, k-1) * nCr(n-1, k)) / float(nCr(m + n, m))
def cmftableruns(m, n):
"""
Returns a cumulative masss function for the distribution of runs.
You must note that that the random variable z starts at 2 !
"""
tot = 0
result=[]
maxz = maxruns(m, n)
for i in range(2, maxz+1):
tot += pmfruns(i, m, n)
result.append(tot)
return result
def pvalue(z, m, n, side=-1):
"unchecked. to be tested first!"
upperz = maxruns(m,n)
if not (2 <= z <= maxruns) :
raise ValueError, "in pvalue(), z must lie in (2, %d)" % upperz
table = cmftableruns(m,n)
if side == 0:
p = table[z-2]
if p > 0.5:
p = 1-0.5
return p * 2
elif side == -1:
return table[z-2]
elif side == 1:
return 1 - table[z-2]
def normrunstest(z, m, n, alpha, side):
#Test using normal approximation, when both m and n are greater than 10.
mean = 2.0 * m *n / (m+n) + 1
variance = 2. * m * n * (2 *m *n -m -n)/((m+n)**2 * (m+n - 1))
return ztest(z - mean, sqrt(variance/n), alpha, side)
def swedeisenhart():
print
print """
P(Z <= a) when H_0 is true in the Runs test
(m, n) | """ for i in range(2, 11): print '%d | ' % i, print "
" for m in range(2, 11): for n in range(m, 11): rowvals = cmftableruns(m, n) maxcol = len(rowvals) print "
(%d,%d) | " % (m,n), for j in range(min(maxcol, 9)): print "%5.3f | " % rowvals[j], for j in range(maxcol, 9): print " | ", print "
" print "
"
def exactrunstest(z, m, n, alpha, side=0):
#returns pvalue, test stat z and lower and upper limits.
cmf = cmftableruns(m,n)
print cmf,
a = b = None
if side ==0:
alpha /= 2.0
alpha1 = alpha
alpha2 = 1-alpha1
elif side == 1:
alpha2 = 1-alpha
elif side == 0:
alpha1 = alpha
for i in range(len(cmf)):
if cmf[i] <= alpha1:
a = i+2
if cmf[i] >= alpha2:
b = i+2
p = pvalue(z, m,n, side)
if side == 0:
return (p, z, (a,b))
elif side == 1:
return (p, z, b)
elif side == -1:
return (p, z, a)
def isvalidruns(S):
D = {}
for s in S:
if s in D:
D[s] +=1
else:
D[s] = 1
if len(D) != 2:
raise ValueError, "in isvalidruns(): more than two symbols"
counts = []
for (i,s) in enumerate(D):
counts.append(D[s])
a,b = counts[0], counts[1]
return min(a, b), max(a,b)
def Test():
swedeisenhart() # decomment line to turn on table generation test.
#print "Exact runs test for z = 5, (m, n)= (5,7)"
#print exactrunstest(5, 5,7, 0.05, side=0)
"""
print pmfruns(3, 5,4)
print normrunstest(5, 4,9,0.05, -1)
print "Table for runs test."
m, n = 3,5
print cmftableruns(m,n)
"""
if __name__ == "__main__":
Test()
The Python module above calls the ztest which was already described in previous posts. We will add all files required in one zip file for convenience later.
Swed and Eisenhart Table
Here is an html table obtained by running swedeisenhart() routine.
$P(Z <= a)$ when $H_0$ is true in the Runs test
(m, n) | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 |
(2,2) | 0.333 | 0.667 | 1.000 | | | | | | |
(2,3) | 0.200 | 0.500 | 0.900 | 1.000 | | | | | |
(2,4) | 0.133 | 0.400 | 0.800 | 1.000 | | | | | |
(2,5) | 0.095 | 0.333 | 0.714 | 1.000 | | | | | |
(2,6) | 0.071 | 0.286 | 0.643 | 1.000 | | | | | |
(2,7) | 0.056 | 0.250 | 0.583 | 1.000 | | | | | |
(2,8) | 0.044 | 0.222 | 0.533 | 1.000 | | | | | |
(2,9) | 0.036 | 0.200 | 0.491 | 1.000 | | | | | |
(2,10) | 0.030 | 0.182 | 0.455 | 1.000 | | | | | |
(3,3) | 0.100 | 0.300 | 0.700 | 0.900 | 1.000 | | | | |
(3,4) | 0.057 | 0.200 | 0.543 | 0.800 | 0.971 | 1.000 | | | |
(3,5) | 0.036 | 0.143 | 0.429 | 0.714 | 0.929 | 1.000 | | | |
(3,6) | 0.024 | 0.107 | 0.345 | 0.643 | 0.881 | 1.000 | | | |
(3,7) | 0.017 | 0.083 | 0.283 | 0.583 | 0.833 | 1.000 | | | |
(3,8) | 0.012 | 0.067 | 0.236 | 0.533 | 0.788 | 1.000 | | | |
(3,9) | 0.009 | 0.055 | 0.200 | 0.491 | 0.745 | 1.000 | | | |
(3,10) | 0.007 | 0.045 | 0.171 | 0.455 | 0.706 | 1.000 | | | |
(4,4) | 0.029 | 0.114 | 0.371 | 0.629 | 0.886 | 0.971 | 1.000 | | |
(4,5) | 0.016 | 0.071 | 0.262 | 0.500 | 0.786 | 0.929 | 0.992 | 1.000 | |
(4,6) | 0.010 | 0.048 | 0.190 | 0.405 | 0.690 | 0.881 | 0.976 | 1.000 | |
(4,7) | 0.006 | 0.033 | 0.142 | 0.333 | 0.606 | 0.833 | 0.955 | 1.000 | |
(4,8) | 0.004 | 0.024 | 0.109 | 0.279 | 0.533 | 0.788 | 0.929 | 1.000 | |
(4,9) | 0.003 | 0.018 | 0.085 | 0.236 | 0.471 | 0.745 | 0.902 | 1.000 | |
(4,10) | 0.002 | 0.014 | 0.068 | 0.203 | 0.419 | 0.706 | 0.874 | 1.000 | |
(5,5) | 0.008 | 0.040 | 0.167 | 0.357 | 0.643 | 0.833 | 0.960 | 0.992 | 1.000 |
(5,6) | 0.004 | 0.024 | 0.110 | 0.262 | 0.522 | 0.738 | 0.911 | 0.976 | 0.998 |
(5,7) | 0.003 | 0.015 | 0.076 | 0.197 | 0.424 | 0.652 | 0.854 | 0.955 | 0.992 |
(5,8) | 0.002 | 0.010 | 0.054 | 0.152 | 0.347 | 0.576 | 0.793 | 0.929 | 0.984 |
(5,9) | 0.001 | 0.007 | 0.039 | 0.119 | 0.287 | 0.510 | 0.734 | 0.902 | 0.972 |
(5,10) | 0.001 | 0.005 | 0.029 | 0.095 | 0.239 | 0.455 | 0.678 | 0.874 | 0.958 |
(6,6) | 0.002 | 0.013 | 0.067 | 0.175 | 0.392 | 0.608 | 0.825 | 0.933 | 0.987 |
(6,7) | 0.001 | 0.008 | 0.043 | 0.121 | 0.296 | 0.500 | 0.733 | 0.879 | 0.966 |
(6,8) | 0.001 | 0.005 | 0.028 | 0.086 | 0.226 | 0.413 | 0.646 | 0.821 | 0.937 |
(6,9) | 0.000 | 0.003 | 0.019 | 0.063 | 0.175 | 0.343 | 0.566 | 0.762 | 0.902 |
(6,10) | 0.000 | 0.002 | 0.013 | 0.047 | 0.137 | 0.287 | 0.497 | 0.706 | 0.864 |
(7,7) | 0.001 | 0.004 | 0.025 | 0.078 | 0.209 | 0.383 | 0.617 | 0.791 | 0.922 |
(7,8) | 0.000 | 0.002 | 0.015 | 0.051 | 0.149 | 0.296 | 0.514 | 0.704 | 0.867 |
(7,9) | 0.000 | 0.001 | 0.010 | 0.035 | 0.108 | 0.231 | 0.427 | 0.622 | 0.806 |
(7,10) | 0.000 | 0.001 | 0.006 | 0.024 | 0.080 | 0.182 | 0.355 | 0.549 | 0.743 |
(8,8) | 0.000 | 0.001 | 0.009 | 0.032 | 0.100 | 0.214 | 0.405 | 0.595 | 0.786 |
(8,9) | 0.000 | 0.001 | 0.005 | 0.020 | 0.069 | 0.157 | 0.319 | 0.500 | 0.702 |
(8,10) | 0.000 | 0.000 | 0.003 | 0.013 | 0.048 | 0.117 | 0.251 | 0.419 | 0.621 |
(9,9) | 0.000 | 0.000 | 0.003 | 0.012 | 0.044 | 0.109 | 0.238 | 0.399 | 0.601 |
(9,10) | 0.000 | 0.000 | 0.002 | 0.008 | 0.029 | 0.077 | 0.179 | 0.319 | 0.510 |
(10,10) | 0.000 | 0.000 | 0.001 | 0.004 | 0.019 | 0.051 | 0.128 | 0.242 | 0.414 |
If you have any suggestions, or corrections, or features request do not hesitate to email the author at ernesto.adorio @ gmail . com.
Bottom
I have encountered the html problem: Your HTML cannot be accepted: Closing tag has no matching opening tag: A ???!!! I wonder where its !!
where is the basictest module? did you create it or is it a python module? "from basictests import ztest, ttest"
ReplyDelete