Top | Intro | Pmf | Python | Normal | Swed | Bottom |

### Introduction

Consider the sequence ABBAAABBABABBB. A subsequence of consecutive same letters is called a run. For our example, we use vertical bars to separate each run: A|BB|AAA|BB|A|B|A|BBB consists of 8 runs. This sequence contains 6 A's and 8 B's which we denote by (m,n)= (6,8). We would be suspicious of people claiming that the sequences AAAAAAABBBBBBB or ABABABABABABAB are random. The runs test was developed precisely to test the null hypothesis whether a sample of a sequence of binary symbols is randomly generated. The minimum number of runs of two symbols $m$ of one kind and $n$ of another kind is 2. On the other hand, the maximum possible number of runs is $2\, min(m, n)$ plus 1 if both m and n are different.

### The probability mass function of the distribution of runs

The distribution of the number of runs has been "derived" or explained in [Mood, Graybill and Boes,

*"Introduction to Statistics"*McGraw Hill, 3e, 1974] p.519-521. The probability mass function is given for any even number of runs $(z = 2k)$ as \[ prob(Z = z)=\frac{2\binom{m-1}{k-1}\binom{n-1}{k-1}}{\binom{m+n}{m}}\] and for an odd number of runs, $(z = 2k+1)$ as \[ prob(Z = z)=\frac{\binom{m-1}{k}\binom{n-1}{k-1} + \binom{m-1}{k-1} \binom{n-1}{k}}{\binom{m+n}{m}}\] Recall that $\binom{n}{r}$ is the number of combinations of $n$ things taken $r$ at a time.It can be computed as $\binom{n}{r} = \frac{n!}{r!(n-r)!}$ but this is inefficient, and instead we use the formula $\binom{n}{r} = \frac{n(n-1)\cdots (n-r+1)}{1\cdot 2\cdots r}$ It would be foolish to do by hand even with aid of a non-programmable scientific calculators, especialy when calculating the cumulative mass distribution function and is the reason we are writing Python codes.

Top | Intro | Pmf | Normal | Python | Swed | Bottom |

### The normal approximation to the distribution of runs

When m and n are both large, say m and n are both greater than 10, then an asymptotic normal approximation might be good enough. The mean is given by \[\mu = \frac{2mn}{m+n} +1\] and the variance would be given by \[\sigma^2 = \frac{2mn(2mn -m -n)}{(m+n)^2 (m+n-1}\]

Top | Intro | Pmf | Normal | Python | Swed | Bot |

### Python codes

Here is our latest version as of April 6. It is more complete now but as usual, users beware. It has not been thoroughly debugged.

Function | Description |
---|---|

nCr(n, r) | binomial coefficient |

maxruns(m,n) | maximum runs is (m,n) sequence |

pmfruns(z, m, n) | probability mass function |

cmftableruns(m, n) | cumulative mass function table |

pvalue(z, m, n, side=-1) | $p-$value of test statistic z |

normrunstest(z, m, n, alpha, side) | normal approximation test |

exactrunstest(z,m,n,alpha,side=0) | exact test for runs |

swedeisenhart | Swed and Eisenhart (m,n)cumulative distribution table |

isvalidruns(S) | return (m,n) if S is a valid runs sequence, else None |

# -*- coding: utf-8 -*- """ file testofruns.py author Ernesto P. Adorio version 0.0.1 April 5, 2010 """ from math import sqrt from basictests import ztest, ttest def nCr(n, r): """ Computes the binomial coefficient, number of combinations of n things taken r at a time. """ if r == 1: return n if r == 0 or r == n: return 1 if n < r or n < 0 or r < 0: return 0 if r > n-r: # find value of less looping. r = n-r f = 1 for i in range(1, r+1): f *= (n - i+1) f /= i return f def maxruns(m,n): # maximum runs with m symbols of the first kind and n symbols of the second.kind. return 2 * min(m, n) + (1 if m!=n else 0) def pmfruns(z, m, n): """ Computes the probability mass function for the distribution of runs. """ if z < 2: raise ValueError, "in pmfruns, z must satisfy z >= 2" if z % 2 ==0: #even z. k = z // 2 return 2.0 * nCr(m-1, k-1) * nCr (n-1, k-1)/ nCr(m+n, m) else : #odd z k = (z-1) // 2 return float(nCr(m-1, k) * nCr(n-1, k-1) + nCr(m-1, k-1) * nCr(n-1, k)) / float(nCr(m + n, m)) def cmftableruns(m, n): """ Returns a cumulative masss function for the distribution of runs. You must note that that the random variable z starts at 2 ! """ tot = 0 result=[] maxz = maxruns(m, n) for i in range(2, maxz+1): tot += pmfruns(i, m, n) result.append(tot) return result def pvalue(z, m, n, side=-1): "unchecked. to be tested first!" upperz = maxruns(m,n) if not (2 <= z <= maxruns) : raise ValueError, "in pvalue(), z must lie in (2, %d)" % upperz table = cmftableruns(m,n) if side == 0: p = table[z-2] if p > 0.5: p = 1-0.5 return p * 2 elif side == -1: return table[z-2] elif side == 1: return 1 - table[z-2] def normrunstest(z, m, n, alpha, side): #Test using normal approximation, when both m and n are greater than 10. mean = 2.0 * m *n / (m+n) + 1 variance = 2. * m * n * (2 *m *n -m -n)/((m+n)**2 * (m+n - 1)) return ztest(z - mean, sqrt(variance/n), alpha, side) def swedeisenhart(): print print """""" for i in range(2, 11): print '' % i, print " " for m in range(2, 11): for n in range(m, 11): rowvals = cmftableruns(m, n) maxcol = len(rowvals) print " " % (m,n), for j in range(min(maxcol, 9)): print "" % rowvals[j], for j in range(maxcol, 9): print "", print " " print "

(m, n) | %d | |
---|---|---|

(%d,%d) | %5.3f |

The Python module above calls the ztest which was already described in previous posts. We will add all files required in one zip file for convenience later.

Top | Intro | Pmf | Python | Normal | Swed | Bot |

### Swed and Eisenhart Table

Here is an html table obtained by running swedeisenhart() routine.(m, n) | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 |
---|---|---|---|---|---|---|---|---|---|

(2,2) | 0.333 | 0.667 | 1.000 | ||||||

(2,3) | 0.200 | 0.500 | 0.900 | 1.000 | |||||

(2,4) | 0.133 | 0.400 | 0.800 | 1.000 | |||||

(2,5) | 0.095 | 0.333 | 0.714 | 1.000 | |||||

(2,6) | 0.071 | 0.286 | 0.643 | 1.000 | |||||

(2,7) | 0.056 | 0.250 | 0.583 | 1.000 | |||||

(2,8) | 0.044 | 0.222 | 0.533 | 1.000 | |||||

(2,9) | 0.036 | 0.200 | 0.491 | 1.000 | |||||

(2,10) | 0.030 | 0.182 | 0.455 | 1.000 | |||||

(3,3) | 0.100 | 0.300 | 0.700 | 0.900 | 1.000 | ||||

(3,4) | 0.057 | 0.200 | 0.543 | 0.800 | 0.971 | 1.000 | |||

(3,5) | 0.036 | 0.143 | 0.429 | 0.714 | 0.929 | 1.000 | |||

(3,6) | 0.024 | 0.107 | 0.345 | 0.643 | 0.881 | 1.000 | |||

(3,7) | 0.017 | 0.083 | 0.283 | 0.583 | 0.833 | 1.000 | |||

(3,8) | 0.012 | 0.067 | 0.236 | 0.533 | 0.788 | 1.000 | |||

(3,9) | 0.009 | 0.055 | 0.200 | 0.491 | 0.745 | 1.000 | |||

(3,10) | 0.007 | 0.045 | 0.171 | 0.455 | 0.706 | 1.000 | |||

(4,4) | 0.029 | 0.114 | 0.371 | 0.629 | 0.886 | 0.971 | 1.000 | ||

(4,5) | 0.016 | 0.071 | 0.262 | 0.500 | 0.786 | 0.929 | 0.992 | 1.000 | |

(4,6) | 0.010 | 0.048 | 0.190 | 0.405 | 0.690 | 0.881 | 0.976 | 1.000 | |

(4,7) | 0.006 | 0.033 | 0.142 | 0.333 | 0.606 | 0.833 | 0.955 | 1.000 | |

(4,8) | 0.004 | 0.024 | 0.109 | 0.279 | 0.533 | 0.788 | 0.929 | 1.000 | |

(4,9) | 0.003 | 0.018 | 0.085 | 0.236 | 0.471 | 0.745 | 0.902 | 1.000 | |

(4,10) | 0.002 | 0.014 | 0.068 | 0.203 | 0.419 | 0.706 | 0.874 | 1.000 | |

(5,5) | 0.008 | 0.040 | 0.167 | 0.357 | 0.643 | 0.833 | 0.960 | 0.992 | 1.000 |

(5,6) | 0.004 | 0.024 | 0.110 | 0.262 | 0.522 | 0.738 | 0.911 | 0.976 | 0.998 |

(5,7) | 0.003 | 0.015 | 0.076 | 0.197 | 0.424 | 0.652 | 0.854 | 0.955 | 0.992 |

(5,8) | 0.002 | 0.010 | 0.054 | 0.152 | 0.347 | 0.576 | 0.793 | 0.929 | 0.984 |

(5,9) | 0.001 | 0.007 | 0.039 | 0.119 | 0.287 | 0.510 | 0.734 | 0.902 | 0.972 |

(5,10) | 0.001 | 0.005 | 0.029 | 0.095 | 0.239 | 0.455 | 0.678 | 0.874 | 0.958 |

(6,6) | 0.002 | 0.013 | 0.067 | 0.175 | 0.392 | 0.608 | 0.825 | 0.933 | 0.987 |

(6,7) | 0.001 | 0.008 | 0.043 | 0.121 | 0.296 | 0.500 | 0.733 | 0.879 | 0.966 |

(6,8) | 0.001 | 0.005 | 0.028 | 0.086 | 0.226 | 0.413 | 0.646 | 0.821 | 0.937 |

(6,9) | 0.000 | 0.003 | 0.019 | 0.063 | 0.175 | 0.343 | 0.566 | 0.762 | 0.902 |

(6,10) | 0.000 | 0.002 | 0.013 | 0.047 | 0.137 | 0.287 | 0.497 | 0.706 | 0.864 |

(7,7) | 0.001 | 0.004 | 0.025 | 0.078 | 0.209 | 0.383 | 0.617 | 0.791 | 0.922 |

(7,8) | 0.000 | 0.002 | 0.015 | 0.051 | 0.149 | 0.296 | 0.514 | 0.704 | 0.867 |

(7,9) | 0.000 | 0.001 | 0.010 | 0.035 | 0.108 | 0.231 | 0.427 | 0.622 | 0.806 |

(7,10) | 0.000 | 0.001 | 0.006 | 0.024 | 0.080 | 0.182 | 0.355 | 0.549 | 0.743 |

(8,8) | 0.000 | 0.001 | 0.009 | 0.032 | 0.100 | 0.214 | 0.405 | 0.595 | 0.786 |

(8,9) | 0.000 | 0.001 | 0.005 | 0.020 | 0.069 | 0.157 | 0.319 | 0.500 | 0.702 |

(8,10) | 0.000 | 0.000 | 0.003 | 0.013 | 0.048 | 0.117 | 0.251 | 0.419 | 0.621 |

(9,9) | 0.000 | 0.000 | 0.003 | 0.012 | 0.044 | 0.109 | 0.238 | 0.399 | 0.601 |

(9,10) | 0.000 | 0.000 | 0.002 | 0.008 | 0.029 | 0.077 | 0.179 | 0.319 | 0.510 |

(10,10) | 0.000 | 0.000 | 0.001 | 0.004 | 0.019 | 0.051 | 0.128 | 0.242 | 0.414 |

Bottom

Top | Intro | Pmf | Python | Normal | Swed | Bot |

I have encountered the html problem: Your HTML cannot be accepted: Closing tag has no matching opening tag: A ???!!! I wonder where its !!

where is the basictest module? did you create it or is it a python module? "from basictests import ztest, ttest"

ReplyDelete