Sunday, May 16, 2010

Python, Statistics: The nonparametric Mann-Whitney Test

Draft! Untested. Do not use yet.

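from math import sqrt

def Rank(values):
    """Returns 1-based ranks, ties averaged. Not defined in this post; assumed helper."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        # Extend j to the end of the run of values tied with position i.
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2.0 + 1
        i = j + 1
    return ranks

def mannwhitney(S1, S2):
    """
    Returns the Mann-Whitney U statistic of two samples S1 and S2, with the
    mean and standard deviation of U under the null (no tie correction).
    """
    # Form a single array with a categorical variable indicating the sample.
    X = [(s, 0) for s in S1]
    X.extend([(s, 1) for s in S2])
    # Rank the pooled values only, so ties across samples share a rank.
    R = Rank([x for x, j in X])

    # Compute needed parameters.
    n1 = len(S1)
    n2 = len(S2)

    # Compute total ranks for sample 1.
    R1 = sum([R[i] for i, (x, j) in enumerate(X) if j == 0])
    u1 = R1 - n1 * (n1 + 1) / 2.0
    u2 = n1 * n2 - u1
    U = min(u1, u2)

    mU     = n1 * n2 / 2.0
    sigmaU = sqrt((n1 * n2) * (n1 + n2 + 1) / 12.0)
    return U, mU, sigmaU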
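
The function returns U together with the mean and standard deviation of its null distribution, which is what a large-sample normal approximation needs. A minimal sketch of that step follows. The name normal_approx_pvalue is hypothetical (it is not from this post or from any library), and it assumes no ties and applies no continuity correction, so it should only be trusted for reasonably large samples.

```python
from math import erf, sqrt

def normal_approx_pvalue(U, mU, sigmaU):
    """Two-sided p-value for U via the normal approximation (assumed helper)."""
    z = (U - mU) / sigmaU
    # Standard normal CDF evaluated at z, written with the error function.
    phi = 0.5 * (1.0 + erf(z / sqrt(2.0)))
    # U = min(u1, u2) never exceeds the mean, so double the lower tail.
    return min(1.0, 2.0 * phi)

# Illustrative numbers only: two samples of size 10 and an observed U of 23.
n1, n2 = 10, 10
mU = n1 * n2 / 2.0
sigmaU = sqrt(n1 * n2 * (n1 + n2 + 1) / 12.0)
p = normal_approx_pvalue(23, mU, sigmaU)
```

For small samples the approximation is rough, and the exact discrete distribution is the better choice.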

I still need to find resources for computing the exact (discrete) distribution function of the Mann-Whitney U statistic, and would appreciate any help. In the meantime, scipy.stats has a mannwhitneyu function which returns the U statistic and the p-value of the test.
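
One possible way to get the discrete distribution, assuming no ties, is the standard counting recurrence: the largest pooled value belongs either to sample 1 (it then exceeds all n2 values of sample 2) or to sample 2 (it contributes nothing). The sketch below uses hypothetical names (count_u, exact_cdf, exact_two_sided_pvalue) that are not part of this post or of scipy, and it is only practical for small samples.

```python
from functools import lru_cache
from math import comb  # requires Python 3.8+

@lru_cache(maxsize=None)
def count_u(u, n1, n2):
    """Number of orderings of n1 + n2 distinct values with u1 equal to u."""
    if u < 0:
        return 0
    if n1 == 0 or n2 == 0:
        return 1 if u == 0 else 0
    # Largest value from sample 1 adds n2 to u1; from sample 2 it adds 0.
    return count_u(u - n2, n1 - 1, n2) + count_u(u, n1, n2 - 1)

def exact_cdf(u, n1, n2):
    """P(u1 <= u) under the null hypothesis, all orderings equally likely."""
    total = comb(n1 + n2, n1)
    return sum(count_u(k, n1, n2) for k in range(int(u) + 1)) / total

def exact_two_sided_pvalue(U, n1, n2):
    """Two-sided p-value for U = min(u1, u2), using the symmetry of the null."""
    return min(1.0, 2.0 * exact_cdf(U, n1, n2))

# Smallest possible U for two samples of size 3.
p_exact = exact_two_sided_pvalue(0, 3, 3)
```

With ties the orderings are no longer equally likely in this simple sense, so the counts above do not apply as written.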
