Friday, April 2, 2010

Two sample tests for difference of means

We are interested in testing the difference of means $d_0 = \mu_1- \mu_2$ given samples obtained from two normal populations.

The following table will guide our writing of a Python function to test the difference of two means.

Condition Test Statistic
$\sigma_1 = \sigma_2$, variances known$z=\frac{ \overline{x_1}-\overline{x_2}-d_0 }{ \sqrt{ \frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2} }}$

$\sigma_1 = \sigma_2$, variances unknown $t=\frac{\overline{x_1}-\overline{x_2}-d_0}{Sp\sqrt{1/n_1 + 1/n_2}}$
$df = n_1 + n_2 - 2$,
$S_p = \frac{(n_1 - 1) s_1^2 + (n_2-1) s_2^2}{n_1 + n_2 - 2}$
$\sigma_1 \ne \sigma_2$ variances unknown $t=\frac{\overline{x_1}-\overline{x_2}-d_0}{\sqrt{s_1^2/n_1 + s_2^2/n_2}}$,
$df = \frac{(s_1^2/n_1 + s_2^2/n_2)^2}{\frac{(s_1^2/n_1)^2}{n_1 -1} + \frac{(s_2^2/n_2)^2}{n_2-1}}$

from math import *
"""
file....  testofmeans.py

author... dr.ernesto p. adorio
          UP at Clarkfield
          Angeles, Pampanga

version   0.0.1  april 3, 2010

license.. educational use only.
"""

from rstats import *
from numpy import *


def ztest(samplestat, se, alpha =0.05, side=0):
    """
    Normal test of a sample statistic.
    Arguments:
     teststat- teststatistic
     se -    standard error of sample statistic
     alpha - significance level
     side  - -1 left-sided test
             0  double sided test
             +1 right-sided test
    """
    Ztest = samplestat/se
    if side ==0:
       pvalue = pnorm(Ztest)
       if Ztest > 0.0:
           pvalue = 1.0 -pvalue
       pvalue *= 2.0
       zcrit1 = qnorm(alpha/2) 
       zcrit2 = qnorm(1-alpha/2.0)
       return pvalue, Ztest, (zcrit1,zcrit2)
    elif side == -1:
       pvalue = pnorm(Ztest)
       zcrit = qnorm(alpha)
       return pvalue, Ztest, zcrit
    else:
       pvalue = 1- pnorm(Ztest)
       zcrit  = qnorm(1.0-alpha) 
       return pvalue, Ztest, zcrit

def ttest(samplestat, se, df, alpha= 0.05, side=0):
    """
    T-test of a sample statistic.
    Arguments:
     samplestat- sample statistic
     se -    standard error of sample statistic
     df      degree of freedom
     alpha - significance level
     side  - -1 left-sided test
             0  double sided test
             +1 right-sided test
    """
    Ttest = samplestat/se
    if side ==0:
        pvalue = pt(Ttest, df)
        if Ttest > 0:
            pvalue = 1-pvalue
        pvalue *=2
        tcrit1 =    qt(alpha/2,   df) 
        tcrit2 =    qt(1-alpha/2.0,df)
        return pvalue, Ttest, (tcrit1,tcrit2)
    elif side == -1:
       pvalue = pt(Ttest,  df)
       tcrit = qt(alpha,  df)
       return pvalue, Ttest, tcrit
    else:
       pvalue = 1- pt(Ttest,  df)
       tcrit  = qt(1.0-alpha,  df) 
       return pvalue, Ttest, tcrit

def xmeanttest(X, mu0, alpha= 0.05, side=0):
    xbar = mean(X)
    xvar = svar(X)
    df = len(X)-1
    return ttest(xbar- mu0, sqrt(xvar/len(X)), df,  alpha, side)



def diffmeanztest(xbar1, var1, n1,
                  xbar2, var2, n2,
                  d0= 0.0, alpha = 0.5,
                  side= 0):
    # the case where both population variances are
    # known and are given.
    samplestat = xbar1 - xbar2 - d0
    se = sqrt(var1/n1 + var2/n2)
    return ztest(samplestat, se, alpha, side)


def diffmeanttest(xbar1, svar1, n1,
                  xbar2, svar2, n2,
                  d0= 0.0, alpha = 0.5,
                  side= 0, isvarequal=False):
      if isvarequal:
         df = n1 + n2 - 2
         Sp = ((n1 -1.0)*svar1+(n2-1.0)*svar2)/df
         se = sqrt(Sp * (1.0/n1 + 1.0/n2))
      else:
         df = (svar1/n1 + svar2/n2)**2 
         df/= ((svar1/n1)**2/(n1-1) + (svar2/n2)**2/ (n2-1))
         df = int(ceil(df))
         se = sqrt(svar1/n1 + svar2/n2)
      Tstat = (xbar1 - xbar2 - d0)
      return ttest(Tstat,  se, df, alpha, side) 


def xydiffmeanttest(X,Y, d0=0.0, alpha= 0.05, side= 0, 
                   isvarequal=False,ispaired=False):
    """
    Arguments:
     X,Y  - input samples
     popmean0- assumed population mean, 
     alpha - significance level.
     side  -1 - left sided test
            0 - double sided test
            1 - right sided test
       0 - z-test
    Return value:
     p-value, ztest, critical values.
    """
    if not ispaired:
      n1 = len(X)
      n2 = len(Y)
      xbar1 = mean(X)
      xbar2 = mean(Y)
      svar1 = svar(X)
      svar2 = svar(Y)
      return diffmeanttest(xbar1, svar1, n1,
                           xbar2, svar2, n2,
                           svar1, svar2,
                           d0,
                           alpha,side,
                           isvarequal)

    else:# paired t-test,
      n = len(X)
      diffs = [x-y for x, y in zip(X,Y)]
      dbar = mean(diffs) 
      dvar = svar(diffs)
      se    = sqrt(dvar/n)
      df    = n - 1
      return ttest(dbar-d0, se, df, alpha, side) 


def Walpole_Tests():
    """
    test cases from Walpole.
    """
    print "example 3, page.310"
    mu0 = 8
    xbar = 7.8
    n = 50
    sd = 0.5
    df = n-1
    alpha = 0.01
    side = 0
    #print ttest(7.8 - 8, sd/sqrt(50), df,  alpha, side) 
    print ztest(7.8 - 8, sd/sqrt(50), alpha, side) 
    
    print "example 4, page.312"
    mu0 = 70
    xbar = 71.8
    alpha = 0.05
    se = 8.9/ sqrt(100)
    print ztest(xbar- mu0,  se,  alpha,  side= 1)
    
    print "example 5, page. 312"
    mu0 = 50
    xbar = 42
    sd = 11.9
    n   = 12
    se = sd/ sqrt(n)
    df = n-1
    side = -1
    alpha = 0.01
    print "alpha =",  alpha,  ttest(xbar -mu0,  se, df,  alpha,  side)
    alpha = 0.05
    print "alpha =",  alpha,  ttest(xbar -mu0,  se, df,  alpha,  side)

    print "example 6, page.313"
    xbar1 = 85
    sd1 = 4
    svar1 = sd1 * sd1
    n1 = 12
    xbar2 = 81
    sd2 = 5
    svar2 = sd2 * sd2
    n2 = 10
    isvarequal =True
    alpha = 0.10
    d0 = 0
    side  = 0
    print diffmeanttest(xbar1, svar1,  n1,  xbar2,  svar2,  n2,  d0,  alpha,   side,  isvarequal=True )
    
    print 
    print     "Example 7, p.314"
    X = [2.0, 2.0,  2.3,  2.1,  2.4]
    Y = [2.2, 1.9,  2.5,  2.3,  2.4]
    d0 = 0.0
    print xydiffmeanttest(X, Y, d0,  alpha=0.025,  side=-1, isvarequal=True,   ispaired=True)

if __name__ == "__main__":
   Walpole_Tests()

We have included some problem instances from Walpole's text, "Introduction to Statistics",3e, Prentice Hall, paperback reprint. The output of the above program is as follows(edited):


Python 2.6.4 (r264:75706, Dec 7 2009, 18:43:55)
[GCC 4.4.1] on toto-laptop, Standard
>>> example 3, page.310
(0.0046777349810472298, -2.8284271247461925, (-2.5758293035489008, 2.5758293035489004))

example 4, page.312
(0.021563811339088912, 2.0224719101123565, 1.6448536269514722)

example 5, page. 312
alpha = 0.01 (0.01997646866681814, -2.3288078084959691, -2.7180791835355569)
alpha = 0.05 (0.01997646866681814, -2.3288078084959691, -1.7958848187036696)

example 6, page.313
(0.049963892471710292, 2.0863255923855117, (-1.7247182429207863, 1.7247182429207857))

Example 7, p.314
(0.094501829227587694, -1.5811388300841891, -2.7764451051977996)



I spent a long time wondering why I could not get exactly intermediate values in Ex. 7. It turns out that
the numpy.var function is population variance not sample variance!

Note that our routines are now much clearer. We have emphasized in the above code that tests for population means should focus on the test statistic (not necessarily the sample statistic) and the standard error of that statistic. For example , to test the Null hypothesis
$\mu = \mu_0$, the test statistic is $\overline{x} - \mu_0$ with standard error $\sigma/\sqrt(n)$.

All test functions return a triple: the (p-value, distribution test statistic, critical value(s)).
The tests accepts a parameter side which is -1 for a left-sided test, 0 for two-sided test and
1 for right-sided test.


Two types of input parameter sets are covered: summary statistics and sample arrays together with the
specification of significance level, the kind of test (side), and for the t-test, whether to perform a
paired test.


Visit soon for our Python codes for test of variances. And put comments if you have any problems with the code or if you have requests for features or to file bug reports.

No comments:

Post a Comment