Friday, April 2, 2010

t-test for means

Whenever the variance of the normal population is unknown, then the t-test is appropriate for testing the Null hypothesis that the population mean is some value $\mu_0$. The variance itself is estimated from the finite sample:

$$ s^2 = \frac{\sum_{i=1}^n (x_i - \overline{x})^2}{n-1}$$

The null hypothesis : $$ H_0 = \mu = \mu_0$$
The alternative hypostheses (select only 1!) are :$$\mu < \mu, \mu \ne \mu_0, \mu > \mu_0$$
The test statistic : $$t_{test}= \frac{\overline{x} - \mu_0}{\sqrt{s^2 /n}}$$

The critical values are found by consulting the t distribution tables with an additional parameter df for degrees of freedom or simply computed.

The following Python function ttestcomp performs the necessary computations given the hypothesized population mean, the significance level alpha, the kind of test.

It should return the pvalue, the test statistic and the critical value(s). For a two sided kind of test,the critical values form a pair, otherwise only one critical value is returned.


from math import *

from math import *
from rstats import *


def ttestcomp(X, popmean0, alpha= 0.05, side= 0):
    """
    Arguments:
     X  - input array
     popmean0- assumed population mean, 
     alpha - significance level
     side  -1 - left sided test
            0 - double sided test
            1 - right sided test
    Return value:
     p-value, ztest, critical values.
    """
    n = len(X)
    xbar = sum(X)/ float(n)
    s2   = sum([(x - xbar)**2  for x in X])/float(n)
     
    ttest = (xbar - popmean0) / sqrt(s2/(n-1.0))
    df = n -1      

    if side ==0:
       pvalue = pt(ttest, df)/2.0
       tcrit1 = qt(alpha/2, df) 
       tcrit2 = qt(1-alpha/2.0, df)
       return pvalue, ttest, (tcrit1,tcrit2)
    elif side == -1:
       pvalue = pt(ttest, df)
       tcrit  = qt(alpha)
       return pvalue, ttest, tcrit
    else:
       pvalue = 1- pt(ttest, df)
       tcrit  = qt(1-alpha,df) 
       return pvalue, ttest, tcrit

if __name__ == "__main__":
   popmean0 = 7
   popvar  = 36
   samplesize = 20
   alpha = 0.05
   side = 0
   X= rnorm(10, sqrt(popvar), size = samplesize) # note the difference in arguments with R rnomrm

   print "t-test for means"
   print "finished generating X"
   print X
   print "two-sided:", ttestcomp(X, popmean0, alpha, side)
   print "right-sided:", ttestcomp(X, popmean0, alpha, side=1)


The test code included in our Python code performs the ttest for both two-sided and right-sided alternatives. It first generates a random sample of size 20 from a normal distribution with mean 7 and variance 36.

When the above program is run, the output is


python ttestcomp.py
t-test for means
finished generating X
[ 4.7861463 17.75238035 9.82834098 20.48616187 16.22260671
20.04664846 6.02676326 18.18174814 10.20050244 -3.87442282
2.58404396 10.68370551 10.27102987 15.81674596 17.14011521
18.65564402 18.28997982 9.6119956 7.5671266 11.56864725]
two-sided: (0.49936270026320678, 3.4725570605535814, (-2.0930240544082634, 2.093024054408263))
right-sided: (0.0012745994735864352, 3.4725570605535814, 1.7291328115213671)

Note that for the two-sided test, we accept the Null hypothesis that the population mean is 7 whereas for the right-sided test, we REJECT the Null hypothesis that the population mean is 7. This is not unexpected since we generated a random sample with mean 10.

The rstats.py may be obtained from our Wordpress blog at rstats.py.


We hasten to add that the t-test may already be available in scipy.stats but we present our own for pedagogical purposes.

No comments:

Post a Comment