## Thursday, April 1, 2010

### Z Test for means.

I was looking at the index to statistics when I discovered there were no codes for basic hypothesis testing!

Now the mean of a sample of size $n$ has best estimate value or test statistic $\overline{x} = \sum x_i$ with a standard error of $\srt{\sigma / n}$ where $\sigma^2$ is the known population variance. If the variance of the population is unknown, then it can be estimated from the values of the sample.

$s^2 = \frac{\sum_{i=1}^n (x_i - \overline{x})^2}{n-1}$

The assumption is that the sample is obtained from a normal population with variance known.

The null hypothesis : $H_0 = \mu = \mu_0$
The alternative hypostheses (select only 1!) are :$\mu < \mu, \mu \ne \mu_0, \mu > \mu_0$
The test statistic : $Z_{test} \frac{\overline{x} - \mu_0}{\sqrt{\sigma^2 /n}}$

The critical values are found by consulting the normal distribution tables or simply computed.
For a two-sided test, the test alpha ($\alpha$) is half that of the specified value.

The following Python code performs the necessary computations given the hypothesized population mean,
the population variance, the significance level alpha, the kind of test and the test statistic.
The test function performs the ztest for both two-sided and right-sided alternatives.

It should return the pvalue, the test statistic and the critical value(s). For a two sided kind of test,the critical values form a pair, otherwise only one critical value is returned.

from math import *

from rstats import *
def ztestcomp(popmean0, popvar, samplesize, xbar, alpha= 0.05, side= 0):
"""
Arguments:
xbar  - sample mean
popmean0- assumed population mean,
popvar - population variance
alpha - significance level.
side  -1 - left sided test
0 - double sided test
1 - right sided test
xbar - sample mean
Return value:
p-value, ztest, critical values.
"""
ztest = (xbar - popmean0) / sqrt(popvar/(samplesize-1.0))

if side ==0:
pvalue = pnorm(abs(ztest))/2.0
zcrit1 = qnorm(alpha/2)
zcrit2 = qnorm(1-alpha/2.0)
return pvalue, ztest, (zcrit1,zcrit2)
elif side == -1:
pvalue = pnorm(ztest)
zcrit = qnorm(alpha)
return pvalue, ztest, zcrit
else:
pvalue = 1- pnorm(ztest)
zcrit  = qnorm(1-alpha)
return pvalue, ztest, zcrit

if __name__ == "__main__":
popmean0 = 5
popvar  = 36
samplesize = 20
alpha = 0.10
side = 0
xbar = 7

print "two-sided:", ztestcomp(popmean0, popvar, samplesize, xbar, alpha, side)

print "right-sided:", ztestcomp(popmean0, popvar, samplesize, xbar, alpha, side=1)



When the above program is run, the output is

~/Blogs/statistics$python tests.py toto@toto-laptop:~/Blogs/statistics$ python tests.py
two-sided: (0.42067237303427146, 1.0, (-1.9599639845400545, 1.959963984540054))
right-sided: (0.15865525393145707, 1.0, 1.6448536269514722)

Since the $p$-value is greater than the specified significance level (5 %= 0.05), we accept, or "do not reject" the Null hypothesis that the true population mean is 5.

Textbooks recommend the ztest only for n > 30. Our sample size is smaller, and is only used for illustration.

The rstats.py may be obtained from our Wordpress blog at rstats.py.

The ztestcomp() above only performs the necessary computations. A separate routine should be written if an input input sample array is desired, and is left as an exercise.

We hasten to add that the z-test may already be available in scipy.stats but we present our own for pedagogical purposes.