The following table will guide our writing of a Python function to test the difference of two means.
Condition | Test Statistic | Degrees of Freedom |
---|---|---|
$\sigma_1$, $\sigma_2$ known | $z=\frac{ \overline{x_1}-\overline{x_2}-d_0 }{ \sqrt{ \frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2} }}$ | (standard normal) |
$\sigma_1 = \sigma_2$, variances unknown | $t=\frac{\overline{x_1}-\overline{x_2}-d_0}{S_p\sqrt{1/n_1 + 1/n_2}}$, where $S_p^2 = \frac{(n_1 - 1) s_1^2 + (n_2-1) s_2^2}{n_1 + n_2 - 2}$ | $df = n_1 + n_2 - 2$ |
$\sigma_1 \ne \sigma_2$, variances unknown | $t=\frac{\overline{x_1}-\overline{x_2}-d_0}{\sqrt{s_1^2/n_1 + s_2^2/n_2}}$ | $df = \frac{(s_1^2/n_1 + s_2^2/n_2)^2}{\frac{(s_1^2/n_1)^2}{n_1 -1} + \frac{(s_2^2/n_2)^2}{n_2-1}}$ |
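As a quick check of the pooled-variance row, here is a minimal sketch (not part of the module itself) that plugs in the summary numbers from Walpole's Example 6, which reappear in the test driver below; only `math.sqrt` is assumed.

```python
from math import sqrt

# Summary numbers from Walpole's Example 6 (also used in the test driver below).
xbar1, s1, n1 = 85.0, 4.0, 12
xbar2, s2, n2 = 81.0, 5.0, 10
d0 = 0.0

# Pooled variance Sp^2 and the pooled-variance t statistic from the table.
df = n1 + n2 - 2
Sp2 = ((n1 - 1) * s1 ** 2 + (n2 - 1) * s2 ** 2) / df
t = (xbar1 - xbar2 - d0) / sqrt(Sp2 * (1.0 / n1 + 1.0 / n2))

print df, Sp2, t   # df = 20, Sp^2 = 20.05, t is about 2.086
```

With the table in hand, here is the full listing, testofmeans.py: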
```python
"""
file...... testofmeans.py
author.... dr. ernesto p. adorio
           UP at Clarkfield
           Angeles, Pampanga
version... 0.0.1  april 3, 2010
license... educational use only.
"""

from math import *
from rstats import *   # pnorm, qnorm, pt, qt, svar (sample variance)
from numpy import *


def ztest(samplestat, se, alpha=0.05, side=0):
    """
    Normal (z) test of a sample statistic.
    Arguments:
       samplestat - sample statistic
       se         - standard error of sample statistic
       alpha      - significance level
       side       - -1 left-sided test
                     0 double-sided test
                    +1 right-sided test
    """
    Ztest = samplestat / se
    if side == 0:
        pvalue = pnorm(Ztest)
        if Ztest > 0.0:
            pvalue = 1.0 - pvalue
        pvalue *= 2.0
        zcrit1 = qnorm(alpha / 2)
        zcrit2 = qnorm(1 - alpha / 2.0)
        return pvalue, Ztest, (zcrit1, zcrit2)
    elif side == -1:
        pvalue = pnorm(Ztest)
        zcrit = qnorm(alpha)
        return pvalue, Ztest, zcrit
    else:
        pvalue = 1 - pnorm(Ztest)
        zcrit = qnorm(1.0 - alpha)
        return pvalue, Ztest, zcrit


def ttest(samplestat, se, df, alpha=0.05, side=0):
    """
    T-test of a sample statistic.
    Arguments:
       samplestat - sample statistic
       se         - standard error of sample statistic
       df         - degrees of freedom
       alpha      - significance level
       side       - -1 left-sided test
                     0 double-sided test
                    +1 right-sided test
    """
    Ttest = samplestat / se
    if side == 0:
        pvalue = pt(Ttest, df)
        if Ttest > 0:
            pvalue = 1 - pvalue
        pvalue *= 2
        tcrit1 = qt(alpha / 2, df)
        tcrit2 = qt(1 - alpha / 2.0, df)
        return pvalue, Ttest, (tcrit1, tcrit2)
    elif side == -1:
        pvalue = pt(Ttest, df)
        tcrit = qt(alpha, df)
        return pvalue, Ttest, tcrit
    else:
        pvalue = 1 - pt(Ttest, df)
        tcrit = qt(1.0 - alpha, df)
        return pvalue, Ttest, tcrit


def xmeanttest(X, mu0, alpha=0.05, side=0):
    # One-sample t-test of H0: mu = mu0.
    xbar = mean(X)
    xvar = svar(X)
    df = len(X) - 1
    return ttest(xbar - mu0, sqrt(xvar / len(X)), df, alpha, side)


def diffmeanztest(xbar1, var1, n1, xbar2, var2, n2,
                  d0=0.0, alpha=0.05, side=0):
    # The case where both population variances are known and given.
    samplestat = xbar1 - xbar2 - d0
    se = sqrt(float(var1) / n1 + float(var2) / n2)
    return ztest(samplestat, se, alpha, side)


def diffmeanttest(xbar1, svar1, n1, xbar2, svar2, n2,
                  d0=0.0, alpha=0.05, side=0, isvarequal=False):
    # Difference-of-means t-test from summary statistics.
    svar1 = float(svar1)   # guard against Python 2 integer division
    svar2 = float(svar2)
    if isvarequal:
        df = n1 + n2 - 2
        Sp = ((n1 - 1.0) * svar1 + (n2 - 1.0) * svar2) / df   # pooled variance Sp^2
        se = sqrt(Sp * (1.0 / n1 + 1.0 / n2))
    else:
        # Welch-Satterthwaite approximate degrees of freedom.
        df = (svar1 / n1 + svar2 / n2) ** 2
        df /= ((svar1 / n1) ** 2 / (n1 - 1) + (svar2 / n2) ** 2 / (n2 - 1))
        df = int(ceil(df))
        se = sqrt(svar1 / n1 + svar2 / n2)
    Tstat = xbar1 - xbar2 - d0
    return ttest(Tstat, se, df, alpha, side)


def xydiffmeanttest(X, Y, d0=0.0, alpha=0.05, side=0,
                    isvarequal=False, ispaired=False):
    """
    Difference-of-means t-test computed from the samples themselves.
    Arguments:
       X, Y       - input samples
       d0         - hypothesized difference of the means
       alpha      - significance level
       side       - -1 left-sided test
                     0 double-sided test
                    +1 right-sided test
       isvarequal - True if the population variances are assumed equal
       ispaired   - True for a paired t-test
    Return value:
       (p-value, test statistic, critical value(s))
    """
    if not ispaired:
        n1 = len(X)
        n2 = len(Y)
        xbar1 = mean(X)
        xbar2 = mean(Y)
        svar1 = svar(X)
        svar2 = svar(Y)
        return diffmeanttest(xbar1, svar1, n1, xbar2, svar2, n2,
                             d0, alpha, side, isvarequal)
    else:   # paired t-test on the differences
        n = len(X)
        diffs = [x - y for x, y in zip(X, Y)]
        dbar = mean(diffs)
        dvar = svar(diffs)
        se = sqrt(dvar / n)
        df = n - 1
        return ttest(dbar - d0, se, df, alpha, side)


def Walpole_Tests():
    """
    Test cases from Walpole.
    """
    print "example 3, page.310"
    mu0 = 8
    xbar = 7.8
    n = 50
    sd = 0.5
    df = n - 1
    alpha = 0.01
    side = 0
    # print ttest(7.8 - 8, sd/sqrt(50), df, alpha, side)
    print ztest(7.8 - 8, sd / sqrt(50), alpha, side)

    print "example 4, page.312"
    mu0 = 70
    xbar = 71.8
    alpha = 0.05
    se = 8.9 / sqrt(100)
    print ztest(xbar - mu0, se, alpha, side=1)

    print "example 5, page.312"
    mu0 = 50
    xbar = 42
    sd = 11.9
    n = 12
    se = sd / sqrt(n)
    df = n - 1
    side = -1
    alpha = 0.01
    print "alpha =", alpha, ttest(xbar - mu0, se, df, alpha, side)
    alpha = 0.05
    print "alpha =", alpha, ttest(xbar - mu0, se, df, alpha, side)

    print "example 6, page.313"
    xbar1 = 85
    sd1 = 4
    svar1 = sd1 * sd1
    n1 = 12
    xbar2 = 81
    sd2 = 5
    svar2 = sd2 * sd2
    n2 = 10
    isvarequal = True
    alpha = 0.10
    d0 = 0
    side = 0
    print diffmeanttest(xbar1, svar1, n1, xbar2, svar2, n2,
                        d0, alpha, side, isvarequal=True)
    print

    print "Example 7, p.314"
    X = [2.0, 2.0, 2.3, 2.1, 2.4]
    Y = [2.2, 1.9, 2.5, 2.3, 2.4]
    d0 = 0.0
    print xydiffmeanttest(X, Y, d0, alpha=0.025, side=-1,
                          isvarequal=True, ispaired=True)


if __name__ == "__main__":
    Walpole_Tests()
```
We have included some problem instances from Walpole's text, "Introduction to Statistics", 3rd ed., Prentice Hall, paperback reprint. The output of the above program is as follows (edited):
```
Python 2.6.4 (r264:75706, Dec 7 2009, 18:43:55)
[GCC 4.4.1] on toto-laptop, Standard
example 3, page.310
(0.0046777349810472298, -2.8284271247461925, (-2.5758293035489008, 2.5758293035489004))
example 4, page.312
(0.021563811339088912, 2.0224719101123565, 1.6448536269514722)
example 5, page.312
alpha = 0.01 (0.01997646866681814, -2.3288078084959691, -2.7180791835355569)
alpha = 0.05 (0.01997646866681814, -2.3288078084959691, -1.7958848187036696)
example 6, page.313
(0.049963892471710292, 2.0863255923855117, (-1.7247182429207863, 1.7247182429207857))
Example 7, p.314
(0.094501829227587694, -1.5811388300841891, -2.7764451051977996)
```
I spent a long time wondering why I could not reproduce the intermediate values in Ex. 7. It turns out that the numpy.var function computes the population variance, not the sample variance!
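Here is a small illustration of that gotcha (my own sketch, separate from the listing above), using the X sample from Example 7; newer numpy versions also accept a `ddof=1` argument to get the sample variance directly.

```python
from numpy import var

X = [2.0, 2.0, 2.3, 2.1, 2.4]   # the X sample from Example 7
n = len(X)

popvar = var(X)                      # numpy's default divides by n (population variance)
samplevar = popvar * n / (n - 1.0)   # rescale to divide by n - 1 (sample variance)

print popvar, samplevar
```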
Note that our routines are now much clearer. We have emphasized in the above code that tests for population means should focus on the test statistic (not necessarily the sample statistic itself) and the standard error of that statistic. For example, to test the null hypothesis $\mu = \mu_0$, the test statistic is $\overline{x} - \mu_0$ with standard error $\sigma/\sqrt{n}$.
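As an illustration of that decomposition (a sketch that assumes the listing above has been saved as testofmeans.py, per its docstring), Example 3 boils down to handing ztest the sample statistic and its standard error:

```python
from math import sqrt
from testofmeans import ztest   # assumes the listing above is saved as testofmeans.py

# Example 3: H0: mu = 8, two-sided, with sigma = 0.5 and n = 50.
samplestat = 7.8 - 8.0        # xbar - mu0
se = 0.5 / sqrt(50)           # sigma / sqrt(n)

print ztest(samplestat, se, alpha=0.01, side=0)
# prints the (p-value, z, critical values) triple shown for example 3 above
```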
All test functions return a triple: (p-value, test statistic, critical value(s)). The tests accept a parameter side, which is -1 for a left-sided test, 0 for a two-sided test, and 1 for a right-sided test.
Two types of input parameter sets are covered: summary statistics and sample arrays, together with the significance level, the kind of test (side), and, for the t-test, whether to perform a paired test and whether the variances are assumed equal.
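A short sketch of the two calling styles, reusing the numbers from Examples 6 and 7 (again assuming the module is importable as testofmeans); the printed triples are the example 6 and Example 7 lines in the output above:

```python
from testofmeans import diffmeanttest, xydiffmeanttest

# Summary statistics only (Example 6): means, sample variances, sample sizes.
print diffmeanttest(85.0, 16.0, 12, 81.0, 25.0, 10,
                    d0=0.0, alpha=0.10, side=0, isvarequal=True)

# Raw samples (Example 7): the summaries are computed internally; here a paired test.
X = [2.0, 2.0, 2.3, 2.1, 2.4]
Y = [2.2, 1.9, 2.5, 2.3, 2.4]
print xydiffmeanttest(X, Y, d0=0.0, alpha=0.025, side=-1,
                      isvarequal=True, ispaired=True)
```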
Visit again soon for our Python code for tests of variances. Please leave a comment if you run into problems with the code, have feature requests, or want to file bug reports.