$$ r =\frac{\sum_{i = 1}^n (x_i - \overline{x} ) (y_i - \overline{y})}{S_x S_y} $$
where $S_x$ and $S_y$ are the standard deviations of the X and Y samples. We have For $S_x = \sqrt{\frac{\sum (x_i -\overline{x})^2}{n-1}}$ with a similar formula for $S_y$. The value of $r$ varies from -1 (perfect negative correlation) to +1 (perfect positive correlation). A negative correlation means that as one variable increases, the other other variable decreases, or in other words, both variables varies in opposite directions. On the other hand a positive correlation arises when both variables varies together in the same direction: if one variable increases, the other also increases and if one variable decreases, the other also decreases.
Correlation is generally used when it is difficult to determine the causative order. Does X causes Y or does Y causes X? It is heavily used in the Psychology field for example. In this post, we describe how to test the significance of the computed correlation coefficient.
The sample test statistic $t= \frac{r-\rho}{\sqrt{\frac{1-r^2}{n-2}}}$ follow assymptotically the t-distribution with $df=n-2$ degree of freedom and tables of the t-distribution may be used. Thus to write our Python program to use the ttest routine described previously in our blog, we specify the sample test statistic and the standard error.
def corrtest( rho, r, n, alpha = 0.5, side = 0): stattest = (r - rho) se = sqrt( (1-r*r)/(n-2.0)) return ttest(r-rho, se, alpha, side)
Neat, is not it? We obtain the p-value, the distribution test statistic and the critical limits(values). For side = 0, a double sided test, the two limits do not strictly form the true confidence interval. Here is the complete testforcorr.py Python file where we included the ttest function. It calls the rstats.py module we have written before to use R-names for the Python statistical functions.
# -*- coding: utf-8 -*- """ file testofcorr.py author Ernesto P. Adorio ernesto.adorio @ gmail.com UP at Clarkfield Angeles, Pampanga version 0.0.1 april 4, 2010 """ from math import * from rstats import * def ttest(samplestat, se, df, alpha= 0.05, side=0): """ T-test of a sample statistic. Arguments: samplestat- sample statistic se - standard error of sample statistic df degree of freedom alpha - significance level side - -1 left-sided test 0 double sided test +1 right-sided test """ Ttest = samplestat/se if side ==0: pvalue = pt(Ttest, df) if Ttest > 0: pvalue = 1-pvalue pvalue *=2 tcrit1 = qt(alpha/2, df) tcrit2 = qt(1-alpha/2.0,df) return pvalue, Ttest, (tcrit1,tcrit2) elif side == -1: pvalue = pt(Ttest, df) tcrit = qt(alpha, df) return pvalue, Ttest, tcrit else: pvalue = 1- pt(Ttest, df) tcrit = qt(1.0-alpha, df) return pvalue, Ttest, tcrit def cov(X,Y): n = len(X) if n != len(Y): raise "ArgumentError", "in cov: len(X) != len(Y)" xbar = float(sum(X))/ n ybar = float(sum(Y))/n return sum([(x -xbar)*(y-ybar) for x,y in zip(X,Y)])/(n-1) def Sx(X): xbar = float(sum([x for x in X]))/n return sqrt( sum([(x - xbar)**2 for x in X]) /(n-1.0)) def cor(X,Y): # correlation coefficient of X and Y. return cov(X,Y)/(Sx(X)*Sx(Y)) def corrtest( rho, r, n, alpha = 0.5, side = 0): stattest = (r - rho) se = sqrt( (1-r*r)/(n-2.0)) return ttest(r-rho, se, n-2, alpha, side) def xycorrtest(X,Y, rho, alpha = 0.5, side= 0): if len(X) != len(Y): raise ArgumentError, "in xycorrtest: X and Y must have same length" r = cor(X,Y) return corrtest(rho, r, len(X), alpha, side) def Test(): #Berenson , p. 546 rho = 0 r = 0.951 alpha = 0.05 side = 1 n = 14 print "Berenson's example, p. 546" print corrtest( rho, r, n, alpha, side) if __name__ == "__main__": Test()
When we invoke the Python interpreter it will run the Berenson, p. 546 example. The output is
toto@toto-laptop:~/Blogs/statistics$ python testofcorr.py
Berenson's example, p. 546
(8.9865653363219167e-08, 10.654779470653486, 1.7822875556491591)
The extremely low p-value means that we can reject the Null hypothesis that the correlation coefficient is zero.
To support this decision, the obtained distribution test statistic is greater then 1.78 critical value limit, i.e.,
is in the rejection region.
I need to specify a darker background for <code> blocks and a central download repository for
the files. Stay tuned. We will add more useful technical contents soon!
No comments:
Post a Comment