# How to compute p-values for Spearman correlation

Hello,

I have tried to compute p-values for Spearman correlations using the StatsSpearmanRhoCDF function. However, when I compare the results with the output of other software or online calculators, I find some differences. It seems that some programs use an approximation while others claim to use an exact solution. With my limited knowledge of statistics, I can't get Igor's results to match either group. I would appreciate any help with this. Also, wouldn't it be nice to include this information in the "Live results" of the "Correlation" window?

Thank you,

Klaus

Typically you would use 1-cdf(alpha,degreesOfFreedom) to obtain the p-value.  Sometimes you need to modify your approach depending on the tail/tails assumption.
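As a minimal sketch of the tail handling (written in Python rather than Igor; `p_value` is a hypothetical helper, and the standard normal distribution is only a stand-in for whichever CDF applies), the usual three cases look like this. The two-sided form assumes a null distribution that is symmetric about zero.

```python
from statistics import NormalDist


def p_value(cdf_at_stat, tail="upper"):
    """Convert the CDF value at the observed statistic into a p-value.

    tail="upper"     -> P(X >= x) = 1 - CDF
    tail="lower"     -> P(X <= x) = CDF
    tail="two-sided" -> twice the smaller tail (assumes a symmetric null)
    """
    if tail == "upper":
        return 1.0 - cdf_at_stat
    if tail == "lower":
        return cdf_at_stat
    if tail == "two-sided":
        return min(1.0, 2.0 * min(cdf_at_stat, 1.0 - cdf_at_stat))
    raise ValueError(f"unknown tail: {tail!r}")


# Stand-in example: a standard normal statistic of 1.96.
cdf = NormalDist().cdf(1.96)
print(p_value(cdf, "upper"), p_value(cdf, "two-sided"))
```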

Hello Igor,

I think that I did what you proposed. But my knowledge of statistics is not really solid, so I may have made a mistake somewhere. Perhaps an example helps:

I have two datasets (n=15):

```
base128[0]= {31,68.9,60.4,53.3,49.9,52.4,43.6,39.7,65.9,52.5,48.6,44.4,85.2,22.4,42.3}
diff128[0]= {-4.7,-26.7,-16.3,-21.2,-10.3,-16.1,-6.7,-3.5,-20.1,-11.9,-27.3,-17.4,-41.8,-14.7,11.7}
```

And here are the results from Igor Pro 8.04 (assuming that degrees of freedom is n-2=13):

StatsRankCorrelationTest base128,diff128
n = 15
sumDi2 = 966
sumTx = 0
sumTy = 0
SpearmanR = -0.725
Critical = 0.530273
print 1-statsspearmanRhoCDF(-0.725, 13)
0.00889036

This p-value is one-sided; the two-sided value would be 0.01778.
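The values sumDi2 = 966 and SpearmanR = -0.725 reported above can be reproduced independently of either program. Here is a minimal Python sketch; the helper names are hypothetical, and it assumes no tied values, which holds for these two datasets:

```python
def ranks(values):
    """Return 1-based ranks of the values; assumes there are no ties."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, idx in enumerate(order, start=1):
        result[idx] = rank
    return result


def spearman_from_ranks(x, y):
    """Spearman's rho via 1 - 6*sum(d^2) / (n*(n^2 - 1)); valid only without ties."""
    rx, ry = ranks(x), ranks(y)
    n = len(x)
    sum_d2 = sum((a - b) ** 2 for a, b in zip(rx, ry))
    rho = 1.0 - 6.0 * sum_d2 / (n * (n * n - 1))
    return sum_d2, rho


base128 = [31, 68.9, 60.4, 53.3, 49.9, 52.4, 43.6, 39.7,
           65.9, 52.5, 48.6, 44.4, 85.2, 22.4, 42.3]
diff128 = [-4.7, -26.7, -16.3, -21.2, -10.3, -16.1, -6.7, -3.5,
           -20.1, -11.9, -27.3, -17.4, -41.8, -14.7, 11.7]

sum_d2, rho = spearman_from_ranks(base128, diff128)
print(sum_d2, round(rho, 3))  # 966 -0.725
```

This only confirms that the test statistic agrees; the disagreement is entirely in how the p-value is derived from it.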

When I test the same data with R, I get the following result:

```
> cor.test(base128, diff128, alternative = "two.sided", method = "spearman", exact = TRUE)

Spearman's rank correlation rho

data:  base128 and diff128
S = 966, p-value = 0.00313
alternative hypothesis: true rho is not equal to 0
sample estimates:
   rho
-0.725
```

The difference in the p-values is quite substantial, although the values of the correlation coefficient (rho) are identical. What am I doing wrong here?

First a disclaimer: I am not a fan of P-values.  You can find some references (books, papers and presentations) by Geoff Cumming that explain the problem well.

Next, I note that the number of samples that you are using (15) is relatively small.  At this value there is a marked difference between the CDF calculated via approximation methods and the "exact" result implied by your R calculation.  Note that when N is large one computes the probability using Student's t-distribution.  When the number of samples is "small", such large-sample approximations are not appropriate, and in some instances one can consider computing the exact probabilities.  In this case, computing the exact probabilities involves evaluating all possible permutations of the ranks (factorial(n) of them).
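To make the large-sample route concrete, here is a minimal Python sketch of the textbook Student's t approximation, t = rho*sqrt((n-2)/(1-rho^2)) with n-2 degrees of freedom. This is an illustration only: the function names are hypothetical, the t-distribution is integrated numerically with Simpson's rule to avoid external libraries, and nothing here is claimed to be what Igor or R do internally.

```python
import math


def t_pdf(t, df):
    """Density of Student's t-distribution with df degrees of freedom."""
    c = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return c * (1 + t * t / df) ** (-(df + 1) / 2)


def t_two_sided_p(t, df, steps=20000):
    """Two-sided p-value: 1 minus the probability mass in [-|t|, |t|].

    Uses composite Simpson's rule on [0, |t|] (steps must be even)
    and doubles the result, since the density is symmetric.
    """
    a = abs(t)
    if a == 0:
        return 1.0
    h = a / steps
    total = t_pdf(0.0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    central = 2.0 * total * h / 3.0
    return max(0.0, 1.0 - central)


def spearman_t_approx(rho, n):
    """t statistic and two-sided p-value for rho; requires |rho| < 1 and n > 2."""
    df = n - 2
    t = rho * math.sqrt(df / (1.0 - rho * rho))
    return t, t_two_sided_p(t, df)


t_stat, p_two_sided = spearman_t_approx(-0.725, 15)
print(t_stat, p_two_sided)
```

With only 15 samples the value printed here should be read as an approximation, for the reasons given above.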

I actually downloaded and installed R on my machine to test it with your data.  It was interesting to see the extent of the difference between the methods used in R alone (I tried using exact=FALSE).

I'm not sure what method is used by R to provide the "exact" result.  The current implementation in Igor is using the Edgeworth series expansion to approximate the result.  I was unable to determine if one can use R to compute a similar approximation.  The literature indicates that this approximation is accurate within ~5e-4.

Looking at the code I see that exact calculation in this case involves iterations on the order of factorial(15).  I am going to attempt running this.  Please contact me through support@wavemetrics.com for more information.

Actually, after running this for a while I realized that based on current performance it is unrealistic to complete this calculation in reasonable time (in a single thread).
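For illustration, the brute-force enumeration described above can be sketched in a few lines of Python. This is not Igor's or R's code: `exact_spearman_p_lower` is a hypothetical name, the sketch assumes no ties, and it is only practical for small n, because the loop runs factorial(n) times.

```python
from itertools import permutations
from math import factorial


def exact_spearman_p_lower(sum_d2_obs, n):
    """Exact one-sided P(rho <= observed) under the null hypothesis.

    Enumerates all n! permutations of the ranks. Because rho decreases as
    sum(d^2) grows, rho <= observed is the same event as sum(d^2) >= observed.
    Assumes no ties.
    """
    base = range(1, n + 1)
    hits = 0
    for perm in permutations(base):
        s = sum((a - b) ** 2 for a, b in zip(base, perm))
        if s >= sum_d2_obs:
            hits += 1
    return hits / factorial(n)


# Small illustration: n = 4 with perfectly reversed ranks, sum(d^2) = 20.
# Only one of the 24 permutations reaches that value.
print(exact_spearman_p_lower(20, 4))  # 0.041666666666666664

# The same loop for the data in this thread would need this many iterations:
print(factorial(15))  # 1307674368000
```

The iteration count for n = 15 shows why a plain single-threaded enumeration does not finish in reasonable time.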
