How to compute p-values for Spearman correlation

Typically you would compute the p-value as 1-cdf(x,degreesOfFreedom), where x is the observed value of the test statistic. Depending on whether the test is one-tailed or two-tailed, you may need to adjust the result.

January 28, 2020 at 05:21 pm - Permalink

Klaus

Hello Igor,

I think I did what you proposed, but my knowledge of statistics is not very solid, so I may have made a mistake somewhere. An example will probably help:

I have two datasets (n=15):

base128[0]= {31,68.9,60.4,53.3,49.9,52.4,43.6,39.7,65.9,52.5,48.6,44.4,85.2,22.4,42.3}
diff128[0]= {-4.7,-26.7,-16.3,-21.2,-10.3,-16.1,-6.7,-3.5,-20.1,-11.9,-27.3,-17.4,-41.8,-14.7,11.7}

Here are the results from Igor Pro 8.04 (assuming that the number of degrees of freedom is n-2=13):

•StatsRankCorrelationTest base128,diff128
  n = 15
  sumDi2 = 966
  sumTx = 0
  sumTy = 0
  SpearmanR = -0.725
  Critical = 0.530273
•print 1-statsspearmanRhoCDF(-0.725, 13)
  0.00889036

This p-value is one-sided; the two-sided value would be 0.01778.
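As a cross-check on the output above, SpearmanR can be reproduced from sumDi2 and n with the standard no-ties formula rho = 1 - 6*sumDi2/(n*(n^2-1)). The following is only a minimal Python sketch of that arithmetic; the helper name `ranks` is made up for illustration, and the code assumes there are no tied values (consistent with sumTx = 0 and sumTy = 0).

```python
def ranks(values):
    """Return 1-based ranks of the values; assumes there are no ties."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, index in enumerate(order, start=1):
        result[index] = rank
    return result

base128 = [31, 68.9, 60.4, 53.3, 49.9, 52.4, 43.6, 39.7,
           65.9, 52.5, 48.6, 44.4, 85.2, 22.4, 42.3]
diff128 = [-4.7, -26.7, -16.3, -21.2, -10.3, -16.1, -6.7, -3.5,
           -20.1, -11.9, -27.3, -17.4, -41.8, -14.7, 11.7]

n = len(base128)
# Sum of squared rank differences (sumDi2 in Igor, S in R).
sum_d2 = sum((rx - ry) ** 2 for rx, ry in zip(ranks(base128), ranks(diff128)))
# Spearman's rho for untied data.
rho = 1 - 6 * sum_d2 / (n * (n * n - 1))

print(n, sum_d2, round(rho, 3))  # prints: 15 966 -0.725
```

Both programs therefore agree on the statistic itself; only the conversion of that statistic into a p-value differs.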

When I test the same data with R, I get the following result:

> cor.test(base128, diff128, alternative = "two.sided", method = "spearman", exact = TRUE)

	Spearman's rank correlation rho

data:  base128 and diff128
S = 966, p-value = 0.00313
alternative hypothesis: true rho is not equal to 0
sample estimates:
   rho 
-0.725

The difference in the p-values is quite substantial, although the correlation coefficients (SpearmanR in Igor, rho in R) are identical. What am I doing wrong here?

January 29, 2020 at 03:20 am - Permalink

Igor

First a disclaimer: I am not a fan of P-values. You can find some references (books, papers and presentations) by Geoff Cumming that explain the problem well.

Next, I note that the number of samples that you are using (15) is relatively small. At this sample size there is a marked difference between a CDF calculated via approximation methods and the "exact" one implied by your R calculation. Note that when N is large, one computes the probability using Student's t-distribution. When the number of samples is "small", such large-sample approximations are not appropriate, and in some instances one can consider computing the exact probabilities. In this case, computing the exact probabilities involves evaluating all possible permutations of the ranks (factorial(n) of them).
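To make the large-N route concrete, here is a minimal Python sketch of the usual Student's t approximation, t = rho*sqrt((n-2)/(1-rho^2)) with n-2 degrees of freedom. This is an illustration only and is not claimed to be what either Igor or R does internally; the function names are made up, and the t CDF is obtained by plain Simpson integration of the density so that only the standard library is needed.

```python
import math

def t_pdf(x, df):
    """Density of Student's t-distribution with df degrees of freedom."""
    coeff = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return coeff * (1 + x * x / df) ** (-(df + 1) / 2)

def t_two_sided_p(t, df, steps=2000):
    """Two-sided tail probability P(|T| >= |t|); steps must be even."""
    upper = abs(t)
    h = upper / steps
    # Simpson's rule for the integral of the density over [0, |t|].
    total = t_pdf(0.0, df) + t_pdf(upper, df)
    for k in range(1, steps):
        total += (4 if k % 2 else 2) * t_pdf(k * h, df)
    central_half = total * h / 3
    return 2 * (0.5 - central_half)

def spearman_t_approx(rho, n):
    """Return (t statistic, two-sided p-value) for the t approximation."""
    df = n - 2
    t = rho * math.sqrt(df / (1 - rho * rho))
    return t, t_two_sided_p(t, df)

t_stat, p_two_sided = spearman_t_approx(-0.725, 15)
print(round(t_stat, 3), p_two_sided)
```

With only 15 samples the value this prints is an approximation, and there is no reason to expect it to match either the Edgeworth-based or the exact result.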

I actually downloaded and installed R on my machine to test it with your data. It was interesting to see the extent of the difference between the methods used in R alone (I tried using exact=FALSE).

I'm not sure what method is used by R to provide the "exact" result. The current implementation in Igor is using the Edgeworth series expansion to approximate the result. I was unable to determine if one can use R to compute a similar approximation. The literature indicates that this approximation is accurate within ~5e-4.

Looking at the code I see that exact calculation in this case involves iterations on the order of factorial(15). I am going to attempt running this. Please contact me through support@wavemetrics.com for more information.

Actually, after running this for a while I realized that based on current performance it is unrealistic to complete this calculation in reasonable time (in a single thread).
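For illustration, a brute-force exact calculation can be sketched in Python as follows. This is a naive enumeration written for this note, not the code referred to above and not necessarily how R obtains its "exact" value; the function name is made up, and "two-sided" is taken here to mean counting every permutation whose rho is at least as far from zero as the observed one.

```python
import math
from itertools import permutations

def exact_two_sided_p(sum_d2_observed, n):
    """Exact two-sided p-value by enumerating all n! rank permutations (no ties)."""
    identity = range(1, n + 1)
    # Value of sum(d^2) at which rho is exactly zero; always an integer.
    center = n * (n * n - 1) // 6
    observed_distance = abs(sum_d2_observed - center)
    extreme = 0
    total = 0
    for perm in permutations(identity):
        sum_d2 = sum((a - b) ** 2 for a, b in zip(identity, perm))
        # Integer comparison, equivalent to |rho| >= |rho_observed|.
        if abs(sum_d2 - center) >= observed_distance:
            extreme += 1
        total += 1
    return extreme / total

# Feasible for tiny n: perfect agreement of 4 ranks gives 2/24.
print(exact_two_sided_p(0, 4))

# Number of permutations a naive loop would need for n = 15.
print(math.factorial(15))  # prints: 1307674368000
```

At roughly 1.3e12 permutations, a single-threaded loop of this kind is impractical for n = 15, which matches the experience described above.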

January 29, 2020 at 05:25 pm - Permalink
