RankCorrelation

Hello,

The following code gives the results shown below, which look unpredictable whenever the waves contain NaN (right?):

Function test_rank_correlation()
    // Case 1: no NaN in either wave
    Make/O wave1={12,15,32,54,77}
    Make/O wave2={35,26,48,13,32}
    StatsRankCorrelationTest wave1,wave2

    // Case 2: NaN in wave1 only
    Make/O wave1={12,15,32,NaN,77}
    Make/O wave2={35,26,48,13,32}
    StatsRankCorrelationTest wave1,wave2

    // Case 3: NaN in wave2 only
    Make/O wave1={12,15,32,54,77}
    Make/O wave2={35,26,48,NaN,32}
    StatsRankCorrelationTest wave1,wave2

    // Case 4: NaN at the same point in both waves
    Make/O wave1={12,15,32,NaN,77}
    Make/O wave2={35,26,48,NaN,32}
    StatsRankCorrelationTest wave1,wave2

    // Case 5: NaN at different points in the two waves
    Make/O wave1={12,NaN,32,54,77}
    Make/O wave2={35,26,48,NaN,32}
    StatsRankCorrelationTest wave1,wave2

    // Case 6: two NaNs at the same points in both waves
    Make/O wave1={12,NaN,32,NaN,77}
    Make/O wave2={35,NaN,48,NaN,32}
    StatsRankCorrelationTest wave1,wave2
End

  n = 5
  sumDi2 = 26
  sumTx = 0
  sumTy = 0
  SpearmanR = -0.3
  Critical = 0.849999

  n = 5
  sumDi2 = 26
  sumTx = 0
  sumTy = 0
  SpearmanR = -0.3
  Critical = 0.849999

  n = 5
  sumDi2 = 2
  sumTx = 0
  sumTy = 0
  SpearmanR = 0.9
  Critical = 0.849999

  n = 5
  sumDi2 = 2
  sumTx = 0
  sumTy = 0
  SpearmanR = 0.9
  Critical = 0.849999

  n = 5
  sumDi2 = 2
  sumTx = 0
  sumTy = 0
  SpearmanR = 0.9
  Critical = 0.849999

  n = 5
  sumDi2 = 20
  sumTx = 0
  sumTy = 0
  SpearmanR = 0
  Critical = 0.849999

This would be annoying, particularly when processing a large number of data pairs with occasional NaNs. In contrast, StatsKendallTauTest reports an error.

Is there a reason for this behavior?

Hello Fusao,

I'm not sure that I fully understand your expectations. You are applying a statistical test that sorts and ranks the data, yet the data include NaN values.

The first question to ask is: how do you expect NaN values to be sorted and ranked?

FWIW, the operation performs the computation using STL vectors.  The default vector sort leaves the wave point containing a NaN value at its original position in the wave.  In general, if one wants to have the NaNs sorted to the start, end or some other position in the array, one needs to modify the default sorting comparison function -- something that was never considered appropriate for this operation.

One of the fastest ways to check whether a wave contains one or more NaNs is to test whether the following returns 2 (the numType code for NaN), e.g.,

numType(sum(waveName))
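
As a rough sketch only, that check could be combined with pairwise removal of NaNs before the test is run. The function name RankCorrelationSkipNaN is hypothetical (not a built-in), both waves are assumed to have the same number of points, and discarding incomplete pairs is just one possible policy, not something the operation does itself:

```igor
// Hypothetical helper, not part of Igor Pro: run the rank correlation
// only on pairs in which neither value is NaN.
// Assumes w1 and w2 have the same number of points.
Function RankCorrelationSkipNaN(w1, w2)
    Wave w1, w2

    // Work on free copies so the original waves are left untouched.
    Duplicate/FREE w1, x1
    Duplicate/FREE w2, x2

    // sum() propagates NaN and numtype returns 2 for NaN,
    // so this is a quick way to detect that cleanup is needed.
    if (numtype(sum(w1)) == 2 || numtype(sum(w2)) == 2)
        // Blank out a pair in both copies if either member is NaN...
        x1 = (numtype(w1[p]) == 2 || numtype(w2[p]) == 2) ? NaN : w1[p]
        x2 = (numtype(w1[p]) == 2 || numtype(w2[p]) == 2) ? NaN : w2[p]
        // ...then delete the NaN points, which keeps the copies aligned.
        WaveTransform zapNaNs x1
        WaveTransform zapNaNs x2
    endif

    StatsRankCorrelationTest x1, x2
End
```

Calling RankCorrelationSkipNaN(wave1, wave2) with wave1={12,15,32,NaN,77} and wave2={35,26,48,13,32} would then run the test on the four remaining pairs. Whether dropping pairs is statistically appropriate depends on the data, so treat this as a starting point.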

I hope this helps,

AG