Judging randomness

Hi Igorians,

I'm currently diving into statistics because I want to create a noise distribution with some frequencies removed.

To do that, I would like to compare the randomness of the noise before and after filtering, so that I can be sure the filtering does not mess anything up.

The following code tries to do that:
Function GetUserCDF(inX) : CDFFunc
    Variable inX
    return StatsNormalCDF(inX,0,1)
End

Function NewStuff()

    Make/O/N=1000 data

    SetRandomSeed/BETR=1 4711

    data = gnoise(1, 1)
    StatsKSTest/CDFF=GetUserCDF data
    print "#############################"

    SetRandomSeed/BETR=1 4711

    data = gnoise(1, 2)
    StatsKSTest/CDFF=GetUserCDF data
    print "#############################"

    // data is sampled at 200 kHz
    // low-pass: cut off above 30 Hz
    FilterFIR/LO={30/200e3, 30/200e3, 999} data

    StatsKSTest/CDFF=GetUserCDF data
    print "#############################"
End


This gives me

•Newstuff()
alpha = 0.05
N = 1000
D = 0.0259096
Critical = 0.0427766
PValue = 0.256714
#############################
alpha = 0.05
N = 1000
D = 0.0305389
Critical = 0.0427766
PValue = 0.15174
#############################
alpha = 0.05
N = 1000
D = 0.500762
Critical = 0.0427766
PValue = 3.63349e-265
#############################

Following http://www.itl.nist.gov/div898/handbook/eda/section3/eda35g.htm, I would say that, because D is larger than Critical after the filtering, the hypothesis that the numbers come from a Gaussian distribution is now rejected.

Does that make any sense at all?
Can I expect to keep the randomness with filtering?
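One possible reading of the third result, offered as a sketch rather than a diagnosis: GetUserCDF compares the data against a normal distribution with mean 0 and standard deviation 1, while a narrow low-pass filter shrinks the spread of the data considerably. Values bunched near zero put the whole rise of the empirical CDF where the reference CDF is about 0.5, so D approaches 0.5 whatever the shape of the distribution. The Python snippet below (not Igor code) illustrates this with a plain moving average standing in for FilterFIR. The helper names `ks_distance` and `moving_average`, the window length of 400, and the seed are assumptions made for the example.

```python
import math
import random


def normal_cdf(x, mu=0.0, sigma=1.0):
    """CDF of a normal distribution with mean mu and standard deviation sigma."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))


def ks_distance(samples, cdf):
    """One-sample Kolmogorov-Smirnov statistic D against a reference CDF."""
    xs = sorted(samples)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs):
        f = cdf(x)
        # The empirical CDF jumps from i/n to (i+1)/n at x; check both sides.
        d = max(d, (i + 1) / n - f, f - i / n)
    return d


def moving_average(samples, width):
    """Boxcar low-pass filter; keeps only fully overlapped output points."""
    running = sum(samples[:width])
    out = [running / width]
    for i in range(width, len(samples)):
        running += samples[i] - samples[i - width]
        out.append(running / width)
    return out


rng = random.Random(4711)
white = [rng.gauss(0.0, 1.0) for _ in range(20000)]
smooth = moving_average(white, 400)

# Keep every 400th point so the retained values do not share input samples.
thinned = smooth[::400]
sigma_out = 1.0 / math.sqrt(400)  # expected spread after averaging 400 points

d_white = ks_distance(white, normal_cdf)
d_unit = ks_distance(thinned, normal_cdf)
d_scaled = ks_distance(thinned, lambda x: normal_cdf(x, 0.0, sigma_out))

print("white noise vs N(0,1):       D =", round(d_white, 4))
print("filtered vs N(0,1):          D =", round(d_unit, 4))
print("filtered vs N(0, sigma_out): D =", round(d_scaled, 4))
```

The thinning step matters: the KS test assumes independent samples, and neighbouring points of a filtered wave are strongly correlated, so the critical value printed for N = 1000 would not apply to them even with the right reference variance.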
Thomas,
Let me propose a different approach, if your goal is to generate filtered Gaussian noise. I have done something roughly similar, but for specific cases of white noise filtered by various causal impulse responses (which act as smoothing, or low-pass, filters). If you can cast your frequency removal in the form of a causal impulse response, find its auto-correlation function. Use the auto-correlation function to build a large, symmetric covariance matrix, which can then be used, as described in an Exchange posting (http://www.igorexchange.com/node/5141), to generate correlated Gaussian variates. That example was for a bivariate distribution, but I extended the concept to a much larger multivariate dimension, matching the row (or column) dimension of the covariance matrix. Finally, feed in uncorrelated white Gaussian noise and use Bech's Cholesky decomposition procedure with the large covariance matrix to generate the filtered Gaussian noise.
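The recipe above can be sketched in a few lines. The following Python snippet is only an illustration under stated assumptions, not the procedure from the linked Exchange posting: the impulse response `h` is a made-up three-tap example, the matrix is kept tiny, and `autocorrelation`, `cholesky` and `correlated_gaussian` are hypothetical helper names. It builds the Toeplitz covariance matrix from the filter's auto-correlation, factors it as L times its transpose, and multiplies L into uncorrelated Gaussian noise.

```python
import math
import random


def autocorrelation(h, max_lag):
    """r[k] = sum_j h[j] * h[j + k], the auto-correlation of impulse response h."""
    return [
        sum(h[j] * h[j + k] for j in range(len(h) - k)) if k < len(h) else 0.0
        for k in range(max_lag + 1)
    ]


def cholesky(a):
    """Lower-triangular L with L * L^T = a (a must be symmetric positive definite)."""
    n = len(a)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(low[i][k] * low[j][k] for k in range(j))
            if i == j:
                low[i][j] = math.sqrt(a[i][i] - s)
            else:
                low[i][j] = (a[i][j] - s) / low[j][j]
    return low


def correlated_gaussian(low, rng):
    """Draw one vector x = L z, where z is uncorrelated standard Gaussian noise."""
    n = len(low)
    z = [rng.gauss(0.0, 1.0) for _ in range(n)]
    return [sum(low[i][k] * z[k] for k in range(i + 1)) for i in range(n)]


h = [0.5, 0.3, 0.2]          # assumed causal impulse response (illustrative)
n = 8                        # number of points per generated record
r = autocorrelation(h, n - 1)
cov = [[r[abs(i - j)] for j in range(n)] for i in range(n)]
low = cholesky(cov)

rng = random.Random(4711)
records = [correlated_gaussian(low, rng) for _ in range(20000)]

# Sample covariance between two neighbouring points, to compare with r[1].
lag1 = sum(x[3] * x[4] for x in records) / len(records)
print("target r[0], r[1]:", r[0], r[1])
print("sample lag-1 covariance:", round(lag1, 4))
```

The triple loop in `cholesky` scales with the cube of the matrix dimension, which is one reason the approach gets expensive for long records.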

There may be problems involving the size of the matrix you require, the ease of finding the Cholesky decomposition, and possible edge effects. On the other hand, if you can show your frequency filter is realizable, any such linear filtering (if properly implemented!) must preserve Gaussian randomness.

Hello Thomas,

If you are testing for a Gaussian distribution, you may want to use StatsJBTest. In general I would stay away from the KS tests because there are complications in determining the proper critical values, etc. Read the Stephens reference for more information, or contact me directly for recent practical examples.

A.G.
WaveMetrics, Inc.
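For reference, the textbook Jarque-Bera statistic combines the sample skewness S and kurtosis K as JB = n/6 * (S^2 + (K - 3)^2 / 4). A minimal Python sketch follows. It is not Igor code, and it is an assumption that StatsJBTest computes the same quantity; in particular, the asymptotic chi-square p-value used here can be inaccurate for small samples, and the built-in operation may rely on different critical values. The function name `jarque_bera` is hypothetical.

```python
import math
import random


def jarque_bera(samples):
    """Return (JB, p) using the asymptotic chi-square(2) approximation."""
    n = len(samples)
    mean = sum(samples) / n
    m2 = sum((x - mean) ** 2 for x in samples) / n
    m3 = sum((x - mean) ** 3 for x in samples) / n
    m4 = sum((x - mean) ** 4 for x in samples) / n
    skew = m3 / m2 ** 1.5
    kurt = m4 / m2 ** 2
    jb = n / 6.0 * (skew ** 2 + (kurt - 3.0) ** 2 / 4.0)
    # For a chi-square distribution with 2 degrees of freedom,
    # the upper tail probability is exp(-x / 2).
    return jb, math.exp(-jb / 2.0)


rng = random.Random(4711)
gaussian = [rng.gauss(0.0, 1.0) for _ in range(1000)]
# Same data with a much smaller spread, as a filter might produce.
shrunk = [0.04 * x for x in gaussian]
uniform = [rng.uniform(-1.0, 1.0) for _ in range(1000)]

for name, data in (("gaussian", gaussian), ("shrunk", shrunk), ("uniform", uniform)):
    jb, p = jarque_bera(data)
    print(name, "JB =", round(jb, 3), "p =", round(p, 4))
```

Because skewness and kurtosis are unchanged by rescaling, the statistic reacts to the shape of the distribution only, not to a change in variance. Like the KS test, it still assumes independent samples.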
Thanks for your replies, Stephen and AG.

@Stephen: I'm not sure I can follow your approach 100%. I need to filter large waves (>10^5 points) with one or more low-/high-pass filters. I guess the size of the symmetric covariance matrix is the square of the number of points?

@AG: Thanks I'll look into that.
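To put a rough number on that question, here is a back-of-the-envelope estimate in Python. It assumes 8-byte double-precision elements and takes n = 10^5 as the wave length; both values are assumptions for illustration.

```python
n = 10 ** 5                 # number of points in the wave (assumed)
bytes_per_value = 8         # double precision (assumed)

# Full n x n matrix versus one triangle including the diagonal.
full = n * n * bytes_per_value
triangular = n * (n + 1) // 2 * bytes_per_value

print("full matrix:       ", full / 1e9, "GB")
print("triangular storage:", triangular / 1e9, "GB")
```

That is about 80 GB for the full matrix and about 40 GB for one triangle.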

Thomas wrote: "@Stephen: I'm not sure I can follow your approach 100%. I need to filter large waves (>10^5 points) with one or more low-/high-pass filters. I guess the size of the symmetric covariance matrix is the square of the number of points?"


Yes, that is correct. For such large waves the method I suggested would be impractical. The covariance matrix storage requires only upper (or lower) triangular storage because of symmetry, but even so the Cholesky numerical computation would probably have many difficulties. I repeat my earlier claim: if you can prove your filtering is a realizable linear operation on the original wave, the filtered wave statistics must remain Gaussian. The single-point probability distribution function will be Normal. However, the filtered correlation properties will change.
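The last two statements can be checked numerically. The Python sketch below is an illustration under assumptions, not the FilterFIR design used earlier in the thread: `h` is a made-up five-tap FIR filter and `fir_filter` is a hypothetical helper. For unit-variance white Gaussian input, theory predicts an output standard deviation of sqrt(sum of h_k^2) and a lag-1 correlation of sum(h_k * h_(k+1)) / sum(h_k^2). If the output is Gaussian, about 68.3% of the values should lie within one output standard deviation of the mean.

```python
import math
import random


def fir_filter(samples, h):
    """Causal FIR filter; returns only outputs with full overlap of h."""
    m = len(h)
    return [
        sum(h[k] * samples[i - k] for k in range(m))
        for i in range(m - 1, len(samples))
    ]


h = [0.1, 0.2, 0.4, 0.2, 0.1]   # assumed low-pass impulse response
rng = random.Random(4711)
white = [rng.gauss(0.0, 1.0) for _ in range(100000)]
y = fir_filter(white, h)

# Predictions for unit-variance white input.
energy = sum(c * c for c in h)
sd_theory = math.sqrt(energy)
rho_theory = sum(h[k] * h[k + 1] for k in range(len(h) - 1)) / energy

# Sample estimates from the filtered noise.
n = len(y)
mean = sum(y) / n
sd = math.sqrt(sum((v - mean) ** 2 for v in y) / n)
rho = sum((y[i] - mean) * (y[i + 1] - mean) for i in range(n - 1)) / (n * sd * sd)
inside = sum(1 for v in y if abs(v - mean) <= sd) / n

print("output SD:  ", round(sd, 4), "theory", round(sd_theory, 4))
print("lag-1 corr.:", round(rho, 4), "theory", round(rho_theory, 4))
print("within 1 SD:", round(inside, 4), "Gaussian value 0.6827")
```

The output stays Gaussian in shape, but its variance is no longer 1 and neighbouring points are correlated, so a test against a fixed N(0,1) reference, or any test that assumes independent samples, needs to take both into account.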