fast linear fit for large number of small data sets

I have N datasets, each consisting of 257 real-valued numbers. N is typically ~50 million. For each dataset I extract the slope (I don't need the y-intercept, which differs from dataset to dataset) with a curve fit, as shown below.
// phasediff: 256xN matrix
// maskwv, weightwv: 256-pt waves
// delaysT, sigmaDelaysT: N-pt waves
for (i=0; i < DimSize(Phasediff, 1) ; i +=1)
    matrixop/o pt=col(phasediff, i); copyscales phasediff, pt
    CurveFit/NTHR=0/Q line  pt  /W=weightwv /I=0 /M=maskwv
    delaysT[i] = w_coef[1]/(2*pi)
    sigmaDelaysT[i] = V_chisq/V_npnts
endfor


This obviously takes some time (~5 minutes per 1e6 datasets), and I would like to speed it up. It seems like the Global Fit package might be useful, but before spending too much time on it I wanted your opinion on the best way to speed up this code.

Thanks,
Fabrizio
Here is an off-the-wall suggestion. Since you are looking for linear phase behavior, construct for each dataset a 256-pt complex wave with unit amplitude and your phase data as its phase. Convert it to rectangular coordinates, take its FFT, and find the peak location, which ought to correspond to the delay. Given that FFTs are fast (and the MatrixOP version may also help), this could be quick. A potential slow-down is actually finding the peak; WaveStats V_maxloc might be a drag. Statistics might be inferred from the width of the FFT peak.
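Something along these lines might serve as a starting point (untested sketch; the function name and the wave phasewv, holding one dataset's phase values, are just placeholders):

Function EstimateDelayFFT(phasewv)
    wave phasewv            // one 256-pt dataset of phase values

    // unit-amplitude complex wave exp(i*phase), i.e. the rectangular form
    Make/C/D/FREE/N=(numpnts(phasewv)) cw
    CopyScales phasewv, cw
    cw = cmplx(cos(phasewv[p]), sin(phasewv[p]))

    // magnitude of the FFT; the position of the peak is the delay estimate
    FFT/OUT=3/DEST=mag cw
    WaveStats/Q mag
    return V_maxloc         // delay in the x units of the FFT output
                            // (negative delays would show up wrapped in the upper half)
End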
fabrizio wrote:

This obviously takes some time (~5 minutes/1e6 datasets) and I would like to speed it up.


CurveFit should be plenty fast for doing something like this, especially since fitting a line doesn't require iteration.

The first step in optimizing CurveFit is to suppress updates and particularly the progress window. To do this, make sure that you are running a version of Igor >= 6.21, and add /W=2 to your command before the fit type:
CurveFit/NTHR=0/Q /W=2 line  pt /W=weightwv /I=0 /M=maskwv
This should give you a pretty serious speedup.

Also, to make things a bit more efficient, avoid the explicit MatrixOP and CopyScales by using subrange syntax (as used in the multithreaded example below).
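For example, the MatrixOP/CopyScales pair and the fit in your loop would collapse to a single command:
CurveFit/NTHR=0/Q/W=2 line phasediff[][i] /W=weightwv /I=0 /M=maskwv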

Next, make sure that you're not hit by this bug, which might have caused slower fitting, but I'm not sure if it even applies when fitting a line. Note that it is very unlikely that your Igor will have this bug unless you download nightly builds or are using 6.30b01.

The next step would be to use multithreading. Here's a solution:
Function RunMyFit(phasediff, weightwv, maskwv, delaysT, sigmaDelaysT)
    wave phasediff, weightwv, maskwv    // input
    wave delaysT, sigmaDelaysT           // output - will be filled by this function.
                                             // You need to provide these waves with the correct number of points!
   
    // make a dummy wave so that we can use MultiThread.
    // Multithread wants a wave assignment so we convert our problem into one. This is a
    // general trick, but can be confusing to novice users.
    Make /D/FREE /N=(DimSize(phasediff, 1)) W_Dummy
    MultiThread W_Dummy = MyFitWorker(phasediff, weightwv, maskwv, delaysT, sigmaDelaysT, p)
   
End

ThreadSafe Function MyFitWorker(phasediff, weightwv, maskwv, delaysT, sigmaDelaysT, index)
    wave phasediff, weightwv, maskwv    // input
    wave delaysT, sigmaDelaysT           // output - will be filled by this function
    variable index                      // the particular fit that we will be performing in this call to this function
    CurveFit /Q/W=2 line phasediff[][index] /W=weightwv /I=0 /M=maskwv  // note use of subrange syntax here
    wave W_Coef
    delaysT[index] = W_Coef[1] / (2 * pi)       // store results
    sigmaDelaysT[index] = V_chisq / V_npnts
   
    return 0
End


Note that the multithreaded version is more complex and difficult to understand for beginning or even intermediate programmers. Note also that I didn't test the code. To use this function, replace the entire for loop in your example with RunMyFit(phasediff, weightwv, maskwv, delaysT, sigmaDelaysT).
Nice explanation from 741!
What is the default value of the /NTHR flag?
If it runs multithreaded by default, one could try turning that off explicitly:
CurveFit .. /NTHR=1 ..

Splitting a single line fit (which is just a linear regression) across multiple cores should not gain much, at least for 256 points. The multithreading is better applied across the millions of datasets.

You can also try the additional flags /N/M=0 and see if that makes it even faster.
Applying the point masking to all the data beforehand and then fitting without a mask wave might be worthwhile too; a sketch of that follows below.
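One way to read the pre-masking idea (untested, and note it overwrites phasediff, so work on a copy if you still need the raw data): assuming maskwv holds 1 for points to keep and 0 for points to exclude, set the masked-out points to NaN once, since CurveFit skips NaN data points, and then drop the /M flag from the fit.

phasediff = maskwv[p] ? phasediff[p][q] : NaN   // blank the masked-out rows in every column
// ... then inside the loop:
// CurveFit/NTHR=0/Q/W=2 line pt /W=weightwv /I=0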
As 741 speculated, a line fit uses totally different code from an iterated nonlinear fit. It ignores the /NTHR flag and it is never threaded. The /W=2 flag is not needed because line fits don't iterate and don't use the curve fit progress window. So really your only hope of speed-up is threading.

You might also look at the operation StatsLinearRegression. It will do multiple simultaneous linear regressions (I don't know about threading) and might do it faster than running the CurveFit overhead for every fit. It does lots more than you need, but does what you want on collections of waves. Oh, wait- I see you're using a mask wave and a weight wave. I guess you really do need CurveFit.

John Weeks
WaveMetrics, Inc.
support@wavemetrics.com
You could run multiple instances of Igor and let each instance work on a subset of the entire batch.
Thanks a lot for all those interesting comments!

I played around with simple linear curve fitting (following Numerical Recipes in xxx). To get the same results as curvefit ... /w=... /m=..., I multiply the weights (1/sigma^2) by the mask. This speeds up my code immensely, as I replace ~100'000 256-point curve fits in a for loop with a handful of MatrixOPs on a 100'000x256 matrix.
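For reference, the MatrixOP version below implements the textbook weighted least-squares slope, with w the per-point weights (here the SNR multiplied by the mask) and the sums running over the points of one dataset:

    S   = sum(w)
    Sx  = sum(w*x),    Sy  = sum(w*y)
    Sxx = sum(w*x^2),  Sxy = sum(w*x*y)
    slope = (S*Sxy - Sx*Sy) / (S*Sxx - Sx^2)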


However, there is a numerical discrepancy between the slopes obtained with the matrixop and curvefit methods. It is small (<10%) but still larger than I would like, especially since the mean of the differences over a large dataset is non-zero. Let me know if it would be helpful for me to post an example experiment (not with hundreds of millions of points ;) demonstrating this. Or maybe this is to be expected...?

This is the curvefitting code:
function getslopes1(dat, [SNR])
    wave dat, SNR
    if (paramisdefault(SNR))
        duplicate/o dat,SNR; SNR=100
    endif
    redimension/d SNR, dat
    variable i
    make/o/d/n=(dimsize(dat,1)) sl1=nan
    make /o/d/n=2 w_coef
    variable V_FitOptions = 4+16    // fit option bits (4 suppresses the auto fit_ wave)
    matrixop/o maskmat=greater(snr,12)                          // mask: 1 where snr>12, 0 otherwise
    matrixop/o filt=sumcols(maskmat)^t; redimension/n=-1 filt   // number of unmasked points per column
    for (i=0; i < DimSize(dat, 1) ; i +=1)
        if (filt[i] >20) // only fit if more than 20 pt have snr>12
            CurveFit/NTHR=0/Q/w=2/N/M=0 line  dat[][i]  /w=snr[][i] /m=maskmat[][i]
            sl1[i] = w_coef[1]
        endif
    endfor
end


And the matrixop code:
function getslopes2(dat, [SNR])
    wave dat, SNR
    if (paramisdefault(SNR))
        duplicate/o dat,SNR; SNR=100
    endif
    matrixop/o weight = greater(SNR, 12)*SNR // only consider data with snr>12
    redimension/d weight, dat
    matrixop/o filt=sumcols(greater(SNR, 12))^t; redimension/n=-1/d filt
    filt = filt[p]>20 ? 1 : nan // only consider fit if more than 20 pt have snr>12
    duplicate/o/r=[][0] dat, x1; copyscales dat, x1; x1=x   // x values shared by every column
    matrixop/o datx=colrepeat(x1, numcols(dat))              // x replicated to the full matrix size
    // weighted sums, one entry per column
    matrixop/o SS=sumcols(weight)
    matrixop/o Sxy=sumcols(dat*datx*weight)
    matrixop/o Sy=sumcols(dat*weight)
    matrixop/o Sx=sumcols(datx*weight)
    matrixop/o Sxx=sumcols(powr(datx,2)*weight)
    // slope of each column: (SS*Sxy - Sx*Sy)/(SS*Sxx - Sx^2)
    matrixop/o delta=SS*Sxx-powr(Sx,2)
    matrixop/o sl2=((SS*Sxy-Sx*Sy)/Delta)^t
    redimension/n=-1/d sl2
    sl2*=filt
end


The resulting slopes for a given subset of data and the difference between both are displayed in the attached images.


I'm using Igor 64 6.3.0.1 build 14102 on Windows 7.

Thanks again,
Fabrizio