How to make a batch fit program run on multiple cores?

Hello,
Below I have pasted code for a batch fitting program. It fits a bunch of waves to a Gaussian. I am analyzing large data sets, 500+ waves each over 200 pts, and am running into the program taking like 10mins to complete. I am using an intel dual core i5 processor and I am wondering if there is anyway to work the code to run on multiple cores so that it finishes faster? Perhaps some way to split the waves in half and run each set on a different core in parallel?


function BatchGauss(Nmax)
variable Nmax
variable ii
wave W_coef,W_sigma
 
make /N=(Nmax) W_A
make /N=(Nmax) W_Aerr
make /N=(Nmax) W_width
make /N=(Nmax) W_widtherr
make /N=(Nmax) W_y0
make /N=(Nmax) W_y0err
 
 
    for(ii=0;ii<Nmax;ii+=1)
   
    string loopwave="wave"+num2str(ii)

    CurveFit/M=2/W=0 gauss, $loopwave/D
   
    W_A[ii]=W_coef[1]
    W_Aerr[ii]=W_sigma[1]
    W_width[ii]=W_coef[0]
    W_widtherr[ii]=W_sigma[0]
    W_y0[ii]=W_coef[3]
    W_y0err[ii]=W_sigma[3]
   
    endfor

end


Did you try how much improvement you get using the /NTHR = 0 flag on CurveFit?
Hi,

I guess there is not so much profit from doing each curve fit in parallel with only 200 points.
The fact that you have 500+ fits to perform suggest that you want to do multiple curvefits in parallel, each on its own core.

So you might want to have a look at
DisplayHelpTopic "ThreadSafe Functions and Multitasking"

which is a bit involved.

For your current appraoch I would also add the flags "/N/Q" to CurveFit.
Have you checked that single precision coefficients are precious enough?
If not you might want to add "/D" to the Make calls to use double precision.

Hope that helps,
Thomas
Yeah, especially with a built-in fit function you won't get any help from multiple cores on a single fit. In fact, the overhead of threading would probably make it take longer.

The solution would be to do multiple fits in parallel. We ship an example experiment that does just that. You can probably adapt it to your situation fairly easily. To view it, select File->Example Experiments->Curve Fitting->MultipleFitsInThreads.

John Weeks
WaveMetrics, Inc.
support@wavemetrics.com
I used the recommendations to use multiple threads and I've been trying to get it to work, but keep getting errors about the thread index/ID.

Exact error is: "while executing a user function, the following error occurred: Invalid Thread Group ID or index"

Here is the code

ThreadSafe Function lingaussFit(coefwave,data)
wave data, coefwave

    FuncFit/NTHR=0 linguass coefwave  data /D

End

function BatchlinGauss(Nmax)
    variable Nmax
    variable ii,jj
    wave W_coef1,W_sigma1
    wave W_coef2,W_sigma2
    Variable Nmid=Nmax/2   //split total number in half to run on separate cores
   
    Variable tgID= ThreadGroupCreate(2)
 
     //Create batch coeficent waves
    make /N=(Nmax) W_A
    make /N=(Nmax) W_Aerr
    make /N=(Nmax) W_width
    make /N=(Nmax) W_widtherr
    make /N=(Nmax) W_y0
    make /N=(Nmax) W_y0err
    make /N=(Nmax) W_B
    make /N=(Nmax) W_Berr
    make /N=(Nmax) W_c
    make /N=(Nmax) W_cerr
 
    for(ii=0;ii<Nmid;ii+=1)
   
    jj=ii+Nmid
   
    string loopwave1="wave"+num2str(ii)
    string loopwave2="wave"+num2str(jj)

    //Parallel on different cores
    ThreadStart tgID, 0, lingaussFit(W_coef1,$loopwave1)
    ThreadStart tgID, 1, lingaussFit(W_coef2,$loopwave2)
   
    //Write results from first thread
    W_A[ii]=W_coef1[0]
    W_Aerr[ii]=W_sigma1[0]
    W_width[ii]=W_coef1[3]
    W_widtherr[ii]=W_sigma1[3]
    W_y0[ii]=W_coef1[4]
    W_y0err[ii]=W_sigma1[4]
    W_B[ii]=W_coef1[1]
    W_Berr[ii]=W_sigma1[1]
    W_c[ii]=W_coef1[2]
    W_cerr[ii]=W_sigma1[2]
   
    //write results from second thread.
    W_A[jj]=W_coef2[0]
    W_Aerr[jj]=W_sigma2[0]
    W_width[jj]=W_coef2[3]
    W_widtherr[jj]=W_sigma2[3]
    W_y0[jj]=W_coef2[4]
    W_y0err[jj]=W_sigma2[4]
    W_B[jj]=W_coef2[1]
    W_Berr[jj]=W_sigma2[1]
    W_c[jj]=W_coef2[2]
    W_cerr[jj]=W_sigma2[2]
   
    endfor

end



Function linguass(w,x) : FitFunc
    Wave w
    Variable x

    //CurveFitDialog/ These comments were created by the Curve Fitting dialog. Altering them will
    //CurveFitDialog/ make the function less convenient to work with in the Curve Fitting dialog.
    //CurveFitDialog/ Equation:
    //CurveFitDialog/ f(x) = a*x+b*exp(-(x-c)^2/d^2)+y_0
    //CurveFitDialog/ End of Equation
    //CurveFitDialog/ Independent Variables 1
    //CurveFitDialog/ x
    //CurveFitDialog/ Coefficients 5
    //CurveFitDialog/ w[0] = a
    //CurveFitDialog/ w[1] = b
    //CurveFitDialog/ w[2] = c
    //CurveFitDialog/ w[3] = d
    //CurveFitDialog/ w[4] = y_0

    return w[0]*x+w[1]*exp(-(x-w[2])^2/w[3]^2)+w[4]
End

<pre><code class="language-igor">
I think you need to call ThreadGroupWait at the end of the for loop, just before the endfor. Otherwise you will be trying to start a thread while it is still running from the previous iteration of the loop.

Something like this:
    // Wait for all threads to finish before starting another batch of threads
    do
        Variable threadGroupStatus = ThreadGroupWait(tgID,100)
    while (threadGroupStatus != 0)
Here's how I would do this (note - untested)
Function MultiThreadIt(Nmax)
    variable nMax
 
    Make /N=(Nmax) /D /O W_A, W_Aerr, W_width, W_widtherr, W_y0, w_y0err
 
    Make /N=(Nmax) /FREE W_Dummy
    // W_Dummy exists only so the MultiThread keyword can be used.
    // since MultiThread expects a wave assignment, we give it one,
    // even though its contents will be discarded
 
 
    // pack the data waves into a 'wave wave'. I use a wave wave simply
    // because I like it, but it's entirely optional
    Make /WAVE /FREE W_DataWaves
    variable i
    for (i = 0; i < Nmax; i+=1)
        W_DataWaves[i] = $("wave" + num2istr(i))
    endfor
 
    MultiThread W_Dummy = RunAFit(W_DataWaves, W_A, W_Aerr, W_width, W_widtherr, W_y0, w_y0err, p)
End
 
 
ThreadSafe Function RunAFit(dataWaves, W_A, W_Aerr, W_width, W_widtherr, W_y0, w_y0err, index)
    wave /WAVE dataWaves
    wave W_A, W_Aerr, W_width, W_widtherr, W_y0, w_y0err
    variable index
 
    wave thisData = dataWaves[index]
    CurveFit/M=2/W=0 gauss, thisData /D
 
    wave W_coef
    wave W_sigma
 
    W_A[index]=W_coef[1]
    W_Aerr[index]=W_sigma[1]
    W_width[index]=W_coef[0]
    W_widtherr[index]=W_sigma[0]
    W_y0[index]=W_coef[3]
    W_y0err[index]=W_sigma[3]
 
    return 0    // make the multithread wave assignment happy
End
Thank you all for your comments. I have gotten the program to work with the threads fairly well, but now am having an issue with the error wave.

I have a different coefficient wave for each thread but the fits try to write to the same error wave and I keep getting problems from that.

Is there a way to specify a different error wave when fitting? I know I can specify the coefficient wave, but the error wave I cant figure out.
You specify the weighting wave using the /W flag. From the CurveFit help:
Quote:
/W=wghtwaveName: wghtwaveName contains weighting values applied during the fit, and must have the same length as waveName. These weighting values can be either the reciprocal of the standard errors, or the standard errors. See the /I parameter above for details.


But I suspect by "error wave" you mean "destination wave". For that you need /D=destwaveName.

Or perhaps you mean "residual wave". For that you need /R=residwaveName.
I expect he means the W_sigma wave. There is no option to write that information to a different wave. Under ordinary circumstances, I would tell you to copy W_sigma to a new wave after the fit is done in order to preserve it. I confess that I haven't thought about implications for curve fitting.

I *think* that if you do curve fits in separate threads using the (cumbersome) thread group coding, then the curve fit will take place in its own virtual data folder, and W_sigma will be distinct for each thread. I'm not sure how this applies to using the Multithread keyword, though. I would try adding a sigma wave to the input parameters of 741's RunAFit function, and inside that function copy W_sigma to that wave. To access W_sigma, you must add *after the call to CurveFit* a WAVE statement to connect with W_sigma.

John Weeks
WaveMetrics, Inc.
support@wavemetrics.com
Quote:
I would try adding a sigma wave to the input parameters of 741's RunAFit function, and inside that function copy W_sigma to that wave.


It is permitted to write to a thread worker input function's parameter wave but you can not redimension it.

It's up to the programmer to make sure that each thread gets a separate wave to write to.

I don't see how you can do this with Multithread.
Just a note that I corrected some typos in my code sample above. It now actually compiles.

hrodstein wrote:
I don't see how you can do this with Multithread.


For W_sigma RunAFit() would either have to passed a 2D wave so that each iteration can access its own row or column in the wave, or it could take a wave wave. I'd probably use a 2D wave.
As a quick 'n' dirty workaround, I simply load multiple instances of Igor and ran each fit in the batch in its own instance. This works quite well unless you have a larger batch than you have cores, at that point, it's advisable to use the multithreader project here: http://www.igorexchange.com/project/multithreader , which will run your large batch a few at a time in parallel.
So using all the suggestions I got code that is "working". What I mean is that when I run the code using a small sample set(10-50 waves) the code executes perfectly with no problems. Every time I try to run it on the larger set(500+) that I need it for igor closes to desktop with no warning. It doesn't freeze or that, it just closes. If it froze up I would just assume I don't have enough free memory, but it doesn't do that, just closes to the desktop.

I am running the latest version of igor on a Macbook pro OS X 10.7, 2.8ghz i7, with 4GB of ram.
sleepingawake86 wrote:
So using all the suggestions I got code that is "working". What I mean is that when I run the code using a small sample set(10-50 waves) the code executes perfectly with no problems. Every time I try to run it on the larger set(500+) that I need it for igor closes to desktop with no warning. It doesn't freeze or that, it just closes. If it froze up I would just assume I don't have enough free memory, but it doesn't do that, just closes to the desktop.

If you can reproduce the crash, please email support@wavemetrics.com with detailed instructions we need to reproduce the crash and any Igor experiments, procedures, etc. we need to follow your directions. If you can't reproduce the crash, please at least send us the Macintosh crash log. This might give us some clues as to what is happening.

Given that you are using Igor's multithreading features, you must follow the "rules" with regards to what you do with data in threads, etc. Without at least seeing your code it is impossible to help you solve your problem.

Also, you say you are using the latest version of Igor. If you are not using the nightly build available at http://www.wavemetrics.net/Downloads/latest/ then you are not actually using the latest version. If necessary please download the latest build and check that you still get a crash. We have fixed many bugs since the latest official release of Igor (6.22A).