When it's worth threading in a fit function

I've done some investigation of threading in one of my fit functions, which can vary in its level of complexity. Threading has overhead from creating and destroying threads, which means that if the fit function isn't complex, or you don't have a lot of points, you can lose a lot of time by threading. Fig1 is a graph showing how long a single call to an all-at-once fit function (from Abeles.xop) takes, as a function of the number of points in the dataset. The dashed lines are unthreaded (LNx) calculations and the solid lines are threaded (LNt). Please note that the threading is done _within_ the XOP using pthreads, but the approach should be applicable to Igor threads.
It's easy to see that for a small number of points (e.g. <100) it's not worth threading, because the constant overhead of creating the threads dominates. For many points you get a speedup once you start threading.

The N different series in that graph are for models with an increasing level of complexity in the fit function. As you increase the complexity, the threshold number of points at which threading becomes worthwhile decreases, i.e. if the fit function spends a lot of time calculating each point, then threading is more attractive. I plotted this threshold number of points against complexity in fig2. The threshold number of points is inversely proportional to the level of complexity. I use this graph in my XOP to predict when I should thread the fit function calculation - if the combination of number of points and number of layers lies below the fitted curve I use a single thread, otherwise I use as many threads as the computer has processors.
[Attachments: fig1.png, fig2.png]
Hi, Andy-

Do you always use two threads? How about machines with more than two processors (like my dual quad-core Mac Pro :) ?

Just thought I might complicate your life a bit!

John Weeks
WaveMetrics, Inc.
support@wavemetrics.com
andyfaff wrote:
I've done some investigation of threading in one of my fit functions which can vary in its level of complexity. ...


Wow! Thanks.

What is the metric of "complexity" (layers???) if I might be so naive? And could you provide the formula for the test that you do, presumably in an if-then-else type statement of the form ...

if (layers*points > 1)
    // multithread here
else
    // single thread here
end


--
J. J. Weimer
Chemistry / Chemical & Materials Engineering, UAHuntsville
jjweimer wrote:

Wow! Thanks.
What is the metric of "complexity" (layers???) if I might be so naive? And could you provide the formula for the test that you do, presumably in an if-then-else type statement of the form ...


The fit function in question is the AbelesCalcAll function at http://svn.igorexchange.com/viewvc/packages/abeles/trunk/src/RefCalcula…. The level of complexity is related to how many times a loop is called (essentially the number of layers in the model). In that loop one has complex arithmetic, complex exponentiation, complex matrix multiplication, etc. It's certainly a lot slower than all the built-in fit functions.
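
To give a feel for the per-layer work (this is only an illustration, not code from RefCalculator - the real loop builds each layer's matrix using complex exponentials), the expensive part is essentially a running 2x2 complex matrix product per point:

    /* Illustration only: per point, the loop over layers accumulates a product of
       2x2 complex matrices, which is why the cost of a point grows with the number
       of layers. */
    #include <complex.h>

    typedef struct { double complex m[2][2]; } cmat2;

    static cmat2 cmul(cmat2 a, cmat2 b)         /* 2x2 complex matrix product */
    {
        cmat2 r;
        for (int i = 0; i < 2; i++)
            for (int j = 0; j < 2; j++)
                r.m[i][j] = a.m[i][0] * b.m[0][j] + a.m[i][1] * b.m[1][j];
        return r;
    }

    static cmat2 layerProduct(const cmat2 *layerM, int nlayers)
    {
        cmat2 total = { { {1, 0}, {0, 1} } };   /* identity */
        for (int i = 0; i < nlayers; i++)
            total = cmul(total, layerM[i]);     /* one multiply per layer, per point */
        return total;
    }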

The number of CPUs is determined at startup and is a static variable. I create NUM_CPUS-1 pthreads and split the points equally between those threads and the main thread. If you have NUM_CPUS = 2 then you only need to create 1 extra thread, because the main thread takes a portion of the points.
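
In outline the splitting looks something like the following. This is a simplified sketch rather than the actual XOP code - calcPoint() stands in for the real per-point reflectivity calculation, and error checking is omitted:

    #include <pthread.h>
    #include <math.h>

    static int NUM_CPUS = 2;               /* determined once at startup in the real XOP */

    typedef struct {
        double *y;                         /* output values  */
        const double *x;                   /* input x values */
        long start, stop;                  /* this chunk covers [start, stop) */
    } chunk_t;

    static double calcPoint(double x)      /* stand-in for the real calculation */
    {
        return exp(-x * x);
    }

    static void *worker(void *arg)
    {
        chunk_t *c = (chunk_t *) arg;
        for (long i = c->start; i < c->stop; i++)
            c->y[i] = calcPoint(c->x[i]);
        return NULL;
    }

    /* create nthreads-1 pthreads and let the main thread take the last chunk,
       so with NUM_CPUS = 2 only one extra thread is ever spawned. */
    static void calcAllThreaded(double *y, const double *x, long npoints, int nthreads)
    {
        pthread_t threads[nthreads];       /* only the first nthreads-1 entries are used */
        chunk_t chunks[nthreads];
        long pointsPerThread = npoints / nthreads;

        for (int t = 0; t < nthreads; t++) {
            chunks[t].y = y;
            chunks[t].x = x;
            chunks[t].start = t * pointsPerThread;
            chunks[t].stop = (t == nthreads - 1) ? npoints : (t + 1) * pointsPerThread;
            if (t < nthreads - 1)
                pthread_create(&threads[t], NULL, worker, &chunks[t]);
        }
        worker(&chunks[nthreads - 1]);     /* main thread takes its portion too */

        for (int t = 0; t < nthreads - 1; t++)
            pthread_join(threads[t], NULL);
    }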

To answer John's question: I don't have a quad- or 8-core processor, so I haven't tried it in IGOR. But the XOP is designed to make as many threads as there are processors.

To test single-threaded vs N-threaded operation I simply fix NUM_CPUS = 1 and recompile the XOP. I then have a wrapper function that uses StartMSTimer to do the timing, and I time how long different numbers of points take to calculate, for differing levels of complexity.
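
For what it's worth, a rough analogue of that timing test written directly in C (I actually do the timing from the Igor wrapper with StartMSTimer; calcAllThreaded() here refers to the sketch above) would be:

    #include <stdio.h>
    #include <time.h>

    static double elapsedMs(struct timespec t0, struct timespec t1)
    {
        return (t1.tv_sec - t0.tv_sec) * 1e3 + (t1.tv_nsec - t0.tv_nsec) * 1e-6;
    }

    /* time one dataset size, unthreaded and threaded, and print both */
    static void timeOneSize(double *y, const double *x, long npoints)
    {
        struct timespec t0, t1;
        double single, multi;

        clock_gettime(CLOCK_MONOTONIC, &t0);
        calcAllThreaded(y, x, npoints, 1);          /* equivalent to fixing NUM_CPUS = 1 */
        clock_gettime(CLOCK_MONOTONIC, &t1);
        single = elapsedMs(t0, t1);

        clock_gettime(CLOCK_MONOTONIC, &t0);
        calcAllThreaded(y, x, npoints, NUM_CPUS);   /* fully threaded */
        clock_gettime(CLOCK_MONOTONIC, &t1);
        multi = elapsedMs(t0, t1);

        printf("%ld points: %.3f ms single, %.3f ms threaded\n", npoints, single, multi);
    }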

To see if it's worth threading I use the inverse proportionality in fig2, as follows.
    // this relationship was worked out for a dual-core machine.
    // I worked out how long it took for a certain number of points and a certain number of layers,
    // then I calculated the crossover for when it was worth threading, rather than not,
    // i.e. for a given number of layers, how many points were required for multithreading to be worthwhile.
    // I plotted the number of points (y) vs the number of layers (x), giving the following relationship.
    isItWorthThreading = 3.382 + 641.0 * pow(coefP[0], -0.73547);
    if ((float) npoints < isItWorthThreading)
        threadsToCreate = 1;
    else
        threadsToCreate = NUM_CPUS;


This gives me optimal behaviour, i.e. I swap from single threading to multithreading at the crossover point, meaning calculation time always increases linearly with the number of points.
andyfaff wrote:
To answer John's question: I don't have a quad- or 8-core processor, so I haven't tried it in IGOR. But the XOP is designed to make as many threads as there are processors.


If you can package it up appropriately, I would be willing to run it on my machine. One wrinkle - my dual quad-core processor has "hyperthreading", which should be renamed "hypethreading". At least on a Macintosh it reports 16 processors, only half of which should really be used for something like this. I haven't found a way to ask the system how many real processors the machine has.

I think the crossover point might depend on the number of processors.

John Weeks
WaveMetrics, Inc.
support@wavemetrics.com