When it's worth threading in a fit function

I've done some investigation of threading in one of my fit functions, which can vary in its level of complexity. Threading has overhead from creating and destroying threads, which means that if the fit function isn't complex, or you don't have a lot of points, you can lose a lot of time by threading. Fig1 is a graph showing how long a single call to an all-at-once fit function (from Abeles.xop) takes, as a function of the number of points in the dataset. The dashed lines are unthreaded (LNx) calculations and the solid lines are threaded (LNt). Please note that the threading is done _within_ the XOP using pthreads, but the approach should be applicable to Igor threads.
It's easy to see that for a small number of points (e.g. <100) it's not worth threading, because the constant overhead of creating the threads dominates. For many points you get a speedup once you start threading.

The N different series in that graph are for models with an increasing level of complexity in the fit function. As you increase the complexity, the threshold number of points at which threading becomes worthwhile decreases, i.e. if the fit function spends a lot of time calculating each point, then threading is more attractive. I plotted this threshold number of points against complexity in fig2. The threshold number of points is inversely proportional to the level of complexity. I use this graph in my XOP to predict when I should thread the fit function calculation - if the combination of number of points and number of layers lies below the fitted curve I use a single thread, otherwise I use as many threads as the computer has processors.
[Attachments: fig1.png, fig2.png]
Hi, Andy-

Do you always use two threads? How about machines with more than two processors (like my dual quad-core Mac Pro :) ?

Just thought I might complicate your life a bit!

John Weeks
WaveMetrics, Inc.
support@wavemetrics.com
andyfaff wrote:
I've done some investigation of threading in one of my fit functions which can vary in its level of complexity. ...


Wow! Thanks.

What is the metric of "complexity" (layers???) if I might be so naive? And could you provide the formula for the test that you do, presumably in an if-then-else type statement of the form ...

if (layers*points > 1)
    // multithread here
else
    // single thread here
end


--
J. J. Weimer
Chemistry / Chemical & Materials Engineering, UAHuntsville
jjweimer wrote:

Wow! Thanks.
What is the metric of "complexity" (layers???) if I might be so naive? And could you provide the formula for the test that you do, presumably in an if-then-else type statement of the form ...


The fit function in question is the AbelesCalcAll function at http://svn.igorexchange.com/viewvc/packages/abeles/trunk/src/RefCalcula…. The level of complexity is related to how many times a loop is called (essentially the number of layers in the model). In that loop one has complex arithmetic, complex exponentiation, complex matrix multiplication, etc. It's certainly a lot slower than all the built-in fit functions.
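
To give a feel for the per-layer work (this is only an illustration, not code from RefCalculator - the real loop builds each layer's matrix using complex exponentials), the expensive part is essentially a running 2x2 complex matrix product per point:

    /* Illustration only: per point, the loop over layers accumulates a product of
       2x2 complex matrices, which is why the cost of a point grows with the number
       of layers. */
    #include <complex.h>

    typedef struct { double complex m[2][2]; } cmat2;

    static cmat2 cmul(cmat2 a, cmat2 b)         /* 2x2 complex matrix product */
    {
        cmat2 r;
        for (int i = 0; i < 2; i++)
            for (int j = 0; j < 2; j++)
                r.m[i][j] = a.m[i][0] * b.m[0][j] + a.m[i][1] * b.m[1][j];
        return r;
    }

    static cmat2 layerProduct(const cmat2 *layerM, int nlayers)
    {
        cmat2 total = { { {1, 0}, {0, 1} } };   /* identity */
        for (int i = 0; i < nlayers; i++)
            total = cmul(total, layerM[i]);     /* one multiply per layer, per point */
        return total;
    }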

The number of CPUs is determined at startup and is a static variable. I create NUM_CPUS-1 pthreads and split the points equally between those threads and the main thread. If you have NUM_CPUS = 2 then you only need to create 1 extra thread, because the main thread takes a portion of the points.
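
In outline the splitting looks something like the following. This is a simplified sketch rather than the actual XOP code - calcPoint() stands in for the real per-point reflectivity calculation, and error checking is omitted:

    #include <pthread.h>
    #include <math.h>

    static int NUM_CPUS = 2;               /* determined once at startup in the real XOP */

    typedef struct {
        double *y;                         /* output values  */
        const double *x;                   /* input x values */
        long start, stop;                  /* this chunk covers [start, stop) */
    } chunk_t;

    static double calcPoint(double x)      /* stand-in for the real calculation */
    {
        return exp(-x * x);
    }

    static void *worker(void *arg)
    {
        chunk_t *c = (chunk_t *) arg;
        for (long i = c->start; i < c->stop; i++)
            c->y[i] = calcPoint(c->x[i]);
        return NULL;
    }

    /* create nthreads-1 pthreads and let the main thread take the last chunk,
       so with NUM_CPUS = 2 only one extra thread is ever spawned. */
    static void calcAllThreaded(double *y, const double *x, long npoints, int nthreads)
    {
        pthread_t threads[nthreads];       /* only the first nthreads-1 entries are used */
        chunk_t chunks[nthreads];
        long pointsPerThread = npoints / nthreads;

        for (int t = 0; t < nthreads; t++) {
            chunks[t].y = y;
            chunks[t].x = x;
            chunks[t].start = t * pointsPerThread;
            chunks[t].stop = (t == nthreads - 1) ? npoints : (t + 1) * pointsPerThread;
            if (t < nthreads - 1)
                pthread_create(&threads[t], NULL, worker, &chunks[t]);
        }
        worker(&chunks[nthreads - 1]);     /* main thread takes its portion too */

        for (int t = 0; t < nthreads - 1; t++)
            pthread_join(threads[t], NULL);
    }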

To answer John's question: I don't have a quad- or 8-core processor, so I haven't tried it in IGOR. But the XOP is designed to make as many threads as there are processors.

To test single-threaded vs N-threaded operation I simply fix NUM_CPUS = 1 and recompile the XOP. I then have a wrapper function that uses StartMSTimer to do the timing, and I time how long different numbers of points take to calculate, for differing levels of complexity.
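
For what it's worth, a rough analogue of that timing test written directly in C (I actually do the timing from the Igor wrapper with StartMSTimer; calcAllThreaded() here refers to the sketch above) would be:

    #include <stdio.h>
    #include <time.h>

    static double elapsedMs(struct timespec t0, struct timespec t1)
    {
        return (t1.tv_sec - t0.tv_sec) * 1e3 + (t1.tv_nsec - t0.tv_nsec) * 1e-6;
    }

    /* time one dataset size, unthreaded and threaded, and print both */
    static void timeOneSize(double *y, const double *x, long npoints)
    {
        struct timespec t0, t1;
        double single, multi;

        clock_gettime(CLOCK_MONOTONIC, &t0);
        calcAllThreaded(y, x, npoints, 1);          /* equivalent to fixing NUM_CPUS = 1 */
        clock_gettime(CLOCK_MONOTONIC, &t1);
        single = elapsedMs(t0, t1);

        clock_gettime(CLOCK_MONOTONIC, &t0);
        calcAllThreaded(y, x, npoints, NUM_CPUS);   /* fully threaded */
        clock_gettime(CLOCK_MONOTONIC, &t1);
        multi = elapsedMs(t0, t1);

        printf("%ld points: %.3f ms single, %.3f ms threaded\n", npoints, single, multi);
    }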

To see if it's worth threading I use the inverse proportionality in fig2, as follows.
    // this relationship was worked out for a dual-core machine.
    // I worked out how long it took for a certain number of points and a certain number of layers,
    // then I calculated the crossover for when it was worth threading, rather than not,
    // i.e. for a given number of layers, how many points were required for multithreading to be worthwhile.
    // I plotted the number of points (y) vs the number of layers (x), giving the following relationship.
    isItWorthThreading = 3.382 + 641.0 * pow(coefP[0], -0.73547);
    if ((float) npoints < isItWorthThreading)
        threadsToCreate = 1;
    else
        threadsToCreate = NUM_CPUS;


This gives me optimal behaviour, i.e. I swap from single threading to multithreading at the crossover point, meaning calculation time always increases linearly with the number of points.
andyfaff wrote:
To answer John's question: I don't have a quad- or 8-core processor, so I haven't tried it in IGOR. But the XOP is designed to make as many threads as there are processors.


If you can package it up appropriately, I would be willing to run it on my machine. One wrinkle - my dual quad-core processor has "hyperthreading", which should be renamed "hypethreading". At least on a Macintosh it reports 16 processors, only half of which should really be used for something like this. I haven't found a way to ask the system how many real processors the machine has.

I think the crossover point might depend on the number of processors.

John Weeks
WaveMetrics, Inc.
support@wavemetrics.com