About parallel processing

Hi,

I'm trying to use multi-tasking to speed up one of my Igor programs, which uses IntegrateODE to solve stiff problems. Normally (single-threaded) it works fine but takes rather a long time, probably because the "derivFunc" (required by IntegrateODE) needs to be called many times (e.g. thousands or more).

So I wonder if I could use multi-tasking to speed up the "derivFunc". My original (single-thread) derivFunc looks like this:
For (i=0; i<NumberOfSpecies; i+=1)
      GetDerivitives(i, kr, Conc, YDOT)
Endfor
Return 0

GetDerivitives(i, kr, Conc, YDOT) basically fills "YDOT[i]" (YDOT is the wave that carries the derivatives): you give it an i, and it fills in YDOT[i]. This works pretty well.

I'm not an expert in programming... To use multi-tasking, so far I can only follow the examples provided in the Igor manual. So the multi-tasking version of the derivFunc becomes:
  Variable i, j, nthreads = ThreadProcessorCount
  Variable mt = ThreadGroupCreate(nthreads)
  For (i=0; i<NumberOfSpecies; )
        For (j=0; j<nthreads; j+=1)
              ThreadStart mt, j, GetDerivitives(i, kr, Conc, YDOT)
              i = i + 1
              If (i>=NumberOfSpecies)
                 Break
              Endif
        Endfor
        Do
          Variable tgs = ThreadGroupWait(mt, 100)
        While (tgs!=0)
  Endfor
  Variable dummy= ThreadGroupRelease(mt)
  Return 0


Well, it doesn't work this way. Igor returns " ** a user function gave error: Invalid Thread Group ID or index." And it seems that the YDOT wave is not filled with anything, so GetDerivitives(i, kr, Conc, YDOT) has probably never been called successfully.

So, any suggestions for my problem here? Did I make a stupid mistake? Or can multi-tasking not be used in this manner?

Thanks in advance!

Best regards
WSY
I don't get that error. Here is what I tried in Igor Pro 6.22A:

ThreadSafe Function GetDerivitives(i, kr, Conc, YDOT)
    Variable i, kr, Conc, ydot
End

Function Test()
    Variable kr, Conc, ydot
   
    Variable NumberOfSpecies = 10

    Variable i, j, nthreads = ThreadProcessorCount
    Variable mt = ThreadGroupCreate(nthreads)
    For (i=0; i<NumberOfSpecies; )
        For (j=0; j<nthreads; j+=1)
            ThreadStart mt, j, GetDerivitives(i, kr, Conc, YDOT)
            i = i + 1
            If (i>=NumberOfSpecies)
                Break
            Endif
        Endfor
        Do
            Variable tgs = ThreadGroupWait(mt, 100)
        While (tgs!=0)
    Endfor
    Variable dummy= ThreadGroupRelease(mt)
    Return 0
End


If you can't find the problem, post a complete example that generates the error.
IntegrateODE is not threadsafe, and cannot call a threadsafe derivative function. You could perhaps use threaded functions or the MultiThread keyword to fill the ydot wave in your derivative function, but it is unlikely to help. Usually the size of the derivative wave is pretty small; to get any speed-up from threading you usually need thousands of elements in the wave you are filling, and using it on a small wave can actually hurt performance. You don't tell us the value of the variable NumberOfSpecies, so I can't tell how big your ydot wave is.
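
To check the overhead on your own machine, here is a rough sketch that times the same wave assignment with and without MultiThread. The function name CompareAssignTimes and the wave name testWave are made up for illustration, and sin(p/10) is only a stand-in workload, not your derivative code. Try it with something like 44 points and then with a million. A single run on a tiny wave is noisy, so repeat it a few times before drawing conclusions.

```igor
// Illustrative timing comparison; names and workload are assumptions.
Function CompareAssignTimes(npnts)
	Variable npnts

	// Free wave, so nothing in the current data folder is overwritten.
	Make/FREE/D/N=(npnts) testWave

	// StartMSTimer returns -1 if no timer is available.
	Variable timerRef = StartMSTimer
	testWave = sin(p / 10)				// ordinary single-threaded assignment
	Variable plainMicroSec = StopMSTimer(timerRef)

	timerRef = StartMSTimer
	MultiThread testWave = sin(p / 10)	// same assignment, split across threads
	Variable threadedMicroSec = StopMSTimer(timerRef)

	Printf "N=%d: plain %g us, MultiThread %g us\r", npnts, plainMicroSec, threadedMicroSec
End
```

Run it from the command line as CompareAssignTimes(44) and CompareAssignTimes(1e6) and compare the two printed lines.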

IntegrateODE resists threading because each iteration depends on the result of the previous iteration, and each column depends on the results of all the other columns. Threading works best with computations that have few dependencies so that it can be broken into a number of independent computations.

IntegrateODE might benefit if it were threadsafe only if you could run multiple separate integrations in different threads simultaneously. You can do that right now by running multiple instances of Igor and running each integration in a separate instance of the application.

John Weeks
WaveMetrics, Inc.
support@wavemetrics.com
johnweeks wrote:
IntegrateODE is not threadsafe, and cannot call a threadsafe derivative function. You could perhaps use threaded functions or a Multithread keyword to fill the ydot wave in your derivative function...

I'm not calling IntegrateODE in a threadsafe function. Instead, I'm calling a user-defined function (GetDerivitives) from the derivFunc, which is required by IntegrateODE. I added the ThreadSafe keyword to GetDerivitives(i, kr, Conc, YDOT), which hopefully makes it "threadsafe". By the way, all functions used in GetDerivitives(i, kr, Conc, YDOT) are thread-safe: CmpStr, StringFromList, WhichListItem, .... This function also accesses a few global variables and strings, and I don't know if this causes the problem.


johnweeks wrote:
...but it is unlikely to help. Usually the size of the derivative wave is pretty small; to get any speed-up from threading you usually need thousands of elements in the wave you are filling, and using it on a small wave can actually hurt performance. You don't tell us the value of the variable NumberOfSpecies, so I can't tell how big your ydot wave is.

I understand the problem. The size of YDOT in this example is 44, which is indeed not a big number. The thing is, users are usually supposed to provide explicit expressions for the derivatives within the "derivFunc" as required by IntegrateODE. This is a tough job for human beings, especially when the system is complicated (say, a chemical system with thousands of species and reactions). IntegrateODE uses CVODE code, so I assume it can handle stiff and complicated systems. (I use LSODE in Fortran to handle such complicated systems and it works pretty well. LSODE is probably the Fortran version of CVODE, so I assume CVODE should also be okay with such systems.)

But here I played a trick: instead of having the users provide expressions for the derivatives, I wrote a small routine to generate the derivatives. This makes the whole solver much more user-friendly, but it costs some time compared with user-provided derivatives. Since the "derivFunc" will be called hundreds or even thousands of times by IntegrateODE, the overall program runs more than 10 times slower than the one with user-provided derivatives. This is a trade-off. But nowadays PCs are usually equipped with dual-core or quad-core processors, so I assume the performance would improve a little bit if we do this in a multi-tasking manner :)

My thinking is like this: in the "derivFunc", a number of derivatives will be generated, in this case YDOT(i). But the derivatives are relatively independent of each other. So I think it might be possible to compute them in parallel, in a few chunks. For example, YDOT(0-10), YDOT(11-20), YDOT(21-30), ... I think this is technically possible.

Thank you very much Howard and John, I really appreciate your timely replies :)
baroques_solari wrote:

So I wonder if I could use multi-tasking to speed up the "derivFunc". My original (single-thread) derivFunc looks like this:
For (i=0; i<NumberOfSpecies; i+=1)
      GetDerivitives(i, kr, Conc, YDOT)
Endfor
Return 0



You can also try
MultiThread YDOT = GetDerivitives(p, kr, Conc, YDOT)


Which is a bit weird since YDOT is on both sides. So what you'd want to do is modify GetDerivitives() so that it simply returns the value you're interested in, and then use something of the form
MultiThread YDOT = GetDerivitives(p, kr, Conc)


If GetDerivitives() does more than compute a single number, simply turn YDOT into a wave of wave references (see Make/WAVE). I do this all the time and it works fine.

Usually I'll use MultiThread for simplicity, except in special cases or when I want to provide progress updates. Then I'll use the explicit constructs such as ThreadGroupCreate and friends.
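
To make the MultiThread suggestion concrete, here is a minimal sketch. I'm assuming kr and Conc are waves, GetDerivValue is a made-up name for the "returns a single number" version of GetDerivitives(), and the rate expression inside it is only a placeholder (first-order decay), since I don't know the real one. The outer function follows the parameter layout IntegrateODE expects from a derivative function.

```igor
// Hypothetical per-species function: returns dConc[i]/dt as a number
// instead of writing into YDOT. The rate law is a placeholder.
ThreadSafe Function GetDerivValue(i, kr, Conc)
	Variable i
	Wave kr, Conc

	return -kr[i] * Conc[i]
End

// Derivative function in the form IntegrateODE expects:
// parameter wave, independent variable, current values, derivative wave.
Function DerivFunc(kr, tt, Conc, YDOT)
	Wave kr			// parameter wave (assumed to hold rate constants)
	Variable tt		// independent variable, unused in this sketch
	Wave Conc		// current values of the dependent variables
	Wave YDOT		// receives the derivatives

	// p is the point index of YDOT; each point is computed independently.
	MultiThread YDOT = GetDerivValue(p, kr, Conc)
	return 0
End
```

DerivFunc itself is not ThreadSafe; only the function called on the right-hand side of the MultiThread assignment needs to be.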
741 wrote:
You can also try
MultiThread YDOT = GetDerivitives(p, kr, Conc, YDOT)


Which is a bit weird since YDOT is on both sides.

Since we don't have access to the definition of GetDerivitives, we can't tell why YDOT is needed in the call. It is common for derivatives in an ODE system to depend on other derivatives. If that is the case here, then the system cannot be parallelized, because you can't count on the dependencies to be computed in the correct order.
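
As an illustration of the difference, here is a small sketch with made-up waves and a placeholder rate law (none of this is taken from the poster's code). The first assignment reads only inputs that stay fixed, so the order of evaluation doesn't matter. The loop reads a point of YDOT that was written just before, so it only gives a defined result when the points are filled in order.

```igor
// Illustrative only: wave names, sizes, and expressions are assumptions.
Function DependencyDemo()
	Make/FREE/D/N=44 Conc
	Make/FREE/D/N=44 kr
	Make/FREE/D/N=44 YDOT
	Conc = 1
	kr = 0.1

	// Independent: point p reads only kr and Conc, which do not change
	// during the assignment, so it can be split across threads.
	MultiThread YDOT = -kr[p] * Conc[p]

	// Dependent: point i reads point i-1 of the wave being written.
	// This must run serially, in order. Splitting it across threads
	// could read YDOT[i-1] before another thread has written it.
	Variable i
	YDOT[0] = -kr[0] * Conc[0]
	for (i = 1; i < numpnts(YDOT); i += 1)
		YDOT[i] = YDOT[i-1] - kr[i] * Conc[i]
	endfor
End
```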

baroques_solari wrote:
My thinking is like this: in the "derivFunc", a number of derivatives will be generated, in this case YDOT(i). But the derivatives are relatively independent of each other. So I think it might be possible to compute them in parallel, in a few chunks. For example, YDOT(0-10), YDOT(11-20), YDOT(21-30), ... I think this is technically possible.

You've clearly given this quite a bit of thought. As long as the various elements of YDOT don't depend on each other, you can use the MultiThread keyword as suggested by 741. But the presence of YDOT in the inputs to GetDerivitives() suggests a problem that can't be parallelized.
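
If the elements really are independent, the chunked scheme you describe could look roughly like the sketch below, which gives each thread one contiguous range of YDOT. FillChunk and DerivFuncChunked are hypothetical names, kr and Conc are assumed to be waves, and the rate law is a placeholder. It also assumes YDOT has at least one point. Note that it creates and releases a thread group on every call, which is itself overhead on a 44-point wave.

```igor
// Hypothetical worker: fills YDOT[first..last] only, so no two threads
// write the same point. The rate law is a placeholder.
ThreadSafe Function FillChunk(kr, Conc, YDOT, first, last)
	Wave kr, Conc, YDOT
	Variable first, last

	Variable i
	for (i = first; i <= last; i += 1)
		YDOT[i] = -kr[i] * Conc[i]
	endfor
	return 0
End

// Derivative function in the form IntegrateODE expects.
Function DerivFuncChunked(kr, tt, Conc, YDOT)
	Wave kr
	Variable tt		// independent variable, unused in this sketch
	Wave Conc, YDOT

	Variable n = numpnts(YDOT)
	Variable nthreads = min(ThreadProcessorCount, n)
	Variable chunk = ceil(n / nthreads)		// points per thread
	Variable mt = ThreadGroupCreate(nthreads)
	Variable j, first, last, tgs

	for (j = 0; j < nthreads; j += 1)
		first = j * chunk
		// For a trailing thread, last can be less than first;
		// the worker's loop then simply does nothing.
		last = min(first + chunk - 1, n - 1)
		ThreadStart mt, j, FillChunk(kr, Conc, YDOT, first, last)
	endfor

	// Wait until every thread in the group has finished.
	do
		tgs = ThreadGroupWait(mt, 100)
	while (tgs != 0)

	Variable dummy = ThreadGroupRelease(mt)
	return 0
End
```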

John Weeks
WaveMetrics, Inc.
support@wavemetrics.com
Thanks 741 and John, it works the way 741 suggested.

But it seems that the improvement is rather small (e.g., less than 5%). Besides, Igor crashes sometimes (out of memory?), especially when a smaller tolerance is used. I'll let it go... The single-thread version is not totally unbearable :)

Later I'll upload my stuff here as a project. If someone is interested, we could work on this together to make it better. It should be useful for people who are probing chemical mechanisms.
baroques_solari wrote:
Thanks 741 and John, it works the way 741 suggested.

But it seems that the improvement is rather small (e.g., less than 5%). Besides, Igor crashes sometimes (out of memory?), especially when a smaller tolerance is used. I'll let it go... The single-thread version is not totally unbearable :)

Later I'll upload my stuff here as a project. If someone is interested, we could work on this together to make it better. It should be useful for people who are probing chemical mechanisms.


It's hard for us to make more detailed suggestions without knowing the specifics. It's entirely possible that you'll get out-of-memory errors in the multithreaded code simply because you're effectively multiplying the allocations by the number of threads. But I would expect that to happen only if you have a lot of threads, large allocations, or both.

To increase performance I'd suggest starting by profiling the code (http://www.igorexchange.com/project/FuncProfiling). Note that you'll have to use the single-threaded version to do the profiling. Then post a snippet that shows the trouble spot(s) and whatever else we need to know, and we'll see if we can come up with suggestions.
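
If you just want a quick number before setting up the profiler, a crude timing loop around the derivative function is often enough. This is only a sketch: StubDerivFunc stands in for your real derivFunc, the rate law in it is a placeholder, and TimeDerivCalls is a made-up name.

```igor
// Stand-in for the real derivative function (placeholder rate law).
Function StubDerivFunc(kr, tt, Conc, YDOT)
	Wave kr
	Variable tt
	Wave Conc, YDOT

	YDOT = -kr[p] * Conc[p]
	return 0
End

// Calls the derivative function nCalls times and reports the mean cost.
Function TimeDerivCalls(nSpecies, nCalls)
	Variable nSpecies, nCalls

	// Free waves, so nothing in the current data folder is overwritten.
	Make/FREE/D/N=(nSpecies) kr
	Make/FREE/D/N=(nSpecies) Conc
	Make/FREE/D/N=(nSpecies) YDOT
	kr = 0.1
	Conc = 1

	Variable i
	Variable timerRef = StartMSTimer	// returns -1 if no timer is available
	for (i = 0; i < nCalls; i += 1)
		StubDerivFunc(kr, 0, Conc, YDOT)
	endfor
	Variable microSec = StopMSTimer(timerRef)

	Printf "%d calls, %g microseconds per call\r", nCalls, microSec / nCalls
	return microSec / nCalls
End
```

For example, TimeDerivCalls(44, 1000) gives a per-call cost you can compare before and after a change.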
baroques_solari wrote:

Besides, Igor crashes sometimes (out of memory?), especially when a smaller tolerance is used. I'll let it go... The single-thread version is not totally unbearable :)


If you are able to crash Igor reproducibly, please send instructions we can follow to get the crash, along with any experiment and/or procedure files we need to reproduce it, to support@wavemetrics.com. Please also let us know what version of Igor you are using and your OS. If you are using a Mac, please also send us your crash logs.

If you aren't already using the latest version of Igor (6.22A), I recommend you get that by choosing the Help->Updates menu item. After you do that I also recommend that you download the latest nightly build of the executable from http://www.wavemetrics.net/Downloads/latest/. We have fixed several crashes since the last official release.