 
    
    
    
    Multithreading in accessing the same wave
I want to speed up my program, maybe by using threadsafe and multithreading. However, my code with a threadsafe structure only speeds up by 10%. I guess the bottleneck is that the worker needs to wait until the other worker finishes accessing the wave w. Do you have any suggestions on how to enable real multithreading in my situation?
Function Test() make/n=1000/O/D w, tt w = 0 // initialize tt = 1.05^p // create a log spaced t variable t1 = startMSTimer variable i for(i=1; i<100000; i++) worker(w, tt, i) endfor print stopMSTimer(t1) End ThreadSafe Function worker(w, tt, i) wave w, tt variable i w += 1/i^2*exp(-i^2*tt/10) End

 
You are iterating explicitly on wave tt over i and summing implicitly into wave w over p.
A few fixes might increase speed in a minor way.
* feed the function i^2 rather than calculating i^2 inside the function
* feed the function tt/10 rather than calculating tt/10 inside the function
I imagine this might solved by a clever approach using MatrixOP and a matrix. I see a SUM(WAVE) function may also be useful. Finally, perhaps the order of operations should be reversed. Sum the function over i into a single value V(tt[p], i) and fill that sum (implicitly) into w[p].
March 24, 2025 at 09:13 pm - Permalink
Here is a 57% speed gain as JJ predicted:
Maybe using MatrixOp in an intelligent way could speed this up even more. The bottleneck is surely the per-point wave assignment now.
March 25, 2025 at 12:50 am - Permalink
Thank you, JJ and Chozo. I implemented the fixes, and the execution speed indeed increased by more than 50%. The problem happens when I increase the iteration number of the for loop. The multithreading on wave assignment is not enough to keep the execution time in a reasonable range. I will try to implement the MatrixOP with the sum(wave). Thank you for the advice.
March 25, 2025 at 03:35 am - Permalink
Yes, if your wave has only 1000 points using multithreaded wave assignment is not very efficient. You need to parallelize the for loop. Below is my code thar runs 6 times faster on my pc. However I have a question (for WM): since all threads use the same assignment:
it will only be correct if the "+=" uses a lock (or an atomic operation) during the assignment at each point of the wave, is that how it works ? (I guess this should be the meaning of "threadsafe ?)
Edit: Just out of curiosity I made a test of simultaneous wave assignment by different threads, it looks like there are some artifacts of a race condition.
March 25, 2025 at 06:32 am - Permalink
> it will only be correct if the "+=" uses a lock (or an atomic operation) during the assignment at each point of the wave, is that how it works ? (I guess this should be the meaning of "threadsafe ?)
Exactly this "+=" assignment has me entirely sideways. I believe the faster (and more secure) approach might instead be a) create a set of columns to store either each explicit i iteration step or each implicit p iteration step and b) sum the columns along rows to recover the "w +=" equivalent behavior. IOW, rather than lumping everything into one step (aka python expert mode) to try to multi-thread, chunk the major processing into two steps and stream-line + multi-thread each one.
March 25, 2025 at 09:52 am - Permalink
You are correct to worry about writing to the same wave in multiple threads. That is likely at some point to cause a crash if two threads write to the same element of the wave simultaneously. If you haven't seen a crash, it's because you got lucky and the threads didn't write simultaneously to the same wave element.
The Multithread keyword divides up the destination wave into chunks and assigns each chunk to a thread. That way, the threads all operate on different wave elements and they cannot write simultaneously to the same wave element. (There are ways to violate that, of course: if your worker function uses creative wave indexing, maybe w[i+1], then you will likely cause a crash.
DisplayHelpTopic "Automatic Parallel Processing with MultiThread"
Look at the end of that topic for "If the right-hand expression involves calling user-defined functions, those functions must be threadsafe (see ThreadSafe Functions) and must also follow these rules:"
The right way to write your function would be to make a separate copy of `w` for each thread and pass the copies to the worker. After the threads have all finished their assigned work, you would need more code to add each of the worker waves into the final result wave. It might be possible to speed up that last bit with Multithread. Naturally, making multiple waves would involve overhead. Making them free wave would save some time, though as of Igor 8 it might not be much time. At Igor 8 I implemented fast wave name lookup using hash tables instead of walking the list of waves.
March 25, 2025 at 10:04 am - Permalink
Hi John,
Thanks you for your quick reply and explanations !
"Look at the end of that topic for "If the right-hand expression involves calling user-defined functions, those functions must be threadsafe (see ThreadSafe Functions) and must also follow these rules:"
Yes, I had looked at it, but it was still not clear for me how it works when multiple threads want to access the same data, in my opinion the manual lacks some explanations about it. At some point I thought that in a "threadsafe" function a statement
w += ...
would be compiled as :
Interlocked w += ...
(where I'm using a non-existent Igor keyword, just invented it :) but could be implemented maybe ? )
best
Pawel
March 25, 2025 at 11:13 am - Permalink
Here again, I sense that you should split out the "w += " portion of the calculations. Use more memory to build calculations that put results into rows or columns in a matrix, where results for any one position in each row or column are independent of results for an equivalent position in every other row or column. Multi-thread those calculations to the maximum. At the end, sum the rows or columns appropriately (i.e. collapse the matrix by one dimension along rows or columns) to create your "w += " equivalent.
March 25, 2025 at 01:29 pm - Permalink
That would require locking and unlocking a mutex for pretty much every element in the wave, since we don't have a good idea of what a user function might do. Locking and unlocking a mutex is a very expensive thing to do, so it would probably be a performance reducer.
From the cited list of rules:
Changing number type or redimensioning both most likely involve moving the wave's data in memory, which invalidates any pointers to the wave held by another thread. We don't have a way for a thread's wave references to know they need to refresh what they point to. Reducing the number of points in a wave probably leaves the data in place in memory, but other threads don't know that they can't index all the way to what used to be the end of the wave. Text waves are special- they either are allocated as a huge blob of bytes or as many pointers to the text bytes. Either way, writing new text will invalidate pointers to the text that are being used by other threads.
A variable passed by reference is a pointer to the variable's data, so if two threads simultaneously write to that variable, they are contending for the same bytes in memory. At least a scramble of bytes will result, in many cases a crash.
To avoid lots of issues with threads accessing waves, each thread has it's own data folder tree, which also means it's own lists of waves. If a function creates a global wave or variable, since it is in the thread-specific data folder tree, no other threads can find it. And when the thread is finished (you or Multithread has call ThreadGroupWait) the data folder tree is destroyed, so that wave or variable is also destroyed. Likewise, WAVE, NVAR or SVAR have access only to the thread's private data folder tree, and no access to the root folder's tree. This architecture was designed to prevent access to waves and variables from multiple threads simultaneously, which is sure to result in a crash. And waves that might appear in a graph should not be messed with by a thread!
I hope my little tutorial on thread safety is helpful. I can't say I'm any kind of expert on parallel processing, but I've been dealing with in Igor's internals for quite a while. Parallel processing is difficult to understand and do right! It has all the complexity of regular programming, except that your code running in multiple threads has potential for that code to interact with itself in ways that break the linear execution order you rely on.
March 25, 2025 at 02:13 pm - Permalink
That would require locking and unlocking a mutex for pretty much every element in the wave, since we don't have a good idea of what a user function might do. Locking and unlocking a mutex is a very expensive thing to do, so it would probably be a performance reducer.
From the cited list of rules:
Changing number type or redimensioning both most likely involve moving the wave's data in memory, which invalidates any pointers to the wave held by another thread. We don't have a way for a thread's wave references to know they need to refresh what they point to. Reducing the number of points in a wave probably leaves the data in place in memory, but other threads don't know that they can't index all the way to what used to be the end of the wave. Text waves are special- they either are allocated as a huge blob of bytes or as many pointers to the text bytes. Either way, writing new text will invalidate pointers to the text that are being used by other threads.
A variable passed by reference is a pointer to the variable's data, so if two threads simultaneously write to that variable, they are contending for the same bytes in memory. At least a scramble of bytes will result, in many cases a crash.
To avoid lots of issues with threads accessing waves, each thread has it's own data folder tree, which also means it's own lists of waves. If a function creates a global wave or variable, since it is in the thread-specific data folder tree, no other threads can find it. And when the thread is finished (you or Multithread has call ThreadGroupWait) the data folder tree is destroyed, so that wave or variable is also destroyed. Likewise, WAVE, NVAR or SVAR have access only to the thread's private data folder tree, and no access to the root folder's tree. This architecture was designed to prevent access to waves and variables from multiple threads simultaneously, which is sure to result in a crash. And waves that might appear in a graph should not be messed with by a thread!
I hope my little tutorial on thread safety is helpful. I can't say I'm any kind of expert on parallel processing, but I've been dealing with in Igor's internals for quite a while. Parallel processing is difficult to understand and do right! It has all the complexity of regular programming, except that your code running in multiple threads has potential for that code to interact with itself in ways that break the linear execution order you rely on.
March 25, 2025 at 02:14 pm - Permalink
I should say, "pointer to a wave's data" is the same as Igor "wave reference". It is literally an internal variable that contains the index in the machine's memory where the wave's data starts.
March 25, 2025 at 02:16 pm - Permalink
ok, so as I understand that's the rule that has been infringed since a wave is always passed by reference. Still I think the manual is not very clear about that. Actually we can write to a wave point as long as we make sure that other threads do not write to the SAME point of the wave, the rule is then respected because different points behave like different variables passed by reference.... Aaaah, even though I have quite a lot of experience with multithreading in .NET (C#) this formulation of rules was somewhat unclear for me, sorry for my misunderstanding - and thanks again for the explanations.
In .NET there are primitives for atomic incrementation of a variable, I'm not sure now but I thought they were lock-free, using only memory barriers (but maybe I'm wrong).
March 25, 2025 at 03:04 pm - Permalink
I think that would require that every element of a wave (and two for a complex element) would have to be an atomic variable. Since we write in C++, that would be std::atomic<something>. The correct memory access would be std::memory_order_seq_cst. That isn't as bad as locking and unlocking a mutex, but not great. And making every single point of a wave an atomic is stretch.
March 25, 2025 at 05:07 pm - Permalink
I am a bit confused by Pawel and John's discussion. To prevent program crashes, should I avoid using multithreading with "w +=" for wave assignment in Igor Pro?
I am trying to code with JJ's suggestion. Is the following scheme consistent with the comment?
March 25, 2025 at 05:11 pm - Permalink
The row goes over index p. The column goes over the index i. Suppose the matrix is value[npts][nis]. Calculate as
Perhaps the above can also be a MatrixOP/O value = ... calculation. Perhaps the step can be wrapped in a thread safe function, whereby passing (q^2) as a prefactor into the function would avoid the squaring calculation being done at every npts*nis cell in the matrix. After the last step, sum value along the q direction (perhaps with a MatrixOP) to obtain the equivalent of the "w += " step.
March 25, 2025 at 07:43 pm - Permalink
No, that's fine. The Multithread keyword breaks up any wave assignment into independent blocks, so the threads write to memory locations that are all distinct. My warning about a user function the right-hand side was that you should not write a right-hand side user function that accesses wave elements other than p, q, r, s. The indices represented by p, q, r, s are part of the block assigned to a given thread. The indices p+1, p-1, etc., at the edges of the assignment are outside the block assigned to the thread.
March 26, 2025 at 09:40 am - Permalink