ImageTransform vs Concatenate execution time

Function MakeTheWaves(variable nn)
	// Create nn single-precision 4096x4096 test images named wname_0, wname_1, ...
	variable i
	for(i = 0; i < nn; i++)
		Make/O/N=(4096,4096) $("wname_" + num2str(i))
	endfor
	return 0
End

Function MakeTheWaveRefs(variable nn)
	// Collect references to the test images in a wave-reference wave
	variable i
	Make/O/N=(nn)/WAVE mxpwaveRef
	for(i = 0; i < nn; i++)
		mxpwaveRef[i] = $("wname_" + num2str(i))
	endfor
	return 0
End

Function [variable tN0, variable tP0, variable tN1, variable tP1] timerFunction(variable nn)
	MakeTheWaves(nn)
	MakeTheWaveRefs(nn)
	WAVE/WAVE mxpwaveRef	// wave-reference wave made by MakeTheWaveRefs

	// First pass: the output waves do not exist yet
	variable tm00 = stopmstimer(-2)
	ImageTransform/NP=(nn) stackImages $"wname_0"
	variable tm01 = stopmstimer(-2)
	variable tm10 = stopmstimer(-2)
	Concatenate/O/NP=2 {mxpwaveRef}, M_Concatenate
	variable tm11 = stopmstimer(-2)
	tN0 = (tm01 - tm00)/1e6	// stopmstimer(-2) returns microseconds; convert to seconds
	tN1 = (tm11 - tm10)/1e6

	// Now all stacks are pre-allocated, redo the operations
	tm00 = stopmstimer(-2)
	ImageTransform/NP=(nn) stackImages $"wname_0"
	tm01 = stopmstimer(-2)
	tm10 = stopmstimer(-2)
	Concatenate/O/NP=2 {mxpwaveRef}, M_Concatenate
	tm11 = stopmstimer(-2)
	tP0 = (tm01 - tm00)/1e6
	tP1 = (tm11 - tm10)/1e6

	KillWaves/A/Z
	return [tN0, tP0, tN1, tP1]	// 0 = ImageTransform, 1 = Concatenate
End

Function RunTheTest(int startN, int stopN, int stepN)	// Entry point
	variable nn, i, tN0, tP0, tN1, tP1
	string dataFolderName = "TestResults_" + num2str(startN) + "_" + num2str(stopN) + "_" + num2str(stepN)
	NewDataFolder/O $dataFolderName
	DFREF dfr = $dataFolderName
	int sizeN = (stopN - startN)/stepN + 1
	Make/N=(sizeN) dfr:results_CN /WAVE=wcn
	Make/N=(sizeN) dfr:results_IT /WAVE=wit
	Make/N=(sizeN) dfr:results_CN_Pre /WAVE=wcnP
	Make/N=(sizeN) dfr:results_IT_Pre /WAVE=witP
	SetScale/P x, startN, stepN, "Nr. Images", wcn, wcnP, wit, witP
	for(nn = startN; nn < stopN + 1; nn += stepN)
		[tN0, tP0, tN1, tP1] = timerFunction(nn)
		wcn[i] = tN1
		wit[i] = tN0
		wcnP[i] = tP1
		witP[i] = tP0
		i++
	endfor
End

aclight

Your Concatenate "pre-allocated" case is slower than your "Created" case because Concatenate/O works differently than you are assuming.

I think you are assuming that if the output wave already exists and is the correct size, memory doesn't need to be reallocated and therefore the operation should complete faster. That's not a crazy assumption, but it's wrong. Surprisingly, the actual behavior is even documented in the Details section of the Concatenate topic:

If destWave does not already exist or, if the /O flag is used, destWave is created by duplication of the first source wave. Waves are concatenated in order through the list of source waves. If destWave exists and the /O flag is not used, then the concatenation starts with destWave.

Perhaps "destWave is created..." should be changed to "destWave is [re]created..." to make it slightly more clear what is happening.
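One way to see this for yourself is a small check like the one below. It is only a sketch: the function name and the wave names srcA and srcB are made up for illustration, and the tiny 4x4 size is just to keep it fast. It pre-allocates a 10-layer destination and then runs Concatenate/O with two source images.

```igor
Function DemoConcatenateOverwrite()
	// Two small 2D source images (hypothetical names, illustration only)
	Make/O/N=(4,4) srcA = 1, srcB = 2

	// "Pre-allocate" a destination with more layers than will be needed
	Make/O/N=(4,4,10) M_Concatenate = 0

	// With /O the destination is rebuilt starting from the first source wave
	Concatenate/O/NP=2 {srcA, srcB}, M_Concatenate

	// Number of layers in the result
	Print DimSize(M_Concatenate, 2)
End
```

If the destination is recreated as the documentation describes, the printed layer count should be 2 rather than 10, which means the pre-allocated memory was not reused.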

ImageTransform stackImages works differently when the output wave already exists. It calls an internal command to make the output wave of the correct size and type, but if the output wave already exists with the correct size and type, this is almost a no-op.

Your findings with the ~100 layer large images are also explainable. A 4096x4096x100 image of single precision floating point type is about 6.7 GB. That's just for the output, and you have the same number of bytes in your input waves. So that's over 13 GB of memory needed. According to your system info you have 16 GB RAM, so you're likely getting into the need for virtual memory somewhere around 75 layers.
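The arithmetic behind those numbers can be reproduced with a short helper. This is just a sketch; the function name EstimateStackBytes is made up, and it assumes 4 bytes per point (single precision, the default for Make) and that the inputs take the same space as the output.

```igor
Function EstimateStackBytes(variable rows, variable cols, variable layers)
	variable bytesPerPoint = 4	// assumption: single-precision float
	variable outBytes = rows * cols * layers * bytesPerPoint

	// The input images together occupy as many bytes as the stacked output
	Printf "output: %.2f GB, inputs + output: %.2f GB\r", outBytes/1e9, 2*outBytes/1e9
End
```

Calling EstimateStackBytes(4096, 4096, 100) gives 6.71 GB for the output and 13.42 GB in total. With 75 layers the total is already about 10 GB, before counting anything else running on the machine.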

On my machine (Windows 11, Intel 12900KS processor, 64 GB RAM), I'm getting times that are close to 10x faster than yours. But I'm also seeing that the "clean" case of ImageTransform is slower than the "clean" case of Concatenate (tested only with 20 and 100 layers). I think your last graph shows the "clean" case of both operations, and if so, you don't see quite the same thing. Looking at the code that executes, it makes sense that ImageTransform would be slower in the "clean" case, but the explanation is getting into the weeds, and some of the difference might be due to how memory allocation works on macOS vs. Windows.

June 29, 2023 at 09:19 am