MatrixOP ... convolve does not work


i am struggling with the "MatrixOP convolve" for some hours now and cant get it to work. Here is what i want: We acquire data with an 80x80pixel high speed camera and i would like to smooth a stack of this data (~ 1000 - 5000 frames) with a 3x3x3 filter matrix. I tried this with the MatrixConvolve function which works nicely. But then i read that the MatrixOP probably could do this much faster and this ran me into trouble :-)

This is the code that i wrote:
function np_convolution2()
    NVAR frame_int, rows, columns, numberofframes
    wave 3ddata
    make /O /N=(rows, columns, numberofframes) M_convolution2
    make /W /o coeff={{{0,1,0},{1,2,1},{0,1,0}},{{1,2,1},{2,4,2},{1,2,1}},{{0,1,0},{1,2,1},{0,1,0}}}
//  make /I /o coeff={{1,2,1},{2,4,2},{1,2,1}}
    MatrixOP /O M_Convolution2 = convolve(3ddata[] [] [0,numberofframes],coeff [] [] [0,2],-1)
    SetScale/P z 0,(frame_int/1000),"s", M_Convolution2
    M_Convolution2/=28                              //Adjust to sum of matrix values of coeff
    redimension /y=16 M_Convolution2
    DisplayStack("Convoluted2", "M_Convolution2")

3ddata and coeff have the same data type (I16) and 3ddata is a stack with the dimensions 80x80x5000.
When i run the function, Igor thinks for a while and then gives me the error message:"While executing MatrixOP, the following error occured: Inappropriate or out of range input parameter". M_Convolution2 is converted to a very long 1d wave (instead of being a 80x80x5000 matrix). And i am confused.
I tried different options for convolve (-1, -2, 1) and also the various range parameters for 3ddata and coeff with no success.
Can anybody tell me what i have overlooked?

Thank you very much for any help!

Klaus wrote:

MatrixOP /O M_Convolution2 = convolve(3ddata[] [] [0,numberofframes],coeff [] [] [0,2],-1)

Based on your variable names I assume that 3ddata has 'numberofframes' planes, which means that the valid indices for these planes run over [0, numberofframes - 1]. Your code above runs up to numberofframes, which is an off-by-one error.

More fundamentally, however, I'm not sure that MatrixOP supports 3D convolutions. The syntax you're using, e.g. 3ddata[][][0, nFrames - 1], is designed to make MatrixOP loop over all planes in the subrange, treating each plane independently. Based on that I'd guess that MatrixConvolve and friends are your best option.

There is no reason that these operations should be slower than MatrixOP, in fact they may be a bit faster, provided that they have been equally well optimized. However, the size of the kernel will be crucial in determining whether a direct or FFT-based approach is faster.
Hello Klaus,

The response above from "741" is in the right direction: MatrixOP is not appropriate for your application since it operates on a layer-by-layer basis. You may want to try ImageFilter or MatrixConvolve, both of which support real 3D convolutions.

WaveMetrics, Inc.
Dear 741, A.G.,

thank you for your replies. MatrixConvolve works quite well, i just hoped that the MatrixOP would be faster.

Greetings, Klaus

Klaus wrote:
i just hoped that the MatrixOP would be faster.

You could consider multithreading. MatrixConvolve is threadsafe, which means that it can be called from Igor-created threads. Let's say your wave has 8000 frames, and your computer has 2 CPUs (extension to more CPUs is straightforward). You could divide the work over two threads: the first filters planes 0 through 3999, the second filters 4000 to 7999. Of course, the boundary between the two volumes (where the different calculations "meet") would not be entirely correct, so you can work around that by having the calculation include some extra margins.

Just for fun, here's something to get you started:

Function DoMyConvolution(dataWave, kernel)
    wave dataWave, kernel
    variable nImages = DimSize(dataWave, 2)
    Duplicate /O dataWave, outputWave
    // the multithreaded convolution
    variable threadID, dummy
    variable dividingPlane = floor(nImages / 2)
    threadID = ThreadGroupCreate(2)
    ThreadStart threadID, 0, SubrangeConvolve(dataWave, kernel, 0, dividingPlane, outputWave)
    ThreadStart threadID, 1, SubrangeConvolve(dataWave, kernel, dividingPlane + 1, nImages - 1, outputWave)
    dummy = ThreadGroupWait(threadID, inf) // wait until the threads have finished
    dummy = ThreadGroupRelease(threadID)

ThreadSafe Function SubrangeConvolve(dataWave, kernel, firstPlane, lastPlane, outputWave)
    wave dataWave   // your 3D wave
    wave kernel     // your kernel
    variable firstPlane, lastPlane  // the indices of the planes this function should consider
    wave outputWave // you must provide this wave with the same dimensions as the data wave. The output will be stored here
    // take some margin to avoid edge effects
    variable firstPlaneWithMargin, lastPlaneWithMargin
    if (firstPlane > DimSize(kernel, 2))
        firstPlaneWithMargin = firstPlane - DimSize(kernel, 2)
        firstPlaneWithMargin = firstPlane
    if (lastPlane + DimSize(kernel, 2) < DimSize(dataWave, 2))
        lastPlaneWithMargin = lastPlane + DimSize(kernel, 2)
        lastPlaneWithMargin = lastPlane
    // make a copy containing only the select subranges
    MatrixOP /O M_Subset = dataWave[][][firstPlaneWithMargin, lastPlaneWithMargin]
    MatrixConvolve kernel, M_Subset
    wave M_Convolution
    // store the convolution in the outputWave
    // but remember to take into account the margins that were added
    variable i, offset = firstPlane - firstPlaneWithMargin
    for (i = firstPlane; i <= lastPlane; i+=1)
        ImageTransform /P=(i - firstPlane + offset) getPlane, M_Convolution
        wave M_ImagePlane
        ImageTransform /P=(i) /D=M_ImagePlane setPlane, outputWave

I didn't test this in any way, so be sure to do some checking. Fortunately it's easy enough to test; the result should be identical to simply running MatrixConvolve for the entire dataset. I'd be interested to hear what kind of performance difference you see.
Hi 741,

thanks for your suggestion! I will definitely try it out (maybe on a slightly different problem). Without this i wouldnt have considered multithreading. But after looking at your example it seems to be not as complicated as i thought before.
I will let you know,

I tried a 80x80x1000 convolution using the non-multithreaded approach and it executes pretty fast, so the overhead of the multithreading may not be worth it.

I also experimented with the multithreaded code and it seems to throw an error for reasons that I have not been able to figure out. It only seems to appear when running in an Igor created thread. To reproduce (using the SubrangeConvolve function from above):

Make /N=(80,80,1000) /W/U M_data = enoise(2000) + 2000
Make /W /O coeff={{{0,1,0},{1,2,1},{0,1,0}},{{1,2,1},{2,4,2},{1,2,1}},{{0,1,0},{1,2,1},{0,1,0}}}
DoMyConvolution(M_data, coeff)

Function DoMyConvolution(dataWave, kernel)
    wave dataWave, kernel
    variable nImages = DimSize(dataWave, 2)
    Duplicate /O dataWave, outputWave
    // uncomment this line to do the convolution in the main thread
    // this works fine on my computer
    //SubrangeConvolve(dataWave, kernel, 0, nImages - 1, outputWave)

    //uncomment these lines to do the convolution in a single Igor-created thread
    // this throws an error: "ImageTransform error: one or more of the input waves are not supported"
//    variable threadID, dummy
//    threadID = ThreadGroupCreate(1)
//    ThreadStart threadID, 0, SubrangeConvolve(dataWave, kernel, 0, nImages -1, outputWave)
//    dummy = ThreadGroupWait(threadID, inf) // wait until the threads have finished
//    dummy = ThreadGroupRelease(threadID)
Hi 741,

i am not an experienced programmer, so it takes me quite some time to get through...
I am still trying a "non-multithreaded" code but now i am stuck with something strange: I thought that the following two statements would be equivalent (if stop= (number of layers -1)):
    Multithread 3dwave_temp1 = 3dwave
    MatrixOP /O 3dwave_temp2 = 3dwave[][][0, stop]

3dwave is 80x80x5000
When i look at 3dwave_temp1 and 3dwave_temp2 with the data browser, they look identical. But when i then try to convolve both with my 3x3x3 filter kernel i will get different results (i did not try both in one program, only consecutively). The result that i get with 3dwave1_temp looks better.
Maybe i can give a more detailed description tomorrow, Unfortunately i cant spend too much time on this (although this is more interesting than many of my other obligations).

Kind regards, Klaus
Easy enough to test:
Function WavesAreEqual(wave1, wave2)
    wave wave1, wave2
    Duplicate /FREE/O wave1, diff
    diff = abs(wave1 - wave2)
    return (sum(diff) == 0)

Function DoTest()
    // make some fake data
    Make /O/N=(80,80,5000) /W/U data = enoise(2000) + 2000
    Duplicate /O data, temp1, temp2
    // try different approaches
    temp1 = data
    MultiThread temp2 = data
    MatrixOP temp3 = data[][][0, 4999]
    if (WavesAreEqual(temp1, temp2))
        Print "temp1 and temp2 are equal"
        Print "temp1 and temp2 differ"
    if (WavesAreEqual(temp1, temp3))
        Print "temp1 and temp3 are equal"
        Print "temp1 and temp3 differ"

  temp1 and temp2 are equal
  temp1 and temp3 are equal

As one would expect.

I don't think the difference that you observed was caused by a difference between those two statements. I think something else was at play. And even if there were subtle differences between them (due to e.g. floating-point round-off in some intermediate calculation) it is unlikely that the differences would be observable by eye. Usually there is some other effect at play.
Hi 741,

i figured out what went wrong! My 3d datawave is integer (I16). When i copy this into a destination wave 3ddestination it will be FP32 (because i created 3ddestination with the default settings). This FP32 wave can be nicely convolved with my kernel and gives a correct result.
When i use "MatrixOP 3ddestination = 3ddatawave", the destination wave will become I16 (because its format is determined by the format of 3ddatawave). A I16 wave does not convolve correctly with my kernel. I should have suspected something like this right from the start - the resulting wave looked "clippy", with sharp edges etc. Indeed, the error was not in the computer but in front of it - sorry to bother you with this (but it teached me a lesson).
Now i can again tackle the Multithreaded version, lets see what i will learn next...

Regards, Klaus