How would speed this up?

Hi,

I am working on some image analysis specifically looking at textures within an image. I can't share an image because they are proprietary, but think of them as a materials TEM (transmission Electron Microscopy)  image with different particles and regions.  The first order request is to quantify the areas of the different phases, though I think there is benefit in making the quantification more explicit and to that end I am using the ImageGCLM function.

The basic idea I have an image 4096x4096 pixels and I am looking at different regions and in general I am binning the image into 64x64 pixel regions and running the ImageGCLM function. This function works and seems to produce the results anticipated.  The downside is that it takes ~90+ seconds to run on Mac with M1 and ~85 sec on Mac 3.8GHz i7 Intel CPU.  This function gets run which each image loaded so the user is sitting twiddling thumbs for a while.

Are there ways to speed this up perhaps with multi-threading?  I have not used multi-threading so I am completely unfamiliar withers use. Other ideas?

function calc_GCLM(wave input, variable binsize)

    variable index,maxindex,starttime
    starttime = datetime
    maxindex=(dimsize(input,0)/binsize)^2
    Make/O /n=(maxindex,16) GCLM
    setlabels("x;y;cluster;angular;contrast;correlation;squares;inv_Diff;Sum_Ave;sum_var;sum_entropy;entropy;dif_var;dif_entropy;info_cor1;info_cor2",GCLM,1)
    Make/FREE /N=(binsize,binsize) tempimage
    GCLM[][0]= floor(p/binsize)
    GCLM[][1]= mod(p,binsize)
    variable rs,re,cs,ce
    for(index=0;index<maxindex;index+=1)
        rs = gclm[index][0]*binsize
        re = rs+binsize-1
        cs = gclm[index][1]*binsize
        ce = cs+binsize-1
        matrixop/O/Free subimage =subrange(input,rs,re,cs,ce)
        imageGLCM/FREE /DETP=tempGCLM/HTFP subimage
        GCLM[index][3,]= tempGCLM[q-3]
    endfor
    print (datetime-starttime)
end

 

Andy

From the top of my head and untested:
 

threadsafe Function DoAnalysis(WAVE input, WAVE gclm, variable index)
    variable rs,re,cs,ce

    rs = gclm[index][0]*binsize
    re = rs+binsize-1
    cs = gclm[index][1]*binsize
    ce = cs+binsize-1
    matrixop/O/Free subimage =subrange(input,rs,re,cs,ce)
    imageGLCM/FREE /DETP=tempGCLM/HTFP subimage
    GCLM[index][3,]= tempGCLM[q-3]
End

Make/FREE/N=(maxindex) indexHelper

MultiThread indexHelper[] = DoAnalysis(input, gclm, p)

It is a bit nasty that DoAnalysis writes into the shared GCLM wave, but as it is using different row indizes it is safe. An alternative would be to let DoAnalysis return tempGCLM, store it into a wave ref wave and extract the parameters from there.

Hi Thomas,

It looks like a speed up of 6X from 94 seconds to 16 seconds.

Here is what I have per your guidance. Basically replacing the for loop with the multithread assignment.

function calc_GCLM(wave input, variable binsize)

    variable index,maxindex,starttime
    starttime = datetime
    maxindex=(dimsize(input,0)/binsize)^2
    Make/O /n=(maxindex,16) GCLM
    setlabels("x;y;cluster;angular;contrast;correlation;squares;inv_Diff;Sum_Ave;sum_var;sum_entropy;entropy;dif_var;dif_entropy;info_cor1;info_cor2",GCLM,1)
    Make/FREE /N=(binsize,binsize) tempimage
    GCLM[][0]= floor(p/binsize)
    GCLM[][1]= mod(p,binsize)
    variable rs,re,cs,ce
    Make/FREE/N=(maxindex) indexHelper
    MultiThread indexHelper[] = DoAnalysis(input, gclm, binsize, p)
    print (datetime-starttime)
end

threadsafe Function DoAnalysis(WAVE input, WAVE gclm, variable bin size, variable index)
    variable rs,re,cs,ce

    rs = gclm[index][0]*binsize
    re = rs+binsize-1
    cs = gclm[index][1]*binsize
    ce = cs+binsize-1
    matrixop/O/Free subimage =subrange(input,rs,re,cs,ce)
    imageGLCM/FREE /DETP=tempGCLM/HTFP subimage
    GCLM[index][3,]= tempGCLM[q-3]
End

Thank you very much.

Andy

Nice! Another thing to look out for is if the image is needlessly in FP64 where FP32 would be enough.

Hi,

In this case the image is an unsigned Byte 8 bit - gray scale image transformed from a jpg, but it is good to keep that in mind.

Also I tested the new code on my iMac with 3.8GHz i7 Intel chip and the improvement was even greater from 91 sec to 10 sec.

Andy

Try using Redimension/RMD instead of MatrixOp subrange. The two use different memory allocation strategies and one might be faster than the other in certain situations.

Hi,

 

Compared:

matrixop/O/Free subimage =subrange(input,rs,re,cs,ce)

To

duplicate/free /RMD=[rs,re][cs,ce] input, subimage

 

To create to subimage in the multithreaded function and the results were identical.  I think it is the imageGCLM that is the main limiter in the overall time.

Andy

Hi,

 

Buttressed with a wee bit knowledge, I have tried to apply multithreading to a different problem, obviously without success or I wouldn't be posting.

Here is what I have been running.

from the command line:

summary_100=time_above_temperature(y)[p]

where:

function/wave time_above_temperature(variable min_temp)

    wave zero_point, temperature, r_time_dif
    //add a point that reflects the end of data
    duplicate/free zero_point, zp
    insertPoints /v=(numpnts(temperature)) numpnts(temperature),1,zp
    variable index,maxindex
    maxindex = numpnts(zero_point)
    duplicate/Free r_time_dif, time_interval
    time_interval = r_time_dif*(temperature>min_temp)
    duplicate/FREE zero_point,time_at_temp
    for(index=0;index<maxindex;index+=1)
        time_at_temp[index]=sum(time_interval,zp[index],(zp[index+1]-1))
    endfor
   
    return time_at_temp
end

This seems to work, but is on the slow side taking 75 secs.

 

My attempts to speed this up include a base function to replace the command line call

function speed_test(wave input)

    variable starttime=datetime
    make/FREE/N=(dimsize(input,1)) indexhelper
    multithread indexhelper[] = time_temperature(input,p)
    print datetime-starttime
end

and a new function that writes the results in the function

threadsafe function time_temperature(wave input, variable coltemp)

    variable min_temp
    min_temp =indexToScale(input,coltemp,1)
    wave zero_point, temperature, r_time_dif
    //add a point that reflects the end of data
    duplicate/FREE zero_point, zp
    insertPoints /v=(numpnts(temperature)) numpnts(temperature),1,zp
    variable index,maxindex
    maxindex = numpnts(zero_point)
    duplicate/FREE r_time_dif, time_interval
    time_interval = r_time_dif*(temperature>min_temp)
    duplicate/FREE zero_point,time_at_temp
    for(index=0;index<maxindex;index+=1)
        input[index][coltemp]=sum(time_interval,zp[index],(zp[index+1]-1))
    endfor
   
end

The error is in the duplicate where it cannot find wave.  So how do I do this correctly?

Andy

Hi,

 

Simplified the thread safe function.

threadsafe function time_temperature(wave input, variable coltemp)

    variable min_temp
    min_temp =indexToScale(input,coltemp,1)
    wave zp, temperature, r_time_dif
    variable index,maxindex
    maxindex = numpnts(zp)-1
    duplicate/FREE r_time_dif, time_interval
    time_interval = r_time_dif*(temperature>min_temp)
    for(index=0;index<maxindex;index+=1)
        input[index][coltemp]=sum(time_interval,zp[index],(zp[index+1]-1))
    endfor
   
end

Still getting error: While executing Duplicate, the following error occurred: expected wave name.  I checked and duplicate is shown as thread safe, but not automatically multithreaded.  Do I need to do something here and if so what?

Andy

If you are executing code in preemptive threads, this is what multithread normally does, you can't access the main threads datafolder hierarchy. So duplicate fails because r_time_dif is a null wave reference.

In reply to by thomas_braun

Thanks Thomas,

With that insight I refactored the code to put the wave references into the calling function and pass it to the thread safe function.

function speed_test(wave input)

    wave tr =r_time_dif
    wave zp,temperature
    variable starttime=datetime
    make/FREE/N=(dimsize(input,1)) indexhelper
    multithread indexhelper[] = time_temperature(input,tr,zp,temperature,p)
    print datetime-starttime
    doupdate
end

threadsafe function time_temperature(wave input,wave time_ref,wave zp,wave temperature, variable coltemp)

    variable min_temp
    min_temp =indexToScale(input,coltemp,1)
    variable index,maxindex
    maxindex = numpnts(zp)-1
    duplicate/FREE time_ref, time_interval
    time_interval = time_ref*(temperature>min_temp)
    for(index=0;index<maxindex;index+=1)
        input[index][coltemp]=sum(time_interval,zp[index],(zp[index+1]-1))
    endfor
   
end

it seems to work almost insanely fast going from 75 sec to 0.1 sec.

However I am now seeing some what strange behavior.

Prior to running the function I set the the receiving wave, summary_100, to zero.  I want make sure some previous step didn't put data in so I can see a change.

I have a table open with summary_100 showing and it is all zeros.

I run the multithreaded function and the table still shows all zeros and as I am on a Mac that table is not the front most window because I needed to use the command window to execute the command.  If I now bring that window to the front just by clicking on the title bar all the values in the table go from zero to their calculated values.  Is this expected behavior and if so is there a way to automatically refresh.  Some of the graphs I have created with summary_100 also are not refreshed.

Andy

There is no summary_100 in your code in the last post, but I guess this is input? Writing into waves from preemptive threads might skip some updating logic in order to speed it up. I would add a `input[0][0] += 0` in the non-threadsafe function to force an update.

The for loop can also be replaced with p,q,r,s indexing in time_temperature.