If any faster solution to this nested loop

Hi everyone,

I was wondering if anyone can help me with this piece of nested loop posted below. I am working with 4d matrices and this loop is taking about 50 secs to calculate the result and this part of the whole program needs to run through many iterations, which is taking a large time to get me the final results.

If there is a better solution/algorithm which can run faster would be appreciated.

for(d=0;d<d3;d+=1) 
  for(m=0;m<f3;m+=1) 				
    for(l=0;l<f3;l+=1) 					
    s=0	 				
      for(j=0;j<ysize3;j+=1) 						
        for(i=0;i<xsize3;i+=1) 							
           for(k=0;k<zsize;k+=1) 							
            s+=res2[i+l][j+m][k][d] 							
           endfor 						
        endfor 					
      endfor 					
      for(n=0;n<n3;n+=1) 					
        grd_L_ker3[l][m][d][n]=s 					
      endfor 				
    endfor 			
  endfor 		
endfor 	
grd_L_ker3*=err

Thanks in advance.

Shuvam

  

Hi,

Not sure I fully understand the code to suggest exactly what to do. It looks like you are summing groups of voxels and storing the result in grd_L_ker3. So the j, I and k loops could be cut out and written as a function where your l and m offsets are arguments and the function extracts the relevant voxels from the 4D wave and returns its sum. In fact, the chunk specified by n would be the same for all chunks, if I'm not mistaken, so that loop is redundant too. You can probably do the whole thing as a direct assignment to grd_L_ker3 with the right expression (one-liner)

It would have been helpful to have some background for this application.  Otherwise (as noted by the previous comment), the code does not make much sense when the same sum is set for all chunks.  Off hand this looks like some kind of a convolution with a constant kernel or a strange 3D boxcar averaging. 

The fact that the index d appears in the sum in the last dimension and then in the grd_L_ker3 in the third dimension combined with the fact that the code does not show any indication of how it handles the boundaries of the matrix (possibly in d3, f3) makes me wonder if it performs its intended task.  If it turns out that the code is correct as shown here then I would suggest breaking the sum loops using MatrixOP to extract chunk d of the wave res2, then replace the sum with MatrixFilter with the keyword avg (you get the actual sum by multiplying the result with factor=xsize3*ysize3*zsize3 which you would hopefully compute once outside the loop.  Note that MatrixFilter does have code to handle the boundaries of the array so you may want to make sure that it works for your application.

AG

In reply to by Igor

Dear AG,

Your assumptions are right, I am working with kernels and its responses (here res2) and the loss function (of Convolutional Neural Network). In grd_L_ker3, I am calculating the gradients of the loss functions wrt the kernel weights. 

The res2's last dimension and grd_L_ker3's 3rd dimension no.s are the same, so I put that way. 

Thanks for your suggestion, this would be useful. I will try this.

 

BTW: I am also calculating the convolutions using loops as I found that the default convolve operation in MatrixOP handles the convolution with either zero-padding/reflection padding. But in this case, I actually do not want any kind of padding which means the response will be in smaller in size than the input wave. Is there any option in Igor so that I can do this convolution without any padding? 

 

 

Hello shuvam,

I don't think MatrixOP convolution is useful here because it operates on a layer by layer basis and this is probably not what you need. 

Depending on the size of your kernel you should consider implementing the 3D convolution using the FFT operation.

If you must use explicit loops then try to multithread your code.  That should give you an extra factor of ~n in speed (n is the number of available threads).

 

AG