How to use JointHistogram with /W=weightWave more efficient/fast?

Dear all

What is below can simply be pasted into Igor and it will display what I am doing and the questions will pop up in the command window. The goal of this question is to see what I should do in order to speed up this process. In the example it is trivial since the waves are short and there is only one dataset - in reality there are several datasets and long waves, so any speed improvement will be important...

Thanks a lot for your time!

Johan Söderström

 

// Dear all
// I have a problem that I really think is solved in Igor by default, 
// but I cannot find the solution.
// I have x and y data that will generate an "image" as follows
// Please note that here I only show the general idea, all of this is later on 
// implemented in functions and even a form of GUI - in order to keep it simple (?) 
// I only show the main critical parts

SetDataFolder root:; NewDataFolder/O/S JS_Test_Delete_Later
make/O/N=11 x_wave, y_wave
x_wave = {10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20}
y_wave = {10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20}
JointHistogram /BINS={11, 11, 0, 0} /C /DEST=test_2D /P=0 x_wave, y_wave
NewImage/K=1 test_2D

// This means that in coordinate (x,y) we have one "hit" as you will see in the image
// In addition the the x-y-data I have several waves containing "constraints" on the data. 
// This means that some of the data (noise) should not be included in the final 2D image.
// Here I will examplify this with eg. the limits of what is "allowed" values of x and y 
// as well as allowed times when the hits are recorded. 

make/O/N=11 t_wave
t_wave = {4, 4, 1, 4, 4, 12, 10, 4, 4, 1, 4}

variable/G x_min = 12
variable/G x_max = 18
variable/G y_min = 10
variable/G y_max = 19
variable/G t_min = 3
variable/G t_max = 5

// The number of constraints can be quite large, but here I use only x, y and t.
// Ok, the main idea now is that I really want to create a projection of the 2D 
// image along y (think ImageLineProfile)


// With only one dataset this is quite easy, however I have several datasets, and the 
// waves can easily contain 100 000+ datapoints - this for-loops are out of the question
// My current solution to this would be to create several "Weight_waves" one for each 
// constraint, as per follows

// EDITED a few minutes after I posted the question: Can I somehow speed up the below process perhaps?

make/O/N=11 x_lim, y_lim, t_lim
x_lim := x_wave>x_min && x_wave<x_max ? 1 : 0
y_lim := y_wave>y_min && y_wave<y_max ? 1 : 0
t_lim := t_wave>t_min && t_wave<t_max ? 1 : 0

Edit/K=1 x_wave, y_wave, t_wave, x_lim, y_lim, t_lim

// As you can see I need dynamic linking here - I want to be able to try different limits 
// and see in real-time what this means to the 2D image, and also to the projected image

// Now we have all the constraints as "1 = keep this value" and "0 = discard this value" 
// and it is trivial to make a weight_wave

make/O/N=11 All_lim
All_lim := x_lim * y_lim * t_lim

// Now I can make a NEW 2D image as per follows

// EDITED a few minutes after I posted the question: Can I perhaps speed up the below process by being more clever? 

JointHistogram /BINS={11, 11, 0, 0} /W=All_lim /C /DEST=test_2D_Selected_Points x_wave, y_wave
NewImage/K=1 test_2D_Selected_Points
Make/O/N=11 Projection
Projection := test_2D_Selected_Points[p][x]
Display/K=1 Projection

// Although I have never used it before I am quite certain that I can implement a hook
// (I think that's what I want) that runs the JointHistogram and thus updates the projection
// whenever one of the limits is changed. However I get a feeling that I am doing this in 
// a fashion that is way too slow and not using enough of what is already built-in into Igor.
// I wonder if there is a quicker way to do this - since I will be updating 100's of datasets
// once I change the parameters it would be sad if I do something that slows this down if it
// can be sped up... Any ideas on how to improve this?
 

Hi Johan, (happy to see you in the forum. Hope you are doing well)

Looks like you are trying to write some code for TOF data. I have also the experience that things can be quite slow with 100.000s of hits to process.

The first question would be if you really, really need dynamic linking, or whether an update function will do which is periodically called. In the latter case your slow wave assignments could be reformulated with much faster functions such as MatrixOP. Here is how you can generate your weight wave:

MatrixOP/O All_lim = greater(x_wave,x_min)*greater(x_max,x_wave)*greater(y_wave,y_min)*greater(y_max,y_wave)*greater(t_wave,t_min)*greater(t_max,t_wave)

This should be MUCH faster, since it is only one assignment and also MatrixOP is way more optimized. You can also choose to use multiple CPU threads here using the /NTHR flag. You do not seem to use All_lim later, but maybe this part has been omitted here.

As for this part:

Projection := test_2D_Selected_Points[p][x]

I am not sure what you are trying to achieve here. This looks to me very much like you are just copying the data (make with some linear shift). Why not use test_2D_Selected_Points for Display in a graph directly? Could you please tell a bit more about what you want to achieve here?

I am not sure JointHistogram can be sped up further, but maybe someone else has an idea.

In reply to by chozo

Hi Stephan!

Long time no see, I hope you are doing well! As far as I know you are now living quite far away from where we met... 

I've been seeing you here every now and then and also have one of your macros installed in Igor here... You are way more advanced in Igor than I am... I might even use parts of your marco in mine, the part where you "straighten out" curved lines on a 2D image... Let's see when I get there... since my data is in x,y format and not an image perhaps I will need to write my own - I really want to keep it in x-y to limit the size of the Igor experiment as the 2D image quickly grows in size when you have too many datasets..

 

The data is actually RIXS spectra but from a data-process point of view perhaps somewhat similar to ToF data...

Thanks A LOT for the MatrixOP, this is (one of) the things I was looking for - that will probably speed up the procedure significantly... I will test this out tonight when I have time...

As for the All_lim I use it in the JointHistogram /W=All_Lim - this is my way of omitting "bad" datapoints - this was the fastest way I could find. I am really only using it as a mask for what data is considered good (within the limits I set) and bad (outside the limits).

 

As for the projection I am projecting the 2D matrix along one axis. If we have a matrix as follows:

1, 2, 3

4, 5, 6

I am simply making the a wave containing three elements and assigning them to 1+4, 2+5, 3+6 (summing vertically). I am sure there is a MatrixOP for this as well that will speed it up...

 

My current plan would be to create a function (let's call this JS_UpdateAll() for simplicity) with your MatrixOP as well as a function to do the equivalent of Projection := test_2D_Selected_Points[p][x] and then I will call this function with a userHook whenever any of the limits are changed - I think that is doable...

So basically if I change any of the variables (or waves) that contain my limits (in the example above this would be eg. x_min and x_max) I will simply re-calculate the MatrixOP thus creating my "mask" (All_lim), then I will perform the JointHistogram and lastly the "projection" (or whatever it is called mathematically - I really need to brush up on math!). This will probably be somewhat slow but hopefully "fast enough"... Otherwise I will need to have an "update button" that calls such a function...

While I have you here - I've been looking into hooks yesterday (I never used them) and I wonder if you have readily an example that looks for changes in certain waves/variables or even a datafolder (all settings are saved in a folder). So whenever I update a value in that folder/wave/variable then the function "JS_UpdateAll()" will be run. That way I can "dynamically" update the waves whenever needed... I have also seen that hooks can be turned on/off so I can enable this hook when I need dynamic updating and disable it when I do not need it...

Thanks a lot for your time and help, it is greatly apprechiated!!!

 

 

Hi Johan,

Great to hear from you again. Yes, I am a bit far away now, but still also somewhat close (virtually). Let me know if you need specific help with some of your other endeavors with Igor, either here or via email. So this is RIXS data then. I was just thinking of TOF data because of the x,y,t (time?) variables. As you write, data analysis is not so different across spectroscopies.

Anyway, it seems I didn't read your initial text properly, since I overlooked the use of All_lim with JointHistogram. You are using All_lim as a 'hard' (binary) weighting wave, but JointHistogram still has to work on the full data in the end (and decide on the fly if each point is useful or not). Maybe it is possible to first reduce the data volume (temporarily) to work with a somewhat reduced data size in JointHistogram. I will think about it.

Regarding the Projection wave, I think your code currently doesn't do what you want.

Projection := test_2D_Selected_Points[p][x]

is the same as:

Projection := test_2D_Selected_Points[p][p]

if the Projection wave is not scaled (I assume it is not). This simply gives you the diagonal elements of the input (1,5,...) in your example. If you want to sum all elements in vertical direction (or in other words sum the whole column), then you can use this simple command:

MatrixOP/O Projection = sumcols(test_2D_Selected_Points)^t

Appending ^t makes sure that the result is in the correct orientation.

As for auto-updating the calculation, there is no hook function which can directly watch folders / waves etc. (as far as I know). You could cheat by (again) creating a dependency of some wave or variable, but this is a rather dangerous and confusing approach. I assume you (the user) changes one or multiple parameters 'by hand' anyway. Then it is easy to trigger the update function from this user interaction.

You can assign hooks to graph updates, cursor movement, panel interaction, user controls etc. No need to have Igor watch the folder for changes. This depends a bit on how you would like to implement the change in variables. It is very easy when you have something like SetVariable controls: simply call JS_UpdateAll() each time a variable has been touched. But other methods are imaginable as well, as long as you have some graph or panel to interact with. You then can also have a checkbox to toggle 'auto updates'. Let me / us know how you would like to proceed from here, then we can give a bit more detailed tips.

In case anyone would end up reading this I have re-written this and will eventually publish how I did this - in the end it was quite quick... As long as I can figure out how to access MatrixOP within a function as 

string s_cmd = "Output = expression(...)"
MatrixOP/O $s_cmd

I will be set, right now I end up doing

string s_cmd = "MatrixOP/O Output = expression(...)"
Execute s_cmd

which does not work since my string can easily be too long...

P.S. I am happy I figured out how to write Igor code in here...

Hello Johan,

It would help if you could explain why you need to express a MatrixOP command as a string.

You got good advice from Chozo with one minor clarification: the /NTHR flag does not really play a role unless you are processing multiple layers.

A.G.

Hi A.G.

You are correct, I got brilliant help from Chozo - and I know him from earlier so we are in contact via e-mail regarding some of these issues.

The reason I want to use MatrixOP as a string is the following (somewhat simplified for this particular example):

I have a large dataset where one parameter is time, and the time is a periodic function. I want to discard the data that falls outside the "time-windows" of interest. In order to do this I folded (thanks https://www.wavemetrics.com/code-snippet/wave-folding) the periodic data and obtained an equation where the time-window is. The equation in question is simple. Below is snippets of my code that attempts to do this. Since it is not working it is not tested, but if the string is shorter (I do this elsewhere in my code) I can get it to work, so in principle it should work

Here is an example of the periodic time. I have other waves of the same length and I want to discard all data that is not on the same rows as the "peaks" below - the relevant data is only in the peaks.

Here is how the above wave looks if I fold it onto itself with the correct boundaries

We see that the low limit is 700 and the high limit is 1000, and we see (or know) that this repeats with a periodicity of 1458. So I want to extract the rows in the range 

i*1458 + 700 -> i*1458 + 1000
where i is an integer

Below is a code-snipped (not tested since I fail in figuring out how to use the string when it becomes too long - but it should work I think) that attempts at doing this... There are other steps (divide by the output from  MatrixOP and zipNaNs) that I later need to do in order to get to the final wave - but those are details...

 

v_time_constant = 1458
v_time_start = 700
v_time_end = 1000

//This means that I can formulate a periodic condition using a for-loop that I can later-on use with MatrixOP

variable v_Previous_Low_Time_Limit = 0
variable v_Previous_High_Time_Limit = 0
variable i
for (i=0; i<condition; i+=1)
    v_Current_Low_Time_Limit = v_time_constant*i + v_time_start
    v_Current_High_Time_Limit = v_time_constant*i + v_time_end
       
    s_temp = ":"+s_DataFolder+":JS_VERITAS:Settings:s_Time_Exclude"
    SVAR s_Command_String = $s_temp

//The problem arises here - the string s_Command_String becomes too long due the the periodic function that I am applying and looping over
//Here I add the current periodic boundary conditions into a very long string that I later try to execute with MatrixOP - this is what is causing problems at present. I have a similar issue in a different location with the exception that until now this has not been problematic since the conditions are not long enough - but they will likely be.
//One solution would be to run MatrixOP once for each for-loop but that would slow things down. Another would be to be able to give the function as parameters to "greater()", but as far as I know that is not possible.

    if (!SVAR_Exists(s_Command_String)) //If this does not exists, make it
        string /G $s_temp
        SVAR s_Command_String = $s_temp
        s_Command_String += "(greater("+s_Path_To_DataColumn+","+num2str(v_Previous_High_Time_Limit)+")*greater("+num2str(v_Current_Low_Time_Limit)+","+s_Path_To_DataColumn+") -1)"
    else
            s_Command_String += "*(greater("+s_Path_To_DataColumn+","+num2str(v_Previous_High_Time_Limit)+")*greater("+num2str(v_Current_Low_Time_Limit)+","+s_Path_To_DataColumn+") -1)"  
        endif

    //Save the current limits as the previous limits...
    v_Previous_Low_Time_Limit = v_Current_Low_Time_Limit
    v_Previous_High_Time_Limit = v_Current_High_Time_Limit
endfor


//After this I attempt to call MatrixOP with this string, and the only way I can figure out how to do this is to

string s_to_MatrixOP = "MatrixOP/O Time_Exclusions = "
string s_temp2 = ":"+s_DataFolder+":JS_VERITAS:Settings:s_Time_Exclude"
SVAR s_Exclude_Command = $s_temp2

if (SVAR_Exists(s_Exclude_Command))
    s_to_MatrixOP += s_Exclude_Command
else
//      print "FYI: No limits defined for column "+num2str(i)
endif

//Just as a sanity check I print the command
print s_to_MatrixOP
Execute s_to_MatrixOP
//This tells me the string is too long...

 

I hope that this was possible to understand. I tried to write as little as possible (!) and still make it possible to understand... Let's hope for the best...

Thanks!

Johan

 

 

Just for the record, I am also in email contact and have received some test data as well, so I might be in the unique position to have a complete picture of this particular problem. As Johan mentions, the goal is to select a limited 'region of interest' from a repeating data series. The region of interest is in the time data of an x,y,time data set (i.e., three or more separate waves) which record 'random' events in time on any x,y plane. For this reason, a periodic ROI section has to be tested against the time wave to create a 'mask' for the x,y or any other data which was associated with this time series. The code fraction above is for the purpose of creating this mask. I will work on it, but if somebody has some idea how to solve this in a neat way then please let me know.

Hi Johan,

Thanks for the detailed explanation.  I think I can see what you are trying to do with MatrixOP.

I'd never dream of using MatrixOP like that.  The approach is appropriate when you have one or two ranges but not many.  Instead, in your case, I prefer to create a 1D mask wave which should efficiently select the relevant segments.

A.G.

Dear all.

I have now solved this, much thanks to all who helped out. Since the wave I want to exclude is periodic I simply created a new wave that was set to 

new_Wave = mod(old_wave, factor)

This is now a "complete" project but strictly geared towards data from a particular spectrometer.

I will see how I go about to make this project available for the general public - if it is even interesting beyond a small group of people... Until then anyone who has data in the format of 

x,y,t,....,N-1, N... columns and want to reduce this to:

1. An image of x,y pairs

2. Filtering on time (or other colums)

3. (Optional) Create a 1D spectrum of the 2D image

might benefit from my code - feel free to write here and I will happily share my code. At present it is ugly but working, in the long run it might become beautiful and working...Maybe...

Best,

Johan Söderström