Averaging Over a Matrix

Hello all,

The data I receive from my instrument is loaded into Igor as a matrix containing 322 columns (each corresponding to a chemical mass) and usually 100,000+ rows (each row corresponds to one second of the experiment). I am trying to average parts of this matrix using Averag from the Jimenez Group's 2019 General Macros.

Averag only works with 1D waves, but my data is a 2D matrix. I wrote the function below to pull each column out of the matrix, average it using a key of 1's and 0's that marks where each average starts and stops, and place the result in the matching column of a new matrix wave (e.g. data_avg).

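function get_all_avgs(data_wave)
    wave data_wave
    wave ncps_key    // averaging key; must exist in the current data folder

    string NewName = NameofWave(data_wave) + "__avg"

    variable nRows = DimSize(data_wave, 0)    // rows = seconds
    variable nCols = DimSize(data_wave, 1)    // columns = masses
    variable j

    duplicate/o data_wave data_dummy          // 2D output, same size as the input
    make/o/d/n=(nRows) data_index             // 1D wave that averag() writes one column of averages into

    for(j=0; j<nCols; j+=1)
        MatrixOP/o index = col(data_wave, j)  // extract column j as a 1D wave
        averag(data_index, index, ncps_key, 20)
        data_dummy[][j] = data_index[p]       // store this column's averages
    endfor

    // rename fails if a wave named NewName already exists (e.g. on a second run),
    // so copy the result under the new name and remove the scratch waves instead
    duplicate/o data_dummy $NewName
    killwaves/z data_dummy, data_index, index
end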

For the averag function, data_index is the wave that the averages are placed into, index is the wave being averaged (a column, j, from the original data_wave), ncps_key is the averaging key, and 20 is the number of initial points to skip after the key begins.

This code appears to work for my matrix, but I would appreciate it if anyone could point out errors or suggest fixes!

I would also like to incorporate zapNaNs into this code, since most of the resulting matrix is NaN, but I have not yet figured out a way to do so appropriately. Help on this would be greatly appreciated!

What does ncps_key actually do? Is it some sort of flag that indicates the start/stop of rows that should all average together?

When you say that you want to incorporate zapNaNs, I presume that means you want to ignore NaNs in the averages. That is, if you have a patch of 100 numbers to average and 20 of them are NaNs you want to average 80 numbers. Is that right?
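For a single 1D block, WaveStats already behaves the way described here: it leaves NaNs out of the average and reports how many it skipped. A minimal sketch; DemoNaNAverage is a hypothetical name and the 100-point block is made-up test data:

```igor
// Hypothetical demo: 100 values, 20 of them NaN, averaged over the 80 valid ones.
Function DemoNaNAverage()
    Make/FREE/D/N=100 block = p    // made-up block: values 0..99
    block[0,19] = NaN              // turn the first 20 values into NaN
    WaveStats/Q block              // WaveStats excludes NaNs from V_avg
    Printf "valid points: %d, NaNs: %d, average: %g\r", V_npnts, V_numNaNs, V_avg
    return V_avg
End
```

The average should come out as 59.5, the mean of the 80 values 20 through 99.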

Depending on what you want to do with NaNs, zapNaNs may not be a good way to go, since you can't delete a single cell from a matrix. You could apply it to your 1D data_index wave, but that would potentially make each column a different length, and keeping everything in sync might be difficult.

If you could describe your desired outcome in a way that doesn't reference averag() perhaps we could come up with a totally different and superior method.

Hi John, thank you for your response!

You are correct that the ncps_key is just a start-stop. A 1 indicates to start or continue a running average while 0 indicates to end the average (or not run one).
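Given that description of the key, one option is to turn it into two lists of row numbers, the first and last row of each run of 1's, which block-averaging code can then loop over. A rough sketch, assuming the key has one point per row of the data matrix; KeyToBlocks, blockStart and blockEnd are hypothetical names:

```igor
// Hypothetical helper: record the first and last row of every run of 1's in the key.
Function KeyToBlocks(key)
    WAVE key                      // 1 = inside a block, 0 = outside
    Variable n = numpnts(key)
    Variable i, nb = 0, inBlock = 0

    // Runs of 1's are separated by at least one 0, so ceil(n/2) + 1 is an upper bound
    Make/O/D/N=(ceil(n/2) + 1) blockStart, blockEnd
    for(i = 0; i < n; i += 1)
        if(key[i] == 1 && !inBlock)           // a run of 1's begins
            blockStart[nb] = i
            inBlock = 1
        elseif(key[i] != 1 && inBlock)        // the run ended on the previous row
            blockEnd[nb] = i - 1
            nb += 1
            inBlock = 0
        endif
    endfor
    if(inBlock)                               // key ends while still inside a block
        blockEnd[nb] = n - 1
        nb += 1
    endif
    Redimension/N=(nb) blockStart, blockEnd   // trim to the number of blocks found
End
```

Because the blocks come from the key rather than from fixed durations, block lengths can vary from one block to the next.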

I want to incorporate zapNaNs to get rid of the NaNs in the averages wave. My data wave contains only numbers, but after averaging, the output matrix still has 100,000+ rows, of which only 800 or so are actually filled. I wanted to use zapNaNs to remove the empty rows so that my final wave is a matrix containing only the averages.
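Since zapNaNs works on 1D waves, the empty rows could instead be dropped by copying only the filled rows into a new matrix of the right size. A sketch, assuming a row left empty by averag() is NaN in every column (so only column 0 is tested); KeepFilledRows and the _filled suffix are hypothetical:

```igor
// Hypothetical helper: copy the rows whose column 0 is not NaN into <name>_filled.
Function/WAVE KeepFilledRows(w)
    WAVE w
    Variable nRows = DimSize(w, 0), nCols = DimSize(w, 1)
    Variable i, k = 0, nKeep = 0

    // First pass: count the filled rows (numtype returns 2 for NaN)
    for(i = 0; i < nRows; i += 1)
        if(numtype(w[i][0]) != 2)
            nKeep += 1
        endif
    endfor

    // Second pass: copy those rows into an output made at its final size
    String outName = NameOfWave(w) + "_filled"
    Make/O/D/N=(nKeep, nCols) $outName
    WAVE out = $outName
    for(i = 0; i < nRows; i += 1)
        if(numtype(w[i][0]) != 2)
            out[k][] = w[i][q]    // copy the whole row
            k += 1
        endif
    endfor
    return out
End
```

Counting first and then making the output once avoids deleting roughly 99,000 rows one at a time, which would be slow for a matrix this size.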

My data is essentially a four-hour cycle that repeats. A single cycle looks like this: Phase A (10 mins long, 600 data points), Phase B (10 mins long, 600 data points), Phase C (30 mins long, 1800 data points), Phase D (6 mins long, 360 data points), Phase E (2 mins long, 120 data points). Phases D and E repeat 14 more times to complete the cycle, so the whole sequence looks like this: (Phase A, B, C, [D,E]*15)*Y, where Y is the total number of cycles in the experiment. Each second is one row of data points.

I get a matrix where columns correspond to the number of compounds I am looking at (322) and rows correspond to how many seconds my experiment ran for. I am trying to create a function that can return a new wave with the average value of each phase of my experiment and for each column in that matrix. As my phases are subject to timing changes, I have been trying to use a key of 1's and 0's to help simplify the code a bit.

So a zero means "start of a block to be averaged"?

Sounds like the blocks that need to be averaged are quite regular so that it might be easiest to extract a block, average it, and copy the result into a result wave. The result wave could be created at the size it needs to be, rather than making it huge and then deleting the majority of it. If the actual data doesn't have NaNs, then the actual work can be done using MatrixOP.

Instead of an index with 1's and 0's, you could use a wave of point numbers containing the start of each block. That wave could be pre-computed based on the regular structure that you outline above. Is there some irregularity that I haven't understood?
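A precomputed table of block boundaries could be built from the phase lengths described above. A sketch, assuming the first cycle starts at row 0 and there are no gaps between phases; MakeBlockTable, phaseLen, blockStart and blockEnd are hypothetical names. With the durations as listed, one cycle is 600 + 600 + 1800 + 15 × (360 + 120) = 10,200 rows (170 minutes), so if the real four-hour cycle includes time not listed, that time would need its own entry in phaseLen.

```igor
// Hypothetical helper: first/last row of every phase block, from fixed phase lengths.
Function MakeBlockTable(nRowsTotal)
    Variable nRowsTotal

    // One cycle: A, B, C, then (D, E) x 15  ->  33 blocks
    Make/FREE/N=33 phaseLen
    phaseLen[0] = 600              // A
    phaseLen[1] = 600              // B
    phaseLen[2] = 1800             // C
    phaseLen[3,*;2] = 360          // D at indices 3, 5, ..., 31
    phaseLen[4,*;2] = 120          // E at indices 4, 6, ..., 32

    // Upper bound on the block count: the shortest phase is 120 rows
    Make/O/D/N=(ceil(nRowsTotal / 120) + 1) blockStart, blockEnd
    Variable row = 0, n = 0, len
    do
        len = phaseLen[mod(n, 33)]
        if(row + len > nRowsTotal)
            break                          // drop a trailing incomplete block
        endif
        blockStart[n] = row
        blockEnd[n] = row + len - 1        // inclusive last row
        row += len
        n += 1
    while(1)
    Redimension/N=(n) blockStart, blockEnd
End
```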

It would have been best if you posted the code behind averag(data_index, index, ncps_key,20).

If your key wave does not change between iterations then it seems to me that for each group of values that you need to average you could extract the submatrix containing all the rows that need to be averaged.  You can use MatrixOP subRange(w,rs,re,cs,ce) to do that.  If all you need is the straight average of columns data you can apply MatrixOP averageCols(subRange(w,rs,re,cs,ce)).  This should give you one row that you can store in your output matrix.
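The subRange/averageCols approach can be put in a loop that writes one output row per block into a result wave made at its final size. A sketch, assuming two hypothetical waves holding the first and last (inclusive) row of each block; skip plays the role of the 20 in the averag() call and is assumed to be shorter than the shortest block. AverageBlocks and the _blockAvg suffix are hypothetical:

```igor
// Hypothetical helper: mean of every column over each block of rows.
Function/WAVE AverageBlocks(data, blockStart, blockEnd, skip)
    WAVE data, blockStart, blockEnd
    Variable skip                  // rows to ignore at the start of each block

    Variable nBlocks = numpnts(blockStart)
    Variable lastCol = DimSize(data, 1) - 1
    Variable i, rs, re

    String outName = NameOfWave(data) + "_blockAvg"
    Make/O/D/N=(nBlocks, lastCol + 1) $outName
    WAVE result = $outName

    for(i = 0; i < nBlocks; i += 1)
        rs = blockStart[i] + skip
        re = blockEnd[i]
        // averageCols returns a single row holding the mean of each column
        MatrixOP/FREE avgRow = averageCols(subRange(data, rs, re, 0, lastCol))
        result[i][] = avgRow[0][q]
    endfor
    return result
End
```

This assumes the data contain no NaNs, since averageCols would return NaN for any column with a NaN in the block.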