Sort with incomplete lists

Hello! I have datasets that I would like to average. Unfortunately, they are not all the same length; the length depends on measurement quality (electrophysiology). So I may have V_rev_day1, V_rev_day2, and so on, and I have text waves W_cond_day1, W_cond_day2, ..., containing text about the solutions applied. These lists are sorted alphabetically, but it may be that on day1 I could measure cond1, cond2, and cond4, while on day2 I could measure only cond2 and cond4. What is the nicest way to sort the waves so that, on the one hand (for display), conditions that were not measured give a NaN in V_rev and, on the other hand, I can average them properly? Yours, Dominik

johnweeks

I believe the Waves Average package (see Analysis->Packages->Average Waves) will handle waves that have NaNs in them. By "handle" I mean that if you have 5 waves and one has NaN in a row, the average for that row will ignore the NaN and give you the average of 4 values.
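To illustrate what "ignoring the NaN" means in practice, here is a minimal sketch of a row-by-row average that skips NaNs. It is not the package's own code; the function name AverageIgnoringNaN and the output wave name W_avg are made up for this example, and it assumes all input waves are 1D, live in the current data folder, and have already been padded to the same length.

```igor
// Hypothetical helper (not part of the Average Waves package):
// average equal-length 1D waves row by row, skipping NaN entries.
Function AverageIgnoringNaN(list)
	String list	// semicolon-separated wave names, e.g. "V_rev_day1;V_rev_day2;"

	Variable nWaves = ItemsInList(list)
	Wave first = $StringFromList(0, list)
	Variable nRows = numpnts(first)

	// Output wave (assumed name); rows with no valid data stay NaN
	Make/O/D/N=(nRows) W_avg = NaN

	Variable row, i, total, count
	for (row = 0; row < nRows; row += 1)
		total = 0
		count = 0
		for (i = 0; i < nWaves; i += 1)
			Wave w = $StringFromList(i, list)
			if (numtype(w[row]) == 0)	// 0 = normal number, so NaN and Inf are skipped
				total += w[row]
				count += 1
			endif
		endfor
		if (count > 0)
			W_avg[row] = total / count
		endif
	endfor
End
```

With five waves and a NaN in one of them at a given row, count is 4 for that row, so the result is the average of the four remaining values.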

September 11, 2018 at 03:40 pm

_sk

Is v_rev_day1 a wave that stores cond1, cond2, etc. along its columns or rows?

If so, the best approach would be to set dimension labels for the respective conditions and then use finddimlabel or getdimlabel to check whether a particular label exists as you iterate over the conditions while averaging.

•make/o/n=(3,2) w_test = p+q
•setdimlabel 1, 0, cond1, w_test
•setdimlabel 1, 1, cond2, w_test
•print finddimlabel(w_test, 1, "cond1")
  0
•print finddimlabel(w_test, 1, "cond2")
  1
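
For a condition that was not measured, finddimlabel returns -2 instead of an index, so the existence check can be wrapped in a small helper. This is only a sketch; the name HasDimLabel is invented for this example.

```igor
// Hypothetical helper: returns 1 if the wave has an element labeled "name"
// along dimension dimNumber (0 = rows, 1 = columns), otherwise 0.
Function HasDimLabel(w, dimNumber, name)
	Wave w
	Variable dimNumber
	String name

	// FindDimLabel returns -2 when the label is not found
	return FindDimLabel(w, dimNumber, name) >= 0
End
```

With the w_test wave from above, HasDimLabel(w_test, 1, "cond1") should give 1, and a label that was never set, such as "cond3", should give 0.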

best,

_sk

September 12, 2018 at 01:03 am

d_lenz

I attach an example file in two versions. It contains two datasets, one more complete than the other; in the second version I manually inserted points where they were missing. Maybe that makes the problem clearer. I would have a W_Sol_[date] wave that contains all possible conditions, so I could sort along this wave; let's call it W_Sol_basis. But I want NaN in the less complete W_Sol_07_12_06 and in the respective W_Vrev_[date] and W_Vzero_[date] rows wherever an entry of W_Sol_basis does not exist in the W_Sol_[date] wave.

So, I want to get from "Ex_data_before.pxp" to "Ex_data_after.pxp" automatically.

(I don't really understand what the dimlabel actually does.)

Yours, Dominik

Attachments Example data after manual correction (14.15 KB) Example data before manual correction (5.51 KB)

September 13, 2018 at 12:41 am

_sk

The setdimlabel was for data regularization. That said, I would use a regularized standard for the date, i.e. ISO 8601: 20180913 (YYYYMMDD), where single-digit numbers are always preceded by a zero.

If there is no row or column in place of (in your example) ex_0c, then there cannot be a NaN or any other value there, because the storage location for this NaN does not exist in the memory holding your wave. This again points to data regularization.

So, if I understand you correctly, I recommend creating a solution wave, w_sol_YYYYMMDD, that has all fields already accounted for, regardless of whether a given experiment was carried out on that day, and initializing the wave to NaN upon creation, something like this:

make/o/n=(6,2) w_sol_20071206 = nan
make/o/n=(6,2) w_sol_20080703 = nan

You can choose to label the rows and columns for easy addressing, or remember which index corresponds to which experiment/condition:

setdimlabel 0, 0, acetat, w_sol_20071206
setdimlabel 0, 1, cid100, w_sol_20071206
setdimlabel 0, 2, ex0a, w_sol_20071206
setdimlabel 0, 3, ex0b, w_sol_20071206
setdimlabel 0, 4, ex0c, w_sol_20071206
setdimlabel 0, 5, sulfat, w_sol_20071206
setdimlabel 1, 0, rev, w_sol_20071206
setdimlabel 1, 1, zero, w_sol_20071206

Then address fields like so:

•print w_sol_20071206[%acetat][%rev]
  -10.0742
// equivalent to
•print w_sol_20071206[0][0]
  -10.0742
// trying an empty field
•print w_sol_20071206[%cid100][%zero]
  NaN

Once your data is in shape, it is all about what you want to do: conditional statements, sorting, summation, etc.

edit:

btw, if you create a base wave, like you suggested, and set the dimension labels to the corresponding conditions, any duplicate of the base wave will carry over the labels as well; in other words:

make/o/n=(6,2) w_sol_base = nan

setdimlabel 0, 0, acetat, w_sol_base
setdimlabel 0, 1, cid100, w_sol_base
setdimlabel 0, 2, ex0a, w_sol_base
setdimlabel 0, 3, ex0b, w_sol_base
setdimlabel 0, 4, ex0c, w_sol_base
setdimlabel 0, 5, sulfat, w_sol_base
setdimlabel 1, 0, rev, w_sol_base
setdimlabel 1, 1, zero, w_sol_base

duplicate/o w_sol_base, w_sol_20180913

print w_sol_20180913[%acetat][%rev]
  NaN
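
To get from an incomplete day to the regularized layout, one possible approach is to look up each measured condition by its row label and copy the value into the NaN-filled duplicate. The following is a rough sketch under some assumptions: the function name FillByLabel is made up, the day's conditions sit in a text wave with the values in a numeric wave in the same order, and the strings in the text wave match the row labels of the template.

```igor
// Hypothetical helper: copy one day's values into a labeled, NaN-filled
// template, matching each condition name to a row label of the template.
Function FillByLabel(condNames, values, target, col)
	Wave/T condNames	// conditions measured that day (text wave)
	Wave values		// measured values, same order as condNames
	Wave target		// labeled 2D template, e.g. a duplicate of w_sol_base
	Variable col		// column to fill (0 = rev, 1 = zero in the layout above)

	Variable i, row
	for (i = 0; i < numpnts(condNames); i += 1)
		row = FindDimLabel(target, 0, condNames[i])
		if (row >= 0)
			target[row][col] = values[i]
		else
			// FindDimLabel gave -2: the template has no such row label
			Print "No row labeled", condNames[i], "in", NameOfWave(target)
		endif
	endfor
End
```

Usage could look like this, where W_Sol_07_12_06 and W_Vrev_07_12_06 stand for one day's condition and value waves (names assumed from your description):

```igor
Duplicate/O w_sol_base, w_sol_20071206
FillByLabel(W_Sol_07_12_06, W_Vrev_07_12_06, w_sol_20071206, 0)
```

Rows for conditions that were not measured on that day are never written, so they keep the NaN from the base wave.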

best,

_sk

September 13, 2018 at 01:56 am

d_lenz

Dear _sk,

I'm halfway done, and I'm sure it will work this way. :) Just hoping that Pareto will shut up this time. ;)

Yours, Dominik

Edit: Yes, it worked, see screenshot :)

Key parts are:

	// label each row of T_sol, fill_rev and fill_zero with the condition name stored in T_sol
	for(dim=0;dim<=(exp_max-1);dim+=1)
		sol_str=T_sol[dim]
		setdimlabel 0,dim, $sol_str, T_sol,fill_rev, fill_zero
	endfor

and

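	for(dim=1;dim<=(exp_max-1);dim+=1)
		sol_str=dum_sol[dim]
//		print sol_str
		// label row dim of the day's wave with its condition name
		setdimlabel 0,dim, $sol_str, $trans_str
		dim_str= getdimlabel ($trans_str, 0, dim)
		// row of this condition in the complete list (T_sol) and in the day's wave
		dim_set=finddimlabel(T_sol, 0,dim_str)
		dim_find=finddimlabel($trans_str, 0,dim_str)
//		print dim_str, dim_set
//		print dum_rev
		// copy the measured value into the matching row of fill_rev
		fill_rev[dim_set]=dum_rev[dim_find]
	endfor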

(Ignore the 0 in the first and the 1 in the second snippet; that's just due to the input wave format.)

Big thank you! :D

Attachments screen_1.PNG (19.13 KB)

September 13, 2018 at 05:18 am

_sk

I am glad you managed to make it work.

I must say that I don't understand your code, but as long as _you_ know what it does, it's okay.

best,

_sk

September 13, 2018 at 07:28 am