Dimension labels

I am in the process of looking at the implementation and performance of dimension labels (DisplayHelpTopic "Dimension Labels"). I'm wondering how you folks use dimension labels, or if you don't, why not?

There are a few things dimension labels are good for:

  • Making a wave into something like a structured database by giving names to elements of data
  • Providing access to rows/columns/etc. that have special meanings without tying those elements to specific row/column/etc. numbers
    • Many statistics operations output a wave with row dimension labels to identify the outputs
    • User tick labels from waves use column dimension labels to identify the column containing tick type
    • Identifying a row or column to be used as a subrange for a graph trace
  • In certain cases, Igor uses dimension labels as a source of text: column titles in ListBox controls, tick labels for category plots, and (coming soon) text markers in graphs

What is your present use for dimension labels, if any?

At present, finding the element number that corresponds to a dimension label is slow, especially in large waves. Has that discouraged you from using dimension labels?

If you use dimension labels, do you typically label a small number of items (like 10 rows in a 1000-point wave, for instance)?

Dimension labels are Igor names (like wave names) and that means it's harder to make them like descriptive text or user-facing informational labels. Is that a problem for you?

 

Thanks for your help on this!

I should have asked:
Do you typically assign a label to every element of a dimension?

Note that that is a little bit different from "Do you typically label a small number of items?"

Hi,

I have been a big fan and user of dimension labels for a long time.

Some of my use cases.

1. Use instead of a Struct - Values are easier to look up. I typically make two waves, one for numeric and another for string values. I pass them to the relevant functions and access the values by dimension label. In the function I don't need to remember the value's index since I use the label. If I need to add another variable, I can just edit the wave and add the dimension label along with the value. I would like to be able to set editing dimension labels as the default behavior for wave editing. An example is when I use IP to create a front end for metrology tools and I need to store machine constants (text and numeric).

2. Use them as a poor man's DataFrame. Since waves can only have a single type and the data is usually numeric, I store the sample ID in the dimension label. That way I don't have to synchronize two waves, one numeric and one text. In this use case I would have a label for each row and usually one for each column. I often do this when preparing data for export to another software package such as JMP.

3. Speed issues - not a typical concern.  My work typically involves more proof of concept analysis and is not production constrained.  Most of my work is in Material Science where the data volume is more limited since the cost of each data point is extremely high.
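As a rough illustration of use case 1, a pair of labeled waves can stand in for a structure. This is only a sketch: the wave names, labels, and values below are invented for the example and are not taken from any real metrology setup.

```igor
Function DemoLabeledParameters()
	// Hypothetical machine constants; all names and values are made up.
	Make/FREE/D/N=2 numConst
	Make/FREE/T/N=2 txtConst

	// Name the rows once.
	SetDimLabel 0, 0, stageSpeed, numConst
	SetDimLabel 0, 1, dwellTime, numConst
	SetDimLabel 0, 0, userName, txtConst
	SetDimLabel 0, 1, toolID, txtConst

	// Write and read by label; row numbers never appear below this point.
	numConst[%stageSpeed] = 2.5
	numConst[%dwellTime] = 0.1
	txtConst[%userName] = "Andy"
	txtConst[%toolID] = "tool-01"

	Variable product = numConst[%stageSpeed] * numConst[%dwellTime]
	String who = txtConst[%userName] + " / " + txtConst[%toolID]
	Print product
	Print who
End
```

Adding another constant later only means adding a row and a label; code that reads the existing labels does not need to change.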

Hope that helps.

Andy

I also use dimension labels extensively (see below). Speed has been no concern so far. What I personally found a bit tedious:

1) Strings first need to be cleaned in specific ways (removing ;," etc.) before they can be fed into SetDimLabel. I have a dedicated function for this now. But this also needs to be done when matching input strings against dimension labels. I guess there is no way to accept arbitrary strings as dimension labels in the future?

2) A dedicated function for writing out dimension labels into a free text wave would be nice to have. Currently I use:

Make/FREE/T/N=(DimSize(data,dim)) labelStrings = GetDimLabel(data,dim,p)

3) Changing a label of data which is plotted somewhere breaks the plot and it is often not obvious what is wrong. Either a warning or automatic update of plots would be nice.
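For point 1, one possible approach is to route every string through the same cleaning step, both when setting and when searching. The sketch below assumes that the built-in CleanupName function in liberal mode is an acceptable cleaner; the rules for dimension labels may not match liberal object-name rules exactly, and empty input is not handled. `SetCleanDimLabel` and `FindCleanDimLabel` are hypothetical helper names.

```igor
// Set a label from arbitrary text, after cleaning it.
Function SetCleanDimLabel(Wave w, Variable dim, Variable index, String rawText)
	String cleaned = CleanupName(rawText, 1)	// 1 = liberal rules (assumed suitable here)
	SetDimLabel dim, index, $cleaned, w
End

// Look up arbitrary text, cleaned the same way so that it matches.
// Returns the element index, or -2 if no such label exists.
Function FindCleanDimLabel(Wave w, Variable dim, String rawText)
	String cleaned = CleanupName(rawText, 1)
	return FindDimLabel(w, dim, cleaned)
End
```

Keeping the cleaning in one place means the set and find paths cannot drift apart.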

I use dimension labels for:

  • Creating tables reminiscent of Excel with wave names for the rows and the column contents properly labeled.
  • Plotting data in a position-agnostic way via labels.
  • Accumulating results via a script where the dimension label is used as a unique identifier.
  • Labels as an additional way to store string data similar to Andy.
  • Recording dimension-specific information such as time or other parameters (this is a bit limited at the moment).
  • Storing the value of rows temporarily for backup (this is not ideal and loses precision of course, but there are cases where this is good enough such as for display purposes).

One little thing.

Since I use labels so frequently, I created a couple of functions that take a list and apply it to a wave, with a flag for the dimension to label. They are called "setlabels" and "setlabelsT", the first for numeric waves and the second for text waves.  I have a procedure file with all my common additions, and the real issue is that I forget to adopt the procedure file before sharing an experiment.

The reason for the two functions is that I don't have a clean way of declaring a wave in a function without knowing ahead of time whether it is numeric or text.  Is there a way to check the type of a wave in a function without first having to declare it?  If so, then it is easy enough to branch.

Andy

Andy-

I just wrote a small test function:

Function setlabels(Wave w)
	Variable i
	Variable npnts = numpnts(w)
	for (i = 0; i < npnts; i++)
		SetDimLabel 0, i, $("Row "+num2str(i)), w
	endfor
end

I made a text wave and passed it to this function (note the lack of "/T" on the Wave declaration). Igor happily added dimension labels to the wave even though it is the "wrong" type.

The message here is that at runtime, Igor knows the type of the wave, and as long as the compiled code isn't dependent on the numeric/text distinction, this kind of code is OK. If you tried to do arithmetic on that wave reference, a text wave would be a problem. But `SetDimLabel` doesn't care about the wave's type.
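Following the same reasoning, a single list-driven function could serve both numeric and text waves. This is a sketch under the assumption that the list is semicolon-separated and that its items are already legal dimension labels; `SetLabelsFromList` is a hypothetical name, not Andy's actual function.

```igor
Function SetLabelsFromList(Wave w, Variable dim, String labelList)
	// Label only as many elements as both the list and the dimension provide.
	Variable n = min(ItemsInList(labelList), DimSize(w, dim))
	Variable i
	String lbl
	for (i = 0; i < n; i += 1)
		lbl = StringFromList(i, labelList)
		SetDimLabel dim, i, $lbl, w	// does not depend on the wave's data type
	endfor
End
```

A call such as `SetLabelsFromList(myWave, 0, "alpha;beta;gamma")` would then label the first three rows, whether `myWave` is numeric or text.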

johnweeks wrote: I made a text wave and passed it to this function (note the lack of "/T" on the Wave declaration). Igor happily added dimension labels to the wave even though it is the "wrong" type.

That is interesting to know and helps my specific use case, though I am not a fan of "cheating the system" as a coding practice.

In general, I'm curious: is there a way to determine the wave type in a function and adjust accordingly?  Do I use try and catch, or is there a more graceful way?

Andy

You can pass the wave's full path+name to WaveInfo, and then get the wave type.  Not sure if that's more or less graceful than a Try-Catch.

 

FUNCTION Test()

	SetDataFolder root:
	
	Make/O/D/N=10 Numeric=p
	Make/O/T/N=10 Text=num2str(p)
	
	Print WaveInfo($"root:Numeric",0)
	Print WaveInfo($"root:Text",0)
	
END
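The string printed by WaveInfo is a key:value list, so the type can be pulled out with NumberByKey. A minimal sketch, assuming the NUMTYPE key reports 0 for non-numeric waves (text waves, but also wave-reference and data-folder-reference waves); `IsNonNumericWave` is a hypothetical helper name.

```igor
Function IsNonNumericWave(Wave w)
	// NUMTYPE is assumed to be 0 for any non-numeric wave, not only text waves.
	Variable numType = NumberByKey("NUMTYPE", WaveInfo(w, 0))
	return numType == 0
End
```

If the distinction between text and other non-numeric waves matters, a more specific test is likely needed.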

 

To answer John's original question, I don't currently make much use of them, though I can think of several functions where their use would greatly simplify code (e.g. functions that take as input a wave with a bunch of parameters).  However, most of the code that I have seen from colleagues doesn't make any use of dim labels, so I'm not sure how many people know about them.

One suggestion for making them easier to use/more visible to new users is to make it easier to add dim labels from tables.  If I make and edit a wave that does not have dim labels, there isn't an obvious way from any of the table menus to add that column to the table (at least in IP9); I instead have to run the extra command to show the dim label column.  

FUNCTION Test()

	Make/O/D/N=10 TestW=p
	
	Edit TestW		//No dimlabels column
	
	Edit TestW.l, TestW.d		//Makes a dimlabels column where I can edit dimlabels for each row
	
END

 

@Andy:

I work extensively with waves which can be both text or numeric, and I don't know beforehand which is which. This is no problem for many operations where the type does not matter, and it is easy enough to check the type after declaration. You can even do something like this:

WAVE   data = dataDF:wName
WAVE/T tdat = dataDF:wName
if ((WaveType(data,1) == 2))
	tdat[0] = "1"
else
	data[0] = 1
endif

Note that I declared the same wave twice and then simply use the appropriate one when necessary. As John already mentioned, many functions which do not assign something to the wave's contents don't care about the type declaration.

Also a big fan of dimlabels, here my applications:

1) Data handling/storage: Almost exclusively as 2D waves, typically chemical information stored in rows, sample codes in columns. In this scenario wave sizes can be between 10 to 30 rows and 1 to 1e6 columns. At least one dimension is fully labelled (well, it may contain blank rows/cols), the other is usually filled, although in some cases it may be empty. I use these 2D waves and dimlabels a lot for plotting data.

2) As mentioned by others, storage of globals in packages. Makes it very easy to review or store a given state.

3) Speed is usually not an issue. In many cases I use FindDimLabel to get the row/col index of interest and use this from there on.

 

There is one thing I really wish DimLabels could do to store additional metadata: Accept StringByKey-type constructs in standard format, i.e. containing ":", and ";". 

 

 

ChrLie wrote: There is one thing I really wish DimLabels could do to store additional metadata: Accept StringByKey-type constructs in standard format, i.e. containing ":", and ";". 

Yeah, I suppose it might be possible to add some sort of flag saying that the dimension labels aren't names. As it is now, they are names- they were first envisioned as a way to give a name to a piece of a wave, so something akin to a wave name. There are a few characters that you can't use even in liberal names because they cause problems in parsing the names.

I suppose we could add some option to say, "these labels aren't names". Seems like that might cause unforeseen problems, though.

ChrLie wrote: 1) Data handling/storage: Almost exclusively as 2D waves, typically chemical information stored in rows, sample codes in columns. In this scenario wave sizes can be between 10 to 30 rows and 1 to 1e6 columns.

Hm... I would think that labelling a dimension with 1e6 columns might have performance issues... In order to avoid "finding" the overall label first, Igor walks the list of labels backward, so finding the very first label is the slowest. If you are using the labels as a source of text, then the look-up is fast: it is simply an array index. But finding the index where a particular label is is slow, as it requires iterating over all the labels, from the end of the wave, comparing each label to the desired string.
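Since each label search walks the list of labels, one way to limit the cost is to resolve a label to its index once and then use the numeric index inside any loop. The following is a sketch, assuming a 2D numeric wave with column labels; `SumLabeledColumn` is a hypothetical helper name.

```igor
Function SumLabeledColumn(Wave w, String colLabel)
	// One linear search up front.
	Variable col = FindDimLabel(w, 1, colLabel)
	if (col < 0)	// -2: label not found; -1: label names the whole dimension
		return NaN
	endif

	Variable nRows = DimSize(w, 0)
	Variable i, total = 0
	for (i = 0; i < nRows; i += 1)
		total += w[i][col]	// plain numeric index; no label search per iteration
	endfor
	return total
End
```

Writing `w[i][%myLabel]` inside the loop would be more readable, but it may repeat the label search on every pass.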

I have used dimension labels only sparingly. I appreciate their utility as common but succinctly focused references. I'd prefer that their design stay exactly as it is. If anything is to be done about them, focus on improving search+retrieval functions on their content. 

For any updates that might be inclined toward adding database features, I'd prefer to avoid any ideas to expand on dimension labels to become what I might then call "database dumping grounds". I am not against expanding with database-like features in IP. Even down to the individual units (dimensions) in waves.

We have WaveNotes. Why not add a new meta-content object: DimNotes.

By reference to what I envision ...

I would appreciate being able to scroll through an image stack, view layer 13 in the stack, and run the equivalent of SetDimNotes{imagestack,2,13,"The explosion starts in this frame",[appendflag=0]}. Later, I would run a loop function string layerNotes += num2str(ic) + ": " + GetDimNotes{imagestack,2,ic} + "\r" to collect all notes, pop them into a notebook, and print the notebook for review. With some clever care, I could also include if (strlen(GetDimNotes{imagestack,2,ic}) != 0) to know when to attach the specific image stack layer to the notebook as well.

To further define my vision of database expansion in IP, I would also propose two new additional meta-content designations: WaveCode(functionname) and DimCode(wave,dim,index,functionname). They would allow users to attach specific functions to waves or units in waves. Why? I can only offer vague ideas and the nagging thought that someone at a higher user level than me probably has a fantastic use case for them.

In essence, as IP expands its approach to database behavior, don't change existing objects to act as expanded meta-content. Add new, specifically-focused meta-content objects where needed.

At the risk of being called for old-school top-posting ...

I will go further on the proposal to incorporate database type functionalities. Allow users to set their own meta-label in wave notes and dimension notes.

// new "database" functions

SetDimNote(wavename, dimension, index, notestr, [append], [tag])
string rtnStr = GetDimNote(wavename, dimension, index, [tag])

// - wavename: the name of the wave (in the current data folder)
// - dimension: 0 row, 1 column, ...
// - index: the position along the given dimension
// - notestr: the string for the note
// - append: optional 0 (default) to overwrite or 1 to append
// - tag: optional string functioning as per UDName in window userdata (blank is global location)

// In GetDimNote, using tag = "*" returns all notes in a key=value string list format
// example: "global=;PackageXYZ=this package set this note;"

// additional new "database" flag for wave notes

Note/T=tag/K/NOCR wavename, str
string notestr = note(wavename, [tag])

// As above, note(wavename,"*") returns all wave notes in a key=value string list format

// expanded "database" function for window notes

string udStr = GetUserData(winName, objID, "*")

// using "*" as above will return all user data associated with all userDataNames 
// for the given objID in the given winName (key=value string list format)

Give us these new kinds of "database" nuts and bolts, and we can work on building better machines from them.

I dare say that tag="*" could also be marketed as a REGEX pattern. But, as some would agree, doing so will likely only add more problems to the mix.

I'm back. These options make more sense.

SetDimLabel dimNumber, dimIndex, label, wavelist
SetDimNote/T=[tag]/K/NOCR dimNumber, dimIndex, notestr, wavelist // NEW
(SetWave)Note/T=[tag]/K/NOCR wavename, notestr // MODIFIED
SetWindow/W=... note([tag])=notestr  // MODIFIED
SetWindow/W=... userdata(UDName)=udstr
SetWindow/W=... label=label  // NEW

GetDimLabel(wavename, dimNumber, dimIndex)
GetDimNote(wavename, dimNumber, dimIndex, [tag])  // NEW
(GetWave)Note(wavename,[tag])
GetWindow/W=... note([tag])  // MODIFIED
GetWindowNote(winname, [tag])  // NEW
GetUserData(winname, objID, UDName)
GetWindowLabel(winname)  // NEW

FindDimLabel ...
FindWaveLabel ...  // NEW
FindWindowLabel ...  // NEW
// optional
// FindDimNote ...
// FindWaveNote ...
// FindWindowNote ...

CopyDimLabels ...
CopyWaveLabels ...  // NEW
CopyWindowLabels ...  // NEW

// tag, label, and UDName are kept to strict 255 byte string limits
// using "*" in any of these indicators in the GetXYZ functions returns all as key=value string list
// duplicating waves or windows copies over their notes

 

johnweeks wrote:
ChrLie wrote: 1) Data handling/storage: Almost exclusively as 2D waves, typically chemical information stored in rows, sample codes in columns. In this scenario wave sizes can be between 10 to 30 rows and 1 to 1e6 columns.

Hm... I would think that labelling a dimension with 1e6 columns might have performance issues... In order to avoid "finding" the overall label first, Igor walks the list of labels backward, so finding the very first label is the slowest. If you are using the labels as a source of text, then the look-up is fast: it is simply an array index. But finding the index where a particular label is is slow, as it requires iterating over all the labels, from the end of the wave, comparing each label to the desired string.

 

In my individual cases, if the DimSize in one dimension is large (in the thousands and higher) this dimension has usually no labels. These are often datasets from data bases or they are simulated and sample names/codes are omitted or not applicable. Finding row/col indexes based on DimLabels is usually restricted to the "short" dimension. This is probably why I don't run into speed issues. 

 

KZarzana wrote: One suggestion for making them easier to use/more visible to new users is to make it easier to add dim labels from tables.  If I make and edit a wave that does not have dim labels, there isn't an obvious way from any of the table menus to add that column to the table (at least in IP9); I instead have to run the extra command to show the dim label column.  

The command to edit the dimension labels can combine the "l" and the "d": `Edit MyWave.ld`.

If you have a table displaying MyWave, but it doesn't show the dimension labels, and you want to use the dialog, do this:

Table->Append Columns to Table
In the dialog, turn on the radio button on the right, "Edit dimension label and data columns"
Now you can select MyWave from the list of waves.
The generated command is `AppendToTable MyWave.ld`

One thing that annoys me is that if your wave is a matrix, and you display it in a table with the dimension labels, you can edit the row labels in place, but the column labels put up a dialog to get a dimension label. I figured out a few days ago that you can use the chasing arrows button in the table to transpose the rows and columns in the table display, and that allows you to edit the column labels in place.

@johnweeks Thanks for the tip.  I've gotten so used to being able to just toss waves from the data browser into a table that I completely forgot about the append columns menu option! 

Related to that, it would be nice to be able to transpose the table as a whole and not only the orientation of a single 2D wave. Case in point: Showing a bunch of 1D waves next to each other, where the values are small but the dimension labels (or the wave names) are long. Instead of

  • very_long_labelA  very_long_labelB very_long_labelC ...
  •           1                             2                           3
  •           4                             5                           6

one could then display:

  • very_long_labelA  1 4
  • very_long_labelB  2 5
  • very_long_labelC  3 6
johnweeks wrote: One thing that annoys me is that if your wave is a matrix, and you display it in a table with the dimension labels, you can edit the row labels in place, but the column labels put up a dialog to get a dimension label. I figured out a few days ago that you can use the chasing arrows button in the table to transpose the rows and columns in the table display, and that allows you to edit the column labels in place.

I left a comment about this a few years ago. The dialog that pops up is a bit annoying (at least for 2D waves). Instead, I often use MatrixTranspose, executed twice.

For everyone interested in database functionality, I really recommend using MySQL.  We ship an XOP that lets you communicate with a MySQL database, which has better functionality and performance than anything that we could add on top of standard Igor waves.

AG

Let me rein in my overly enthusiastic and perhaps off-topic comments on database functions in IP to address the focused questions that John asked.

* I use dimension labels as a way to "tag" the dimension in general categories (called time, temperature, species ...) or as specific labels in each case (e.g. "C=O peak" for specific analysis values on a set of FTIR spectra).

* I find no difficulties with speed when I use the labels for "look-up" behavior (e.g. plot label temperature versus label time).

* I don't restrict myself to labeling only a certain number of items. My use case is typically in the 10's of items, not 100's or more.

* I do not face any issues because labels cannot be notes. I do not view them this way. I do wish however that we could include expansive notes on dimensions.

Ultimately, I worry that my applications using dimension labels will inadvertently conflict with requirements from other users demanding to have dimension labels. I use them sparingly for this reason, and this reason alone. I fear conflicts will certainly arise when labels could also be used as (expansive) notes. I suggest therefore that we should have a DimNotes field in addition to a DimLabel field. I also fear conflicts will arise when labels might become more widely required as meta-content in different Igor Pro analysis packages. I suggest therefore that DimLabel should include a "tag" character. By comparison, we can set namespace limits on userData(UDName). Consider also allowing "namespace" tag designations on dimension labels (and eventually also on dimension notes) as per SetDimLabel/T=[tag] ...

We have no plans to change the role of dimension labels as names for bits of waves. A dimension label is, and always will be, a name. That means they are limited to 255 bytes unless we further expand the length of all names. It also means that certain characters that you might want in a dimension label used as text can't be used, like a semicolon. The convenience of using dimension labels as a way to associate text with wave elements is too much to pass up in certain cases, but they will always remain fundamentally names.

I'm not quite sure what you mean by a "tag character". Does that include the "%" that is used when you want to refer to a part of a wave by its dimension label?

> It also means that certain characters ... can't be used ... they will remain always fundamentally names.

I see my confusion. Expanding their length is not the same as making them into illiberal free-text note fields. Thanks for the clarification.

> I'm not quite sure what you mean by a "tag character". Does that include the "%" that is used when you want to refer to a part of a wave by its dimension label?

Hmm. I had not thought about this aspect. I only thought about it with this example.

Suppose that I do this in an image stack ...

SetDimLabel 2, 13, to_review, myimage

So that I can also do this ...

variable to_reviewN = FindDimLabel(myimage,2,"to_review")

Suppose I also use an image processing package that, unbeknownst to me, at some point does this

SetDimLabel 2, -1, stack_layers, myimage

One way I see to get around this conflict is ...

SetDimLabel/T=JJW 2, 13, to_review, myimage

variable to_reviewN = FindDimLabel(myimage,2,"to_review","JJW")

Perhaps mirroring how we must refer to modules and sub-windows, the reference to a label under a specific tag might be accomplished via one or the other of these notations.

variable vval = myimage[1][1][#JJW%to_review] // JJW private tag label
variable vval = myimage[1][1][%JJW#to_review] // JJW private tag label
variable vval = myimage[1][1][%to_review] // global label -- FAILS (the label does not exist there)
display myimage[1][1][%stack_layers] // global label

Ah- you want a private namespace for your labels... All I can say is, ain't gonna happen! Sorry.

I would point out that there is no actual conflict in the labels in your example, since `SetDimLabel 2, -1, stack_layers, myimage` uses the overall label and that's different from `SetDimLabel/T=JJW 2, 13, to_review, myimage` which sets the label for layer 13. But the concern about conflicting use of dimension labels is real. Short of allowing multiple separate sets of dimension labels as you suggest, there is no real solution, except to be careful how you use your waves. That's always a concern, of course.
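The difference between the overall label (index -1) and an element label can be seen in a short sketch. The wave dimensions here are arbitrary and the function name is hypothetical.

```igor
Function DemoOverallVersusElementLabel()
	Make/FREE/N=(4, 4, 20) stack

	// Index -1 labels the layer dimension as a whole.
	SetDimLabel 2, -1, stack_layers, stack
	// Index 13 labels one specific layer.
	SetDimLabel 2, 13, to_review, stack

	// The two labels are stored separately and do not overwrite each other.
	Print GetDimLabel(stack, 2, -1)
	Print GetDimLabel(stack, 2, 13)
	Print FindDimLabel(stack, 2, "to_review")
End
```

A real conflict would arise only if two pieces of code wrote different labels to the same index of the same dimension.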

> Ah- you want a private namespace for your labels... All I can say is, ain't gonna happen! Sorry. ...  But the concern about conflicting use of dimension labels is real. ... be careful how you use your waves.

Glad I could clarify. I'll certainly continue to consider my approaches with dimension labels carefully.

I offer that having a DimNotes method in parallel to DimLabels could offer developers such as me an option to avoid conflicts. If private namespaces will not be possible in dimension notes either, they could at least be mimicked, for example using a YAML or JSON or HTML or MD prefix. Consider this example below of a dimension note for myimage[][][13]. The first and last lines are global. The middle two are in my private namespace by the fact of having a markdown tag prefix.

this dimension shows stack layers
#JJW to_review[2026-02-19|19:43]
#JJW the explosion happens at this layer 13
the scale on this dimension is 0.10 ms per step

I might add one thing regarding DimLabels:

We have Concatenate/DL, but unfortunately there is no SplitWave/DL. I'm not sure about complications for higher dimensions (3D, 4D) and the /SDIM flag, but for 2D it would be very useful.

 

I assume you are suggesting that SplitWave would use dimension labels to name the new waves.  If that is the goal, how do you plan to name new waves when waves named after the dimension labels already exist in the current DF?

I would assume the /O flag handles this. So basically I suggest to have a shortcut to the example scenario below, which assumes that a 2D wave with column labels is split.

function Split2DWaveIntoColumns(wave w, int o)
	// if o!=0 ok to overwrite
	
	String NextLabel, NameList=""
	int i, n=0
	
	for(i=0; i<DimSize(w,1); i++)
		NextLabel=GetDimLabel(w,1,i)
		
		// if DL is empty use default naming
		if(strlen(NextLabel)==0)
			NameList += NameOfWave(w)+"_"+num2str(n)+";"
			n++
		else
			NameList += NextLabel +";"
		endif
	endfor
	
	if(o != 0)
		Splitwave/O/Name=NameList w
	else
		Splitwave/Name=NameList w 
	endif
end