DataFrame Equivalent

Hi,

Please feel free to show me the door if this is a nuts idea.

I am always concerned about data provenance and keeping related data together, and I do that with 2D waves where I can. I am also a big fan of data labels and use them as a poor man's data frame, since a 2D wave is of a single type. But there are times when I need two sets of text points, and this solution is not sufficient.

The idea is to use a special data folder as a data frame, with the contents being 1D waves all of the same length, and then use a modified syntax to address rows and columns. For example, if a row point is deleted, it is deleted from all the waves/columns.

example:

Student "folder" has waves (the equivalent of columns in a data frame) named Name, Age, Income, Weight, Height, ...

I can then reference it

Student.Name[] = str(......)
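To make the idea concrete, here is a minimal sketch in Python (not Igor code) of a "folder" of equal-length columns where deleting a row removes it from every column. The `Frame` class, its method names, and the sample student values are all hypothetical and made up for illustration.

```python
class Frame:
    """Hypothetical stand-in for a data folder used as a data frame."""

    def __init__(self, **columns):
        # Every column must have the same number of rows.
        lengths = {len(col) for col in columns.values()}
        if len(lengths) > 1:
            raise ValueError("all columns must have the same length")
        # Copy each column so the frame owns its data.
        self.columns = {name: list(col) for name, col in columns.items()}

    def delete_row(self, index):
        # Remove the same row from every column so they stay aligned.
        for col in self.columns.values():
            del col[index]

    def row(self, index):
        # Collect one value per column for the requested row.
        return {name: col[index] for name, col in self.columns.items()}


# Made-up sample data: columns of different types, one row per student.
student = Frame(
    Name=["Ann", "Bob", "Cy"],
    Age=[20, 22, 21],
    Height=[1.60, 1.82, 1.75],
)
student.delete_row(1)           # drops "Bob" from Name, Age, and Height
print(student.columns["Name"])  # ['Ann', 'Cy']
print(student.row(1))           # {'Name': 'Cy', 'Age': 21, 'Height': 1.75}
```

The point of the wrapper is only that row operations go through one place, so the columns cannot drift out of sync.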

Andy

Why would we show you the door for a nuts idea? That just might take all the entertainment from the party.

Could you provide an example in a realistic use case rather than an abstract one? Essentially, where could I, as someone who has no clue about data "frames", benefit by implementing this so-called data frame approach in my workflow within Igor Pro? Consider that I am reasonably conversant doing analysis on multiple data sets in the 1-D (spectroscopy) as well as 2-D (images) world.

Otherwise, what you are asking for appears to be an implementation of object-oriented language structures in Igor Pro. Or, you are asking something that is entirely over my head, in which case, please feel free to show me the door. :-)

SetDimLabel does support setting column labels... this sounds like what you are describing. Unless you need two sets of different row labels.

Like JJ, I'm not sure what a data frame is... and haven't looked it up.

Jeff

Hi,

In Python programming there is a library called pandas, and its main data structure is the DataFrame, which at a casual look is just a data table. Where it differs from its ancestors is that the data type of each column can be different (text, integer, float, boolean, ...), as opposed to all columns sharing a single type. It is the main repository for handling data, and it lets you create subsets for further processing and analysis in a variety of ways. For more info: https://realpython.com/pandas-dataframe/

As Jeff hinted, the major interest is in having multiple row labels. By extension, those row labels would then be used for creating a graph.

For example, suppose I had data with the material source as one wave (vendor1, vendor2, vendor3, vendor1, vendor3, ...), a second wave with a tool id (a, a, b, b, c, a, ...), and finally some property data such as strength (92, 96, 101, 1110, 93, ...).

The DataFrame is primarily there to make sure you have consistency across all the data fields, but once you have that you can more easily build analyses. For example, if I want to make box plots of strength vs. vendor and strength vs. tool_id, it would simplify handling those data flows. Today I have to split the strength wave into multiple waves, one for each value of vendor or tool_id, and also create another text wave with those unique names to set the labels for the box plot.
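The splitting step described above can be sketched in a few lines of Python (again, not Igor code). The helper name `group_by` is hypothetical, and the data are just the first five values of each wave quoted earlier in this post; the elided remainder is left out.

```python
from collections import defaultdict


def group_by(labels, values):
    """Split `values` into one list per distinct label, keeping row order."""
    if len(labels) != len(values):
        raise ValueError("labels and values must have the same length")
    groups = defaultdict(list)
    for label, value in zip(labels, values):
        groups[label].append(value)
    return dict(groups)


# First five rows of the example waves from the post.
vendor = ["vendor1", "vendor2", "vendor3", "vendor1", "vendor3"]
tool_id = ["a", "a", "b", "b", "c"]
strength = [92, 96, 101, 1110, 93]

by_vendor = group_by(vendor, strength)
by_tool = group_by(tool_id, strength)

print(by_vendor)  # {'vendor1': [92, 1110], 'vendor2': [96], 'vendor3': [101, 93]}
print(by_tool)    # {'a': [92, 96], 'b': [101, 1110], 'c': [93]}
```

The dictionary keys double as the box plot labels, so the separate text wave of unique names falls out of the same pass over the data.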

There is another stats program I use called JMP that follows this model, and it makes creating stats-based analyses very easy. It is $1200 per seat per year, and I would love not to have to pay for it.

Andy

I may not be seeing your entire idea, but...

Use a 2D text wave and num2str and str2num to place/retrieve numeric data.  You have access to as many columns as needed for the text identifiers.  Potential problems could include overhead in the num/str conversions as well as possible loss of numeric precision in the conversion.

Maybe you've considered and discarded this idea already.

Jeff

If I understand what you want the idea is this: Suppose you have a folder called mtrlsvalues containing two 2-D waves ...

TEXT MATRIX (header info)

source:  X  X  Y  Z  X  Y  Z
tool:    a  b  a  a  c  b  b
...:     ...

DATA MATRIX (numeric values)

yield:  ...
modulus:  ...
break: ...
...: ...

You want to be able to issue a command plot_fromDB("mtrlsvalues","yield","source") --> produce a plot of yield versus source using the data in folder mtrlsvalues.

johnweeks wrote:

Be aware that num2str has 6 digit precision.

I saw in the doc that a format specifier could be used to change the precision, if needed.  Could be a pain to use depending on Andy's gedanken code.

Heh. That format string is quite new! I wasn't aware of it. That solves a frequent source of tech support queries...
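For anyone wondering what the precision issue looks like in practice, here is a small Python illustration. It uses Python's `%g` formatting, which defaults to 6 significant digits, as a stand-in for a limited-precision number-to-string conversion. It is not Igor's num2str, and the test value is made up.

```python
value = 1234.56789  # made-up test value

# Default %g keeps 6 significant digits, so the round trip loses information.
short_text = "%g" % value
# An explicit precision in the format string keeps enough digits here.
long_text = "%.15g" % value

print(short_text)                 # 1234.57
print(float(short_text) == value)  # False
print(float(long_text) == value)   # True
```

So storing numbers in a text wave is workable only if every conversion uses a format with enough digits, which is the extra bookkeeping mentioned above.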

The beginnings of what you want might be a WAVE wave holding references to several 1D waves of various types. Apply dimension labels to the elements of the WAVE wave to provide the names for the various columns. There would be a need for lots of API support for such a thing...