How can we efficiently import data with varying numbers of columns?

If the lengths of each line are as small as in the example I would suggest you load them as a single column text wave and treat each line of the text wave as a comma separated string list. Something like LoadWave/O/N=LoadedTextWave /K=2 /L={0, 0, 0, 0, 1} /Q /J FullFileName should probably do it. Also have a look at StringFromList

The only issue would be if the comma separated strings become very long, because that would make the StringFromList calls very slow

Log in or register to post comments

January 31, 2022 at 11:40 am - Permalink

lorien

Hi,

I really appreciate your suggestions.

When loading my data with your suggested LoadWave tags, only the first column (the datetime information) is loaded into the text wave, and the rest of the data is not loaded in. I believe this is because we specified a single column with L={0,0,0,0,1} and we specified that the data was delimited text with /J.

I tried removing the /J tag but this causes Igor to be confused about the file type - I get the following error: "Function Execution Error: While executing LoadWave, the following error occurred: can't handle this type of file".

It's just strange to me that Igor wouldn't be able to load a file like this. I'll keep trying to troubleshoot and let you know if I find a solution.

Thanks again!

Log in or register to post comments

January 31, 2022 at 01:43 pm - Permalink

jtigor

The following worked for me. Note the the file name is not shown here.

LoadWave/J/D/K=0/V={"\t"," $",0,0}

For a test wave where data rows 2 & 3 consisted of 3 elements and the others of 4, the wave looked like this:

print wave2

  wave2[0]= {"Reading,Load,Travel,Time","1,0.000,0.000,0.000","2,0.000,0.000","3,0.000,0.000","4,0.000,0.000,0.098",""}

Log in or register to post comments

January 31, 2022 at 02:47 pm - Permalink

hrodstein

This kind of data does not fit neatly into Igor's wave concept which is rigid compared to a spreadsheet which is free form.

The only option for text data that includes non-numeric data (date/time, text such as "CH3I_Oven") is LoadWave/J (Load Delimited Text).

Let us call the number of columns to be loaded "n".

In "auto" mode (you omit /L or specify 0 for the number of columns to load) LoadWave/J determines n from the first line of data. In "explicit mode" (you include /L and specify a non-zero value for the number of columns) you explicitly specify n using /L.

If a row has fewer than n columns, LoadWave treats the corresponding cell as blank (NaN). If a row has more than n columns, LoadWave ignores the extra columns.

Given that and the nature of your data, I would load the rows one-at-a-time using LoadWave/J and specifying one row using /L and letting n be auto. After loading each line, the variable V_Flag will be set to the number of waves load and you can determine how to handle that line using that value or from the instrument ID. You will have to put the data together in whatever format works for you.

Log in or register to post comments

January 31, 2022 at 05:58 pm - Permalink

hrodstein

Another way to do this would be to create a modified version of the file in which an additional first line is added. This first line would be guaranteed to have as many columns as the longest subsequent line. Then load the file without /L. The first line would control the number of columns loaded. Rows with fewer columns would result in blanks.

I created a file containing this data which starts with a 9-column dummy line that I added:

1904-01-01 00:00:00, Dummy, 1, 2, 3, 4, 5, 6, 7
2022-01-17 16:10:01, CH3I_Oven, 40.0, 31.0, 0.0, 0
2022-01-17 16:10:01, Catalyst, 350.0, 349.9, 65.0, 1
2022-01-17 16:10:02, HMP110, 5.143088, 30.871475, -11.390093, 1.635575, 1.411919, 12.576250, 34.792370
2022-01-17 16:10:02, Int_Std_Oven, 38.0, 38.0, 65.0, 1
2022-01-17 16:10:02, Acetate_Oven, 50.0, 49.9, 23.0, 1
2022-01-17 16:10:02, CH3I_Oven, 40.0, 31.0, 0.0, 0
2022-01-17 16:10:02, Catalyst, 350.0, 349.9, 65.0, 1
2022-01-17 16:10:02, HMP110, 5.143088, 30.871475, -11.390093, 1.635575, 1.411919, 12.576250, 34.792370
2022-01-17 16:10:02, Int_Std_Oven, 38.0, 38.0, 64.8, 1
2022-01-17 16:10:03, Acetate_Oven, 50.0, 49.9, 21.8, 1

I then successfully loaded it with this command:

LoadWave/J/D/A=wave/E=1/K=0

You would then have to delete the first row from each of the loaded waves either manually via the table or using a function like this:

Function DeleteFirstRow(listOfWaveNames)
    String listOfWaveNames  // Semicolon-separated list of wave names
    
    int numWaves = ItemsInList(listOfWaveNames)
    int i
    for(i=0; i<numWaves; i+=1)
        String name = StringFromList(i, listOfWaveNames)
        WAVE w = $name
        DeletePoints 0, 1, w
    endfor
End

The LoadWave operation sets the string variable S_waveNames to a semicolon-separates list of the names of the waves loaded so you can invoke DeleteFirstRow like this:

DeleteFirstRow(S_waveNames)

Log in or register to post comments

January 31, 2022 at 06:34 pm - Permalink

lorien

@jtigor, thanks for the suggestion, but I think your solution only works in the case where there are no rows of data with more columns than the very first row.

@hrodstein, thanks for your response! It's super clear and precise. I think I will try to implement your first suggestion. The second suggestion also seems smart, but since I'm trying to create a function that can be shared with coworkers I'd prefer it to be flexible and a "one-step" solution. Therefore, loading each row individually seems the way to go.

I appreciate everyone's input!

Log in or register to post comments

February 1, 2022 at 08:59 am - Permalink