Load data in lines

Hello everyone,

I'm working on a project where data is generated for many different systems and while I have been able to manage with most of the files, this one format (shown below) has me stumped. I wasn't involved in the format schemes and have no control over them. I would be extremely grateful if someone here would help me figure out how to read in this data!

- There are 5 header lines I would like to ignore, except that I do want to take the run number (line 3) from it to use as a prefix for the wavenames
- Each line after the header is the equivalent of one wave, starting with
-- the date/time in the format yyyy-mm-dd_hh:mm:ss.ddd_UTC
-- the "unit" number, i.e. 0, 1 etc
-- the actual data values

# AcqX
# 2011-04-22_12:36:13.390_UTC
# Run 33
# Frequency data (New software)
# Date_Time Iteration Data (4000 points at 100.000000Hz)... Before Precession = 52 s. Precession duration = 40 s.
2011-04-26_19:34:39.342_UTC 0 33162 32345 31586 31055 30877 31092 31653 32426 33239 33899 34257 34231 33826 33135 32319 31563 31045 30879 31107 31677 .....
2011-04-26_19:36:15.130_UTC 1 34111 34747 34889 34499 33670 32589 31510 30681 30289 30428 31062 32047 33155 34131 34749 34871 34471 33633 32559 31490 .....

Many thanks!
I'm not sure if the built-in LoadWave and friends can handle this natively. I assume someone from WaveMetrics will chime in on that.

Here's one solution:

- Open the file with Open.

- Set up a for loop that calls FReadLine refNum, lineContents and breaks when FReadLine leaves lineContents empty, which signals the end of the file (lineContents is the name of a string variable that you declare). Do nothing with the first 5 lines.

- Inside the for loop, get the number of entries in each line by calling ItemsInList(lineContents, " "). Subtract 2 from this to avoid counting the two leading entries, the date/time and the unit number (or you can parse them if you like).

- Allocate a wave with the appropriate number of points.

- Then set up another for loop inside the first one that loops over all the items. Use StringFromList(j, lineContents, " "), where 'j' is the second loop counter. Note that this is not very efficient but it should get the job done. For increased speed you can try looping over the string directly, though whether it's faster or not is hard to predict. For each item call str2num and store it in the wave.

- Repeat this for all the lines in the file, each time making a new wave. Don't forget to close the file at the end using Close.

For my own convenience I'm assuming that you have some experience with Igor programming, and that these instructions make sense. If not then just let us know.
Here is a solution. It uses the "Load Fortran File" format (LoadWave /F). This requires that each data column occupy the same number of digits, as was the case in the sample data you posted.

NOTE: You will need to set the variable numColumns to the right number. It needs to be set to the number of data columns plus five. If this number varies from file to file then the function will need to be modified to count the number of data columns.

// Loads file of this format:
// # AcqX
// # 2011-04-22_12:36:13.390_UTC
// # Run 33
// # Frequency data (New software)
// # Date_Time Iteration Data (4000 points at 100.000000Hz)... Before Precession = 52 s. Precession duration = 40 s.
// 2011-04-26_19:34:39.342_UTC 0 33162 32345 31586 ...
// 2011-04-26_19:36:15.130_UTC 1 34111 34747 34889 ...
// The run number is loaded into a global variable named runNumber in the current data folder.

Function LoadNeutronData(pathName, fileName)
    String pathName         // Name of Igor symbolic path or "" for dialog
    String fileName             // File name or "" for dialog
    // Print columnInfoStr  // For debugging only
   
    Variable firstRow = 5
    Variable numColumns = 25
    Variable defaultFieldWidth = 6
   
    String columnInfoStr = ""
    columnInfoStr += "N=UTCDate,F=6,W=10;"      // The date
    columnInfoStr += "N='_skip_',F=-2,W=1;"     // Skip underscore after the date
    columnInfoStr += "N=UTCTime,F=7,W=12;"      // The time
    columnInfoStr += "N='_skip_',F=-2,W=5;"     // Skip "_UTC" and space after the time
    columnInfoStr += "N=unit,F=0,W=2;"          // Unit number
    columnInfoStr += "C=100,F=0,W=6;"           // Handles the remaining columns
   
    LoadWave /F={numColumns,defaultFieldWidth,0} /R={English,2,2,2,2,"Year-Month-DayOfMonth",40} /O /L={0, firstRow, 0, 0, 0} /B=columnInfoStr /A=data /E=1 /P=$pathName fileName  

    // Combine date and time into date/time
    Wave UTCDate, UTCTime
    UTCDate += UTCTime
    RemoveFromTable UTCTime
    KillWaves /Z UTCTime                        // No longer needed
    ModifyTable format(UTCDate)=8, showFracSeconds(UTCDate)=1, width(UTCDate)=180
   
    // Load the run number
    String filePath = S_path + S_fileName       // Set by LoadWave
    String text
    Variable refNum
    Open /R refNum as filePath
    FReadLine refNum, text                      // Skip first line
    FReadLine refNum, text                      // Skip second line
    FReadLine refNum, text
    Variable/G runNumber                        // Creates global variable
    sscanf text, "# Run %d", runNumber          // Sets global variable
    Close refNum   
End

Function LoadNeutronDataDialog()
    LoadNeutronData("", "")
End
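
The NOTE above says the function would need to count the data columns if that number varies from file to file. A minimal sketch of such a count follows. CountNeutronColumns is a hypothetical helper, not part of the posted solution. It assumes the file has 5 header lines, that entries on a data line are separated by exactly one space, and that there is no trailing space before the line terminator.

```igor
// Hypothetical helper: returns a value suitable for numColumns in LoadNeutronData.
// Assumes 5 header lines and single-space separators with no trailing space.
Function CountNeutronColumns(filePath)
    String filePath                 // Full path to the data file

    Variable refNum, i
    String text

    Open /R refNum as filePath
    for (i = 0; i < 6; i += 1)      // 5 header lines, then the first data line
        FReadLine refNum, text
    endfor
    Close refNum

    if (strlen(text) == 0)
        return NaN                  // The file has no data line
    endif

    // A data line holds the date/time entry, the unit number, then the data values.
    Variable numDataColumns = ItemsInList(text, " ") - 2

    // The fixed-field description splits the date/time entry into four fields
    // and adds the unit number, hence "data columns plus five".
    return numDataColumns + 5
End
```

The result could replace the hard-coded 25, for example by calling the helper with the full file path before LoadWave runs. That requires knowing the path up front, so it fits best when pathName and fileName are not empty.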

Thank you so much!

I just tested the code from you, hrodstein, and I've realized that I didn't describe the situation properly.
All the points in an individual line (about 4000 of them in this case) make up one wave. In the extract I showed, there are then 2 waves, each with 20 points in the excerpt (4000 in reality). For example, wave 0 started at 2011-04-26_19:34:39.342_UTC, and has data points
33162
32345
31586
..
..

My apologies for not being clearer. Ideally I would try 741's suggestion on my own, but I'm a little pressed for time. :-(
Here you go. Copy and paste this into the procedure window (press command-M or control-M). I've added a new menu entry to the macros menu, just select that, choose a file, and it will load a set of waves with the names W_Neutron0, W_Neutron1, etc.

Menu "Macros"
    "Load Neutron", /Q, LoadNeutron()
End

constant kNeutronHeaderLength = 5   // number of lines to skip at the beginning of the file
constant knEntriesToSkip = 2 // number of entries to skip at the beginning of a line
                                    // entries are separated by spaces, so "2011-04-26_19:34:39.342_UTC"
                                    // is a single entry
constant knTrailingSpaces = 0   // IMPORTANT: set this to the number of trailing spaces at the end of the line

Function LoadNeutron()
   
    variable refNum
   
    // this selects the file but does not open it
    Open /R/D /F="All Files:.*;" refNum
    if (strlen(S_fileName) == 0)
        // user cancel
        return 0
    endif
   
    // open the file
    Open /R refNum as S_fileName
   
    string lineContent
    variable i
   
    // the first lines are header only, skip these
    for (i = 0; i < kNeutronHeaderLength; i+=1)
        FReadLine refNum, lineContent
    endfor
   
    // the main loop
    // read all lines separately. Each line is a full wave
    variable nValues, j
    string outputWaveName, strValue
    for (i = 0; ; i += 1)
        FReadLine refNum, lineContent
        if (strlen(lineContent) == 0)
            // no more data to be read; leave the loop so the file gets closed below
            break
        endif
       
        nValues = ItemsInList(lineContent, " ") - knEntriesToSkip - knTrailingSpaces
        outputWaveName = "W_Neutron" + num2str(i)
       
        Make /O/N=(nValues) /D $outputWaveName
        wave output = $outputWaveName
       
        // get out the numbers
        for (j = 0; j < nValues; j+=1)
            strValue = StringFromList(j + knEntriesToSkip, lineContent, " ")
            output[j] = str2num(strValue)
        endfor
    endfor
   
    Close refNum
End
   



There are a few gotchas involved since this is a very impromptu parsing approach.
- I'm ignoring the first two entries on a line, are those important for you?
- If there are trailing spaces after the last value of each line, they will mess up the counting of the number of values. To address this, check the number of trailing spaces on the data lines in a real file (likely zero or one) and change constant knTrailingSpaces = 0 to the correct value.
- The function does not check if there are any previously loaded W_Neutron waves; it just makes the required number and overwrites any that may exist. This is important not only because data could be lost, but also because the following could happen: say you load a file with 10 datasets, and then you load one with 5 datasets. The first load will create W_Neutron0 through W_Neutron9, and the second will overwrite W_Neutron0 through W_Neutron4 but leave W_Neutron5 through W_Neutron9 untouched. This might fool you into thinking that the second file contained 10 datasets.

All of this can be addressed by making the function a bit more clever, but this is just the bare minimum.
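
As one example of "a bit more clever", the stale-wave problem from the last point could be handled by killing old W_Neutron waves before each load. This is only a sketch: KillOldNeutronWaves is a hypothetical helper name, and it only looks at waves in the current data folder.

```igor
// Hypothetical helper: kill W_Neutron waves left over from an earlier load.
Function KillOldNeutronWaves()
    // Names of all waves in the current data folder that start with W_Neutron
    String list = WaveList("W_Neutron*", ";", "")
    Variable n = ItemsInList(list)
    Variable i
    String name

    for (i = 0; i < n; i += 1)
        name = StringFromList(i, list)
        KillWaves /Z $name          // /Z: no error if the wave cannot be killed
    endfor
End
```

Calling KillOldNeutronWaves() at the start of LoadNeutron(), after the user has chosen a file, would remove the leftovers. Waves that are displayed in a graph or table cannot be killed and will stay behind, so this reduces the risk rather than eliminating it.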
Here is another crack at it, adapted from the "Load Row Data" procedure file that ships with Igor.

This skips the date/time data and loads the rest of each row into a wave.

Menu "Load Waves"
    "Load Neutron Data . . .", LoadNeutronData("", "")
End

static Function ExtractRowDataInto1DWaves(mat, baseName)
    Wave mat                        // Matrix containing row-oriented data
    String baseName             // Base name of output waves
   
    Variable numRows, numColumns
    Variable row
    String name
   
    numRows = DimSize(mat, 0)
    numColumns = DimSize(mat, 1)
   
    row = 0
    do
        name = baseName + num2istr(row)
        Make/O/D/N=(numColumns) $name
   
        // Store matrix data in the output wave.
        Wave w = $name
        w = mat[row][p]
        row += 1
    while (row <= numRows-1)
End

Function LoadNeutronData(pathName, fileName)
    String pathName                     // Name of path or "" for dialog.
    String fileName                         // Name of file or "" for dialog.

    Variable linesToSkip = 5
    Variable linesToLoad = 0                // Number of lines to load or 0 for auto (load all lines).
    Variable columnsToSkip = 2              // Number of columns to skip. Skips date/time data and unit number.
    Variable columnsToLoad = 0          // Number of columns to load or 0 for auto (load all columns).
    String baseName ="data"             // Base name to use for new waves.
    Variable makeTable = 1                  // 1 == make a table showing new waves.
   
    LoadWave/Q/J/M/D/A=tempLoadRowDataMatrix/P=$pathName/K=0/L={0,linesToSkip,linesToLoad,columnsToSkip,columnsToLoad} /V={" ", "", 0, 1} fileName
    if (V_flag == 0)
        return -1       // Probably user cancelled
    endif

    ExtractRowDataInto1DWaves(tempLoadRowDataMatrix0, baseName)
    Variable numRows = DimSize(tempLoadRowDataMatrix0, 0)
    KillWaves tempLoadRowDataMatrix0
   
    if ((makeTable==1) && (numRows>0))
        Variable row = 0
        String name
        Edit
        do
            name = baseName + num2istr(row)
            AppendToTable $name
            row += 1
        while (row <= numRows-1)
    endif
   
    return 0
End

Urm, I can't figure out what I'm doing incorrectly, but something is awry.
One thing is that the delimiter is actually a tab rather than a space. Is that a problem?
I've checked the trailing spaces: it is zero.

hrodstein: With your version, I get the error message shown in the -0 image.
741: Your version goes through the motions, but it doesn't load the data in the file (-1 image).

For now, I'm going to take a break. I'm using emacs and sed to convert the files to formats I can read in with Igor, and I'll return to this problem later. Thank you two for your efforts -- I'll report back if I make any progress.
Attachments: load-0.jpg, load-1.jpg
neutron wrote:

One thing is that the delimiter is actually a tab rather than a space. Is that a problem?


Possibly. The example you posted in your first post loaded fine with my code, but it had spaces. Howard's code might be failing for the same reason.

This should deal with tabs:
Menu "Macros"
    "Load Neutron", /Q, LoadNeutron()
End
 
constant kNeutronHeaderLength = 5   // number of lines to skip at the beginning of the file
constant knEntriesToSkip = 2 // number of entries to skip at the beginning of a line
                                    // entries are separated by tabs, so "2011-04-26_19:34:39.342_UTC"
                                    // is a single entry
constant knTrailingSpaces = 0   // IMPORTANT: set this to the number of trailing tabs at the end of the line
 
Function LoadNeutron()
 
    variable refNum
 
    // this selects the file but does not open it
    Open /R/D /F="All Files:.*;" refNum
    if (strlen(S_fileName) == 0)
        // user cancel
        return 0
    endif
 
    // open the file
    Open /R refNum as S_fileName
 
    string lineContent
    variable i
 
    // the first lines are header only, skip these
    for (i = 0; i < kNeutronHeaderLength; i+=1)
        FReadLine refNum, lineContent
    endfor
 
    // the main loop
    // read all lines separately. Each line is a full wave
    variable nValues, j
    string outputWaveName, strValue
    for (i = 0; ; i += 1)
        FReadLine refNum, lineContent
        if (strlen(lineContent) == 0)
            // no more data to be read; leave the loop so the file gets closed below
            break
        endif
 
        nValues = ItemsInList(lineContent, "\t") - knEntriesToSkip - knTrailingSpaces
        outputWaveName = "W_Neutron" + num2str(i)
 
        Make /O/N=(nValues) /D $outputWaveName
        wave output = $outputWaveName
 
        // get out the numbers
        for (j = 0; j < nValues; j+=1)
            strValue = StringFromList(j + knEntriesToSkip, lineContent, "\t")
            output[j] = str2num(strValue)
        endfor
    endfor
 
    Close refNum
End


Notice that all I've done is change " " to "\t" in the calls to StringFromList and ItemsInList.

If this doesn't work either then you might want to include an actual data file.
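
The original question also asked for the run number from header line 3 to be used as a prefix for the wave names, which LoadNeutron() does not do. A rough sketch of one way to get it follows. ReadRunNumber is a hypothetical helper, and it assumes line 3 always looks like "# Run 33".

```igor
// Hypothetical helper: read the run number from header line 3 ("# Run 33").
// Call it right after Open /R; it rewinds the file so the header-skipping loop still works.
Function ReadRunNumber(refNum)
    Variable refNum                 // Reference number of a file opened for reading

    String text
    Variable runNumber

    FReadLine refNum, text          // Line 1: "# AcqX"
    FReadLine refNum, text          // Line 2: acquisition date/time
    FReadLine refNum, text          // Line 3: "# Run 33"
    sscanf text, "# Run %d", runNumber
    Variable numRead = V_flag       // Number of values sscanf converted

    FSetPos refNum, 0               // Back to the start of the file

    if (numRead != 1)
        return NaN                  // Line 3 was not in the expected form
    endif
    return runNumber
End
```

With the result stored in a variable, the wave name line in LoadNeutron() could become something like outputWaveName = "Run" + num2istr(runNumber) + "_W" + num2istr(i), giving names such as Run33_W0. The exact naming pattern is just a suggestion.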
741 wrote:

Notice that all I've done is change " " to "\t" in the calls to StringFromList and ItemsInList.


And we have a winner... :-)
I can't explain why the tabs got changed to spaces when I pasted the extract into my original post.
So cool, thanks a lot!

Have a nice weekend!
neutron wrote:
hrodstein: With your version, I get the error message shown in the -0 image.
741: Your version goes through the motions, but it doesn't load the data in the file (-1 image).


I did assume that the delimiter was a space since that is what was in your original post.

If you want to send a sample file to support@wavemetrics.com, I will see what is going on. If so, please zip it to prevent line wrapping by the email process.
The problem was the issue of the delimiter. It appeared to be a space but was a tab. Here is a version that works on a file that neutron sent to me for testing. The only change was to remove the /V flag which set the delimiter to space.

Menu "Load Waves"
    "Load Neutron Data . . .", LoadNeutronData("", "")
End
 
static Function ExtractRowDataInto1DWaves(mat, baseName)
    Wave mat                        // Matrix containing row-oriented data
    String baseName             // Base name of output waves
 
    Variable numRows, numColumns
    Variable row
    String name
 
    numRows = DimSize(mat, 0)
    numColumns = DimSize(mat, 1)
 
    row = 0
    do
        name = baseName + num2istr(row)
        Make/O/D/N=(numColumns) $name
 
        // Store matrix data in the output wave.
        Wave w = $name
        w = mat[row][p]
        row += 1
    while (row <= numRows-1)
End
 
Function LoadNeutronData(pathName, fileName)
    String pathName                     // Name of path or "" for dialog.
    String fileName                         // Name of file or "" for dialog.
 
    Variable linesToSkip = 5
    Variable linesToLoad = 0                // Number of lines to load or 0 for auto (load all lines).
    Variable columnsToSkip = 2              // Number of columns to skip. Skips date/time data and unit number.
    Variable columnsToLoad = 0          // Number of columns to load or 0 for auto (load all columns).
    String baseName ="data"             // Base name to use for new waves.
    Variable makeTable = 1                  // 1 == make a table showing new waves.
 
    LoadWave/Q/J/M/D/A=tempLoadRowDataMatrix/P=$pathName/K=0/L={0,linesToSkip,linesToLoad,columnsToSkip,columnsToLoad} fileName
    if (V_flag == 0)
        return -1       // Probably user cancelled
    endif
 
    ExtractRowDataInto1DWaves(tempLoadRowDataMatrix0, baseName)
    Variable numRows = DimSize(tempLoadRowDataMatrix0, 0)
    KillWaves tempLoadRowDataMatrix0
 
    if ((makeTable==1) && (numRows>0))
        Variable row = 0
        String name
        Edit
        do
            name = baseName + num2istr(row)
            AppendToTable $name
            row += 1
        while (row <= numRows-1)
    endif
 
    return 0
End
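
Both working loaders skip the date/time entry at the start of each data line. If it is needed later, it could be converted to an Igor date/time value along these lines. ParseUTCStamp is a hypothetical helper, and it assumes the entry always has the form yyyy-mm-dd_hh:mm:ss.ddd_UTC described in the first post.

```igor
// Hypothetical helper: convert "2011-04-26_19:34:39.342_UTC" to Igor date/time
// (seconds since 1904-01-01). Returns NaN if the text does not match.
Function ParseUTCStamp(stamp)
    String stamp

    Variable year, month, day, hours, minutes, seconds
    sscanf stamp, "%d-%d-%d_%d:%d:%f", year, month, day, hours, minutes, seconds
    if (V_flag != 6)
        return NaN                  // Not all six fields were found
    endif

    return date2secs(year, month, day) + 3600*hours + 60*minutes + seconds
End
```

In the line-by-line loader the entry would be StringFromList(0, lineContent, "\t"). The result needs a double-precision variable or wave to keep the fractional seconds.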