Reading custom binary file - really slow

 

 

Hi there,

I have to import an unusual file type into Igor Pro, from some electrophysiology hardware. The data file format is a little odd: it has 1024 bytes of header info in ASCII, followed by binary data split into records, each laid out as follows:

  • One little-endian int64 timestamp (actually a sample number; this can be converted to seconds using the sampleRate variable in the header)
  • One little-endian uint16 number (N) indicating the samples per record (always 1024, at least for now)
  • One little-endian uint16 recording number (version 0.2 and higher)
  • 1024 big-endian int16 samples
  • 10-byte record marker (0 1 2 3 4 5 6 7 8 255)
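
For reference, the record size implied by that layout can be worked out as below. This is only a sketch: the constant and function names are hypothetical, and the values are taken directly from the list above.

```igor
// Hypothetical constant names; values come from the format description above.
Constant kFileHeaderBytes = 1024     // ASCII header at the start of the file
Constant kRecordHeaderBytes = 12     // int64 timestamp + uint16 N + uint16 recording number
Constant kSamplesPerRecord = 1024    // int16 samples in each record
Constant kRecordMarkerBytes = 10     // trailing 0 1 2 3 4 5 6 7 8 255

// Bytes occupied by one complete record: 12 + 2*1024 + 10 = 2070
Function RecordBytes()
    return kRecordHeaderBytes + 2*kSamplesPerRecord + kRecordMarkerBytes
End

// Number of complete records in a file of the given total size in bytes
Function NumRecordsInFile(fileBytes)
    Variable fileBytes
    return floor((fileBytes - kFileHeaderBytes) / RecordBytes())
End
```

When the whole file is read as int16, each record therefore spans 1035 points: 6 points of record header, 1024 samples, and 5 points of record marker.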

I've been using FBinRead to bring the data into Igor 64 (in Igor 7). My two strategies have been to load the actual data into separate waves in chunks of 1024 samples, or to import the entire file and then delete the points that don't contain data (code below). Both methods are painfully slow: each takes around 7.5 minutes to load ~900 seconds of data (30 kHz sampling). Loading the whole binary file (including unwanted info) takes around one second, so the lost time is down to the massive number of repeated wave operations. Is there a clever way of speeding this up? If it were only 7.5 minutes per experiment, it would be fine, but that is for a single channel and my experiment records 64 channels at a time, so I'm looking at about 8 hours to import an experiment...

Any advice to optimise this would be gratefully received! The hardware manufacturers provide Python and Matlab code that can import a file within less than a minute, so I assume there must be a better way...

Loading 1024-sample chunks:

    FSetPos reference, start_byte                       // Skip header
    Make /o /n=1 $(NewName)
   
    Wave ImportedWave = $(NewName)
   
    variable read_point
    make /o /n=1024 read_buffer
    wave read_buffer
    Variable maximum_byte
    Fstatus reference
    maximum_byte =V_logEOF
    start_byte +=12                                     // skip the 12-byte record header (timestamp, N, recording number)
   
    //start_byte=maximum_byte
    Variable bitVs = bitVolts[0]
    Do
        if(start_byte<maximum_byte)
            FSetPos reference, start_byte
            make /o /n=1024 $("read_buffer"+num2str(counter))
            wave read_buffer  = $("read_buffer"+num2str(counter))
            FBinRead /B=2 /F=2 reference, read_buffer
            read_buffer *= bitVs
            counter+=1
            start_byte+=2070                            // advance one full record: 12 + 2048 + 10 bytes
        endif
    while(start_byte<maximum_byte)
   
    Concatenate /kill /o /NP Wavelist("read_buffer*",";",""), ImportedWave
   
    variable samplingF = 1/samplerate[0]
   
    SetScale /P x 0, samplingF, "s", ImportedWave
    Setscale /P y, 0, 0, "V", ImportedWave
    ImportedWave /= 1e6         // convert from µV to V

 

Deleting unwanted points: 

    FSetPos reference, start_byte                       // Skip header
    Fstatus reference
    Variable maximum_byte
    maximum_byte =V_logEOF
   
    Make /o /n=((maximum_byte-1024)/2) $(NewName)       // convert size from bytes to int16
   
    Wave ImportedWave = $(NewName)
   
    variable read_point
    Variable bitVs = bitVolts[0]
   
    FBinRead /B=2 /F=2 reference, ImportedWave // import whole file minus header
    ImportedWave *= bitVs               // scale wave
   
    Variable first_point = 0
    variable last_point = numpnts(ImportedWave)
   
    // trim out unnecessary points: 12 bytes of record header before each block of samples
    // and 10 bytes of record marker after it. Loaded binary as int16, so 2 bytes = 1 point
   
    Do
        deletepoints first_point, 6, ImportedWave      
        deletepoints (first_point+1024), 5, ImportedWave
        first_point +=1024
    while(first_point<numpnts(ImportedWave))            // the wave shrinks as points are deleted

   
    variable samplingF = 1/samplerate[0]
   
    SetScale /P x 0, samplingF, "s", ImportedWave
    Setscale /P y, 0, 0, "V", ImportedWave
    ImportedWave /= 1e6         // convert from µV to V
   

 

The DeletePoints operation is likely the issue. Off the top of my head, I wonder whether you could instead do this ...

* Split the source data wave into separate "chunks" with something like

    Duplicate /O /R=[pstart1,pend1] source, chunk1
    Duplicate /O /R=[pstart2,pend2] source, chunk2
    ...

(a cleverer method would be to do this using a matrix and a for loop)

* Concatenate the chunks back to a contiguous wave.
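
One way the matrix idea might look is sketched below. It swaps the for loop for a reshape: the wave is reinterpreted as a matrix with one record per column, the header and marker rows are deleted from all columns at once, and the result is flattened again. It assumes the wave was read as int16 points (1035 per record: 6 header, 1024 samples, 5 marker); the function name StripRecordOverhead is hypothetical.

```igor
Function StripRecordOverhead(w)
    Wave w                                      // 1D wave, 1035 points per record

    Variable numRecords = floor(numpnts(w)/1035)
    Redimension /N=(1035*numRecords) w          // drop any incomplete trailing record
    Redimension /E=1 /N=(1035, numRecords) w    // reshape in place: one record per column
    DeletePoints /M=0 1030, 5, w                // remove the 5 marker rows from every column
    DeletePoints /M=0 0, 6, w                   // remove the 6 header rows from every column
    Redimension /E=1 /N=(1024*numRecords) w     // flatten back to one contiguous 1D wave
End
```

The point of the design is that DeletePoints is called twice in total rather than twice per record, so the data is only shuffled in memory a couple of times.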

I agree that calling DeletePoints in a loop will be slow. Each time you delete a point, all subsequent points in the wave have to be moved in memory.

Instead, create another wave, call it OutputWave, of the correct final size. Then copy the wanted points from ImportedWave to OutputWave. Then kill ImportedWave.
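
A minimal sketch of that copy, assuming ImportedWave holds the whole file minus the header, read as int16 points (1035 points per record: 6 header, 1024 samples, 5 marker). The function name CopyWantedPoints and its outName parameter are hypothetical, and the single wave assignment is just one way of expressing the copy.

```igor
Function CopyWantedPoints(ImportedWave, outName)
    Wave ImportedWave           // raw int16 points, 1035 per record
    String outName              // hypothetical parameter: name for the output wave

    Variable numRecords = floor(numpnts(ImportedWave)/1035)
    Make /O /N=(1024*numRecords) $outName
    Wave OutputWave = $outName

    // For output point p, the record index is floor(p/1024) and the sample
    // within that record is mod(p,1024). Each record starts at 1035*record
    // in ImportedWave, with 6 header points before the samples.
    OutputWave = ImportedWave[1035*floor(p/1024) + 6 + mod(p,1024)]

    KillWaves ImportedWave      // the raw copy is no longer needed
End
```

Because OutputWave is allocated once at its final size, no points ever have to be moved after they are written.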

 

Thank you both - the concatenate method cut the run time down to about 4 minutes for the same test data, and the duplicating and copying points method from hrodstein got it down to around 6 seconds. Happy days! :)

hrodstein wrote:

I agree that calling DeletePoints in a loop will be slow. Each time you delete a point, all subsequent points in the wave have to be moved in memory.

 

Is there a performance issue if you kill points from the "bottom" end of the wave?

Deleting from the end would be better but still not good.

Anytime you delete points, any points after the deleted points must be moved in memory. When done in a loop, this will be slow.

In some cases you can use the Extract operation to create a wave with deleted points. Extract handles the speed issue internally.
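
For this file layout, an Extract call might look like the sketch below. It assumes the wave was read as int16 points with 1035 points per record, and that the point index p can be used in Extract's logical expression, which as far as I know it can. The function name ExtractSamples and the output name OutputWave are hypothetical.

```igor
Function ExtractSamples(ImportedWave)
    Wave ImportedWave           // raw int16 points, 1035 per record

    // Position within each record: 0-5 header, 6-1029 samples, 1030-1034 marker.
    // Keep only the points whose position falls in the sample range.
    Extract /O ImportedWave, OutputWave, (mod(p,1035) >= 6 && mod(p,1035) < 1030)
End
```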

In reply to hrodstein:

Another approach, which might or might not be faster, is to set the value of points you want to delete to NaN (not a number) within the loop. Then, after the loop has executed, call WaveTransform zapNaNs. This assumes that NaN is an invalid value in your particular waves. If you expect any NaNs, then obviously you wouldn't want to do this.
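
A rough sketch of the NaN approach for this record layout follows. It replaces the loop with a single wave assignment, and it assumes a floating point wave (integer waves cannot hold NaN) read as 1035 points per record. The function name ZapOverhead is hypothetical.

```igor
Function ZapOverhead(w)
    Wave w                      // floating point wave, 1035 points per record

    // Mark the header (0-5) and marker (1030-1034) positions of every record as NaN
    w = (mod(p,1035) < 6 || mod(p,1035) >= 1030) ? NaN : w[p]

    // Remove all NaN points in one pass
    WaveTransform zapNaNs w
End
```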

Ah yes, good point. Here we go:

 

Function LoadOpenEphysData(FileName,NewName)
   
    string FileName, NewName
    print "Start:", time()

    SetDataFolder root:
   
    variable start_byte = 1024          // Skip 1024 byte header
    variable Reference                      // file reference

    open /R  /P=DataFolder Reference as filename    // path already defined as "datafolder"
    // read header
    string buffer
    variable buffer2
    make /o /n=1 /T format,description,date_created
    make /o /n=1 version,header_bytes,channel,channelType,sampleRate,blockLength,bufferSize,bitVolts
    Freadline /T=";" reference, buffer
    format = buffer
    Freadline /T=";" reference, buffer
    sscanf buffer, " \nheader.version = %f;", buffer2
    version = buffer2
    Freadline /T=";" reference, buffer
    sscanf buffer, " \nheader.header_bytes = %f;", buffer2
    header_bytes = buffer2
    Freadline /T=";" reference, buffer
    description = buffer
    Freadline /T=";" reference, buffer
    date_created = buffer
    Freadline /T=";" reference, buffer
    sscanf buffer, " \nheader.channel = %f;", buffer2
    channel = buffer2
    Freadline /T=";" reference, buffer
//  channelType = buffer2
    Freadline /T=";" reference, buffer
    sscanf buffer, " \nheader.sampleRate = %f;", buffer2
    sampleRate = buffer2
    Freadline /T=";" reference, buffer
    sscanf buffer, " \nheader.blockLength = %f;", buffer2
    blockLength = buffer2
    Freadline /T=";" reference, buffer
    sscanf buffer, " \nheader.bufferSize = %f;", buffer2
    bufferSize = buffer2
    Freadline /T=";" reference, buffer
    sscanf buffer, " \nheader.bitVolts = %f;", buffer2
    bitVolts = buffer2
    Fgetpos reference

    FSetPos reference, start_byte                       // Skip header
    Fstatus reference
    Variable maximum_byte
    maximum_byte =V_logEOF
   
    Make /o /n=((maximum_byte-1024)/2) ImportedWave // convert size from bytes to int16
   
   
    variable num_records=floor(numpnts(ImportedWave)/1035) // complete records only; 1035 int16 points each
    Make /o /n=(1024*num_records) $(NewName)
    Wave OutputWave = $(NewName)   
   
    variable read_point
    Variable bitVs = bitVolts[0]
   
    FBinRead /B=2 /F=2 reference, ImportedWave // import whole file minus header
    ImportedWave *= bitVs               // scale wave
   
    variable first_point = 0
    variable last_point = numpnts(ImportedWave)
    Variable input_left=6
    variable input_right=1029
    variable output_left=0
    variable output_right=1023
   
    // trim out unnecessary points: 12 bytes of record header before each block of samples
    // and 10 bytes of record marker after it. Loaded binary as int16, so 2 bytes = 1 point
   
    variable counter = 0
    Do
        OutputWave[output_left,output_right]=ImportedWave[p+input_left]
        input_left+=11
        input_right+=1035
        output_left+=1024
        output_right+=1024
        counter+=1
    while(counter<num_records)          // one pass per record, so the copy never runs past the data


   
    variable samplingF = 1/samplerate[0]
   
    SetScale /P x 0, samplingF, "s", OutputWave
    Setscale /P y, 0, 0, "V", OutputWave
    OutputWave /= 1e6           // convert from µV to V
   
    close reference
    killwaves importedwave
    print "End:",time(),"; loaded "+nameofwave(OutputWave)
End