Reading custom binary file - really slow

 

 

Hi there,

I have to import an unusual file type into Igor Pro, from some electrophysiology hardware. The data file format is a little odd: it has 1024 bytes of header info in ASCII, followed by binary data split into multiple records, each laid out as follows:

  • One little-endian int64 timestamp (actually a sample number; this can be converted to seconds using the sampleRate variable in the header)
  • One little-endian uint16 number (N) indicating the samples per record (always 1024, at least for now)
  • One little-endian uint16 recording number (version 0.2 and higher)
  • 1024 big-endian int16 samples
  • 10-byte record marker (0 1 2 3 4 5 6 7 8 255)
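
To make the offsets used in the code further down easier to follow, here is a minimal sketch of the arithmetic implied by the record layout above. The function name PrintRecordLayout is hypothetical and only for illustration; the byte counts come straight from the list.

```igor
// Sketch only: adds up the field sizes from the record layout listed above.
Function PrintRecordLayout()
	Variable headerBytes = 8 + 2 + 2		// int64 timestamp + uint16 N + uint16 recording number
	Variable sampleBytes = 1024 * 2			// 1024 int16 samples
	Variable markerBytes = 10				// record marker
	Variable recordBytes = headerBytes + sampleBytes + markerBytes

	// Dividing by 2 expresses the same sizes in int16 points
	Printf "record = %d bytes (%d int16 points); samples start at point %d\r", recordBytes, recordBytes/2, headerBytes/2
End
```

So each record is 2070 bytes, or 1035 points when the file is read as int16: 6 points of record header, 1024 samples, and 5 points of marker.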

I've been using FBinRead to bring the data into 64-bit Igor Pro 7. I have tried two strategies: reading each block of 1024 samples into its own wave and concatenating them, or importing the entire file and then deleting the points that are not samples (code for both below). Both are painfully slow: each takes around 7.5 minutes to load ~900 seconds of data (30 kHz sampling). Loading the whole binary file (including the unwanted points) takes around one second, so the time is being lost in the massive number of repeated wave operations. Is there a clever way of speeding this up? If it were only 7.5 minutes per experiment it would be fine, but that is for a single channel and my experiment records 64 channels at a time, so I'm looking at about 8 hours to import an experiment...

Any advice on optimising this would be gratefully received! The hardware manufacturers provide Python and Matlab code that can import a file in under a minute, so I assume there must be a better way...

Loading 1024-sample chunks:

	FSetPos reference, start_byte						// Skip header
	Make /o /n=1 $(NewName)
	
	Wave ImportedWave = $(NewName)		// reference the wave created by Make above
	
	variable read_point
	make /o /n=1024 read_buffer
	wave read_buffer 
	Variable maximum_byte
	Fstatus reference
	maximum_byte =V_logEOF 
	start_byte +=12
	
	//start_byte=maximum_byte
	Variable bitVs = bitVolts[0]
	Do
		if(start_byte<maximum_byte)
			FSetPos reference, start_byte
			make /o /n=1024 $("read_buffer"+num2str(counter))
			wave read_buffer  = $("read_buffer"+num2str(counter))
			FBinRead /B=2 /F=2 reference, read_buffer
			read_buffer *= bitVs
			counter+=1
			start_byte+=2070
		endif
	while(start_byte<maximum_byte)
	
	Concatenate /kill /o /NP Wavelist("read_buffer*",";",""), ImportedWave
	
	variable samplingF = 1/samplerate[0]
	
	SetScale /P x 0, samplingF, "s", ImportedWave
	Setscale /P y, 0, 0, "V", ImportedWave
	ImportedWave /= 1e6			// convert from µV to V

 

Deleting unwanted points: 

FSetPos reference, start_byte						// Skip header
	Fstatus reference
	Variable maximum_byte
	maximum_byte =V_logEOF 
	
	Make /o /n=((maximum_byte-1024)/2) $(NewName)		// convert size from bytes to int16
	
	Wave ImportedWave = $(NewName)
	
	variable read_point
	Variable bitVs = bitVolts[0]
	
	FBinRead /B=2 /F=2 reference, ImportedWave // import whole file minus header
	ImportedWave *= bitVs				// scale wave
	
	Variable first_point = 0
	variable last_point = numpnts(ImportedWave)
	
	// trim out unnecessary points: 12 bytes of record header before each block of samples
	// and 10 bytes of record marker after it. Loaded binary as int16, so 2 bytes = 1 point
	
	Do
		deletepoints first_point, 6, ImportedWave		
		deletepoints (first_point+1024), 5, ImportedWave
		first_point +=1024
	while(first_point<numpnts(ImportedWave))		// the wave shrinks as points are deleted, so re-check its length

	
	variable samplingF = 1/samplerate[0]
	
	SetScale /P x 0, samplingF, "s", ImportedWave
	Setscale /P y, 0, 0, "V", ImportedWave
	ImportedWave /= 1e6			// convert from µV to V
	

 

The deletepoints operation is likely the issue. Off the top of my head, I have to wonder whether you could instead do this ...

* Split the source data wave into separate "chunks" with something like

chunk1 = source[pstart1,pend1]
chunk2 = source[pstart2,pend2]
...

(a cleverer method would be to do this using a matrix and a for loop)

* Concatenate the chunks back to a contiguous wave.

I agree that calling DeletePoints in a loop will be slow. Each time you delete a point, all subsequent points in the wave have to be moved in memory.

Instead, create another wave, call it OutputWave, of the correct final size. Then copy the wanted points from ImportedWave to OutputWave. Then kill ImportedWave.
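
As a rough illustration of this copy-into-a-new-wave idea, the whole copy can be written as a single wave assignment with no loop. This is only a sketch: the function name CopySamples and its parameters are hypothetical, and it assumes the source wave holds everything after the 1024-byte file header, read as int16 points, so that each record is 1035 points (6 header, 1024 samples, 5 marker).

```igor
// Sketch only: CopySamples, src and outName are hypothetical names.
Function/WAVE CopySamples(src, outName)
	Wave src			// whole file after the file header, one point per int16
	String outName		// name for the output wave

	Variable pointsPerRecord = 1035		// 6 header + 1024 samples + 5 marker points
	Variable samplesPerRecord = 1024
	Variable firstSample = 6			// samples start after the 12-byte record header
	Variable numRecords = floor(numpnts(src) / pointsPerRecord)

	Make/O/N=(samplesPerRecord * numRecords) $outName
	Wave out = $outName

	// For output point p: record index = floor(p/1024), position in record = mod(p,1024)
	out = src[firstSample + mod(p, samplesPerRecord) + pointsPerRecord * floor(p / samplesPerRecord)]

	return out
End
```

Because nothing is deleted, no points have to be shuffled along in memory; each output point is written exactly once. The source wave can be killed afterwards.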

 

Thank you both - the concatenate method cut the run time down to about 4 minutes for the same test data, and the duplicating and copying points method from hrodstein got it down to around 6 seconds. Happy days! :)

In reply to hrodstein, who wrote:

"I agree that calling DeletePoints in a loop will be slow. Each time you delete a point, all subsequent points in the wave have to be moved in memory."

 

Is there a performance issue if you kill points from the "bottom" end of the wave?

Deleting from the end would be better but still not good.

Anytime you delete points, any points after the deleted points must be moved in memory. When done in a loop, this will be slow.

In some cases you can use the Extract operation to create a wave with deleted points. Extract handles the speed issue internally.
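
For the file layout in this thread, an Extract call might look something like the sketch below. The function name ExtractSamples is hypothetical. It assumes the source wave is the whole file after the 1024-byte header read as int16 points (1035 points per record, samples at points 6 to 1029 of each record), and that p can be used in the logical expression as it can in a wave assignment.

```igor
// Sketch only: ExtractSamples, src and outName are hypothetical names.
Function/WAVE ExtractSamples(src, outName)
	Wave src			// whole file after the file header, one point per int16
	String outName		// name for the output wave

	// Keep point p only if it lies in the sample part of its 1035-point record
	Extract/O src, $outName, (mod(p, 1035) >= 6 && mod(p, 1035) < 1030)

	Wave out = $outName
	return out
End
```

If the file ends with a partial record, any samples in that partial record would be kept too, so the output length may need checking.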

In reply to hrodstein:

Another approach, which might or might not be faster, is to set the value of points you want to delete to NaN (not a number) within the loop. Then, after the loop has executed, call WaveTransform zapNaNs. This assumes that NaN is an invalid value in your particular waves. If you expect any NaNs, then obviously you wouldn't want to do this.
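
A minimal sketch of the NaN approach for this particular file layout, with the loop replaced by one wave assignment. The function name ZapNonSamples is hypothetical, and it assumes the wave is floating point (an integer wave cannot hold NaN) and contains the whole file after the 1024-byte header as int16 points, 1035 per record.

```igor
// Sketch only: ZapNonSamples is a hypothetical name.
Function ZapNonSamples(w)
	Wave w		// must be a floating-point wave; integer waves cannot store NaN

	// Mark record header and marker points as NaN, leave the samples untouched
	w = (mod(p, 1035) >= 6 && mod(p, 1035) < 1030) ? w : NaN

	// Remove all NaN points in one pass
	WaveTransform zapNaNs w
End
```

Whether this is faster than copying into a second wave would need timing on real data.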

Ah yes, good point. Here we go:

 

Function LoadOpenEphysData(FileName,NewName)
	
	string FileName, NewName
	print "Start:", time()

	SetDataFolder root:
	
	variable start_byte = 1024			// Skip 1024 byte header
	variable Reference						// file reference

	open /R  /P=DataFolder Reference as filename	// path already defined as "datafolder"
	// read header
	string buffer
	variable buffer2
	make /o /n=1 /T format,description,date_created
	make /o /n=1 version,header_bytes,channel,channelType,sampleRate,blockLength,bufferSize,bitVolts
	Freadline /T=";" reference, buffer
	format = buffer
	Freadline /T=";" reference, buffer
	sscanf buffer, " \nheader.version = %f;", buffer2
	version = buffer2
	Freadline /T=";" reference, buffer
	sscanf buffer, " \nheader.header_bytes = %f;", buffer2
	header_bytes = buffer2
	Freadline /T=";" reference, buffer
	description = buffer
	Freadline /T=";" reference, buffer
	date_created = buffer
	Freadline /T=";" reference, buffer
	sscanf buffer, " \nheader.channel = %f;", buffer2
	channel = buffer2
	Freadline /T=";" reference, buffer
//	channelType = buffer2
	Freadline /T=";" reference, buffer
	sscanf buffer, " \nheader.sampleRate = %f;", buffer2
	sampleRate = buffer2
	Freadline /T=";" reference, buffer
	sscanf buffer, " \nheader.blockLength = %f;", buffer2
	blockLength = buffer2
	Freadline /T=";" reference, buffer
	sscanf buffer, " \nheader.bufferSize = %f;", buffer2
	bufferSize = buffer2
	Freadline /T=";" reference, buffer
	sscanf buffer, " \nheader.bitVolts = %f;", buffer2
	bitVolts = buffer2
	Fgetpos reference

	FSetPos reference, start_byte						// Skip header
	Fstatus reference
	Variable maximum_byte
	maximum_byte =V_logEOF 
	
	Make /o /n=((maximum_byte-1024)/2) ImportedWave	// convert size from bytes to int16
	
	
	variable num_records=floor(numpnts(ImportedWave)/1035)	// 2070 bytes = 1035 int16 points per record
	Make /o /n=(1024*num_records) $(NewName)
	Wave OutputWave = $(NewName)	
	
	variable read_point
	Variable bitVs = bitVolts[0]
	
	FBinRead /B=2 /F=2 reference, ImportedWave // import whole file minus header
	ImportedWave *= bitVs				// scale wave
	
	Variable input_left=6			// samples of record 0 start after the 12-byte record header (6 points)
	variable output_left=0
	variable output_right=1023
	
	// Each record is 1035 int16 points: 6 points of record header, 1024 samples,
	// then 5 points of record marker. Copy only the samples.
	
	variable counter = 0
	Do
		// p runs over the output range, so the input offset only has to grow
		// by the 11 non-sample points in each record
		OutputWave[output_left,output_right]=ImportedWave[p+input_left]
		input_left+=11
		output_left+=1024
		output_right+=1024
		counter+=1
	while(counter<num_records)


	
	variable samplingF = 1/samplerate[0]
	
	SetScale /P x 0, samplingF, "s", OutputWave
	Setscale /P y, 0, 0, "V", OutputWave
	OutputWave /= 1e6			// convert from µV to V
	
	close reference				// close the file once only; a second Close on the same reference is an error
	killwaves ImportedWave
	print "End:",time(),"; loaded "+nameofwave(OutputWave)
End