Parallel For-Loops for Matrix Operations

Function MWE_RIXS()
	// MWE of RIXS map import
	variable i
	String ImpList = ""

	NewPath/O/Q Sympath
	PathInfo Sympath
	String pathname = S_path

	Variable TimerA = StartMSTimer

	String fileName
	String fileList = IndexedFile(Sympath, -1, ".txt")

	for(i=0; i<ItemsInList(fileList); i+=1)
		fileName = StringfromList(i,fileList)
		LoadWave/A/G/D/O/Q/M/P=Sympath fileName
		Variable ext0 = strsearch(FileName,"_",Inf,1)
		fileName = S_fileName[0,ext0-1]

		String XESimagName = "XESimag_" + fileName
		String XESspecName = "XESspec_" + fileName

		Wave ImpMatrix = $StringFromList(0, S_waveNames)
		Rename ImpMatrix, $XESimagName
		Duplicate ImpMatrix, $XESspecName
		Wave ImpCurve = $XESspecName
		MatrixTranspose ImpCurve
		MatrixOP/O ImpCurve=sumCols(ImpCurve)
		MatrixTranspose ImpCurve
		KillWaves/Z ImpMatrix

		Variable Energy = round(str2num(FileName) / 100) / 10
		String LastEnergy = StringFromList(ItemsInList(fileList)-1,fileList)
		Variable LastEnergy1 = round(str2num(LastEnergy) / 100) / 10
		Print "Importing RIXS Map: Current Energy is " + num2str(Energy) + " / " + num2str(LastEnergy1) + " eV"
		ImpList += NameOfWave(ImpCurve) + ";"
	endfor

	Make/O/D/N=(ItemsInList(ImpList)+1) $("E_eV")
	Wave eWave = $("E_eV")
	Variable Enum
	for(i=0; i<ItemsInList(ImpList); i+=1)
		String XEScurve = StringFromList(i,ImpList)
		Concatenate/NP {$XEScurve}, RIXS
		sscanf XEScurve, "XESspec_%i",Enum
		Enum /=100
		eWave[i] = round(Enum)
	endfor
	eWave /= 10

	Variable TimerB = StopMSTimer(TimerA)
	Print "Processing Time = " + num2str(round(TimerB/10E5)) + " seconds"
End

aclight

Here is a modified version of your code that does the load in multiple threads and is somewhat faster. I've also made some other changes that theoretically improve performance but, in this situation, don't really matter since they weren't bottlenecks to begin with. Read the comments for information on why I changed things. Note that my version below does not keep the individual waves loaded from each file like your original code does.

Function MWE_RIXS()
	//  MWE of RIXS map import
	variable i
	String ImpList = ""
   
	NewPath/O/Q Sympath
	PathInfo Sympath
	String pathname = S_path
   
	Variable TimerA = StartMSTimer
   
	String fileName
	String fileList = IndexedFile(Sympath, -1, ".txt")
	// NOTE: The order of files in fileList is undefined. If it matters,
	// then you must sort them in order to ensure that the files
	// are ordered how your code expects them to be ordered.
	fileList = SortList(fileList, ";", 16)
    
	WAVE/T fileListWave = ListToTextWave(fileList, ";")
	Variable numFiles = numpnts(fileListWave)
	Make/O/FREE/WAVE/N=(numFiles) ImpWaveWave
	
	MultiThread ImpWaveWave = LoadDataFromFile(pathname, fileListWave[p])
   
	Make/O/D/N=(numFiles+1) $("E_eV")
	Wave eWave = $("E_eV")
	Variable Enum
    
	Concatenate {ImpWaveWave}, RIXS
	MultiThread eWave[0,numFiles-1] = ExtractEnumFromName(nameofwave(ImpWaveWave[p])) / 10
     
	Variable TimerB = StopMSTimer(TimerA)
	Print "Processing Time = " + num2str(round(TimerB/10E5)) + " seconds"
   
End

ThreadSafe Function/WAVE LoadDataFromFile(String pathOnDisk, String fileName)
	LoadWave/A/G/D/O/Q/M pathOnDisk + fileName
	Variable ext0 = strsearch(FileName,"_",Inf,1)
	fileName = S_fileName[0,ext0-1]
   
	String XESimagName = "XESimag_" + fileName
	String XESspecName = "XESspec_" + fileName
   
	Wave ImpMatrix = $StringFromList(0, S_waveNames)
	// There was no need to duplicate the wave here, just change the
	// 2nd argument to Rename. With this change, I also removed
	// the KillWaves command that came later.
	Rename ImpMatrix, $XESspecName
	Wave ImpCurve = $XESspecName

	// This can be done in one line, as below. But doing it as one
	// line leaves the output as a 1D wave, while the original
	// code left it as a 2D wave with 1 column.
	// Therefore I removed the /NP flag from the call to
	// concatenate in the calling function.
	//    MatrixTranspose ImpCurve
	//    MatrixOP/O ImpCurve=sumCols(ImpCurve)
	//    MatrixTranspose ImpCurve

	MatrixOP/O ImpCurve=sumCols(ImpCurve^t)^t
   
	return ImpCurve
End

ThreadSafe Function ExtractEnumFromName(String name)
	Variable Enum
	sscanf name, "XESspec_%i",Enum
	Enum /=100
	Enum = round(Enum)
	return Enum
End

On my Windows machine running 8.04 Beta 1 with a 16 core/32 thread processor, the time it takes to load all of the data went from about 120 seconds to about 28 seconds (your original code vs. my code above). [Edited with correct information, see next comment]

I sampled Igor while running your code and looked at what function calls took the most time. Note this is something I can do since I have access to Igor's debugging symbols but a regular user could not do this. Much of the time is spent by the different threads waiting to acquire mutexes that are used to protect the file handling code.

On Windows, the type of mutex we use is slow to acquire. Changing to a more lightweight mutex is something we're looking into for Igor 9. On Macintosh, we're already using relatively fast mutexes. You didn't say which platform you're using.

I believe that you could improve performance somewhat by specifying the numLines parameter of LoadWave's /L flag. It looks like that value is constant at least for the data set you attached, but I don't know if it's constant for all of your data sets. If it is, then I would hard code that value. Otherwise you could load a single file to determine the number of lines and then do the rest of the loading as I do above, in multiple threads.

As the "Loading Very Large Files" section of the LoadWave documentation details, on Windows if a file is over 500K Igor will automatically determine the number of lines in the file. However my profiling data shows that doing this takes about 20% of the total time to load one of the files, so if you can pre-specify this value that will help.
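The /L suggestion above could be sketched roughly as follows. This is an untested sketch, not a drop-in replacement: the function names MeasureNumLines and LoadDataFromFileNL are hypothetical, and it assumes that every file has the same number of data lines and that, with /G and /M, the row count of the loaded matrix equals the number of data lines in the file.

```igor
// Hypothetical helper: load one file normally and report how many
// rows (data lines) LoadWave found. Assumes all files match this one.
Function MeasureNumLines(String pathOnDisk, String fileName)
	LoadWave/A/G/D/O/Q/M pathOnDisk + fileName
	Wave w = $StringFromList(0, S_waveNames)
	Variable numLines = DimSize(w, 0)
	KillWaves/Z w
	return numLines
End

// Hypothetical variant of LoadDataFromFile that passes numLines to
// LoadWave's /L flag so that Igor does not have to count lines itself.
ThreadSafe Function/WAVE LoadDataFromFileNL(String pathOnDisk, String fileName, Variable numLines)
	// /L={nameLine, firstLine, numLines, firstColumn, numColumns}
	// Zeros keep LoadWave's automatic behavior for the other fields.
	LoadWave/A/G/D/O/Q/M/L={0, 0, numLines, 0, 0} pathOnDisk + fileName
	Variable ext0 = strsearch(fileName, "_", Inf, 1)
	String XESspecName = "XESspec_" + S_fileName[0, ext0-1]

	Wave ImpMatrix = $StringFromList(0, S_waveNames)
	Rename ImpMatrix, $XESspecName
	Wave ImpCurve = $XESspecName
	MatrixOP/O ImpCurve = sumCols(ImpCurve^t)^t
	return ImpCurve
End

// Possible use inside MWE_RIXS, replacing the MultiThread line:
//	Variable numLines = MeasureNumLines(pathname, fileListWave[0])
//	MultiThread ImpWaveWave = LoadDataFromFileNL(pathname, fileListWave[p], numLines)
```

If the line count is known to be fixed for every data set, the measuring step can be dropped and the value hard coded instead.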

If you use my code above, you might play around with adding the /NT flag to the following wave assignment statement:

MultiThread ImpWaveWave = LoadDataFromFile(pathname, fileListWave[p])

Due to the issues with mutexes, it may be that the optimal number of threads to use is less than the available number of threads but still greater than 1.
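One way to look for that optimum is to time the load at a few different thread counts. The following is a rough sketch that assumes the LoadDataFromFile worker above; TimeLoad is a hypothetical helper name. Because each call loads the same files again, it is probably safest to run each measurement in a clean experiment, and disk caching may make later runs look faster than the first.

```igor
// Hypothetical helper: load all files using the requested number of
// threads and return the elapsed time in seconds.
Function TimeLoad(String pathname, WAVE/T fileListWave, Variable numThreads)
	Variable numFiles = numpnts(fileListWave)
	Make/O/FREE/WAVE/N=(numFiles) ImpWaveWave

	Variable timer = StartMSTimer
	// /NT=(n) sets how many threads MultiThread uses for this assignment.
	MultiThread/NT=(numThreads) ImpWaveWave = LoadDataFromFile(pathname, fileListWave[p])
	// StopMSTimer returns elapsed microseconds.
	return StopMSTimer(timer) / 1e6
End
```

For example, Print TimeLoad(pathname, fileListWave, 4) could be compared against runs with 2, 8, and so on, up to the value returned by ThreadProcessorCount.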

As a smaller point, can Igor be directed directly to the zip file rather than manually unzipping it?

There's nothing built into Igor that can unzip a file. You could either call ExecuteScriptText to use a shell program to unzip, or you might try the ZIP XOP (https://github.com/IP-XOP/ZIP).

November 11, 2019 at 10:55 am

aclight

Correction: With my code in the comment above, it takes 28 seconds, not 85 seconds, to load all of the data. The 85 second test was with my sampling profiler enabled, which causes a surprising slowdown in this case.

November 15, 2019 at 09:12 am

Shannon

Thanks a lot for that code!

Unfortunately I was tied up with other tasks over the last few weeks, but I have now tested it on my laptop: a Win10 machine running Igor 8.04 (34722) with a 2 core/4 thread i7 processor. Retesting the original code took 150 sec for me. Defining the number of lines with the LoadWave /L flag cuts this to 112 sec. Your multithreaded code runs in 72 sec, which drops to 53 sec with the number of lines defined. This is a huge improvement.

The dimensions of the text file matrices are consistent since they represent pixels of a detector, thus defining the size in advance is wise.

Thanks for the other tips about moving the matrix operations into one line as well!

December 3, 2019 at 04:35 am

aclight

There are two other tweaks you can make to my most recent code:

1. Add the /ENCG={1,4} flag to the LoadWave command. If you do this, you must be using Igor Pro 8.04 or later; otherwise, Igor will crash. This tells Igor that your text file contains UTF-8 encoded data. Your data is ASCII, a subset of UTF-8, so telling Igor this saves a bit of time since it doesn't need to check.

2. Change your MatrixOp command to:

MatrixOP/O ImpCurve=sumRows(ImpCurve)

Here is the modified version of the LoadDataFromFile worker function:

// Uses LoadWave to do the loading
ThreadSafe Function/WAVE LoadDataFromFile(String pathOnDisk, String fileName)
	LoadWave/A/G/D/O/Q/M/ENCG={1,4} pathOnDisk + fileName
	Variable ext0 = strsearch(FileName,"_",Inf,1)
	fileName = S_fileName[0,ext0-1]
   
	String XESimagName = "XESimag_" + fileName
	String XESspecName = "XESspec_" + fileName
   
	Wave ImpMatrix = $StringFromList(0, S_waveNames)
	// There was no need to duplicate the wave here, just change the
	// 2nd argument to Rename. With this change, I also removed
	// the KillWaves command that came later.
	Rename ImpMatrix, $XESspecName
	Wave ImpCurve = $XESspecName
	MatrixOP/O ImpCurve=sumRows(ImpCurve)
   
	return ImpCurve
End
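
Tweak 2 relies on sumRows(M) producing the same values as sumCols(M^t)^t. A small self-contained check on a made-up 3x4 matrix might look like the sketch below; the function name and the test values are hypothetical and are not taken from the RIXS data.

```igor
// Hypothetical check: compare the two MatrixOP forms on a small matrix.
Function CheckSumRowsEquivalence()
	// 3 rows x 4 columns; the value depends on row index p and column index q.
	Make/O/D/FREE/N=(3,4) testM = p + 10*q

	MatrixOP/FREE viaTranspose = sumCols(testM^t)^t
	MatrixOP/FREE viaSumRows = sumRows(testM)

	// Largest absolute difference between the two results; it should be 0.
	MatrixOP/FREE maxDiff = maxVal(abs(viaTranspose - viaSumRows))
	Print "Max difference:", maxDiff[0]
End
```

Both forms sum across each row, so the single sumRows call avoids the two transposes.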

I have made several changes to Igor 9 that dramatically improve the load time. On my machine, I'm down to about 4-5 seconds to load all of the files. The biggest change I made will impact performance only when loading using multiple threads at once, and the more threads you have the bigger improvement you get. That matters a lot on my 32 thread machine, but less on your 4 thread machine. But it should still help.

The other big change I made is that Igor is now using a much more performant algorithm for inspecting the individual bytes in the text file (e.g., is this a number, letter, tab, etc.). That change improves performance by about 20%.

We'll hopefully start beta testing Igor 9 early next year, so keep your eyes out on the forums for beta testing information if you are interested.

Thanks for posting your original question and providing an interesting problem. You might not have intended this, but it demonstrated several bottlenecks in Igor that we've fixed to make text file loading much faster, and some of the fixes will improve performance in other ways as well.

December 3, 2019 at 08:01 am
