GetFileFolderInfo: check only for a few properties

Hi all,

I am developing a program that loads the most recently saved file under a given path. In our environment, users save files at arbitrary places in a folder tree, so it is pointless to track where the previous file was saved.

I use GetFileFolderInfo in a modified version of the WM routine PrintFoldersAndFiles(pathName, extension, recurse, level) to check the creation date of each file and find the newest one.

The problem is the speed of the operation. I currently have ~1500 files in my folder tree and the operation takes 10 secs! The folder trees usually contain even more files, so the operation will be even slower.

In the same folder structure I called a Python script from Igor Pro:

"python3 -c \\\"from pathlib import Path;from os.path import getmtime;print(max(list(Path('%s').rglob('*%s')),key=getmtime))"

and the operation took 0.2 secs!
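
(For reference, this is roughly how I dispatch the one-liner from Igor. It is a sketch rather than my exact code; it assumes macOS, where ExecuteScriptText expects AppleScript, and it reuses the symbolic path name that appears in my code further down.)

Function/S MXP_GetNewestFileViaPython(string pathName, string extension)
    // pathName names an Igor symbolic path, e.g. "pMXP_LoadFilesBeamtimeIgorPath"; extension is e.g. ".dat"
    PathInfo $pathName
    string posixPath = ParseFilePath(5, S_path, "/", 0, 0)    // Igor's colon-separated path -> POSIX path
    string cmd
    sprintf cmd, "do shell script \"python3 -c \\\"from pathlib import Path;from os.path import getmtime;print(max(list(Path('%s').rglob('*%s')),key=getmtime))\\\"\"", posixPath, extension
    ExecuteScriptText cmd                                     // the script's stdout is returned in S_value
    return ReplaceString("\"", TrimString(S_value), "")       // AppleScript results come back quoted
End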

I believe that the bottleneck is the amount of info GetFileFolderInfo checks for. Am I wrong?

Is it possible to add a switch so that GetFileFolderInfo retrieves only specific properties (e.g., the modification date)?

Thanks

You haven't given us the code you are using to call GetFileFolderInfo, so it's hard to say whether changes to your code could make this faster in Igor. But I doubt you could do this particular thing faster in Igor than in Python, because Igor does not give you access to a lower-level file/path object the way Python does. With the exception of commands that work on open files and take refNum parameters, file access in Igor is based on the name or full path of the file, and looking up that file using the OS's file API is relatively slow.

My suggestions to get the best performance possible are as follows:

1. To the extent possible, use the GetFileFolderInfo /P flag and provide the name of an Igor symbolic path and a file name instead of providing a full or partial path via fileOrFolderNameStr. That way Igor only needs to look up a file name within a path for which it already has an internal descriptor.

2. I assume you are using IndexedFile to get a list of file names in a directory. Make sure you are passing -1 as the index parameter so that you get a list of all files at once instead of getting one file name at a time. That should be much faster. Both suggestions are combined in the sketch below.
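
A minimal sketch of both suggestions combined (the function name is a placeholder and error handling is kept to a minimum):

Function/S NewestFileInFolder(string pathName, string extension)
    // pathName names an existing Igor symbolic path; extension is e.g. ".dat" or "????" for all files
    PathInfo $pathName
    string folderPath = S_path                                  // colon-separated path of the folder
    string fileNames = IndexedFile($pathName, -1, extension)    // all file names in a single call
    string newestFile = ""
    variable newestCTime = 0, i
    for (i = 0; i < ItemsInList(fileNames); i += 1)
        string fileName = StringFromList(i, fileNames)
        GetFileFolderInfo/P=$pathName/Q/Z fileName              // /P: look up the name within the symbolic path
        if (V_flag == 0 && V_creationDate > newestCTime)
            newestCTime = V_creationDate
            newestFile = folderPath + fileName
        endif
    endfor
    return newestFile
End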

If you want to send us the code you're using to iterate through the files and get the timestamp, we may be able to offer advice about how to improve performance. Feel free to send that to support directly if you wish.

Hi aclight,

Thanks for your advice/reply.

Here is the code I am using now (MXP_GetNewestCreatedFileInPathTree is the function I want to optimise):

Function MXP_LoadNewestFileInPathTreeAndDisplay(string extension)
    /// Load the last file found in the directory tree with root at pathName
    string latestfile = ""
    variable latestctime = 0
    string filepathStr = MXP_GetNewestCreatedFileInPathTree("pMXP_LoadFilesBeamtimeIgorPath", extension, latestfile, latestctime, 1, 0)
    WAVE wRef = MXP_WAVELoadSingleDATFile(filepathStr, "")
    MXP_DisplayImage(wRef)
    print "File loaded: ", filepathStr
    return 0
End

Function/S MXP_GetNewestCreatedFileInPathTree(string pathName,
           string extension, string &latestfile, variable &latestctime,
           variable recurse, variable level)
    //  MXP_GetNewestCreatedFileInPathTree is a modified version of the WM routine
    //  PrintFoldersAndFiles(pathName, extension, recurse, level).
    //  It recursively visits all files in a folder and its subfolders, checking the
    //  creation date of each file and keeping the newest one.
    //  pathName is the name of an Igor symbolic path that you created
    //  using NewPath or the Misc->New Path menu item.
    //  extension is a file name extension like ".txt" or "????" for all files.
    //  recurse is 1 to recurse or 0 to list just the top-level folder.
    //  level is the recursion level - pass 0 when calling MXP_GetNewestCreatedFileInPathTree.
    //  latestfile and latestctime are passed by reference because the recursive calls
    //  would otherwise reset pass-by-value arguments. We could alternatively use SVAR and NVAR.
    /// DO NOT CALL THE FUNCTION DIRECTLY.

    PathInfo $pathName
    string path = S_path   
    if(!V_flag) // If path not defined
        print "pMXP_LoadFilesBeamtimeIgorPath is not set!"
        path = MXP_SetOrResetBeamtimeRootFolder()
    endif
   
    // Loop indexes
    variable folderIndex, fileIndex

    // Add files
    fileIndex = 0
    do
        string fileName
        fileName = IndexedFile($pathName, fileIndex, extension)
        if (strlen(fileName) == 0)
            break
        endif
           
        GetFileFolderInfo/Z/Q (path + fileName)
       
        if(V_creationDate > latestctime)
            latestfile = (path + fileName)
            latestctime = V_creationDate
        endif
   
        fileIndex += 1
    while(1)
   
    if (recurse)        // Do we want to go into subfolder?
        folderIndex = 0
        do
            path = IndexedDir($pathName, folderIndex, 1)
            if (strlen(path) == 0)
                break   // No more folders
            endif

            string subFolderPathName = "tempPrintFoldersPath_" + num2istr(level+1)
           
            // Now we get the path to the new parent folder
            string subFolderPath
            subFolderPath = path
           
            NewPath/Q/O $subFolderPathName, subFolderPath
            MXP_GetNewestCreatedFileInPathTree(subFolderPathName, extension, latestfile, latestctime, recurse, level+1)
            KillPath/Z $subFolderPathName
           
            folderIndex += 1
        while(1)
    endif
    return latestfile
End

I will try to apply your recommendations and see how much I can improve. I will report back.

All the best.

Hi aclight,

I changed the first do-while loop to:

 

    fileIndex = 0
    string fileNames = IndexedFile($pathName, -1, extension)
    do
        string fileName
        fileName = StringFromList(fileIndex, fileNames)
        if (strlen(fileName) == 0)
            break
        endif
           
        GetFileFolderInfo/P=$pathName/Z/Q fileName
       
        if(V_creationDate > latestctime)
            latestfile = (path + fileName)
            latestctime = V_creationDate
        endif
   
        fileIndex += 1
    while(1)

 

and tested in another folder structure:

1. Previous code: 6.44 sec

2. New code (this post): 4.97 sec

3. Calling Python: 0.14 sec

I got a 20% improvement.

If I try to make the same change in the second do-while loop, things get slightly slower.

So I guess that's the best I can do?

Cheers,

eg

 

I'm surprised that change didn't have a bigger effect on performance.

Can you reproduce the problem in a single folder with a lot of files (not doing a recursive search)? If so, please call IndexedFile($pathName, -1, "????") and send me the output of that (either here or through support). You can save it in an Igor experiment or in a text file, whatever is easier for you. Then please also tell me the value of the extension string when you are running this code.

I will use the list of all files to create a test directory that contains the same file names as on your system, and then use your code to determine where the bottlenecks are.

If you can't do that, you could try using Igor's function profiling procedure. To use this, add the following include statement to the main procedure window, then compile procedures.

#include <FunctionProfiling>

Then select the Windows->Procedures->FunctionProfiling.ipf menu item and read the comments for instructions.

I primarily want you to confirm that most of the time is spent in GetFileFolderInfo rather than in IndexedFile.

One other important question: are the files on a local drive or on a network/shared drive? That could make a big difference in performance.

Please also tell us your OS version and the version of Igor you are using.

Another idea: if you're using IP9, try adding the /UTC flag to GetFileFolderInfo. That avoids converting the time from UTC to local time, which might speed things up a little.
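
For example, the GetFileFolderInfo call in the loop you posted would become something like:

    GetFileFolderInfo/P=$pathName/Q/Z/UTC fileName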

So, I produced a single folder with roughly the same number of files as in the folder tree. The command now takes 2.5 sec to execute.

I use:

    variable timerRefNum = StartMSTimer
    /// Load the last file found in the directory tree with root at pathName
    string latestfile = ""
    variable latestctime = 0
    string filepathStr = MXP_GetNewestCreatedFileInPathTree("pMXP_LoadFilesBeamtimeIgorPath", "????", latestfile, latestctime, 1, 0)
    variable microSeconds = StopMSTimer(timerRefNum)
    print "Time elapsed: ", microSeconds/1e6, " sec"

That's a lot. The extension string I use is ".dat".

I attach a .txt with the filenames.

I use Igor 9, and adding /UTC has no effect.

Thanks

FilenamesWM.txt

FWIW, on my machine (with a fast NVMe drive), running your code with all of your file names takes about 0.6 seconds.

In any case, I used a profiler to see where time is spent and it looks like a substantial amount of time is spent asking the OS for the information for the S_creator output variable. I'll report this internally and see if there's something we can do to avoid this. Thanks for providing the info needed to reproduce the problem.

I looked over the current code for GetFileFolderInfo and I don't see a way for you to call the operation in a way that skips the code that is particularly slow. Adding your suggested feature is probably something that would need to wait for IP10.

If you can rely on Python being present then that might be the way to go.

A possible alternative would be to do something like this:

start = StopMSTimer(-2); ExecuteScriptText/B "cmd /c dir /O-D /TC /B /R c:\Windows\system32\*.dll"; print (StopMSTimer(-2)-start)/1e6, ItemsInList(S_value, "\r")

In this case that gives you a directory listing of all .dll files in the given directory, sorted by creation date (/TC), with the most recently created file first. So you would just need the first line of the output of this command for every directory. Then you would call GetFileFolderInfo on only those candidate files to get their actual creation timestamps and determine which one is the newest.

This may or may not be significantly faster than calling GetFileFolderInfo on every file in every folder. On my Windows machine, the line above takes 0.31 seconds to execute and finds 3638 files. As a comparison, the test based on your code and file names took about 0.6 seconds for around 1600 files.
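
To turn the first line of that dir output into a candidate path, a per-folder helper could look roughly like this (a sketch only; Windows only, and the .dat extension and the native folder path are assumptions):

Function/S NewestFileInFolderViaDir(string nativeFolderPath)
    // nativeFolderPath is a Windows path such as C:\data\run01 (no trailing backslash)
    string cmd
    sprintf cmd, "cmd /c dir /O-D /TC /B \"%s\\*.dat\"", nativeFolderPath
    ExecuteScriptText/B cmd
    string firstLine = StringFromList(0, S_value, "\r")    // listing is newest-first, so take the first line
    if (strlen(firstLine) == 0)
        return ""                                          // no matching files in this folder
    endif
    return nativeFolderPath + "\\" + firstLine             // candidate to pass to GetFileFolderInfo
End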

Someone with more command line experience might be able to put together a command that would give you just the most recent file in a directory.
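
For what it's worth, on Windows a PowerShell pipeline can return just the newest file in one call. The following is only a sketch (the root folder and extension are placeholders, and I have not timed it against the approaches above):

Function/S NewestFileViaPowerShell()
    // Assumes Windows with PowerShell on the PATH; C:\data and *.dat are placeholders
    string psCmd = "Get-ChildItem -Path 'C:\\data' -Recurse -Filter *.dat | "
    psCmd += "Sort-Object CreationTime -Descending | Select-Object -First 1 -ExpandProperty FullName"
    string cmd
    sprintf cmd, "powershell -NoProfile -Command \"%s\"", psCmd
    ExecuteScriptText/B cmd                // the full path of the newest file ends up in S_value
    return TrimString(S_value)             // strip any trailing newline from the shell output
End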