Require (many) functions to use their own data folder, release when done

We have written a fairly extensive package to run one of our instruments. There are many (~200) functions in this package. They use many waves and global variables that are kept in a data folder called, let's say, scanFolder. The initialization function sets this folder as the default so that all the other functions use those waves and variables.

The issue arises when, in the same experiment, I want to bring in some of the data and analyze it. I don't want this data to be loaded into scanFolder, which should be reserved only for the instrument's waves and variables. I could in principle change the DF by hand each time I want to bring in some data, but I usually forget to do so, or forget to change it back to scanFolder before running the instrument again.

I have a couple of ideas on how to fix this, none particularly appealing given the large number of functions:

1) Prepend every wave and variable used in the functions by an absolute path, e.g., root:scanFolder:

2) Start every function (or at least many of them) by saving the current DF then switching to scanFolder, then, at the function's end, switch back to the saved folder.

These are rather brute-force approaches. Does anyone else have a better idea? Is there some magical way to tag a group of functions to use a particular data folder?

First, I wonder whether all these waves and global variables are really necessary. Global variables are needed only when a value must persist for a longer time, or when you find absolutely no other way to share values between functions. If you are working with a panel, you can also store the values in the panel controls themselves, which avoids a zoo of global variables just for this purpose.

Then, I urge you to use approach 1 instead of 2. Approach 2 will bite you sooner or later, since you will lose track of all the ways a function can exit without setting the folder back. You may use it in some cases (such as your initialization function) that are very simple in structure and create or work with a lot of data in that folder. Approach 1 is very simple and does not require much change to your code. Simply add:

DFREF sF = root:scanFolder

and then prepend sF: to each wave / global variable declaration, e.g.:

Wave status = sF:status_values

As far as I know there are no easy solutions for this.

I always use 1 and add 2 for anything longer than a few lines. It seems like a brute-force solution, but it is useful and may be necessary. Also, it is not that much work; you get used to it quickly.

NOTE1: Adding 2 protects you against programmer sloppiness and makes sure that any Duplicate, Make, Variable/G, etc. creates objects where you expect them.

NOTE2: The worst thing you can do is write a function that expects to be in a specific folder. This is guaranteed to fail eventually. If you need to be in a specific folder, you must set it yourself within that function.
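A minimal sketch of approach 2 as described above: save the current data folder, switch, do the work, restore. The function name, scanBuffer, and gLastScanTime are hypothetical, and the sketch assumes root:scanFolder already exists.

```igor
Function AcquireOneScan()
	DFREF saveDFR = GetDataFolderDFR()	// remember the caller's current data folder
	SetDataFolder root:scanFolder		// assumes the init function already created this folder

	// Objects created by simple name now land in root:scanFolder
	Make/O/N=100 scanBuffer			// hypothetical wave name
	Variable/G gLastScanTime = DateTime	// hypothetical global

	SetDataFolder saveDFR			// must be reached on every exit path
End
```

Any early return or abort added between the two SetDataFolder calls has to restore saveDFR as well, which is exactly where this pattern tends to go wrong in longer functions.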

As an alternative to the approach from @chozo, you can also do this ...

DFREF sf = root:scanFolder

...

wave/SDFR=sF status = status_values

 

That's right, JJ. I forgot about this one. It might be shorter and more convenient in some cases, e.g., if you use the wave name as is in the function: Wave/SDFR=sf status_values

Another thing or two:

- You may use a global constant to declare your target folder, so that it is the same for all functions and easy to change. But this places the burden on the initialize function to properly create this folder. I use the following approach (which works for an arbitrarily nested path structure):

static StrConstant kWorkingDir = "root:Packages:scanFolder:"

static Function init()
    ...
    int i
    if (!DataFolderExists(kWorkingDir))     // create base dir
        DFREF saveDF = GetDataFolderDFR()
        SetDataFolder root:     // root is part of the path and is omitted below
        for (i = 1; i < ItemsInList(kWorkingDir,":"); i++)
            NewDataFolder/O/S $StringFromList(i,kWorkingDir,":")
        endfor
        SetDataFolder saveDF
    endif
    ...
End

Function someOtherFunction()
    DFREF workDF = $kWorkingDir
    ...
End

- You can also just use the working folder as an input to your function, e.g., myFunc(DFREF workDF). This way the function is not dependent on a specific hardcoded path within the function. You might want to check whether the folder actually exists at the start, though.
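As a rough illustration of the DFREF-parameter idea, the sketch below checks the reference before using it. The body of myFunc, the caller runAll, and the return codes are assumptions made for this example; status_values is the wave name used earlier in the thread.

```igor
// Works on whatever folder the caller passes in
Function myFunc(DFREF workDF)
	if (DataFolderRefStatus(workDF) == 0)	// 0 means the reference is invalid
		Print "myFunc: working folder does not exist"
		return -1
	endif

	Wave/Z/SDFR=workDF status = status_values
	if (!WaveExists(status))
		Print "myFunc: status_values not found"
		return -1
	endif

	Print "Points in status wave:", numpnts(status)
	return 0
End

// Hypothetical top-level caller that decides which folder to use
Function runAll()
	DFREF workDF = root:scanFolder
	myFunc(workDF)
End
```

With this layout only the top-level function knows the path, so pointing the same code at a different folder means changing a single line.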

Another variant on this strategy is described here:

DisplayHelpTopic "Managing Package Data"

If you want to avoid repeatedly looking up your many waves and variables, structures can be helpful and provide a neat solution: write one function to put object references in a structure, then pass that structure between functions.
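A possible sketch of the structure approach, assuming the wave status_values and a global variable gNumPoints (a hypothetical name) already exist in root:scanFolder. The structure and function names are invented for illustration.

```igor
Structure ScanRefs			// hypothetical structure name
	WAVE status
	NVAR numPoints
EndStructure

// Fill the structure once; all lookups live in this one function
Function GetScanRefs(s)
	STRUCT ScanRefs &s
	WAVE s.status = root:scanFolder:status_values
	NVAR s.numPoints = root:scanFolder:gNumPoints	// hypothetical global
End

// Any other function receives the references ready to use
Function ReportScan(s)
	STRUCT ScanRefs &s
	Print numpnts(s.status), s.numPoints
End

Function Demo()
	STRUCT ScanRefs s
	GetScanRefs(s)
	ReportScan(s)
End
```

If an object moves or is renamed, only GetScanRefs needs to change.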

My usual approach is the following:

- Use wave getter functions like

Function/WAVE GetDataWave(...)

End

This function returns a wave reference and the parameters determine where to fetch it from. Now every function which wants to use that wave uses this wave getter. It also creates the wave if there is none, so every caller gets a valid wave reference. This makes the call sites much easier.

Advantages:
- Makes it easy to get all functions using that wave by searching for GetDataWave
- Encapsulates the knowledge where exactly the wave is located.
- No one needs to know where the wave is located.

- Use datafolder getters

Function/DF GetDataDFR(...)

End

Same idea as wave getters, just for data folders.
- Write your code without SetDataFolder as much as possible. There are only very few, very esoteric reasons for using SetDataFolder.
- This approach can be adapted for global strings and variables, which should be used sparingly if possible.
- Temporary waves and data folders should always be free.
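To make the getter idea concrete, here is a small sketch, not the MIES implementation. The names GetScanDFR, GetStatusWave, and UseStatus are hypothetical; root:scanFolder and status_values are taken from earlier in the thread. Both getters create the object if it is missing, so callers always receive a valid reference and never call SetDataFolder.

```igor
// Data folder getter: creates the folder on first use
Function/DF GetScanDFR()
	NewDataFolder/O root:scanFolder	// /O: no error if the folder already exists
	DFREF dfr = root:scanFolder
	return dfr
End

// Wave getter: returns the existing wave or creates an empty one
Function/WAVE GetStatusWave()
	DFREF dfr = GetScanDFR()
	Wave/Z/SDFR=dfr wv = status_values
	if (WaveExists(wv))
		return wv
	endif
	Make/N=0 dfr:status_values/WAVE=wv	// created in the target folder, current folder untouched
	return wv
End

// Call site: no paths, no SetDataFolder
Function UseStatus()
	Wave status = GetStatusWave()
	Print numpnts(status)
End
```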

Some more general points are mentioned in https://alleninstitute.github.io/MIES/mies-concepts.html, but that might be a bit steep reading.

The idea from Tony of using structs also makes total sense. You can also emulate a class-like OOP approach where the struct is the object data and the functions are the class functions.

I like the idea of using something like

DFREF sf = root:scanFolder

But do I need to put this in every function where it's needed? Or is there a way to have a global data folder reference?

 

You don't want a global here. Globals are hard to keep in sync with what you're doing. It is best to use globals ONLY for things that must persist, like saving a preference. Yes, put that in every function.

One way to approach that would be to have a DFREF input parameter to your functions. Then the top-level one would determine what data folder to work from, and pass the DFREF down through the call chain.

Yes, you would need the DFREF declaration or something like ...

DFREF sf = getWorkFolder()

with the approach Thomas has outlined in every function. Or as mentioned above, you use the data folder reference as input to most of your functions, and provide sf only to your master function which calls everything. It is really not that problematic if you get used to it, and you can have multiple folders and references as needed.

And as mentioned multiple times, you might want to consider stepping away from global variables as much as possible to make your life easier in the end. I think a limited number of global constants, like the one I use in the example further above to specify the folder, is fine. This helps to separate important but hardcoded parameters from the jungle of your code.