Wave / Variable Access in Thread Groups

Hello everybody

I am stuck with some code and need help:
I am running a lot (~2500) of simple fits on each of a lot of data sets (~3000, with 2500 points each). The fit function is always the same and I only need one coefficient. Unfortunately each fit depends on the one before, but the data sets are independent of each other. A normal version takes about 6 hours to run (which is 7 sec for one data set). Processor load is about 18% on an i7 8-core system. I have identified the many calls to FuncFit as the bottleneck.
The typical approach would be to parallelize along the data sets.

Here is my struggle:
-- The fit function takes a while to compute (and is normalized anyway; I only need the amplitude coefficient). It is about twice as fast to compute it once beforehand and then just read the data from the precomputed wave (there is still some improvement if I sacrifice this).
-- The fit function might be measured data, in which case there is no way to compute it.

Is there a way to provide the same wave to a lot of threads without copying it all the time?
I could get around the computed ("$") wave reference by introducing more fit functions.

Threadsafe Function DeconvolutionFuncPCOdd(wc,wy,wx):Fitfunc
Wave wc, wy, wx
Wave DCO=$"root:NAC:Experiment:FitDeconvolutionOdd"+SelectString(wc[4],"","Fitted")

The other relevant fragments are below (ThreadGroupPutDF does not work the way I hoped it would):
        NVar Period=root:NAC:Experiment:ChopperPeriod, Win=root:NAC:Machine:DeconvolutionWindow
        NewDataFolder root:NAC:MT
        Duplicate /O root:NAC:Experiment:FitDeconvolutionOdd root:NAC:MT:FitDeconvolutionOdd
        Duplicate /O root:NAC:Experiment:FitDeconvolutionEven root:NAC:MT:FitDeconvolutionEven
        Duplicate /O root:NAC:Experiment:FitDeconvolutionOddFitted root:NAC:MT:FitDeconvolutionOddFitted
        Duplicate /O root:NAC:Experiment:FitDeconvolutionEvenFitted root:NAC:MT:FitDeconvolutionEvenFitted
        Variable /G root:NAC:MT:Chopper=Period, root:NAC:MT:DeconvolutionWindow=Win
        Variable TID, NThr=1
        ThreadGroupPutDF TID, root:NAC:MT
        For (i=0;i<DimSize(DeconPrep,1)-1;i+=NThr)
            For (j=0;j<NThr;j+=1)
                If (i+j < DimSize(DeconPrep,1)-1)
                    If (!DeconFlag[i])
                        ThreadStart TID, j, DeconvolutePulseMT(DeconPrep, i, Deconvolution, SelectString(Mod(i,2), "Even", "Odd"), Fitted)
                        //DeconvolutePulseMT(DeconPrep, i, Deconvolution, SelectString(Mod(i,2), "Even", "Odd"), Fitted)
                    EndIf
                EndIf
            EndFor
            Do
            While (ThreadGroupWait(TID, 50)!=0)
            DoUpdate /W=NAC_Control
        EndFor

and the thread function (the NVARs could be passed to it as variables instead):
ThreadSafe Static Function DeconvolutePulseMT(Data, Index, Result, Parity, Fitted)
Wave Data
Variable Index
Wave Result
String Parity
Variable Fitted
NVar Win=root:DeconvolutionWindow, Period=root:ChopperPeriod
//Variable Win=0.025, Period=2
Variable YOffset=0
Variable Step, Shift
    Make /FREE /N=4 /O FitCoef
    Make /FREE /N=(DimSize(Data,0)) Process
    SetScale /P x, DimOffset(Data,0), DimDelta(Data,0), "", Process
    Duplicate /FREE Process Diff, Int, Subtr
    SetScale /P x, 0, Step, "", Diff, Int
    For (Shift=0;Shift<Period+9*Win;Shift+=Step)
        FitCoef={0, 1, 0, Fitted}  //  Offset, Amp, Shift
        SetScale /P x, -Shift, Step, "", Process, Subtr
        FuncFit /N /NTHR=1 /Q /W=2 /H="1011" $"DeconvolutionFuncPC"+Parity FitCoef Process(-Win,+Win)
            Case "Odd":
            Case "Even":
    Integrate /P Diff /D=Int
    SetScale /P x, -5*Win, Step, "", Int
    Killwaves Process, Int, Diff, Subtr
    Return NoError

Maybe I can pass the fitting wave as an x-wave to the fit function, but I don't like these botched constructs.

The goal is to reconstruct the energy input into a detector by measuring a response function and the actual experiment.
The data is too noisy to perform this task with Fourier transformations (and I'm a little bit sad about this).

I know that shared data in a multithreaded environment is dangerous and should usually be avoided, but since this is read-only access it should be safe.

Long post: potato available on request.

Thanks a lot,
Hans J Drescher

To my knowledge every thread always has its own data folder hierarchy, so there is no way to share data between them.

But couldn't you just move the data sets around?
That is, you partition the data sets and let each worker thread do all the fits on one data set, then on the next data set, and so on.

So the steps would be something like:
- Partition data sets
- Start worker threads
- Let worker threads initialize themselves. The idea would be that you move all one-time initialization, especially the Make parts at the beginning, out of the loop and do it only once before the very first fit.
- Start the inner loop in the worker thread, which waits for input data in the queue. As soon as it gets data it will perform all 2500 fits on one data set.
- Start sending data folders to the threads, one data set at a time. You can move the data folders together with the data, as long as you remember that the worker threads have to return them.
- Collect the results from the output queues of the worker threads.
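
The steps above could be sketched roughly as follows. This is only a minimal outline of the input/output queue pattern, not a tested implementation: the names FitWorker, RunFits, forThread, outDF, setIndex and amp are made up for illustration, the data sets are assumed to be the columns of a 2D wave, and the actual chain of fits is replaced by a stand-in (the mean of the data). The fit-function wave is copied into every data folder that is sent out, because the threads cannot see the main thread's waves; compared to 7 sec of fitting per data set, copying one 2500-point wave should not matter much.

```igor
// Hypothetical worker: takes data folders from the input queue until the group is released.
ThreadSafe Function FitWorker()
	do
		do
			DFREF dfr = ThreadGroupGetDFR(0, 1000)	// wait up to 1 s for a data folder
			if (DataFolderRefStatus(dfr) == 0)
				if (GetRTError(2))	// thread group is being released: quit
					return 0
				endif
			else
				break	// got a data folder
			endif
		while(1)

		WAVE data = dfr:data			// one data set
		WAVE fitFunc = dfr:fitFunc		// private copy of the fit-function wave
		NVAR setIndex = dfr:setIndex	// which data set this is

		NewDataFolder/S outDF
		Variable/G outIndex = setIndex
		// All sequential fits for this data set would go here.
		// Stand-in only: store the mean of the data as the "result".
		Variable/G amp = mean(data)

		ThreadGroupPutDF 0, :	// send the result folder to the output queue
		KillDataFolder dfr		// done with the input data folder
	while(1)
	return 0
End

// Hypothetical driver: dataSets holds one data set per column.
Function RunFits(dataSets, fitFuncSrc)
	WAVE dataSets, fitFuncSrc

	Variable i
	Variable nSets = DimSize(dataSets, 1)
	Variable nThreads = ThreadProcessorCount
	Variable tgID = ThreadGroupCreate(nThreads)
	Make/O/D/N=(nSets) fitResults = NaN

	for (i = 0; i < nThreads; i += 1)
		ThreadStart tgID, i, FitWorker()
	endfor

	// Send out one data folder per data set.
	for (i = 0; i < nSets; i += 1)
		NewDataFolder/S forThread
		Duplicate/R=[][i] dataSets, data
		Duplicate fitFuncSrc, fitFunc
		Variable/G setIndex = i
		WAVEClear data, fitFunc	// no wave may still be referenced when the folder is sent
		ThreadGroupPutDF tgID, :
	endfor

	// Collect one result folder per data set.
	for (i = 0; i < nSets; i += 1)
		do
			DFREF dfr = ThreadGroupGetDFR(tgID, 1000)
			if (DataFolderRefStatus(dfr) != 0)
				break
			endif
		while(1)
		NVAR outIndex = dfr:outIndex
		NVAR amp = dfr:amp
		fitResults[outIndex] = amp
		KillDataFolder dfr
	endfor

	Variable status = ThreadGroupRelease(tgID)	// tells the workers to quit
	return status
End
```

Results come back in whatever order the threads finish, which is why each folder carries its own index.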
Thank you!

I was hoping that I could avoid queues.
I'm implementing it at the moment; when I am done I'll share my results.