Wave / Variable Access in Thread Groups

Hello everybody

I am stuck with some code and need help:
I am running many (~2500) simple fits on each of a large number of data sets (~3000, with 2500 points each). The fit function is always the same and I only need one coefficient. Unfortunately, each fit depends on the one before it, but the data sets are independent of each other. The serial version takes about 6 hours to run (about 7 s per data set). Processor load is about 18% on an 8-core i7. The bottleneck is the many calls to FuncFit.
The typical approach would be to parallelize along the data sets.

Here is my struggle:
-- The fit function takes a while to evaluate (it is normalized anyway; I only need the amplitude coefficient). Precomputing it and then reading the values from the precomputed wave is about 2 times faster. (Even if I have to give that up, multithreading would still bring some improvement.)
-- The fit function may be measured data, in which case there is no way to compute it.
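Precomputing the fit function amounts to a lookup table: sample it once, then replace every evaluation with an interpolated read. That is what indexing a wave by x, as in DCO(x-wc[2]), already does, since Igor interpolates linearly between points. A measured response fits the same scheme, because the table is then simply the measured samples. A minimal Python sketch of the idea, where `response` is a hypothetical stand-in for the expensive function and the grid is assumed to be uniform:

```python
import math


def response(x):
    # Hypothetical stand-in for an expensive, normalized response function.
    return math.exp(-x * x) * math.cos(3.0 * x)


class Table:
    """Function sampled once on a uniform grid x0, x0+dx, ... (like a wave with X scaling)."""

    def __init__(self, func, x0, dx, n):
        self.x0, self.dx = x0, dx
        self.y = [func(x0 + i * dx) for i in range(n)]

    def __call__(self, x):
        # Linear interpolation between the two nearest samples, clamped at both ends.
        t = (x - self.x0) / self.dx
        if t <= 0:
            return self.y[0]
        if t >= len(self.y) - 1:
            return self.y[-1]
        i = int(t)
        frac = t - i
        return self.y[i] + frac * (self.y[i + 1] - self.y[i])


# Sample once on [-5, 5] with step 0.001; every later call is a cheap table read.
table = Table(response, -5.0, 0.001, 10001)
```

Building the table costs one evaluation per grid point; after that, each lookup is a few arithmetic operations no matter how expensive `response` is.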

Is there a way to provide the same wave to many threads without copying it each time?
I could avoid the computed ($) wave reference by introducing more fit functions.
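The pattern asked about here (one read-only table shared by many workers, each running its own chain of dependent steps on an independent data set) looks like this in Python, where handing an object to a thread passes a reference, not a copy. This only illustrates the data flow and is not Igor code: `process_data_set` and its update rule are hypothetical placeholders for the dependent fits, and CPython threads will not speed up pure-Python arithmetic because of the global interpreter lock.

```python
from concurrent.futures import ThreadPoolExecutor


def process_data_set(data, response):
    """Hypothetical worker: a chain of steps where each depends on the previous one.

    `response` is only read, never written, so all workers can share one object.
    Returns the identity of the table it saw plus the final state.
    """
    state = 0.0
    for value in data:
        # Each step uses the result of the step before it (sequential dependence).
        state = 0.5 * state + value * response[0]
    return id(response), state


response = (2.0, 1.0, 0.5)                          # shared table; a tuple, so it cannot be modified
data_sets = [[1.0, 2.0, 3.0], [4.0, 5.0], [6.0]]    # independent of each other

with ThreadPoolExecutor(max_workers=4) as pool:
    # Parallelize over the independent data sets; every call gets a reference to the same tuple.
    results = list(pool.map(process_data_set, data_sets, [response] * len(data_sets)))

for table_id, state in results:
    print(table_id == id(response), state)
```

The dependent steps stay sequential inside each call, while the independent data sets run concurrently, which is the "parallelize along the data sets" approach described above.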


Threadsafe Function DeconvolutionFuncPCOdd(wc,wy,wx):Fitfunc
Wave wc, wy, wx
// wc[3] holds the Fitted flag (FitCoef has 4 points, so wc[4] is out of range)
Wave DCO=$("root:NAC:Experiment:FitDeconvolutionOdd"+SelectString(wc[3],"","Fitted"))
	wy=wc[0]+wc[1]*DCO(x-wc[2])
End


The other relevant fragments are below (ThreadGroupPutDF does not work the way I hoped it would):

		NVar Period=root:NAC:Experiment:ChopperPeriod, Win=root:NAC:Machine:DeconvolutionWindow
		NewDataFolder /O root:NAC:MT
		Duplicate /O root:NAC:Experiment:FitDeconvolutionOdd root:NAC:MT:FitDeconvolutionOdd
		Duplicate /O root:NAC:Experiment:FitDeconvolutionEven root:NAC:MT:FitDeconvolutionEven
		Duplicate /O root:NAC:Experiment:FitDeconvolutionOddFitted root:NAC:MT:FitDeconvolutionOddFitted
		Duplicate /O root:NAC:Experiment:FitDeconvolutionEvenFitted root:NAC:MT:FitDeconvolutionEvenFitted
		// The names must match the NVARs in DeconvolutePulseMT (ChopperPeriod, DeconvolutionWindow)
		Variable /G root:NAC:MT:ChopperPeriod=Period, root:NAC:MT:DeconvolutionWindow=Win
		Variable TID, NThr=1
		TID=ThreadGroupCreate(NThr)
		// Posts root:NAC:MT to the group's input queue; a worker must retrieve it before it can use it
		ThreadGroupPutDF TID, root:NAC:MT
		For (i=0;i<DimSize(DeconPrep,1)-1;i+=NThr)
			For (j=0;j<NThr;j+=1)
				If (i+j < DimSize(DeconPrep,1)-1)
					// Thread j handles data set i+j; passing i would give every thread the same data set
					If (!DeconFlag[i+j])
						ThreadStart TID, j, DeconvolutePulseMT(DeconPrep, i+j, Deconvolution, SelectString(Mod(i+j,2), "Even", "Odd"), Fitted)
						//DeconvolutePulseMT(DeconPrep, i+j, Deconvolution, SelectString(Mod(i+j,2), "Even", "Odd"), Fitted)
					EndIf
				EndIf
			EndFor
			// Wait until all threads of this batch have finished
			Do
			While (ThreadGroupWait(TID, 50)!=0)
			ProgressValue=i
			DoUpdate /W=NAC_Control
		EndFor
		TID=ThreadGroupRelease(TID)


and the worker function (the NVARs could also be passed to it as variables):

ThreadSafe Static Function DeconvolutePulseMT(Data, Index, Result, Parity, Fitted) 
Wave Data
Variable Index
Wave Result
String Parity
Variable Fitted
NVar Win=root:DeconvolutionWindow, Period=root:ChopperPeriod
//Variable Win=0.025, Period=2
Variable YOffset=0
Variable Step, Shift
	Make /FREE /N=4 /O FitCoef 
	Make /FREE /N=(DimSize(Data,0)) Process
	SetScale /P x, DimOffset(Data,0), DimDelta(Data,0), "", Process
	Duplicate /FREE Process Diff, Int, Subtr
	Process=Data[p][Index]
	Step=DimDelta(Data,0)
	Diff=0
	Int=0
	Subtr=0
	SetScale /P x, 0, Step, "", Diff, Int
	For (Shift=0;Shift<Period+9*Win;Shift+=Step)
		FitCoef={0, 1, 0, Fitted}  // Offset, Amp, Shift, Fitted flag (read as wc[3] in the fit function)
		SetScale /P x, -Shift, Step, "", Process, Subtr
		// /H="1011" holds offset, shift and flag, so only FitCoef[1] (the amplitude) is fitted
		FuncFit /N /NTHR=1 /Q /W=2 /H="1011" $("DeconvolutionFuncPC"+Parity) FitCoef Process(-Win,+Win)
		StrSwitch(Parity)
			Case "Odd":
				DeconvolutionFuncPCOdd(FitCoef,Subtr,Subtr)
				Break
			Case "Even":
				DeconvolutionFuncPCEven(FitCoef,Subtr,Subtr)
				Break
		EndSwitch
		Process-=Subtr[p]
		Diff[X2Pnt(Diff,Shift)]+=FitCoef[1]
	EndFor
	Integrate /P Diff /D=Int
	SetScale /P x, -5*Win, Step, "", Int
	Result[][Index]=Int[p]
	// Process, Int, Diff and Subtr are free waves; they are released automatically when the function returns
	Return NoError
End
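A note on the fit itself: with /H="1011", FitCoef[0] (offset), FitCoef[2] (shift) and FitCoef[3] (the Fitted flag) are held, so the only free coefficient is the amplitude A = FitCoef[1], and the model reduces to A*DCO(x). Because that is linear in A, the unweighted least-squares amplitude over the fit window has a closed form, A = sum(y_i*r_i) / sum(r_i^2), which could replace the FuncFit call inside the loop. A minimal Python sketch of the loop under simplifying assumptions: signal and response are plain lists on the same point spacing, the response starts at its own point 0, and the hypothetical `window` parameter counts points after each shift instead of the symmetric (-Win,+Win) range used above.

```python
def peel(signal, response, window, n_shifts):
    """Greedily subtract scaled copies of `response` from `signal`, one shift at a time.

    signal:   measured samples (list of floats)
    response: response samples; response[k] sits k points after the current shift
    window:   hypothetical fit window, in points after each shift
    Returns one fitted amplitude per shift (the values accumulated in Diff above).
    """
    residual = list(signal)
    amplitudes = []
    for shift in range(n_shifts):
        num = 0.0
        den = 0.0
        for k in range(window):
            i = shift + k
            if i < len(residual) and k < len(response):
                num += residual[i] * response[k]
                den += response[k] * response[k]
        # Closed-form least-squares amplitude for the model A * response (offset and shift held).
        amp = num / den if den > 0 else 0.0
        # Subtract the fitted response wherever it overlaps the residual.
        for k, r in enumerate(response):
            if shift + k < len(residual):
                residual[shift + k] -= amp * r
        amplitudes.append(amp)
    return amplitudes


print(peel([3.0, 1.5, 0.0, 2.0, 1.0], [1.0, 0.5], 1, 5))
```

Each shift then costs two short loops instead of an iterative fit; the greedy shift-by-shift subtraction itself is the same as in DeconvolutePulseMT.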


Maybe I could pass the response wave to the fit function as the x wave, but I don't like such makeshift constructs.

The goal is to reconstruct the energy input into a detector from a measured response function and the actual experimental data.
The data are too noisy to do this with Fourier transforms (which makes me a little sad).

I know that sharing data in a multithreaded environment is dangerous and should usually be avoided, but since the access here is read-only, it should be safe.

Long post: potato available on request.

Thanks a lot,
Hans J Drescher

To my knowledge, every thread always has its own data folder hierarchy, so there is no way to share data between them.

But couldn't you just move the data sets around?
That is, partition the data sets and let each worker thread do all the fits on one data set, then on the next data set, and so on.

So the steps would be something like:
- Partition data sets
- Start worker threads
- Let the worker threads initialize themselves. The idea is to move all one-time initialization, especially the Make calls, to the beginning, so that it runs only once, before the very first fit.
- Start the inner loop in each worker thread, which waits for input data in the queue. As soon as it gets data, it performs all 2500 fits on that data set.
- Start sending data folders to the threads, one data set at a time. You can move the data folders with the data, as long as the worker threads return them.
- Collect the results from the output queues of the worker threads.
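The steps above can be sketched with Python's `queue` and `threading` modules. This illustrates the structure only and is not Igor code: each worker does its one-time setup before the first data set arrives, then loops on the input queue until it receives a `None` sentinel, and the main thread collects results from an output queue. One deliberate difference from the description: a single shared input queue hands each data set to whichever worker is free, instead of partitioning the data sets up front. The per-data-set computation is a hypothetical placeholder for the dependent fits.

```python
import queue
import threading


def worker(inbox, outbox, response):
    # One-time initialization before the first data set arrives
    # (the analogue of moving the Make calls out of the per-fit loop).
    scratch = [0.0] * len(response)
    while True:
        item = inbox.get()
        if item is None:  # sentinel: no more data sets for this worker
            break
        index, data = item
        total = 0.0
        for value in data:
            # Placeholder for the chain of dependent fits on one data set.
            for k, r in enumerate(response):
                scratch[k] = value * r
            total += sum(scratch)
        outbox.put((index, total))


def run(data_sets, response, n_workers=4):
    inbox, outbox = queue.Queue(), queue.Queue()
    threads = [threading.Thread(target=worker, args=(inbox, outbox, response))
               for _ in range(n_workers)]
    for t in threads:
        t.start()
    for item in enumerate(data_sets):  # hand out one data set at a time
        inbox.put(item)
    for _ in threads:  # one sentinel per worker
        inbox.put(None)
    results = [None] * len(data_sets)
    for _ in data_sets:  # results arrive in completion order; the index restores the order
        index, value = outbox.get()
        results[index] = value
    for t in threads:
        t.join()
    return results


print(run([[1.0, 2.0], [3.0], []], (1.0, 0.5)))
```

Each worker keeps its own scratch buffer, so only the read-only `response` is shared between threads.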

Thank you!

I was hoping that I could avoid queues.
I'm implementing it at the moment; when I'm done, I'll share my results.
HJ