Diagnostic Tools in motofit

I am trying to characterize the fits I am doing with motofit and am interested in the following diagnostics:

1. plot of chi2 error vs. # of trials

Unsure of how to do this, but I think I need to concatenate the chi2 value onto a wave during the evaluation loop and then display this wave. Is there an easier way to do this? Chi2 is displayed in the progress window, so that value is in memory somewhere and should be grabable during execution. My first attempt to do this unfortunately failed.

2. Plot of Coefficient values vs. # of trials

This one I have been able to do somewhat, utilizing /DUMP, which creates the matrix M_gencurvefitpopdump[n][*] where n is the coefficient number. (see attached image) However, there seems to be some problem here, as should I "hold" any variable during the fit, the output matrix is offset in such a way that I get something resembling a plaid shirt on acid (see second attached image). The values are still there, they are just not entered into the matrix in an orderly way. As a work-around, I have chosen to set the range of "Held" parameters to zero, for example varying a parameter between 200 and 200 results in a value of 200 being selected for every trial. I would prefer not to do this, however, as it is slower and clunkier in terms of UI.

The gencurvefit help file indicates the following about /DUMP:

/DUMP At the end of every iteration Gencurvefit will dump the genetic population of the parameters that vary, into a wave called M_gencurvefitpopdump. The wave will be 3D, having dimensions [V_nterms][V_nterms*popsize][]. The number of layers represent the output after each iteration. The first layer is that of the initial guess population and the last layer is that of the population when the fit terminates

However, the matrix is actually 2D and as far as I can tell contains the best (or average?) values of the population after a certain number of generations... which is why my graph was so easy to do in the first place. Did I miss something here? Also, I am able to do thousands of iterations, yet the number of points is much smaller, around 100 or so.

Thanks in advance!
1) The /DUMP flag only outputs to M_gencurvefitpopdump when the fit improves (so help file is wrong), so the dimensions of this wave will be smaller than the number of trials.
However, I think there may be a bug in the /DUMP flag, not a deal breaker in terms of fitting correctness but probably making that matrix wave incorrect and 'plaidlike'. I wil put an updated version on the web in a couple of hours.

2) Plotting Chi2 vs # of trials is not currently possible from one operation of Gencurvefit. What you will have to do is call Gencurvefit in a loop. You should use the /SEED and /K flags to set the seed and number of iterations:

variable ii
for(ii = 1 ; ii < 1001 ; ii += 1)
gencurvefit /SEED=2/K={ii, 10, 0.7,0.5} etc
print "Number of iterations: ", ii, " Chisq: ", V_chisq
print parameterWave
endfor

1. This plaid effect only happens when you click "hold" for a coefficient in the global fit dialog.
2. Hmm.... It seemed easier to me when I first looked at this.

as always, thanks andy
mtaylor wrote:
1. This plaid effect only happens when you click "hold" for a coefficient in the global fit dialog.

as always, thanks andy


No worries. The plaid effect no longer happens. There was a bug in the code for the /DUMP flag. It's now fixed and available for download.
Here is some example code for a gaussian fit.

Function mygauss(w,x):fitfunc
    Wave w
    variable x
    return w[0]+w[1]*exp(-((w[2]-x)/w[3])^2)
End

Function test()
    variable ii, iters = 1000
    Make/O/N=100 GDataX, GDataY                 // waves for data
    GDataX = enoise(10)                             // Random X values
    GDataY = 3*exp(-((GDataX-2)/2)^2) + gnoise(0.3)     // Gaussian plus noise

    Make/O/D limits ={{-1,0,-10, 0},{1,4,10,3}}     // make the lower and upper limits
    Make/O/D coefs = {0,2,1,2}                          // make the initial coefficients

    make/n=(iters)/d/o Chi2prog = 0
    make/n=(4, iters)/o pars

    for(ii = 0 ; ii < iters ; ii += 1)
        Gencurvefit/q/n/K={ii+1, 10, 0.7, 0.5}/SEED=10/X=GDataX mygauss,GDataY,coefs,"0000",limits      // do the fit
        Chi2prog[ii] = V_Chisq
        pars[][ii] = coefs[p]
        print ii
    endfor
End
Many thanks Andy

I have been able to implement everything I wanted thanks to your code example, but I have lost some of the convenience of the global fit panel, most notably the ability to link coefficients and fit multiple functions simultaneously (I assume this is accomplished using a structure fit, but I haven't been successful in getting that going). Because I am fitting complex data (a+b*i), I really need this functionality.

I was also unable to find an option or flag listed in the help file for specifying log spaced data (which is needed to produce proper graphs for the fit function vs. empirical data). Is there a flag for this? Global fit has a check-box, but it is possible that global fit is scaling the wave before feeding it to gencurvefit for all I know.
mtaylor wrote:
Many thanks Andy

I have been able to implement everything I wanted thanks to your code example, but I have lost some of the convenience of the global fit panel, most notably the ability to link coefficients and fit multiple functions simultaneously (I assume this is accomplished using a structure fit, but I haven't been successful in getting that going). Because I am fitting complex data (a+b*i), I really need this functionality.

I was also unable to find an option or flag listed in the help file for specifying log spaced data (which is needed to produce proper graphs for the fit function vs. empirical data). Is there a flag for this? Global fit has a check-box, but it is possible that global fit is scaling the wave before feeding it to gencurvefit for all I know.


Perhaps I've got the wrong end of the stick, but the whole point of Globalfit is that you can link and fit multiple functions simultaneously. If it's not doing that, then it's important to know.

There is no way to specify log spaced fit data in the output of Gencurvefit, you would have to do that manually.

The version I wrote is a modified copy of the WM code, but I don't update it as frequently as they do.
ok, I can groom my frequency data into log-frequency space, that's trivial to do. However, simultaneous fitting is not so easy to do. Dare I modify motofit_globalfit2 to include the changes you posted here? I am actually quite pleased with the tool I have now as it's a bit less cluttered for my purposes than the global fit dialog (thanks to your help), the only thing lacking is the ability to combine functions as in global fit.
I noticed an issue with this method. The number of operations is growing with the generation number. Is it possible to avoid this, or would that require fundamentally changing gencurvefit?
mtaylor wrote:
I noticed an issue with this method. The number of operations is growing with the generation number. Is it possible to avoid this, or would that require fundamentally changing gencurvefit?


It's not clear what you mean. What operations?
each time gencurvefit is called in the for loop, the number of iterations increases, increasing the number of function evaluations. So the first time it is called, 1 iteration is performed, the second time, 2, and so on, so it is really slowing down the overall fit. If I want to fit over 100 iterations, for instance, this for loop will call gencurvefit for a total of 5050 iterations (1+2+3+4+...+98+99+100), so the fit will take (nominally) 50 times longer. I can see why this is done, but I would prefer it if I can get the diagnostic information without sacrificing so much speed.
You don't need to do such high sampling then, but there is no other way of doing what you want.
Would it be possible to modify the xop such that /dump stores values into the matrix every generation, instead of the current behavior? That would give the best of both worlds.
mtaylor wrote:
Would it be possible to modify the xop such that /dump stores values into the matrix every generation, instead of the current behavior? That would give the best of both worlds.


Whilst it would always be possible in coding terms, the current design was intentional. I use the /DUMP flag, with a monte carlo tempering scheme, to investigate whether the code can be used to investigate the posterior probability distribution of the parameter vectors.
Andy,

I am trying to make some changes to the xop, but cannot compile it successfully. The source directory seems to be missing an included file, gencurvefit.h. Am I missing something here? Also, I am using version 6 of the XOP toolkit; I don't know if that makes any difference.
Many thanks to andyfaff, gencurvefit now supports calling an update function, which can access the current population's data. Very, very cool!