comparing chi square values

Hi,

I'm fitting a Gaussian function to a number of different curves. These curves may have quite different maximum amplitudes; they may be quite different "sizes." I'd like to compare the fit quality across the different curves, but the V_chisq values that are returned depend on the size of each curve, so I need some way to standardize them before comparing. Is this what the weighting wave is for? I don't quite understand how to implement it.

To explain a little more, consider just the maxima of two curves. Assume curve A's max=100 and curve B's max=10, and assume the fits of A and B have maxima of 90 and 9, respectively. Then for curve A, V_chisq=(y_hat-y)^2=(90-100)^2=100, and for curve B, V_chisq=(y_hat-y)^2=(9-10)^2=1. The V_chisq for curve B is much smaller, so it appears to be a better fit, but of course it is not: each fit is equally good. How do you set up the weighting to account for this?

Thanks,
Tim
The weighting wave is expected to be your best estimate of the measurement uncertainty, expressed as the standard deviation of a normal distribution. But if your two Gaussian curves are actually equivalent, with equivalent measurement noise, then the weighting wave would do what you want: it will normalize the chi-square value by the "size" of your curve, because the measurement noise is proportional to that size.

This may seem pedantic, but it is good to understand what you are saying when you apply a weighting wave.

If it is the amplitude of the curve that is different, then by the same kind of argument, you could justify scaling the chi-square value by the amplitude before the comparison.
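For concreteness, here is a minimal sketch of how a weighting wave gets supplied to CurveFit. The function name and the constant noise value are placeholders for illustration, not something specific to your data:

Function FitWithWeighting(wData, sigma)
    Wave wData
    Variable sigma                  // your estimate of the measurement noise (std dev)

    // Weighting wave: one standard deviation per data point.
    // Constant here; it could vary point by point if the noise does.
    Duplicate/O wData, wSigma
    wSigma = sigma
    // /I=1 tells CurveFit the weight wave holds standard deviations
    CurveFit/NTHR=0 gauss wData /W=wSigma /I=1 /D
    // With proper weighting, V_chisq is dimensionless and comparable
    Print "Reduced chi-square:", V_chisq/(V_npnts - 4)    // 4 = number of gauss coefficients
End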

John Weeks
WaveMetrics, Inc.
support@wavemetrics.com
Thanks, John. It doesn't seem pedantic. In fact, perhaps it is not pedantic enough, because I still don't quite see how to proceed. My Gaussian curves should have equivalent measurement noise, as they were measured by the same instrument. What exactly do I use for a weighting wave? I don't know how I would get a standard deviation of a normal distribution here. Again, imagine two Gaussian blips that I want to compare. Where does a normal distribution come into the picture?

In response to your last sentence: it is not simply the amplitude that differs between curves. They are going to be subtly different in various ways. They come from actual measured data, so they are not exactly Gaussian. Nevertheless, my previous example, amplitude 100 vs. 10, is still relevant: some peaks are going to be a lot bigger than others, and I want to compare the relative quality of each curve's Gaussian fit.

Thanks,
Tim

It seems to me that, all else being equal, the two parameters of interest are the relative accuracy of the fit and the relative uncertainty (precision) of the fit.

* Relative Accuracy = (A_fit - A_theory)/A_theory
* Relative Precision = \Delta A_fit / A_fit

As I understand it, a properly weighted chi-squared value is a representation of how well the data fit the model. When the relative noise level (e.g. the inverse S/N ratio) is the same AND the model is the same AND the number of points is the same, the properly weighted chi-squared value should be the same. So the only thing learned from the chi-squared is how noisy your data are relative to the perfect model.

A more important aspect of this is to make sure that you are keeping the S/N ratio the same. When you lower the amplitude from 100 to 10, are you scaling the noise by the same factor? If not, then chi-squared will be different in fitting the two cases. My (hopefully correct) guess is that, when you drop the signal by a factor of 10 and keep the noise the same, the appropriate re-scaling of chi-squared is a factor of 10^2 = 100. I base this on the fact that you would have to increase the dwell time per channel by a factor of 100 when your signal drops by a factor of 10 (since S/N goes as the square root of the dwell time).
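To spell out the scaling (this is just the textbook definition of chi-squared, nothing Igor-specific):

\chi^2 = \sum_i ( (y_i - f(x_i)) / \sigma_i )^2

If the data and the fit both shrink by a factor of 10 while the noise \sigma_i stays fixed, every residual y_i - f(x_i) shrinks by 10, so \chi^2 shrinks by 10^2 = 100. Multiplying the smaller curve's chi-squared by 100 undoes exactly that.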

--
J. J. Weimer
Chemistry / Chemical & Materials Engineering, UAH
Thanks for your reply, Jeff. It seems that I'm not making my question clear, so let me try again. Imagine an oscilloscope that measures some voltage blip. These blips are Gaussian to a reasonable approximation. The blip shape and size will vary due to various physical factors that I have no control over; for example, the amplitude of the blip depends on how big a particle is detected in my instrument. All that I do is collect voltage vs. time. I get one file per blip. I load it in Igor and get a time wave and a voltage wave. I fit a Gaussian to it. I get a chi sq. Then I repeat this with blip #2. I want to compare the chi square values for the two blips. How do I do it? I want to know which blip deviates further from the Gaussian shape. Which blip more closely conforms to the Gaussian fit? If a weighting wave can make the chi square values comparable, please explain how to set that up.

Tim



If your signal intensity is truly arbitrary, and noise scales with signal, why not normalize the signal before fitting?

In the example you give, where A=constant*B, sqrt(V_chisq)/W_coef[1] will be the same for both fits. W_coef[1] is the fitted peak height and scales linearly with the signal.

Note that in your example, the fit to curve B IS a better fit to the data, because your model is much closer to your data in signal space. That's how goodness of fit is measured. Least squares minimization doesn't care about the shape of your fitting function, but rather how close it gets to the data.
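In Igor terms, that comparison might look like the following sketch (this assumes the built-in gauss fit, where W_coef[1] is the amplitude coefficient; the function name is made up for illustration):

Function GaussMisfit(wData)
    Wave wData

    CurveFit/Q/NTHR=0 gauss wData /D
    Wave W_coef                           // [y0, A, x0, width] for gauss
    // Amplitude-normalized misfit: insensitive to overall signal size
    Variable misfit = sqrt(V_chisq)/W_coef[1]
    Print "Normalized misfit:", misfit
    return misfit
End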
I have to agree: curve fitting doesn't really tell you the degree to which the shape is similar. The chi-square value tells you the distance between the data points and the model curve, or rather the sum of the squares of those distances. The squares make sure that all deviations between data and fit add up positively.

It is possible using chi-square values to decide if two different fits to the same data are significantly different, or one significantly better than the other.

You can use the sigmas on the fit coefficients to decide how well each coefficient has been determined, and to decide if a fit to one data set has significantly different coefficient values than another fit.

I'm not sure I know a way to measure "how close to Gaussian" a data set is. I'm not sure that's a well-defined concept.

The measurement noise might scale with size; such things often do. But it is just as likely that the noise is simply an ever-present random addition to whatever signal you're measuring. In that case it's likely to be constant, or nearly so. You might be able to estimate the noise from the periods between "blips".
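If your records do include flat baseline around each blip, a sketch of that estimate might look like this (the point range is a placeholder for wherever your baseline actually sits):

Function BaselineSigma(wData)
    Wave wData

    // Standard deviation over a pre-blip baseline region; the result
    // can serve as the constant value of a weighting wave (/W with /I=1)
    WaveStats/Q/R=[0,50] wData
    return V_sdev
End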

John Weeks
WaveMetrics, Inc.
support@wavemetrics.com
OK, I understand better. You could include some other peak shape (e.g. Lorentzian) to obtain a different metric for the meaning of "closer to Gaussian" for one fit or the other. Otherwise, you will only have a properly normalized chi-squared that tells you this: one data blip is better or worse than the other at following your model. If this is what you mean by "... which blip deviates further from the Gaussian shape", then chi-squared will tell you what you need.

I would again say that chi-squared will scale as 1/(S/N)^2. In the limit S/N -> infinity, chi-squared goes to zero (a perfect fit of the data to the model). In the limit S/N -> 0, chi-squared grows without bound. The suggestion that you normalize the peaks in height is a reasonable way to see this in practice. The noise will eventually dominate as the amplitude of the blip decreases; at some point, the data will fit a random peak shape just as well as a Gaussian, and the chi-squared will fall apart (head sporadically toward infinity).
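As a sketch of that two-shape comparison (Igor's built-in lor peak shape standing in for "some other peak shape"; the function name is illustrative):

Function CompareShapes(wData)
    Wave wData

    // Same data, same number of points, two candidate models:
    // the chi-square values are directly comparable to each other.
    CurveFit/Q/NTHR=0 gauss wData /D
    Variable chisqGauss = V_chisq
    CurveFit/Q/NTHR=0 lor wData /D
    Variable chisqLor = V_chisq
    Print "gauss:", chisqGauss, "  lor:", chisqLor
End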

--
J. J. Weimer
Chemistry / Chemical & Materials Engineering, UAH
As I understand it, you are looking for some sort of 'figure of merit' for how well a set of data follows a particular shape of curve.
The way I judge this is by looking at the residual of the fit. If the residual looks wavy, then the shapes of the curves are mismatched. If the residual is a 'random' distribution of points on either side of zero, then the data are consistent with the fit.

I have come up with a rough-and-ready way of attempting to measure this (example code given below). It simply runs a smooth over the residuals and then calculates the RMS deviation of the result. The idea is that data randomly distributed around zero will smooth toward zero more efficiently than data where many consecutive points are offset in the same direction. So the 'deviation parameter' is small for consistent shapes and increases as the shapes mismatch more. The smooth width is a parameter of this method that you can play with.

I have no idea about the rigour of this, but it intuitively makes sense.

Here is some code to illustrate this idea (hopefully it is self-explanatory):
Function test()
   
    Variable vRMS
    Variable vSmth=7

    // True Gaussian with noise
    Make/O/D/N=(100) wGauss
    wGauss[] = 10000 * StatsNormalPDF(x, 50, 10)
    wGauss[] = wGauss[p] + gnoise(sqrt(wGauss[p]))
    // Plot and Fit Gaussian
    PlotAndFit(wGauss)
    // Smooth Residual and calc RMS deviation
    Wave wResData = $("Res_wGauss")
    vRMS=CalcDeviationParameter(wResData, vSmth)
    TextBox/C/N=text0/F=0/A=MC "Deviation Parameter ("+num2str(vSmth)+") = "+num2str(vRMS)
   
    // LogNormal
    Make/O/D/N=(100) wLogNorm
    wLogNorm[] = 8000 * StatsLogNormalPDF(x, 0.2, -5, 55)
    wLogNorm[] = wLogNorm[p] + gnoise(sqrt(wLogNorm[p]))
    // Plot and fit Gaussian to the LogNormal data
    PlotAndFit(wLogNorm)
    // Smooth Residual and calc RMS deviation
    Wave wResData = $("Res_wLogNorm")
    vRMS=CalcDeviationParameter(wResData, vSmth)
    TextBox/C/N=text0/F=0/A=MC "Deviation Parameter ("+num2str(vSmth)+") = "+num2str(vRMS)
End

Function PlotAndFit(wData)
    wave wData
    Display wData
    ModifyGraph rgb=(0,15872,65280), mode=3, marker=19, msize=0.5
    CurveFit/NTHR=0 gauss wData /D /R
    String sResName = "Res_" + NameOfWave(wData)
    ModifyGraph zero(Res_Left)=1, mode($sResName)=4
    SetAxis/A/E=2 Res_Left
End
Function CalcDeviationParameter(wResData, vSmth)
    Wave wResData
    Variable vSmth
   
    String sSmthName = NameOfWave(wResData) + "_smth"
    Duplicate/O wResData,$sSmthName
    Wave wResSmth = $sSmthName
    Smooth vSmth, wResSmth
    WaveStats/Q wResSmth
    Print "Deviation Parameter(" + sSmthName + "): ", V_rms
    return V_rms
End


Hope this helps,
Kurt
Thanks to all. I thought it might help to clarify by showing some actual plots and explaining. See the 3 plots in the attachment. The top one has the best agreement with the Gaussian fit, with chi sq = 0.1. The middle one appears to be a poorer fit than the top, but its chi sq is lower, 0.08. This reversal of chi sq ordering is, I presume, due to the fact that the top curve also has a higher amplitude than the middle one, so the absolute deviation between data and fit is larger even though the relative deviation is smaller. The bottom curve is obviously the worst fit to a Gaussian. Its chi sq is the largest, but again, the amplitude is a bit different from the other two, so I assume the chi sq should be scaled to make the comparison with the other two a fair one.

So how to scale the chi sq values so that all three fit qualities can be ascertained?
compare Gaussians.png
It looks to me like what you have is overlapping Gaussian peaks. The bottom one has at least three particles in the plot: a big one with smaller ones on either side.

Your noise level is relatively low; it might be that you could smooth and look at derivatives to decide if the shape is more than one overlapped peak. I think that Multipeak Fit, with suitable tweaking of the auto-detection parameters, would find three peaks in the bottom plot.

John Weeks
WaveMetrics, Inc.
support@wavemetrics.com
Very perceptive. That is exactly what we have in some cases: several particles that are close together. In other cases, however, what we have is a single particle that produces a signal that looks like there are 3 peaks when there is really only 1; this artifact happens when the instrument is not aligned properly. In the 3 plots I shared, each panel is a single particle that LOOKS like multiple particles. This is why I need to quantify how much the data diverge from Gaussian form: the amount of deviation determines how far out of alignment the instrument is, and we need to set a spec for how far out of alignment is acceptable.



Are the side peaks always in the same place for a peak of a given width? Does an out-of-alignment instrument always have side peaks on both sides? One approach might be to fit a Gaussian, then a sum of three Gaussians. The amplitude of the side peaks seems like exactly what you're after. In a perfectly aligned instrument, I take it that the side peaks are absent, at least within noise. In that case, the three-peak fit may fail with a singular matrix error, or the side peaks may fit some noise feature. Fitting noise can probably be distinguished by side peak locations that are unreasonable, or side peak widths that are unreasonable.

Are the side peaks always the same width as the main peak? That's what it looks like in your plots. If that is the case, the three-peak fit can be simplified, using just one coefficient for the peak width. My guess is also that for a given width, the side peaks are at a fixed offset from the main peak. That would guide the initial guesses for the three-peak fit.
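A sketch of such a constrained fit function (the coefficient layout here is my own choice for illustration, not anything standard):

// Main peak plus two side peaks sharing one width coefficient.
// w[0]=baseline, w[1]=main amplitude, w[2]=main position, w[3]=shared width,
// w[4]=left amplitude, w[5]=left offset, w[6]=right amplitude, w[7]=right offset
Function ThreeGauss(w, x) : FitFunc
    Wave w
    Variable x

    Variable gMain = w[1]*exp(-((x - w[2])/w[3])^2)
    Variable gLeft = w[4]*exp(-((x - (w[2] - w[5]))/w[3])^2)
    Variable gRight = w[6]*exp(-((x - (w[2] + w[7]))/w[3])^2)
    return w[0] + gMain + gLeft + gRight
End

You would fill an 8-point coefficient wave with initial guesses (the single-Gaussian fit results are a natural starting point) and call FuncFit ThreeGauss, coefWave, yourData /D.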

John Weeks
WaveMetrics, Inc.
support@wavemetrics.com
Thanks, John. The short answer is that I don't know, but I would guess that the answers to your questions are no.

"Are the side peaks always in the same place for a peak of a given width?" I can't see any physical reason for why they would be, so no. I have seen plots where there is a peak on only one side. Yes, in a perfect instrument there are no side peaks within noise. I don't think side peaks are the same width as the main peak.

Doesn't it seem that something like this, but slightly more sophisticated, should work: calculate chi sq and then normalize by amplitude. For example, the max for the data in the top panel is 1.152 and the max for the data in the middle panel is 0.896, so multiply the chi sq for the middle panel by 1.152/0.896 to compare those two fits. If you do that, you get chi sq for the middle = 0.10305, which makes the fit quality look more like it should. I realize this isn't quite right because it doesn't account for width differences (or shape differences, speaking more generally) between the Gaussians in the two panels. But something like that seems like the right metric.
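In code, a sketch of that kind of scaling could be as simple as this (dividing each fit's chi sq by its own fitted amplitude, which ranks curves the same way as scaling to a common reference amplitude; the function name is illustrative):

Function AmpScaledChiSq(wData)
    Wave wData

    CurveFit/Q/NTHR=0 gauss wData /D
    Wave W_coef                       // [y0, A, x0, width] for gauss
    // Linear-in-amplitude scaling, per the proposal above
    return V_chisq/W_coef[1]
End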

Tim


From what I see in the spectra, John is on the right track. The spectra for out-of-tune instruments cannot be characterized by just one parameter. Put another way, the chi-squared value from a peak fit is not the best, most reliable (and perhaps not even a valid) "one parameter" value to quantify (correlate with) the degree to which the instrument is out of tune.

I wonder if a first and/or second derivative can be used to find a "one value" correlation characteristic of perfect and out-of-tune instruments. As side peaks form in the blip, the derivatives of that blip will have more zero crossings, the amplitudes of the intermediate derivative "peaks" will vary, and their shapes will be less symmetrical. The integral of the shape will also be different for symmetrical versus asymmetrical blips. The integral seems a bit more difficult to parse for characteristic parameters though. The method of comparing residuals after fitting seems also worth considering.
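A rough sketch of the derivative idea (the smooth width is a placeholder to tune; the function name is illustrative):

Function CountDerivZeroCrossings(wData)
    Wave wData

    // Smooth first so noise doesn't generate spurious crossings,
    // then differentiate and count zero crossings of the derivative.
    // A clean single Gaussian has one; side peaks add more.
    Duplicate/O wData, wDeriv
    Smooth 7, wDeriv                  // placeholder smooth width
    Differentiate wDeriv
    FindLevels/Q wDeriv, 0
    return V_LevelsFound
End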

--
J. J. Weimer
Chemistry / Chemical & Materials Engineering, UAH
I am late to the party, but perhaps an approach that can give you a single parameter to compare across measurements would be as follows (basically an attempt to put a number on visual inspection of residuals):

- compute the residuals of your fit.
- plot the frequency distribution (histogram) of the residuals with some appropriate bin width to resolve the shape (this also depends on how 'dense' your data are, I guess).
- fit Gaussians to the frequency plot and compare your favorite 'goodness of fit' parameter.

The idea is that a 'good' choice of model will result in a normal distribution of residual frequencies (deviations due to random noise), whereas an inappropriate choice of function may lead to a non-normal distribution. For example, from an eyeball-and-guess look at your third plot, I'd guess that you would get at least a bimodal-looking distribution of residual frequencies, which would likely kill your 'goodness of fit' parameter. I'd also guess that this approach would be less sensitive to the amplitude of the signal. You can try to 'calibrate' the method with known data sets from aligned and misaligned instruments to decide on a 'cut-off' value.

Probably pretty simple to write some quick code to implement this.
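For instance, a quick sketch of that recipe (this assumes the residual wave from CurveFit's /R flag already exists; the bin count and names are placeholders):

Function ResidualNormality(wRes)
    Wave wRes                         // e.g. Res_yourData, created by /R

    // Histogram the residuals; /B=1 picks the bin range and width
    // automatically from the source wave and destination length
    Make/O/N=25 wHist                 // placeholder bin count
    Histogram/B=1 wRes, wHist
    // A poor model shows up as a non-normal residual distribution,
    // so the Gaussian fit to the histogram will be poor.
    CurveFit/Q/NTHR=0 gauss wHist /D
    Print "Histogram fit chi-square:", V_chisq
End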