Understanding chi square for weighted data

Hello everyone

I have a question regarding chi square, which is given after a fit.

I read here in the forum that, for weighted data, a good fit is indicated by a chi-square value that is not too far from the number of data points; in fact, for a good fit, chi-square should be roughly equal to the number of data points.

This is my question: what if, after the fit, the value of chi-square is smaller than the number of data points in a fit where the data are weighted? I have a situation where the fit looks good, nearly perfect, yet the value of chi-square is about 11 while the number of data points is 63. How should I interpret this?

Attached is a picture of the fit


My understanding is that the weightings should be representative of the noise in the data. At first glance, your error bars look quite large: if they are supposed to represent +/- one standard deviation, then one would expect roughly one third of the data points to lie outside the error bars. If you overestimate the size of the uncertainty (noise), then the 'normalisation' of chi-square by such weightings will make chi-square smaller than one would reasonably expect.
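Concretely, the weighted chi-square is the sum over points of ((data - fit) / sigma_i)^2, so if every sigma_i is overestimated by a common factor k, chi-square shrinks by a factor k^2. With a chi-square of about 11 from 63 points, that would put the overestimate at roughly sqrt(63/11) ≈ 2.4, i.e. error bars some 2-3 times larger than the true point-to-point scatter (assuming the model is right and the overestimate is uniform).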

Here is an example demonstrating the fit and the calculation of the reduced chi-square, which should be around 1 for a good fit, much greater than 1 for a poor fit, and much less than 1 for overfitting (or overestimated uncertainties).

Make/O/D/N=100 myData, myNoise
SetScale/I x -5,5,"", myData
SetScale/I x -5,5,"", myNoise
myNoise = 0.15 + 0.1 * p / DimSize(myNoise,0)  // arbitrary, non-uniform noise level
myData = 2+tanh(x) + gnoise(myNoise)           // sigmoid plus Gaussian noise
Display myData
ModifyGraph mode=2,marker=1,rgb=(0,0,65280)
ErrorBars myData Y,wave=(myNoise,myNoise)
CurveFit/NTHR=0/W=0 Sigmoid  myData /W=myNoise /I=1 /D /R  // /I=1: weights are standard deviations
ModifyGraph zero(Res_Left)=1,mode(Res_myData)=1
print V_chisq / (DimSize(myNoise,0) - 4)       // reduced chi-square; 4 = number of Sigmoid coefficients
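(If you run this a few times, the printed value should scatter around 1; multiply myNoise by 2 before the fit and it should drop to about 0.25, which is the effect I am suggesting you are seeing.)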

https://en.wikipedia.org/wiki/Reduced_chi-squared_statistic

 

Are your error bars standard deviations of multiple samples? The weights should really be the standard error of the mean, that is STD/sqrt(N-1).

> Are your error bars standard deviations of multiple samples? The weights should really be the standard error of the mean, that is STD/sqrt(N-1).

But ... IP 8 gives options for weightings using the standard deviation, not the standard error of the mean. Has this changed in IP 9? Or am I misunderstanding something?

Hmmm... I see this statement in the Curve Fitting help file: "...where σi is the standard deviation for each data value"

I think I can wiggle out of this on the basis that the standard error IS the estimated standard deviation of the mean, and that when the data points are averages of multiple measurements, the values being fitted are those means. So the standard deviation of the individual measurements is NOT the standard deviation of the value that's actually being used in the fit.

Jeff- how wiggly was that? :)

I find myself getting tied into linguistic knots when talking about statistical matters. But I think the statement in the help file is actually not incorrect.
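To see this numerically, here is a minimal sketch (names and numbers are just illustrative): draw bins of N = 25 measurements with sigma = 1 and check that the scatter of the bin means is about 1/sqrt(25) = 0.2, not 1.

Function CheckSEM()
    Variable nMeas = 25, nBins = 2000, i
    Make/O/D/N=(nBins) binMeans
    Make/FREE/D/N=(nMeas) w
    for(i = 0; i < nBins; i += 1)
        w = gnoise(1)              // individual measurements, sigma = 1
        binMeans[i] = mean(w)      // the value you would actually be fitting
    endfor
    print "SD of the bin means:", sqrt(variance(binMeans)) // expect about 0.2
End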

I think it's actually better to fit the original data rather than the averaged values. If the measurements are normally distributed and the fit function is linear, the two fits will be identical. If either of those conditions is violated, they may not be, and with all the raw data present you can see problems like biases, non-constant variance, skewed residuals, etc.

Admittedly, one can get a bit lost wading among the terms uncertainty and error as well as average and mean as well as standard or not.

https://en.wikipedia.org/wiki/Standard_error

The Guide to the Expression of Uncertainty in Measurement (GUM) is one of my go-to references.

https://www.bipm.org/documents/20126/2071204/JCGM_100_2008_E.pdf/cb0ef4…

More recently, I also draw on this book.

Measurements and Their Uncertainties, I. G. Hughes and T. P. A. Hase (Oxford University Press, 2010).

I hold that 'error' means the difference between any one measured value and the truth, while 'uncertainty' describes how errors are distributed in a set of measurements. I ground myself first at the infinite end, with the terms mean (mu) and standard deviation (sigma). An infinite distribution has no need to invoke a standard error of the mean (SEM), ostensibly because that value would be identically zero. In a finite world, the terms I use to represent the distribution become the average <...> and the standard uncertainty S. I use the term standard uncertainty of the mean, S_M, rather than standard error of the mean (although, if I were truly pedantic, I suppose I would have to call it the standard uncertainty of the average).

I agree that curve fits should be to the entire set of data but sometimes these things are not done that way for various reasons -- good, bad, and ugly.

In any case, regardless of terminology, my question is simple: based on how the curve-fitting algorithm in IP works, should the weightings fed to the curve fit be the standard uncertainties / standard deviations (S or sigma), or should they be the standard uncertainty (standard error) of the mean (S_M or SEM)?

If the weightings should be S_M (SEM), I would ask that the Curve Fit help and the input dialogs on the Curve Fit panel be changed to use the proper terms.

RATHER THAN

... where sigma is the standard deviation for each value.

USE

... where SEM is the standard error of the mean for each value.

I can appreciate that this distinction may not (will not?) matter *as long as all data points are derived from the same number of measurements N*: S and S_M then differ by the same factor sqrt(N) at every point, and a common scale factor in the weights leaves the best-fit coefficients unchanged (though it does scale the reported chi-square). BUT when some data points come from five measurements and others from only three, the choice of S versus S_M changes the relative weighting of the points, and so it matters at each data point.
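To illustrate the 'cancels out' part, here is a minimal sketch with made-up data: scaling every weight by the same constant leaves the fitted coefficients unchanged but rescales chi-square.

Make/O/D/N=50 yy, ss
SetScale/I x 0,5,"", yy, ss
ss = 0.3                       // per-point standard deviation
yy = 1 + 2*x + gnoise(ss)      // fake linear data
CurveFit/Q line yy /W=ss /I=1  // /I=1: weights are standard deviations
print W_coef[1], V_chisq       // slope and chi-square
ss *= 2                        // e.g. using S instead of S_M when every point has N = 4
CurveFit/Q line yy /W=ss /I=1
print W_coef[1], V_chisq       // identical slope; chi-square divided by 4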

Finally, I note that in Hughes and Hase, the discussion of weighted regression suggests that the weighting factors are typically taken as the inverse of the variances of the distribution. So they suggest the weightings should be tied to S, not to S_M.

In reply to by johnweeks

johnweeks wrote:

Are your error bars standard deviations of multiple samples? The weights should really be the standard error of the mean, that is STD/sqrt(N-1).

The error bars (+/-) represent the standard error. Each data point is obtained by averaging/integrating the data over a certain interval, and the standard error is calculated accordingly.

In reply to by jjweimer

jjweimer wrote:

Admittedly, one can get a bit lost wading among the terms uncertainty and error as well as average and mean as well as standard or not. [...] In any case, regardless of terminology, my question is simple: should the weightings fed to the curve fit be the standard uncertainties / standard deviations (S or sigma), or should they be the standard uncertainty (standard error) of the mean (S_M or SEM)?

Thank you so much JJ for your comments.

The data points shown in the image are actually the result of binning; they come from the same dataset (a 2D dataset, to be precise). I expect we all know what a bin is, but to be explicit: binning consists of averaging/integrating the data points over a defined interval. So each of the data points (closed black markers) represents the mean of the data points in the considered range (interval).

 

My question is to understand the exact meaning of the chi-square that I get, with respect to this fit.

In reply to by KurtB

KurtB wrote:

My understanding is that the weightings should be representative of the noise in the data. [...] If you overestimate the size of the uncertainty (noise), then the 'normalisation' of chi-square by such weightings will make chi-square smaller than one would reasonably expect.

 

Thanks KurtB for your comments.

All I can say at this point is that the error bars are the standard errors resulting from integrating over regular intervals. I will try to look into why I have such a small chi-square value.

In reply to by Arnaud_P

Arnaud_P wrote:

My question is to understand the exact meaning of the chi-square that I get, with respect to this fit.

I have extended the demo a bit to try to illustrate what Igor actually calculates as ChiSquared.

Function ChiSqDemo()
    SetRandomSeed 0.1
    Make/O/D/N=100 myData, myNoise
    SetScale/I x -5,5,"", myData
    SetScale/I x -5,5,"", myNoise
    myNoise = 0.15 + 0.1 * p / DimSize(myNoise,0) // arbitrary, non-uniform noise
    myData = 2+tanh(x) + gnoise(myNoise) // sigmoid function for creating fake data
    Display myData
    ModifyGraph mode=2,marker=1,rgb=(0,0,65280)
    ErrorBars myData Y,wave=(myNoise,myNoise)
    CurveFit/NTHR=0/W=0 Sigmoid  myData /W=myNoise /I=1 /D /R
    ModifyGraph zero(Res_Left)=1,mode(Res_myData)=1
    print "ChiSq:", V_chisq
    Wave/D Res_myData
    Duplicate/O Res_myData, ResDivNoiseSq
    ResDivNoiseSq[] = (Res_myData[p] / myNoise[p]) ^2
    print "SumSq ResidDivNoise",sum(ResDivNoiseSq)
    Wave/D W_coef
    print "Reduced ChiSq:", V_chisq / (DimSize(myNoise,0) - (DimSize(W_coef,0))
end

which produces the following:

  ChiSq:  113.838
  SumSq ResidDivNoise  113.838
  Reduced ChiSq:  1.18581

So you can see that the chi-squared reported by the fit is the sum over all points of (residual / weighting)^2.

For comparing datasets that may have different numbers of points, and indeed for assessing the goodness of fit, in my experience it is usual to normalise chi-square to the reduced chi-square. This does require that the weightings are a good estimate of the random noise in the data; systematic deviations in the residuals can then be attributed to the model failing to describe the empirical data in detail, and will result in a reduced chi-square >> 1.
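As a rough rule of thumb: for nu degrees of freedom, the reduced chi-square of a correct model with correct weights scatters about 1 with a standard deviation of roughly sqrt(2/nu). With nu = 96 as in this demo, values between about 0.7 and 1.3 are unremarkable; the value of 11 from 63 points in the original post (a reduced chi-square of roughly 0.2) is far below that band, which again points to overestimated weights.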

Hope this helps!

 

In reply to by KurtB

KurtB wrote:

 

I have extended the demo a bit to try to illustrate what Igor actually calculates as ChiSquared. [...] So you can see that the chi-squared reported by the fit is the sum over all points of (residual / weighting)^2.

 

Hi KurtB,

Yes, this explanation is indeed helpful.

Thanks

Just to follow up: should the input values for the weightings to a curve fit in Igor Pro be the standard uncertainties (standard deviations), or should they be the standard uncertainties of the mean? In case the terms are still confusing: as the number of measurements goes to infinity, the standard uncertainty approaches the standard deviation, while the standard uncertainty of the mean goes to zero.

In reply to by jjweimer

jjweimer wrote:

Just to follow up: should the input values for the weightings to a curve fit in Igor Pro be the standard uncertainties (standard deviations), or should they be the standard uncertainties of the mean? In case the terms are still confusing: as the number of measurements goes to infinity, the standard uncertainty approaches the standard deviation, while the standard uncertainty of the mean goes to zero.

I think that if you are fitting the means of many bins, then you should use the standard uncertainties of the mean as the weighting in the curve fit.

I have attempted to illustrate this in the following demo, where I generate a load of data, cast it as both a binned and a non-binned set of waves, and fit both sets. No guarantee that this is strictly correct, but I think it is. I hope the code is sufficiently simple to be pretty much self-explanatory.

Function CompareFitStDevs()
    //SetRandomSeed 0.1  // uncomment this for consistent 'random' data
    Variable vSD = 0.2
    Variable vBinSize = 1000
    Variable vNumBins = 100
    Make/O/D/N=(vNumBins, vBinSize) myDataY, myDataX, myNoise
    Make/O/D/N=(vNumBins) myBinDataY, myBinDataSE
    SetScale/I x -5,5,"", myBinDataY
   
    // populate with noisy sigmoid
    myNoise[][] = vSD
    myDataX[][] = pnt2x(myBinDataY, p)
    myDataY[][] = 2+tanh(myDataX[p][q]) + gnoise(myNoise[p][q]) // sigmoid function for creating fake data
   
    // populate binned data - use for loop to (hopefully) make it clear what is being done
    Variable i
    Make/FREE/D/N=(vBinSize) wtemp
    for(i=0; i < vNumBins; i += 1)
        wtemp[] = myDataY[i][p]
        myBinDataY[i] = mean(wtemp)
        myBinDataSE[i] =sqrt( variance(wtemp) / vBinSize ) //empirical std error of the mean
    endfor
   
    // flatten the original dataset into one dimension
    Redimension/N=(vNumBins * vBinSize) myDataY, myDataX, myNoise
   
    // fit the full dataset, as one.
    Display myDataY vs myDataX
    ModifyGraph mode=2, rgb=(0,0,65280)
    CurveFit/NTHR=0/N Sigmoid  myDataY /X=myDataX /W=myNoise /I=1 /D
    Wave/D W_coef
    Variable vNumFitParams = DimSize(W_coef,0)
    Variable vChiSq_Full = V_chisq
   
    // fit the binned data
    Display myBinDataY
    ModifyGraph mode=2, rgb=(0,0,65280)
    CurveFit/NTHR=0/N Sigmoid  myBinDataY  /W=myBinDataSE /I=1 /D /R
    ModifyGraph zero(Res_Left)=1, lowTrip(Res_Left)=0.0001
    SetAxis/A/E=2 Res_Left
    ErrorBars Res_myBinDataY Y,wave=(myBinDataSE,myBinDataSE)
    ModifyGraph mode(myBinDataY)=3,marker(myBinDataY)=19,msize(myBinDataY)=0.1
    Variable vChiSq_Binned = V_chisq
   
    // refit binned data, but with an a priori (known) st dev
    Duplicate/O myBinDataSE, myKnownBinDataSE
    myKnownBinDataSE[] = vSD / sqrt(vBinSize)
    CurveFit/NTHR=0/N Sigmoid  myBinDataY  /W=myKnownBinDataSE /I=1 /D
    Variable vChiSq_Binned_Known = V_chisq
   
    // print summary of Chi-Sq data
    print "\n"
    print " Full dataset:"
    print "ChiSq:", vChiSq_Full
    print "Reduced ChiSq:", vChiSq_Full / (DimSize(myDataY,0) - vNumFitParams )
   
    print " Binned dataset (empirical st dev):"
    print "ChiSq:", vChiSq_Binned
    print "Reduced ChiSq:", vChiSq_Binned / (DimSize(myBinDataY,0) - vNumFitParams )
   
    print " Binned dataset (known st dev):"
    print "ChiSq:", vChiSq_Binned_Known
    print "Reduced ChiSq:", vChiSq_Binned_Known / (DimSize(myBinDataY,0) - vNumFitParams )
End
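If the reasoning above is right, all three reduced chi-square values should come out near 1 when you run this. As a further check, deliberately weighting the binned fit with vSD instead of vSD / sqrt(vBinSize) should collapse the reduced chi-square to roughly 1/vBinSize — the same symptom, exaggerated, as the too-small chi-square that started this thread.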

 

 

Jeff- Sorry I didn't respond more quickly; things have been a bit hectic. My understanding is that the weight should be the standard deviation of whatever value you are fitting. If you are fitting averages of multiple points, you have a more or less built-in estimate: the "standard error of the mean". If you have raw data and are lucky enough to have an independent estimate of the standard deviation of your measurements, use that. If you have raw values and no idea of the actual measurement uncertainty, and the uncertainty is constant, and there are no systematic errors, and you have a good model, you are probably best off fitting the raw data without weighting; the residuals will estimate the uncertainty better than a binned estimation would.

Please note all the qualifications to my last statement!
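For instance, in that last (unweighted) case, the noise estimate falls out of the fit afterwards; a minimal sketch, where myRawY and myRawX are placeholders for your raw data:

CurveFit/Q line myRawY /X=myRawX  // unweighted: V_chisq is just the sum of squared residuals
print "sigma estimate:", sqrt(V_chisq / (numpnts(myRawY) - numpnts(W_coef)))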