Orthogonal Distance Regression - weight with two SDs?

The X weights are specified using the /XW flag.

This is covered in the reference documentation for CurveFit.

John Weeks
WaveMetrics, Inc.
support@wavemetrics.com

Log in or register to post comments

January 10, 2013 at 11:52 am - Permalink

kronion

Thank you for pointing me to the manual again, no clue why I did not see that section before!
BTW: Now I also realized that Igor assumes by default /I=0, i.e. that weights are 1/SD.

Here is my corrected function (to complete the thread):

function myTestFun()
    //generate data
    make/O/N=10 myX, myY, myX_SD, myY_SD
    myX=p+gnoise(.3); myY=p*2.5 + gnoise(.6)
    myX_SD=gnoise(1.); myY_SD=gnoise(.8)
    //display data
    display myY vs myX
    ModifyGraph mode=3,marker=19;DelayUpdate
    ErrorBars myY XY,wave=(myX_SD,myX_SD),wave=(myY_SD,myY_SD)
    //fit data
    CurveFit/ODR=2/NTHR=0/TBOX=768 line  myY /X=myX /I=1 /W=myY_SD /XW=myX_SD /D  //ODR=2: orth. dist. regr.
                                    //... I=1: SDs as weights; 
                                    //... W and XW for weights (y and x direction)
end

Log in or register to post comments

January 11, 2013 at 01:27 am - Permalink

johnweeks

kronion wrote:
BTW: Now I also realized that Igor assumes by default /I=0, i.e. that weights are 1/SD.

That's so, but since you had /I=1 in your original posting, I didn't mention it.

John Weeks
WaveMetrics, Inc.
support@wavemetrics.com

Log in or register to post comments

January 11, 2013 at 01:38 pm - Permalink

DeannaM

Hello,

I am relatively new to using IGOR, and would also like to perform orthogonal distance regression with standard deviations in both the x and y direction. I would also like to constrain the y-intercept to 0. In the above example shown above, how would I replace myY and myX with and myX_SD and myY_SD with the waves corresponding to my data. Also, how would I incorporate constraining the y intercept to zero.

An example of the appropriate code to use would be greatly appreciated.

Log in or register to post comments

February 4, 2013 at 06:31 am - Permalink

johnweeks

I assume that you are doing line fits. The easiest way to do this is to do a regular fit first using the Curve Fit dialog (you will only be able to select the Y error wave as the weighting wave). Then click on the command generated by the dialog that you will find in the History part of the Command window, press Enter to copy it to the command line, then edit the command to add the parts needed for ODR fitting.

To read more about ODR fitting, copy this command to Igor's command line and press Enter:

DisplayHelpTopic "Errors in Variables: Orthogonal Distance Regression"

John Weeks
WaveMetrics, Inc.
support@wavemetrics.com

Log in or register to post comments

February 4, 2013 at 09:23 am - Permalink

SW33

Hi,
If I=0 is 1/SD (where SD = standard deviation) and I=1 is SD, are there other options to use?
I have %SD errors.

Thanks.

Log in or register to post comments

May 13, 2014 at 06:30 am - Permalink

johnweeks

SW33 wrote:
If I=0 is 1/SD (where SD = standard deviation) and I=1 is SD, are there other options to use?

Those are the only options. You will need to duplicate your %SD wave and adjust it to be SD.

John Weeks
WaveMetrics, Inc.
support@wavemetrics.com

Log in or register to post comments

May 13, 2014 at 09:49 am - Permalink

mwpro

Are there R2 output for the orthogonal fits?

Log in or register to post comments

July 14, 2018 at 11:35 am - Permalink

johnweeks

No, no R-squared for ODR fits. R-squared is valid only for fits that are linear in the coefficients (line fits, polynomials) and ODR fits are inherently nonlinear.

Log in or register to post comments

July 16, 2018 at 11:03 am - Permalink

jamesedlee

Hello,

I am curious about how the curvefit algorithm determines Confidence and Prediction Intervals for ODR with and without weights. Does it make sense that 95% CI decrease when I include weights for the X and Y waves used in the regression?

Thank you

Log in or register to post comments

October 30, 2019 at 11:44 am - Permalink

johnweeks

I'm going to have to think about this one tomorrow- I tried this:

make junk = x+gnoise(10)

I did a regular line fit with residuals and confidence bands:

•CurveFit/TBOX=792 line junk /D /F={0.99, 7}/R
  fit_junk= W_coef[0]+W_coef[1]*x
  Res_junk= junk[p] - (W_coef[0]+W_coef[1]*x)
  W_coef={-6.1618,1.0776}
  V_chisq= 14707.9;V_npnts= 128;V_numNaNs= 0;V_numINFs= 0;
  V_startRow= 0;V_endRow= 127;V_q= 1;V_Rab= -0.864326;
  V_Pr= 0.965619;V_r2= 0.932419;
  W_sigma={1.9,0.0258}
  Fit coefficient confidence intervals at 99.00% confidence level:
  W_ParamConfidenceInterval={4.97,0.0676,0.99}
  Coefficient values ± 99% Confidence Interval
    a   =-6.1618 ± 4.97
    b   =1.0776 ± 0.0676

I see in the source code that chi-square is involved in the computation of confidence bands, so I checked it like this:

•matrixOP/O r2 = sum(Res_junk*Res_junk)
•print r2
  r2[0]= {14707.9}

Note that the chi-square reported by the fit is the same as the sum squared residual, as expected.

Then I did an ODR fit:

•CurveFit/TBOX=792/ODR=2 line junk /D /F={0.99, 7}/R
  Fit converged properly
  fit_junk= W_coef[0]+W_coef[1]*x
  Res_junk= junk[p] - (W_coef[0]+W_coef[1]*x)
  W_coef={-8.8744,1.1203}
  V_chisq= 6663.41;V_npnts= 128;V_numNaNs= 0;V_numINFs= 0;
  V_startRow= 0;V_endRow= 127;
  W_sigma={1.93,0.0264}
  Fit coefficient confidence intervals at 99.00% confidence level:
  W_ParamConfidenceInterval={5.06,0.0691,0.99}
  Coefficient values ± 99% Confidence Interval
    a   =-8.8744 ± 5.06
    b   =1.1203 ± 0.0691

I see what you see in a graph that I don't show here: the confidence and prediction bands shrink a bit, even though the report confidence limits on a and b are a tiny bit larger. But then I checked the residual:

•matrixOP/O r2 = sum(Res_junk*Res_junk)
•print r2
  r2[0]= {2954.84}

Note that the check gives a residual about half as large as the chi-square reported by CurveFit. I imagine that's why the bands shrink, but it's not clear to me that the bands are correct.

The chi-square for an ODR fit is a complicated combination of the X and Y residual, and I suspect the reported residuals may be for the adjusted X, not the input X. Since ODR fitting is a strange kind of beast, I can't tell you off the top of my head how these things should be computed. I'll try to look into it in more depth tomorrow and report back.

Log in or register to post comments

October 30, 2019 at 05:44 pm - Permalink

jamesedlee

Thank you,

I can provide a little more detail about what I am trying to do. I am comparing 2 sets of measurements (each with error) which look reasonably linear. The intercept of this relationship theoretically has meaning. So I tried a couple different methods for regressions:

1) ODR=2 w/ weights in x and y:

curvefit/odr=2 line, myY/x=myX /W=MyY_err /xw=MyX_err /I=1 /D /F={.95, 7}

  Results:a	=4.8787 ± 0.183 (± 95% CI)

	  b	=0.13648 ± 0.00371 (± 95% CI)

2) ODR=2 w/out weights in x and y. (*I think this method assumes equal weights in x and y?)

curvefit/odr=2 line, myY/x=myX /I=1 /D /F={.95, 7}

  Results:a	=5.3278 ± 1.1 (± 95% CI)

	  b	=0.12691 ± 0.0234 (± 95% CI)

3) Least-Squares w/out weights (only for comparison)

curvefit line, myY/x=myX /I=1 /D /F={.95, 7}

  Results:a	=5.3933 ± 1.1 (± 95% CI)

	  b	=0.12551 ± 0.0234 (± 95% CI)

Obviously, my preferred method is method 1, but provides unrealistic uncertainty confidence bans (Prediction Band for the intercept w/method 1 is +/- 0.93) which is a problem.

Log in or register to post comments

October 31, 2019 at 07:35 am - Permalink

johnweeks

You are correct that ODR fits depend heavily on the weighting in both X and Y directions. Among other things, they serve as a geometric scaling factor for the fit. In your case, the slope is much less than 1.0, so the perpendicular direction is nearly vertical. Without weighting, that means that horizontal deviations will be small. If the X errors are large compared to the Y errors, then the geometry is more nearly like what you would have with a slope of 1, or at least the geometry will reflect the error magnitude. Small errors in X would tend to push it more toward a standard least-squares fit, which is appropriate. Standard least squares should be exactly the same as ODR with zero X errors.

Without weighting, the ODR fit is done as if the weighting is 1 in both directions, but then the error estimates for the coefficients are patched up to take the residuals into account. I rely on the ODRPACK code to do this correctly. It was written by people much smarter than me :)

One complication of all this stuff is that the ODR fitting technique is inherently nonlinear, so normal statistics aren't exact. The degree of error is probably small for most problems, and fitting a line should be the closest to a benign situation you can get. But still, it is nonlinear and you should be aware of that.

To compute the confidence intervals for the coefficients, I use internally exactly the equations given here: DisplayHelpTopic "Calculating Confidence Intervals After the Fit". I have no reason to believe it is incorrect for ODR fits, but I am uncertain that the degrees of freedom is correct. That equation is definitely correct for OLS fits, but since ODR fits also fit the independent variable, how does that affect the DoF?

The confidence band is computed by a similar sort of computation that uses the coefficients, the sigmas and the raw (unweighted) chi-square value. So it will have the same problem for an ODR fit that the coefficients have: what is the proper DoF?

The ODR prediction band is clearly wrong- a 95% confidence interval prediction band should enclose about 95% of the raw data. This clearly fails on an ODR fit. The prediction band for an OLS fit uses the unweighted chi-square: that is, sum((yi - yi_hat)^2), or the sum of residuals squared. Light turns on... maybe for an ODR fit, the appropriate residual would be the perpendicular distance...

Now you know about as much as I do.

Log in or register to post comments

October 31, 2019 at 10:26 am - Permalink

johnweeks

In thinking about this a bit more, I see (part of) the problem. In the OLS case (no X errors):

Pi (prediction band) = t*(sigma^2 + V(yhati))

t is the appropriate student's t statistic, sigma is the standard deviation of the Y data, and V(yhati) is the variance of the estimated model value. V(yhati) is computed from the Hessian matrix of derivatives of the model with respect to the coefficients.

My problem here is, what do you use for sigma in the ODR case? I suppose that it's something derived from the error ellipse for the combined X and Y errors.

BUT- since the whole thing is nonlinear by virtue of being an ODR fit, possibly the best way to approach the problem is by some sort of bootstrap or jackknife analysis.

Log in or register to post comments

October 31, 2019 at 11:52 am - Permalink

jamesedlee

Thank you again Johnweeks!

I have had a lot of the same thoughts you have communicated. The analogue to Sum-Squares-Residuals would seem to be the orthogonal distance, but I was unsure that was how Igor calculated it and what that would mean when comparing the CI (would the CI appear smaller/tighter in a plot using ODR with error in both the x and y data?). Is the the orthogonal distance what is used to calculate CI?

The question of DoF is interesting and has been on my mind for other reasons (the data are time series and so are inherently auto-correlated). Yeesh.

Log in or register to post comments

October 31, 2019 at 05:24 pm - Permalink

johnweeks

I think you really need to figure out a way to use a bootstrap or other Monte Carlo method. I should probably remove the option to display the confidence band and prediction band. The question of confidence intervals for the coefficients is somewhat murky. At present, they are based on the sigmas reported by the ODRPACK code, which I trust. The Student's T statistic uses the standard N-M (number of data points minus number of coefficients) for the DoF, which seems likely to be wrong. More stuff is fitted in ODR fitting, so the degrees of freedom should be smaller, but I don't know what to use as the number.

Log in or register to post comments

November 1, 2019 at 10:00 am - Permalink

jjweimer

I have to ask one question as an outside observer with only an intermediate level of appreciation to what is being discussed.

* I take it that all that is under question is the sense or reliability of the confidence bands that are returned by Igor Pro from ODR fitting. The uncertainty values that are returned by the ODR method are not under scrutiny?

Naively, I have to wonder whether the fact that the returned values change depending on whether weighting factors are or not used has a fundamental insight about the nature of the scatter in the uncertainties at any one data point relative to the scatter in the measured values themselves. I also have to wonder whether the variation in the results and/or the importance of distinguishing the outcomes over two different data sets suggest that this system needs to be analyzed from first principles using an uncertainty budget equation as well as using other empirically-driven fitting routines?

Log in or register to post comments

November 1, 2019 at 11:28 am - Permalink

johnweeks

The ODR prediction bands are very much smaller than the ones computed by OLS. They are clearly incorrect.

The ODR confidence bands are very similar to the OLS bands, which would be expected. Whether they are actually correct is another question :)

The sigmas for the coefficients are as reported by the ODRPACK code. Since they are similar to, but not the same as the OLS sigmas, I am confident in them.

The confidence intervals for the coefficients appear more or less correct, but there is a question of degrees of freedom for ODR that I don't know how to answer. That would make small differences in the magnitude of the confidence interval.

I will let the original poster address the nature of the original problem.

Log in or register to post comments

November 1, 2019 at 01:41 pm - Permalink

tony

johnweeks wrote:

The sigmas for the coefficients are as reported by the ODRPACK code. Since they are similar to, but not the same as the OLS sigmas, I am confident in them.

FWIW, my implementation of Numerical Recipes "straight-line data with errors in both coordinates", coded for Igor in 1998 (before the days of findroots), gives very similar sigmas (to within ~ 0.1%) and of course identical fit and chisquare.

Incidentally, is there a straightforward way to determine the best fit with Igor taking into account error correlation? A method is given here:

York, D., Evensen, N.M., Martinez, M.L. and De Basabe Delgado, J., 2004. Unified equations for the slope, intercept, and standard errors of the best straight line. American Journal of Physics, 72(3), pp.367-375.

Log in or register to post comments

November 6, 2019 at 05:53 am - Permalink

johnweeks

FWIW, my implementation of Numerical Recipes "straight-line data with errors in both coordinates", coded for Igor in 1998 (before the days of findroots), gives very similar sigmas (to within ~ 0.1%) and of course identical fit and chisquare.

I implemented an Igor-code version of that myself to use as a check on the ODRPACK integration. So it's not too surprising that agreement is observed :)

Incidentally, is there a straightforward way to determine the best fit with Igor taking into account error correlation?

I have had a couple of requests for fitting taking into account error correlation. Implementation is not particularly straightforward in the general case, especially given my feeble mathematical ability, and I'm not certain how to properly represent the covariances in the user interface.

Log in or register to post comments

November 11, 2019 at 11:23 am - Permalink