Fit a straight line to X-Y data with correlated errors

Calculate the best fit line, in a least-squares sense, taking into account correlated errors in x and y.

Output includes the reduced chi-square statistic, covariance matrix for fit coefficients, and confidence bands.

Please let me know if you find any errors. A more general solution for fitting with correlated errors would be welcome (e.g., Moniot 2009, doi:10.1016/j.apnum.2008.01.001), but that's beyond my maths skills.

// Use an LSE method to find best fit line y = a + bx
// For normally distributed and possibly correlated errors in x and y.
// When correlation coefficients in Rwave are all 0,
// YorkFit(Ywave, Yerror, Xwave, Xerror, Rwave, confidence=0.95, CB=1, studentize=1)
// is the same as
// CurveFit/ODR=2/X=1 line Ywave /X=Xwave /W=sigmaY /I=1 /XW=sigmaX /F={0.95, 5}
// Writes to W_coef, W_sigma, V_chisq, V_q, V_MSWD, M_covar, and
// W_ParamConfidenceInterval

// Uses method of York (1969) Least squares fitting of a line with
// correlated errors. Earth and Planetary Science Letters, 5: 320-324

// Maximum likelihood errors after York et al. (2004)
// Unified equations for the slope, intercept, and standard errors of the
// best straight line. American Journal of Physics, 72: 367-375.

// optional parameters for YorkFit
// errors defines the type of errors in Yerror and Xerror waves. Set
// errors to 1 (default) for se, 2 for 2*se, 0: for reciprocal errors.
// confidence supplies the confidence level for calculating confidence
// interval for fit coefficients and confidence bands.
// CB determines whether to append confidence bands to plot. CB = 1
// (default): plot confidence bands, CB = 2: calculate and plot Ludwig
// (1980) style confidence bands. These are 'symmetrized', such that
// confidence bands conicide with those for a regression of x against y.
// When CB = 0, confidence bands are recalculated but not added to the
// top graph.
// Set studentize to 1 (default) to always use data scatter to calculate
// studentized confidence interval. Otherwise, if p(chisq) > studentize
// or studentize = 0, a priori errors are used to calculate confidence
// interval. This option is provided as method for adjusting the
// confidence bands for fits to underdispersed data.
function YorkFit(Ywave, Yerror, Xwave, Xerror, Rwave, [errors, confidence, CB, studentize])
    wave Ywave, Yerror, Xwave, Xerror, Rwave
    variable errors, confidence, CB, studentize
   
    errors = ParamIsDefault(errors) ? 1 : errors
    confidence = ParamIsDefault(confidence) ? 0.95 : confidence
    CB = ParamIsDefault(CB) ? 1 : CB
    studentize = ParamIsDefault(studentize) ? 1 : studentize
       
    // the fit coefficients
    variable a, b
   
    variable oldb, Xbar, Ybar, s2_a, s2_b
    Make /D/free/N=(numpnts(Xwave)) sigmaX, sigmaY, weightX, weightY, W, w_alpha, w_beta, w_U, w_V, w1, w2, Xcorr, Ycorr
   
    sigmaX = errors > 0 ? Xerror/errors : 1/Xerror
    sigmaY = errors > 0 ? Yerror/errors : 1/Yerror
                   
    // first get an estimate of b
    CurveFit /Q/X=1 line Ywave /X=Xwave /W=sigmaY /D/I=1
    wave wfit = $"fit_"+NameOfWave(Ywave)
    wave /D w_coef
    b=w_coef[1]

    weightX = 1/sigmaX^2
    weightY = 1/sigmaY^2
   
    w_alpha=sqrt(weightX*weightY)
       
    do
        oldb=b
        W=weightX[p]*weightY[p]/(weightX[p]+b^2*weightY[p]-2*b*Rwave*w_alpha)
        w1=W*Xwave
        Xbar=sum(w1)/sum(W)
        w1=W*Ywave
        Ybar=sum(w1)/sum(W)
       
        w_U=Xwave-Xbar
        w_V=Ywave-Ybar
        w_beta=W*(w_U/weightY + b*w_V/weightX - (b*w_U + w_V)*Rwave/w_alpha)
        w1=W*w_beta*w_V
        w2=W*w_beta*w_U
        b=sum(w1)/sum(w2)
    while(abs((b - oldb)/oldb) > 1e-14)
   
    Make /D/O w_coef
    a=Ybar-b*Xbar
    w_coef={a, b}
   
    wfit=a+b*x
    printf "%s = %g%+g*x\r", GetWavesDataFolder(wfit, 4), a, b
   
    // calculate chi-squared and MSWD
    Duplicate /free W w_chiSq
    w_chiSq=W*(Ywave-b*Xwave-a)^2
   
    // use same global variables as Igor's CurveFit
    Execute /Q "variable /G V_chisq, V_q"
    NVAR chi2=V_chisq
    NVAR pChi=V_q
   
    chi2=sum(w_chiSq)
    variable DoF=numpnts(Xwave)-2
    variable /G V_MSWD=chi2/DoF
   
    // calculate uncertainties in fit coefficients
    Xcorr=Xbar+w_beta
    w1=W*Xcorr
    xbar=sum(w1)/sum(W) // xbar is now calculated for corrected points, small-xbar in York et al 2004
    w_U=Xcorr-xbar // this is small u in York et al 2004
   
    w1=W*w_U^2
    s2_b=1/sum(w1)
    s2_a=1/sum(W)+xbar^2*s2_b
    variable sigma_a=sqrt(s2_a)
    variable sigma_b=sqrt(S2_b)
    Make /D/O w_sigma={sigma_a,sigma_b}
    // these are LSE calculated for points adjusted to maximum liklihood positions,
    // and are equal to maximum likelihood errors
   
    // calculate p value for chi-squared
    pChi = 1 - StatsChiCDF(chi2, numpnts(Xwave)-2)
   
    if(studentize==0 || pChi > studentize)
        printf "Calculating %g%% confidence from a priori errors\r", 100*confidence
    endif
   
    // in the case of underdispersion, optionally use tValue for 'infinite' points
    DoF = studentize==0 || pChi > studentize ? 1e10 : DoF
    variable tValue = StatsInvStudentCDF(1-(1-confidence)/2, DoF)
   
    // calculate confidence interval for fit coefficients
    printf "Coefficient values ± %g%% Confidence Interval\r", 100*confidence
    printf "a = %g ± %g\r", a, tValue*w_sigma[0]
    printf "b = %g ± %g\r", b, tValue*w_sigma[1]
    Make /D/O W_ParamConfidenceInterval={tValue*w_sigma[0],tValue*w_sigma[1],confidence}
   
    // calculate covariance matrix
    variable cov = -xbar*s2_b // cov(a,b)
    Make /D/O/N=(2,2) M_Covar
    M_Covar = { {s2_a, cov}, {cov, s2_b} }
   
    Print w_coef
    Print w_sigma
    printf "M_covar = {{%g,%g},{%g,%g}}\r", s2_a, cov, cov, s2_b
    printf "χ2 = %g, MSWD = %g, p(χ2) = %g\r", chi2, V_MSWD, pChi
   
   
    // create confidence band waves
    Make /O/N=200 $"UC_"+NameOfWave(Ywave) /wave=UC, $"LC_"+NameOfWave(Ywave) /wave=LC
    SetScale /I x, leftx(wfit), pnt2x(wfit,numpnts(wfit)-1), UC, LC
   
    if(CB==2)
        // calculate confidence interval, following Ludwig (1980)
        // Calculation of uncertainties of U-Pb isotope data
        // Earth and Planetary Science Letters, 46: 212-220
        // switch to using variable names consistent with Ludwig reference:
        // i = intercept, s = slope
       
        // xbar is for corrected points. do the same for ybar:
        Ycorr = a + xCorr * b
        w1=W*Ycorr
        ybar=sum(w1)/sum(W)
       
        variable deltaTheta = tValue*sigma_b*cos(atan(b))^2
        variable sSym = (tan(atan(b)+deltaTheta) + tan(atan(b)-deltaTheta))/2
        variable iSym = ybar - sSym*xbar
        variable deltaS = (tan(atan(b) + deltaTheta) - tan(atan(b)-deltaTheta) )/2
        variable deltaI = tValue*sigma_a + (deltaS - tValue*sigma_b) * xbar
        UC = iSym + sSym*x + sqrt(deltaI^2 + deltaS^2*x*(x-2*xbar))
        LC = iSym + sSym*x - sqrt(deltaI^2 + deltaS^2*x*(x-2*xbar))
        printf "Symmetric confidence bands: %s, %s\r", PossiblyQuoteName(NameOfWave(UC)), PossiblyQuoteName(NameOfWave(LC))
    else
        // the 'normal' form for confidence interval, non-symmetrized.
        UC = a + b*x + sqrt(tValue^2*s2_a + tValue^2*s2_b*x*(x-2*xbar))
        LC = a + b*x - sqrt(tValue^2*s2_a + tValue^2*s2_b*x*(x-2*xbar))
    endif
   
    CheckDisplayed Ywave, wfit, UC, LC
    if(!(V_flag&0x01))
        return 1
    endif
    if(!(V_flag&0x02))
        AppendToGraph wfit
    endif
    if(CB && !(V_flag&0x04))
        AppendToGraph UC
    endif
    if(CB && !(V_flag&0x08))
        AppendToGraph LC
    endif
end

 

A few years ago I was working on line-fitting in three dimensions. I found the following references useful: (the first is a paper-back book)

S. Van Huffel and J. Vandewalle, “The total least squares problem,” Society for Industrial and Applied Mathematics (SIAM), (Philadelphia, 1991) ISBN 0-89871-275-0.

G. H. Golub and C. F. Van Loan, “An analysis of the total least squares problem,” SIAM J. Numer. Anal., 17, no. 6, December 1981, pp. 883-893.

S. Van Huffel and J. Vandewalle, “Analyses and properties of the generalized total least squares problem AXB when some or all columns in A are subject to error,” SIAM J. Matrix. Anal. Appl., 10, no. 3, July 1989, pp. 294-315.

T. J. Abatzoglou and J. M. Mendel, “Constrained total least squares,” IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP ’87), 12, 1987, pp. 1485-1488.

You're welcome, Tony.  I would also suggest looking at IgorPro's Singular Value Decomposition operation

DisplayHelpTopic "MatrixSVD"

That, in conjunction with diagonalization of the covariance matrix, might be useful. SVD can be handy for doing line fits through 2- and 3-D point clouds. Finding confidence levels of the resulting coefficients is beyond my skill set.

For anyone following this thread, I have edited the code snippet to add some additional options for calculating confidence bands.

Forum

Support

Gallery

Igor Pro 8

Learn More

Igor XOP Toolkit

Learn More

Igor NIDAQ Tools MX

Learn More