Fit a straight line to X-Y data with correlated errors

Calculate the best fit line, in a least-squares sense, taking into account correlated errors in x and y.

Output includes the reduced chi-square statistic, covariance matrix for fit coefficients, and confidence bands.

Please let me know if you find any errors. A more general solution for fitting with correlated errors would be welcome (e.g., Moniot 2009, doi:10.1016/j.apnum.2008.01.001), but that's beyond my maths skills.

// Use an LSE method to find best fit line y = a + bx
// For normally distributed and possibly correlated errors in x and y.
// When correlation coefficients in Rwave are all 0,
// YorkFit(Ywave, Yerror, Xwave, Xerror, Rwave, confidence=0.95, CB=1, studentize=1)
// is the same as
// CurveFit/ODR=2/X=1 line Ywave /X=Xwave /W=sigmaY /I=1 /XW=sigmaX /F={0.95, 5}
// Writes to W_coef, W_sigma, V_chisq, V_q, V_MSWD, M_covar, and
// W_ParamConfidenceInterval

// Uses method of York (1969) Least squares fitting of a line with
// correlated errors. Earth and Planetary Science Letters, 5: 320-324

// Maximum likelihood errors after York et al. (2004)
// Unified equations for the slope, intercept, and standard errors of the
// best straight line. American Journal of Physics, 72: 367-375.

// optional parameters for YorkFit
// errors defines the type of errors in Yerror and Xerror waves. Set 
// errors to 1 (default) for se, 2 for 2*se, 0: for reciprocal errors. 
// confidence supplies the confidence level for calculating confidence 
// interval for fit coefficients and confidence bands.
// CB determines whether to append confidence bands to plot. CB = 1 
// (default): plot confidence bands, CB = 2: calculate and plot Ludwig 
// (1980) style confidence bands. These are 'symmetrized', such that 
// confidence bands conicide with those for a regression of x against y. 
// When CB = 0, confidence bands are recalculated but not added to the 
// top graph.
// Set studentize to 1 (default) to always use data scatter to calculate 
// studentized confidence interval. Otherwise, if p(chisq) > studentize 
// or studentize = 0, a priori errors are used to calculate confidence 
// interval. This option is provided as method for adjusting the 
// confidence bands for fits to underdispersed data. 
function YorkFit(Ywave, Yerror, Xwave, Xerror, Rwave, [errors, confidence, CB, studentize])
	wave Ywave, Yerror, Xwave, Xerror, Rwave
	variable errors, confidence, CB, studentize
	
	errors = ParamIsDefault(errors) ? 1 : errors
	confidence = ParamIsDefault(confidence) ? 0.95 : confidence
	CB = ParamIsDefault(CB) ? 1 : CB
	studentize = ParamIsDefault(studentize) ? 1 : studentize
		
	// the fit coefficients
	variable a, b
	
	variable oldb, Xbar, Ybar, s2_a, s2_b
	Make /D/free/N=(numpnts(Xwave)) sigmaX, sigmaY, weightX, weightY, W, w_alpha, w_beta, w_U, w_V, w1, w2, Xcorr, Ycorr
	
	sigmaX = errors > 0 ? Xerror/errors : 1/Xerror
	sigmaY = errors > 0 ? Yerror/errors : 1/Yerror
					
	// first get an estimate of b
	CurveFit /Q/X=1 line Ywave /X=Xwave /W=sigmaY /D/I=1
	wave wfit = $"fit_"+NameOfWave(Ywave)
	wave /D w_coef
	b=w_coef[1]

	weightX = 1/sigmaX^2
	weightY = 1/sigmaY^2
	
	w_alpha=sqrt(weightX*weightY)
		
	do
		oldb=b
		W=weightX[p]*weightY[p]/(weightX[p]+b^2*weightY[p]-2*b*Rwave*w_alpha)
		w1=W*Xwave
		Xbar=sum(w1)/sum(W)
		w1=W*Ywave
		Ybar=sum(w1)/sum(W)
		
		w_U=Xwave-Xbar
		w_V=Ywave-Ybar
		w_beta=W*(w_U/weightY + b*w_V/weightX - (b*w_U + w_V)*Rwave/w_alpha)
		w1=W*w_beta*w_V
		w2=W*w_beta*w_U
		b=sum(w1)/sum(w2)
	while(abs((b - oldb)/oldb) > 1e-14)
	
	Make /D/O w_coef
	a=Ybar-b*Xbar
	w_coef={a, b}
	
	wfit=a+b*x
	printf "%s = %g%+g*x\r", GetWavesDataFolder(wfit, 4), a, b
	
	// calculate chi-squared and MSWD
	Duplicate /free W w_chiSq
	w_chiSq=W*(Ywave-b*Xwave-a)^2
	
	// use same global variables as Igor's CurveFit
	Execute /Q "variable /G V_chisq, V_q"
	NVAR chi2=V_chisq
	NVAR pChi=V_q
	
	chi2=sum(w_chiSq)
	variable DoF=numpnts(Xwave)-2
	variable /G V_MSWD=chi2/DoF
	
	// calculate uncertainties in fit coefficients
	Xcorr=Xbar+w_beta
	w1=W*Xcorr
	xbar=sum(w1)/sum(W) // xbar is now calculated for corrected points, small-xbar in York et al 2004
	w_U=Xcorr-xbar // this is small u in York et al 2004
	
	w1=W*w_U^2
	s2_b=1/sum(w1)
	s2_a=1/sum(W)+xbar^2*s2_b
	variable sigma_a=sqrt(s2_a)
	variable sigma_b=sqrt(S2_b)
	Make /D/O w_sigma={sigma_a,sigma_b}
	// these are LSE calculated for points adjusted to maximum liklihood positions,
	// and are equal to maximum likelihood errors
	
	// calculate p value for chi-squared
	pChi = 1 - StatsChiCDF(chi2, numpnts(Xwave)-2)
	
	if(studentize==0 || pChi > studentize)
		printf "Calculating %g%% confidence from a priori errors\r", 100*confidence
	endif
	
	// in the case of underdispersion, optionally use tValue for 'infinite' points
	DoF = studentize==0 || pChi > studentize ? 1e10 : DoF
	variable tValue = StatsInvStudentCDF(1-(1-confidence)/2, DoF)
	
	// calculate confidence interval for fit coefficients
	printf "Coefficient values ± %g%% Confidence Interval\r", 100*confidence
	printf "a = %g ± %g\r", a, tValue*w_sigma[0]
	printf "b = %g ± %g\r", b, tValue*w_sigma[1]
	Make /D/O W_ParamConfidenceInterval={tValue*w_sigma[0],tValue*w_sigma[1],confidence}
	
	// calculate covariance matrix
	variable cov = -xbar*s2_b // cov(a,b)
	Make /D/O/N=(2,2) M_Covar
	M_Covar = { {s2_a, cov}, {cov, s2_b} }
	
	Print w_coef
	Print w_sigma
	printf "M_covar = {{%g,%g},{%g,%g}}\r", s2_a, cov, cov, s2_b
	printf "χ2 = %g, MSWD = %g, p(χ2) = %g\r", chi2, V_MSWD, pChi
	
	
	// create confidence band waves
	Make /O/N=200 $"UC_"+NameOfWave(Ywave) /wave=UC, $"LC_"+NameOfWave(Ywave) /wave=LC
	SetScale /I x, leftx(wfit), pnt2x(wfit,numpnts(wfit)-1), UC, LC
	
	if(CB==2)
		// calculate confidence interval, following Ludwig (1980)
		// Calculation of uncertainties of U-Pb isotope data
		// Earth and Planetary Science Letters, 46: 212-220
		// switch to using variable names consistent with Ludwig reference:
		// i = intercept, s = slope
		
		// xbar is for corrected points. do the same for ybar:
		Ycorr = a + xCorr * b
		w1=W*Ycorr
		ybar=sum(w1)/sum(W)
		
		variable deltaTheta = tValue*sigma_b*cos(atan(b))^2
		variable sSym = (tan(atan(b)+deltaTheta) + tan(atan(b)-deltaTheta))/2
		variable iSym = ybar - sSym*xbar
		variable deltaS = (tan(atan(b) + deltaTheta) - tan(atan(b)-deltaTheta) )/2
		variable deltaI = tValue*sigma_a + (deltaS - tValue*sigma_b) * xbar
		UC = iSym + sSym*x + sqrt(deltaI^2 + deltaS^2*x*(x-2*xbar))
		LC = iSym + sSym*x - sqrt(deltaI^2 + deltaS^2*x*(x-2*xbar))
		printf "Symmetric confidence bands: %s, %s\r", PossiblyQuoteName(NameOfWave(UC)), PossiblyQuoteName(NameOfWave(LC))
	else
		// the 'normal' form for confidence interval, non-symmetrized.
		UC = a + b*x + sqrt(tValue^2*s2_a + tValue^2*s2_b*x*(x-2*xbar))
		LC = a + b*x - sqrt(tValue^2*s2_a + tValue^2*s2_b*x*(x-2*xbar))
	endif
	
	CheckDisplayed Ywave, wfit, UC, LC
	if(!(V_flag&0x01))
		return 1
	endif
	if(!(V_flag&0x02))
		AppendToGraph wfit
	endif
	if(CB && !(V_flag&0x04))
		AppendToGraph UC
	endif
	if(CB && !(V_flag&0x08))
		AppendToGraph LC
	endif
end

 

A few years ago I was working on line-fitting in three dimensions. I found the following references useful: (the first is a paper-back book)

S. Van Huffel and J. Vandewalle, “The total least squares problem,” Society for Industrial and Applied Mathematics (SIAM), (Philadelphia, 1991) ISBN 0-89871-275-0.

G. H. Golub and C. F. Van Loan, “An analysis of the total least squares problem,” SIAM J. Numer. Anal., 17, no. 6, December 1981, pp. 883-893.

S. Van Huffel and J. Vandewalle, “Analyses and properties of the generalized total least squares problem AXB when some or all columns in A are subject to error,” SIAM J. Matrix. Anal. Appl., 10, no. 3, July 1989, pp. 294-315.

T. J. Abatzoglou and J. M. Mendel, “Constrained total least squares,” IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP ’87), 12, 1987, pp. 1485-1488.

You're welcome, Tony.  I would also suggest looking at IgorPro's Singular Value Decomposition operation

DisplayHelpTopic "MatrixSVD"

That, in conjunction with diagonalization of the covariance matrix, might be useful. SVD can be handy for doing line fits through 2- and 3-D point clouds. Finding confidence levels of the resulting coefficients is beyond my skill set.

For anyone following this thread, I have edited the code snippet to add some additional options for calculating confidence bands.

Forum

Support

Gallery

Igor Pro 10

Learn More

Igor XOP Toolkit

Learn More

Igor NIDAQ Tools MX

Learn More