Lognormal Distributions

I'm using IGOR to fit data that has a nice lognormal distribution.

When I look at the fit parameters I get X0 and "Width", which I suppose are related to the mean and the standard deviation.
Can anyone please tell me the relationship between X0 and the mean, and between "Width" and the standard deviation of the data?
The "Width" seems exceedingly small: my data span a range of 200 and I get a width of 0.8. How does this translate to the standard deviation of the data?

Cheers and thanks for the help!

I just recently had to work through all this for the Multipeak Fit 2 package, and I documented it in the help for Multipeak Fit 2. That help hasn't been released yet, so I have attached a copy of the revised help file to this posting. See the extensive notes on the LogNormal peak shape in the section "Notes on the Peak Functions".
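
The short version, as a sketch: Igor's built-in LogNormal fit function is y = y0 + A*exp(-(ln(x/x0)/width)^2), which works in log space. If you treat the fitted peak (minus the baseline) as proportional to the probability density of your data, the data are lognormal with sigma^2 = width^2/2 and mu = ln(x0) + width^2/2 for the underlying normal distribution of ln(x); x0 is the mode (the peak position), not the mean. Because width lives in log space it is dimensionless, so 0.8 is a perfectly plausible value even for data spanning a range of 200. The function below (my own illustration, not part of any package) does the arithmetic:

Function LogNormalMeanSD(x0, width)
    variable x0, width

    variable sigma2 = width^2/2        // variance of ln(x)
    variable mu = ln(x0) + sigma2      // mean of ln(x); x0 itself is the mode
    // standard lognormal moments
    variable avg = exp(mu + sigma2/2)           // = x0*exp(3*width^2/4)
    variable sd = avg*sqrt(exp(sigma2) - 1)
    printf "mean = %g, sd = %g\r", avg, sd
End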

A side note: I can see that there is an offset of the fitted curve relative to the histogram data, an unfortunate consequence of the way histogram data is represented in Igor. Check out these help topics:

DisplayHelpTopic "Guided Tour 3 - Histograms and Curve Fitting"
DisplayHelpTopic "Graphing Histogram Results"
DisplayHelpTopic "Curve Fitting to a Histogram"

To view the topics, copy a command, paste it into Igor's command line and press Enter.

John Weeks
WaveMetrics, Inc.
support@wavemetrics.com
As the Help file says, the most straightforward thing is to have the histogram generate x values centered in the bin. When I do this, I simply give up on using bars or cityscape and make the plot with markers. That is much easier than the kluges suggested in the manual (making an explicit x-wave shifted by half a bin, etc.).

Still, this does suggest that in their revisions to Igor, the good folks at WaveMetrics might want to add "bin-centered" options to bars and cityscape modes for IP7.
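
For what it's worth, a minimal sketch of that workflow (myData and myHist are placeholder names):

Make/O/D/N=100 myHist
Histogram/B=2/C myData, myHist      // /C centers the x values in the bins
Display myHist
ModifyGraph mode(myHist)=3, marker(myHist)=19   // markers instead of bars or cityscape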
Thank you very much for all your input!
I will check these later with care and will come back in case I run into anything major!
Thank you again!

Cheers,

R.
A side question about the uncertainties reported by Multipeak Fitting (MPF2)... I'm posting it here because my question stems from the help file JohnWeeks posted, which contains a nice discussion of the lognormal curve (unfortunately missing from the version of MPF2 in my copy of Igor).

The help file posted here describes the calculation of the lognormal peak area, B, as:

B = A*x0*width*sqrt(pi)*exp(width^2/4) 

Using simple error propagation, I believe it then follows that the uncertainty in B, sigmaB, is given by:

(sigmaB/B)^2 = (sigmaA/A)^2 + (sigmaX0/x0)^2 + (sigmaWidth/width)^2 + (sigmaWidth/(2*width))^2


...using sigma(exp(width^2/4)) = sigma(width) / 2. Unless I've made an algebraic error, this is way off what MPF2 reports.

I dug a bit into the MPF2 code and found that LogNormalPeakParams() in Peakfunctions2.ipf is used to calculate the lognormal uncertainties. I pasted the code used for the area below. I believe cw=coefficient wave, sw=sigma wave (?). To be honest, I haven't 100% decoded what sw means, but I think I can already conclude that the approach here is way different from my approach above.

So... has the MPF code changed to make the equation for B above invalid, or am I making a big mistake somewhere?

Thanks a lot for any help!
j

    // area: B = A*x0*width*sqrt(pi)*exp(width^2/4); cw holds the fit
    // coefficients (cw[1] is the width) and sw is the covariance matrix,
    // so the diagonal entries are variances and the off-diagonal entries
    // are covariances, used linearly (not squared)
    outWave[2][0] = cw[0]*cw[1]*cw[2]*sqrt(pi*exp(cw[1]*cw[1]/2))

    outWave[2][1] = sw[0][0]*LogNormalDAreaDw0(cw)^2
    outWave[2][1] += sw[1][1]*LogNormalDAreaDw1(cw)^2
    outWave[2][1] += sw[2][2]*LogNormalDAreaDw2(cw)^2
    outWave[2][1] += 2*sw[0][1]*LogNormalDAreaDw0(cw)*LogNormalDAreaDw1(cw)
    outWave[2][1] += 2*sw[0][2]*LogNormalDAreaDw0(cw)*LogNormalDAreaDw2(cw)
    outWave[2][1] += 2*sw[1][2]*LogNormalDAreaDw1(cw)*LogNormalDAreaDw2(cw)
    outWave[2][1] = sqrt(outWave[2][1])

// helper functions: partial derivatives of the peak area and FWHM with
// respect to the fit coefficients. Judging from the FWHM derivatives, the
// coefficient order is {x0, width, amplitude}. SqrtPi and SqrtLn2 are
// constants defined elsewhere in the procedure file; values are included
// here so the excerpt stands alone.
static Constant SqrtPi = 1.7724538509055161
static Constant SqrtLn2 = 0.83255461115769776

Function LogNormalDAreaDw0(w)   // dB/dx0 = B/w[0]
    Wave w
    return SqrtPi*w[1]*w[2]*sqrt(exp((w[1]^2)/2))
end

Function LogNormalDAreaDw1(w)   // dB/dwidth = (B/w[1])*(1 + w[1]^2/2)
    Wave w
    return 0.5*SqrtPi*w[0]*w[2]*(2+w[1]^2)*sqrt(exp((w[1]^2)/2))
end

Function LogNormalDAreaDw2(w)   // dB/dA = B/w[2]
    Wave w
    return sqrt(exp((w[1]^2)/2))*SqrtPi*w[0]*w[1]
end

Function LogNormalDFWHMDw0(w)   // dFWHM/dx0
    Wave w
    return exp(w[1]*SqrtLn2) - exp(-w[1]*SqrtLn2)
end

Function LogNormalDFWHMDw1(w)   // dFWHM/dwidth
    Wave w
    return SqrtLn2 * w[0] * (exp(-w[1]*SqrtLn2) + exp(w[1]*SqrtLn2))
end
I love this kind of question; it is so interesting. The easiest way of determining uncertainties of this type is to do it numerically; the approach was shown to me by Rick Gerkin.

http://en.wikipedia.org/wiki/Cholesky_decomposition#Monte_Carlo_simulat…

You need the covariance matrix for this, which is M_Covar. The following code uses these parameters:
coefs - the fit parameters
M_covar - the covariance matrix
holdstring - which parameters were held, which were varied.
howmany - how many sample vectors you want. Make it a large number, say 1000.

Basically, the code makes a matrix filled with Gaussian noise, all with standard deviation 1. You then multiply by the Cholesky decomposition of the covariance matrix and take the transpose. This gives a matrix filled with correlated noise, to which you add the fitted parameters. The result is a matrix of sample vectors with the following properties:
1) The mean of the sample vectors is equal to the vector containing the fitted parameters.
2) The standard deviation of a given parameter in this matrix is equal to the uncertainty in the parameter (that's what the covariance matrix encodes).
3) In addition, the covariance between the parameters is the same as in the covariance matrix.

What you can then do is calculate the area under the lognormal distribution for each of the sample vectors, using your equation above. The mean and standard deviation of all these values give the mean area of the curve and the uncertainty in the area. This is quite a robust method of doing the calculation you require; see the sketch of this final step after the function below.

Function gen_synthMCfmCovar(coefs, M_covar, holdstring, howmany)
    wave coefs, m_covar
    string holdstring
    variable howmany

    variable varying = 0, ii, whichcol
    duplicate/free coefs, tempcoefs
    duplicate/free M_covar, tempM_covar

    varying = strlen(holdstring)
    for(ii = dimsize(coefs, 0)-1 ; ii >= 0 ; ii-=1)
        if(str2num(holdstring[ii]))
            deletepoints/M=0 ii, 1, tempcoefs, tempM_covar
            deletepoints/M=1 ii, 1, tempM_covar
            varying -=1
        endif
    endfor

    //create some gaussian noise
    make /free/d/n=(varying, howmany) noises = gnoise(1., 2)
    //and create the correlated noise from the covariance matrix.
    matrixop/free correlatedNoises = (chol(tempm_covar) x noises)^t

    make/n=(howmany, dimsize(coefs, 0))/d/o M_montecarlo
    Multithread M_montecarlo[][] = coefs[q]

    //now add the correlated noise back to the parameters
    whichcol = dimsize(correlatedNoises, 1) - 1
    for(ii = dimsize(coefs, 0)-1 ; ii >= 0 ; ii-=1)
        if(!str2num(holdstring[ii]))
            M_montecarlo[][ii] += correlatedNoises[p][whichCol]
            whichCol -= 1
        endif
    endfor

End
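
And the final step, as a sketch (my own illustration, not from Rick's code; it assumes the coefficient order {x0, width, amplitude} used in the MPF2 excerpt above): compute the area for each sample vector in M_montecarlo and take the mean and standard deviation.

Function AreaStatsFromMC()
    Wave M_montecarlo   // howmany x numCoefs, made by gen_synthMCfmCovar

    variable n = dimsize(M_montecarlo, 0)
    Make/D/O/N=(n) areas
    // B = A*x0*width*sqrt(pi)*exp(width^2/4) for each sample vector
    areas = M_montecarlo[p][0]*M_montecarlo[p][1]*M_montecarlo[p][2]*sqrt(pi*exp(M_montecarlo[p][1]^2/2))
    WaveStats/Q areas
    printf "area = %g +/- %g\r", V_avg, V_sdev
End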



The full first-order propagation for the area includes the covariance terms:

sigma_B^2 = sigmaA^2 * (dB/dA)^2
+ sigmax0^2 * (dB/dx0)^2
+ sigmaw^2 * (dB/dw)^2
+ COVARIANCETERMS

COVARIANCETERMS = 2*(dB/dA)*(dB/dx0)*cov_A_x0
+ 2*(dB/dA)*(dB/dw)*cov_A_w
+ 2*(dB/dx0)*(dB/dw)*cov_x0_w

where dB/dA is the partial derivative of B with respect to A, etc.

dB/dA = x0 * w * sqrt(Pi) * exp((w^2)/4) = B/A
dB/dx0 = A * w * sqrt(Pi) * exp((w^2)/4) = B/x0
dB/dw = sqrt(Pi) * A * x0 * exp((w^2)/4) * (1 + 0.5*w^2) = (B/w) * (1 + 0.5*w^2)

Thus:
sigma_B^2 = sigmaA^2 * B^2/A^2 + sigmax0^2 * B^2/x0^2 + sigmaw^2 * (sqrt(Pi) * A * x0 * exp((w^2)/4) * (1 + 0.5*w^2))^2 + COVARIANCETERMS
= sigmaA^2 * B^2/A^2 + sigmax0^2 * B^2/x0^2 + sigmaw^2 * (B^2/w^2) * (1 + 0.5*w^2)^2 + COVARIANCETERMS

I would definitely plug in the covariance terms; you should have these in your covariance matrix.
You can see why it's easier to do a numerical calculation, as per the previous post.
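
Putting that together as Igor code (a hypothetical sketch, not MPF2 code; cv stands for the covariance matrix M_Covar from the fit, and the coefficient order {x0, width, A} follows the MPF2 excerpt above):

Function LogNormalAreaSigma(cw, cv)
    Wave cw, cv     // cw = {x0, width, A}; cv = covariance matrix

    variable B = cw[0]*cw[1]*cw[2]*sqrt(pi*exp(cw[1]^2/2))
    variable dBdx0 = B/cw[0]
    variable dBdw = (B/cw[1])*(1 + 0.5*cw[1]^2)
    variable dBdA = B/cw[2]
    variable varB = cv[0][0]*dBdx0^2 + cv[1][1]*dBdw^2 + cv[2][2]*dBdA^2
    varB += 2*cv[0][1]*dBdx0*dBdw + 2*cv[0][2]*dBdx0*dBdA + 2*cv[1][2]*dBdw*dBdA
    return sqrt(varB)
End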

A.
@andyfaff, thanks for the detailed response.

As I'm doing a kind-of-exploratory calculation at the moment, I'd like to avoid the thoroughness of the Cholesky Monte Carlo approach for now. But your post points out that I am ignoring the covariance of the variables when I propagate my uncertainty -- so that's likely why my approach differed from (and is less accurate than) the MPF code.
Indeed, the calculation of uncertainties in MPF2 includes the covariance terms. Note that references to sw include two indices (like sw[1][0]); it is a matrix wave containing the covariance matrix from the fit.

John Weeks
WaveMetrics, Inc.
support@wavemetrics.com
I might add that Andy Nelson (andyfaff) and Rick Gerkin are both very smart fellows... If I understand the technique correctly, it basically computes the errors in the derived quantities by actually passing them through the transformation. In the case of a nonlinear function, that will give more accurate results than the standard propagation of errors, which is just a first-order expansion of the transformation.

As a side note, you can compute the Cholesky decomposition with the chol keyword in the MatrixOP operation.

John Weeks
WaveMetrics, Inc.
support@wavemetrics.com
johnweeks wrote:
If I understand the technique correctly, it basically computes the errors in the derived quantities by actually passing them through the transformation. In the case of a nonlinear function, that will give more accurate results than the standard propagation of errors, which is just a first-order expansion of the transformation.


I'm not sure that's quite right, John. I would expect the results from the numerical method and from propagation of error to be the same, as they are both calculated directly from the covariance matrix.
Whilst the error propagation route works well, my fear is that a slight mathematical error would sink the ship. People also forget to include covariance, which is a big factor.
In comparison, the numerical way is pretty much guaranteed to work if the covariance matrix is "nice" (well conditioned, etc.). It also has the advantage of working for all problems, and doesn't have to be amended.

It's not clear from your message if you actually understand the technique. What it boils down to is synthesis of many trial fit waves. If you took the mean of those fit waves it would be the same as the best fit coefficients. In addition the standard deviation of a given fit parameter over the fit waves would be the same as the parameter uncertainty, and the covariance between the parameters is correct.

These trial waves are synthesized from the covariance matrix itself (and a whole lot of Gaussian noise); it's well worth your time reading the code more carefully.

Then what you do is work out the derived parameter from each of the trial waves, working out the mean and sd at the end.

A.
andyfaff wrote:
Then what you do is work out the derived parameter from each of the trial waves, working out the mean and sd at the end.

Well, see, that's the part where I think the methods differ. What you're telling me is that the method is a slick way to generate a bunch of coefficient vectors with statistics that reflect the estimated covariance matrix from the fit. But then you pass those coefficient vectors through whatever (possibly nonlinear) computation you are interested in (like computing area from width and height of a Voigt), so the output of the computation includes the full nonlinearity of the derived parameter. Standard propagation of errors applies a first derivative of the transformation and calls it good. It's a Taylor series truncated at one term.

So really your method is a hybrid: it accepts the covariance matrix from the fit as the basic coefficient error. For a nonlinear fit function, that's just an approximation, sometimes not a very good one. But from that point on you use exact math, and you could find asymmetric error bars if you wanted to.

Also your method has the distinct advantage of working even on computations that don't have an easy derivative. I might steal it some day to apply to the things in MPF2 that use numerical techniques to get derived parameters.
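
For example, asymmetric intervals can come straight from percentiles of the sampled values (a hypothetical sketch using the areas wave from the Monte Carlo example earlier in the thread):

Function PrintAreaInterval()
    Wave areas      // one derived value per Monte Carlo sample vector

    Duplicate/FREE areas, sorted
    Sort sorted, sorted
    variable n = numpnts(sorted)
    // 16th/50th/84th percentiles: a 68% interval that need not be symmetric
    variable lo = sorted[floor(0.16*(n-1))]
    variable med = sorted[floor(0.50*(n-1))]
    variable hi = sorted[floor(0.84*(n-1))]
    printf "area = %g (+%g / -%g)\r", med, hi - med, med - lo
End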

John Weeks
WaveMetrics, Inc.
support@wavemetrics.com