Q: How to normalize a histogram based on the total count?

Hi there,

I have a set of data that I wish to analyze using the histogram function, and I would like the output to be probabilities (i.e., the number of counts in each bin divided by the total number of points in the data set) instead of counts. I've tried the "Normalize Result to Probability Density" option, but apparently it normalizes the area under the histogram to 1.

Can someone show me the easiest way to normalize the histogram to probability in Igor?

Thanks,

Yu-Ja
Yu-Ja:

A probability density function has an area of 1, so if that's not what you want, I think you want something other than probability. Please tell us exactly what you want.

John Weeks
WaveMetrics, Inc.
support@wavemetrics.com
Hi,

I would like the sum of the probabilities, rather than the area, to be 1.

For example, if I have a set of data [0.1 0.1 0.1 0.2] and analyze it using the histogram function with 2 bins, I would like the output to be 0.75 (3/4) for the first bin and 0.25 (1/4) for the second bin.

I hope that clears up the confusion.

Thanks again!

Yu-Ja
You may be confused about what the Histogram is doing. Try this simple example:
make/O/N=10000 wave0
wave0 = gnoise(1)  // this creates 10,000 samples of Gaussian distributed noise, std.dev.=1
Make/O/N=200 wave0_Hist
Histogram/C/B={-4,0.04,200} wave0, wave0_Hist
Display wave0_Hist
print sum(wave0_Hist)

The result is 10000, the number of "counts" (points in the wave). Now do the same Histogram but select "Normalize Result to Probability Density":
Histogram/P/C/B={-4,0.04,200} wave0,wave0_Hist
print area(wave0_Hist)

The result is close to the expected unity result for integrating a Probability Density Function.
Remember that the horizontal (x) axis of the histogram should span the range of vertical (y) values in the original data set. Part of the confusion is that you are neglecting the x-scaling of the histogram wave in your "hand-made" normalization. To normalize to a probability density yourself (without using the /P flag), you have to divide the histogram wave by the number of points and also multiply by the number of bins per unit amplitude, that is, divide by the bin width. This is a complicated way to get the same result as using the /P (probability density) flag.
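
The two hand-made normalizations described above can be sketched as follows. This is only a minimal sketch, not code from the thread: the function name HandNormalizeDemo and the output wave names wave0_Prob and wave0_PDF are made up for illustration, and it assumes that wave0 already exists in the current data folder and that essentially all of its values fall inside the binned range.

```igor
Function HandNormalizeDemo()
    Wave wave0                                     // assumption: wave0 already exists
    Make/O/N=200 wave0_Hist
    Histogram/C/B={-4,0.04,200} wave0, wave0_Hist  // raw counts, no /P

    Variable nPts = numpnts(wave0)          // total number of data points
    Variable binWidth = deltax(wave0_Hist)  // bin width from the x-scaling (0.04 here)

    // Fraction of points per bin: the values should sum to about 1
    Duplicate/O wave0_Hist, wave0_Prob
    wave0_Prob = wave0_Hist / nPts

    // Density: also divide by the bin width, so the area should be about 1
    Duplicate/O wave0_Hist, wave0_PDF
    wave0_PDF = wave0_Hist / (nPts * binWidth)

    Print sum(wave0_Prob), area(wave0_PDF)
End
```

In this sketch wave0_Prob is the "probability per bin" the original poster described, while wave0_PDF is the probability density and should be comparable to what the /P flag produces. Neither assignment reads the wave it is writing to, so the divisor does not change during the assignment.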
To use the previous example, perhaps what you want is simply...

Histogram/C/B={-4,0.04,200} wave0, wave0_Hist
wave0_Hist /=  sum(wave0_Hist)


Jeff, regarding your reply I have two comments:
(1) Doing the division as you indicated caused some strange effects for me (IP 6.22A, Win Vista). Finding the sum first, and then dividing worked OK, but does not give the correct p.d.f. for reasons I stated above involving the x-axis scaling and bin density.
(2) I did the /P histogram and plotted it on the same graph (but right-hand axis, blue). Its area is close to one:
print area(wave0_Hist2)
  0.9999

The left-hand axis plot (red) using the sum division is not properly normalized for a p.d.f.
[Attached image: Histograms.png]
(1) Doing the division as you indicated caused some strange effects for me (IP 6.22A, Win Vista).


In retrospect, this behavior is not strange. In the assignment Wave0_Hist /= sum(Wave0_Hist), the RHS is re-evaluated for every point, so each point on the LHS is divided by a new and different sum as Wave0_Hist changes.
Yep. To get what the original poster wants, he needs:
Variable temp = sum(wave0_Hist)
wave0_Hist /= temp

As pointed out by s.r.chinn, that is because if you write
wave0_hist /= sum(wave0_hist)
then the right side is doing a sum of a constantly changing left side.

John Weeks
WaveMetrics, Inc.
support@wavemetrics.com
I'll admit that, although I did test my suggestion, it was only at one point. That was a seriously inept error on my part and I apologize for any confusion it may have caused.

With regard to the calculation, it seemed that the original poster wanted to express each bin as a fraction of the total counts. That should be the end result of my suggestion ... when implemented correctly.
If the desired result is a probability per bin, then a quick, simple example is:
function test() //  each bin is treated as a dimensionless bucket
    make/O/N=4 wtest = {0.1, 0.1, 0.1, 0.2}
    make/O/N=2 whist
    histogram/B={0.05, 0.1, 2} wtest, whist
    whist /= numpnts(wtest) //  each point is a probability of that bin
end
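
For reuse, the same idea could be wrapped in a small helper. This is a hedged sketch, not something posted in the thread: the name NormalizeCountsToFractions is hypothetical. It divides by the sum of the histogram rather than by numpnts of the source wave; the two give the same result whenever every data point lands in a bin, but they would differ if some values fall outside the binned range (this assumes such values are simply not counted).

```igor
// Hypothetical helper: convert a histogram of counts to fractions, in place
Function NormalizeCountsToFractions(hist)
    Wave hist
    Variable total = sum(hist)  // evaluate the sum once, before the wave changes
    if (total > 0)              // leave an empty histogram untouched
        hist /= total
    endif
End
```

With the example above, calling NormalizeCountsToFractions(whist) right after the histogram line (in place of the numpnts division) should leave 0.75 and 0.25 in whist.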

The customer is always right, but the request to "normalize the histogram to probability" was ambiguous. I think this horse is dead now.
[Attached image: DiscreteHistogram.png]