Q: How do I normalize a histogram based on the total count?

Yu-Ja:

A probability density function has an area of 1, so if that's not what you want, I think you want something other than a probability density. Please tell us exactly what you want.

John Weeks
WaveMetrics, Inc.
support@wavemetrics.com

February 21, 2012 at 09:29 am

yuja

Hi,

I would like the probabilities to sum to 1, rather than having the area equal 1.

For example, if I have the data set [0.1 0.1 0.1 0.2] and analyze it using the Histogram operation with 2 bins, I would like the output to be 0.75 (3/4) for the first bin and 0.25 (1/4) for the second bin.
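The request amounts to plain arithmetic: count the points in each bin, then divide each count by the total count. A minimal Python sketch (not Igor code) reproduces the 0.75/0.25 result. The helper `histogram_counts` is hypothetical and mimics binning by start, width, and number of bins; the bin start 0.05 and width 0.1 are assumptions chosen so that 0.1 and 0.2 land in different bins.

```python
import math

def histogram_counts(data, start, width, nbins):
    """Count values into nbins equal-width bins beginning at `start`.
    Values outside [start, start + nbins*width) are ignored."""
    counts = [0] * nbins
    for x in data:
        i = math.floor((x - start) / width)
        if 0 <= i < nbins:
            counts[i] += 1
    return counts

data = [0.1, 0.1, 0.1, 0.2]
counts = histogram_counts(data, start=0.05, width=0.1, nbins=2)
total = sum(counts)                      # computed once, before dividing
fractions = [c / total for c in counts]  # each bin as a fraction of all counts
print(counts, fractions)                 # [3, 1] [0.75, 0.25]
```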

I hope that clears up the confusion.

Thanks again!

Yu-Ja

February 21, 2012 at 09:40 am

s.r.chinn

You may be confused about what the Histogram is doing. Try this simple example:

make/O/N=10000 wave0
wave0 = gnoise(1)  // this creates 10,000 samples of Gaussian distributed noise, std.dev.=1
Make/O/N=200 wave0_Hist
Histogram/C/B={-4,0.04,200} wave0, wave0_Hist
Display wave0_Hist
print sum(wave0_Hist)

The result is the number of "counts" that fell in the bins, essentially all 10000 points in the wave. Now do the same Histogram but select "Normalize Result to Probability Density" (the /P flag):

Histogram/P/C/B={-4,0.04,200} wave0,wave0_Hist
print area(wave0_Hist)

The result is close to 1, as expected when integrating a probability density function.
Remember that the horizontal x-axis of the Histogram should span the range of vertical (y) values in the original data set. Part of the confusion is that your "hand-made" normalization neglects the x-scaling of the Histogram wave. To do your own normalization properly (without the /P flag), you have to divide the histogram wave by the number of points and also multiply it by the number of bins per unit amplitude, that is, divide by the bin width. This is a complicated way to get the same result as using the /P (probability density) flag.
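The manual normalization just described divides each count by (number of points × bin width). Below is a rough Python sketch of that arithmetic, not Igor code: the helper `histogram_counts` is hypothetical and stands in for Histogram/B={-4,0.04,200}, and Python's `random.gauss` stands in for gnoise(1).

```python
import math
import random

def histogram_counts(data, start, width, nbins):
    """Equal-width binning; values outside the bin range are ignored."""
    counts = [0] * nbins
    for x in data:
        i = math.floor((x - start) / width)
        if 0 <= i < nbins:
            counts[i] += 1
    return counts

random.seed(0)  # fixed seed so the sketch is repeatable
n_points = 10000
data = [random.gauss(0.0, 1.0) for _ in range(n_points)]  # like gnoise(1)

start, width, nbins = -4.0, 0.04, 200
counts = histogram_counts(data, start, width, nbins)

# Manual probability density: divide by the number of points AND by the
# bin width (equivalently, multiply by the number of bins per unit amplitude).
density = [c / (n_points * width) for c in counts]

# Area under a bar-style histogram = sum of (height * bin width).
area = sum(d * width for d in density)
print(sum(counts), area)
```

The printed area should come out close to 1 (slightly below it only if some samples fell outside ±4), which is the property the /P flag is described as providing above.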

February 21, 2012 at 10:22 am

jtigor

To use the previous example, perhaps what you want is simply...

Histogram/C/B={-4,0.04,200} wave0, wave0_Hist
wave0_Hist /=  sum(wave0_Hist)

February 22, 2012 at 05:56 am

s.r.chinn

Jeff, regarding your reply, I have two comments:
(1) Doing the division as you indicated caused some strange effects for me (IP 6.22A, Win Vista). Finding the sum first and then dividing worked, but it does not give the correct p.d.f., for the reasons I stated above involving the x-axis scaling and bin density.
(2) I computed the /P histogram and plotted it on the same graph (on the right-hand axis, in blue). Its area is close to one:

•print area(wave0_Hist2)
  0.9999

The left-hand axis plot (red) using the sum division is not properly normalized for a p.d.f.
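The two curves differ by a constant factor, the bin width: dividing by the sum makes the bin heights add up to 1, while a density also divides by the bin width so that the area is 1. A Python sketch (not Igor code; `histogram_counts` is a hypothetical helper, and here the density is built from the sum-normalized values) makes the factor of 0.04 visible:

```python
import math
import random

def histogram_counts(data, start, width, nbins):
    """Equal-width binning; values outside the bin range are ignored."""
    counts = [0] * nbins
    for x in data:
        i = math.floor((x - start) / width)
        if 0 <= i < nbins:
            counts[i] += 1
    return counts

random.seed(1)  # fixed seed so the sketch is repeatable
data = [random.gauss(0.0, 1.0) for _ in range(10000)]
width = 0.04
counts = histogram_counts(data, -4.0, width, 200)

total = sum(counts)                         # computed once, before dividing
fractions = [c / total for c in counts]     # bin heights add up to 1
densities = [f / width for f in fractions]  # area (height x width) is 1

print(sum(fractions))                       # 1, up to rounding
print(sum(f * width for f in fractions))    # the bin width, 0.04, up to rounding
print(sum(d * width for d in densities))    # 1, up to rounding
```

In other words, the sum-normalized (red) curve is the density (blue) curve scaled down by the bin width; which one is "correct" depends on whether a probability per bin or a probability density is wanted.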

Attachment: Histograms.png

February 22, 2012 at 07:28 am

s.r.chinn

Re: "(1) Doing the division as you indicated caused some strange effects for me (IP 6.22A, Win Vista)."

In retrospect, this behavior is not strange. In evaluating Wave0_Hist /= sum(Wave0_Hist), the right-hand side is recomputed for each point, so as Wave0_Hist changes, each point on the LHS is divided by a new and different sum.

February 22, 2012 at 08:32 am

johnweeks

Yep. To get what the original poster wants, you need:

Variable temp = sum(wave0_Hist)
wave0_Hist /= temp

As s.r.chinn pointed out, the temporary variable is needed because if you write
wave0_hist /= sum(wave0_hist)
then the right side is computing a sum over a left side that is constantly changing.
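Why the one-line version goes wrong can be simulated outside Igor. The Python sketch below assumes, as described above, that the right-hand side is re-evaluated for every point in order and sees the points already overwritten; it is a model of that behavior, not Igor's implementation, and both function names are hypothetical.

```python
def divide_by_live_sum(w):
    """Model of `w /= sum(w)` when sum(w) is recomputed for every point
    and already sees the points that were overwritten before it."""
    for i in range(len(w)):
        w[i] = w[i] / sum(w)   # sum includes earlier, already-divided points
    return w

def divide_by_saved_sum(w):
    """Model of `Variable temp = sum(w)` followed by `w /= temp`."""
    temp = sum(w)              # computed once, before any point changes
    for i in range(len(w)):
        w[i] = w[i] / temp
    return w

counts = [3.0, 1.0]
live = divide_by_live_sum(list(counts))    # pass copies so counts is untouched
saved = divide_by_saved_sum(list(counts))
print(live)    # first point 0.75 is right; second is 1/1.75 instead of 1/4
print(saved)   # [0.75, 0.25]
```

Note that the first point comes out correct, so checking a single point would not reveal the problem.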

February 22, 2012 at 09:19 am

jtigor

I'll admit that, although I did test my suggestion, it was only at one point. That was a seriously inept error on my part and I apologize for any confusion it may have caused.

With regard to the calculation, it seemed that the original poster wanted to express each bin as a fraction of the total counts. That should be the end result of my suggestion, when implemented correctly.

February 22, 2012 at 10:26 am

s.r.chinn

If the desired result is a probability per bin, then a quick, simple example is:

function test() //  each bin is treated as a dimensionless bucket
    make/O/N=4 wtest = {0.1, 0.1, 0.1, 0.2}
    make/O/N=2 whist
    histogram/B={0.05, 0.1, 2} wtest, whist
    whist /= numpnts(wtest) //  each point is now the fraction of all points in that bin (assumes every point of wtest falls in a bin)
end
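Dividing by numpnts(wtest) and dividing by the sum of the counts agree only when every data point falls inside a bin. A small Python sketch (not Igor code; the helper `histogram_counts` and the extra out-of-range value 0.9 are hypothetical) shows the difference:

```python
import math

def histogram_counts(data, start, width, nbins):
    """Equal-width binning; values outside the bin range are ignored."""
    counts = [0] * nbins
    for x in data:
        i = math.floor((x - start) / width)
        if 0 <= i < nbins:
            counts[i] += 1
    return counts

# Same bins as test(): start 0.05, width 0.1, 2 bins -> range [0.05, 0.25).
data = [0.1, 0.1, 0.1, 0.2, 0.9]   # 0.9 is a hypothetical out-of-range point
counts = histogram_counts(data, 0.05, 0.1, 2)

by_npoints = [c / len(data) for c in counts]   # like whist /= numpnts(wtest)
by_counts = [c / sum(counts) for c in counts]  # like dividing by a saved sum(whist)
print(counts, by_npoints, by_counts)           # [3, 1] [0.6, 0.2] [0.75, 0.25]
```

With the extra point, dividing by the number of points gives bins that sum to 0.8, the fraction of points that landed in range, while dividing by the total count still gives 0.75 and 0.25. Which one is wanted depends on whether out-of-range points should count toward the total.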

The customer is always right, but the request to "normalize the histogram to probability" had an ambiguous meaning. I think this horse is dead now.

Attachment: DiscreteHistogram.png

February 22, 2012 at 11:28 am