Gaussian mean of waves

Hi everyone,

I am trying to find the Gaussian mean of 6 waves. Can anyone tell me how to do that? The "Average Waves" package seems to do some kind of averaging, but I don't know whether that is a simple arithmetic mean or a Gaussian mean.

To make the story complete, these waves have different lengths (3 waves contain 200 numbers, the other 3 contain 100 numbers), but all 6 have related x-axis waves with the same range (-1 to +1).

thanks

I don't know how the Gaussian mean differs from the arithmetic mean. The Average Waves package will generate an average wave from your input waves. It can detect differences in the range and number of points of the input waves. It interpolates the input waves to generate a result wave with evenly spaced x values (set through its x scaling), which holds the average (arithmetic mean) of the interpolated input waves at each x point. Is this what you want?
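As a rough illustration of the interpolate-then-average idea described above, here is a minimal sketch. It is not the code of the Average Waves package; the function name, the wave names and the example data are made up for the illustration, and only two XY pairs are averaged to keep it short.

```igor
Function AverageXYPairsSketch()
	// Hypothetical example data: two XY pairs spanning -1 to +1
	// with 200 and 100 points respectively
	Make/O/N=200 xA, yA
	Make/O/N=100 xB, yB
	xA = -1 + 2*p/199
	xB = -1 + 2*p/99
	yA = xA^2
	yB = xB^2 + 0.1

	// Result wave on a common, evenly spaced grid stored as x scaling
	Make/O/N=200 avgWave
	SetScale/I x -1, 1, "", avgWave

	// Linearly interpolate each input at every x of the result,
	// then take the arithmetic mean point by point
	avgWave = (interp(x, xA, yA) + interp(x, xB, yB)) / 2
End
```

With six inputs the last line would sum six `interp` terms and divide by 6.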

It would be helpful to define what you mean by "Gaussian mean".  Are you trying to fit 6 numeric values to a Gaussian and then return the center of the Gaussian as the value of your "Gaussian mean"? 

Thanks sjr51 for the answer. I am clear now on how the "Average Waves" package works.

And you're right, Igor. The Gaussian average wave should be generated by fitting a distribution to the 6 numeric values (one from each wave) at the same x value. The fit parameters (mean and standard deviation) are then used to generate the Gaussian average wave. In an ideal experiment, the arithmetic and Gaussian averages should be identical. However, I need to distinguish errors from real data based on their distribution. I want only the meaningful data to be included in the averaged wave.

In reply to tony

If your 6 values are random samples of normally distributed values, the sample mean is an estimate of the population mean. Fitting will not improve upon this.

On the other hand, if the data collection is set up so that you are adjusting some quantity and collecting data that is distributed as a function of that quantity, then fitting a Gaussian will require some additional knowledge: what do your six waves represent (what is the independent variable)?

Even then, a least-squares fit of a Gaussian function to just 6 points sounds perilous to me!

I am not sure what you mean by "data independent in the x dimension". The data (wave) represent current density, which obviously depends on the x value (voltage). The Gaussian fit at each voltage is independent; in other words, there is no global fit.
In my current case, the Gaussian average and the arithmetic average give almost identical results. However, in the next case I might have a considerable fraction of erroneous data (30%), which comes from short circuits and open circuits in my setup. I hope to distinguish the real data from the errors based on the normal distribution. With the arithmetic average, both errors and real data are certainly included, which leads to some error in the final calculation.
I have already tried the procedure in Origin, by combining interpolation, matrix transposition, and normal distribution fitting. It works, but it seems slow. I might have problems later, when I try 30 or 200 data sets. However, the same procedure might also work in Igor.

Thank you very much for the suggestion, Tony. Currently I am exploring the method using 6 data sets, which are good. When I know for sure that the method works, I will try it with more data. I am still new to Igor, and still need time to master the tool.

In reply to Andika Asyuda

I am struggling to understand what you are doing, but if your data is (going to be) prone to outlier data points (open circuit or short circuit), then using the median as a measure of central tendency is a reasonably robust thing to do.
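A minimal sketch of a point-by-point median, assuming the measurements have already been interpolated onto a common x grid and collected in a 2D wave with one column per measurement. That layout, the function name and the output wave name are assumptions made for this example. `StatsMedian` does not accept NaNs, so open-circuit points stored as NaN would need to be removed first.

```igor
Function MedianAcrossColumnsSketch(data)
	Wave data	// 2D: rows = x points, columns = repeated measurements (assumed layout)

	Variable nRows = DimSize(data, 0)
	Variable nCols = DimSize(data, 1)

	// Output wave takes over the x scaling of the rows of the input
	Make/O/N=(nRows) medianWave
	SetScale/P x DimOffset(data, 0), DimDelta(data, 0), "", medianWave

	Make/FREE/N=(nCols) oneRow
	Variable i
	for (i = 0; i < nRows; i += 1)
		oneRow = data[i][p]			// all values measured at this x point
		medianWave[i] = StatsMedian(oneRow)
	endfor
End
```

The loop keeps the logic explicit; for large matrices a vectorized approach would likely be faster.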

 

Can you post the equations that you would apply to determine an arithmetic mean and a Gaussian mean? Can you also post an example of the plot or data set that you test with, along with the values you expect from the arithmetic and Gaussian means?

Hi KurtB,

On the contrary, you get it perfectly in my opinion. The median is another robust way to handle my data, probably even better depending on the case. These matters have been discussed in detail by Reus et al. (J. Phys. Chem. C 2012, 116, 11, 6714-6733: Statistical Tools for Analyzing Measurements of Charge Transport). My experiment is the transport characterization of a monolayer film, exactly as discussed in the paper. The Gaussian mean procedure might be preferable, because most of the recent papers that I have read use this method.

 

So the "Gaussian Mean" as described in the paper consists of fitting a Gaussian to a histogram. You can do this by using the Analysis->Histogram and Analysis->Curve Fitting functions.
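The same two steps can also be scripted. The following is only a sketch under assumptions: the function name, the output wave name and the choice of 10 bins are arbitrary, and `samples` is assumed to hold all values measured at one x point (for example one voltage). It uses the built-in `gauss` fit type, whose model is y0 + A*exp(-((x-x0)/width)^2), so the fitted center is the coefficient with index 2.

```igor
Function GaussianMeanFromHistogramSketch(samples)
	Wave samples	// 1D wave: all values measured at one x point (assumed input)

	WaveStats/Q samples
	Variable arithmeticMean = V_avg

	// /B=1 takes the bin range from the data and the bin count
	// from the number of points in the destination wave
	Make/O/N=10 samplesHist		// 10 bins is an arbitrary choice
	Histogram/B=1 samples, samplesHist

	// Shift the x scaling from left bin edges to bin centers
	Variable binWidth = DimDelta(samplesHist, 0)
	SetScale/P x DimOffset(samplesHist, 0) + binWidth/2, binWidth, "", samplesHist

	// Built-in Gaussian fit; coefficients are stored in W_coef = {y0, A, x0, width}
	CurveFit/Q gauss, samplesHist
	Wave fitCoefs = W_coef

	Print "Arithmetic mean:", arithmeticMean, " fitted center:", fitCoefs[2]
	return fitCoefs[2]
End
```

Changing the number of points in `samplesHist` changes the binning and is a quick way to see how strongly the fitted center depends on it.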

However, I should point out that doing this would introduce quite some variability as the histogram binning is arbitrary. As others have pointed out, if the data contains outliers, the median would be the better choice.

As for using the curve fitting method, you could eliminate the binning issue by creating a cumulative histogram of only the data points and fitting that to a Gaussian CDF, i.e. an error function. However, if my statistics knowledge doesn't fail me, I'd expect the location value of this to converge to the MLE, which is the mean of your sample, so you wouldn't gain anything. Correct me on the last part if necessary.
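A sketch of that binning-free variant, under assumptions: the function and wave names are made up, the plotting positions (p + 0.5)/n for the empirical CDF are one common convention among several, and the model is written with `StatsNormalCDF` rather than an explicit error function.

```igor
// Two-parameter normal CDF model: w[0] = location, w[1] = standard deviation
Function NormalCDFModelSketch(w, x) : FitFunc
	Wave w
	Variable x
	return StatsNormalCDF(x, w[0], w[1])
End

Function FitEmpiricalCDFSketch(samples)
	Wave samples	// 1D wave of measured values (assumed input)

	// Sorted copy of the data serves as the x wave of the empirical CDF
	Duplicate/O samples, sortedSamples
	Sort sortedSamples, sortedSamples

	Variable n = numpnts(sortedSamples)
	Make/O/D/N=(n) empiricalCDF = (p + 0.5) / n

	// Start the fit from the sample mean and standard deviation
	WaveStats/Q samples
	Make/O/D/N=2 cdfCoefs
	cdfCoefs[0] = V_avg
	cdfCoefs[1] = V_sdev

	FuncFit/Q NormalCDFModelSketch, cdfCoefs, empiricalCDF /X=sortedSamples
	return cdfCoefs[0]	// fitted location
End
```

Comparing the returned location with `V_avg` from `WaveStats` on the same data is a direct way to check whether the fit gains anything over the plain mean.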

So you want to histogram six measurements and then fit a Gaussian with three adjustable parameters?

Hi jjweimer,

I only used the usual equations for both the arithmetic average and the Gaussian fit (model). The arithmetic average is: yaverage = (y1+y2+y3+y4+y5+y6)/6. For the equation used for the Gaussian distribution fit, please see the attached figure normal_distribution. I am really sorry that I cannot post my data at the moment. In any case, it is still too ideal to demonstrate how the arithmetic and Gaussian means might give slightly different results. The paper I mentioned above (J. Phys. Chem. C 2012, 116, 11, 6714-6733) demonstrates quite nicely how it should work.

Thank you very much for the suggestion and explanation, serrano. I will keep it in mind.

Hi tony,
I believe the Gaussian equation only has 2 adjustable parameters, mean and standard deviation. Please correct me if I am wrong.

[Attached figure: normal_distribution]

Creating a histogram from just 6 data points is tricky. I don't think binning will work for that. What might work is to calculate the frequency of data points as the inverse distance between neighboring points.

Let's say we have 6 data points: (1.2), (3.1), (3.4), (4.0), (5.0), (7.1)

The frequency of data points half way between point 1 and point 2 can now be calculated as 1/(3.1 - 1.2) = 1/1.9 = 0.53

In terms of Igor code you would do:

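Make/O RawData={1.2, 3.1, 3.4, 4.0, 5.0, 7.1}
Sort RawData, RawData	// neighbor spacing only makes sense for sorted data
Make/O/N=(numpnts(RawData)-1) HistogramX, HistogramY
HistogramX=(RawData[p+1]+RawData[p])/2	// midpoint between neighboring points
HistogramY=1/(RawData[p+1]-RawData[p])	// inverse spacing as local frequency
Display HistogramY vs HistogramX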

Now you have a histogram you can fit with a Gaussian.

I still don't know if this is more accurate than simply calculating an average.

I guess calculating a simple average fails when the distribution is not symmetric but has extreme outliers in one direction because of a particular error which might occur in your measurement. In that case the average is more sensitive to the extreme asymmetric outliers than fitting the histogram is.

In reply to Andika Asyuda

Andika Asyuda wrote:

I believe the Gaussian equation only has 2 adjustable parameters, mean and standard deviation. Please correct me if I am wrong.

Sure, the PDF has 2 parameters, so you can normalise the histogram by (bin width * number of samples) and fit the PDF.

A histogram is usually employed to reduce the number of representative data points by dividing the original set into bins.  If you only have 6 points there is no advantage in using a histogram because that will reduce your number of data even further.  If you are trying to find a systematic approach to removing outliers (or their contribution) you may be able to use clustering (see documentation for FPClustering, kMeans, etc.)

 

Hi all,

I really appreciate all the feedback. It seems the number of data sets can be an issue in my case. How many data sets are suitable for a histogram-based analysis? Currently I have 9-20 data sets. I am wondering whether I need to collect more data.
