Fitting a Gaussian to one peak in a mass spectrum and calculating area


Need advice on a situation. I have mass spectra with many peaks. I am writing a code that has to do the following:

Step 1: Create a graph zoomed onto an individual peak that is centered at a specified x-axis value (mz value). Zoomed meaning that the x-axis of the graph is limited to showing the full peak centered at the specified mz. 

Step 2: Fit a Gaussian curve specifically to this peak. In other words, fitting a Gaussian to only what is shown in this x-axis window ignoring the rest of the spectrum. 

Step 3: Calculate the area under the Gaussian curve. 

I can do step 2 and step 3 by manually adding cursors on both sides of the peak and then fitting a Gaussian curve but I need to automate this via code and hence I am struggling a little bit with step 1. I would be grateful for some advice on how to do Step 1. 

Thanks a lot, 


Hmm... You did not mention which of these steps should be automated. How should step 1 happen? If manually, then you can just drag a marquee and zoom in manually, you know. Then, step 2 and 3 could also be done relatively easy with my Super Quick Fit package:

In the tool, set 'Fit Within Axis Range' in the settings, so that you do not need to place any cursors. The output of a Gaussian fit includes the area as well if you use the package.

If you want to automate things, it would be first good to know how step 1 should play out. Should the code find each peak automatically? => Maybe have a look at the peak finder from the Multipeak Fit (MPF) package. This package would also be a good way to completely automate the whole thing, i.e., run the peak finder, then fit the whole spectrum, then extract the areas. This of course depends on your spectrum, e.g., whether the regions between peaks are well defined.

If you want that the whole thing happens when, for example, an user clicks manually at the general position of the peak, then you can again, use the peak finder to single out the region where the found peak is closest to the user click and then run step 2 and 3 after that. Just use the start and end points of the limited region as fit range then.

Hi chozo, 

Thanks a lot for your thoughts. Indeed, I need to automate this whole process from Step 1-3. Basically, I am creating a panel where I will enter an m/z value and with a click of a button, I want the program to locate the peak centered (or nearest to this m/z value) in the mass spectra, fit a gaussian to it and calculate peak area under the gaussian curve. Then I also want the code to calculate peak resolution by dividing the maximum value of the peak by peak width at half the maximum value. 

Hence, for this purpose, I don't need the program to identify every peak in the spectra but rather just the peak nearest to the user specified m/z. 

I am going to be loading multiple HDF5 files into the program and asking it to perform this operation in an automated fashion. So as little manual intervention as possible. 



I guess then I would start either using the peak finder or some variation of FindPeak to first get the peak in question near the requested m/z. You can then restrict the fit range to an appropriate distance from the peak center (depending on how far apart your peaks are usually) and then automatically follow with a peak fit using a Gaussian. So, there are many ways to do step one, depending on how complicated your spectra are, but I think the peak finder might be a robust start, being able to find also overlapping peaks.

Fitting a peak to a subrange of your data is relatively simple, but it all depends on what your data looks like. For a mass spectrum you could have peaks at every mass, so you may want to only fit a +/- 0.5 mass range around the selected mass. To only show that range in your graph use SetAxis. You will probably also want to seriously constrain the peak width to reasonable values. If neighboring peaks begin to overlap things get tricky, so it really all depends on what your data looks like.

Are the data clean? That is, are there other peaks overlapping? If so, you need to fit them also so that you get the area for only the one peak. Is there a lot of noise? That can make automated peak finding difficult. Is there a baseline that's not just a Y offset? You may need to either subtract a baseline or fit it along with the peak.

Thank you all very much for your thoughts and ideas! I really appreciate it! I was able to build with these ideas.