Running combinations (e.g. assigning chemical formulas to molar masses)

Hi, 

I have a list of molecular masses (accurate to four decimal places) to which I need to assign chemical formulas. Let's call this list A. I want to write a program that can do this. Basically, I will have a list of the elements CHNOS with their molar masses (list B), and my program needs to pick the combination of elements that gets closest to each molecular mass in list A.

I am wondering what the best way to do this is without a bunch of nested for loops going in circles to narrow down on the mass. Fundamentally, it is just a matter of trying out different combinations of elements to get closest to the measured mass. Is there a library in IGOR that can perform this kind of task? Or a simpler technique?

One idea I had: write something like C(x) + H(y) + N(z) + O(z1) + S(z2) = mass, where each of x, y, z, z1 and z2 is generated by a random number generator such as abs(enoise). Could that do it?
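Written out, the idea above is an integer minimization. In the sketch below, M is one measured mass from list A and m_C, m_H, m_N, m_O, m_S are the element masses from list B (this notation is introduced here for illustration). Because the counts must be non-negative integers, this is a discrete search rather than a smooth multi-parameter fit.

```latex
\min_{x,\,y,\,z,\,z_1,\,z_2 \,\in\, \mathbb{Z}_{\ge 0}}
\left| \, M - \left( x\,m_\mathrm{C} + y\,m_\mathrm{H} + z\,m_\mathrm{N} + z_1\,m_\mathrm{O} + z_2\,m_\mathrm{S} \right) \right|
```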

Thanks a ton, 

Peeyush 

I don't know of a cunning function to do this, but if it is just a few calculations then running loops is not too bad.

For example, compile the functions below and run them from the command line with:

FindBestCombo(1000)
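// Create wAtomicMasses: atomic masses of S, O, N, C and H, addressable by dimension label (e.g. wAtomicMasses[%C])
Function MakeCHNOS()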
	Make/D/O/N=5 wAtomicMasses
	SetDimLabel 0,0,S,wAtomicMasses
	SetDimLabel 0,1,O,wAtomicMasses
	SetDimLabel 0,2,N,wAtomicMasses
	SetDimLabel 0,3,C,wAtomicMasses
	SetDimLabel 0,4,H,wAtomicMasses
	wAtomicMasses[%S] = 32.066
	wAtomicMasses[%O] = 15.9994
	wAtomicMasses[%N] = 14.00674
	wAtomicMasses[%C] = 12.0107
	wAtomicMasses[%H] = 1.00794
End

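// Enumerate S, O, N and C counts for the target mass vMr, fill the remaining mass with H,
// then sort all candidates by |residual| and show them in a table (best match first)
Function FindBestCombo(vMr)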
	variable vMr // target Molecular mass
	MakeCHNOS()
	wave/D wAtomicMasses
	
	variable vMaxS, vMaxO, vMaxN, vMaxC
	
	variable vRows = 1e5 // initial number of rows; grown by this many more whenever it fills up
	
	Make/O/W/U/N=(vRows, 5) wCount
	SetDimLabel 1,0,S,wCount
	SetDimLabel 1,1,O,wCount
	SetDimLabel 1,2,N,wCount
	SetDimLabel 1,3,C,wCount
	SetDimLabel 1,4,H,wCount
	Make/O/D/N=(vRows) wResidual
	wResidual = NaN
	
	variable vRow = 0
	variable vResMassS, vResMassO, vResMassN, vResMassC
	variable vS, vO, vN, vC, vH
	
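	// Each loop runs up to and including ceil(...), i.e. one step past the largest whole count,
	// so that a measured mass slightly below the exact formula mass is still tried.
	vMaxS = ceil(vMr / wAtomicMasses[%S])
	for(vS = 0; vS <= vMaxS; vS +=  1)
		vResMassS = vMr - vS * wAtomicMasses[%S]
		vMaxO = ceil( vResMassS / wAtomicMasses[%O] )
		for(vO = 0; vO <= vMaxO; vO +=  1)
			vResMassO = vResMassS - vO * wAtomicMasses[%O]
			vMaxN = ceil( vResMassO / wAtomicMasses[%N] )
			for(vN = 0; vN <= vMaxN; vN +=  1)
				vResMassN = vResMassO - vN * wAtomicMasses[%N]
				vMaxC = ceil( vResMassN / wAtomicMasses[%C] )
				for(vC = 0; vC <= vMaxC; vC +=  1)
					vResMassC = vResMassN - vC * wAtomicMasses[%C]
					// H fills the remaining mass; clamp at 0 because the residual can now be negative
					vH = max(0, round(vResMassC / wAtomicMasses[%H]))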
					wCount[vRow][%S] = vS
					wCount[vRow][%O] = vO
					wCount[vRow][%N] = vN
					wCount[vRow][%C] = vC
					wCount[vRow][%H] = vH
					wResidual[vRow] = vResMassC - vH * wAtomicMasses[%H]
					vRow += 1
					// add a load more rows if needed
					if (vRow >= DimSize(wResidual,0))
						InsertPoints DimSize(wResidual,0), vRows, wResidual
						// use wCount's own size: wResidual has already grown by vRows at this point
						InsertPoints DimSize(wCount,0), vRows, wCount
					endif
				endfor
			endfor
		endfor
	endfor
	DeletePoints vRow,DimSize(wResidual,0)-vRow, wCount,wResidual
	Duplicate/O wResidual, wAbsRes, wIndexSort
	wAbsRes[] = abs(wResidual[p])
	MakeIndex wAbsRes, wIndexSort
	
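	// Reorder via temporary copies: assigning a wave from itself with permuted indices
	// would overwrite rows before they have been read.
	Duplicate/FREE wCount, wCountTmp
	Duplicate/FREE wResidual, wResTmp
	wCount[][] = wCountTmp[wIndexSort[p]][q]
	wResidual[] = wResTmp[wIndexSort[p]]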
	Edit wCount.ld
	AppendToTable wResidual
End
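For readers working outside Igor, here is a minimal Python sketch of the same nested-loop idea. It is an illustrative port, not a line-by-line translation: the function name find_best_combos and its top parameter are made up here, and the element masses are copied from MakeCHNOS. Each loop runs one step past the largest whole count (with the H count clamped at zero) so that a measured mass slightly below the exact formula mass can still match.

```python
from typing import List, Tuple

# Element masses copied from MakeCHNOS (an assumption carried over from the Igor example).
M_S, M_O, M_N, M_C, M_H = 32.066, 15.9994, 14.00674, 12.0107, 1.00794

# (residual, (nS, nO, nN, nC, nH))
Candidate = Tuple[float, Tuple[int, int, int, int, int]]


def find_best_combos(target: float, top: int = 10) -> List[Candidate]:
    """Brute-force S, O, N, C counts, fill the remaining mass with H,
    and return the `top` candidates with the smallest |residual|."""
    results: List[Candidate] = []
    # range(floor + 2) visits 0..floor+1: one step past the largest whole count.
    for s in range(int(target // M_S) + 2):
        rs = target - s * M_S
        for o in range(int(rs // M_O) + 2):
            ro = rs - o * M_O
            for n in range(int(ro // M_N) + 2):
                rn = ro - n * M_N
                for c in range(int(rn // M_C) + 2):
                    rc = rn - c * M_C
                    h = max(0, round(rc / M_H))  # H count cannot be negative
                    results.append((rc - h * M_H, (s, o, n, c, h)))
    results.sort(key=lambda item: abs(item[0]))  # best match first
    return results[:top]


# Example: a mass close to water (18.01528 with these element masses).
for residual, counts in find_best_combos(18.0153, top=3):
    print(counts, round(residual, 5))
```

Sorting the full candidate list mirrors the MakeIndex step in the Igor code; for many target masses it would be cheaper to keep only candidates within a tolerance.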

EDIT: Deleted my nonsense comment about isotopes.


The problem seems to me prone to fail either by finding nonsensical local minima (e.g. CH32 instead of CS) or by taking an inordinate amount of time searching what amounts to an N^5 grid (up to N atoms of each of the five elements).
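To put a number on the size of that search, here is a small Python sketch (the function name count_heavy_tuples is made up for illustration) that counts how many S, O, N and C count combinations have a total mass no larger than a target mass M. Because the Igor loops compute the H count from the remaining mass instead of searching over it, this is roughly the number of innermost iterations; for large M it grows roughly as M^4.

```python
from typing import Sequence

# Element masses copied from MakeCHNOS; H is left out on purpose because
# the Igor loops compute the H count instead of searching over it.
HEAVY_MASSES = (32.066, 15.9994, 14.00674, 12.0107)  # S, O, N, C


def count_heavy_tuples(target: float, masses: Sequence[float] = HEAVY_MASSES) -> int:
    """Count non-negative integer tuples (nS, nO, nN, nC) whose total mass is <= target."""

    def count(i: int, remaining: float) -> int:
        if i == len(masses):
            return 1  # all heavy-atom counts fixed: one candidate (H fills the rest)
        total = 0
        n = 0
        while n * masses[i] <= remaining:
            total += count(i + 1, remaining - n * masses[i])
            n += 1
        return total

    return count(0, target)


for m in (100.0, 200.0, 400.0):
    print(m, count_heavy_tuples(m))
```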

Have you considered generating the set of possible combinations and their masses in advance, sorting them by mass, and then doing a simple find-level lookup?

M      value
1      H
2      H2
12     C
13     CH
14     CH2
...

You can always keep "improving upon" (expanding) the M (numeric molar mass) and value (text formula) columns in a spreadsheet, import that sheet as waves, and work with them in Igor Pro. I imagine this is how most library searches work: not by searching over a space with a random walk (enoise) in a multi-parameter function fit, but by searching a manually pre-built library.
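The pre-built library idea can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a finished tool: the function names build_library, formula_text and lookup are hypothetical, the per-element count limits are arbitrary, and the element masses are copied from the Igor example. The library is the M/value table above generated programmatically, sorted by mass; each lookup is then a binary search (Python's bisect) instead of a new combinatorial search. In Igor itself, the BinarySearch function on a sorted M wave should play a similar role; check its documentation for how it reports values outside the wave's range.

```python
import bisect
from itertools import product
from typing import List, Tuple

# Element masses copied from the Igor example (an assumption, not a recommendation).
MASSES = {"C": 12.0107, "H": 1.00794, "N": 14.00674, "O": 15.9994, "S": 32.066}
ORDER = ("C", "H", "N", "O", "S")  # Hill-style order: C, H, then the rest alphabetically


def formula_text(counts: Tuple[int, ...]) -> str:
    """Render element counts as text, e.g. (1, 4, 0, 0, 0) -> 'CH4'."""
    parts = []
    for element, n in zip(ORDER, counts):
        if n == 1:
            parts.append(element)          # a count of 1 is implicit
        elif n > 1:
            parts.append(f"{element}{n}")  # a count of 0 is omitted
    return "".join(parts)


def build_library(max_counts: Tuple[int, ...]) -> Tuple[List[float], List[str]]:
    """Enumerate every formula up to max_counts; return (M, value) columns sorted by M."""
    rows = []
    for counts in product(*(range(m + 1) for m in max_counts)):
        if any(counts):  # skip the empty formula
            mass = sum(n * MASSES[el] for el, n in zip(ORDER, counts))
            rows.append((mass, formula_text(counts)))
    rows.sort()
    return [m for m, _ in rows], [f for _, f in rows]


def lookup(masses: List[float], formulas: List[str], target: float) -> Tuple[str, float]:
    """Binary-search the sorted library; return the nearest formula and its mass error."""
    i = bisect.bisect_left(masses, target)
    # The nearest entry is either just below or at the insertion point.
    best = min((j for j in (i - 1, i) if 0 <= j < len(masses)),
               key=lambda j: abs(masses[j] - target))
    return formulas[best], masses[best] - target


masses, formulas = build_library((10, 22, 4, 6, 2))  # up to C10 H22 N4 O6 S2 (arbitrary limits)
print(lookup(masses, formulas, 18.0153))
```

Building the library once and reusing it for every mass in list A is what makes this cheaper than repeating the nested loops per mass.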

Thank you so much for these great suggestions. I'll try them out and see what works for me. Really appreciate the help!