# Need help on structuring programming problem...

Dear all,

I am trying to implement a program that generates synthetic data for testing purposes.
In general, I have a certain value X that I want to emulate with a system that has only a predefined set of possible values (call them A, B, C and D).
So A, B, C or D can be chosen to emulate X, or to come as close to X as the restrictions of the system allow.

For creating synthetic data, I now want to program a small macro that "decides" on the outcome of a simulated experiment using the probabilities extracted from this data.

For example, I have the following data:

Value   counts/1000
System 1:
X       12.03
System 2:
A       11.91
B       11.91
C       10.54
D       15.23

I now would have to formulate this input into probabilities, like this:

- A is very close to X, so it should get the highest probability to be chosen to stand for X in the new system
- B is exactly as close to X as A, so A and B should have the same probability to be chosen as best suited replacement for X
- C and D deviate from X to different extents, so this should be reflected in their probabilities relative to those of A and B
- probabilities of A,B,C,D should sum to 1

How would you implement this? I'm not asking for the exact code, just the general idea of how to solve this problem.

Really would appreciate your input here, thanks a lot in advance for any help!

Regards,
Peter
Interesting. I think you have a matrix algebra problem here.

* Take fc_i = abs(T - S_i)/T as the metric of "closeness", where T is the target and S_i is the signal
* Eliminate duplicate values of fc_i
* Generate the equation sum ac_i * fc_i = 1, which in matrix notation becomes Ac * Fc = I
* Solve for the coefficient terms in Ac

For example, the first Fc term in your set is fc_A = abs(12.03 - 11.91)/12.03, which is about 0.01.
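
A minimal sketch of this closeness metric, written in Python rather than Igor and using the values from the original post. The function name `fractional_closeness` is hypothetical, and the sketch covers only the first bullet (the metric itself), not the matrix step.

```python
def fractional_closeness(target, signals):
    """Return abs(T - S_i) / T for each named signal S_i."""
    return {name: abs(target - value) / target for name, value in signals.items()}


# Values from the example: X is the target, A..D are the available signals
signals = {"A": 11.91, "B": 11.91, "C": 10.54, "D": 15.23}
fc = fractional_closeness(12.03, signals)

for name, value in fc.items():
    print(f"fc_{name} = {value:.4f}")
```

A smaller fc_i means a closer match, so A and B (identical values) get identical, and the smallest, fc values.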

--
J. J. Weimer
Chemistry / Chemical & Materials Engineering, UAHuntsville

Thanks a lot for the quick reply :)
Sounds promising, will try to implement this.

But would elimination of duplicates not result in neglecting A (or B) because they have the same closeness?

Is there a command in IGOR to solve for the coefficients? (I am not very fond of matrix algebra ^^)
What would the matrix look like in the case of the example?

Another big question mark for me at the moment: assuming I have calculated the probabilities correctly, how do I decide which of the replacements is chosen depending on their probability?
I know there has to be a randomization step somewhere in the code, but at the moment I cannot figure out how exactly it should be done.

Regards,
Peter

If your probabilities are something like 0.4, 0.4, 0.15, 0.05 (which add up to 1) you might have something like:

    Variable rn = enoise(0.5) + 0.5    // make a "random" number between 0 and 1.0
    if (rn < 0.4)
        (choose A)
    elseif (rn < 0.8)
        (choose B)
    elseif (rn < 0.95)
        (choose C)
    else
        (choose D)
    endif

John Weeks
WaveMetrics, Inc.
support@wavemetrics.com
Thanks for the input! :)

Using your proposed structure, I have meanwhile managed to calculate the "closeness" as proposed by jjweimer.

For X=12.03 they are (same example as above):
A = 73.39 %
B = 87.61 %
C = 99.00 %
D = 99.00 %

The appropriate probabilities I calculated so far are:

A = 0.204445
B = 0.244038
C = 0.275758
D = 0.275758

The question I have now is: how do I deal with values that have exactly the same probability? Also, the probabilities sum to 1, but they are so close to each other that I have the feeling that choosing a random number between 0 and 1 would not make the correct decision...

Any ideas?


Notice that in John's example the if...elseif... chain contains cumulative probabilities. Hence a random number between 0 and 1 can be used to provide a sensible bias in the decision.
PeterR wrote:

The appropriate probabilities I calculated so far are:

A = 0.204445
B = 0.244038
C = 0.275758
D = 0.275758

Methinks something is not right here - you wanted A and B to have the highest probabilities.

I would suggest something like the following:
1. Calculate the absolute difference of each value from the target X:
    D_A = abs(A - X), and similarly for B, C and D
2. Sum these:
    Sum = D_A + D_B + D_C + D_D
3. Calculate a probability based on these differences:
    P_A = (1 - D_A / Sum) / (N - 1), and similarly for B, C and D
    where N = number of values (4 in this case)
4. Construct a decision function along the lines of:
    if (rn < P_A)
        (choose A)
    elseif (rn < P_A + P_B)
        (choose B)
    elseif (rn < P_A + P_B + P_C)
        (choose C)
    else
        (choose D)
    endif

For the values you provided, the probabilities are (approximately):
P_A = 0.325
P_B = 0.325
P_C = 0.233
P_D = 0.117
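
Steps 1 to 3 can be checked with a short sketch, here in Python rather than Igor. The function name `difference_probabilities` is hypothetical, and the equal-weights fallback for the degenerate cases (a single value, or all differences zero) is an assumption that is not part of the recipe above.

```python
def difference_probabilities(target, values):
    """P_i = (1 - D_i / sum(D)) / (N - 1), with D_i = abs(value_i - target)."""
    diffs = {name: abs(v - target) for name, v in values.items()}
    total = sum(diffs.values())
    n = len(values)
    if n == 1 or total == 0:
        # Assumption: the formula is undefined here, so fall back to equal weights
        return {name: 1.0 / n for name in values}
    return {name: (1 - d / total) / (n - 1) for name, d in diffs.items()}


values = {"A": 11.91, "B": 11.91, "C": 10.54, "D": 15.23}
probs = difference_probabilities(12.03, values)

for name, p in probs.items():
    print(f"P_{name} = {p:.3f}")
# prints P_A = 0.325, P_B = 0.325, P_C = 0.233, P_D = 0.117
```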

HTH,

Kurt

PeterR wrote:

But I still wonder whether A and B are really treated equal in this decision making if-construct...?

Hi Peter,

Perhaps thinking of it like this will help:

The random number rn lies between 0 and 1 and has a uniform distribution. This means that, for example, the probability of 0.2 <= rn < 0.3 is the same as the probability of 0.5 <= rn < 0.6, which is the same as the probability of 0.9 <= rn < 1.0, and so on.

The if...elseif... construct is basically saying
    if 0.0 <= rn < 0.325 then do 'A'
    if 0.325 <= rn < 0.650 then do 'B'
    (and so on for C and D).
In other words, the range of values of rn that gives rise to 'A' is as wide as the range of values that gives rise to 'B'. Since rn is equally likely to fall anywhere within the 0 to 1 range, the outcomes 'A' and 'B' must have the same probability.
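
One way to convince yourself is to draw many random numbers and count the outcomes. The following is a sketch in Python rather than Igor; the helper name `choose`, the seed, and the number of draws are arbitrary choices for illustration.

```python
import random


def choose(rn, thresholds):
    """Return the first label whose cumulative threshold exceeds rn."""
    for label, upper in thresholds:
        if rn < upper:
            return label
    return thresholds[-1][0]  # guard in case rn is not below the last threshold


# Cumulative thresholds built from P_A = P_B = 0.325, P_C = 0.233, P_D = 0.117
thresholds = [("A", 0.325), ("B", 0.650), ("C", 0.883), ("D", 1.000)]

rng = random.Random(1)  # fixed seed so the run is repeatable
draws = 100_000
counts = {label: 0 for label, _ in thresholds}
for _ in range(draws):
    counts[choose(rng.random(), thresholds)] += 1

for label, count in counts.items():
    print(label, count / draws)
```

The observed fractions for 'A' and 'B' should come out close to each other and close to 0.325, even though 'A' is tested first in the chain.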

HTH,
Kurt
OK, that was really descriptive, I think now I can imagine what's going on ;)

I meanwhile calculated the correct probabilities, put them in a wave and sorted them with `Sort`.

    Variable random = enoise(0.5) + 0.5    // make a "random" number between 0 and 1.0

    String Substitution

    Variable j
    for (j = 0; j < numpnts(Pool); j += 1)
        if (random < Probabilities_Sorted[j])
            Substitution = Pool_Sorted[j]
            break
        elseif (random < (Probabilities_Sorted[j] + Probabilities_Sorted[j+1]))
            Substitution = Pool_Sorted[j+1]
            break
        elseif (random < (Probabilities_Sorted[j] + Probabilities_Sorted[j+1] + Probabilities_Sorted[j+2]))
            Substitution = Pool_Sorted[j+2]
            break
        else
            Substitution = Pool_Sorted[j+3]
            break    // stop here too; a second pass (j = 1) would read beyond the 4 entries
        endif
    endfor

The "pool" of possible values is made up of 4 values in this case.
To make the construct work for a pool of any size, I think a do-loop is needed...

Thanks for all your help, was a pleasure!
OK I think now I got it:

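    Variable j
    for (j = 0; j < numpnts(Pool); j += 1)
        // cumulative probability up to and including entry j
        // (sum() takes X values; this assumes default wave scaling, where X equals the point number)
        if (random < sum(Probabilities_Sorted, 0, j))
            Substitution = Pool_Sorted[j]
            break
        endif
    endfor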
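
The same idea (compare the random number against a running cumulative sum, for a pool of any size) can be sketched in Python rather than Igor. The function name `pick` is hypothetical, and the final clamp is an added safeguard for the case where rounding leaves the total slightly below the random number.

```python
import bisect
import itertools
import random


def pick(pool, probabilities, rn):
    """Pick the pool entry whose cumulative-probability interval contains rn."""
    cumulative = list(itertools.accumulate(probabilities))
    # Index of the first cumulative value that is strictly greater than rn
    index = bisect.bisect_right(cumulative, rn)
    return pool[min(index, len(pool) - 1)]  # clamp in case rn >= the final total


pool = ["A", "B", "C", "D"]
probabilities = [0.325, 0.325, 0.233, 0.117]
substitution = pick(pool, probabilities, random.random())
print(substitution)
```

Nothing in this sketch depends on the pool having four entries, and the probabilities do not need to be sorted as long as they stay aligned with the pool.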

Thanks again for all the help!

Best regards,
Peter
During testing I have meanwhile discovered a rather annoying behavior of the procedure: when a value of system 2 is exactly the same as the value of system 1, the absolute difference between the two becomes 0 and the probability for that value is calculated to be 100 %, even when there are other values that are, e.g., 80 % similar...

How can I circumvent this unwanted anomaly?

I may be missing something here, but I can't see the code where you calculate the probabilities.
I have re-checked the algorithm I presented previously and, changing the target to X=11.91 (i.e. the same as A and B), I get the following probabilities:
P_A = 0.3333
P_B = 0.3333
P_C = 0.2360
P_D = 0.0974
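
These numbers can also be checked by hand. Writing D_i for the absolute differences, S for their sum and N for the number of values (S and the index notation are introduced here only for the check), the formula presented previously gives:

```latex
\[
P_i = \frac{1 - D_i / S}{N - 1}, \qquad S = \sum_{k=1}^{N} D_k
\]
\[
\sum_{i=1}^{N} P_i = \frac{N - \frac{1}{S}\sum_{i=1}^{N} D_i}{N - 1} = \frac{N - 1}{N - 1} = 1,
\qquad
D_i = 0 \;\Rightarrow\; P_i = \frac{1}{N - 1}
\]
```

So with N = 4, an exact match receives 1/3 (about 0.3333) rather than 100 %, provided at least one other value differs from the target so that S is not zero.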

The question of whether this method for calculating the probabilities is appropriate for your needs is one I cannot answer.

HTH,
Kurt