Grep match issue

Hello, 

I am using Grep to match some chemical formulas between two 1D text waves, and align 2 corresponding numeric waves. Kindly see the attached table (TablePK) for the text waves. I am noticing an interesting behavior. Grep matches a lot of formulas but skips some of them somehow. For example, it won't match CO even though CO exists in both waves. 

My syntax: 

for(i=0;i<dimsize(text2match,0);i+=1)
    Grep/INDX/Q/E = text2match[i] referencetextwave
    wave W_Index
    MatchedData[W_Index[0]] = data2match[i]
endfor

The MatchedData wave appears with all NaN values in the row corresponding to "CO" in the text wave. Same at a few other places as well. I am not sure why this happens. Please advise.

Sincerely, 

Peeyush 

TablePK.pxp (27.45 KB)

If you use the debugger, you will find that for the input 'CO', Grep matches 'CO2plus2' rather than 'CO' itself. I am not entirely sure why. But since you are only using the first match, and it seems you want an exact match, why not use FindValue instead? Here is a start:

function compareTextWaves(wave/T matchThis, wave/T withThis)
    int i
    for(i=0;i<dimsize(matchThis,0);i+=1)
        FindValue/TEXT=(matchThis[i])/TXOP=4 withThis
        if (V_value > -1 && V_value < numpnts(withThis))
            print i, withThis[V_value]
        endif
    endfor
end

Sorry if I misunderstood the question. Could you give a more complete picture of the task you want to do, including the MatchedData and data2match waves?
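
A possible extension of the function above, as a minimal sketch only: it writes the aligned values instead of printing them. The function name alignWithFindValue and the parameter list are assumptions, and it presumes MatchedData already exists with the same number of points as referencetextwave. Note that /TXOP=4 matches whole wave elements but, as far as I understand the flag's bit settings, is not case sensitive, so "CO" could also match "Co". Adding the case-sensitive bit (/TXOP=5) should avoid that; please check this against the FindValue documentation.

```igor
// Hypothetical helper: align data2match onto the rows of referencetextwave.
function alignWithFindValue(wave/T text2match, wave/T referencetextwave, wave data2match, wave MatchedData)
    int i
    for(i=0;i<dimsize(text2match,0);i+=1)
        // TXOP=5: whole-element match (4) plus case sensitivity (1); assumed bit meanings
        FindValue/TEXT=(text2match[i])/TXOP=5 referencetextwave
        if (V_value >= 0)    // V_value is -1 when nothing was found
            MatchedData[V_value] = data2match[i]
        endif
    endfor
end
```

Rows of MatchedData with no counterpart in text2match are left untouched, so pre-filling the wave with NaN makes unmatched rows easy to spot.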

Hi chozo, 

Thanks a lot for the advice! Sure, I'll switch to FindValue/Text to accomplish the task. Thanks also for the code! 

Meanwhile, since I am indeed looking for exact matches, I am still a little perplexed as to why Grep doesn't match CO in one text wave to the CO in the other text wave. I think the situation is the same in a few other cases where the formula string contains only two letters.

It would be good to have some thoughts on this as well, since I use Grep a lot and sometimes the waves are big enough that such an error can evade detection.

Sincerely, 

Peeyush 

Grep should work as well. Maybe you can try to restrict to the full expression and see how this goes:

Grep/INDX/Q/E=("^"+text2match[i]+"$") referencetextwave
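
For illustration, here is a minimal sketch of the original loop using the anchored pattern. The function name alignByExactMatch and the parameter list are assumptions, not part of the original code. It relies on Grep setting V_value to the number of matching elements, and it skips the assignment when nothing matched so that a stale W_Index from an earlier iteration is never used.

```igor
// Hypothetical wrapper around the original loop, with an exact-match pattern.
function alignByExactMatch(wave/T text2match, wave/T referencetextwave, wave data2match, wave MatchedData)
    int i
    for(i=0;i<dimsize(text2match,0);i+=1)
        // "^" and "$" anchor the pattern, so "CO" can no longer match "CO2plus2"
        Grep/INDX/Q/E=("^"+text2match[i]+"$") referencetextwave
        if (V_value < 1)
            continue    // no exact match for this formula
        endif
        wave W_Index
        MatchedData[W_Index[0]] = data2match[i]
    endfor
end
```

The anchors only guarantee an exact match as long as the formulas contain no regular-expression metacharacters (such as "+", "(" or "."); names like "CO2plus2" are safe in that respect.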

As chozo wrote, Grep is matching all wave points that contain CO. You are saving only the first of those matches.

Here is a useful function that tells you how many points in a textwave w match a regular expression:

function IsInWave(wave/T w, string RegEx)
    Grep/E=RegEx/Q w
    return v_value
end

print IsInWave(SpeciesMassText_Acet, "^CO$")
  1
print IsInWave(SpeciesMassText_Acet, "CO")
  6

You can then use this function, for instance, to extract matches:

make/T toMatch = {"C", "CO", "N", "foo"}
extract SpeciesMassText_Acet, matches, IsInWave(toMatch, "^"+SpeciesMassText_Acet+"$")
print matches
  matches[0]= {"C","N","CO"}

Using the regex with Grep did the trick! 

Thanks a lot for the advice and all the very helpful info, chozo and tony. I highly appreciate it. :) 

Sincerely, 

Peeyush

Good.

In case it wasn't clear, you can avoid loops completely if you use extract (with the /INDX flag if you need index values) together with the IsInWave function to find matches in the text2match (not data2match) wave.

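// extract the indices of points in referencetextwave that exactly match one of the points in text2match
// (a text wave is needed here; data2match is numeric in the original post, so it cannot be used in the pattern)
extract/INDX referencetextwave, matches, IsInWave(text2match, "^"+referencetextwave+"$")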

Great, thanks for the advice, Tony. I hadn't used "IsInWave" until now, but this appears to be very useful. I'll stitch it into my code moving forward.

Sincerely,

Peeyush 

IsInWave is not a built-in function; it's the little wrapper for Grep that I wrote in my post above.

Right, right. Sorry, I meant that I hadn't used something akin to your "IsInWave" function until now, but this is pretty useful. I'll include a function to this effect moving forward, especially since I really enjoy avoiding loops wherever I can.