Replacestring for a text wave

Is there an efficient way to replace string entries in a text wave? For instance, I have a text wave like the one below, in which I want to replace all the "PSI " entries with "Psi".
make /n=10 /t TxtWave
TxtWave[0,4]="Psi"
TxtWave[5,9]="PSI "
//In the form of replacestring
Replace4txtWave("PSI ",TxtWave,"Psi")
function Replace4txtWave(replaceThisStr, TxtWave, withThisStr)
    string replaceThisStr
    wave /t TxtWave
    string withThisStr
    variable i
    for(i=0;i<numpnts(TxtWave);i+=1)
        if(cmpstr(replaceThisStr,TxtWave[i])==0)
            TxtWave[i]=withThisStr
        endif
    endfor
end

Some of my text waves are long and my Replace4txtWave function is slowing my code down quite a bit.

The following command works for your example. I haven't tested whether it executes faster, but it avoids your explicit loop.
TxtWave = SelectString(cmpstr(TxtWave[p], "PSI ") == 0, TxtWave[p], "Psi")

Note that you can't use the  <expression> ? <TRUE> : <FALSE> construction for strings.
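Since the argument order of SelectString is easy to get backwards, here is a minimal sketch of how it chooses between its two strings. DemoSelectString is a hypothetical name used only for illustration.

```igor
Function DemoSelectString()
    // SelectString(number, str1, str2) returns str1 when number is zero
    // and str2 when number is nonzero.
    Print SelectString(0, "number was zero", "number was nonzero")
    Print SelectString(1, "number was zero", "number was nonzero")

    // cmpstr returns 0 when the strings are equal, so (cmpstr(...) == 0)
    // evaluates to 1 on a match and selects the second string.
    String element = "PSI "
    Print SelectString(cmpstr(element, "PSI ") == 0, element, "Psi")
End
```

In a replacement expression, the unchanged element therefore goes first and the replacement string second.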
I haven't tested this but, as a first pass at improving the performance, you could try replacing the for loop with the following:

TxtWave[] = cmpstr(replaceThisStr, TxtWave[p]) == 0 ? withThisStr : TxtWave[p]
You should move the numpnts(TxtWave) call out of the loop, as it is evaluated on every loop iteration.
What are your typical text wave sizes, and do you know the ratio of matching to non-matching entries?
I would use FindValue in a do-while loop. Use the V_Value output of the FindValue operation to specify where to start searching again. This operation has the advantage of offering options such as case-sensitive matching.
With this version replaceThisStr does not have to be a whole wave element (although you can make it do that by changing the TXOP flag). It's efficient because FindValue does all the searching outside of a loop.

function Replace4txtWave(replaceThisStr, TxtWave, withThisStr)
    string replaceThisStr
    wave /t TxtWave
    string withThisStr
    variable V_Value = 0
    do
        //Look at documentation for what TXOP does.
        FindValue/S=V_Value /Text=replaceThisStr/TXOP=1/z TxtWave
        if (V_Value == -1)
            break
        endif
        TxtWave[V_Value] = replacestring(replaceThisStr, TxtWave[V_Value], withThisStr)    
    while(1)
end
Thanks a bunch; I appreciate everyone's suggestions! The text waves that I'm operating on are typically 200,000+ points, which inherently makes everything slow. I timed every one of the suggested alternatives. However, each one (including my original for-loop based Replace4txtWave()) took the same amount of time (on a PC). I suspect that I will have to look into an alternative to having such a large text wave (although I'm stuck with several non-text waves of similar size).


@mhuber: Can you post the benchmarking code? I'm really surprised that all solutions take the same amount of time.
Maybe it is faster to recreate the text wave with the changed entries.
The "programmer notes" in
displayhelptopic "text waves"
indicate a bottleneck in the memory management for text waves (which is hard/expensive to avoid).
HJ
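As a rough, untested sketch of the "recreate the text wave" idea, the function below builds the result in a new free text wave instead of modifying the original in place. ReplaceIntoNewWave is a hypothetical name, the sketch assumes an Igor version that supports free waves and Function/WAVE, and whether it is actually any faster would have to be measured.

```igor
Function/WAVE ReplaceIntoNewWave(replaceThisStr, txtWave, withThisStr)
    String replaceThisStr
    WAVE/T txtWave
    String withThisStr

    // Create a fresh free text wave with the same number of points.
    Make/FREE/T/N=(numpnts(txtWave)) result

    // Fill it in a single wave assignment: keep the element unless it
    // equals replaceThisStr, in which case store withThisStr instead.
    result = SelectString(cmpstr(txtWave[p], replaceThisStr) == 0, txtWave[p], withThisStr)

    return result
End
```

A caller would pick up the result with something like WAVE/T newWave = ReplaceIntoNewWave("PSI ", TxtWave, "Psi").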
Here is some code I used to test the different approaches.

Function test(waveSize)
    Variable waveSize
   
    String targetString = "PSI"
    String replacementString = "Psi"
   
    Make/N=(waveSize)/T/FREE/O originalTextWave
    originalTextWave[0,(waveSize/2) - 1] = replacementString
    originalTextWave[(waveSize/2), *] = targetString

       
    Variable method
    For (method = 0; method < 4; method +=1)
        Duplicate/O/FREE originalTextWave, testTextWave
       
        Variable start = StopMSTimer(-2)
        Switch (method)
            case 0:     // Original method
                Replace4txtWave(targetString, testTextWave, replacementString)
                break;
               
            case 1:     // SelectString method
                Replace4txtWave1(targetString, testTextWave, replacementString)
                break;
               
            case 2:     // FindValue method
                Replace4txtWave2(targetString, testTextWave, replacementString)
                break;
       
            case 3:     // MultiThread text wave assignment method
#if IgorVersion() >= 7 
                Replace4txtWave3(targetString, testTextWave, replacementString)
#else
                print "Method 3 requires Igor 7."
#endif
                break;
               
        EndSwitch
        printf "Execution took %g ms using method %d.\r", (StopMSTimer(-2) - start)/1e3, method
    EndFor
End

function Replace4txtWave(replaceThisStr, TxtWave, withThisStr)
    string replaceThisStr
    wave /t TxtWave
    string withThisStr
    variable i
    Variable waveNumPnts = numpnts(TxtWave)
    for(i=0;i<waveNumPnts;i+=1)
        if(cmpstr(replaceThisStr,TxtWave[i])==0)
            TxtWave[i]=withThisStr
        endif
    endfor
end

function Replace4txtWave1(replaceThisStr, TxtWave, withThisStr)
    string replaceThisStr
    wave /t TxtWave
    string withThisStr
    TxtWave = SelectString( cmpstr(TxtWave[p],replaceThisStr)==0, TxtWave[p], withThisStr)
end

function Replace4txtWave2(replaceThisStr, TxtWave, withThisStr)
    string replaceThisStr
    wave /t TxtWave
    string withThisStr
    variable startPoint = 0
    Variable waveNumPnts = numpnts(TxtWave)
    do
        FindValue/S=(startPoint)/TEXT=replaceThisStr/TXOP=4 TxtWave
        if (V_value == -1)  // Value not found
            break
        endif
       
        TxtWave[V_value]=withThisStr
        startPoint = V_value + 1
    while (startPoint < waveNumPnts)
end

#if IgorVersion() >= 7
function Replace4txtWave3(replaceThisStr, TxtWave, withThisStr)
    string replaceThisStr
    wave /t TxtWave
    string withThisStr
   
    // MultiThread with a text wave requires Igor 7 built 21Feb2015 or later.
    MultiThread TxtWave = Replace4txtWave3_worker(TxtWave[p], replaceThisStr, withThisStr)
end

ThreadSafe Function/S Replace4txtWave3_worker(actualString, replaceThisStr, withThisStr)
    String actualString
    string replaceThisStr
    string withThisStr
   
    if(cmpstr(replaceThisStr,actualString)==0)
        return withThisStr
    endif
    return actualString     // No replacement necessary
End
#endif          // IgorVersion() >= 7


Here are the results:
Igor 6
test(200000)
Execution took 341.875 ms using method 0.
Execution took 383.267 ms using method 1.
Execution took 330.678 ms using method 2.
Method 3 requires Igor 7.
Execution took 15.8403 ms using method 3.

Igor 7
test(200000)
Execution took 108.052 ms using method 0.
Execution took 220.391 ms using method 1.
Execution took 328.766 ms using method 2.
Execution took 88.5857 ms using method 3.

If any Igor 7 preview testers want to reproduce these results, you must use the latest build dated 21Feb2015 or later in order to use method 3.

For even larger waves, method 3 performs even better relative to the other methods.

One note--If you're using Igor 6, the following three lines take longer to execute than the rest of the code:
Make/N=(waveSize)/T/FREE/O originalTextWave
originalTextWave[0,(waveSize/2) - 1] = replacementString
originalTextWave[(waveSize/2), *] = targetString


Igor 7 has an optimization for text waves that apparently makes the creation of the text wave much faster.
I came up with a few additional methods that could be used for this task, and one of them is substantially faster than the previous methods.

Here is the test code:
Function test(waveSize)
    Variable waveSize
 
    String targetString = "PSI"
    String replacementString = "Psi"
 
    Make/N=(waveSize)/T/FREE/O originalTextWave
    originalTextWave[0,(waveSize/2) - 1] = replacementString
    originalTextWave[(waveSize/2), *] = targetString
 
 
    Variable method
    For (method = 0; method < 6; method +=1)
        Duplicate/O/FREE originalTextWave, testTextWave
       
//      print "before:", testTextWave
 
        Variable start = StopMSTimer(-2)
        Switch (method)
            case 0:     // Original method
                Replace4txtWave(targetString, testTextWave, replacementString)
                break;
 
            case 1:     // SelectString method
                Replace4txtWave1(targetString, testTextWave, replacementString)
                break;
 
            case 2:     // FindValue method
                Replace4txtWave2(targetString, testTextWave, replacementString)
                break;
 
            case 3:     // MultiThread text wave assignment method
#if IgorVersion() >= 7 
                Replace4txtWave3(targetString, testTextWave, replacementString)
#else
                print "Method 3 requires Igor 7."
#endif
                break;
               
            case 4:     // Extract method
                Replace4txtWave4(targetString, testTextWave, replacementString)
                break;
               
            case 5:     // Grep method
                Replace4txtWave5(targetString, testTextWave, replacementString)
                break;
 
        EndSwitch
//      print "after:", testTextWave
       
        printf "Execution took %g ms using method %d.\r", (StopMSTimer(-2) - start)/1e3, method
       
    EndFor
End
 
function Replace4txtWave(replaceThisStr, TxtWave, withThisStr)
    string replaceThisStr
    wave /t TxtWave
    string withThisStr
    variable i
    Variable waveNumPnts = numpnts(TxtWave)
    for(i=0;i<waveNumPnts;i+=1)
        if(cmpstr(replaceThisStr,TxtWave[i])==0)
            TxtWave[i]=withThisStr
        endif
    endfor
end
 
function Replace4txtWave1(replaceThisStr, TxtWave, withThisStr)
    string replaceThisStr
    wave /t TxtWave
    string withThisStr
    TxtWave = SelectString( cmpstr(TxtWave[p],replaceThisStr)==0, TxtWave[p], withThisStr)
end
 
function Replace4txtWave2(replaceThisStr, TxtWave, withThisStr)
    string replaceThisStr
    wave /t TxtWave
    string withThisStr
    variable startPoint = 0
    Variable waveNumPnts = numpnts(TxtWave)
    do
        FindValue/S=(startPoint)/TEXT=replaceThisStr/TXOP=4 TxtWave
        if (V_value == -1)  // Value not found
            break
        endif
 
        TxtWave[V_value]=withThisStr
        startPoint = V_value + 1
    while (startPoint < waveNumPnts)
end
 
#if IgorVersion() >= 7
function Replace4txtWave3(replaceThisStr, TxtWave, withThisStr)
    string replaceThisStr
    wave /t TxtWave
    string withThisStr
 
    // MultiThread with a text wave requires Igor 7 built 21Feb2015 or later.
    MultiThread TxtWave = Replace4txtWave3_worker(TxtWave[p], replaceThisStr, withThisStr)
end
 
ThreadSafe Function/S Replace4txtWave3_worker(actualString, replaceThisStr, withThisStr)
    String actualString
    string replaceThisStr
    string withThisStr
 
    if(cmpstr(replaceThisStr,actualString)==0)
        return withThisStr
    endif
    return actualString     // No replacement necessary
End
#endif          // IgorVersion() >= 7

function Replace4txtWave4(replaceThisStr, TxtWave, withThisStr)
    string replaceThisStr
    wave /t TxtWave
    string withThisStr
   
    Extract/FREE/O/INDX/T TxtWave, ExtractedWave, (cmpstr(replaceThisStr,TxtWave[p])==0)
   
   
    variable i
    Variable numToReplace = numpnts(ExtractedWave)
    for(i=0;i<numToReplace;i+=1)
        TxtWave[ExtractedWave[i]]=withThisStr
    endfor
end

function Replace4txtWave5(replaceThisStr, TxtWave, withThisStr)
    string replaceThisStr
    wave /t TxtWave
    string withThisStr
   
    Grep/INDX/Q/E="^" + replaceThisStr + "$" TxtWave
    WAVE W_Index
       
    variable i
    Variable numToReplace = numpnts(W_Index)
    for(i=0;i<numToReplace;i+=1)
        TxtWave[W_Index[i]]=withThisStr
    endfor
end


Here are results on my Windows machine (the previous test results were on my Macintosh machine):
Igor 6
•test(200000)
Execution took 246.57 ms using method 0.
Execution took 274.317 ms using method 1.
Execution took 258.047 ms using method 2.
Method 3 requires Igor 7.
Execution took 0.846398 ms using method 3.
Execution took 272.347 ms using method 4.
Execution took 180.707 ms using method 5.

Igor 7
•test(200000)
Execution took 104.597 ms using method 0.
Execution took 87.317 ms using method 1.
Execution took 219.459 ms using method 2.
Execution took 47.842 ms using method 3.
Execution took 125.289 ms using method 4.
Execution took 166.079 ms using method 5.
Thanks for sharing the code aclight.

In this special case, one can speed up the initial wave assignment by preallocating storage:
    Make/N=(waveSize)/T=(strlen(targetString))/FREE/O originalTextWave

Nice to see IP7 perform so quickly with the new text wave memory layout mentioned in [1]. Looking at your code, the new memory layout also seems to be the default in IP7.

I'd extend the grep code as follows
    Grep/INDX/Q/E="^\\Q" + replaceThisStr + "\\E$" TxtWave

so that any funny characters in replaceThisStr are taken literally.
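To illustrate why the quoting matters, here is a small untested sketch that compares the two patterns on a target string containing a regex metacharacter. CountGrepMatches and DemoGrepQuoting are hypothetical names, and the sketch assumes that Grep accepts a string variable for /E in the same way it accepts the string expression used above.

```igor
// Hypothetical helper: number of elements of txtWave matched by regex.
Function CountGrepMatches(regex, txtWave)
    String regex
    WAVE/T txtWave

    // With /INDX and no destination, the matching point numbers are
    // written to W_Index in the current data folder.
    Grep/INDX/Q/E=regex txtWave
    WAVE W_Index
    return numpnts(W_Index)
End

Function DemoGrepQuoting()
    Make/FREE/T demoText = {"a.c", "abc", "xyz"}
    String target = "a.c"

    // Unquoted: "." matches any character, so "abc" is matched
    // in addition to the literal "a.c".
    Print CountGrepMatches("^" + target + "$", demoText)

    // Quoted with \Q...\E: the target is taken literally, so only
    // the element "a.c" should match.
    Print CountGrepMatches("^\\Q" + target + "\\E$", demoText)
End
```

Without the quoting, a replacement driven by the index wave would also overwrite elements such as "abc" that merely fit the pattern.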

[1]: http://wavemetrics.com/search/viewmlid.php?mid=27741
thomas_braun wrote:
Thanks for sharing the code aclight.

In this special case, one can speed up the initial wave assignment by preallocating storage:
    Make/N=(waveSize)/T=(strlen(targetString))/FREE/O originalTextWave


In Igor 6, using /T=size does decrease the time it takes to execute the Make statement that creates originalTextWave, but it doesn't change the performance of the text replacement by more than a few ms one way or another.

In Igor 7, the text wave is created quickly whether or not /T=size is used. However, if it's created using /T=size, the actual replacement methods execute more slowly. In particular, the Igor 7-only method 3, which uses a MultiThread wave assignment statement, is slower because the MultiThread keyword is ignored if the target of a text wave assignment stores the text data in contiguous bytes instead of as an array of pointers. Using /T=size forces the wave's text data to be stored as contiguous bytes. So I don't recommend its use in this situation unless the real-life bottleneck is in the initial creation of the wave and not in the replacement.

Here are the timings on Macintosh using Igor 7:
// Line 8 is: Make/N=(waveSize)/T/FREE/O originalTextWave
•test(200000)
Execution took 112.238 ms using method 0.
Execution took 77.6911 ms using method 1.
Execution took 344.949 ms using method 2.
Execution took 72.5701 ms using method 3.
Execution took 109.374 ms using method 4.
Execution took 143.778 ms using method 5.

// Line 8 is: Make/N=(waveSize)/T=(strlen(targetString))/FREE/O originalTextWave
•test(200000)
Execution took 197.69 ms using method 0.
Execution took 168.698 ms using method 1.
Execution took 443.477 ms using method 2.
Execution took 287.046 ms using method 3.
Execution took 205.962 ms using method 4.
Execution took 181.9 ms using method 5.


thomas_braun wrote:


I'd extend the grep code as follows
    Grep/INDX/Q/E="^\\Q" + replaceThisStr + "\\E$" TxtWave

so that any funny characters in replaceThisStr are taken literally.


Yes, that's a good idea when creating a regular expression from user input.