substring count in string

Hi all,

I am facing a small challenge checking strings with Igor.

I tried to build a function to count all occurrences of a substring (motif) within a string (sequence) using stringmatch.

My string is about 8000 characters long, and I have more than one substring (actually hundreds) to find.

I also have to append the beginning of the string to its end, which is why I create the "search_string".

One run takes approximately 30 s. Do you have any ideas for speeding this up?

 

function find_motif(sequence, motif)
    string sequence, motif
    // append the start of the sequence so that motifs spanning the end-to-start junction are found
    string search_string = sequence + sequence[0, strlen(motif)-1]
    variable motif_count = 0
    variable i = 0

    // test every start position in the original sequence, including the last character
    for(i=0; i<strlen(sequence); i+=1)
        if(stringmatch(search_string[i, i+strlen(motif)-1], motif))
            // warn if the match extends past the end of the original sequence
            if(i>=(strlen(sequence)-strlen(motif)+1))
                print "motif found in end_to_start region"
            endif
            motif_count += 1
        endif
    endfor

    return motif_count
end

 

Thanks for your help!

 

Best,

Fabian

Not tested, but precomputing the string lengths outside the loop should help.

StrSearch is probably also much faster than manually stepping through the whole string one character at a time.

 

If you want to make it complicated you could even multithread several StrSearches with different starting points.
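
For reference, here is a minimal, untested sketch of what the StrSearch suggestion could look like. The function name CountMotif and its variable names are made up for illustration. It keeps the wrap-around handling from the original function, lets strsearch jump from one hit to the next, and steps forward by one character after each hit so that overlapping occurrences are still counted. One difference to be aware of: stringmatch ignores case and treats "*" as a wildcard, whereas strsearch as called here is case-sensitive and literal. Passing 2 as the optional fourth argument of strsearch should make the search case-insensitive.

```igor
Function CountMotif(sequence, motif)
	String sequence, motif

	// Compute the lengths once instead of on every iteration
	Variable seqLen = strlen(sequence)
	Variable motifLen = strlen(motif)

	// Assumption: an empty motif, or one longer than the sequence, counts as zero hits
	if (motifLen == 0 || motifLen > seqLen)
		return 0
	endif

	// Append the start of the sequence so that motifs spanning the
	// end-to-start junction are found (same idea as in find_motif)
	String searchString = sequence + sequence[0, motifLen - 1]

	Variable motifCount = 0
	Variable pos = 0

	do
		// strsearch returns the position of the next hit at or after pos, or -1 if there is none
		pos = strsearch(searchString, motif, pos)

		// Stop when nothing is left, or when the hit starts in the appended copy
		if (pos < 0 || pos >= seqLen)
			break
		endif

		// Warn if the hit extends past the end of the original sequence
		if (pos > seqLen - motifLen)
			Print "motif found in end_to_start region"
		endif

		motifCount += 1
		pos += 1	// advance by one character so overlapping hits are counted
	while (1)

	return motifCount
End
```

It could be called from the command line like this (the sequence and motif are made-up values):

```igor
Print CountMotif("ACGTACGA", "GAA")
```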

Thanks to both of you!

Moving the string length calculations outside the loop did not improve the speed.

Switching to StrSearch, however, made it faster by a factor of about 100 :-)

 

If you're searching through DNA/RNA sequences, you might look at the FindSequence operation and the associated demo experiment.