FindDuplicates improvements for text waves

Apologies if this has been posted already. FindDuplicates is pretty handy, but it seems to be missing some features when used on text waves.

1. It would be nice if there were a flag to turn off case sensitivity. StringMatch is already case insensitive, and strsearch and cmpstr can be either, so adding this feature would make the behavior of FindDuplicates more consistent with the other functions.

2. It would also be nice if there were a text wave equivalent of the UN and UNC flags for numeric waves. (Also, a description of those flags isn't in either the PDF of the manual or the help files in Igor.)

johnweeks

Maybe you can use Extract?

•make/O/N=100/T junk
•junk[0,;2]="w"+num2str(p)
•junk[1,;2]="W"+num2str(p)
•Extract junk, lowercase, CmpStr((junk)[0,1], "w1", 1)==0
•print lowercase
  lowercase[0]= {"w10","w12","w14","w16","w18"}

August 27, 2019 at 09:40 am - Permalink

KZarzana

I should have mentioned that I have workarounds using LowerStr and UpperStr. Adding a case-insensitivity flag is more about convenience and having nice compact code than about not being able to do something.
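For anyone landing here later, the LowerStr workaround might look roughly like the sketch below. The function name and the wave names loweredText and uniqueLowered are made up for illustration, the output waves land in the current data folder, and it assumes the wave holds plain ASCII text and has at least two points.

```igor
// Hypothetical helper: unique entries of a text wave, ignoring case.
// Assumes ASCII content and at least two points in src.
Function/WAVE UniqueTextIgnoringCase(src)
	WAVE/T src

	// Work on a lower-cased copy so that "W1" and "w1" compare equal.
	Duplicate/O/T src, loweredText
	loweredText = LowerStr(src[p])

	// /RT writes a text wave with the duplicates removed.
	FindDuplicates/RT=uniqueLowered loweredText
	WAVE/T uniqueLowered
	return uniqueLowered
End
```

Note that the result contains the lower-cased strings, not the original spellings.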

August 27, 2019 at 02:12 pm - Permalink

thomas_braun

I second the request of KZarzana. Both would be nice improvements.

@KZarzana: What do the UN/UNC flags do?

August 28, 2019 at 03:05 am - Permalink

KZarzana

The description of those flags isn't in the help but can be found here: https://www.wavemetrics.com/products/igorpro/newfeatures/whatsnew8

"Added /UN flag to FindDuplicates that generates a wave containing the unique numerical values in the input wave.

Added /UNC flag to FindDuplicates that generates a wave containing the count of occurrences in the input wave of each unique numerical value."
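Until there is a text wave equivalent, the same information can be pieced together by hand. The following is only a sketch: the function name and the output wave names uniqueText and uniqueTextCount are hypothetical, the comparison is case sensitive, and the double loop is not tuned for large waves.

```igor
// Hypothetical helper: unique strings of a text wave plus a count for each.
// Assumes src has at least two points.
Function CountUniqueText(src)
	WAVE/T src

	// /RT writes a text wave with the duplicates removed.
	FindDuplicates/RT=uniqueText src
	WAVE/T uniqueText

	Variable nUnique = numpnts(uniqueText)
	Variable nSrc = numpnts(src)
	Make/O/N=(nUnique) uniqueTextCount = 0

	Variable i, j
	for (i = 0; i < nUnique; i += 1)
		for (j = 0; j < nSrc; j += 1)
			// Third argument 1 makes CmpStr case sensitive.
			if (CmpStr(src[j], uniqueText[i], 1) == 0)
				uniqueTextCount[i] += 1
			endif
		endfor
	endfor
End
```

After the call, uniqueTextCount[i] holds the number of times uniqueText[i] occurs in the source wave.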

August 28, 2019 at 06:59 am - Permalink

thomas_braun

@KZarzana: Thx.

August 28, 2019 at 01:26 pm - Permalink

Igor

Converting arbitrarily encoded text to upper or lower case is not simple. If you happen to know that the contents of your wave are ASCII characters, it is simple enough to convert them before calling FindDuplicates, as mentioned above. I will add your request to the wish list.

A.G.

August 29, 2019 at 12:42 pm - Permalink

KZarzana

Great, thanks!

September 3, 2019 at 08:27 am - Permalink

Igor

FWIW, case insensitivity support was added for IP9.

A.G.

September 9, 2019 at 10:23 am - Permalink

hegedus

Hi,

I noticed today that if the input wave has a length of 1, FindDuplicates throws an error reporting an insufficient number of points. Since I am only looking for unique values, a wave with length 1 should return that one point. I have coded around it, but it would be nice if FindDuplicates could handle a length of 1.

Andy

March 19, 2020 at 08:29 am - Permalink

Igor

Hi Andy,

I find it logically impossible to look for duplicates when you have fewer than two points in the wave.

A.G.

March 19, 2020 at 12:54 pm - Permalink

hegedus

Yes, this is true. But I was just suggesting that the algorithm be robust so I do not have to trap the incoming wave for that condition.

Andy

March 19, 2020 at 01:16 pm - Permalink

Igor

We will have to disagree on what it means to be "robust".

My programming philosophy is that I want an operation or function to return an error as soon as one is encountered. Otherwise you may find that something did not work 15 steps later and you would have to trace it all the way to an operation or function that silently returned a zero point wave or a NaN. In other words, sooner or later you need to implement some tests in your code. They could be before you use bad input data or after.
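A test before the call can be quite small. The sketch below is one possible shape, not a recommendation of a specific design: both function names and the output wave name uniqueText are hypothetical, and it is written for text waves.

```igor
// Hypothetical guard: true only if there are at least two points to compare.
Function HasEnoughPointsForDuplicates(w)
	WAVE/Z/T w

	if (!WaveExists(w))
		return 0
	endif
	return numpnts(w) >= 2
End

// Hypothetical caller: skip and report inputs that are too short.
Function ProcessOneInput(src)
	WAVE/T src

	if (!HasEnoughPointsForDuplicates(src))
		Print "Skipping", NameOfWave(src), ": fewer than two points"
		return -1
	endif

	// /RT writes a text wave with the duplicates removed.
	FindDuplicates/RT=uniqueText src
	return 0
End
```

When scanning many input sets, the printed wave name points directly at the offending entry.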

March 19, 2020 at 01:59 pm - Permalink

hegedus

Can we compromise and at least make a note in the documentation? It took some troubleshooting to figure out that that specific entry had only 1 point. I was scanning 6000+ input sets and it barfed in the middle.

Andy

March 19, 2020 at 02:04 pm - Permalink

Igor

It's always a good idea to improve the documentation. Please email me a specific suggestion.

A.G.

March 19, 2020 at 02:37 pm - Permalink