Rearranging vertical data based on string labels

function subsetDataExample()
	Make/T labels={"a","a","b","a","b","b"}
	Make values={1.2,1.8,1.5,2.4,1.9,2.6}
	FindDuplicates/RT=UniqueLabels labels
	Make/O/N=(numpnts(labels))/FREE LabelsCRC=StringCRC(0,labels)
	Make/O/N=(numpnts(UniqueLabels))/FREE UniqueLabelsCRC=StringCRC(0,UniqueLabels)
	//Loop through the devices (labels) and put the matching data in waves with the label name
	variable i=0
	variable imax=numpnts(UniqueLabels)
	string Wname=""
	for(i=0;i<imax;i+=1)
		Wname=UniqueLabels[i]
		Extract values, $Wname, LabelsCRC==UniqueLabelsCRC[i]
	endfor
	KillWaves UniqueLabels
end

jjweimer

I cannot think of an implicit loop that would replace the explicit for-loop. The only starting point that I can call up that is different is to sort the letter wave with a dependency on the value wave. This puts the data in "blocks" rather than in random order.

January 20, 2020 at 02:04 pm - Permalink

ajleenheer

I'm open to sorting the data before processing. In fact, I already have a Sort in place for my more complex data set, and it could put everything in blocks. I don't think Extract cares whether the data are ordered in blocks for the approach I described, but if sorting would enable a simpler approach (like FindLevels to determine where the CRC-coded labels change value?) then let me know!

January 20, 2020 at 03:43 pm - Permalink

Igor

My approach would be to convert the strings into numbers. CRC or any other hash would make it simple. You follow that with a Sort/R for both the converted text wave and your numerical wave. Then use something like EdgeStats to figure out the boundaries of the groups.

If EdgeStats is not efficient and if you know the number of groups you could sift the data using MatrixOP setNaNs() and then WaveTransform zapNaNs.

I hope this helps,

A.G.

January 20, 2020 at 05:02 pm - Permalink

aclight

I think AG's approach is the best you can get except for one part. After you've sorted the CRC values, use Extract to find where each unique group starts. Something like this (untested):

Extract sortedSourceWave, outWave, (p==0 || sortedSourceWave[p-1] != sortedSourceWave[p])

Depending on what you're going to do with the output, you might want to use the /INDX flag with Extract to get the row indexes with unique values instead of the values themselves.
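Putting the pieces of this thread together, the sort-then-find-boundaries idea might look roughly like the sketch below. It is untested and only illustrative: the function name and the wave names labelsCRC, sortedValues, sortedLabels and blockStart are made up for this example, and it assumes that no two different labels share the same CRC.

```igor
Function GroupBoundariesSketch()
	// Same example data as in the original post
	Make/O/T labels = {"a","a","b","a","b","b"}
	Make/O values = {1.2,1.8,1.5,2.4,1.9,2.6}

	// Convert each label to a number to use as a sort key.
	// Double precision so that 32-bit CRC values are stored exactly.
	Make/O/D/N=(numpnts(labels)) labelsCRC = StringCRC(0, labels[p])

	// Sort copies of the data together with the key,
	// so that equal labels end up in contiguous blocks
	Duplicate/O values, sortedValues
	Duplicate/O labels, sortedLabels
	Sort labelsCRC, labelsCRC, sortedValues, sortedLabels

	// Row index at which each block starts (a row whose key differs from the previous row)
	Extract/O/INDX labelsCRC, blockStart, (p == 0 || labelsCRC[p-1] != labelsCRC[p])

	// Copy each block into a wave named after its label
	Variable i, first, last
	Variable nBlocks = numpnts(blockStart)
	String name
	for(i = 0; i < nBlocks; i += 1)
		first = blockStart[i]
		// A block ends just before the next block starts, or at the last row
		last = (i < nBlocks - 1) ? blockStart[i+1] - 1 : numpnts(sortedValues) - 1
		name = sortedLabels[first]
		Duplicate/O/R=[first, last] sortedValues, $name
	endfor
End
```

This still uses a for-loop, but it runs once per group and copies a contiguous range each time instead of scanning the whole wave with Extract for every label. The blocks come out in CRC order rather than alphabetical order, which does not matter here because each output wave is named by its label.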

January 20, 2020 at 05:43 pm - Permalink

Igor

... and obviously, when you get IP9 you could simply run the new TextHistogram operation.

January 20, 2020 at 05:45 pm - Permalink