Creating a box plot

Hello! I am trying to create a box and whisker plot but am struggling to get it to display the data how I would like. I would like to create a box plot where text labels are from text column "Unit4" and data points are from "modeled_kT_DPB4". Not every data point from "modeled_kT_DPB4" has a "Unit4" label so I want to skip those points. Is that possible? Attaching a screenshot of what my data look like.

[Image: screenshot of the data]

Hi,

 

I would use Extract to create subsets of the waves that contain only the points that have a label. One caveat: strlen returns NaN for a null string, so we can test for that with numtype, which returns 2 for a NaN.

Extract unit4, unit4_sub, numtype(strlen(Unit4))!=2

Extract modeled_kT_DPB4, modeled_kT_DPB4_sub, numtype(strlen(Unit4))!=2

Then create the box-and-whisker plot from the "_sub" waves.
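
If the unlabeled rows hold empty strings ("") rather than null strings, strlen returns 0 for them instead of NaN, and the numtype test would keep every row. A minimal variant for that case, assuming the same wave names and adding /O so the commands can be rerun:

```igor
// Keep only the rows whose label has at least one character.
Extract/O Unit4, Unit4_sub, strlen(Unit4) > 0
Extract/O modeled_kT_DPB4, modeled_kT_DPB4_sub, strlen(Unit4) > 0
```

Both commands use the same condition, so the two "_sub" waves stay aligned row for row.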

Andy

Hi Andy,

When I do this and create a box plot I am still only getting 1 box that combines all of my modeled_kT data. How do I split it up to individual boxes based on Unit?

You'll want to make a multi-column wave for your data, so that each unit has its own column. That way, each unit's column becomes its own category in the plot.

Hi,

I worked on the problem a bit and wrote a little function.

It takes three inputs: the text wave with the labels, the wave with the data, and a flag for what to do with empty ("") labels, either keep them as "Other" or ignore them. It creates (or reuses) a data folder called "box" that holds the data waves and a corresponding label wave. In the box plot dialog, select the data waves as the waves to plot and the unique labels as the X wave.

function box_plot_input(wave/T LW, wave DW,variable keep)

    // LW is the label (text) wave with the names of the groupings
    // DW is the data wave that needs to be split up by label
    // keep is a flag for unlabeled entries: 0 = skip them, 1 = keep them as "Other"

    variable index,maxindex
    DFREF Saved = getDataFolderDFR()    // remember the current data folder
    Newdatafolder/S/O root:box          // create (or reuse) root:box and make it current
    findduplicates /RT=uniqueLabels lw  // uniqueLabels = LW with duplicates removed
    maxindex=numpnts(uniquelabels)      // number of unique labels
    for(index=0;index<maxindex;index+=1)
        if(strlen(uniquelabels[index])!=0 || keep)
            // DW_<index> gets the data points whose label matches this unique label
            extract/O DW,$"DW_"+num2str(index), stringmatch(LW,Uniquelabels[index])
            if(strlen(uniquelabels[index])==0)
                uniquelabels[index] = "Other"   // rename the empty label
            endif
        endif
    endfor
   
    setdataFolder saved
end
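
A call from the command line might look like the sketch below. It assumes the wave names from the original question and that both waves are in the current data folder.

```igor
// Split modeled_kT_DPB4 by the labels in Unit4.
// The third argument is the keep flag: 0 skips rows with an empty label.
box_plot_input(Unit4, modeled_kT_DPB4, 0)
// The output is in root:box: uniqueLabels plus one DW_<index> wave per non-empty label.
```

With keep set to 0, uniqueLabels may still contain the empty entry even though no data wave is made for it, so it is worth checking that the label wave and the data waves line up before plotting.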

Andy

Thanks so much! I had to edit a little to save the data into a different folder but otherwise seems to work well. I really appreciate the help!

As a follow-up to this: I also want to calculate the lognormal mean of my data for each new DW wave. The data seem to follow a lognormal distribution, so I am using the attached equation.

I am first creating a table...

Make/O/N=(numpnts(uniquelabels)) MLEMean

Then I try to populate the table with the mean values using

MLEMean[0]=1/numpnts(DW_1)*sum(ln(DW_1))

MLEMean[1]=1/numpnts(DW_2)*sum(ln(DW_2))

MLEMean[2]=1/numpnts(DW_3)*sum(ln(DW_3))

Etc...
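
The attached equation is not reproduced in the thread. Judging from the assignments above, it is presumably the maximum-likelihood estimate of the log-scale mean, where the x_i are the n points of one DW wave (the symbols are chosen here for illustration only):

```latex
\hat{\mu} = \frac{1}{n} \sum_{i=1}^{n} \ln x_i
```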

However, Igor doesn't like that I have a function and not a wave variable within the sum brackets. Any good way around this?

Thanks!

The error occurs because the ln function expects a single number, not a wave. If you put the expression in a wave assignment, however, each point in the wave is passed to ln in turn. You could do something like this:

Function/WAVE naturalLog(Wave dw)
    Make/FREE lnWave
    lnWave = ln(dw)  // operates on each point in the wave b/c of the assignment
    return lnWave
End

and then call:

MLEMean[0]=1/numpnts(DW_1)*sum(naturalLog(DW_1))

 

Yeah, the sum() function requires a wave, not a number returned from a function.

The MatrixOP operation offers (in this case) a more natural way to do this. Here is an example:

MatrixOP/O term1 = (1/numPoints(DW_1))*sumcols(ln(DW_1))

The peculiarity here is that "term1" is a one-point wave, not a variable, but that is easy to handle. MatrixOP is all about functions being applied to waves and returning waves. As the name implies, it is oriented toward matrices, which is why you need "sumcols" and not just "sum". You are working with a one-column matrix here :)

And my colleague AG here has pointed out that you can replace that expression with this:

MatrixOP/O term1 = averageCols(ln(DW_1))
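
One way to handle the one-point output wave is to index its single point when filling the results wave. A minimal sketch, assuming MLEMean and DW_1 already exist as in the earlier posts:

```igor
// term1 is a one-point wave holding the mean of ln(DW_1).
MatrixOP/O term1 = (1/numPoints(DW_1))*sumcols(ln(DW_1))
MLEMean[0] = term1[0]   // copy that single point into the results wave
```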

 

In reply to Ben Murphy-Baum

lnWave needs to be the same length as the input wave dw, methinks:

Function/WAVE naturalLog(Wave dw)
    Make/FREE/N=(DimSize(dw,0)) lnWave // assuming 1-D dw
    lnWave = ln(dw)  // operates on each point in the wave b/c of the assignment
    return lnWave
End
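
To fill MLEMean for every group without typing one line per wave, a loop along the following lines might work. This is only a sketch: the function name fill_MLEMean is made up for illustration, it assumes the DW_<index> waves and the label wave are in the current data folder, and it assumes all data values are positive. It uses Duplicate and the built-in mean function instead of the helper function above.

```igor
Function fill_MLEMean(Wave/T labels)
    // labels: the unique-label wave; one DW_<index> wave is expected per label
    Variable i, n = numpnts(labels)
    Make/O/N=(n) MLEMean = NaN          // NaN marks labels that have no data wave
    for(i = 0; i < n; i += 1)
        Wave/Z dw = $("DW_" + num2str(i))
        if(!WaveExists(dw))
            continue                    // no data wave was made for this label
        endif
        Duplicate/FREE dw, lnw          // free copy with the same length as dw
        lnw = ln(dw)                    // point-by-point natural log
        MLEMean[i] = mean(lnw)          // (1/n) * sum of ln(x_i)
    endfor
End
```

It would be called as fill_MLEMean(uniqueLabels) with the data folder that holds the DW waves set as the current one.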