Principal Component Analysis (PCA)

Dear all,

I am faced with the small task of identifying, from a set of 20 parameters extracted from a series of events, which parameters are dominant in describing the variation between events, and which tend to be irrelevant.

For that I looked through the Scatter Plot Matrix Demo, which was a neat overview of data correlation, but what I'm looking for is an analytical measure that I can apply to several datasets in which we've perturbed our system, so that I can make comparisons across these datasets. From what I've seen, PCA is what I should be looking at.

In the meantime I've checked the excellent primer on PCA by Jonathon Shlens, and I got one of the books referenced in Igor's help section: "Factor Analysis in Chemistry" by E. R. Malinowski (3rd edition), which I have begun reading.

Since I'm new to PCA, I have the impression this might take some time. If anyone has experience using PCA in Igor to identify the two main components in this kind of analysis, please leave me your thoughts.

Many thanks,

R.

Hello R.,

You may want to take a quick look at the PCA demo (File Menu->Example Experiments->Analysis->PCADemo). Note that Igor also has an ICA operation, but PCA is probably what you want.

A.G.

WaveMetrics, Inc.

Hey A.G.,

Thanks for the note. Yes, I did have a look at that example, and also tried to run my data through it.

Please have a look at the image attached.

What I did so far was the following:

  • I've input my data as shown in the top table. My parameter names are in the wave called Parameters, and waves wave1 through wave42 are my events, for each of which I have measured 18 parameters.
  • I called the PCA procedure and edited it so that it would pick up my data.
  • Using the PCA Demo Control, I ran the PCA as specified in the procedure and got the table shown in the lower right.

To me, it seems I might have a lot of redundancy in my data. Is this a fair conclusion? However, I am still not sure how to go from the observation that some parameters are more or less redundant to identifying exactly which parameters they are. Could you give me some insight?

Many thanks!!

R.

PCA_Igor_01.jpg

Hello R.,

If I only look at the eigenvalues, it seems that you have only two significant factors that together account for 99.8% of the variance.
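
In case it is useful: that percentage is just each eigenvalue divided by the sum of all of them. Assuming your eigenvalues ended up in a 1D wave (here I call it eigVals; substitute whatever name the demo actually produced), a one-liner such as

MatrixOP/O fracVar = eigVals/sum(eigVals)

gives the fraction of the total variance explained by each component, and summing the top entries reproduces the 99.8%.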

I think it is important to note that even if you determine that there are only two important factors, these may not necessarily map directly to your input data.  Since I am not familiar with the details of your application, let me try to use an example from physics: suppose you have a set of waves containing an xyz distribution of points in space, you compute the PCA, and you find that 99.85% of the variation is explained by two factors.  That would imply that your distribution is pretty much planar.  To determine the orientation of this plane in 3D space you need to look at the first two eigenvectors of the solution.
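
If you want to see this analogy in action, here is a small self-contained sketch you can paste into the procedure window. All wave names here are placeholders, and I am assuming MatrixEigenV's output names (W_eigenValues, M_eigenVectors); check its help entry if your version differs:

Function PlanarPCAExample()
	// 200 synthetic xyz points lying close to a plane
	Make/O/N=(200,3) pts
	pts[][0] = gnoise(1)
	pts[][1] = gnoise(1)
	pts[][2] = 0.5*pts[p][0] - 0.2*pts[p][1] + gnoise(0.01)

	// covariance matrix of the three coordinates
	MatrixOP/O covMat = (subtractMean(pts,1)^t x subtractMean(pts,1))/(numRows(pts)-1)

	// eigen-decomposition of the symmetric covariance matrix
	MatrixEigenV/SYM/EVEC covMat
	WAVE W_eigenValues	// with /SYM these typically come out in ascending order
	MatrixOP/O fracVar = W_eigenValues/sum(W_eigenValues)
	Print fracVar[0], fracVar[1], fracVar[2]	// one fraction is ~0, the other two dominate
End

The out-of-plane eigenvalue is tiny because the out-of-plane noise is tiny, and the two eigenvectors belonging to the large eigenvalues span the plane.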

If you carry this analogy over to your application, you might want to look at the eigenvectors alongside your inputs.  In complicated cases it may be helpful to compute the projections (dot products) of your input vectors onto the first two eigenvectors so you get a sense of what's meaningful and what is noise.

A.G.

Hi A.G.,

Many thanks for the input. Indeed, the events I included in this analysis are from a single "run" of measurements (i.e. they refer to a single subject being observed). Essentially, I am looking at the same subject at different instances, so the fact that I have two significant factors accounting for almost 99.85% of the data variance likely suggests that whenever the subject is stimulated, the reaction tends to be fairly homogeneous with respect to most of the parameters analyzed. This is something I anticipated by looking at normalized distributions for each of the parameters. The situation will be different when I feed PCA averages of different runs of the experiment, in which I include different subjects. So, to me, these preliminary results seem to be going the right way.

The part I am struggling with most right now is converting this variance information into PCA scores for each of the parameters. Would you be able to help me with this part?

Many thanks,

Ricardo

 

Hello Ricardo,

My approach is to think about the eigenvectors as a set of orthonormal vectors that span your data space.  In your example, we determined based on the data that we only care about two eigenvectors (say e0 and e1).  To represent your data using the two new axes you need to compute the dot product (or projection) of your parameter columns with the first two eigenvectors.  For this to make sense you need to "standardize" your parameter columns by subtracting their mean and dividing by their standard deviation. 

If you denote the standardized columns by P_i, then you will effectively be creating a new two-component representation of your data as P_i ≈ c0i*e0 + c1i*e1, where the coefficients c0i and c1i are the projections of P_i on the respective eigenvectors.

You can use MatrixOP to perform these calculations.  For example, to standardize a 1D parameter wave you can execute

MatrixOP/O P_0=normalize(subtractMean(w,1))	// subtract the mean of w, then scale to unit length

Note that normalize divides by the vector's norm rather than by its standard deviation; for a mean-subtracted column of n points these differ only by the constant factor sqrt(n-1), so the relative scores across parameters are unaffected.

To calculate the projection:

MatrixOP/O c00=e0.P_0	// projection of the standardized column on e0
MatrixOP/O c10=e1.P_0	// projection of the standardized column on e1

which gives you {c00,c10} as the representation of the first parameter column in the new space.
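
If you want the scores for all of your parameter columns at once, you can wrap the same two steps in a loop. This is just a sketch with assumed names: it presumes your data sits in a single events-by-parameters matrix called dataMat, and that e0 and e1 hold the first two eigenvectors; adjust the names to match your experiment:

Function ProjectAllParams(dataMat, e0, e1)
	Wave dataMat	// rows = events, columns = parameters
	Wave e0, e1	// the two dominant eigenvectors

	Variable nParams = DimSize(dataMat, 1)
	Make/O/N=(nParams) scoreE0, scoreE1	// one pair of scores per parameter
	Variable i
	for(i = 0; i < nParams; i += 1)
		// standardize column i: subtract its mean, scale to unit length
		MatrixOP/O stdCol = normalize(subtractMean(col(dataMat, i), 1))
		// project the standardized column on each eigenvector
		MatrixOP/O proj0 = e0.stdCol
		MatrixOP/O proj1 = e1.stdCol
		scoreE0[i] = proj0[0]
		scoreE1[i] = proj1[0]
	endfor
	KillWaves/Z stdCol, proj0, proj1
End

After running it, something like Edit Parameters, scoreE0, scoreE1 puts the scores next to your parameter names, so you can see at a glance which parameters load heavily on the two components and which contribute mostly noise.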

I hope this helps,

 

A.G.