# Thread: PCA for identifying bacteria species & concentration

1. ## PCA for identifying bacteria species & concentration

Hi all,

For an experiment I'm trying to see whether I can distinguish bacterial species and concentrations using fluorescence, and I'm stuck on using Principal Component Analysis (PCA) to do so.

I have:
- 3 bacteria species
- 4 concentrations per species
- 50 spectrum sweeps per concentration
- 88 wavelengths per spectrum (300-400 nm; some values are lost in smoothing)

I'm trying to produce score plots similar to those in Figures 5 and 6 of the paper found here: http://naldc.nal.usda.gov/download/44518/PDF.

I'm completely new to PCA, but I did some reading and I think I have a basic qualitative understanding of what happens: PCA finds principal components, i.e. new orthogonal axes along which the data vary the most. However, I'm not sure how to make correct plots.

I use the MATLAB command

`[coeff,score] = pca(A,'NumComponents',2);`

and then plot the first column of `score` on the x-axis and the second column of `score` on the y-axis. This gives me a score plot of the data in matrix A. However, how should I properly fill matrix A?
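For reference, here is a rough NumPy sketch of what that `pca` call computes, assuming MATLAB's defaults (each column is mean-centered, the components come from an SVD of the centered data, and `score` is the centered data multiplied by `coeff`). The function name `pca_like_matlab` is hypothetical, and the sign flip is my understanding of MATLAB's convention (largest-magnitude loading in each column made positive); signs of components are arbitrary in PCA anyway.

```python
import numpy as np

def pca_like_matlab(A, num_components=2):
    """Rough NumPy analogue of MATLAB's [coeff,score] = pca(A,'NumComponents',k).

    Rows of A are observations (one spectrum each), columns are variables
    (one wavelength each).
    """
    A = np.asarray(A, dtype=float)
    mu = A.mean(axis=0)                      # pca centers each column by default
    Xc = A - mu
    # Economy-size SVD: Xc = U @ diag(S) @ Vt, rows of Vt = principal directions
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    coeff = Vt[:num_components].T            # loadings, shape (n_variables, k)
    # Assumed sign convention: largest-magnitude loading in each column is positive
    rows = np.argmax(np.abs(coeff), axis=0)
    signs = np.sign(coeff[rows, np.arange(coeff.shape[1])])
    coeff = coeff * signs
    score = Xc @ coeff                       # scores, shape (n_observations, k)
    return coeff, score
```

The score plot is then `score[:, 0]` against `score[:, 1]`, exactly as described above: every row of `A` becomes one dot.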

So far, I have put every sweep in a separate row, giving 3*4*50 = 600 rows (all species, concentrations and measurements) and 88 columns (fluorescence intensity at each wavelength). This produced a plot with some clustering, but I can't shake the feeling that it's wrong.
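A minimal sketch of that layout, assuming the measurements sit in a hypothetical 4-D array `spectra` of shape (3, 4, 50, 88) = (species, concentration, sweep, wavelength). It builds the 600 x 88 matrix and, importantly, keeps a species and concentration label for every row so the dots can be colored afterwards. The helper name `stack_spectra` is made up for illustration. Note that MATLAB's `reshape` is column-major, so a direct MATLAB port would need the labels generated in the matching order.

```python
import numpy as np

def stack_spectra(spectra):
    """Turn a (species, concentration, sweep, wavelength) array into the 2-D
    matrix A expected by pca, plus a label for every row.

    A has shape (n_species*n_conc*n_sweeps, n_wavelengths): one spectrum per
    row, one wavelength per column.
    """
    spectra = np.asarray(spectra, dtype=float)
    n_sp, n_conc, n_sweep, n_wl = spectra.shape
    A = spectra.reshape(n_sp * n_conc * n_sweep, n_wl)
    # np.indices walks the first three axes in the same (C) order as reshape,
    # so label i always describes row i of A.
    sp_idx, conc_idx, sweep_idx = np.indices((n_sp, n_conc, n_sweep)).reshape(3, -1)
    return A, sp_idx, conc_idx, sweep_idx
```

With these labels, the score plot can color each dot by `sp_idx` (species) and use a different marker per `conc_idx` (concentration), which makes it much easier to judge whether the clustering is meaningful.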

I find it a bit difficult to explain, but I hope someone understands and can help me out. If you need anything more, please ask!

Thanks for any help in advance,
Sven

2. ## Re: PCA for identifying bacteria species & concentration

I have PCA experience but not MATLAB experience.

Sounds like you are having doubts about how you have formatted your data, is that correct?

3. ## Re: PCA for identifying bacteria species & concentration

Hi,

Yeah, I think I can figure MATLAB out (or have mostly figured it out already).
My main issue is data formatting/organization. Right now I've just thrown everything together. However, the paper mentions using some data as calibration (training) data and other data as verification data. I can imagine that calibrating means finding a suitable set of PCs for one specific bacterium, but then I'm confused as to how they plotted 3 different species in one plot with just one set of PCs.
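One common way to read "calibration" versus "verification" (an assumption here, not necessarily what the paper did) is: fit a single PCA on calibration spectra from all species together, then project the held-out verification spectra onto those same PCs using the calibration mean and loadings, without refitting. That yields one shared set of axes on which all three species can appear. The split rule below (first `n_cal` sweeps of every species/concentration go to calibration) and the function names are hypothetical; `sweep_idx` is assumed to give each row's sweep number.

```python
import numpy as np

def fit_pca(A_cal, k=2):
    """Fit PCA on calibration spectra (rows = spectra, columns = wavelengths)."""
    A_cal = np.asarray(A_cal, dtype=float)
    mu = A_cal.mean(axis=0)                      # calibration mean spectrum
    _, _, Vt = np.linalg.svd(A_cal - mu, full_matrices=False)
    coeff = Vt[:k].T                             # (n_wavelengths, k) loadings
    return mu, coeff

def project(B, mu, coeff):
    """Score new spectra with the calibration mean and loadings (no refitting)."""
    return (np.asarray(B, dtype=float) - mu) @ coeff

def split_by_sweep(A, sweep_idx, n_cal):
    """Hypothetical split: sweeps 0..n_cal-1 of every species/concentration
    go to calibration, the remaining sweeps go to verification."""
    cal = np.asarray(sweep_idx) < n_cal
    return A[cal], A[~cal], cal
```

Typical use: `A_cal, A_val, mask = split_by_sweep(A, sweep_idx, 40)`, then `mu, coeff = fit_pca(A_cal)`, and plot `project(A_cal, mu, coeff)` and `project(A_val, mu, coeff)` on the same axes. In MATLAB, the sixth output of `pca` is the column mean `mu`, so the equivalent projection should be `(Aval - mu)*coeff`.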

4. ## Re: PCA for identifying bacteria species & concentration

Hi Sven,

Yes, something is wrong with your coding. In order to see the bacterial species as dots on the PCA plot, you have to code them as rows in your matrix.

The questions you need to ask yourself are:
- What do you want to separate on the PCA plot? These are your samples, encoded as rows.
- What are the factors used for differentiation? These are the variables, encoded as columns.
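A quick way to see the rows-versus-columns rule in action: the score matrix always has one row per row of the input, so orientation decides what the dots represent. The sketch below uses random numbers as a stand-in for a 600 x 88 spectra matrix (not real measurements); `pca_score` is a hypothetical helper that computes scores the same way as MATLAB's default `pca` (centered columns, SVD).

```python
import numpy as np

def pca_score(A, k=2):
    """Scores as in a default PCA: one row of scores per ROW of A."""
    A = np.asarray(A, dtype=float)
    Xc = A - A.mean(axis=0)                       # center every column (variable)
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:k].T

# Hypothetical stand-in for the real data matrix: 600 spectra x 88 wavelengths
rng = np.random.default_rng(0)
A = rng.normal(size=(600, 88))

print(pca_score(A).shape)    # (600, 2): one dot per spectrum (sample)
print(pca_score(A.T).shape)  # (88, 2): one dot per wavelength instead
```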
