MDS and PCA question

#1
Hi. I have abundance data for biological species from different samples, and I've been trying to study the relationships between the samples. I've tried non-metric MDS and PCA in Primer 6. My question is about the difference between these two methods. The results are similar (in terms of how the samples group in space). I've read that PCA needs normally distributed data. In my data, some species are normally distributed and some are not.

What is the difference between these two methods? Which one is recommended for abundance data? In the case of PCA, do I need to normalize the data?

Thank you so much
 

bugman

Super Moderator
#2
NMDS compresses the relationships among samples into a low-dimensional (2D or 3D) space. PCA is a dimension-reduction procedure which relies on Euclidean distances and linear relationships between variables. NMDS is based on ranks and is usually the best option for noisy data. If you are looking at describing relationships between sites and you want to avoid the influence of shared zeros, use NMDS. Given your brief description of your data and objectives, I would not hesitate to use NMDS on Bray-Curtis.
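
To make the "based on ranks" point concrete, here is a minimal Python sketch. It is not what Primer does internally, and the abundance table and the names `samples`, `bray_curtis`, and `rank_order` are invented for illustration. It computes Bray-Curtis dissimilarities and checks that a monotone transform (a square root here) leaves their rank order unchanged, which is the only information NMDS tries to preserve.

```python
from itertools import combinations

def bray_curtis(u, v):
    """Bray-Curtis dissimilarity: 0 = identical, 1 = no shared species."""
    num = sum(abs(a - b) for a, b in zip(u, v))
    den = sum(a + b for a, b in zip(u, v))
    return num / den if den else 0.0

# Hypothetical abundance table: one row per sample, one column per species.
samples = {
    "A": [10, 0, 3, 0],
    "B": [8, 1, 4, 0],
    "C": [0, 12, 0, 5],
}

# Dissimilarity for every pair of samples.
pairs = list(combinations(samples, 2))
dissim = {p: bray_curtis(samples[p[0]], samples[p[1]]) for p in pairs}

def rank_order(d):
    """Pairs sorted from most similar to least similar."""
    return sorted(d, key=d.get)

ranks_raw = rank_order(dissim)
# Any monotone transform of the dissimilarities keeps the same ordering.
ranks_sqrt = rank_order({p: x ** 0.5 for p, x in dissim.items()})

for p in ranks_raw:
    print(p, round(dissim[p], 3))
print("same rank order after sqrt:", ranks_raw == ranks_sqrt)
```

Because only the ordering matters, NMDS is not thrown off by the exact scale of the dissimilarities, which is one reason it tends to cope well with noisy abundance data.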
 
#6
First, I'd like to thank everyone for the help in this topic.

Let me describe my problem in a little more detail. I analyze microbial communities based on RFLP (restriction fragment length polymorphism) patterns. I'll call the patterns species to simplify our discussion, ok? My aim is to compare the microbial communities from different sites. So far, I have nine samples, each with around 50 species.

For that, I tested MDS, PCA, ANOSIM (with some factors based on groups defined a priori), and clustering with SIMPROF. For MDS and clustering, I used Bray-Curtis to calculate the resemblance matrix. I think this one is more appropriate.

I did not test Correspondence Analysis. Thanks for your suggestion, gianmarco. I'll read about it.

As PCA is based on Euclidean distance, I don't know what distortions it may introduce into the analysis.

Thank you all once more!
 

bugman

Super Moderator
#10
I think you want to know what problems can happen with PCA as opposed to NMDS.

NMDS is usually best because it is based on ranks. If data are badly behaved, PCA can give really odd results, especially if the scales of your variables are very different. Also, PCA will treat samples that share zeros (joint absences) as similar to one another, which in a biological study doesn't always make sense.
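
A small Python sketch of the shared-zeros issue, using made-up counts (the vectors `x`, `y`, `z` and both helper functions are hypothetical, written only for this illustration). Samples `x` and `y` have no species in common, while `x` and `z` contain the same two species at different abundances.

```python
def euclidean(u, v):
    """Straight-line distance between two abundance vectors."""
    return sum((a - b) ** 2 for a, b in zip(u, v)) ** 0.5

def bray_curtis(u, v):
    """Bray-Curtis dissimilarity: 0 = identical, 1 = no shared species."""
    num = sum(abs(a - b) for a, b in zip(u, v))
    den = sum(a + b for a, b in zip(u, v))
    return num / den if den else 0.0

# Hypothetical counts for six species.
x = [2, 1, 0, 0, 0, 0]   # species 1 and 2, low abundance
y = [0, 0, 1, 2, 0, 0]   # species 3 and 4 only: nothing in common with x
z = [9, 6, 0, 0, 0, 0]   # same species as x, higher abundance

euc_xy, euc_xz = euclidean(x, y), euclidean(x, z)
bc_xy, bc_xz = bray_curtis(x, y), bray_curtis(x, z)

print(f"Euclidean   x-y: {euc_xy:.3f}   x-z: {euc_xz:.3f}")
print(f"Bray-Curtis x-y: {bc_xy:.3f}   x-z: {bc_xz:.3f}")
```

With these numbers the Euclidean distance puts `x` closer to `y` than to `z`, even though `x` and `y` share nothing but absences. Bray-Curtis gives `x`-`y` the maximum dissimilarity of 1 and rates `x`-`z` as more similar, which is usually the more sensible reading for community data.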
 

gianmarco

TS Contributor
#11
Hi Bugman
I thought that abundance data (i.e., counts) were the very type of data that CA is able to handle successfully.
At least, that should hold as long as the analyst's aim is to depict the deviation from independence in a large cross-tabulation and, for instance, to look for groupings.
Besides, I have read that in many circumstances the outcomes of MDS and CA are quite similar.

I look forward to hearing your point of view.

Regards
Gm

 

bugman

Super Moderator
#12
GM,

I am not trying to put anyone off the use of CA. But, for biological data, it has its problems. One is that it puts heavy emphasis on rare species in a given data set, which, depending on the number of samples, can really skew the ordination.
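
A rough illustration of where that emphasis comes from, as a Python sketch with invented counts (the `table` and the function `chi2_contributions` are hypothetical). CA is built on chi-square distances between sample profiles, in which each species' squared difference is divided by that species' share of the grand total, so a difference in a rare species is inflated.

```python
# Hypothetical counts: rows are samples, columns are species; species 3 is rare.
table = [
    [50, 50, 0],
    [50, 49, 1],
    [40, 60, 0],
]

grand_total = sum(sum(row) for row in table)
n_species = len(table[0])

# Column mass: each species' share of all individuals counted.
col_mass = [sum(row[j] for row in table) / grand_total for j in range(n_species)]

# Row profile: each sample's counts as proportions of its own total.
profiles = [[x / sum(row) for x in row] for row in table]

def chi2_contributions(p, q, mass):
    """Per-species terms of the squared chi-square distance between two profiles."""
    return [(a - b) ** 2 / m for a, b, m in zip(p, q, mass)]

# Samples 1 and 2 differ by one individual in species 2 and one in species 3.
contrib = chi2_contributions(profiles[0], profiles[1], col_mass)
for j, c in enumerate(contrib, start=1):
    print(f"species {j}: {c:.6f}")
```

Here the same one-individual difference counts far more for the rare species 3 than for the common species 2, simply because species 3 has a tiny column mass.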
 

gianmarco

TS Contributor
#13
Hi Bugman,
thanks for your reply.
Ok! That is a problem I have heard about, and it affects archaeological ordination (i.e., seriation) too.
Nevertheless, Prof. Greenacre has recently tackled the problem of rare objects (i.e. "species" in ecology/biology AFAIK) in some articles of his:
-Tying up the loose ends in simple correspondence analysis -link-
-The contributions of rare objects in correspondence analysis -link-

Hope this can be of some use,
regards
Gm
 
#14
"NMDS is usually best because it is based on ranks. If data are badly behaved, PCA can give really odd results, especially if the scales of your variables are very different. Also, PCA will treat samples that share zeros (joint absences) as similar to one another, which in a biological study doesn't always make sense."
This would be a problem in my samples, because I have lots of double zeros when comparing two samples. Is this problem in PCA related to the fact that it is based on Euclidean distances? If I use MDS with a resemblance matrix based on Bray-Curtis distance, would I minimize this problem?
 

bugman

Super Moderator
#15
Yes, it is. Bray-Curtis in NMDS will help solve this, but once you get to know other distance measures, it is always worth exploring different ordinations. PCoA (principal coordinates analysis, sometimes called metric multidimensional scaling) can be useful.
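
For anyone curious what PCoA does with a Bray-Curtis matrix, here is a minimal pure-Python sketch. It only extracts the first axis, using power iteration, and it assumes the leading eigenvalue is positive; the function `pcoa_first_axis` and the sample counts are made up for illustration. A real analysis would use an established package instead.

```python
def bray_curtis(u, v):
    """Bray-Curtis dissimilarity: 0 = identical, 1 = no shared species."""
    num = sum(abs(a - b) for a, b in zip(u, v))
    den = sum(a + b for a, b in zip(u, v))
    return num / den if den else 0.0

def pcoa_first_axis(dist, iterations=200):
    """Leading PCoA axis of a square distance matrix: (eigenvalue, coordinates).

    Sketch only: assumes the dominant eigenvalue is positive and the
    distances are not all zero.
    """
    n = len(dist)
    # Gower transformation: A = -0.5 * d^2, then double-centre it.
    a = [[-0.5 * dist[i][j] ** 2 for j in range(n)] for i in range(n)]
    row_mean = [sum(row) / n for row in a]
    grand_mean = sum(row_mean) / n
    b = [[a[i][j] - row_mean[i] - row_mean[j] + grand_mean
          for j in range(n)] for i in range(n)]
    # Power iteration for the dominant eigenvector of the centred matrix.
    v = [float(i + 1) for i in range(n)]
    for _ in range(iterations):
        w = [sum(b[i][j] * v[j] for j in range(n)) for i in range(n)]
        norm = sum(x * x for x in w) ** 0.5
        v = [x / norm for x in w]
    bv = [sum(b[i][j] * v[j] for j in range(n)) for i in range(n)]
    eigenvalue = sum(x * y for x, y in zip(v, bv))
    # Scale the unit eigenvector so the axis has variance equal to the eigenvalue.
    coords = [x * eigenvalue ** 0.5 for x in v]
    return eigenvalue, coords

# Hypothetical abundance table: one row per sample, one column per species.
samples = [
    [10, 0, 3, 0],
    [8, 1, 4, 0],
    [0, 12, 0, 5],
]
dist = [[bray_curtis(u, v) for v in samples] for u in samples]
eigenvalue, coords = pcoa_first_axis(dist)
print("first-axis eigenvalue:", round(eigenvalue, 4))
print("first-axis coordinates:", [round(c, 4) for c in coords])
```

In this toy example the first two samples land on one side of the axis and the third on the other. Unlike NMDS, PCoA uses the actual dissimilarity values rather than only their ranks, so the two can give different pictures when the dissimilarities are noisy.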