What is the difference between these two tests? Which one is recommended for abundance data? In the case of PCA, do I need to normalize the data?

Thank you so much

- Thread starter renan.c.soares

First, I'd like to thank everyone for the help in this thread.

Let me describe my problem in a little more detail. I analyze microbial communities based on RFLP (restriction fragment length polymorphism) patterns. I'll refer to the patterns as species to simplify the discussion, OK? My aim is to compare the microbial communities from different sites. So far, I have nine samples, each with around 50 species.

For that, I tested MDS, PCA, ANOSIM (with some factors based on groups defined a priori), and clustering with SIMPROF. For MDS and clustering, I used Bray-Curtis to calculate the resemblance matrix. I think this measure is more appropriate.

I did not test Correspondence Analysis. Thanks for your suggestion, gianmarco. I'll read about it.

Since PCA is based on Euclidean distance, I don't know what distortions it may introduce into the analysis.

Thank you all once more!
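
On the question of whether PCA needs normalized data, here is a minimal Python sketch. It assumes that "normalize" means z-scoring each variable, which corresponds to running PCA on the correlation matrix rather than the covariance matrix. The counts are made up for illustration, and `standardize_columns` is a hypothetical helper, not a function from PRIMER or any R package.

```python
import math
import statistics


def euclidean(a, b):
    """Euclidean distance between two samples (lists of abundances)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def standardize_columns(samples):
    """Z-score each column: subtract its mean, divide by its sample SD."""
    columns = list(zip(*samples))
    means = [statistics.mean(col) for col in columns]
    sds = [statistics.stdev(col) for col in columns]
    return [
        [(value - m) / s for value, m, s in zip(row, means, sds)]
        for row in samples
    ]


# Hypothetical data: rows are samples, columns are two "species"
# measured on very different scales.
samples = [
    [1000, 1],
    [1100, 1],
    [1000, 9],
]

# Raw distances are driven almost entirely by the first column.
print(euclidean(samples[0], samples[1]), euclidean(samples[0], samples[2]))

# After z-scoring, both columns contribute on the same scale.
standardized = standardize_columns(samples)
print(euclidean(standardized[0], standardized[1]),
      euclidean(standardized[0], standardized[2]))
```

With the raw counts, the second sample looks far more distant from the first than the third does, purely because the first column has larger numbers. Whether z-scoring is appropriate for abundance data is a separate judgment; the sketch only shows what it changes.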

NMDS is usually best because it is based on ranks. If the data are badly behaved, PCA can give really funny results, especially if the scales of your variables are very different. Also, PCA will consider samples with zeros as similar to one another, which in a biological study doesn't always make sense.
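
The point about ranks can be illustrated with a small Python sketch. It assumes three hypothetical samples A, B and C with made-up dissimilarities. Non-metric MDS uses only the rank order of the dissimilarities, so a monotonic (order-preserving) transformation of the values leaves that ordering unchanged.

```python
import math


def rank_order(dissimilarities):
    """Return the sample pairs sorted from most similar to least similar."""
    return sorted(dissimilarities, key=dissimilarities.get)


# Hypothetical dissimilarities between three samples.
dissim = {("A", "B"): 0.10, ("A", "C"): 0.45, ("B", "C"): 0.80}

# Two monotonic transformations of the same values.
sqrt_dissim = {pair: math.sqrt(d) for pair, d in dissim.items()}
squared_dissim = {pair: d ** 2 for pair, d in dissim.items()}

# The numbers change, but the ordering of the pairs does not.
print(rank_order(dissim))
print(rank_order(sqrt_dissim) == rank_order(dissim))
print(rank_order(squared_dissim) == rank_order(dissim))
```

A method that works on the raw values, such as PCA, would see three different data sets here, while a rank-based method sees the same one.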

Hi Bugman

I thought that abundance data (i.e., counts) were exactly the type of data that CA can successfully "handle"....

At least, that should hold as long as the analyst's aim is to depict the deviation from independence in a large cross-tabulation and, for instance, to look for groupings.
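
As a rough illustration of "deviation from independence", the following Python sketch computes the standardized (Pearson) residuals of a contingency table, which are, up to a factor of sqrt(n), the quantities that CA decomposes, together with the total inertia (chi-square divided by the grand total). The table is made up, `independence_residuals` is a hypothetical helper name, and zero row or column totals are assumed not to occur.

```python
import math


def independence_residuals(table):
    """Return (Pearson residuals, chi-square, total inertia) for a count table."""
    n = sum(sum(row) for row in table)
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]

    residuals = []
    for i, row in enumerate(table):
        res_row = []
        for j, observed in enumerate(row):
            # Count expected if rows (sites) and columns (species) were independent.
            expected = row_totals[i] * col_totals[j] / n
            res_row.append((observed - expected) / math.sqrt(expected))
        residuals.append(res_row)

    chi_square = sum(r * r for row in residuals for r in row)
    return residuals, chi_square, chi_square / n


# Hypothetical abundance table: rows are sites, columns are species.
counts = [
    [20, 5, 0],
    [15, 10, 5],
    [0, 5, 20],
]

residuals, chi_square, inertia = independence_residuals(counts)
for row in residuals:
    print([round(r, 2) for r in row])
print(round(chi_square, 2), round(inertia, 3))
```

Large positive or negative residuals mark site and species combinations that occur more or less often than independence would predict, which is the structure a CA map displays.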

Besides, I have read that in many circumstances the outcomes of MDS and CA are quite similar.

I look forward to hearing your point of view.

Regards

Gm

links:

1

2

Thanks for your reply.

OK! That is a problem I have heard about, and it also arises in archaeological ordination (i.e., seriation).

Nevertheless, Prof. Greenacre has recently tackled the problem of rare objects (i.e. "species" in ecology/biology AFAIK) in some articles of his:

-Tying up the loose ends in simple correspondence analysis -link-

-The contributions of rare objects in correspondence analysis -link-

Hope this can be of some use,

regards

Gm

NMDS is usually best because it is based on ranks. If the data are badly behaved, PCA can give really funny results, especially if the scales of your variables are very different. Also, PCA will consider samples with zeros as similar to one another, which in a biological study doesn't always make sense.

This would be a problem in my samples, because I have lots of double zeros when comparing two samples. Is this problem in PCA related to the fact that it is based on Euclidean distances? If I use MDS with a resemblance matrix based on Bray-Curtis distance, I would minimize this problem.
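
On the question of whether this is linked to Euclidean distance, here is a small Python sketch with made-up abundances for three samples. Samples 1 and 2 share no species, while samples 1 and 3 contain the same species at different abundances. The helper names are hypothetical; the Bray-Curtis formula used is the usual sum of absolute differences divided by the sum of all abundances.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two abundance vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def bray_curtis(a, b):
    """Bray-Curtis dissimilarity: sum|a_i - b_i| / sum(a_i + b_i)."""
    return sum(abs(x - y) for x, y in zip(a, b)) / sum(x + y for x, y in zip(a, b))


# Hypothetical abundances of three "species" in three samples.
s1 = [0, 1, 1]
s2 = [1, 0, 0]  # no species in common with s1
s3 = [0, 4, 4]  # same species as s1, higher abundances

# Euclidean distance: s1 is closer to s2 than to s3.
print(euclidean(s1, s2), euclidean(s1, s3))

# Bray-Curtis: s1 and s2 are maximally dissimilar; s1 is closer to s3.
print(bray_curtis(s1, s2), bray_curtis(s1, s3))
```

Euclidean distance places samples 1 and 2 closer together than samples 1 and 3 even though 1 and 2 have no species in common, whereas Bray-Curtis gives samples with no shared species the maximum dissimilarity of 1. This is one commonly cited way in which Euclidean distance, and therefore PCA on raw abundances, can mislead on sparse community data.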

I use the PERMANOVA add-on to PRIMER and the labDSV package in R.

There is also a FORTRAN programme available; see below.

http://www.stat.auckland.ac.nz/~mja/prog/PCO_UserNotes.pdf