Canonical Correspondence Analysis: how to interpret results

Hi, I am using Canonical Correspondence Analysis (CCA) to analyze phytolith abundances (similar to pollen) along environmental gradients. As I am new to CCA, I read some background information. The following section explains how to read the visualization of the results (Buttigieg & Ramette, 2014):

How to relate samples and taxa to environmental gradients in a triplot
To assess the impact of the environmental gradients on samples or taxa, project them orthogonally onto the arrows representing the environmental gradients (samples: figure a, taxa: figure b). A projection close to the arrowhead indicates strong representation; a projection far from the arrowhead, on the dashed prolongations of the gradients in figures a and b, indicates weak representation (Buttigieg & Ramette, 2014).

Figures a and b: how to relate samples (a) and taxa (b) to environmental gradients in a triplot? Samples (a) or taxa (b) are projected orthogonally on the arrows representing environmental gradients or their prolongations (dashed lines). Close to the arrowhead means strong representation, far from the arrowhead means weak representation (Buttigieg & Ramette, 2014).

My question relates to the dashed prolongation lines of the arrows (which represent environmental gradients). The arrow points in the direction of a stronger impact of that particular environmental gradient (Buttigieg & Ramette, 2014). But what if the orthogonal projection of a sample or taxon ends up at the far end of the dashed prolongation line of such an arrow? Does that mean that this environmental gradient has a strong negative impact on the overall abundance in that sample, or on the abundance of that individual taxon?

Thank you in advance



TS Contributor
I am not familiar with CCA, but it does have some aspects in common with other dimensionality-reduction techniques that I do know. I will try to give you my two cents on the matter.

I think that your interpretation is essentially correct.

If the orthogonal projection of a unit of analysis (a sample or a taxon in your example) lands on the dashed prolongation, that is, on the opposite side of the plot origin from the arrowhead, that should be interpreted as a sort of negative correlation between that unit and the variable represented by the vector. If the projection lands at the origin, this should indicate no correlation, while if it lands on the same side of the origin as the arrowhead, this (as you said) indicates a positive correlation. Of course, the larger the distance between the projection point and the origin (in either direction), the larger the correlation (positive or negative).
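To make this reading concrete, here is a minimal sketch in Python with NumPy. The arrow and taxon coordinates are made up for illustration and do not come from any real ordination; with real output you would plug in the biplot scores of the environmental variable and the sample or taxon scores on the same two axes. The sign of the scalar projection tells you on which side of the origin the projection lands.

```python
import numpy as np

def scalar_projection(point, arrow):
    """Signed length of the orthogonal projection of `point` onto `arrow`.

    Positive: the projection lands on the arrow's side of the origin.
    Negative: it lands on the dashed prolongation behind the origin.
    """
    point = np.asarray(point, dtype=float)
    arrow = np.asarray(arrow, dtype=float)
    return float(point @ arrow / np.linalg.norm(arrow))

# Hypothetical coordinates on the first two ordination axes (illustration only).
arrow = np.array([2.0, 1.0])
points = {
    "taxon_A": np.array([1.5, 1.0]),    # same side as the arrowhead
    "taxon_B": np.array([-1.0, 2.0]),   # perpendicular to the arrow
    "taxon_C": np.array([-2.0, -0.5]),  # opposite side of the origin
}

projections = {name: scalar_projection(xy, arrow) for name, xy in points.items()}
for name, value in projections.items():
    print(f"{name}: {value:+.2f}")
```

In this toy case taxon_A projects onto the arrow itself, taxon_B projects onto the origin, and taxon_C projects onto the prolongation, which matches the positive, zero, and negative readings described above.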

This worked example might help (LINK).

Hope this helps,

Hi, thank you for your response and the helpful link. Since time is limited and the workload overwhelming, I posted the question in several places. I was lucky enough to receive an answer from an absolute authority on the matter, one of the authors of the source article. It confirms what you say. To share the answer, I have pasted it below:

Dear Serge,

Close: it suggests that that explanatory variable negatively correlates with the abundances of species (or whatever your response variables are) and the sites that are ordinated close to them. You can check this in your data to confirm it's accurate. One would expect to see low values of that explanatory variable in sites with high abundances of the species ordinated away from the direction of the vector.

Keep in mind that the CCA only shows you the variation in the response matrix that can be explained by the explanatory matrix. If the proportion of constrained inertia is low (<30%), be careful how much you read into it.

Also, be careful when saying "impact": correlation is not causation (you need some stronger evidence that the explanatory variable is directly responsible for the response variables). There could be another factor driving both the response and explanatory variables in the directions you observed.
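For anyone who wants to run the two checks suggested in this reply, here is a rough sketch in Python with NumPy. All numbers are invented for illustration: `env` stands for one explanatory variable measured at six sites, `abundance` for the counts of one taxon that is ordinated away from that variable's arrow, and the two inertia values stand for what your CCA software reports. The function name `constrained_proportion` is just a helper made up for this example, not part of any CCA package.

```python
import numpy as np

# Hypothetical site data (illustration only).
env = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])            # explanatory variable per site
abundance = np.array([40.0, 35.0, 22.0, 15.0, 9.0, 3.0])  # taxon counts per site

# Check 1: does the taxon really have low abundance where the variable is high?
r = np.corrcoef(env, abundance)[0, 1]
print(f"Pearson r between variable and abundance: {r:.2f}")

def constrained_proportion(constrained_inertia, total_inertia):
    """Share of the total inertia that the explanatory matrix accounts for."""
    return constrained_inertia / total_inertia

# Check 2: how much of the variation does the constrained ordination show?
# Hypothetical values; take the real ones from your CCA summary.
proportion = constrained_proportion(0.42, 1.75)
print(f"Constrained proportion: {proportion:.0%}")
if proportion < 0.30:  # threshold quoted in the reply above
    print("Low constrained inertia: interpret the triplot with caution.")
```

A negative `r` here is only consistent with the triplot reading; as the reply stresses, it does not show that the variable causes the change in abundance.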
Continuing this topic, I would like to ask the following: how can the strength of the explanatory variables (the lengths of the arrows in the CCA triplot) be assessed? Do the results supply some numbers that relate to the strength of each explanatory variable?