I want to split a column of data into statistically significant groups

I have a column of average annual temperature data. According to theory, whether the temperature is very cold, very hot or moderate will have an effect on a species that I am interested in.
So what I want to know is whether there is a way to cluster the column of data into two or three groups, depending on how far each value lies from the mean of the whole column. This would allow me to test the theory that all 'hot years' elicit an expected response from the species in question.
I have looked at different cluster analyses and also at ANOVA, but (and my understanding is limited here) don't these require multiple columns of data? I just have one.
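One simple way to group years "depending on the difference from the mean" is to standardise each value (a z-score: its distance from the mean measured in standard deviations) and cut at a threshold. Below is a minimal Python sketch, assuming a cut-off of 1 standard deviation either side of the mean. The threshold, the function name `classify_by_deviation` and the temperature values are all hypothetical, and this is a descriptive split rather than a statistical test.

```python
from statistics import mean, stdev

def classify_by_deviation(temps, threshold=1.0):
    """Label each annual mean temperature as 'cold', 'moderate' or 'hot'.

    A year is 'hot' if it lies more than `threshold` sample standard
    deviations above the overall mean, 'cold' if more than `threshold`
    below it, and 'moderate' otherwise. The default of 1 SD is an
    arbitrary, illustrative choice.
    """
    mu = mean(temps)
    sd = stdev(temps)  # sample standard deviation (n - 1 denominator)
    labels = []
    for t in temps:
        z = (t - mu) / sd  # distance from the mean in SD units
        if z > threshold:
            labels.append("hot")
        elif z < -threshold:
            labels.append("cold")
        else:
            labels.append("moderate")
    return labels

# Hypothetical annual mean temperatures (deg C), not real data
temps = [9.8, 10.1, 10.0, 11.6, 8.3, 10.2, 9.9, 11.4, 8.6, 10.0]
print(classify_by_deviation(temps))
```

With these values the two warmest years come out as 'hot', the two coolest as 'cold' and the rest as 'moderate'. Moving the threshold changes how many years fall into each group, so it is worth checking how sensitive any later conclusion is to that choice.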

Many thanks in advance
The number of columns has a lot to do with the program you're using. It's simply how your data is organized to suit the program (e.g. in R you would have a single column with the response and a column with the levels of your explanatory variable; in SigmaPlot you would have a separate column for each level of the explanatory variable). Do you have an Excel file or something similar with your data? Is your response variable continuous, categorical, etc.?
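To illustrate the two layouts described above, here is a minimal Python sketch that turns a "wide" table (one column per level, SigmaPlot-style) into "long" rows (one response column plus one level column, R-style). The group names, the values and the helper `wide_to_long` are hypothetical.

```python
# Hypothetical wide layout: one column (list) per level of the
# explanatory variable, as SigmaPlot expects. Not real data.
wide = {
    "cold": [12.0, 14.5, 13.1],
    "moderate": [18.2, 17.9],
    "hot": [22.4, 21.0, 23.3],
}

def wide_to_long(columns):
    """Convert {level: [responses]} into (level, response) rows,
    i.e. one explanatory-variable column plus one response column."""
    rows = []
    for level, values in columns.items():
        for value in values:
            rows.append((level, value))
    return rows

for level, response in wide_to_long(wide):
    print(level, response)
```

The information is identical in both layouts; only the arrangement differs. In pandas, `pandas.melt` performs the same kind of reshaping on a DataFrame.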
My data is in Excel and I intend to use Minitab, SPSS or MATLAB for the analysis. There is no response variable as such; what I would like to do is group the annual temperature values depending on how different they are from the group as a whole. Perhaps that is oversimplifying things? There are (continuous) species distributions associated with each annual temperature, and what I really want to see is this: if the years are separated into groups/clusters, will the distributions for each of the groups be similar to one another?
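Once each year carries a group label, the comparison described here becomes a standard one-way layout: one column with a species measure per year and one column with the temperature group. A one-way ANOVA compares the group means of that species measure. The sketch below computes the F statistic by hand in Python so the arithmetic is visible; the species measure (day of first appearance), its values and the function name `one_way_anova_f` are hypothetical. Note that the test is applied to the species response, not to temperature: groups built from temperature will always differ in temperature, so testing that would be circular.

```python
from statistics import mean

def one_way_anova_f(*groups):
    """Return the one-way ANOVA F statistic for two or more groups.

    F = (between-group mean square) / (within-group mean square).
    """
    k = len(groups)                      # number of groups
    n = sum(len(g) for g in groups)      # total number of observations
    grand = mean([x for g in groups for x in g])
    # Variation of group means around the grand mean
    ss_between = sum(len(g) * (mean(g) - grand) ** 2 for g in groups)
    # Variation of observations around their own group mean
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    ms_between = ss_between / (k - 1)
    ms_within = ss_within / (n - k)
    return ms_between / ms_within

# Hypothetical day-of-year of first appearance, one value per year,
# grouped by that year's temperature class. Not real data.
cold_years = [130, 134, 138]
moderate_years = [122, 125, 128]
hot_years = [112, 116, 120]

f_stat = one_way_anova_f(cold_years, moderate_years, hot_years)
print(f_stat)  # F is about 17.78
```

For a p-value, `scipy.stats.f_oneway(cold_years, moderate_years, hot_years)` returns the same F statistic together with its p-value. ANOVA assumes roughly normal groups with similar variances; `scipy.stats.kruskal` is a rank-based alternative if those assumptions look doubtful.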
Typically, when you're working with phenological data you work with degree days: (Tmax + Tmin)/2 minus your base temperature. This way you can estimate the insect's development (and distribution, assuming all other factors such as host material are equal) from cumulative heat units. There are programs designed specifically to calculate degree-day models. I don't think I've ever read any literature describing distributions simply from hot/cold data, unless it's speculative. Typically it's a temperature model that allows you to say "this area is xx degrees cooler/hotter on average, so we expect insect y to emerge so many days later/earlier compared to this range". Unfortunately, most insects do not have a linear relationship with temperature, so simply splitting the data into groups doesn't really work.
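The degree-day calculation described above, (Tmax + Tmin)/2 minus the base temperature, accumulated day by day, can be sketched in Python as follows. Setting negative daily values to zero is an assumed convention (no development below the base temperature); the 10 °C base, the temperatures and the function names are hypothetical. Published degree-day models often use more refined methods than this simple average.

```python
def daily_degree_days(tmax, tmin, tbase):
    """Simple average method: (Tmax + Tmin) / 2 - Tbase.

    Negative values are set to zero (assumed convention: no
    development accumulates below the base temperature).
    """
    return max((tmax + tmin) / 2 - tbase, 0.0)

def cumulative_degree_days(daily_max, daily_min, tbase):
    """Running total of daily degree days (cumulative heat units)."""
    total = 0.0
    running = []
    for tmax, tmin in zip(daily_max, daily_min):
        total += daily_degree_days(tmax, tmin, tbase)
        running.append(total)
    return running

# Hypothetical daily max/min temperatures (deg C), hypothetical 10 deg C base
tmax = [18.0, 22.0, 12.0, 25.0]
tmin = [6.0, 10.0, 4.0, 13.0]
print(cumulative_degree_days(tmax, tmin, tbase=10.0))  # [2.0, 8.0, 8.0, 17.0]
```

The third day averages 8 °C, below the base, so it adds nothing and the running total stays at 8.0.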
jhartsho, thanks for your informed reply.
While I agree with you, I am not using degree days in this analysis but rather a different measure of physiological time.
But that doesn't affect what I am trying to do with the data mentioned above. Depending on whether you have a 'hot' or 'cold' year, the phenological response of the species will probably be very different. It could even alter the pattern or distribution in which the species is recorded from its first appearance. So what I'd like to do is robustly split the temperature data into groups (hot and cold, or even a third group) and check whether the phenological characteristics of the species can be partly explained by this categorisation.
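A data-driven alternative to fixed thresholds is to cluster the single temperature column directly; clustering does not need more than one column. With k-means in one dimension, each year is assigned to the nearest of k group centres, each centre is moved to the mean of its group, and the two steps repeat until nothing changes. The pure-Python sketch below assumes k (2 or 3) is chosen in advance and starts the centres at evenly spaced positions in the sorted data so the result is deterministic; the function `kmeans_1d` and the temperature values are hypothetical, not a library API or real data.

```python
from statistics import mean

def kmeans_1d(values, k, n_iter=100):
    """Cluster one column of numbers into k groups (Lloyd's k-means).

    Returns (labels, centres), relabelled so that group 0 has the
    lowest centre (coldest) and group k - 1 the highest (hottest).
    """
    data = sorted(values)
    n = len(data)
    # Initial centres: evenly spaced positions in the sorted data
    centres = [data[(2 * i + 1) * n // (2 * k)] for i in range(k)]
    for _ in range(n_iter):
        # Assignment step: index of the nearest centre for each value
        labels = [min(range(k), key=lambda j: abs(v - centres[j])) for v in values]
        # Update step: move each centre to the mean of its members
        new_centres = []
        for j in range(k):
            members = [v for v, lab in zip(values, labels) if lab == j]
            new_centres.append(mean(members) if members else centres[j])
        if new_centres == centres:
            break  # converged: assignments no longer change the centres
        centres = new_centres
    # Relabel so that labels are ordered by centre (coldest first)
    order = sorted(range(k), key=lambda j: centres[j])
    rank = {old: new for new, old in enumerate(order)}
    labels = [rank[min(range(k), key=lambda j: abs(v - centres[j]))] for v in values]
    return labels, [centres[j] for j in order]

# Hypothetical annual mean temperatures (deg C), not real data
temps = [9.8, 10.1, 10.0, 11.6, 8.3, 10.2, 9.9, 11.4, 8.6, 10.0]
labels, centres = kmeans_1d(temps, k=3)
print(labels)   # 0 = cold, 1 = moderate, 2 = hot
print(centres)
```

k-means always returns k groups, even when the data show no real structure, so the clusters themselves are not evidence of "statistically different" groups; the question of interest is whether the species response differs between them. MATLAB's `kmeans` function (Statistics and Machine Learning Toolbox) applies k-means to a column vector in the same spirit.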

Many thanks