# I want to split a column of data into statistically significant groups

#### catbelize

##### New Member
I want to split a column of data into statistically different groups

I have a column of average annual temperature data. According to theory, whether a year is very cold, moderate, or very hot will have an effect on a species that I am interested in.
So what I want to know is whether there is a way to cluster the column of data into two or three groups, depending on how far each value lies from the mean of the overall group. This would allow me to test the theory that all 'hot years' elicit an expected response from the species in question.
I have looked at different cluster analyses and also at ANOVA, but (and my understanding is limited here) do these not require multiple columns of data? I have just one.
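As a rough illustration of the grouping described above, here is a minimal Python sketch that labels each year by how far its temperature lies from the overall mean, measured in standard deviations. The function name `group_by_deviation`, the one-standard-deviation cutoff, and the temperature values are all assumptions made up for illustration; the cutoff in particular is arbitrary and would need to be justified for real data.

```python
from statistics import mean, stdev

def group_by_deviation(temps, threshold=1.0):
    """Label each value 'cold', 'moderate', or 'hot' by its z-score.

    threshold is the number of sample standard deviations from the mean
    at which a year counts as cold or hot (an arbitrary, assumed cutoff).
    """
    mu = mean(temps)
    sigma = stdev(temps)  # sample standard deviation
    labels = []
    for t in temps:
        z = (t - mu) / sigma
        if z <= -threshold:
            labels.append("cold")
        elif z >= threshold:
            labels.append("hot")
        else:
            labels.append("moderate")
    return labels

# Made-up annual mean temperatures (deg C), for illustration only
annual_temps = [9.1, 10.4, 10.0, 12.3, 9.8, 10.2, 7.6, 10.1]
labels = group_by_deviation(annual_temps)
for temp, label in zip(annual_temps, labels):
    print(temp, label)
```

Only a single column is needed for this step. The resulting labels can then serve as a grouping factor when comparing the species data between groups.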


#### jhartsho

##### New Member
The number of columns has a lot to do with the program you're using. It simply reflects how each program expects the data to be organized (e.g. in R you would have a single column with the response and a column with the levels of your explanatory variable, whereas in SigmaPlot you would have a separate column for each level of the explanatory variable). Do you have an Excel file or something similar with your data? Is your response variable continuous, categorical, etc.?
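To make the two layouts concrete, here is a small Python sketch that converts a "one column per level" layout into a "response column plus level column" layout. The level names and numbers are invented for illustration and are not taken from any real dataset.

```python
# Wide layout: one column (list) per level of the explanatory variable.
# Values are made up for illustration.
wide = {
    "cold": [1.2, 0.9],
    "hot": [2.1, 2.4, 2.0],
}

# Long layout: one row per observation, as (level, response) pairs.
long_rows = [
    (level, value)
    for level, values in wide.items()
    for value in values
]

for level, value in long_rows:
    print(level, value)
```

Both layouts hold the same observations; only the arrangement differs.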

#### catbelize

##### New Member
My data is in Excel and I intend to use Minitab, SPSS or MATLAB for the analysis. There is no response variable as such, as what I would like to do is group the annual temperature values depending on how different they are from the group as a whole. Perhaps that is oversimplifying things? There are (continuous) species distributions associated with each annual temperature, and what I really want to see is this: if the years are separated out into groups/clusters, will the distributions within each group be similar to one another?

#### jhartsho

##### New Member
Typically when you're working with phenological data you are working with degree days: take the daily mean temperature, (Tmax + Tmin)/2, and then subtract your base temperature. This way you can determine the insect's development (and distribution, assuming all other things like host material are equal) based on cumulative heat units. There are programs designed specifically to calculate degree day models. I don't think I've ever read any literature describing distributions simply by hot/cold type of data, unless it's speculative. Typically it's a temperature model that allows you to say "this area is xx degrees cooler/hotter on average so we expect insect y to emerge so many days later/earlier compared to this range". Unfortunately, most insects do not have a linear relationship with temperature, so simply splitting it into groups doesn't really work.
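The degree-day calculation above can be sketched in a few lines of Python. The function name, the base temperature of 10, and the daily temperatures are assumptions for illustration. Clamping negative values to zero is a common convention that goes beyond the formula as stated in the post, so it is marked as an assumption in the code.

```python
from itertools import accumulate

def degree_days(t_max, t_min, t_base):
    """Daily degree days: mean of max and min temperature minus the base."""
    dd = (t_max + t_min) / 2 - t_base
    # Assumption: days averaging below the base contribute zero heat units
    return max(0.0, dd)

# Made-up daily (Tmax, Tmin) pairs in deg C, for illustration only
daily_temps = [(18.0, 6.0), (22.0, 10.0), (12.0, 4.0), (25.0, 13.0)]
t_base = 10.0  # hypothetical base temperature

daily_dd = [degree_days(hi, lo, t_base) for hi, lo in daily_temps]
cumulative_dd = list(accumulate(daily_dd))  # running total of heat units

print(daily_dd)
print(cumulative_dd)
```

The running total is the "cumulative heat units" figure that a degree day model compares against a development threshold.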