# Thread: How do I select a method of regression analysis - Allometry - Animal Growth - Gnu

1. ## How do I select a method of regression analysis - Allometry - Animal Growth - Gnu

I am analysing the growth of the wildebeest or gnu.
To do this I am comparing the changes in A: Body length (L), B: Girth (G) and C: Shoulder height (Sh) to Body mass (Mb). I have a sample of 42 animals, each measured once (after being culled). They range in body mass from foetuses of about 20kg to adult bulls of about 260kg.

I would like to see whether Sex and Season have an effect on the regression equations generated from my data. (I want to know, for example, whether male and female girths change at the same rate during growth.) In the end I should produce formulae that could be used to predict a wildebeest's body mass from any of the other measurements or from a combination of measurements. In that case I want to know whether the same formula can be used for males and females.

I would also like to be able to use my data to test the predictive power of some equations already published on Wildebeest.

On visual inspection the data fit a power curve. (I plotted each of L, G and Sh against Body Mass.)

Guided by what many students of allometry have done I plotted power curves by fitting straight lines to the data after I log transformed it. I then also plotted the 95% confidence intervals of these curves. I then compared the data in different groups (for example Male and Female) by plotting the male data with 95% Confidence intervals, then plotting the female data on the same set of axes. If most of the female data fell within the 95% Confidence intervals of the male data I concluded that the data sets were not different.
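For readers following along, the log-transform-then-fit-a-line step can be sketched roughly as below. This is only an illustration: the girth and mass values are made up (generated from an arbitrary curve, not taken from the wildebeest data), and `fit_power_law` is a hypothetical helper name.

```python
import math

def fit_power_law(x, y):
    """Fit y = a * x**b by ordinary least squares on log(x), log(y)."""
    lx = [math.log(v) for v in x]
    ly = [math.log(v) for v in y]
    n = len(lx)
    mx = sum(lx) / n
    my = sum(ly) / n
    sxx = sum((u - mx) ** 2 for u in lx)
    sxy = sum((u - mx) * (v - my) for u, v in zip(lx, ly))
    b = sxy / sxx                # slope of the log-log line = exponent
    a = math.exp(my - b * mx)    # back-transformed intercept = coefficient
    return a, b

# Made-up girths (cm) and masses (kg) lying exactly on Mb = 0.0001 * G**2.8.
girth = [60.0, 80.0, 100.0, 120.0, 140.0, 160.0]
mass = [0.0001 * g ** 2.8 for g in girth]

a, b = fit_power_law(girth, mass)
print(f"Mb = {a:.6f} * G^{b:.3f}")
```

Because the example data have no scatter, the fit simply recovers the curve they were generated from. With real measurements the same two numbers would come with standard errors, which is where the confidence intervals come from.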

I used these simple methods because I didn't think that my data lent themselves to more formal analysis.

The more I read the less I like my methods. I now question the validity of my original regressions because:

1. I do not believe that any of the variables are normally distributed in the population. (In a natural population of animals of all ages there will not be more animals of average body mass, girth or length than animals with extreme values of those measurements.)

2. I have measured both the dependent and independent variables, so they all have some error.

3. I did not select the individuals randomly from the population but rather tried (not very successfully) to have the same number of animals from each body mass class.

4. The residuals will be dependent on the value of X. (The foetuses weighing 20 kg will have a variation in girth of only a few cm, whereas the natural variation in girth of adults weighing 200kg will be far more).

5. The variables definitely affect one another. Mass, Girth, Length and Height are all related. Sex and Season are not.

I find that my data do not lend themselves to simple analysis. If anyone can recommend a new starting point for me, or a better method of regression analysis to begin with, I would be very grateful. Am I possibly mistaken? Will I still be able to make useful predictions using the methods that I have already used and, if so, how can I improve them?

My supervisor suggests that I use the t-test to compare the measured body masses with those predicted by the East African equations, or the Male equation on the Female data to see whether the two sets of Data are significantly different. To my mind that is a totally inappropriate test to use on these data. Is that so?
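One possible reading of the supervisor's suggestion is a paired t-test on the differences between measured and predicted mass for each animal. The sketch below assumes that reading. The numbers are invented and `paired_t_statistic` is a hypothetical helper, so it only shows what the test computes, not whether it is the right test for these data.

```python
import math
from statistics import mean, stdev

def paired_t_statistic(measured, predicted):
    """Return (t, degrees of freedom) for the mean of measured - predicted."""
    diffs = [m - p for m, p in zip(measured, predicted)]
    n = len(diffs)
    t = mean(diffs) / (stdev(diffs) / math.sqrt(n))
    return t, n - 1

# Invented masses (kg): measured values and values predicted by some equation.
measured = [24.0, 58.0, 96.0, 141.0, 188.0, 243.0]
predicted = [26.5, 55.0, 99.0, 137.5, 193.0, 236.0]

t, df = paired_t_statistic(measured, predicted)
print(f"t = {t:.3f} on {df} degrees of freedom")
```

Note that this statistic only looks at the average difference. An equation that over-predicts small animals and under-predicts large ones by similar amounts could still give a t value near zero.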

I have looked at beginning with multiple linear regression but the same constraints as for simple linear regression will apply.

I looked at Generalized Linear Regression (mainly because it was recommended to me) but I can't even begin to understand how to choose which parameters to select for my data.

Have a Great Day

Gnu

2. ## Re: How do I select a method of regression analysis - Allometry - Animal Growth - Gnu

1. I do not believe that any of the variables are normally distributed in the population. (In a natural population of animals of all ages there will not be more animals of average body mass, girth or length than animals with extreme values of those measurements.)
It does not matter if the population or the sample is normally distributed. It matters if the residuals are. You should fit the regression and check whether its residuals are normally distributed before you reject a method on these grounds. Also, normality is less critical with a large sample size because of the central limit theorem. You can often transform the data to bring the residuals closer to normal as well.
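As a rough sketch of checking the residuals rather than the raw variables, the code below fits a line to log-transformed values and computes the correlation behind a normal probability plot (sorted residuals against normal quantiles). The data are invented, the function names are hypothetical, and the plotting positions (i - 0.375) / (n + 0.25) are one common choice among several.

```python
import math
from statistics import NormalDist, mean

def ols_residuals(x, y):
    """Residuals from the least-squares line y = a + b*x."""
    mx, my = mean(x), mean(y)
    sxx = sum((u - mx) ** 2 for u in x)
    b = sum((u - mx) * (v - my) for u, v in zip(x, y)) / sxx
    a = my - b * mx
    return [v - (a + b * u) for u, v in zip(x, y)]

def normal_plot_correlation(residuals):
    """Correlation between sorted residuals and standard normal quantiles."""
    n = len(residuals)
    ordered = sorted(residuals)
    q = [NormalDist().inv_cdf((i - 0.375) / (n + 0.25)) for i in range(1, n + 1)]
    mr, mq = mean(ordered), mean(q)
    num = sum((r - mr) * (t - mq) for r, t in zip(ordered, q))
    den = math.sqrt(sum((r - mr) ** 2 for r in ordered)
                    * sum((t - mq) ** 2 for t in q))
    return num / den

# Invented girths (cm) and masses (kg), log-transformed before fitting.
log_girth = [math.log(g) for g in [60, 75, 90, 105, 120, 135, 150, 165]]
log_mass = [math.log(m) for m in [22, 41, 62, 95, 128, 170, 215, 262]]

residuals = ols_residuals(log_girth, log_mass)
r = normal_plot_correlation(residuals)
print(f"normal probability plot correlation: {r:.3f}")
```

Values close to 1 are consistent with roughly normal residuals. A formal test or a plot of the residuals would be the usual next step.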

2. I have measured both the dependent and independent variables, so they all have some error.
That is particularly critical with the IV; the error in the DV will end up in the error term. In practice I would suspect this occurs with most research. I don't think you are going to find a method that somehow addresses this type of error. You can do outlier analysis and then consider whether this indicates measurement error in a case [although you won't know for sure if it is]. You can perform sensitivity analysis and determine what would happen if the measurements were off by, say, ten percent. But if there is measurement error, there is. Other than not doing non-exploratory statistics I don't think there are many options here.
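A sensitivity analysis of this kind could be sketched as follows. The code perturbs each X value by a random error of up to ten percent, refits the log-log line each time, and reports the range of slopes. The data, the uniform error model, the number of trials and the function names are all assumptions made for illustration.

```python
import math
import random

def loglog_slope(x, y):
    """Slope of the least-squares line through (log x, log y)."""
    lx = [math.log(v) for v in x]
    ly = [math.log(v) for v in y]
    mx, my = sum(lx) / len(lx), sum(ly) / len(ly)
    sxy = sum((u - mx) * (v - my) for u, v in zip(lx, ly))
    sxx = sum((u - mx) ** 2 for u in lx)
    return sxy / sxx

def slope_sensitivity(x, y, rel_error=0.10, n_trials=1000, seed=0):
    """Smallest and largest slope after random relative errors are added to x."""
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    slopes = []
    for _ in range(n_trials):
        noisy_x = [v * rng.uniform(1 - rel_error, 1 + rel_error) for v in x]
        slopes.append(loglog_slope(noisy_x, y))
    return min(slopes), max(slopes)

# Invented girths (cm) and masses (kg).
girth = [60, 75, 90, 105, 120, 135, 150, 165]
mass = [22, 41, 62, 95, 128, 170, 215, 262]

low, high = slope_sensitivity(girth, mass)
print(f"slope without added error: {loglog_slope(girth, mass):.3f}")
print(f"slope range with up to 10% error in X: {low:.3f} to {high:.3f}")
```

If the range is narrow compared with the differences you care about (for example, between males and females), measurement error of that size is probably not driving the conclusions.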

3. I did not select the individuals randomly from the population but rather tried (not very successfully) to have the same number of animals from each body mass class.
Then you can't generalize to a larger population. You can essentially do a case study and note the limitation of this design. I personally never found this approach very satisfying, but it is common in analysis. You could suggest follow up research using a random sample.

4. The residuals will be dependent on the value of X. (The foetuses weighing 20 kg will have a variation in girth of only a few cm, whereas the natural variation in girth of adults weighing 200kg will be far more).
This is a violation of the regression assumption of constant error variance (homoskedasticity) rather than of independence. It is in fact common and is called heteroskedasticity. One solution is weighted least squares. You should look up heteroskedasticity for other solutions.
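A minimal weighted least squares sketch for a single predictor is given below. It assumes the spread of the residuals grows in proportion to X, so each point is weighted by 1 / X^2. That weighting is an assumption for the example, not something established for these data, and the numbers and the function name are invented.

```python
def weighted_least_squares(x, y, w):
    """Fit y = a + b*x by minimising sum(w_i * (y_i - a - b*x_i)**2)."""
    sw = sum(w)
    mx = sum(wi * xi for wi, xi in zip(w, x)) / sw   # weighted mean of x
    my = sum(wi * yi for wi, yi in zip(w, y)) / sw   # weighted mean of y
    sxx = sum(wi * (xi - mx) ** 2 for wi, xi in zip(w, x))
    sxy = sum(wi * (xi - mx) * (yi - my) for wi, xi, yi in zip(w, x, y))
    b = sxy / sxx
    a = my - b * mx
    return a, b

# Invented masses (kg) as X and girths (cm) as Y.
mass = [20.0, 50.0, 90.0, 130.0, 170.0, 210.0, 260.0]
girth = [58.0, 84.0, 108.0, 121.0, 139.0, 146.0, 163.0]

# Assumed error model: standard deviation proportional to mass.
weights = [1.0 / m ** 2 for m in mass]

a_w, b_w = weighted_least_squares(mass, girth, weights)
a_o, b_o = weighted_least_squares(mass, girth, [1.0] * len(mass))  # equal weights = OLS
print(f"weighted:   girth = {a_w:.2f} + {b_w:.4f} * mass")
print(f"unweighted: girth = {a_o:.2f} + {b_o:.4f} * mass")
```

Log-transforming both variables, as in the original post, often evens out this kind of spread as well, so it is worth plotting the residuals of the log-log fit before deciding that weights are needed.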

5. The variables definitely affect one another. Mass, Girth, Length and Height are all related. Sex and Season are not.
It does not matter if the IVs influence each other or not [commonly they will with real data]. It matters if you have multicollinearity, which you can test for with various tests. There are diagnostic packages for this in all commercial software. If you do have multicollinearity there are no easy solutions. You can end up getting stuck with a good overall model, but not being able to speak to what individual variables do.
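One widely used diagnostic is the variance inflation factor (VIF). A sketch is below, using the fact that the VIFs are the diagonal entries of the inverse of the predictors' correlation matrix. The measurements are invented and the function names are hypothetical. A commonly quoted rule of thumb treats VIFs above about 10 as a sign of trouble, though that cut-off is a convention rather than a strict rule.

```python
import math

def correlation(u, v):
    """Pearson correlation between two equal-length lists."""
    n = len(u)
    mu, mv = sum(u) / n, sum(v) / n
    suv = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    suu = sum((a - mu) ** 2 for a in u)
    svv = sum((b - mv) ** 2 for b in v)
    return suv / math.sqrt(suu * svv)

def invert(matrix):
    """Invert a small square matrix by Gauss-Jordan elimination with pivoting."""
    n = len(matrix)
    aug = [list(row) + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [v - f * c for v, c in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]

def variance_inflation_factors(columns):
    """VIF for each predictor column (diagonal of the inverse correlation matrix)."""
    k = len(columns)
    corr = [[correlation(columns[i], columns[j]) for j in range(k)] for i in range(k)]
    inv = invert(corr)
    return [inv[j][j] for j in range(k)]

# Invented body length, girth and shoulder height (cm) for eight animals.
length = [80.0, 100.0, 120.0, 150.0, 170.0, 190.0, 200.0, 210.0]
girth = [62.0, 75.0, 95.0, 110.0, 135.0, 150.0, 158.0, 170.0]
height = [55.0, 70.0, 78.0, 98.0, 105.0, 120.0, 122.0, 131.0]

vifs = variance_inflation_factors([length, girth, height])
for name, v in zip(["length", "girth", "height"], vifs):
    print(f"VIF({name}) = {v:.1f}")
```

Body measurements that all track overall size will usually give large VIFs, which matches the warning above: the combined model can predict well while the individual coefficients stay poorly determined.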

The real key here is to run various diagnostics and see what problems you actually have. I would think all you can do, given the data limits, is exploratory analysis. Your problem is at heart your design (how you gathered the data and the relationship of the variables), and no statistics addresses that. At least that is my opinion (other more expert commentators may have other solutions).

How is your dependent variable measured?

4. ## Re: How do I select a method of regression analysis - Allometry - Animal Growth - Gnu

1.
It does not matter if the population or the sample is normally distributed. It matters if the residuals are.
Normality: I will have a look at the distribution of my residuals. In theory they should be normally distributed.
2.
Other than not doing non-exploratory statistics I don't think there are many options here.
Errors: Thanks, I'll just have to work with what I have.
3. As you say many of my problems are due to the limitations of my study design. I can at least justify my study design. Randomly selecting 40 wildebeest from a game reserve would be difficult, probably much easier if my sample size was bigger.
4.
It matters if you have multicollinearity
I will test for multicollinearity now and report on it.

How is your dependent variable measured?
My dependent variable is always Body Mass (Mb) in kg, measured with a spring balance accurate to 500 g. Animals of less than 200 kg were weighed whole; those of more than 200 kg were dissected and then weighed.

My supervisor suggests that I use the t-test to compare the measured body masses with those predicted by the East African equations, or the Male equation on the Female data to see whether the two sets of Data are significantly different. To my mind that is a totally inappropriate test to use on these data. Is that so?
Can anyone help with this t-test question? Should I rather open another thread for it?

Thank you all for reading this post.

Gnu
