Well, one of the questions in my book wants us to "predict the litter size from the age of the breeding female at conception. From the following data, would you say that this is a legitimate goal? Support your explanation with statistical analysis." I have the answer in the book for this question, but if I didn't, I would not know how to go about answering it. That is what I need help with: knowing what to do.
You are asking for a lot; I hope my little bit is helpful.
In a problem where they ask you to "predict <y variable> from <x variable>", you should do a linear regression.
Once you have fitted a linear regression, you have a formula such as y = 2x + 1, so for a given x you can determine what y should be.
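To make the "formula" idea concrete, here is a minimal sketch of a least-squares fit in plain Python. The numbers are invented for illustration (they are not the book's data) and were chosen to lie exactly on y = 2x + 1; the names fit_line and predict are just placeholders.

```python
# Made-up illustration data (NOT the book's data); the points lie exactly on y = 2x + 1.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [3.0, 5.0, 7.0, 9.0, 11.0]

def fit_line(xs, ys):
    """Return (slope, intercept) of the least-squares line through the points."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of cross-deviations and sum of squared x-deviations.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

slope, intercept = fit_line(xs, ys)

def predict(x):
    """Predicted y for a given x, using the fitted line."""
    return slope * x + intercept

print(slope, intercept)  # 2.0 1.0
print(predict(6.0))      # 13.0
```

With real data the points will not sit exactly on a line, so the fitted formula gives an estimate of y rather than an exact value.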
On the other hand, having a correlation, you cannot determine Y for given values of X. Instead, a correlation that is high (close to +1, or close to -1) can tell you that there is a strong linear dependence between the X and Y variables. That is, you can use the correlation to judge how much you can rely on your linear regression.
You often want to check correlations if you are planning to do regressions (linear, logistic, etc.).
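As a rough sketch of that check, the Pearson correlation can be computed directly from its definition. Again the data are invented for illustration only, and pearson_r is a placeholder name.

```python
import math

# Made-up illustration data (NOT the book's data).
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.0, 1.0, 4.0, 3.0, 5.0]

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

r = pearson_r(xs, ys)
print(r)  # 0.8
```

A value near +1 or -1 suggests a straight line describes the data well; a value near 0 suggests it does not.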
On the other hand, having a correlation, you cannot determine Y for given values of X. Instead, a correlation that is high (close to +1, or close to -1) can tell you that there is a strong linear dependence between the X and Y variables.
I don't quite 'get' your first sentence there. If you want to determine y for given values of x (i.e. regression), then you probably want a good correlation. I suspect you were getting at the fact that sometimes it doesn't make sense to treat one of the variables as the predictor and the other as the response, and that in that case you can still look at correlations?
1) Correlation gives a measure of strength of linear relationship between y and x.
2) Regression tells you how much of the variance in y can be explained by x.
1) and 2) are related: 1) gives the correlation r, and 2) gives the explained variance, which is r squared (r^2). So a lower correlation means lower explained variance and poorer prediction.
Sometimes you could have a reasonable-looking correlation, say 0.5. This means the regression relationship explains only 25% of the total variance (0.5^2 = 0.25), so you may not get a reasonable prediction of y using x.
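The link between r and explained variance can be checked numerically. The sketch below uses invented numbers (not the book's data) picked so that r comes out at 0.5; it fits the least-squares line and compares r^2 with the fraction of variance the line explains, 1 - SS_residual / SS_total.

```python
import math

# Made-up illustration data (NOT the book's data), chosen so that r = 0.5.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [3.0, 1.0, 4.0, 2.0, 5.0]

n = len(xs)
mean_x = sum(xs) / n
mean_y = sum(ys) / n
sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
sxx = sum((x - mean_x) ** 2 for x in xs)
syy = sum((y - mean_y) ** 2 for y in ys)  # total sum of squares of y

# Correlation coefficient.
r = sxy / math.sqrt(sxx * syy)

# Least-squares line and its residual sum of squares.
slope = sxy / sxx
intercept = mean_y - slope * mean_x
ss_residual = sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))

# Fraction of the variance in y explained by the line.
explained = 1.0 - ss_residual / syy

print(r)          # 0.5
print(r ** 2)     # 0.25
print(explained)  # 0.25
```

For a simple linear regression with one x variable, the two quantities agree, which is why a correlation of 0.5 leaves 75% of the variance in y unexplained.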