# Thread: Multivariate Regression - I think!

1. ## Multivariate Regression - I think!

Hi,
I'm performing a psychological study about gender differences in attraction. Participants select between a number of figures that vary in Waist-To-Hip Ratio (WHR) or Shoulder-To-Hip Ratio (SHR). Figures varying in WHR/SHR also covary in BMI, so I ask participants to estimate the BMI of their chosen figure, and plan to factor this variance out.
Thus, my aim is to find out if, with variance due to BMI removed, the genders vary in their selection of the most attractive figure - does this sound ok up to this point?

I initially planned to use a Two-Way ANOVA, as this allows you to factor out a variable. However, I think that this can only be used for nominal variables, such as gender? And EstimatedBMI and FigureChoice are scale variables (there are 9 figures to choose between).

I then thought of multivariate regression as another technique that allows one to isolate how much of the variance is explained by different factors. My thought is that my correlation table would consist of FigureChoice, EstimatedBMI, and Gender. How does this sound - am I on the right track?

If so, does this translate in SPSS13 into Analyze > General Linear Model > Multivariate...? If so, are FigureChoice and EstimatedBMI dependent variables and gender a Fixed Factor? Or perhaps EstimatedBMI would be a covariate?

Iain

2. I would say that Figure Choice is the dependent variable, Gender is the independent variable, and Estimated BMI is a covariate.

An ANOVA would work fine here - you would have a categorical independent variable (Gender), a scale-type dependent variable (Figure Choice), and a covariate (Estimated BMI).
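For what it's worth, the covariate-adjusted comparison can be sketched outside SPSS too. Here's a minimal Python/numpy sketch of that ANCOVA-style test on made-up data (all variable names and values are hypothetical, just mirroring the thread): fit the model with and without Gender, and F-test the difference.

```python
# ANCOVA "by hand": compare a full model (BMI covariate + Gender factor)
# against a reduced model (BMI covariate only). Hypothetical data.
import numpy as np

rng = np.random.default_rng(0)
n = 60
gender = np.repeat([0, 1], n // 2)            # 0 = female, 1 = male
bmi = rng.normal(22, 2, n)                    # covariate: estimated BMI
figure = 0.3 * bmi + 0.8 * gender + rng.normal(0, 1, n)  # DV: figure choice

def rss(X, y):
    """Residual sum of squares from a least-squares fit."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    return float(resid @ resid)

ones = np.ones(n)
full = np.column_stack([ones, bmi, gender])   # covariate + factor
reduced = np.column_stack([ones, bmi])        # covariate only

# F-test for the Gender effect, adjusted for the covariate
df_num = full.shape[1] - reduced.shape[1]     # 1
df_den = n - full.shape[1]                    # 57
F = ((rss(reduced, figure) - rss(full, figure)) / df_num) / (rss(full, figure) / df_den)
print(round(F, 2))
```

This is the same model comparison SPSS's GLM performs under the hood when Gender is a fixed factor and EstimatedBMI a covariate.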

Just one comment/question -

I assume that the figures that vary on SHR don't vary at all on WHR (or very little), and vice-versa?

If they did, that could potentially cloud the results (i.e., for Gender = male, is it the WHR, or a linear combination of WHR + SHR?)

3. John, thanks very much for your prompt reply - apologies for not replying sooner - I've been thinking things through. Firstly, however, an answer to your question:

The female figures vary in waist size, and thus in WHR. Thus, they also vary in BMI, but, along the lines of your question, also in shoulder-waist ratio. Males vary in shoulder size, and thus SHR. Thus, they also vary in BMI, and shoulder-waist ratio. Are you raising this issue to point out that, like the variance in BMI that I intend to include in my analysis, variance in these additional ratios might explain to some degree the variance in attractiveness? If so, this is a good point. However, given the scope of this project, I think I will have to limit my statistical analysis to the combination of BMI and Waist (female) / Shoulder (male) size, and note the lack of statistical analyses of further body shape variations as a weakness of my study.

If I may, I'd like to outline my understanding of your feedback in the context of one hypothesis: The selection of most attractive figures will not exhibit gender differences.
• IV1: Sex
• IV2: Nationality
• IV3: Location of being brought up
• IV4: Visual Condition (media/control)
• DV1: IdealMaleBMI (estimated BMI)
• DV2: IdealMaleFig (shape)
• DV3: IdealFemaleBMI (estimated BMI)
• DV4: IdealFemaleFig (shape)
I imagine then, that my first task is to assess the normality. I intend to split the file by the IVs and get histograms, and then descriptives for each group, including Kurtosis & Skewness. I will remove outliers where it seems appropriate.

Let's assume normality. My intention then is to use a Univariate GLM to compare the genders' average IdealMaleFig selections for each group (nationality, location, visual condition). To do this, I would stop using sex to split the file, and then apply sex as a fixed factor, apply IdealMaleFig as the dependent variable, and apply IdealMaleBMI as a covariate. I would then do the same for IdealFemaleFig and IdealFemaleBMI. If neither test produces significant differences, the hypothesis is supported.

This sounds cool - what do you think?
Many thanks,
Iain

4. Sounds good to me.

5. This is too cool for school! You're so quick! I'll get to work!

Thanks so much!
Iain

6. You're welcome - let us know how it comes out.

7. Hello again, another question. In my earlier post, my intention was to perform several ANCOVAs, one for each combination of the three IVs: nationality, location, and visual condition (based on a file split using these variables in SPSS).

I'm aware that multiple comparisons increase the probability of a Type 1 error, however. Do you think this applies here? Should I instead be including these variables in the ANCOVA, and thus having four fixed factors (when you include gender)? This makes for a messy output, with 4 main effects and 11 interactions! Are there implications of such a complex ANCOVA that I should be aware of?

Many thanks,
Iain

8. Personally, my inclination is to keep things simple - just my opinion.

Don't worry so much about increasing Type I errors - it doesn't appear that your original hypotheses included any conjecture(s) about interactions among the IV's, but try it both ways - see what you get. The interactions may prove interesting!
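Just to illustrate the familywise-error point with plain arithmetic (no data involved, and assuming the tests are independent, which is a simplification):

```python
# How the chance of at least one false positive grows with the number of
# tests, and the Bonferroni-adjusted per-test alpha that caps it at .05.
alpha = 0.05
for k in (1, 3, 6, 12):                   # number of separate tests
    familywise = 1 - (1 - alpha) ** k     # P(at least one Type I error)
    bonferroni = alpha / k                # per-test alpha keeping the familywise rate <= .05
    print(k, round(familywise, 3), bonferroni)
```

With six independent tests at alpha = .05, the familywise rate is already about .26, which is the worry behind the question above.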

9. Thanks very much John, I will consider both - initial indications are that the interactions are not significant - thus I guess it's not a problem to do separate tests?

I am bumbling through another hypothesis. When subjects select a figure, they also rate it for attractiveness, health, and fertility. The hypothesis is that fertility and particularly health predict attractiveness. I expect the relationship to be curvilinear (health and fecundity will increase as figure WHR increases to 0.7 (the most attractive WHR), but decrease as WHR increases further).

My approach has been to do a correlation - I graphed an overlay scatterplot with Figure-Fecundity and Figure-Health pairs, and then got a Fit Line, and compared the r^2 values. This produces what I expect/hope to find, but does this sound appropriate to you? Further, given my expectation of curvilinearity, and that the data looked roughly curvilinear, I set the Fit Line to be quadratic. Is there a test however, to know if this is appropriate or not?

Additionally, however, I'm getting confused because I have two sets of data about attractiveness. Subjects select the most attractive figure (1-9), but they also rate that figure for how attractive it is (0-100). Should I be combining these numbers somehow, and be correlating this with fecundity and health? In my correlations above, I also included Figure-Attractiveness, and found this to have the highest r^2 value.

Thanks again,
Iain

10. Hyperstat has a good discussion of "trend analysis" which covers ways to test (it's basically a post-hoc test done after ANOVA) whether your relationship fits a polynomial / curvilinear trend:

http://davidmlane.com/hyperstat/B103184.html

You could also do something quickly in Excel - do an XY scatterplot, add a trend line, and specify 2nd(?) degree polynomial, and include the R^2 and best-fit equation on the chart.
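The same quick check works in Python if Excel isn't handy. A sketch on made-up ratings that peak near WHR = 0.7 (all numbers hypothetical): fit degree-1 and degree-2 polynomials and compare the R^2 values.

```python
# Compare linear vs quadratic fit lines on hypothetical health ratings
# that follow an inverted U peaking at WHR = 0.7.
import numpy as np

rng = np.random.default_rng(1)
whr = np.linspace(0.6, 1.0, 45)
health = 100 - 900 * (whr - 0.7) ** 2 + rng.normal(0, 3, whr.size)

def r_squared(degree):
    """R^2 of a polynomial fit of the given degree."""
    coeffs = np.polyfit(whr, health, degree)
    pred = np.polyval(coeffs, whr)
    ss_res = np.sum((health - pred) ** 2)
    ss_tot = np.sum((health - health.mean()) ** 2)
    return 1 - ss_res / ss_tot

print(round(r_squared(1), 3), round(r_squared(2), 3))
```

On data like these the quadratic R^2 is clearly higher, which is the pattern the overlay-scatterplot approach in the post above is looking for.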

On the ratings, I would just include this as a "side" discussion if it's not a central theme of your study (i.e., if you want to just focus on figure selection) - it could provide basis for a future study....I wouldn't try to "combine" them - maybe just include the correlations as an "interesting" side note.

11. Okay, got a bit beyond me here.

I tried to work through the equation on hyperstat, figuring the average health ratings for each figure. There are 9, so I worked through squaring the coefficients and dividing by the number of subjects who selected and rated that figure. However, I then got stuck, because I didn't know how to figure the MSE, and I wasn't quite sure why I was doing what I was doing anyway. Was the idea to calculate t, and that if t was significant, then the relationship is quadratic / curvilinear?

I'm not quite sure I understand this - how to determine whether a relationship is quadratic or linear. Some things seem to indicate that there are linear and quadratic elements to the one relationship - is there any way you can make this clearer? I'm assuming that I can't just assume the relationship is quadratic because it looks vaguely so, and because that's what I'm expecting.

Thanks for the advice regarding the attraction rating - sounds sensible. Other than the above, things seem to be going okay - 6 hypotheses are hopefully figured out, just this and one other to go - thanks again for your help.

Iain

12. You start at linear, then do quadratic, then cubic, etc. and stop at the highest order trend that is significant.

A trend line could have significant linear and quadratic "components." If it is a general downward or upward increase - i.e., a more-or-less constant slope, then the linear component is significant. If there's a significant change in the slope, then the quadratic component is significant, and the trend would be called "quadratic" (assuming the cubic trend isn't significant).
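A hedged sketch of that rule in Python (made-up curvilinear data): fit degree 1, then degree 2, and F-test whether the quadratic term buys a significant drop in residual error over the linear model.

```python
# Nested-model F-test for the quadratic component of a trend.
import numpy as np

rng = np.random.default_rng(2)
x = np.linspace(-1, 1, 40)
y = 2 - 3 * x ** 2 + rng.normal(0, 0.5, x.size)   # truly quadratic trend

def rss(degree):
    """Residual sum of squares of a polynomial fit of the given degree."""
    coeffs = np.polyfit(x, y, degree)
    resid = y - np.polyval(coeffs, x)
    return float(resid @ resid)

n = x.size
# F for the quadratic component: does degree 2 beat degree 1?
f_quad = (rss(1) - rss(2)) / (rss(2) / (n - 3))
print(round(f_quad, 1))  # compare against the F(1, n-3) critical value (~4.1 at alpha = .05)
```

The same comparison repeated for degree 3 vs degree 2 is the "stop at the highest significant order" step described above.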

13. Ah !! I totally understand now, thanks!

14. ## Assumptions

Believe it or not, I'm still working on these problems. It looks like I am not going to be able to achieve equal sample sizes - I think instead I will have about 20 males in each visual condition, and 30 females in each visual condition. I imagine that this discrepancy is serious enough that I must verify the ANOVA assumptions of normal distributions and equal variances between cells?

If so, how do I validate that I've fulfilled these assumptions? I have graphed them (histograms and boxplots), but this obviously doesn't give me any indication of statistical significance. I believe I can divide the kurtosis and skewness by their standard errors, with a value within +/-2 for both suggesting a normal distribution? How can I validate that the variances are not significantly different?

Are there any other assumptions for tests under the GLM that I should be accounting for? I got advice from a statistician, and they started out by highlighting ceiling effects (meaning non-normality?) and the like that I hadn't picked up - is there anything beyond the mentioned assumptions I should be looking out for?

Thanks very much again,
Iain

15. My bad - I just read another post which explained about the Shapiro-Wilk and Kolmogorov-Smirnov tests for normality, which I found in SPSS. I assume that a non-significant result on these tests indicates normality? That still leaves me to check homogeneity of variances separately.

However, if there are further things I should be looking out for, please let me know. Or is it a case of isolating where just these two assumptions aren't fulfilled, and then looking at the data to find out why (i.e. a ceiling effect)?

Many thanks,
Iain
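For reference, the two checks discussed in the last posts can be sketched with Python/scipy on made-up cell data (20 males vs 30 females, mirroring the sample sizes mentioned): Shapiro-Wilk for normality within each cell, and Levene's test for homogeneity of variance between cells. Note the direction: a non-significant p (p > .05) is the outcome consistent with the assumption.

```python
# Normality (Shapiro-Wilk) and homogeneity of variance (Levene) checks
# on hypothetical cell data.
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
males = rng.normal(5, 1.0, 20)     # e.g. 20 males in a condition
females = rng.normal(5, 1.2, 30)   # e.g. 30 females in a condition

for label, cell in (("males", males), ("females", females)):
    w, p_norm = stats.shapiro(cell)
    print(label, "Shapiro-Wilk p =", round(p_norm, 3))  # p > .05: no evidence of non-normality

lev_stat, p_var = stats.levene(males, females)
print("Levene p =", round(p_var, 3))  # p > .05: variances not significantly different
```

Levene's test isn't named in the thread, but it is the standard homogeneity-of-variance test (SPSS's GLM dialog offers it under Options as well).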