The QoL scores cover 5 different well-being domains (u2 urinary, bowel, sexual, hormonal). Preliminary examination of the data showed poor reliability, with low Cronbach's alpha coefficients (0.52-0.66). Because of incomplete responses, 8.5%-15% of the responses could not be scored, depending on the domain. Skewness of the QoL scores ranges from -0.2 to -2.9. In the univariate statistics, the normal probability plots take on a concave shape.
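For reference, checks like the ones described above could be produced along these lines. This is only a sketch: `item1-item4` are hypothetical names for the questionnaire items behind one domain, and `famwb` is simply the score used in the models further down.

```sas
/* Sketch only: item1-item4 are hypothetical item names for one domain;
   substitute the real questionnaire item variables. */
proc corr data=qol alpha nomiss;
  var item1-item4;   /* ALPHA requests Cronbach's coefficient alpha */
run;

/* Skewness, normality tests, and a normal Q-Q plot for one domain score */
proc univariate data=qol normal;
  var famwb;
  qqplot famwb / normal(mu=est sigma=est);
run;
```

The `nomiss` option restricts the alpha calculation to respondents who answered every listed item, so it runs on the same complete cases that can be scored.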

**Data**

DV = continuous QoL scores (each will be examined in a separate model)

Main effect = race/ethnicity (categorical)

IVs = age, marital status, BMI, income (a combination of continuous and categorical)

**Research Questions**

1) Are there racial differences in QoL scores after controlling for multiple confounders? (To be addressed in a poster and a manuscript)

Approach for Research Question #1

I think of this as the ANCOVA portion of the ‘Proc GLM’ procedure, since I’m only reporting adjusted means and adjusted mean differences.

(Questions) Is this an appropriate way of stating this? Can I really call it that, since ‘raceth’ is not a continuous covariate?

Proc glm data=qol order=internal;

Class raceth (other categorical IVs);

Model famwb = raceth IVs /solution ss3;

Lsmeans raceth/pdiff;

Run; Quit;

2) What is the effect of race/ethnicity on QoL scores after controlling for multiple confounders? (To be addressed in a manuscript only)

Approach for Research Question #2

I think of this as a multivariable linear regression.

(Question) If I reported parameter estimates, would this be considered the regression portion of the ‘Proc GLM’ procedure?

Proc glm data=qol order=internal;

Class raceth (other categorical IVs);

Model famwb = raceth IVs /solution ss3;

Run; Quit;

(Question) I want ‘Caucasian’ to be my reference group and to compare each of the other racial groups to it. What’s the syntax for this? I know there is a CONTRAST statement, but I’ve never used it.
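One candidate that avoids hand-written contrasts is the `pdiff=control` option of the LSMEANS statement. A minimal sketch, assuming the formatted value of `raceth` for the reference group is exactly 'Caucasian' (the quoted string has to match the data), with the other IVs left as placeholders exactly as in the code above:

```sas
/* Sketch only: 'Caucasian' is assumed to be the formatted level of raceth. */
proc glm data=qol order=internal;
  class raceth /* other categorical IVs */;
  model famwb = raceth /* other IVs */ / solution ss3;
  /* Adjusted means, with each group compared against the Caucasian group;
     CL adds confidence limits for the means and the differences */
  lsmeans raceth / pdiff=control('Caucasian') cl;
run; quit;
```

As far as I can tell, comparisons against a control level are Dunnett-adjusted by default; adding `adjust=t` should request unadjusted p-values instead.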

(Final questions)

I plotted the residuals against some of the continuous IVs, but the patterns were not random. I tried several transformations, but they didn’t make a huge difference. Would it be better to categorize these variables, or not use them at all?
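For context, residual plots of this kind can be requested directly from PROC GLM through ODS Graphics. A sketch under assumptions: `age` and `bmi` are hypothetical variable names standing in for the continuous IVs, and the remaining IVs are omitted for brevity.

```sas
/* Sketch only: age and bmi are hypothetical names for the continuous IVs. */
ods graphics on;

proc glm data=qol order=internal plots=(diagnostics residuals);
  class raceth;
  model famwb = raceth age bmi / solution ss3;
run; quit;

ods graphics off;
```

`plots=residuals` plots the residuals against each continuous covariate in the model, and `plots=diagnostics` adds the usual fit diagnostics panel, including a residual Q-Q plot.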

After adjusting for all confounders, the R^2 values were low (~0.04). Doesn’t this raise a red flag? Could this be because of the large percentage of unscored data?

I apologize for the length. I’d appreciate any light you can shed on these issues.