Adjusting a proportion score (controlling for age)

#1
I have 20 yes/no ratings (did this product help you). People were able to rate as many of the 20 products as they used (on average people rated 5 products).

The outcome we are looking at is % rated helpful for each product. But some products were used by older people than were others (and age correlates with perceptions of helpfulness).

So I want to adjust the ratings for age, but given that I have multiple outcomes, I am not sure what would work best. Also, we are just comparing ratings here; there are no other predictors.

I was thinking an ANOVA comparing all 20 products with age as a covariate and then getting adjusted means ... but I'm not sure this is the right path.
 
#2
So, are you interested in comparing the helpfulness of the products to each other, or in which groups/characteristics are correlated with the helpfulness of each product individually? In the latter case, a discrete choice model may be useful, since the outcome is Bernoulli distributed (0/1). I assume that "helpful" = 1 and "not helpful" = 0.

Assume that we can represent the error logistically, then:

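outcome* = a + b*X + e, where outcome* is the latent propensity behind the observed 0/1 outcome (outcome = 1 when outcome* > 0), b is a vector of coefficients, X is your covariate matrix, and e is the logistic error. You can write out the probability expression used in estimation analytically (though you can probably skip this step). Estimate by maximum likelihood.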
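For reference, the probability expression mentioned above can be written out as follows. This is the standard logit form, using the same a, b and X as in the model line; the index i for a single rating and the symbols p_i and \ell are notation added here for illustration.

```latex
P(y_i = 1 \mid X_i) = p_i = \frac{1}{1 + \exp\!\bigl(-(a + b^{\top} X_i)\bigr)}

\ell(a, b) = \sum_{i} \Bigl[\, y_i \log p_i + (1 - y_i) \log (1 - p_i) \,\Bigr]
```

Maximum likelihood picks the a and b that make \ell(a, b) as large as possible.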

Be careful with interpretation: unlike OLS coefficients, logit coefficients are not effects on the probability itself. They are on the log-odds scale, so their sign gives the direction of the effect on the probability of the product being classified as "helpful". To get the size of the effect on the probability, calculate the marginal effects. See a stats textbook for this.
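To make the difference between a coefficient and a marginal effect concrete, here is a minimal sketch in plain Python. Everything in it is an assumption for illustration: the data are made up (age in decades relative to 50, one product only), `fit_logit` is a hypothetical helper that fits a single-covariate logit by Newton-Raphson, and a real analysis would more likely use a statistics package.

```python
import math

def fit_logit(x, y, iters=25):
    """Fit P(y=1) = 1 / (1 + exp(-(a + b*x))) by Newton-Raphson; returns (a, b)."""
    a = b = 0.0
    for _ in range(iters):
        p = [1.0 / (1.0 + math.exp(-(a + b * xi))) for xi in x]
        # Gradient of the log-likelihood with respect to a and b
        g0 = sum(yi - pi for yi, pi in zip(y, p))
        g1 = sum((yi - pi) * xi for xi, yi, pi in zip(x, y, p))
        # 2x2 information matrix (negative Hessian)
        w = [pi * (1.0 - pi) for pi in p]
        h00 = sum(w)
        h01 = sum(wi * xi for wi, xi in zip(w, x))
        h11 = sum(wi * xi * xi for wi, xi in zip(w, x))
        det = h00 * h11 - h01 * h01
        # Newton step: (a, b) += inverse(information) * gradient
        a += (h11 * g0 - h01 * g1) / det
        b += (h00 * g1 - h01 * g0) / det
    return a, b

# Made-up data: age in decades relative to 50, and 0/1 "helpful" ratings
age = [-2, -2, -2, -2, -1, -1, -1, -1, 0, 0, 0, 0, 1, 1, 1, 1, 2, 2, 2, 2]
helpful = [1, 1, 1, 0, 1, 1, 0, 1, 1, 0, 1, 0, 0, 1, 0, 0, 0, 0, 1, 0]

a, b = fit_logit(age, helpful)

# Average marginal effect of age: mean over ratings of dP/d(age) = b * p * (1 - p)
p_hat = [1.0 / (1.0 + math.exp(-(a + b * xi))) for xi in age]
ame = sum(b * pi * (1.0 - pi) for pi in p_hat) / len(p_hat)

print("coefficient on age (log-odds per decade):", round(b, 3))
print("average marginal effect (probability per decade):", round(ame, 3))
```

The coefficient and the average marginal effect share a sign, but only the second one is on the probability scale.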

Good Luck!

PS: you have other regression options as well: probit, linear probability, etc.
 
#5
Thanks! This is useful for other projects, but it is actually the former I am aiming to achieve. The ratings are indeed 0 (not helpful) and 1 (helpful), and we are ordering the products from most to least helpful. But we need to adjust for the age of the rater.
 
#6
Can you clarify your variables?

is the response y/n? or a percentage?

how many people in each age group?
There was actually a scale for each rating. It ranged from "not at all helpful" to "very helpful". For a variety of reasons we converted this to a 0/1 scale, with the top "very helpful" rating being 1 and all others being 0. So we are looking at the percent of people who rated each product very helpful.

Each product was rated by a different number of people. We let people rate as many of the 20 products as they had tried in the past 12 months.

Some products were tried by (and rated by) more people in the 50-60 and 60-70 range than were others. And those in these age ranges tended to be less satisfied with the products. So we want to adjust the % rated very helpful in order to correct for any differences due to age.
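One simple way to express this kind of adjustment is direct standardization: compute each product's % very helpful within each age group, then average those rates using the same age mix for every product. The sketch below is only an illustration under stated assumptions: the ratings are invented, there are just two age groups, `age_standardized_rates` is a hypothetical helper, and the "standard" age mix is taken from all ratings pooled (so people who rated several products count more than once).

```python
from collections import defaultdict

# Invented long-format ratings: (product, age_group, very_helpful)
ratings = [
    ("A", "under50", 1), ("A", "under50", 1), ("A", "under50", 0), ("A", "50plus", 0),
    ("B", "under50", 1), ("B", "50plus", 1), ("B", "50plus", 0), ("B", "50plus", 0),
]

def age_standardized_rates(ratings):
    """Return {product: age-adjusted proportion rated very helpful}."""
    age_counts = defaultdict(int)
    cells = defaultdict(lambda: [0, 0])  # (product, age) -> [n_helpful, n_ratings]
    for product, age, y in ratings:
        age_counts[age] += 1
        cells[(product, age)][0] += y
        cells[(product, age)][1] += 1
    # Standard age mix: share of each age group among all pooled ratings
    total = len(ratings)
    weights = {age: n / total for age, n in age_counts.items()}
    adjusted = {}
    for product in sorted({p for p, _, _ in ratings}):
        rate = 0.0
        for age, w in weights.items():
            n_helpful, n = cells.get((product, age), (0, 0))
            if n == 0:
                rate = None  # no ratings in this age group: cannot standardize
                break
            rate += w * n_helpful / n
        adjusted[product] = rate
    return adjusted

adjusted = age_standardized_rates(ratings)
for product, rate in adjusted.items():
    print(product, rate)
```

In this toy data both products have the same crude rate (2 of 4 rated very helpful), but B was rated mostly by the older group, so its adjusted rate comes out higher than A's. With 20 products and sparse age groups, empty cells become likely, which is one reason a regression-based adjustment may be preferable.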
 
#7
In that case, it seems to me that you can use ANOVA to evaluate the variations in the helpfulness score assigned to each product.
 
#9
Oh, I am also concerned because of the repeated measure aspect. But the measures aren't fully repeated. Can I use a repeated measures analysis with a covariate ... like a MANCOVA ... even if not every rater completed every rating?