# What to do with a proportion response variable

#### Emma Doherty

##### New Member
Hey!

I am so confused about what to do with my data. I have recorded five types of communication signal: A, B, C, D and E. For 26 individuals, I have counted the total number of signals each one produced, and converted the counts of A, B, C, D and E into mean proportions of that total, e.g. mean proportion of signal E = E/total.

My 26 individuals are aged between 1 and 11 years. I am interested in testing the relationship between age (years) and the mean proportion of A produced (then B, C, etc.).

I am just not sure which model to use with a continuous predictor (age) and a proportion response variable (observed range 0 to 0.32). It is also important for me to include 0s and 1s here.

Any help greatly appreciated!

Emma

#### Karabiner

##### TS Contributor
So each participant has a) a value for age and b) a value for "individual proportion of type A responses, relative to his/her total number of responses"?
If not, please clarify how your data collection was conducted.

With kind regards

Karabiner

#### Emma Doherty

##### New Member
Hi Karabiner,

Thanks for your reply. Yes, each individual has an age value and then a value for the proportion of their total number of responses that were type A (plus B, C, D, E).

For example, Individual 1 is 11 years old and produced 130 signals, 50 of which were type A. The mean proportion of type A produced is 50/130 = 0.38. The same has been done for the other 25 individuals, and I want to see whether the mean proportion of each individual's total that is type A (and later the other types) increases significantly with age.
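The per-individual calculation is just count divided by total for each signal type. A minimal sketch, assuming each individual's counts are stored in a dict keyed by signal type; the function name `signal_proportions` is hypothetical, and only the type A count (50 of 130) comes from the post, while the B to E counts are made-up placeholders chosen so the total is 130.

```python
def signal_proportions(counts):
    """Return each signal type's share of one individual's total signals."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("individual produced no signals")
    return {signal: n / total for signal, n in counts.items()}


# Individual 1: 50 of 130 signals were type A (from the post).
# The B-E counts are hypothetical, chosen only so that the total is 130.
individual_1 = {"A": 50, "B": 30, "C": 20, "D": 20, "E": 10}
props = signal_proportions(individual_1)
print(round(props["A"], 2))
```

For one individual the five proportions always sum to 1, so knowing four of them fixes the fifth.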

Some individuals may have a proportion of 0 or 1, so it is important for me to account for these too. I have tried to do this by transforming my values with the formula (y*(n-1) + 1/C)/n, where y is the proportion value, n is the total number of observations in the data set and C is the number of categories (in this case 5) (Maier, 2014; Smithson & Verkuilen, 2006).
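The transformation itself is one line. A minimal sketch, assuming n = 26 (the number of individuals) and C = 5 (the number of signal types) as described above; the function name `squeeze_proportion` is hypothetical.

```python
def squeeze_proportion(y, n, c):
    """Map a proportion y in [0, 1] into the open interval (0, 1).

    Implements (y*(n-1) + 1/c) / n, where n is the number of
    observations in the data set and c the number of categories.
    """
    if not 0.0 <= y <= 1.0:
        raise ValueError("y must be a proportion in [0, 1]")
    return (y * (n - 1) + 1.0 / c) / n


# Values from the post: 26 individuals, 5 signal types.
for y in (0.0, 0.5, 1.0):
    print(y, round(squeeze_proportion(y, n=26, c=5), 4))
```

With these values, 0 becomes 0.2/26 (about 0.0077) and 1 becomes 25.2/26 (about 0.9692), so every transformed value lies strictly inside (0, 1). That is what a beta regression, the model Smithson & Verkuilen (2006) discuss, requires of its response.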

Many thanks,

Emma

#### Karabiner

##### TS Contributor
So why don't you just regress "individual proportion of A" on age (linear regression)? Or, if you want to perform statistical tests, use a Spearman rank correlation? What do you want to achieve with the transformation?
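The Spearman coefficient only needs ranks: rho is the Pearson correlation of the rank-transformed ages and proportions, with tied values (for example several individuals at a proportion of 0) given their average rank. A minimal pure-Python sketch; the helper names and the seven data points are hypothetical, not Emma's data. In practice `scipy.stats.spearmanr` computes the same coefficient and also returns a p-value.

```python
def ranks(values):
    """1-based ranks; tied values share the mean of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # average of ranks i+1 .. j+1
        for k in range(i, j + 1):
            result[order[k]] = mean_rank
        i = j + 1
    return result


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def spearman_rho(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


# Hypothetical data: age (years) and proportion of type A signals.
ages = [1, 2, 3, 5, 7, 9, 11]
prop_a = [0.00, 0.05, 0.04, 0.10, 0.12, 0.20, 0.32]
rho = spearman_rho(ages, prop_a)
print(round(rho, 3))
```

Because it uses only ranks, Spearman's rho tests for a monotonic trend and says nothing about the shape of the relationship, so it never produces predictions outside 0 to 1.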

I would first create five scatterplots, each showing the proportion of responses of type X versus age.
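One way to draw the five panels side by side with matplotlib. A minimal sketch, assuming the proportions are held in a dict mapping each signal type to a list aligned with the ages; the function name, output file name and the three individuals' values are all hypothetical.

```python
import matplotlib

matplotlib.use("Agg")  # render straight to a file; no display needed
import matplotlib.pyplot as plt

SIGNALS = ["A", "B", "C", "D", "E"]


def plot_proportions(ages, proportions, path):
    """Save one scatterplot per signal type (proportion vs age) to `path`."""
    fig, axes = plt.subplots(1, len(SIGNALS), figsize=(15, 3), sharey=True)
    for ax, signal in zip(axes, SIGNALS):
        ax.scatter(ages, proportions[signal])
        ax.set_title(f"Signal {signal}")
        ax.set_xlabel("Age (years)")
        ax.set_ylim(-0.02, 1.02)  # the full 0-1 range a proportion can take
    axes[0].set_ylabel("Proportion of signals")
    fig.tight_layout()
    fig.savefig(path)
    plt.close(fig)
    return fig


# Hypothetical values for three individuals (each column sums to 1).
ages = [2, 6, 11]
proportions = {
    "A": [0.05, 0.20, 0.38],
    "B": [0.40, 0.30, 0.23],
    "C": [0.25, 0.20, 0.15],
    "D": [0.20, 0.20, 0.16],
    "E": [0.10, 0.10, 0.08],
}
fig = plot_proportions(ages, proportions, "signal_proportions_by_age.png")
```

Fixing every y-axis to 0 to 1 makes it easy to see whether a trend looks roughly straight or flattens out near the bounds, which is what decides between a linear and a logit model.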

With kind regards

Karabiner

#### Emma Doherty

##### New Member
Hi Karabiner,

A colleague said that a linear model may not be the way to go, because it could make predictions below 0 or above 1, which may not work with proportion data? That said, I am definitely happy to look into it more if you think that's the way to go! (I haven't worked with proportions before, so all of this is new to me.)
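The colleague's concern is easy to reproduce. A minimal sketch with hypothetical data (not Emma's): an ordinary least-squares line fitted to proportions that rise with age can dip below 0 even inside the observed age range. The function name `ols_line` is hypothetical.

```python
def ols_line(x, y):
    """Ordinary least-squares intercept and slope for y = a + b*x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


# Hypothetical data: age (years) and proportion of type A signals.
ages = [1, 2, 3, 5, 7, 9, 11]
prop_a = [0.00, 0.05, 0.04, 0.10, 0.12, 0.20, 0.32]

intercept, slope = ols_line(ages, prop_a)
for age in (1, 6, 11):
    print(age, round(intercept + slope * age, 3))
```

Here every observed value lies between 0 and 0.32, yet the fitted value at age 1 is about -0.007, an impossible proportion.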

Thanks for your advice!

Best,

Emma

#### GretaGarbo

##### Human
> A colleague had said that maybe a linear may not be the way to go because it could make predictions less than 0 and more than 1 which may not work with proportion data?

Yes, I agree.

> So why don't you just regress "individual proportion of A" on age (linear regression)?

But I would read this comment as a suggestion to use a logit (logistic) model, in which the expected value can never go below zero or above one.
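A logit model fits naturally if the raw counts are kept: each individual contributes k type A signals out of n, and a binomial GLM with a logit link models the expected proportion directly, so proportions of exactly 0 or 1 can stay in the data without any transformation. The sketch below is a minimal pure-Python Newton-Raphson fit written for illustration; the function names are hypothetical and the counts are invented, not Emma's data. In practice one would use a library, e.g. R's `glm(cbind(A, total - A) ~ age, family = binomial)` or statsmodels' `GLM` with a `Binomial` family.

```python
import math


def fit_binomial_logit(ages, successes, totals, tol=1e-10, max_iter=50):
    """Fit logit(p) = b0 + b1 * age to binomial counts by Newton-Raphson.

    successes[i]: number of type A signals produced by individual i.
    totals[i]:    that individual's total number of signals.
    """
    pooled = sum(successes) / sum(totals)
    b0, b1 = math.log(pooled / (1.0 - pooled)), 0.0  # start at the pooled rate
    for _ in range(max_iter):
        # Score (g0, g1) and Fisher information [[i00, i01], [i01, i11]].
        g0 = g1 = i00 = i01 = i11 = 0.0
        for x, k, n in zip(ages, successes, totals):
            p = 1.0 / (1.0 + math.exp(-(b0 + b1 * x)))
            resid = k - n * p
            w = n * p * (1.0 - p)
            g0 += resid
            g1 += resid * x
            i00 += w
            i01 += w * x
            i11 += w * x * x
        # Newton step: solve the 2x2 system information * step = score.
        det = i00 * i11 - i01 * i01
        step0 = (i11 * g0 - i01 * g1) / det
        step1 = (i00 * g1 - i01 * g0) / det
        b0, b1 = b0 + step0, b1 + step1
        if max(abs(step0), abs(step1)) < tol:
            return b0, b1
    raise RuntimeError("Newton-Raphson did not converge")


def predict(b0, b1, age):
    """Expected proportion at a given age; always strictly between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-(b0 + b1 * age)))


# Hypothetical counts; the first individual has a proportion of exactly 0.
ages = [1, 2, 3, 5, 7, 9, 11]
count_a = [0, 3, 4, 10, 14, 25, 42]
total = [40, 60, 80, 100, 110, 120, 130]

b0, b1 = fit_binomial_logit(ages, count_a, total)
for age in (1, 6, 11):
    print(age, round(predict(b0, b1, age), 3))
```

Because each individual produces many signals, the counts may vary more than a binomial model allows (overdispersion); a quasi-binomial or beta-binomial model is a common fallback in that case.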

> I would create 5 scatterplots first, each with proportion of response of type X versus age.

I think this is the best advice.

If the graph does not seem to fit the model, reject the model and suggest a new one.
(Show us the graph and maybe someone can suggest a better model.)