GLM multivariate analysis with proportion & continuous data

Hi there,

I am looking to perform some multivariate analysis on my data (such as canonical correlation analysis or multivariate multiple regression).

However, my data is not all continuous, is not normally distributed (assessed using QQ plots and Shapiro-Wilk tests) and some variables are correlated with each other.

Essentially, I have a set of 2 predictor variables and a set of 7 response variables.

The predictor variables are both continuous. The response variables are a mixture: 3 are count data, 3 are percentage data, and the last is continuous.

Now, my problem is, I do not know which model to use to analyse the data considering these different types of variable.

I have looked into transforming the count and percentage data, but have read that this is potentially problematic and unhelpful, despite its long tradition in statistics (e.g. the arcsine square root transformation). I then considered fitting a separate logistic regression to each non-continuous response variable, using a binomial error distribution for the count/percentage data.
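To make the idea of a per-response binomial model concrete, here is a minimal sketch of what such a fit does, written in plain Python purely for illustration. The data values and the function name `fit_binomial_logit` are made up, and it assumes each percentage can be expressed as successes out of a known number of trials. In R the equivalent would be something like `glm(cbind(successes, failures) ~ x, family = binomial)`.

```python
import math

def fit_binomial_logit(x, successes, trials, iterations=25):
    """Fit logit(p) = b0 + b1 * x to binomial counts by Newton-Raphson."""
    b0, b1 = 0.0, 0.0
    for _ in range(iterations):
        u0 = u1 = 0.0          # score vector (gradient of the log-likelihood)
        i00 = i01 = i11 = 0.0  # Fisher information matrix entries
        for xi, yi, ni in zip(x, successes, trials):
            p = 1.0 / (1.0 + math.exp(-(b0 + b1 * xi)))
            resid = yi - ni * p        # observed minus expected successes
            w = ni * p * (1.0 - p)     # binomial variance of the count
            u0 += resid
            u1 += xi * resid
            i00 += w
            i01 += w * xi
            i11 += w * xi * xi
        # Newton step: solve the 2x2 system information * delta = score
        det = i00 * i11 - i01 * i01
        b0 += (i11 * u0 - i01 * u1) / det
        b1 += (i00 * u1 - i01 * u0) / det
    return b0, b1

# Hypothetical data: one continuous predictor, 20 trials per observation.
x = [0.0, 1.0, 2.0, 3.0, 4.0]
trials = [20, 20, 20, 20, 20]
successes = [2, 5, 10, 15, 18]

b0, b1 = fit_binomial_logit(x, successes, trials)
print(f"intercept = {b0:.3f}, slope = {b1:.3f}")
```

This only covers one response and one predictor, and it does nothing about the correlation between responses, which is the part I am still unsure how to handle.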

I am conducting the analysis in R.

Can anyone give me any advice on what I should do to prepare my data for analysis, or suggest other analyses that take these different data types into account?

Thanks so much in advance!


Okay, so I've read a few articles recommending a logit transformation for percentage variables, and it appears to have made them more normally distributed.
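For anyone following along, the transformation is roughly the following, sketched here in plain Python. The function name `logit_percent` and the `eps` adjustment are my own assumptions: exact 0% and 100% values have no finite logit, so they are pulled in by a small amount first, and the size of that adjustment is a judgment call. In R, `qlogis()` computes the logit of a proportion.

```python
import math

def logit_percent(pct, eps=0.5):
    """Logit of a percentage; 0 and 100 are pulled in by `eps` percentage points."""
    # Clamp to [eps, 100 - eps] so the log-odds stay finite (an assumed convention).
    p = min(max(pct, eps), 100.0 - eps) / 100.0
    return math.log(p / (1.0 - p))

# Hypothetical percentage values, including both boundaries.
percentages = [0.0, 12.5, 50.0, 87.5, 100.0]
transformed = [logit_percent(v) for v in percentages]

for pct, val in zip(percentages, transformed):
    print(f"{pct:6.1f}% -> {val:+.3f}")
```

The transformed values are symmetric around 50% and unbounded, which is why the distribution tends to look closer to normal afterwards.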

Any other advice is very welcome, though.
I was wondering if you managed to resolve your issue?

I am also having trouble with that kind of analysis. I need to analyse time budgets. I have 4 percentage response variables (e.g. percentage of time spent feeding, percentage of time spent interacting socially) that I would like to analyse together.

I thought about a multivariate regression GLMM. MCMCglmm seems to be a good package; however, a Bayesian approach is not what I am looking for. Do you have any other ideas?