Multilevel modeling - dependent variable is not normally distributed

Hello everybody,

I am doing research using multilevel linear modeling (MLM). The dependent variable is the number of days of payment delay, which is not normally distributed. Is that a problem for MLM? Is it necessary for Y to be normally distributed? I have winsorized the variable, capping its values at 0 from below and at 30 days from above (to reduce outliers); however, most values are 3 days or less.

Any help is really appreciated. Thank you!


Super Moderator
What's your sample size? Technically normally distributed errors are required for confidence intervals and significance tests to be trustworthy, but the larger your sample size the less important this is (because the sampling distribution of the coefficients will converge toward a normal distribution anyway, per the CLT).
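To make the CLT point concrete, here is a rough simulation sketch. It uses a plain sample mean as a stand-in for a regression coefficient, and exponential draws with a mean of 2 days as a stand-in for right-skewed delays; both are assumptions for illustration, not the poster's data or model. The skewness of the sampling distribution should shrink as n grows.

```python
import random
import statistics

def skewness(values):
    """Skewness: third central moment divided by the cubed standard deviation."""
    m = statistics.fmean(values)
    sd = statistics.pstdev(values)
    return sum((v - m) ** 3 for v in values) / (len(values) * sd ** 3)

def sampling_skewness(n, reps=2000, seed=1):
    """Skewness of the distribution of sample means for samples of size n."""
    rng = random.Random(seed)
    means = []
    for _ in range(reps):
        # Exponential draws (mean 2 days) stand in for right-skewed delays.
        sample = [rng.expovariate(0.5) for _ in range(n)]
        means.append(statistics.fmean(sample))
    return skewness(means)

skew_small = sampling_skewness(5)
skew_large = sampling_skewness(200)
print(f"n=5:   skewness of sample means = {skew_small:.2f}")
print(f"n=200: skewness of sample means = {skew_large:.2f}")
```

This only speaks to the sampling distribution of the estimate; it says nothing about whether a linear model is the right functional form for the outcome.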


Phineas Packard
Is Gaussian really the right functional form for this sort of outcome variable? I would expect (and hope) that most people pay on time, so have a payment delay of zero, and that only a few naughty people are really, really late. I don't know, but it sounds like count data to me, and maybe even zero-inflated?
Although MLM does not assume that the dependent variable is normal, linear MLM (like linear regression) assumes that the DV is continuous. Your DV is a count. Therefore, you want a generalized (nonlinear) MLM with a Poisson or negative binomial distribution, or maybe a zero-inflated version of one.

In SAS you can do this with PROC GLIMMIX. In R, I think lme4 (glmer, glmer.nb) and glmmTMB have these; nlme only fits Gaussian models.
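Before choosing between Poisson, negative binomial, and zero-inflated, a quick look at the raw counts can help. The sketch below uses made-up delay values (not the poster's data) that are capped at 30 and mostly zero. It compares the variance to the mean (a Poisson variable has them equal) and the observed share of zeros to the share a Poisson with the same mean would predict. These are rough descriptive checks that ignore the grouping structure and any predictors.

```python
import math
import statistics

# Hypothetical winsorized delays in days: mostly zero, a few very late payers.
delays = [0] * 60 + [1] * 15 + [2] * 10 + [3] * 8 + [7] * 3 + [15] * 2 + [30] * 2

mean_delay = statistics.fmean(delays)
var_delay = statistics.variance(delays)

# Poisson implies variance == mean; a ratio well above 1 suggests overdispersion,
# which points toward a negative binomial rather than a Poisson model.
dispersion_ratio = var_delay / mean_delay

# Under a Poisson with this mean, P(Y = 0) = exp(-mean).
observed_zero_share = delays.count(0) / len(delays)
poisson_zero_share = math.exp(-mean_delay)

print(f"mean = {mean_delay:.2f}, variance = {var_delay:.2f}")
print(f"variance / mean = {dispersion_ratio:.2f}")
print(f"zeros: observed {observed_zero_share:.2f}, Poisson-expected {poisson_zero_share:.2f}")
```

Many more zeros than the Poisson share is a hint that a zero-inflated model may be worth trying, although overdispersion alone can also produce excess zeros.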


Ambassador to the humans
Ignore Lazar. You probably want to use a model that allows for counts as your dependent variable. Maybe even something zero inflated.
Hello everybody,

Thank you very much for your help.

As the dependent variable (number of days of payment delay for each company), I could use the annual average of the payment delays (approximately 60 payments per year). In that case the dependent variable is continuous and acceptable for MLM, right?
And about the sample size: I actually have a really big sample of companies (9000), but they are on the 2nd level. The levels are as follows: 1st years (observations over 6 years), 2nd companies, 3rd industries, 4th country. As stated in many books, the minimum sample size refers to the highest level in the data hierarchy (at least 20-25 groups for accurate estimates). Is it a problem that I have only 1 country on the highest level? Would it be best to include just 3 levels (with industries on the highest level)? If so, how should I use explanatory variables related to the country level (e.g. GDP, inflation)? Should I just include them on the industry level (the values will be the same for all industries)?

Thank you!!
Well yes, I guess it's not a real level then, but I want to include country-level variables that influence Y (GDP, inflation), so maybe country should be included somewhere?!


Fortran must die
It has been a while since I did multilevel modeling, but I don't think you can nest something inside a higher level when you only have one unit at that higher level. It wouldn't make any sense, for example, to talk about something nested inside classroom (a stratifying factor in practice) if you only had one classroom.

If you want state to be a level, then you need data on more than one state.
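A tiny sketch of why one unit is not enough: a level in a multilevel model contributes a between-group variance, and a variance cannot be estimated from a single group. The function name and the group means below are invented for illustration.

```python
import statistics

def between_group_variance(group_means):
    """Sample variance of group means; None when it cannot be estimated."""
    if len(group_means) < 2:
        return None  # a single group gives nothing to compare it against
    return statistics.variance(group_means)

# Hypothetical mean delays (in days) per country.
one_country = between_group_variance([2.4])
three_countries = between_group_variance([2.4, 3.1, 1.8])
print(one_country, three_countries)
```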