# Have population, use inferential stats? Non-normal dependent variable, what to do?

#### alea iacta est

##### New Member
Statistics people! I have two questions for you. I am NOT a statistician, so please be nice and answer simply.

Background: We are looking at parental leave in Iceland. We are particularly interested in whether the economic crisis and the resulting changes in parental leave legislation affected the time taken for parental leave.

We have reason to believe that the effect of the crisis/new laws will be different for mothers and fathers (who have equal rights to a leave), will depend on income and education, and that there might be an interaction between factors (e.g. that the length of leave for fathers would be independent of income before the new laws, but would start to depend on income after the laws were passed).

1) We actually have not just a sample, but the entire population (about 50000 kids). Do we even need inferential statistics at all? Can we just describe the results either numerically or graphically, because whatever difference there is, that is the actual difference in the population?

2) If we do need to do inferential statistics, then we have some potential problems.

First, the dependent variable (length of leave) is not even close to normally distributed. Instead it is multimodal: people are likely to take 0, 30, 60, or 90 days, etc., but not, say, 3 or 34 days. I cannot find a transformation that makes this distribution look anything like a normal one.

I first considered using some kind of non-parametric test that looks at differences in medians, but the problem is that the medians might always be close to the same value (e.g. 90 days) while the distribution is still changing.
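To make this concern concrete, here is a minimal Python sketch with invented numbers (not our actual data). The two hypothetical leave distributions, `before` and `after`, are assumptions made up purely for illustration; they share the same median even though the share of parents at each leave length differs.

```python
from collections import Counter
from statistics import median

# Invented leave lengths in days for 100 parents per period (illustration only).
before = [0] * 10 + [60] * 20 + [90] * 50 + [120] * 20
after = [0] * 25 + [60] * 5 + [90] * 40 + [120] * 30

def shares(days):
    """Return the proportion of parents at each observed leave length."""
    counts = Counter(days)
    return {length: n / len(days) for length, n in sorted(counts.items())}

median_before = median(before)
median_after = median(after)
shares_before = shares(before)
shares_after = shares(after)

print(median_before, median_after)
print(shares_before)
print(shares_after)
```

In this made-up example both medians are 90 days, yet the share of parents taking no leave rises from 0.10 to 0.25, which a median-based test would not pick up.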

I then considered binarizing the dependent variable (e.g. takes less than standard leave vs. takes standard leave or more). This would allow me to use logistic regression and the weirdness of the distribution would be gone. I am fine with this.
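As a rough sketch of what the binarizing would look like, the Python below recodes leave length into 0/1 and compares two periods on the log-odds scale. The 90-day cutoff, the function names, and all the data are hypothetical and chosen only for illustration. With a single binary predictor, the logistic regression coefficient equals this difference in log-odds.

```python
import math

# Assumed cutoff for "standard leave"; the real legal length may differ.
STANDARD_LEAVE_DAYS = 90

def took_standard_leave(days, cutoff=STANDARD_LEAVE_DAYS):
    """Recode leave length: 1 if standard leave or more, 0 if less."""
    return 1 if days >= cutoff else 0

def log_odds(outcomes):
    """Log-odds of the outcome being 1 in a list of 0/1 values."""
    p = sum(outcomes) / len(outcomes)
    return math.log(p / (1 - p))

# Invented leave lengths in days (illustration only, not real data).
leave_days = {
    "before": [0, 30, 90, 90, 90, 120, 90, 60, 90, 180],
    "after": [0, 0, 30, 90, 60, 90, 60, 120, 30, 90],
}

binary = {
    period: [took_standard_leave(d) for d in days]
    for period, days in leave_days.items()
}
proportion = {period: sum(y) / len(y) for period, y in binary.items()}

# Difference in log-odds between periods (log of the odds ratio).
period_effect = log_odds(binary["after"]) - log_odds(binary["before"])

print(proportion)
print(period_effect)
```

A negative `period_effect` here would mean that the odds of taking at least the standard leave are lower after the change than before it.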

However, I am interested not just in main effects (e.g. the main effect of time and the main effect of income) but also in interactions (e.g. the interaction between time and income). I am not sure how to deal with interactions in logistic regression, especially since I might have to treat the factors as categorical. For example, I do not expect the length of leave to increase or decrease linearly with time; I expect a curvilinear relationship between length of leave and time.
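For two categorical factors with two levels each, the interaction term in a logistic regression with dummy coding is the difference between two differences in log-odds (equivalently, the log of a ratio of odds ratios). The Python sketch below works this out by hand for the fathers example from the background. The cell counts, the level names, and the two-period simplification are all invented assumptions, not results.

```python
import math

# Invented counts per cell: (fathers taking standard leave or more, total fathers).
cells = {
    ("before", "low"): (60, 100),
    ("before", "high"): (60, 100),
    ("after", "low"): (40, 100),
    ("after", "high"): (70, 100),
}

def log_odds(successes, total):
    """Log-odds of taking at least the standard leave within one cell."""
    return math.log(successes / (total - successes))

logit = {cell: log_odds(*counts) for cell, counts in cells.items()}

# Income effect (high vs. low) within each period, on the log-odds scale.
income_effect_before = logit[("before", "high")] - logit[("before", "low")]
income_effect_after = logit[("after", "high")] - logit[("after", "low")]

# Interaction: how much the income effect changes from before to after.
interaction = income_effect_after - income_effect_before

print(income_effect_before, income_effect_after, interaction)
```

In these made-up counts the income effect is zero before the change and positive afterwards, so the interaction is positive. With more than two time periods treated as categorical, each non-reference period would get its own interaction term of this kind against the reference period.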

What to do?

I mainly use SPSS for analysis, in case that is relevant.

#### jamajor

##### New Member

I would recommend John W. Tukey's classic "Exploratory Data Analysis" for inspiration as to how to discover and portray structure in your data.