I am investigating whether physical activity is related to learning outcomes in adult students in distance education. Physical activity was measured with an online survey among 1728 students, and their learning outcomes were recorded half a year later. Learning outcomes are measured as study progress. My problem: only about 20% of the students (359, to be precise) have completed at least one module (worth 4.3 EC [European Credits] each). Because roughly 80% have zero study progress, SPSS flags everyone with a value above zero on my outcome variable as an outlier. The normality assumption is also violated. I could address that by log-transforming my outcome variable, but those cases would then still be outliers (logically).
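To make the outlier problem concrete, here is a minimal sketch with simulated data (not my real data). It assumes that the flagging comes from the usual 1.5 × IQR boxplot rule, which is my guess at what SPSS is doing, and it uses `log1p` (log of 1 + x) as the transform because the plain log of zero is undefined. The non-zero module counts are drawn at random purely for illustration.

```python
import math
import random
import statistics

random.seed(0)  # reproducible simulated data, NOT my real data

N_STUDENTS = 1728
N_WITH_PROGRESS = 359  # students with at least one module

# Hypothetical module counts (1..14) for the students with progress.
progress = [random.randint(1, 14) for _ in range(N_WITH_PROGRESS)]
modules = [0] * (N_STUDENTS - N_WITH_PROGRESS) + progress


def boxplot_outliers(values):
    """Return the values outside the 1.5 * IQR fences (assumed rule)."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    return [v for v in values if v < lower or v > upper]


raw_outliers = boxplot_outliers(modules)
# log1p maps 0 to 0, so the zeros still pin both quartiles at 0.
log_outliers = boxplot_outliers([math.log1p(v) for v in modules])

print(len(raw_outliers), len(log_outliers))  # 359 359
```

Since about 80% of the values are zero, both quartiles are zero, the IQR is zero, and every non-zero value falls outside the fences, with or without the transform.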

In addition, my outcome variable takes 15 possible values in my data, from 0 to 14. Each increment corresponds to another 4.3 EC attained. Technically, more values are possible.

The outcome represents the number of modules completed in half a year of studying. One module is worth 4.3 EC.
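To show what I mean by the outcome scale, here is a small sketch of how EC totals map to module counts. The function name `modules_from_ec` and the example EC totals are made up for illustration; only the 4.3 EC per module comes from my data.

```python
EC_PER_MODULE = 4.3  # from my data: one module is worth 4.3 EC


def modules_from_ec(total_ec):
    """Convert a total number of EC into a whole number of modules."""
    # round() absorbs small floating-point error in the division.
    return round(total_ec / EC_PER_MODULE)


# Illustrative EC totals (hypothetical students), 60.2 EC being 14 modules.
ec_totals = [0.0, 4.3, 8.6, 60.2]
counts = [modules_from_ec(ec) for ec in ec_totals]

print(counts)  # [0, 1, 2, 14]
```

So whether I express the outcome in EC or in modules, it only takes whole-number steps, which is why I doubt that it counts as continuous.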

To conclude: I want to perform linear regression on these data.

My questions are:

1. Is my outcome variable truly a continuous variable, as linear regression assumes?

2. How do I handle the fact that only 20% of the students have attained any progress?