Some Basic (ish) Regression Advice Required

#1
Hi All,
I was wondering whether you could help me with this. I’m undertaking a piece of short coursework (assessed – pass/fail grading). Only going as far as some descriptive statistics is sufficient, so I'm at that stage at the moment, but for my own development I wanted to go a little further.
I’m looking at the impact of different factors on international students' attainment at university. Forgive my lay talk about this, I don’t have much background in statistics.

EDIT: Research question is "What factors influence the grade attainment of international students". I also have a 'control' dataset of domestic students that I also plan to perform the below on, but the descriptions below are for the international students.

The measurements I have (amongst others, but these are the focus of my analysis) are:
(1) (ordinal, 5 questions, Likert 1-5) pedagogic familiarity
(2) (ordinal, 5 questions, Likert 1-5) language familiarity
(3) (continuous) 'additional' study hours
(4) (continuous, dependent variable) grade difference (pre-university gpa - current gpa)

In descriptive terms, if I look at the relationships between the above, I see that:
1 vs 2 no relationship
1 (x) vs 3 (y) creates a loose y = log(x) curve.
2 (x) vs 3 (y) creates a y = a.(x) linear relationship.
3 (x) vs 4 (y) creates a loose y = log(x) curve.

Looking into different methods and having some very basic familiarity with regression analysis (primarily from looking at results in journals), I thought that multiple regression analysis would be useful (with 4 as dependent and 1-3 as independent).

My questions are:
1) 1&2 are not related, but 1/2&3 are – can I reduce this to just a nonlinear regression analysis of 3 (independent) and 4 (dependent)?
2) If not, is it ‘OK’ to mix ordinal and continuous data (with a warning that end result of any regression can only be viewed as ‘approximate’)?
3) Should I even be thinking about nonlinear regression? Or is that overkill (should I be looking at a log transformation - I understand the concept but probably not as much as I need to)?
4) Is it useful or totally pointless to look at multiple regression to examine the relationship of 1/2&3, then as a separate regression 3&4?
5) Is there any other statistical method I could/should use to explore the relationships here? I'm not dead-set on regression, but from investigating different methods it seemed to be the best fit for me.

Sorry if this all sounds stupid – I’ve recently read a couple of regression textbooks and I’ve been googling but I can’t quite get my head around published guides and advice in the context of my research so I would really appreciate any guidance.
 
#2
I don't think grade difference is actually interval, because you are subtracting one ordinal grade from another. But if the result has enough distinct levels (and you assume the differences between adjacent grades are reasonably equal) you might be able to treat it as interval-like. There is a lot of disagreement on this in the literature.

I don't understand what you mean when you say 1&2 are not related but 1/2&3 are.

What do you mean by mixing ordinal and continuous data? You can use ordinal data to predict a continuous outcome by entering it as dummy variables. Whether you can use Likert scale variables (as non-dummies) to predict interval data is disputed. I sat at a table where one statistics professor said yes and another no when I raised this. :) What I mean is using a variable measured from 1 to 5, entered as a number, to predict another variable. It does not formally violate the rules of regression, but some feel it leads to invalid ('nonsensical') results.
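
To make the dummy-variable idea concrete, here is a minimal Python sketch. The data are made up purely for illustration, and the names `dummy_code` and `dummy_coefficients` are hypothetical helpers, not from any library. It relies on the fact that with a single dummy-coded predictor, ordinary least squares reduces to group means: the intercept is the mean outcome at the reference level, and each dummy coefficient is that level's mean minus the reference mean.

```python
from statistics import mean

LEVELS = [1, 2, 3, 4, 5]  # Likert response options; level 1 is the reference


def dummy_code(response, levels=LEVELS):
    """Return one 0/1 indicator per level, skipping the reference level."""
    return [1 if response == level else 0 for level in levels[1:]]


def dummy_coefficients(responses, outcome, levels=LEVELS):
    """Least-squares fit for ONE dummy-coded predictor, via group means.

    intercept   = mean outcome at the reference level
    coefficient = mean outcome at that level minus the intercept
    Every level must appear at least once in `responses`.
    """
    means = {
        lvl: mean([y for r, y in zip(responses, outcome) if r == lvl])
        for lvl in levels
    }
    intercept = means[levels[0]]
    return intercept, {lvl: means[lvl] - intercept for lvl in levels[1:]}


# Made-up illustration data (NOT from the original post).
responses = [1, 1, 2, 2, 3, 3, 4, 4, 5, 5]
grade_diff = [0.9, 1.1, 0.6, 0.8, 0.5, 0.5, 0.45, 0.55, 0.4, 0.6]

design_row = dummy_code(3)  # indicators for levels 2, 3, 4, 5
intercept, coefs = dummy_coefficients(responses, grade_diff)
```

Entering the same variable as a plain number from 1 to 5 would instead force every one-step change to have the same effect, which is the assumption the disagreement is about. Dummy coding avoids that assumption at the cost of four coefficients instead of one.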

I am not clear what you mean by non-linear regression here. I stay away from non-linear regression (like loess, for example) if I can possibly avoid it. It is not for the faint of heart. If you are unfamiliar with logs then this is not the way to go. But then again, I am not really sure what you mean by non-linear regression. Ordinal predictors do not make a model non-linear.
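
On the log transformation mentioned in question 3: a curve of the form y = a + b*log(x) is still linear in its coefficients, so it can be fitted with ordinary linear regression after transforming x. The sketch below assumes that form and uses made-up numbers generated from a = 0.5 and b = 0.3; `fit_line` is a hypothetical helper written here, not a library function. It also assumes every x is strictly positive, since log(0) is undefined.

```python
import math


def fit_line(x, y):
    """Simple least-squares fit of y = intercept + slope * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


# Made-up illustration data (NOT from the original post):
# study hours, and a grade difference that follows 0.5 + 0.3 * log(hours).
hours = [1, 2, 4, 8, 16]
grade_diff = [0.5 + 0.3 * math.log(h) for h in hours]

# Transform the predictor, then fit an ordinary straight line.
log_hours = [math.log(h) for h in hours]
intercept, slope = fit_line(log_hours, grade_diff)
```

Because the illustration data are noise-free, the fit recovers the generating coefficients up to floating-point error. With real data the fit would only be approximate, and a scatter plot of y against log(x) is a quick way to check whether the straight-line assumption looks reasonable.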

I think the answer to 5 depends on your expertise. If you have worked a lot with regression then complex methods make sense. If you have limited expertise they do not. If this is for a class and you have no experience with complex methods then I would not think your instructor expects them.