Help creating my first logistic regression model

Hi everyone,

I'm new here and new to the stats world, and I have several questions and need help on how to proceed.

First, what I'm trying to do is predict an outcome, which is named Phase in the picture here.

Here is what I'm doing:

1- For every column, I'm recoding the values as numbers (e.g., 1, 2, 3, etc.) with no particular ranking. Question: does it matter which value I code as 1 and which as 6, for example?

2- I'm using R to create the model, but how do I know whether my model is good?

3- Is there any advice on how to proceed? How do I improve my data so I can get a better model?

Thank you!


Ambassador to the humans
If you're using R, then "changing variables to a scale" when there is no particular ranking is useless and potentially harmful. Just keep the values as they are. If you instead turn the variable into a factor (in R), it will create the necessary dummy (indicator) variables for you.
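As a rough sketch of what "turn the variable into a factor" buys you: for an unordered factor, R's default (treatment) contrasts replace the column with one 0/1 indicator per level except a reference level, which is absorbed into the intercept. The Python below mimics that by hand only to make the idea visible; treatment_code and the sample values are hypothetical, and sorting the levels only approximates how R orders factor levels (R sorts them according to the current locale).

```python
def treatment_code(values, reference=None):
    """Expand a categorical column into 0/1 indicator columns.

    One column per level except the reference level (absorbed into the
    intercept). The default reference is the first level after sorting,
    which only approximates R's locale-dependent level ordering.
    """
    levels = sorted(set(values))
    if reference is None:
        reference = levels[0]
    others = [lvl for lvl in levels if lvl != reference]
    # Each row gets a 1 in the column of its own level, 0 elsewhere;
    # rows at the reference level are all zeros.
    rows = [[1 if v == lvl else 0 for lvl in others] for v in values]
    return others, rows

# Hypothetical sample of one text column.
product_fit = ["3. Regular Fit", "2. Low Fit", "4. High Fit", "3. Regular Fit"]
others, rows = treatment_code(product_fit)
names = ["Product.Fit.Level" + lvl for lvl in others]
print(names)  # ['Product.Fit.Level3. Regular Fit', 'Product.Fit.Level4. High Fit']
print(rows)   # [[1, 0], [0, 0], [0, 1], [1, 0]]
```

Prefixing each non-reference level with the column name gives labels in the same "column name + level" pattern as the per-level rows that appear in a glm summary for a factor.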

When I use the str command to check the file I just loaded, it shows that all my columns, except the one I'm trying to predict, are factors.

But when I use the glm function, it returns this warning:
Warning message: fitted probabilities numerically 0 or 1 occurred

And the summary looks all wrong. What should I do here?

Even if I change Phase to a factor, it still returns the warning message.
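A common cause of this warning is (quasi-)complete separation: some predictor, or some level of a factor, splits Phase perfectly into 0s and 1s, so the maximum-likelihood slope has no finite value and the fitting algorithm keeps pushing it outward. The Python sketch below illustrates this on made-up, perfectly separated data with a plain gradient-ascent fit (glm itself uses iteratively reweighted least squares, so this is an illustration, not what glm does internally). The helper separating_levels is hypothetical; it flags factor levels in which Phase never varies.

```python
import math

def sigmoid(z):
    # Numerically safe logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)

def fit_logistic(xs, ys, steps, lr=0.5):
    """Gradient ascent on the log-likelihood of P(y=1) = sigmoid(b0 + b1*x)."""
    b0 = b1 = 0.0
    n = len(xs)
    for _ in range(steps):
        g0 = g1 = 0.0
        for x, y in zip(xs, ys):
            r = y - sigmoid(b0 + b1 * x)  # residual on the probability scale
            g0 += r
            g1 += r * x
        b0 += lr * g0 / n
        b1 += lr * g1 / n
    return b0, b1

def separating_levels(levels, ys):
    """Levels in which Phase is all 0 or all 1 (a quick check for separation)."""
    seen = {}
    for lvl, y in zip(levels, ys):
        seen.setdefault(lvl, set()).add(y)
    return sorted(lvl for lvl, vals in seen.items() if len(vals) == 1)

# Hypothetical data: a 0/1 dummy x that predicts Phase perfectly.
xs = [0, 0, 0, 0, 1, 1, 1, 1]
ys = [0, 0, 0, 0, 1, 1, 1, 1]

slopes = []
for steps in (100, 1000, 10000):
    b0, b1 = fit_logistic(xs, ys, steps)
    slopes.append(b1)
    # The slope keeps growing and the fitted probability for x = 1 creeps toward 1.
    print(steps, round(b1, 2), round(sigmoid(b0 + b1), 4))

print(separating_levels(["Low", "Low", "High", "High", "Mid", "Mid"],
                        [0, 1, 1, 1, 0, 1]))  # ['High']
```

Because the slope never settles, its standard error also becomes very large, which can show up as p-values near 1 in the summary. Cross-tabulating each factor against Phase (for example with table() in R) is a quick way to spot levels like "High" above.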

But I don't think the warning message is the problem. The problem is that when I run summary() on my regression, it returns a separate row for each level (value) within my variables, not one row for each column.

Something like this:
Product.Fit.Level2. Low Fit 1.000
Product.Fit.Level3. Regular Fit 1.000
Product.Fit.Level4. High Fit 1.000
Product.Fit.Level5. Very High Fit 1.000
Product.Fit.LevelNo Info / doesn’t Know 1.000
Product.Fit.LevelNo Info / Old Account 1.000
Omega Contributor
We see from your link that Phase is binary (0 or 1). It is your dependent variable, correct?

Which of the other variables are you adding to your model as independent variables?

Yes, Phase is binary and my dependent variable.

All the others, both numeric and text columns, are the ones I'm going to use as independent variables.

What would be the ideal way to proceed now?


Omega Contributor
I'm not overly familiar with R, but traditionally text fields can cause problems for software users. I might convert long text strings into categorical groups if possible (e.g., 1, 2, 3, 4, ..., n). Otherwise a program may treat each distinct word/letter combination as its own unique category.
Good, but one thing I don't understand: if I change them to numbers, does the software interpret a value, for example 4, as being better than, or more than, a value of 1? Or does it not matter, since weights are not assigned to individual values but only to their groups (I mean the columns)?
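A sketch of that recoding, assuming the goal is short, consistent category labels: each distinct text answer gets a letter (numbers would work the same way), and a lookup table is kept so results can be translated back to the original answers. short_codes and the sample answers are hypothetical, and the normalization step (trimming, collapsing spaces, lowercasing) is an extra assumption meant to stop near-identical spellings from becoming separate categories.

```python
import string

def short_codes(values):
    """Map each distinct text value to a short label: A, B, C, ...

    Values are normalized first (trimmed, inner whitespace collapsed,
    lowercased) so entries that differ only in spacing or case share a label.
    """
    codes = {}  # normalized text -> short label
    coded = []
    for v in values:
        key = " ".join(v.split()).lower()
        if key not in codes:
            if len(codes) == len(string.ascii_uppercase):
                raise ValueError("more than 26 categories; use longer labels")
            codes[key] = string.ascii_uppercase[len(codes)]
        coded.append(codes[key])
    return coded, codes

answers = ["No Info / Old Account", "Regular Fit", "no info /  old account "]
coded, lookup = short_codes(answers)
print(coded)   # ['A', 'B', 'A']
print(lookup)  # {'no info / old account': 'A', 'regular fit': 'B'}
```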


Fortran must die
I don't know R. In SAS you tell the software whether a variable is categorical (in which case no level is higher than another, and normally you compare each level to a reference level) or interval (in which case it assumes the numbers are ordered and equally spaced). I am sure R has a similar system.

There is a lot to be said analytically for converting nominal independent variables into a series of dummy variables. When you have 7 unordered levels, what is it really telling you to move from level 1 to 2, and from 2 to 3? Even if they are ordered, like Likert data, what that tells you is subject to dispute (it depends on how you interpret the differences between levels substantively).
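To make the "level 1 to 2, and 2 to 3" point concrete: if an unordered category is entered as one numeric column, the model has a single slope, so every one-step move in the code shifts the log-odds by the same amount, and the fitted probabilities depend on which arbitrary number each category happened to get. With dummy (factor) coding and no other predictors, the fitted probability for each level is simply that level's observed share of Phase = 1, whatever the labels. The Python sketch below uses made-up data and a hand-written Newton-Raphson fit; fit_logistic_1d and the data are hypothetical, not an R or library API.

```python
import math

def fit_logistic_1d(xs, ys, iters=25):
    """Newton-Raphson fit of P(y=1) = 1 / (1 + exp(-(b0 + b1*x)))."""
    b0 = b1 = 0.0
    for _ in range(iters):
        g0 = g1 = s00 = s01 = s11 = 0.0
        for x, y in zip(xs, ys):
            p = 1.0 / (1.0 + math.exp(-(b0 + b1 * x)))
            w = p * (1.0 - p)
            g0 += y - p            # gradient of the log-likelihood
            g1 += (y - p) * x
            s00 += w               # entries of the 2x2 information matrix
            s01 += w * x
            s11 += w * x * x
        det = s00 * s11 - s01 * s01
        b0 += (s11 * g0 - s01 * g1) / det
        b1 += (s00 * g1 - s01 * g0) / det
    return b0, b1

# Hypothetical data: three unordered categories, four rows each.
# Share of Phase = 1: A -> 1/4, B -> 3/4, C -> 2/4.
groups = {"A": [1, 0, 0, 0], "B": [1, 1, 1, 0], "C": [1, 1, 0, 0]}
labels = [g for g, ys in groups.items() for _ in ys]
phase = [y for ys in groups.values() for y in ys]

def integer_coded_fit(numbering):
    """Fit with each category replaced by an arbitrary integer code."""
    xs = [numbering[g] for g in labels]
    b0, b1 = fit_logistic_1d(xs, phase)
    return {g: 1.0 / (1.0 + math.exp(-(b0 + b1 * numbering[g]))) for g in groups}

# Factor (dummy) coding: fitted probability per level = observed share.
factor_fit = {g: sum(ys) / len(ys) for g, ys in groups.items()}

fit_abc = integer_coded_fit({"A": 1, "B": 2, "C": 3})
fit_acb = integer_coded_fit({"A": 1, "C": 2, "B": 3})
for g in groups:
    print(g, factor_fit[g], round(fit_abc[g], 3), round(fit_acb[g], 3))
```

Numbering A, B, C as 1, 2, 3 gives fitted probabilities of 0.375, 0.5 and 0.625, well away from the observed 0.25, 0.75 and 0.5, while numbering A, C, B as 1, 2, 3 happens to reproduce them exactly. The factor-coded fit matches the observed shares regardless of how the categories are labeled.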

If you are new to statistics, starting with logistic regression was brave :p My advice is to ignore the slopes entirely and focus on the odds ratios (each slope exponentiated).
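A small worked example of the odds-ratio reading, assuming a single 0/1 dummy for one level against the reference level: the odds ratio is the odds of Phase = 1 in that level divided by the odds in the reference level, and exponentiating a logistic slope gives that ratio. With the factor as the only predictor, the fitted slope equals the log of the sample odds ratio; with other predictors in the model the odds ratio is adjusted for them and will generally differ from the raw table. The counts below are made up. In R, exp(coef(model)) lists these ratios, where model stands for your fitted glm object.

```python
import math

def odds(p):
    """Convert a probability to odds."""
    return p / (1.0 - p)

# Hypothetical 2x2 counts for one factor level versus the reference level.
phase1_level, phase0_level = 30, 10  # odds of Phase = 1 in the level: 3.0
phase1_ref, phase0_ref = 20, 40      # odds in the reference level: 0.5

odds_ratio = (phase1_level / phase0_level) / (phase1_ref / phase0_ref)
slope = math.log(odds_ratio)  # the slope a single-dummy logistic fit would report

print(round(odds_ratio, 3))       # 6.0
print(round(slope, 3))            # 1.792
print(round(math.exp(slope), 3))  # 6.0, back to the odds ratio
# Same ratio from the group probabilities, 30/40 and 20/60.
print(round(odds(30 / 40) / odds(20 / 60), 3))  # 6.0
```

Read as: the odds of Phase = 1 are six times higher in this level than in the reference level. That statement stays meaningful no matter how the categories were labeled, which is part of why odds ratios are easier to interpret than raw slopes.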


Omega Contributor
You could also label them A, B, C, D, ..., n. My recommendation was just to get away from those long strings, which you probably don't want to type into your code every time you run something else.