Logit model without constant: "no convergence"?

#1
Hi dear all,
My logistic model's dependent variable is based on the cost/benefit ratio of an investment: it takes the value "1" if the c/b ratio >= 1 and "0" if the ratio < 1. The independents are REGION ("1" if the investment covers one region, "0" if two or more regions), TYPE ("1" if it is a new investment, "0" otherwise) and SECTOR (manufacturing, agriculture and services; services is the base category).

When the constant term is included, the model cannot be estimated: the iterations run endlessly, ending with "no convergence". When the constant is excluded, all parameters are significant at the 5% and 10% levels, with prob > chi2 = 0.0082 and Wald(4) = 13.74.

It's clear that the constant term is not good for my model. As far as I understand, the cost/benefit ratio may not exist under the assumption that all independents equal zero. But how can I run this model in Stata? How should I interpret the coefficients, and which statistics or goodness-of-fit measures should I use to evaluate the significance of the model without a constant term? And what happens to a logit model without a constant, with respect to the convergence error?
Thanks for now.
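
(For reference: in Stata the constant is dropped with logit's noconstant option; in R the intercept is removed from the model formula. A minimal sketch with made-up data, using the variable names described above; the values are invented purely to make it runnable:)

Code:
# Hypothetical data frame mirroring the variables in #1; the values are
# invented only so the sketch runs.
d1 <- data.frame(
  c_b    = c(1, 0, 1, 0, 1, 0, 1, 0),   # 1 if cost/benefit ratio >= 1
  region = c(1, 1, 0, 0, 1, 0, 0, 1),   # 1 if the investment covers one region
  type   = c(1, 0, 1, 0, 0, 1, 1, 0),   # 1 if a new investment
  sector = factor(c("manufacturing", "agriculture", "services",
                    "manufacturing", "services", "agriculture",
                    "services", "manufacturing"))
)

# "- 1" removes the constant, like Stata's noconstant option; note that
# without an intercept R estimates one coefficient per sector level
# instead of contrasts against a base category.
summary(glm(c_b ~ region + type + sector - 1, family = binomial, data = d1))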
 
#2
You have asked about this before. (It is generally better to stick to the same thread.)

Can you make a small reproducible example (so that we can test it, maybe in different software)?
 
#4
I thought I had posted in the wrong forum, so I posted here again; sorry for that. I didn't understand what you meant by 'reproducible example'; would you mind explaining it? I'll provide whatever is needed!
 
#5
Can you make a small reproducible example?
That means: "Can you show us a few lines of data?", so that we can check whether there is something strange in these data.
(Of course, you need to check that you get the same kind of error with your software on these data.)
 
#6
I attached observations 25-54 (54 in total). I have more variables (and versions of these), as well as several other model attempts that give the same "not concave" error; this is the most basic one.
I can send the whole dataset of these variables.
Great thanks for your interest.
 
#7
You seem to have some multicollinearity. You can see that in the first linear regression below (which is irrelevant here, but it reveals the collinearity among the IVs).

There also seems to be some other lack of information in the data. Maybe that can be seen by doing two-by-two cross tables among the IVs (4 variables, each on 2 levels, give 2^4 = 16 possible combinations); see the cross-table lines at the end of the code below.

Code:
# this is a program in R

d <- read.table(header = TRUE, text = "
c_b	invest	highway	seaway	airway
                0	1	0	1	0
                0	1	0	1	0
                1	1	1	0	0
                1	1	1	0	0
                0	1	0	1	0
                0	1	0	1	0
                0	1	0	1	0
                1	1	0	0	1
                0	0	0	0	1
                1	1	1	0	0
                1	0	1	0	0
                1	0	1	0	0
                1	0	1	0	0
                1	0	1	0	0
                1	0	1	0	0
                1	0	1	0	0
                1	0	1	0	0
                1	0	1	0	0
                1	0	1	0	0
                1	1	1	0	0
                1	0	1	0	0
                0	1	0	1	0
                1	1	0	1	0
                0	1	0	1	0
                0	1	0	1	0
                0	1	0	1	0
                0	1	0	1	0
                0	1	0	1	0
                0	1	0	1	0
                0	1	0	1	0")
     
length(d$c_b)


###############

summary(lm(d$c_b ~ d$invest + d$highway + d$seaway + d$airway) )
summary(lm(d$c_b ~ d$invest + d$highway + d$seaway           ) )

summary(glm(d$c_b ~ d$invest + d$highway + d$seaway + d$airway, family = binomial ) )
summary(glm(d$c_b ~ d$invest + d$highway + d$seaway           , family = binomial ) )
summary(glm(d$c_b ~ d$invest + d$highway                      , family = binomial ) )
summary(glm(d$c_b ~ d$invest                                  , family = binomial ) )
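
# Two-by-two cross tables among the IVs, as suggested above: empty cells
# here signal the lack of information that can make the logit
# inestimable (an illustrative sketch, not exhaustive over all pairs).
table(invest = d$invest, highway = d$highway)
table(invest = d$invest, seaway  = d$seaway)
table(highway = d$highway, seaway = d$seaway)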



Code:
> summary(lm(d$c_b ~ d$invest + d$highway + d$seaway + d$airway) )

Call:
lm(formula = d$c_b ~ d$invest + d$highway + d$seaway + d$airway)

Residuals:
     Min       1Q   Median       3Q      Max 
-0.42553 -0.07143 -0.07143  0.04255  0.92857 

Coefficients: (1 not defined because of singularities)
            Estimate Std. Error t value Pr(>|t|)   
(Intercept)   0.4255     0.1730   2.460  0.02085 * 
d$invest      0.1489     0.1246   1.196  0.24258   
d$highway     0.5319     0.1746   3.047  0.00525 **
d$seaway     -0.5030     0.1834  -2.743  0.01088 * 
d$airway          NA         NA      NA       NA   
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 0.2282 on 26 degrees of freedom
Multiple R-squared:  0.8186,	Adjusted R-squared:  0.7977 
F-statistic: 39.12 on 3 and 26 DF,  p-value: 8.758e-10

> summary(lm(d$c_b ~ d$invest + d$highway + d$seaway           ) )

Call:
lm(formula = d$c_b ~ d$invest + d$highway + d$seaway)

Residuals:
     Min       1Q   Median       3Q      Max 
-0.42553 -0.07143 -0.07143  0.04255  0.92857 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)   
(Intercept)   0.4255     0.1730   2.460  0.02085 * 
d$invest      0.1489     0.1246   1.196  0.24258   
d$highway     0.5319     0.1746   3.047  0.00525 **
d$seaway     -0.5030     0.1834  -2.743  0.01088 * 
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 0.2282 on 26 degrees of freedom
Multiple R-squared:  0.8186,	Adjusted R-squared:  0.7977 
F-statistic: 39.12 on 3 and 26 DF,  p-value: 8.758e-10

> summary(glm(d$c_b ~ d$invest + d$highway + d$seaway + d$airway, family = binomial ) )

Call:
glm(formula = d$c_b ~ d$invest + d$highway + d$seaway + d$airway, 
    family = binomial)

Deviance Residuals: 
     Min        1Q    Median        3Q       Max  
-0.38499  -0.38499   0.00000   0.00004   2.29741  

Coefficients: (1 not defined because of singularities)
            Estimate Std. Error z value Pr(>|z|)
(Intercept)   -19.33    9577.21  -0.002    0.998
d$invest       38.67   13544.22   0.003    0.998
d$highway      40.36   11891.45   0.003    0.997
d$seaway      -21.90    9577.21  -0.002    0.998
d$airway          NA         NA      NA       NA

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 41.4554  on 29  degrees of freedom
Residual deviance:  7.2049  on 26  degrees of freedom
AIC: 15.205

Number of Fisher Scoring iterations: 20

Warning message:
glm.fit: fitted probabilities numerically 0 or 1 occurred 
> summary(glm(d$c_b ~ d$invest + d$highway + d$seaway           , family = binomial ) )

Call:
glm(formula = d$c_b ~ d$invest + d$highway + d$seaway, family = binomial)

Deviance Residuals: 
     Min        1Q    Median        3Q       Max  
-0.38499  -0.38499   0.00000   0.00004   2.29741  

Coefficients:
            Estimate Std. Error z value Pr(>|z|)
(Intercept)   -19.33    9577.21  -0.002    0.998
d$invest       38.67   13544.22   0.003    0.998
d$highway      40.36   11891.45   0.003    0.997
d$seaway      -21.90    9577.21  -0.002    0.998

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 41.4554  on 29  degrees of freedom
Residual deviance:  7.2049  on 26  degrees of freedom
AIC: 15.205

Number of Fisher Scoring iterations: 20

Warning message:
glm.fit: fitted probabilities numerically 0 or 1 occurred 
> summary(glm(d$c_b ~ d$invest + d$highway                      , family = binomial ) )

Call:
glm(formula = d$c_b ~ d$invest + d$highway, family = binomial)

Deviance Residuals: 
     Min        1Q    Median        3Q       Max  
-0.53498  -0.53498   0.00000   0.00006   2.00744  

Coefficients:
            Estimate Std. Error z value Pr(>|z|)
(Intercept)   -18.60    6644.47  -0.003    0.998
d$invest       16.73    6644.47   0.003    0.998
d$highway      38.74    8036.74   0.005    0.996

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 41.455  on 29  degrees of freedom
Residual deviance: 11.780  on 27  degrees of freedom
AIC: 17.78

Number of Fisher Scoring iterations: 19

Warning message:
glm.fit: fitted probabilities numerically 0 or 1 occurred 
> summary(glm(d$c_b ~ d$invest                                  , family = binomial ) )

Call:
glm(formula = d$c_b ~ d$invest, family = binomial)

Deviance Residuals: 
    Min       1Q   Median       3Q      Max  
-2.1899  -0.8712   0.4366   0.4366   1.5183  

Coefficients:
            Estimate Std. Error z value Pr(>|z|)   
(Intercept)    2.303      1.049   2.196  0.02810 * 
d$invest      -3.076      1.159  -2.654  0.00796 **
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 41.455  on 29  degrees of freedom
Residual deviance: 30.401  on 28  degrees of freedom
AIC: 34.401

Number of Fisher Scoring iterations: 4

>
 
#8
I'll try your advice one by one. Working with a small number of observations is always difficult, but I have to.
Before applying your suggestions, I really want to know one more thing: is modelling the dependent variable against the independents one by one, in individual logistic models, an appropriate way? For example, like correlations: logit c/b REGION, ..., logit c/b SECTOR (highway, seaway, etc.).
Great and greater thanks. I'll try everything and come back.
 
#9
That's right, you're right. A few variables don't create a problem, but when I add more, Stata cannot estimate, because the variables resemble each other (I think). So, is it appropriate to model the variables one by one, as I asked before? That way doesn't cause trouble...
Isn't that very simplistic for a serious study, though?
And how can I explain this situation in my study? What should I say about it?
 

hlsmith

Omega Contributor
#10
I haven't directly looked at your data, but collinearity results in inflated standard errors, and a typical solution is dropping variables. Running a bunch of simple models can be troublesome because you don't get to see how the variables interplay with each other, or that some variables have explanatory overlap; the sketch below illustrates the overlap for Greta's sample data.


Was that your question?
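
For Greta's sample data above, that overlap can be checked directly; a minimal sketch in R, reusing the data frame d from post #7:

Code:
# The three mode dummies always sum to 1, so each one is an exact linear
# combination of the other two -- this is why lm() and glm() above dropped
# d$airway ("1 not defined because of singularities").
with(d, all(highway + seaway + airway == 1))   # TRUE for these data
with(d, all(airway == 1 - highway - seaway))   # TRUE for these data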
 
#11
That's exactly what I asked. If I cannot increase the number of observations, and Stata doesn't estimate the model full of variables (because of the "not concave" error), I don't know any other way to do the analysis (it's for a doctorate degree).
Estimating simple models is what remains of the solutions, but then the problem of losing the interactions, and inefficiency in interpretation, occurs.
Which solution do you prefer for the sake of the study? Any other way, any other offer...
Great thanks for your time and help.
 

hlsmith

Omega Contributor
#13
I would draw out (make a picture of) your model as nodes, showing how all of the variables are related, and see if there are any you can remove and still answer your question.


Analytic alternatives are exact or Firth logistic regression, or regularized logistic regression (which removes variables or applies a penalty to them). The former two could help the model converge, and the latter is used to whittle down the number of variables used; a minimal sketch follows at the end of this post.


I have never heard of a "concave" error; can you take a screenshot of it and upload it?
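
A minimal sketch of the Firth approach in R, assuming the logistf package is installed and reusing the data frame d from post #7:

Code:
# Firth's penalized-likelihood logistic regression keeps the estimates
# finite even under (quasi-)complete separation, which is what the
# "fitted probabilities numerically 0 or 1" warning above points to.
library(logistf)

summary(logistf(c_b ~ invest + highway + seaway, data = d))

(For the regularized route, glmnet with family = "binomial" is one common choice in R; in Stata, exact logistic regression is available as exlogistic, and Firth logit via the user-written firthlogit command.)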
 
#15
I cannot find the words to say...
I think this attachment is the one I've been searching for, for nearly a month (I understood at first glance).
I'll translate it word by word and try to find which way is best for my dataset. Now I'm organizing the dataset to send you.
Great thanks.
 
#16
Here's everything you asked for. There's one more variable, called REGION, with 4 responses, but it is more nerve-wracking; if it would be of use, I can send it too.
Thanks for your time and favor.
 

hlsmith

Omega Contributor
#18
I can't open your file, but if you present it like Greta did (with an R input statement) I can take a look at it. I also looked at Greta's code and wondered whether that was a piece of your data. If so, it appears your last three variables are dummy coded, and using any two of them makes the third redundant, since you can solve for the value of the third from the other two.


Perhaps try collapsing the three variables into a single variable taking the values 0, 1, 2 for the three groups; a sketch follows below. Models will complain when you do what you did, because the third variable is a composite of the other variables. This may be contributing to your issue.
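
A minimal sketch of that collapsing step in R, reusing the data frame d from post #7 (the level names are hypothetical):

Code:
# Collapse the three mutually exclusive dummies into a single factor;
# glm() then picks one level as the base category and estimates two
# coefficients, so the exact-dependence singularity disappears.
d$mode <- factor(with(d, ifelse(highway == 1, "highway",
                         ifelse(seaway  == 1, "seaway", "airway"))))

summary(glm(c_b ~ invest + mode, family = binomial, data = d))
# (With these 30 rows the separation warning will still appear;
#  the point here is the coding.)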
 
#19
It was a piece of my data, right; my full number of observations is 53. I created the third one so as to leave it out as the base category.
I didn't go the way of creating one variable with three responses (0-1-2) because of the difficulty of interpreting it. Wrong?
If only you could see the dataset. I'll try to upload it to you again; it's better if you see it. I think I'll come closer to the "right" solution thanks to you.
With great thanks.