
Thread: Logistic Regression predicted probability is either 1 or 0 (or literally 2.2204E-16)

  1. #1

    I am doing a test logistic regression to predict whether employees will stay in the company for more than 3 years.

    After training, the model's predictions contain only two probabilities: "1" and "2.2204E-16" (essentially 0).

    I thought the probabilities would normally lie somewhere between 0 and 1. Is this caused by a lack of training data, or by a model convergence problem? Are there ways to solve it?
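    A minimal MATLAB sketch (not from the thread) of how a logistic model can print exactly 1. Mathematically the inverse-logit 1/(1 + exp(-eta)) never reaches 0 or 1, but in double precision 1 + exp(-eta) rounds to exactly 1 once the linear predictor eta exceeds about 37, so the computed probability becomes 1. The handle name `invlogit` and the eta values are illustrative choices.

```matlab
% Inverse-logit (logistic) function: maps a linear predictor eta to a probability.
invlogit = @(eta) 1 ./ (1 + exp(-eta));

eta = [0 5 20 40 -40];
p = invlogit(eta);
disp(p)

% exp(-40) is about 4e-18, far below the spacing of doubles near 1,
% so 1 + exp(-40) rounds to exactly 1 and invlogit(40) is exactly 1.
% On the negative side there is no such rounding: invlogit(-40) is about 4e-18.
```

    Very large linear predictors usually come from very large fitted coefficients, which is what an unstable fit tends to produce.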

  2. #2
    Dason

    How many predictors are you using? What are your sample sizes?
    I don't have emotions and sometimes that makes me very sad.

  3. #3

    Quote (originally posted by Dason):
        How many predictors are you using? What are your sample sizes?

    I have 533 predictors and 18,000 training observations.

    Training sometimes produces two warnings:
    "Iteration limit is reached"
    "Regression design matrix is rank deficient to within machine precision"

    Could these be causing the predicted probabilities of "1" and "0"?
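    The second warning means some columns of the design matrix are (numerically) linear combinations of others, so their coefficients cannot be estimated separately. A quick check, sketched here with a made-up matrix, is to compare `rank` with the number of columns. With real data you would substitute the training matrix, keeping in mind that fitglm also adds an intercept and expands categorical variables into dummy columns internally, so the raw matrix alone may not reveal every dependency.

```matlab
% Hypothetical 4-by-3 design matrix: column 3 equals column 1 + column 2,
% so the columns are linearly dependent.
X = [1 0 1;
     0 1 1;
     1 1 2;
     2 0 2];

numCols = size(X, 2);
r = rank(X);   % numerical rank, computed from the singular values
fprintf('columns: %d, rank: %d\n', numCols, r);

if r < numCols
    disp('Design matrix is rank deficient: some predictors are redundant.');
end
```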

  4. #4
    hlsmith

    Can you post your full output?
    Stop cowardice, ban guns!

  5. #5

    Quote (originally posted by hlsmith):
        Can you post your full output?

    Thanks for helping.

    I used the MATLAB function "fitglm" to fit the logistic regression, setting the 'Distribution' parameter to 'binomial':

    Logi_COE_P = fitglm(training_data_matrix, result_data_matrix, 'linear', ...
        'CategoricalVars', CategorialVariables, 'Distribution', 'binomial', ...
        'Link', 'logit', 'BinomialSize', 1, 'DispersionFlag', true, ...
        'Weights', OverllDataWeight);


    During training, it gives these warnings:

    Warning: Removing terms where categorical variables
    appear in powers higher than linear.
    > In FormulaProcessor>FormulaProcessor.removeCategoricalPowers at 510
    In TermsRegression>TermsRegression.removeCategoricalPowers at 396
    In GeneralizedLinearModel>GeneralizedLinearModel.fit at 1244
    In fitglm at 133
    In Forecast at 248
    Warning: Iteration limit reached.
    > In glmfit at 368
    In GeneralizedLinearModel>GeneralizedLinearModel.fitter at 919
    In FitObject>FitObject.doFit at 220
    In GeneralizedLinearModel>GeneralizedLinearModel.fit at 1245
    In fitglm at 133
    In Forecast at 248
    Warning: Regression design matrix is rank deficient
    to within machine precision.
    > In TermsRegression>TermsRegression.checkDesignRank at 98
    In GeneralizedLinearModel>GeneralizedLinearModel.fit at 1262
    In fitglm at 133
    In Forecast at 248


    The trained model gives the following predictions:

    Employee number   Probability of staying more than 3 years
    1 to 6            1
    7 to 14           2.22E-16
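    The value 2.2204E-16 is not arbitrary: it is MATLAB's `eps`, the double-precision machine epsilon (2^-52). A plausible but unconfirmed explanation is that the toolbox keeps fitted logit probabilities inside [eps, 1 - eps] by capping the linear predictor at plus or minus log(1/eps), about 36.04. The sketch below only reproduces that arithmetic under this assumption; it does not call the toolbox.

```matlab
% 2.2204e-16 is double-precision machine epsilon.
disp(eps)
assert(eps == 2^-52);

% Assumption: the linear predictor is capped at +/- log(1/eps)
% (about 36.04) before the inverse-logit is applied.
bound = log(1 / eps);

pLow  = 1 / (1 + exp(bound));    % about eps, like the small values printed above
pHigh = 1 / (1 + exp(-bound));   % 1 - eps, indistinguishable from 1 at normal display precision

fprintf('bound = %.4f, pLow = %.4e, 1 - pHigh = %.4e\n', bound, pLow, 1 - pHigh);
```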


    Any idea what all this means?

  6. #6
    hlsmith

    No idea; I'm not familiar enough with MATLAB or this procedure. I would consult the documentation for the procedure. Is this a cross-validation procedure? Sometimes you can change the number of iterations in a program, but you seem to have other issues as well.

  7. #7
    Dason

    Why do you have so many predictors?

  8. #8

    Thanks for the effort.

    Yeah, there are a few issues; I'm not sure which one causes the unwanted results.

  9. #9

    Quote (originally posted by Dason):
        Why do you have so many predictors?

    My thought was that I could start with many possibly meaningful predictors, and the non-meaningful ones would end up with coefficients close to 0 or with high p-values in the fit results.

    Would 18,000 training observations normally be enough for roughly 500 predictors?
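    One plausible mechanism, offered as an illustration rather than a diagnosis of this data: with many predictors relative to the number of events, some combination of predictors can split the 1s from the 0s perfectly (complete or quasi-complete separation). The maximum-likelihood coefficients then grow without bound, the iterations do not settle (consistent with an "Iteration limit reached" warning), and fitted probabilities are pushed to the extremes. The simulated sizes below (200 observations, 150 pure-noise predictors, about 10% events) are made up to make the effect easy to see; expect fitglm to issue warnings.

```matlab
rng(1);                               % reproducible simulation
n = 200;                              % hypothetical number of observations
k = 150;                              % hypothetical number of predictors
X = randn(n, k);                      % predictors are pure noise
y = double(rand(n, 1) < 0.1);         % about 10% events, unrelated to X

% Plain maximum-likelihood logistic regression on all predictors.
mdl = fitglm(X, y, 'Distribution', 'binomial');
p = predict(mdl, X);                  % fitted probabilities on the training data

% Share of fitted probabilities that are essentially 0 or 1.
extremeShare = mean(p < 1e-6 | p > 1 - 1e-6);
fprintf('share of extreme fitted probabilities: %.2f\n', extremeShare);
```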

  10. #10
    hlsmith

    Of the 18,000 observations, how many have the outcome of interest? The general rule is that you take the smaller outcome group (e.g., with a 50% split, 9,000) and you may be able to support one predictor for every 10 to 20 observations in that group (so 450 to 900).

    Though, big picture, you seem to be fishing for results instead of making advances based on prior knowledge. You should work on building the model up. Can you get your model to run with a few predictors?
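    A minimal sketch of building up, using simulated data in place of the real data (the sizes, coefficients, and column choices are all hypothetical): fit a small model chosen from prior knowledge, then add one candidate predictor and compare the nested models with a likelihood-ratio test on the change in deviance. It illustrates the workflow only and is not a substitute for choosing predictors on subject-matter grounds.

```matlab
rng(2);                                    % reproducible simulation
n = 2000;
X = randn(n, 6);                           % hypothetical stand-in for the real predictors
eta = -2 + 0.8 * X(:, 1) - 0.5 * X(:, 2);  % only columns 1 and 2 matter here
y = double(rand(n, 1) < 1 ./ (1 + exp(-eta)));

baseCols = [1 2];                          % start small, from prior knowledge
mdl0 = fitglm(X(:, baseCols), y, 'Distribution', 'binomial');

% Add one candidate predictor and test whether it improves the fit.
mdl1 = fitglm(X(:, [baseCols 3]), y, 'Distribution', 'binomial');
lrStat = mdl0.Deviance - mdl1.Deviance;    % likelihood-ratio statistic, 1 degree of freedom
pValue = 1 - chi2cdf(lrStat, 1);
fprintf('LR = %.3f, p = %.3f\n', lrStat, pValue);
```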

  11. #11

    The observations with the desired outcome are about 1/10 of the sample.

    By taking the smaller outcome group, do you mean I should pick a subset that contains similar numbers of desired and undesired outcomes?

    I think you are right. I shouldn't be fishing for results; I should start with a few predictors and then improve on that.

    Thanks for your suggestions.

  12. #12
    hlsmith

    So if you had 18,000 observations, with 1,800 1s and 16,200 0s, then you may be powered for 90 to 180 predictors. That is only a rough rule of thumb.
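    The rule of thumb in these replies is simple arithmetic: take the size of the smaller outcome group and allow roughly one predictor per 10 to 20 observations in it. This small MATLAB sketch (the handle name `predictorBudget` is hypothetical) reproduces both worked examples from the thread.

```matlab
% Rule of thumb from the thread: about one predictor per 10 to 20 observations
% in the smaller outcome group. Returns [lower upper] predictor counts.
predictorBudget = @(nOnes, nZeros) floor(min(nOnes, nZeros) ./ [20 10]);

balanced   = predictorBudget(9000, 9000);    % 50/50 split of 18,000
imbalanced = predictorBudget(1800, 16200);   % about 10% ones

fprintf('balanced:   %d to %d predictors\n', balanced);
fprintf('imbalanced: %d to %d predictors\n', imbalanced);
```

    By this rough guide, 533 predictors is roughly three to six times more than the 90 to 180 that about 1,800 events can support.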
