Thread: Using OLS vs Probit/Logit to estimate a threshold

  1. #1
    Points: 76, Level: 1
    Level completed: 52%, Points required for next Level: 24

    Posts
    5
    Thanks
    1
    Thanked 0 Times in 0 Posts

    Suppose you're a meteorologist and want to predict whether tomorrow the temperature will exceed 20 degrees. Your boss gives you data [t, x] where t is temperature and x is a matrix of relevant temperature-predicting variables.

    One approach would be to simply use OLS to predict t. If the predicted t exceeds 20, you at least know the model puts the chance of the temperature exceeding 20 tomorrow at 50% or more. After estimation you could probably construct a prediction interval around your predicted t value and turn that into a probability estimate.

    Another approach is to generate a variable g which is 1 if t > 20 and 0 otherwise, then run a probit of g on x. That will directly give you the probability estimate you're looking for.
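
    A minimal sketch of this second approach, assuming a single predictor and simulated data (the coefficients 18 and 1.5, the noise level, and the helper name fit_probit are all made up for illustration). It fits the probit by Fisher scoring using only the standard library; in practice a statistics package would normally do this step.

```python
import random
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal: .cdf is Phi, .pdf is the density


def fit_probit(xs, gs, iterations=25):
    """Fit P(g = 1 | x) = Phi(a + b*x) by Fisher scoring (IRLS)."""
    a, b = 0.0, 0.0
    for _ in range(iterations):
        sw = swx = swxx = swz = swxz = 0.0
        for x, g in zip(xs, gs):
            eta = a + b * x
            # Clamp the fitted probability away from 0 and 1 for stability.
            p = min(max(STD_NORMAL.cdf(eta), 1e-10), 1.0 - 1e-10)
            dens = STD_NORMAL.pdf(eta)
            w = dens * dens / (p * (1.0 - p))  # Fisher-scoring weight
            z = eta + (g - p) / dens           # working response
            sw += w
            swx += w * x
            swxx += w * x * x
            swz += w * z
            swxz += w * x * z
        # Solve the 2x2 weighted least-squares normal equations.
        det = sw * swxx - swx * swx
        a = (swxx * swz - swx * swxz) / det
        b = (sw * swxz - swx * swz) / det
    return a, b


# Hypothetical data: t = 18 + 1.5*x + Normal(0, 2) noise.
rng = random.Random(0)
xs = [rng.uniform(0.0, 3.0) for _ in range(2000)]
ts = [18.0 + 1.5 * x + rng.gauss(0.0, 2.0) for x in xs]

# The indicator g from the post: 1 if t exceeds 20, else 0.
gs = [1 if t > 20.0 else 0 for t in ts]

a_hat, b_hat = fit_probit(xs, gs)
p_at_1 = STD_NORMAL.cdf(a_hat + b_hat * 1.0)
print(f"a = {a_hat:.3f}, b = {b_hat:.3f}, P(t > 20 | x = 1) = {p_at_1:.3f}")
```

    Because the simulated errors really are normal, the fitted values should land near (18 - 20) / 2 = -1 and 1.5 / 2 = 0.75, which is the sense in which probit on g and OLS on t describe the same model under normality.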

    While the second approach seems easiest, it feels a bit clumsy to me and I wonder how reliable the estimates are.

    Any thoughts on this?

    Thanks!
    aboluk

  2. #2
    Devorador de queso
    Points: 95,540, Level: 100
    Level completed: 0%, Points required for next Level: 0
    Awards:
    Posting Award, Community Award, Discussion Ender, Frequent Poster
    Dason's Avatar
    Location
    Tampa, FL
    Posts
    12,930
    Thanks
    307
    Thanked 2,629 Times in 2,245 Posts

    Re: Using OLS vs Probit/Logit to estimate a threshold

    The first way seems clumsy to me too. Why not just calculate the probability the temperature exceeds 20 degrees under the assumption of normally distributed errors? Why only use the point estimate?
    I don't have emotions and sometimes that makes me very sad.
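
    A rough sketch of that calculation, assuming one predictor, simulated data, and normally distributed errors (the data-generating numbers and the helper names fit_ols and prob_exceeds are illustrative, not from the thread). It plugs in the estimated coefficients and residual standard deviation and ignores the uncertainty in those estimates.

```python
import math
import random
from statistics import NormalDist, mean


def fit_ols(xs, ts):
    """Least-squares fit of t = a + b*x; returns (a, b, residual_sd)."""
    x_bar, t_bar = mean(xs), mean(ts)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (t - t_bar) for x, t in zip(xs, ts))
    b = sxy / sxx
    a = t_bar - b * x_bar
    rss = sum((t - (a + b * x)) ** 2 for x, t in zip(xs, ts))
    sd = math.sqrt(rss / (len(xs) - 2))  # two estimated coefficients
    return a, b, sd


def prob_exceeds(threshold, x_new, a, b, sd):
    """P(t > threshold | x_new), assuming errors ~ Normal(0, sd^2)."""
    return 1.0 - NormalDist(mu=a + b * x_new, sigma=sd).cdf(threshold)


# Hypothetical data: t = 18 + 1.5*x + Normal(0, 2) noise.
rng = random.Random(0)
xs = [rng.uniform(0.0, 3.0) for _ in range(2000)]
ts = [18.0 + 1.5 * x + rng.gauss(0.0, 2.0) for x in xs]

a, b, sd = fit_ols(xs, ts)
p = prob_exceeds(20.0, 1.0, a, b, sd)
print(f"predicted t at x = 1: {a + b * 1.0:.2f}, P(t > 20) = {p:.3f}")
```

    The point estimate alone only says which side of 50% the probability falls on; the residual standard deviation is what turns it into an actual probability.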

  3. #3

    Originally posted by Dason:
    The first way seems clumsy to me too. Why not just calculate the probability the temperature exceeds 20 degrees under the assumption of normally distributed errors? Why only use the point estimate?

    By this you mean use OLS to obtain a point estimate (suppose the estimate exceeds 20) and then calculate the probability that the true temperature is indeed greater than 20 (i.e. test the null of t = 20 against the alternative t > 20)?

  4. #4

    I forgot to mention that my data is somewhat right-skewed, which is why I am leaning towards probit.

  5. #5
    Devorador de queso
    Dason's Avatar

    ... how exactly does having skew lean you toward probit?

  6. #6

    Because I don't care how far above the threshold the temperature is, I just want to know if it's above it.

    When I use OLS I am getting estimates that are too high.

    I have one explanatory variable "x1" that is by far the main determinant of temperature. My OLS predictions are consistently higher than the median temperature when I break it down by x1.

    For example, here are some raw numbers from the data

    x1   threshold   % above threshold (for given x1)   OLS prediction
    8    27          45%                                27.4
    9    28          46%                                28.3
    10   29          45%                                29.6
    11   30          46%                                30.3

    So the OLS predictions are saying that, given x1=8, the temperature is more likely than not to exceed 27, but simply looking at the raw data, for x1=8 the temperature exceeds 27 only 45% of the time. As you can see I am not very confident in OLS here.

    Some points that people might want to know:
    1. Sample sizes for each x1 are over 2000 so sample size is not an issue
    2. The other x's are basically insignificant so there isn't some x2 that's pushing the OLS predictions up so far

    Maybe saying the data is right skewed isn't quite accurate -- I have some extreme outliers on the upper end of the temperature scale that don't exist on the lower end.
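
    The pattern in the table is what you would expect when upper-end outliers pull the mean above the median, since OLS targets the conditional mean. A small simulation illustrates this, assuming made-up numbers (a bulk centred at 26.6 with 5% of readings pushed far upward); none of the values come from the actual data.

```python
import random
from statistics import mean, median

rng = random.Random(1)

# Hypothetical temperatures at a single x1 value: a symmetric bulk around
# 26.6, plus extreme readings on the upper end only (5% of observations).
temps = []
for _ in range(2000):
    t = rng.gauss(26.6, 1.5)
    if rng.random() < 0.05:
        t += rng.uniform(10.0, 20.0)
    temps.append(t)

threshold = 27.0
sample_mean = mean(temps)      # what an OLS fit at this x1 would target
sample_median = median(temps)  # the 50/50 point of the data
share_above = sum(t > threshold for t in temps) / len(temps)

print(f"mean = {sample_mean:.2f}, median = {sample_median:.2f}")
print(f"share above {threshold}: {share_above:.1%}")
```

    Here the mean sits above the threshold while well under half of the observations exceed it, so "prediction above the threshold" and "more than 50% chance" come apart once the errors are not symmetric.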

  7. #7
    Devorador de queso
    Dason's Avatar

    Can you provide a histogram for the response variable given x1=8 or something like that? It might make sense to use a generalized linear model (which probit regression is a special case of) to model the actual response variable - then using that you could calculate probabilities.

    But if you don't like OLS then I'm not sure probit is necessarily the best route. It might work, but there is a certain interpretation of probit regression that leads me to believe it isn't necessarily appropriate here. One way to think of probit regression is that there is some latent variable that, conditional on the predictors, follows a normal distribution. We don't get to see that variable; we only get to see whether or not it exceeds a certain cutoff. This interpretation fits pretty darn well with how you would actually go about fitting the probit model. But you already said you don't like OLS too much. OLS doesn't necessarily rely on normally distributed error terms, but we get the same estimates if we do assume normally distributed errors. So given your concerns, I'd say probit isn't the route you want to go.
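
    Written out, the latent-variable reading looks like this. The symbols (t* for the latent variable, c for the cutoff, beta and sigma for the coefficients and error scale) are notation chosen here for illustration, not taken from the thread.

```latex
t^* = x^\top \beta + \varepsilon, \qquad
\varepsilon \mid x \sim \mathcal{N}(0, \sigma^2), \qquad
g = \mathbf{1}[\, t^* > c \,]

\Pr(g = 1 \mid x)
  = \Pr(\varepsilon > c - x^\top \beta \mid x)
  = 1 - \Phi\!\left( \frac{c - x^\top \beta}{\sigma} \right)
  = \Phi\!\left( \frac{x^\top \beta - c}{\sigma} \right)
```

    In the temperature problem the "latent" variable is the observed t itself with c = 20, so the probit index is just the OLS mean shifted by the cutoff and scaled by the error standard deviation, and both approaches lean on the same normality assumption.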

    But like I said a generalized linear model might make sense using some other response distribution.
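
    As a rough illustration of why the response distribution matters, the sketch below compares the exceedance probability under a normal fit with one under a lognormal fit on simulated right-skewed data. The lognormal is only a simple stand-in for "some other response distribution" (it is not one of the standard GLM families), there are no predictors, and all numbers are made up.

```python
import math
import random
from statistics import NormalDist, mean, stdev

rng = random.Random(3)

# Hypothetical right-skewed responses at one value of the predictor.
temps = [rng.lognormvariate(3.3, 0.25) for _ in range(20000)]
threshold = 27.5

# Share of observations actually above the threshold.
empirical = sum(t > threshold for t in temps) / len(temps)

# Normal assumption: use the sample mean and standard deviation directly.
p_normal = 1.0 - NormalDist(mean(temps), stdev(temps)).cdf(threshold)

# Lognormal assumption: fit a normal to log(t), evaluate at log(threshold).
logs = [math.log(t) for t in temps]
p_lognormal = 1.0 - NormalDist(mean(logs), stdev(logs)).cdf(math.log(threshold))

print(f"empirical share above {threshold}: {empirical:.3f}")
print(f"normal-based probability:      {p_normal:.3f}")
print(f"lognormal-based probability:   {p_lognormal:.3f}")
```

    With skewed data the normal-based figure lands on the wrong side of 50%, while the figure from the distribution that matches the skew tracks the observed share closely.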

  8. The Following User Says Thank You to Dason For This Useful Post:

    aboluk (07-04-2012)

  9. #8

    Thank you guys for your replies
