# Thread: Using OLS vs Probit/Logit to estimate a threshold

1. ## Using OLS vs Probit/Logit to estimate a threshold

Suppose you're a meteorologist and want to predict whether tomorrow the temperature will exceed 20 degrees. Your boss gives you data [t, x] where t is temperature and x is a matrix of relevant temperature-predicting variables.

One approach would be to simply use OLS to predict t. If the predicted t exceeds 20, then you at least know the model puts the chance that the temperature exceeds 20 tomorrow at 50% or more. After estimation you could probably build an interval around your predicted t value and turn that into a probability estimate.

Another approach is to generate a variable g which is 0 if t <= 20 and 1 if t > 20, then run a probit of g on x. That will directly give you the probability estimate you're looking for.

While the second approach seems easiest, it feels a bit clumsy to me and I wonder how reliable the estimates are.

Any thoughts on this?

Thanks!
aboluk

2. ## Re: Using OLS vs Probit/Logit to estimate a threshold

The first way seems clumsy to me too. Why not just calculate the probability the temperature exceeds 20 degrees under the assumption of normally distributed errors? Why only use the point estimate?
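
A minimal sketch of that calculation, assuming a single predictor and normally distributed errors. The data are simulated and every name here (`xs`, `ts`, `prob_exceeds`) is hypothetical, not taken from the original poster's dataset. It also ignores the uncertainty in the estimated coefficients, so treat it as an illustration rather than a full prediction-interval calculation.

```python
import random
from statistics import NormalDist, mean

random.seed(0)

# Hypothetical data: temperature t = 12 + 1.0 * x + normal noise (sd = 3).
xs = [random.uniform(0, 15) for _ in range(500)]
ts = [12.0 + 1.0 * x + random.gauss(0, 3) for x in xs]

# One-predictor OLS via the closed-form formulas.
x_bar, t_bar = mean(xs), mean(ts)
sxx = sum((x - x_bar) ** 2 for x in xs)
slope = sum((x - x_bar) * (t - t_bar) for x, t in zip(xs, ts)) / sxx
intercept = t_bar - slope * x_bar

# Residual standard deviation with n - 2 degrees of freedom.
resid = [t - (intercept + slope * x) for x, t in zip(xs, ts)]
sigma = (sum(r * r for r in resid) / (len(xs) - 2)) ** 0.5


def prob_exceeds(x_new, threshold=20.0):
    """P(t > threshold | x_new) under normal errors.

    Assumption: coefficient uncertainty is ignored.
    """
    mu = intercept + slope * x_new
    return 1.0 - NormalDist(mu, sigma).cdf(threshold)


p_low = prob_exceeds(5.0)    # predicted t is below 20
p_high = prob_exceeds(11.0)  # predicted t is above 20
print(f"P(t > 20 | x=5)  = {p_low:.3f}")
print(f"P(t > 20 | x=11) = {p_high:.3f}")
```

The point estimate alone only tells you which side of 50% you are on; the residual spread is what turns it into an actual probability.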

3. ## Re: Using OLS vs Probit/Logit to estimate a threshold

Originally Posted by Dason
The first way seems clumsy to me too. Why not just calculate the probability the temperature exceeds 20 degrees under the assumption of normally distributed errors? Why only use the point estimate?

By this, do you mean using OLS to obtain a point estimate (suppose the estimate exceeds 20) and then calculating the probability that the true temperature is indeed greater than 20 (i.e. testing the null of t = 20 against t > 20)?

4. ## Re: Using OLS vs Probit/Logit to estimate a threshold

I forgot to mention that my data is somewhat right-skewed, which is why I am leaning towards probit.

5. ## Re: Using OLS vs Probit/Logit to estimate a threshold

... how exactly does having skew lean you toward probit?

6. ## Re: Using OLS vs Probit/Logit to estimate a threshold

Because I don't care how far above the threshold the temperature is, I just want to know if it's above it.

When I use OLS I am getting estimates that are too high.

I have one explanatory variable "x1" that is by far the main determinant of temperature. My OLS predictions are consistently higher than the median temperature when I break it down by x1.

For example, here are some raw numbers from the data

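| x1 | Threshold | % above threshold for given x1 | OLS prediction |
|----|-----------|--------------------------------|----------------|
| 8  | 27        | 45%                            | 27.4           |
| 9  | 28        | 46%                            | 28.3           |
| 10 | 29        | 45%                            | 29.6           |
| 11 | 30        | 46%                            | 30.3           |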

So the OLS predictions are saying that, given x1 = 8, the temperature is more likely than not to exceed 27. But simply looking at the raw data, for x1 = 8 the temperature exceeds 27 only 45% of the time. As you can see, I am not very confident in OLS here.

Some points that people might want to know:
1. Sample sizes for each x1 are over 2000 so sample size is not an issue
2. The other x's are basically insignificant so there isn't some x2 that's pushing the OLS predictions up so far

Maybe saying the data is right skewed isn't quite accurate -- I have some extreme outliers on the upper end of the temperature scale that don't exist on the lower end.
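
One possible reading of these numbers, sketched below on simulated data: OLS targets the conditional mean, and with a long upper tail the mean sits above the median, so fewer than half of the observations exceed the OLS prediction. The lognormal noise and the baseline of 25 are assumptions chosen purely for illustration; they are not fitted to the data in this thread.

```python
import random
from statistics import mean, median

random.seed(1)

# Hypothetical temperatures for a single x1 value: a baseline plus
# lognormal noise, which has extreme values only on the upper end.
temps = [25.0 + random.lognormvariate(0.0, 1.0) for _ in range(2000)]

avg = mean(temps)    # what an OLS fit within this x1 group would target
med = median(temps)  # the value exceeded by about 50% of observations
share_above_mean = sum(t > avg for t in temps) / len(temps)

print(f"mean   = {avg:.2f}")
print(f"median = {med:.2f}")
print(f"share of observations above the mean = {share_above_mean:.1%}")
```

If something like this is going on, the OLS fit is not "too high" as an estimate of the mean; it is just answering a different question than "which value is exceeded half the time".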

7. ## Re: Using OLS vs Probit/Logit to estimate a threshold

Can you provide a histogram for the response variable given x1=8 or something like that? It might make sense to use a generalized linear model (which probit regression is a special case of) to model the actual response variable - then using that you could calculate probabilities.

But if you don't like OLS, then I'm not sure probit is necessarily the best route. It might work, but there is a certain interpretation of probit regression that leads me to believe it isn't necessarily appropriate here. Essentially, one way to think of probit regression is that there is some latent variable that, conditional on the predictors, follows a normal distribution. We don't get to see that variable; we just get to see whether or not it exceeds a certain cutoff. This interpretation fits pretty darn well with how you would actually go about fitting the probit model. But you already said you don't like OLS too much. OLS doesn't necessarily rely on normally distributed error terms, but we get the same results if we do assume normally distributed errors, so really... I'd say given your concerns probit isn't the route you want to go.
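
To spell out that latent-variable interpretation, here is a short derivation. The symbols $t^*$, $\beta$, $\sigma$, $c$ and $\Phi$ (the standard normal CDF) are notation introduced here for illustration, with $c$ playing the role of the cutoff (20 degrees in this thread):

$$
t^* = x'\beta + \varepsilon, \qquad \varepsilon \mid x \sim N(0, \sigma^2), \qquad g = \mathbf{1}[t^* > c]
$$

$$
P(g = 1 \mid x) = P(\varepsilon > c - x'\beta) = 1 - \Phi\!\left(\frac{c - x'\beta}{\sigma}\right) = \Phi\!\left(\frac{x'\beta - c}{\sigma}\right)
$$

Under these assumptions, the probit of g on x and the "OLS plus normal errors" calculation are modelling the same probability. The probit just throws away the observed temperature and keeps only which side of the cutoff it fell on.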

But like I said a generalized linear model might make sense using some other response distribution.

8. ## The Following User Says Thank You to Dason For This Useful Post:

aboluk (07-04-2012)

9. ## Re: Using OLS vs Probit/Logit to estimate a threshold

Thank you guys for your replies
