Squared Error vs Absolute Error loss functions

#1
The two most popular types of loss functions are

1) squared error: (actual-estimate)^2 --> best estimate is the mean
2) absolute error: |actual-estimate| --> best estimate is the median
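A quick numerical check of those two claims (the sample data and the candidate grid here are made up for illustration): minimize each total loss by brute force over a grid of candidate estimates and compare against the mean and median.

```python
# Check numerically that the mean minimizes total squared error
# and the median minimizes total absolute error.
import statistics

data = [1, 2, 2, 3, 10]  # made-up sample

def total_squared_error(est):
    return sum((x - est) ** 2 for x in data)

def total_absolute_error(est):
    return sum(abs(x - est) for x in data)

# Search a fine grid of candidate estimates from 0 to 10.
candidates = [i / 100 for i in range(0, 1001)]
best_sq = min(candidates, key=total_squared_error)
best_abs = min(candidates, key=total_absolute_error)

print(best_sq, statistics.mean(data))     # both 3.6
print(best_abs, statistics.median(data))  # both 2
```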

I have two questions.

1) Why do people use the squared error method? The absolute error method makes much more intuitive sense. You get the difference between the actual and the estimate. Plain and simple. If you square the difference, then won't you get "warped" values depending on the size of the difference?

2) This also got me thinking about what "expected value" is. Expected value is defined as the mean. However, the best estimate under the absolute error loss function is the median. So is the "expected value" the median? I would very much appreciate it if someone could help me clarify my thinking. Thanks.
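To see why the distinction in question 2 matters, here is a small made-up skewed sample where the mean (which is the expected value) and the median disagree, so the two loss functions lead to different "best estimates":

```python
# A skewed sample (made up): most values small, one large outlier.
import statistics

incomes = [30, 32, 35, 38, 40, 45, 300]  # e.g. incomes in $1000s

print(statistics.mean(incomes))    # about 74.3, pulled up by the outlier
print(statistics.median(incomes))  # 38, unaffected by the outlier
```

The expected value is always the mean by definition; the median is a different summary that happens to be optimal under absolute error loss.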
 
#2
I know that when 'actual' and 'estimate' are vector quantities, there are easier-to-explain reasons why you might want one over the other.

In the scalar situation it is less obvious but you captured a bit of it with:
'If you square the difference, then won't you get "warped" values depending on the size of the difference?'

One man's "warping" is another man's "realistic."

For example, under absolute error this statement is true:
Seven 1-unit losses are just as bad as one 7-unit loss.

Which, depending on the application, may not characterize people's opinions as closely as the squared-error version:

One 7-unit loss is just as bad as forty-nine 1-unit losses.
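The comparison above is just arithmetic on the two loss functions; a tiny check:

```python
# Under absolute loss, seven 1-unit errors cost the same as one 7-unit error;
# under squared loss, one 7-unit error costs as much as forty-nine 1-unit errors.
abs_cost = lambda e: abs(e)
sq_cost = lambda e: e ** 2

assert 7 * abs_cost(1) == 1 * abs_cost(7)  # 7 == 7
assert 1 * sq_cost(7) == 49 * sq_cost(1)   # 49 == 49
```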

The reason might be that there are grave implications to getting far from the center, while small deviations near the center can be happily absorbed.

That sort of thing.
 

Dragan

Super Moderator
#3
The two most popular types of loss functions are

1) squared error: (actual-estimate)^2 --> best estimate is the mean
2) absolute error: |actual-estimate| --> best estimate is the median

Why do people use the squared error method?
Let's remember that, (1), OLS provides unique, unbiased linear estimates that are available in closed form. I would also point out that if the error terms are normally distributed, then the maximum likelihood estimates are the same (asymptotically) as the OLS estimates.
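As a concrete sketch of the closed-form point: for simple linear regression the OLS estimates can be computed directly from the normal equations, with no iteration (the data below are synthetic and purely illustrative):

```python
# Closed-form OLS via the normal equations: beta = (X'X)^{-1} X'y.
import numpy as np

rng = np.random.default_rng(0)
x = rng.uniform(0, 10, size=50)
y = 2.0 + 3.0 * x + rng.normal(0, 1, size=50)  # true intercept 2, slope 3

X = np.column_stack([np.ones_like(x), x])  # design matrix [1, x]
beta = np.linalg.solve(X.T @ X, X.T @ y)   # solve the normal equations in one step
print(beta)  # roughly [2, 3]
```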

Other methods, (2), based on absolute deviations are useful techniques (e.g., robust regression), but they require iterative solutions, and the resulting estimates are, in general, neither unique nor available in closed form and can be computationally expensive.
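As a sketch of the iterative point: least absolute deviations (LAD) regression is often approximated by iteratively reweighted least squares. The `lad_irls` helper below is a hypothetical minimal implementation for illustration, not a production routine:

```python
# Least absolute deviations (LAD) regression has no closed form; one common
# approach is iteratively reweighted least squares (IRLS).
import numpy as np

def lad_irls(X, y, n_iter=100, eps=1e-8):
    """Approximate argmin_b sum|y - X b| by repeated weighted least squares."""
    beta = np.linalg.solve(X.T @ X, X.T @ y)  # start from the OLS solution
    for _ in range(n_iter):
        r = np.abs(y - X @ beta)
        w = 1.0 / np.maximum(r, eps)          # weight each point by 1/|residual|
        WX = X * w[:, None]
        beta = np.linalg.solve(X.T @ WX, WX.T @ y)
    return beta

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
y = np.array([1.1, 2.0, 2.9, 4.2, 5.0, 30.0])  # last point is an outlier
X = np.column_stack([np.ones_like(x), x])
beta_lad = lad_irls(X, y)
print(beta_lad)  # slope near 1, far less affected by the outlier than OLS
```

Note that unlike the one-line OLS solve, this loops many times, and for some data sets the LAD minimizer is not even unique, which is exactly the computational trade-off described above.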