How to deal with outliers in stock returns? OLS regression January effect

#1
Hey everyone!
So I am conducting an analysis of a stock's daily prices to see whether returns in any particular month are higher or lower than in the others (month-of-the-year effect / January effect).

The thing is, over the study period from 2000 to 2015, with thousands of observations, the stock's returns are relatively close to each other, except for one day in 2001 when the stock plummeted 150%!

This outlier is badly skewing the OLS coefficients and leading to very high kurtosis and skewness numbers!

Should I keep the outlier and interpret the results as they are, or remove it?

How does one generally deal with outliers in stock returns?

Thanks in advance!
 

noetsi

Fortran must die
#2
There are many solutions; which is best I don't know.

1) Robust regression
2) Replacing the outlier with another value that seems reasonable to you.
3) Creating a dummy variable that takes the value 1 when there is an outlier (I don't really understand this one).

Option 1 is probably best, but it is very different from OLS.
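To give a feel for what option 1 does, here is a minimal sketch, not a definitive implementation. It assumes a single month in isolation, where OLS on a month dummy is just the plain mean, and swaps in a Huber M-estimate of location computed by iteratively reweighted least squares. The data, the tuning constant `c=1.345`, and the function name `huber_location` are all assumptions for illustration, not anything taken from the original analysis.

```python
import random
from statistics import median

random.seed(1)

# Hypothetical January returns with one extreme day; not the poster's data.
january = [random.gauss(0.0005, 0.01) for _ in range(300)]
january[10] = -1.5

def huber_location(values, c=1.345, tol=1e-10, max_iter=100):
    """Huber M-estimate of location via iteratively reweighted least squares."""
    mu = median(values)
    # Robust scale: median absolute deviation, rescaled to match a normal sd.
    scale = median(abs(v - mu) for v in values) / 0.6745
    for _ in range(max_iter):
        # Weight 1 inside the threshold, shrinking weight for large residuals.
        weights = [1.0 if abs(v - mu) <= c * scale else c * scale / abs(v - mu)
                   for v in values]
        new_mu = sum(w * v for w, v in zip(weights, values)) / sum(weights)
        if abs(new_mu - mu) < tol:
            return new_mu
        mu = new_mu
    return mu

ols_estimate = sum(january) / len(january)   # OLS on a month dummy = plain mean
robust_estimate = huber_location(january)
print(f"OLS (mean): {ols_estimate:.5f}")
print(f"Huber:      {robust_estimate:.5f}")
```

The extreme day keeps a small weight instead of being deleted, so it still counts, just far less than under OLS.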

I am amazed that, with thousands of points, one makes that much difference. What is the DFBETA for that point? Have you run a leverage analysis?
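For what it's worth, option 3 and the DFBETA question can be sketched together under a simplifying assumption: the model is OLS on twelve month dummies with no intercept, so each coefficient is that month's mean return. The data below are made up, and `month_means` and `outlier_index` are hypothetical names. A dummy that equals 1 only on the outlier day fits that day exactly, so the month coefficients come out the same as if the day were dropped.

```python
import random
from statistics import mean

random.seed(0)

# Hypothetical data: (month, daily return) pairs, not the poster's actual series.
data = [(m, random.gauss(0.0005, 0.01)) for m in range(1, 13) for _ in range(300)]
outlier_index = 5                  # an arbitrary January observation
data[outlier_index] = (1, -1.5)    # the extreme day, entered as -150%

def month_means(rows):
    """OLS on twelve month dummies (no intercept) reduces to per-month means."""
    by_month = {}
    for month, ret in rows:
        by_month.setdefault(month, []).append(ret)
    return {m: mean(v) for m, v in by_month.items()}

with_outlier = month_means(data)
# An outlier-day dummy absorbs that observation completely, so the month
# coefficients match a fit that simply excludes the observation.
without_outlier = month_means(
    [row for i, row in enumerate(data) if i != outlier_index]
)

# Unscaled DFBETA for the January coefficient: the change caused by one point.
dfbeta_jan = with_outlier[1] - without_outlier[1]

# Leverage of any January observation in this design is 1 / (January count).
n_jan = sum(1 for month, _ in data if month == 1)
leverage = 1 / n_jan

print(f"January with: {with_outlier[1]:.5f}, without: {without_outlier[1]:.5f}")
print(f"DFBETA (unscaled): {dfbeta_jan:.5f}, leverage: {leverage:.5f}")
```

In this setup the leverage is tiny, so any large DFBETA comes from the size of the residual rather than from an unusual position in the design.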
 

ondansetron

TS Contributor
#3
I would not do anything. Outliers are expected, and as long as this value truly comes from the population of interest, it is OK; you should not remove outliers just because they make things look different from what you want. The idea is to accurately represent a phenomenon, which may include a lot of typical values and some really wacky ones. A poorly fitting model due to outliers might tell you that you have the wrong model, especially if the outliers are real values. Long story short, deleting outliers because they are outliers is never the answer. Deleting them because of data entry errors (correct them if possible), because they are from a different population, or because they are impossible values are basically the only reasons to delete them.
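One quick way to apply the "impossible" test to the day in question, as a rough sketch: a simple return below -100% would need a negative price, while a log return of -1.5 is perfectly possible. The prices and function names below are hypothetical, and the check assumes the stock price cannot go below zero.

```python
import math

def simple_return(prev_price, price):
    """Simple (arithmetic) return between two consecutive prices."""
    return price / prev_price - 1.0

def log_return(prev_price, price):
    """Log (continuously compounded) return between two consecutive prices."""
    return math.log(price / prev_price)

def is_impossible_simple_return(r):
    """A simple return below -100% would require a negative price."""
    return r < -1.0

# Hypothetical prices: a fall from 100 to 22.31 in one day.
prev_price, price = 100.0, 22.31
print(simple_return(prev_price, price))    # about -0.78
print(log_return(prev_price, price))       # about -1.50
print(is_impossible_simple_return(-1.5))   # True
```

So whether -150% is a data error or a real value depends on how the returns were computed, which is worth checking before doing anything else.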
 

noetsi

Fortran must die
#4
If an outlier comes from a one-time event that is unlikely to recur (COVID-19 happened and you don't think it will be there next year, or a building burned down), then I think you should modify the outlier. Time series differs from regression in that regard, I think. Structural breaks are a reality.
 

ondansetron

TS Contributor
#5
I think your post actually makes the argument for leaving the outlier as is: it does represent reality, and you have no hope of properly understanding or modeling the process if you take out outliers because they are "rare" (which is a loose definition of an outlier).
 

noetsi

Fortran must die
#6
The problem with leaving it in, if you are interested in prediction rather than in understanding the results (and I think that is what most people in time series are interested in), is that you will get worse results.

If you want to understand its impact, running the model with and without it and comparing the results is probably better.