Having more data is useful, but forecasting error can come from many sources: structural breaks in your data, violated assumptions, the wrong predictors, outliers, and so on. Depending on the type of data you have (cross-sectional versus time series), it could also come from using the wrong type of regression, that is, one that does not deal with autocorrelation, which linear regression usually does not address.
What are you trying to predict and what type of data do you have?
I am basically tracking the number of errors/issues logged and keeping track of whether it is normal or not (a time series data set). I am using a linear regression model to predict the errors for the next day. The model is trained on a daily basis.
Do you recommend that I filter out out-liners? Looking forward to your recommendation.
Experts recommend that you not use linear regression with time series data because of autocorrelation. That is bad enough. If your dependent variable (DV) and independent variable (IV) also have trends in them and are not cointegrated, you will have bias in your model, which is worse.
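To see whether autocorrelation is actually a problem in your case, one rough check is the Durbin-Watson statistic on the residuals of your linear fit. The sketch below assumes the only predictor is the day index and uses made-up daily counts (`daily_errors` is hypothetical, not your data); `fit_line` and `durbin_watson` are illustrative helper names.

```python
def fit_line(y):
    """Ordinary least squares fit of y against the day index 0..n-1."""
    n = len(y)
    x_mean = (n - 1) / 2
    y_mean = sum(y) / n
    sxy = sum((i - x_mean) * (v - y_mean) for i, v in enumerate(y))
    sxx = sum((i - x_mean) ** 2 for i in range(n))
    slope = sxy / sxx
    intercept = y_mean - slope * x_mean
    return slope, intercept


def durbin_watson(residuals):
    """Sum of squared successive differences over sum of squared residuals.

    The value lies between 0 and 4. Values near 2 suggest little lag-1
    autocorrelation; values well below 2 suggest positive autocorrelation.
    """
    num = sum((residuals[t] - residuals[t - 1]) ** 2
              for t in range(1, len(residuals)))
    den = sum(r * r for r in residuals)
    return num / den


# Hypothetical daily error counts, for illustration only.
daily_errors = [12, 14, 15, 17, 16, 13, 11, 9, 8, 10, 13, 16, 18, 19, 17, 14]

slope, intercept = fit_line(daily_errors)
residuals = [v - (intercept + slope * i) for i, v in enumerate(daily_errors)]
dw = durbin_watson(residuals)
print(f"Durbin-Watson: {dw:.2f}")
```

For this made-up series the statistic comes out well below 2, which is the pattern you would expect when consecutive days move together and a straight line cannot capture it.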
If you are predicting error for the next day, you likely have hundreds of data points (because even a year would be 365 data points). Why not try exponential smoothing for prediction? This is a simple type of time series method which historically has been shown to be accurate and robust to violations of assumptions. Once you learn it, it is easy to do. Look for Holt-Winters on the internet.
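To give a feel for the idea, here is a minimal sketch of simple exponential smoothing, the most basic member of the family. It is not full Holt-Winters, which adds trend and seasonal terms; the `statsmodels` package has an `ExponentialSmoothing` class in `statsmodels.tsa.holtwinters` for that. The function name `ses_forecast`, the smoothing weight `alpha=0.3`, and the counts are all assumptions for illustration.

```python
def ses_forecast(series, alpha=0.3):
    """One-step-ahead forecast from simple exponential smoothing.

    alpha is the smoothing weight: closer to 1 reacts faster to recent
    days, closer to 0 averages over a longer history.
    """
    if not 0 < alpha <= 1:
        raise ValueError("alpha must be in (0, 1]")
    if not series:
        raise ValueError("series must not be empty")
    level = series[0]  # initialise the level with the first observation
    for value in series[1:]:
        level = alpha * value + (1 - alpha) * level
    return level  # the final level is the forecast for the next period


# Hypothetical daily error counts, for illustration only.
daily_errors = [12, 14, 15, 17, 16, 13, 11, 9, 8, 10, 13, 16, 18, 19, 17, 14]

next_day = ses_forecast(daily_errors, alpha=0.3)
print(f"Forecast for tomorrow: {next_day:.1f}")
```

In practice you would pick `alpha` by comparing forecast error on held-out days rather than fixing it by hand.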
If you meant outliers, I am conflicted on that point. Many statisticians, which I am not, recommend against removing them. But leaving them in will distort your results according to one source I read, which makes sense to me. So I remove them, but I say this knowing many strongly disagree.
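If you do decide to screen outliers, one common rule (not the only one, and not something I am claiming is right for your data) is the modified z-score based on the median absolute deviation, with a cutoff of 3.5. The sketch below flags points instead of deleting them, so you can fit the model both ways and compare. The name `flag_outliers` and the counts are hypothetical.

```python
from statistics import median


def flag_outliers(series, threshold=3.5):
    """Return a list of booleans, True where a point looks like an outlier.

    Uses the modified z-score 0.6745 * (x - median) / MAD, where MAD is the
    median absolute deviation from the median.
    """
    med = median(series)
    mad = median(abs(v - med) for v in series)
    if mad == 0:
        # More than half the points equal the median; the rule is undefined,
        # so flag nothing rather than divide by zero.
        return [False] * len(series)
    return [abs(0.6745 * (v - med) / mad) > threshold for v in series]


# Hypothetical daily error counts with one obvious spike.
counts = [12, 14, 15, 13, 16, 95, 14, 13]

flags = flag_outliers(counts)
cleaned = [v for v, is_outlier in zip(counts, flags) if not is_outlier]
print(flags)
print(cleaned)
```

Keeping the flags separate from the data also means you can look at each flagged day and decide whether it was a logging glitch or a real incident before dropping it.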