I am new to statistics, so this may be very easy for more experienced people. In my regression analysis I want to find how “% change” is affected by two variables: the day of the week (Sunday, Monday, etc.) and the occurrence number of that day in that particular month (first occurrence of Sunday = 1, second occurrence = 2, and so on, likewise for the other days). Please note that this number is not the number of the week in which the day falls (Sunday of week one, Sunday of week two, and so on). For example, if a month starts on a Friday, the first Monday is the 4th: its occurrence number is 1 even though it falls in the second calendar week.
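Both predictors can be derived from the calendar date alone. Below is a minimal Python sketch, assuming each row of the data has a date; the function names are hypothetical. The occurrence number is `(day_of_month - 1) // 7 + 1`, because days 1 to 7 of any month contain exactly one of each weekday, days 8 to 14 the next one, and so on.

```python
from datetime import date, timedelta
from typing import List, Tuple

# Index matches date.weekday(): Monday == 0 ... Sunday == 6.
WEEKDAY_NAMES = ["Monday", "Tuesday", "Wednesday", "Thursday",
                 "Friday", "Saturday", "Sunday"]


def occurrence_number(d: date) -> int:
    """Which occurrence of its weekday `d` is within its month (1 to 5).

    Days 1-7 hold exactly one of each weekday, days 8-14 the next one,
    and so on, so this ignores calendar-week boundaries entirely.
    """
    return (d.day - 1) // 7 + 1


def day_features(d: date) -> Tuple[str, int]:
    """Return (weekday name, occurrence number) for one date."""
    return WEEKDAY_NAMES[d.weekday()], occurrence_number(d)


def april_business_days(year: int) -> List[Tuple[date, str, int]]:
    """Monday-Friday dates in April of `year` with their two features.

    Holidays are NOT removed here; that would need a holiday calendar.
    """
    rows = []
    d = date(year, 4, 1)
    while d.month == 4:
        if d.weekday() < 5:  # 0-4 are Monday-Friday
            name, k = day_features(d)
            rows.append((d, name, k))
        d += timedelta(days=1)
    return rows
```

For instance, `day_features(date(2007, 4, 1))` gives `("Sunday", 1)`, and `april_business_days(2011)` lists the 21 weekdays of April 2011, which are the rows a fitted model would be asked to predict.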
I have a few questions:
1) In the dataset there are some “% change” values that deviate from the small values it usually takes (values of 5% and over are considered outliers). These large values are not errors; they are a genuine part of the business and occur now and then. Should I remove them before running the regression? Will their presence affect my regression? If you suggest reducing the large values instead, how should I do that? Also, on Saturdays, Sundays and holidays the % change is 0% because there is no business on those days. Should I remove these days from the analysis as well? How would removing them affect the results?
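One common way to handle both points, sketched in Python under stated assumptions: drop the non-business days (they are 0% by construction, so they say nothing about business days and would pull weekday and occurrence averages toward zero if kept), and then either keep the large values or cap ("winsorize") them at a chosen limit. The row layout, the function names, the 5% limit and the symmetric capping of negative values are all assumptions for illustration; capping is only one option, alongside keeping the values as they are or using a robust regression.

```python
from datetime import date
from typing import Iterable, List, Set, Tuple

# Hypothetical row layout: (date, weekday name, occurrence number, % change).
Row = Tuple[date, str, int, float]


def drop_non_business_days(rows: Iterable[Row], holidays: Set[date]) -> List[Row]:
    """Remove Saturdays, Sundays and listed holidays (0% by construction)."""
    return [r for r in rows
            if r[1] not in ("Saturday", "Sunday") and r[0] not in holidays]


def cap_large_changes(rows: Iterable[Row], limit: float = 5.0) -> List[Row]:
    """Winsorize % change into [-limit, limit]; other fields are unchanged."""
    return [(d, w, k, max(-limit, min(limit, y))) for d, w, k, y in rows]
```

Fitting the model both with and without the capping and comparing the coefficients is a simple way to see how much the large days actually move the results.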
Also, I am using linear regression; I hope it is the right type of regression for this. Please correct me if I am wrong.
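Ordinary least-squares linear regression is a reasonable starting point, but both predictors are arguably better treated as categories than as plain numbers: Tuesday is not "twice" Monday, and occurrence 3 need not sit halfway between occurrences 2 and 4. A minimal NumPy sketch, assuming weekend and holiday rows have already been dropped; Monday and occurrence 1 are arbitrarily chosen baseline levels, and the helper names are hypothetical.

```python
from typing import Sequence

import numpy as np

BUSINESS_DAYS = ["Monday", "Tuesday", "Wednesday", "Thursday", "Friday"]
OCCURRENCES = [1, 2, 3, 4, 5]


def design_matrix(weekdays: Sequence[str], occurrences: Sequence[int]) -> np.ndarray:
    """Intercept + one-hot weekday + one-hot occurrence, dropping each baseline.

    Leaving out Monday and occurrence 1 avoids perfect collinearity with
    the intercept (the "dummy variable trap").
    """
    n = len(weekdays)
    cols = [np.ones(n)]
    for day in BUSINESS_DAYS[1:]:
        cols.append(np.array([w == day for w in weekdays], dtype=float))
    for k in OCCURRENCES[1:]:
        cols.append(np.array([o == k for o in occurrences], dtype=float))
    return np.column_stack(cols)


def fit_ols(weekdays, occurrences, pct_change) -> np.ndarray:
    """Least-squares coefficients: [intercept, Tue..Fri, occurrence 2..5]."""
    X = design_matrix(weekdays, occurrences)
    y = np.asarray(pct_change, dtype=float)
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef


def predict(coef: np.ndarray, weekdays, occurrences) -> np.ndarray:
    """Predicted % change for new (weekday, occurrence) pairs."""
    return design_matrix(weekdays, occurrences) @ coef
```

The intercept is the expected % change on a first-occurrence Monday, and every other coefficient is a shift relative to that baseline. In April, occurrence 5 only happens on the 29th and 30th, so its coefficient rests on very few observations.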
The data below covers the month of April for 4 years: 2007, 2008, 2009 and 2010. Ignore the 1st column; I included it only for your reference. I want to find how column 4 is affected by columns 2 and 3. Basically, I want to see whether I can get a pattern out of April and make predictions for April 2011, April 2012 and so on.