+ Reply to Thread
Results 1 to 13 of 13

Thread: Simple regerssion

  1. #1
    Points: 186, Level: 3
    Level completed: 72%, Points required for next Level: 14

    Posts
    10
    Thanks
    3
    Thanked 0 Times in 0 Posts

    Simple regerssion



    I am new to stats and this may be very easy for Seniors. In my regression analysis I want to find, how “% change” is affected by other variables, the day of the month (Sunday, Monday etc.) and the occurrence number of that that day in that particular month (first occurrence of Sunday=1, second occurrence=2 so on and so forth for other days too). Please note and not confuse yourself by thinking the number as the week number in which the particular days is in (Sunday of week one, Sunday of week two and so on).

    I have few questions here:

    1) In the dataset there are some “% change” values that are deviate from the small values that it generally has (5% and over are considered as outliers). These large values are not errors but are part of the business and they have occurrences now and then. Should I remove large values in regression? Will their make presence affect my regression? If you suggest reducing the value of the large values, how should I do that? Also, you will see on Saturday, Sunday and holidays the %change is 0% as there is no business on that day. Should I remove this too in the analysis? How will it affect if removed?
    Also I am using Linear Regression hope it is the type of regression that I should use. Correct me if I am wrong.

    The data below has data of 4 years of Month of April 2007, 2008, 2009 and 2010. Ignore the 1st column as I just gave it for your reference. Want to find how column 4 is affected by column 2, 3. So, basically I want to find if I can get some pattern out of April and make predictions about April 11, April 12 and so on.

    Date Occurrence # Day %change

    04/01/2007 1 Sunday 0.00%
    04/02/2007 1 Monday 3.57%
    04/03/2007 1 Tuesday -1.32%
    04/04/2007 1 Wednesday-1.88%
    04/05/2007 1 Thursday -1.95%
    04/06/2007 1 Friday 4.05%
    04/07/2007 1 Saturday 0.00%
    04/08/2007 2 Sunday 0.00%
    04/09/2007 2 Monday 1.59%
    04/10/2007 2 Tuesday 1.27%
    04/11/2007 2 Wednesday-0.64%
    04/12/2007 2 Thursday -0.39%
    04/13/2007 2 Friday -1.01%
    04/14/2007 2 Saturday 0.00%
    04/15/2007 3 Sunday 0.00%
    04/16/2007 3 Monday 3.77%
    04/17/2007 3 Tuesday 1.30%
    04/18/2007 3 Wednesday-1.47%
    04/19/2007 3 Thursday -0.40%
    04/20/2007 3 Friday 0.15%
    04/21/2007 3 Saturday 0.00%
    04/22/2007 4 Sunday 0.00%
    04/23/2007 4 Monday 1.01%
    04/24/2007 4 Tuesday -9.59%
    04/25/2007 4 Wednesday-1.07%
    04/26/2007 4 Thursday -0.60%
    04/27/2007 4 Friday -0.88%
    04/28/2007 4 Saturday 0.00%
    04/29/2007 5 Sunday 0.00%
    04/30/2007 5 Monday 5.31%
    04/01/2008 1 Tuesday 0.29%
    04/02/2008 1 Wednesday-0.28%
    04/03/2008 1 Thursday -1.48%
    04/04/2008 1 Friday 1.25%
    04/05/2008 1 Saturday 0.00%
    04/06/2008 1 Sunday 0.00%
    04/07/2008 1 Monday 2.03%
    04/08/2008 2 Tuesday -1.49%
    04/09/2008 2 Wednesday-0.93%
    04/10/2008 2 Thursday -0.39%
    04/11/2008 2 Friday -0.21%
    04/12/2008 2 Saturday 0.00%
    04/13/2008 2 Sunday 0.00%
    04/14/2008 2 Monday 2.67%
    04/15/2008 3 Tuesday -1.60%
    04/16/2008 3 Wednesday-1.24%
    04/17/2008 3 Thursday -0.60%
    04/18/2008 3 Friday 0.12%
    04/19/2008 3 Saturday 0.00%
    04/20/2008 3 Sunday 0.00%
    04/21/2008 3 Monday -0.10%
    04/22/2008 4 Tuesday -1.13%
    04/23/2008 4 Wednesday-1.52%
    04/24/2008 4 Thursday -1.07%
    04/25/2008 4 Friday 0.48%
    04/26/2008 4 Saturday 0.00%
    04/27/2008 4 Sunday 0.00%
    04/28/2008 4 Monday 3.65%
    04/29/2008 5 Tuesday -0.66%
    04/30/2008 5 Wednesday-1.69%
    04/01/2009 1 Wednesday0.57%
    04/02/2009 1 Thursday -1.19%
    04/03/2009 1 Friday 0.41%
    04/04/2009 1 Saturday 0.00%
    04/05/2009 1 Sunday 0.00%
    04/06/2009 1 Monday 1.60%
    04/07/2009 1 Tuesday 0.30%
    04/08/2009 2 Wednesday-0.96%
    04/09/2009 2 Thursday -0.45%
    04/10/2009 2 Friday 10.88%
    04/11/2009 2 Saturday 0.00%
    04/12/2009 2 Sunday 0.00%
    04/13/2009 2 Monday 1.40%
    04/14/2009 2 Tuesday -0.17%
    04/15/2009 3 Wednesday-0.88%
    04/16/2009 3 Thursday -1.44%
    04/17/2009 3 Friday 0.69%
    04/18/2009 3 Saturday 0.00%
    04/19/2009 3 Sunday 0.00%
    04/20/2009 3 Monday 1.76%
    04/21/2009 3 Tuesday 0.35%
    04/22/2009 4 Wednesday-1.35%
    04/23/2009 4 Thursday -0.89%
    04/24/2009 4 Friday 0.68%
    04/25/2009 4 Saturday 0.00%
    04/26/2009 4 Sunday 0.00%
    04/27/2009 4 Monday 1.84%
    04/28/2009 4 Tuesday 1.86%
    04/29/2009 5 Wednesday-0.59%
    04/30/2009 5 Thursday 0.04%
    04/01/2010 1 Thursday -0.84%
    04/02/2010 1 Friday -0.69%
    04/03/2010 1 Saturday 0.00%
    04/04/2010 1 Sunday 0.00%
    04/05/2010 1 Monday 1.59%
    04/06/2010 1 Tuesday 0.58%
    04/07/2010 1 Wednesday0.67%
    04/08/2010 2 Thursday -0.19%
    04/09/2010 2 Friday -0.18%
    04/10/2010 2 Saturday 0.00%
    04/11/2010 2 Sunday 0.00%
    04/12/2010 2 Monday 1.26%
    04/13/2010 2 Tuesday -0.57%
    04/14/2010 2 Wednesday-1.20%
    04/15/2010 3 Thursday -1.62%
    04/16/2010 3 Friday -0.31%
    04/17/2010 3 Saturday 0.00%
    04/18/2010 3 Sunday 0.00%
    04/19/2010 3 Monday 1.23%
    04/20/2010 3 Tuesday 1.72%
    04/21/2010 3 Wednesday-1.03%
    04/22/2010 4 Thursday -0.79%
    04/23/2010 4 Friday -0.81%
    04/24/2010 4 Saturday 0.00%
    04/25/2010 4 Sunday 0.00%
    04/26/2010 4 Monday -0.20%
    04/27/2010 4 Tuesday 0.08%
    04/28/2010 4 Wednesday-0.28%
    04/29/2010 5 Thursday 0.93%
    04/30/2010 5 Friday -0.79%


    If you can please help me with this example other similar examples would be a cake walk.
    Eagerly waiting for your replies

    Regards,
    Amandeep
    Last edited by amandeepsharma89; 06-04-2012 at 04:26 PM. Reason: got to put information that will be useful for the person

  2. #2
    IBM Rules
    Points: 13,389, Level: 75
    Level completed: 35%, Points required for next Level: 261
    Awards:
    Activity Award

    Posts
    2,598
    Thanks
    124
    Thanked 381 Times in 369 Posts

    Re: Simple regerssion

    You shouldn't eliminate data points that are valid (that is ones which were not created by say a mistake in coding) without an extremely good reason. Even if they are outliers.
    "Facts are stubborn things, but statistics are more pliable." Mark Twain

  3. The Following User Says Thank You to noetsi For This Useful Post:

    amandeepsharma89 (06-09-2012)

  4. #3
    Points: 186, Level: 3
    Level completed: 72%, Points required for next Level: 14

    Posts
    10
    Thanks
    3
    Thanked 0 Times in 0 Posts

    Re: Simple regerssion

    Thank you Noetsi, what about Saturday and Sunday they have always 0% change can I remove them? Also, linear regression should be the way right?

    Amandeep

  5. #4
    Points: 186, Level: 3
    Level completed: 72%, Points required for next Level: 14

    Posts
    10
    Thanks
    3
    Thanked 0 Times in 0 Posts

    Re: Simple regerssion

    I also had a question here, do I have to treat Saturday, Sunday, Monday etc as different variables? Or have day as a single variable???
    Waiting for you replies, in a fix.

    Regards.
    Amandeep

  6. #5
    Points: 186, Level: 3
    Level completed: 72%, Points required for next Level: 14

    Posts
    10
    Thanks
    3
    Thanked 0 Times in 0 Posts

    Re: Simple regerssion

    Also, should day occurrence number be treated as a different variable like 1,2,3 etc as different variable.

  7. #6
    Points: 186, Level: 3
    Level completed: 72%, Points required for next Level: 14

    Posts
    10
    Thanks
    3
    Thanked 0 Times in 0 Posts

    Re: Simple regerssion

    Please Seniors in a fix and have to solve it really quick.
    Waiting for replies.

    Amandeep

  8. #7
    Points: 315, Level: 6
    Level completed: 30%, Points required for next Level: 35

    Location
    Bloomington, IN
    Posts
    32
    Thanks
    0
    Thanked 10 Times in 9 Posts

    Re: Simple regerssion

    Amandeep,

    If the business is closed on Sat and Sun, and if that is why %Deviation is 0.0 on weekends, then they probably should NOT be included in your analysis.

    Do not otherwise modify your data. Outliers are outliers, and should be included. There are more advanced ways to deal with them that are simply not worth your time exploring until you understand the basics of regression better.

    Presumably, you have a theory for why the day of the week, and the number occurences of that day in a given month, affect Percent Deviation. If so, then your model will look something like this:

    PctDev = a + b1*Monday + b2*Tuesday + b3*Wednesday + b4*Thursday + b5*occ1 + b6*occ2 + b7*occ3 + b8*occ4 + error

    After you've obtained your estimates, you can check them for robustness to outliers by removing said outliers one or two at a a time and rerunning your regression and compare your results to the model as it runs on the full data set. Or you can use your proposed decision rule "exclude if PctDev > 5.0" and see if your estimates change. This will tell you whether the outliers are driving your results. If so, that is not a failure of the data or the model; rather it is something you need to include in your report.

  9. The Following User Says Thank You to eyesack_kn For This Useful Post:

    amandeepsharma89 (06-09-2012)

  10. #8
    Points: 186, Level: 3
    Level completed: 72%, Points required for next Level: 14

    Posts
    10
    Thanks
    3
    Thanked 0 Times in 0 Posts

    Re: Simple regerssion

    Hi eyesack_kn that is a very good explanation to my query. Thanks a lot for your prompt reply. I got what I have to do. I have one problem now i.e. the restructuring of the data to put in a tool. I my post you saw how the data is structured. How would you suggest me to structure it so that I can run it in excel or weka.

    How should I structure it so that I can analyze it in the software. It would be of great help if you can reply this, the reason being I have to analyze tons of similar data and make a report out of it.
    Also, let me know if it is fine if I PMed you.
    Waiting fro your prompt reply.

    Regards,
    Amandeep

  11. #9
    Points: 315, Level: 6
    Level completed: 30%, Points required for next Level: 35

    Location
    Bloomington, IN
    Posts
    32
    Thanks
    0
    Thanked 10 Times in 9 Posts

    Re: Simple regerssion

    No PM'ing unless I'm getting paid to consult =)

    Presumably you wish to create a bunch of indicator variables for each possible value of "occurrence" and "day of the week". A series of columns in excel with "=if(day = "Sunday",1,0)" would work. I don't know anything about weka.

  12. #10
    Points: 186, Level: 3
    Level completed: 72%, Points required for next Level: 14

    Posts
    10
    Thanks
    3
    Thanked 0 Times in 0 Posts

    Re: Simple regerssion

    Thanks a lot eyesack_kn, can you provide me with an sample for only a month say April 2007 of how to create it for occurrence and day and match the corresponding %change value. I can take it forward from there. I would be of great help, if can attach such an example.

    Regards,
    Amandeep

  13. #11
    Points: 315, Level: 6
    Level completed: 30%, Points required for next Level: 35

    Location
    Bloomington, IN
    Posts
    32
    Thanks
    0
    Thanked 10 Times in 9 Posts

    Re: Simple regerssion

    Not sure I understand what you're asking. I'm not going to write out all the commands you would need to do this in Excel; you should be able to figure out how to do that based on what I said above. If you're asking what the data would look like, it would be N rows of indicator variables taking on the value of 1 for the occurence and day of that observation, and zero everywhere else. You could also create dummies for month and/or the year, if you wanted. The rest is just Excel functions though, and Excel has an excellent help file to assist you with those matters. Good luck!

  14. The Following User Says Thank You to eyesack_kn For This Useful Post:

    amandeepsharma89 (06-09-2012)

  15. #12
    Points: 186, Level: 3
    Level completed: 72%, Points required for next Level: 14

    Posts
    10
    Thanks
    3
    Thanked 0 Times in 0 Posts

    Re: Simple regerssion

    Ok got it. That is what I was asking for. Thanks a ton again!!

    Regards,
    Amandeep

  16. #13
    Points: 186, Level: 3
    Level completed: 72%, Points required for next Level: 14

    Posts
    10
    Thanks
    3
    Thanked 0 Times in 0 Posts

    Re: Simple regerssion


    Hello Seniors,
    Thanks for your help. I was able to solve the problem and get the result as following


    Multiple R 0.408429271
    R Square 0.16681447
    Adjusted R Square 0.079284326
    Standard Error 0.008296363
    Observations 108

    ANOVA
    df SS MS F Significance F
    Regression 10 0.001364278 0.000136428 2.477634319 0.010937384
    Residual 99 0.006814135 6.88296E-05
    Total 109 0.008178412

    Coefficients Standard Error t Stat P-value Lower 95% Upper 95% Lower 95.0% Upper 95.0%
    Intercept 0.000987554 0.002315966 0.426411062 0.670734106 -0.003607826 0.005582933 -0.003607826 0.005582933
    X Variable 1 0 0 65535 #NUM! 0 0 0 0
    X Variable 2 -0.001625844 0.002560313 -0.63501741 0.526881998 -0.006706061 0.003454373 -0.006706061 0.003454373
    X Variable 3 -0.000170694 0.002534512 -0.067347844 0.946440618 -0.005199715 0.004858327 -0.005199715 0.004858327
    X Variable 4 -0.001044758 0.002534512 -0.41221271 0.681074747 -0.006073779 0.003984263 -0.006073779 0.003984263
    X Variable 5 0.00552943 0.002534512 2.181655038 0.031501592 0.000500409 0.010558451 0.000500409 0.010558451
    X Variable 6 0 0 65535 #NUM! 0 0 0 0
    X Variable 7 0.000847719 0.002346566 0.361259256 0.718675457 -0.003808377 0.005503814 -0.003808377 0.005503814
    X Variable 8 -0.003499521 0.002346566 -1.491337328 0.139052552 -0.008155617 0.001156574 -0.008155617 0.001156574
    X Variable 9 -0.004545338 0.002346566 -1.937016822 0.055593767 -0.009201433 0.000110758 -0.009201433 0.000110758
    X Variable 10 0.002046103 0.003378997 0.605535541 0.546208916 -0.004658561 0.008750767 -0.004658561 0.008750767

    There were some dependency which were correctly shown. However, the value of R^2 is really less. What does that mean. How should I interpret that and write in the report. I know that less R^2 means less efficient.

    Can someone please elaborate on the above result and explain the R^2 value.

    Regards,
    Amandeep Sharma

+ Reply to Thread

Tags for this Thread

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts








Advertise on Talk Stats