Thread: Are these dependent samples?

  1. #1

    Hello everyone,

    I have a dataset with 6,000 respondents, on which I will run a regression analysis. I will then split the dataset into three separate datasets of ~2,000 cases each (A, B, C), based on a specific variable, and run a regression on each of the three. That means every case in A, B, and C is also a respondent in the original dataset.


    Does this type of sample have a name? Would these be considered dependent samples? I would like to know what this is called so I can look up how I can correct for issues of multicollinearity, etc.

    Thanks in advance!

  2. #2
    Miner

    A, B, and C are subsets of the original dataset. Provided that these subsets are mutually exclusive, they should be independent of one another. Note: this does NOT mean that they are random or representative, only that they are independent. You should also consider running the regression on the complete dataset with that "specific variable" included as an indicator (dummy) variable. That will let you test the significance of the "specific variable".
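
    A minimal sketch of the indicator-variable approach, using NumPy only. The data are synthetic, and the group labels, coefficients, and variable names are assumptions made up for illustration; they do not come from the original poster's dataset. The t statistics shown test each dummy against the reference group; a joint test of the whole "specific variable" would need an F-test comparing the models with and without the dummies.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in for the 6,000 respondents (all values are made up).
n_per_group = 2000
groups = np.repeat(["A", "B", "C"], n_per_group)
x = rng.normal(size=groups.size)

# Hypothetical true model: common slope, group-specific intercept shifts.
true_shift = {"A": 0.0, "B": 0.5, "C": -0.3}
shift = np.array([true_shift[g] for g in groups])
y = 1.0 + 2.0 * x + shift + rng.normal(size=groups.size)

# Design matrix: intercept, x, and dummies for B and C (A is the reference level).
X = np.column_stack([
    np.ones(groups.size),
    x,
    (groups == "B").astype(float),
    (groups == "C").astype(float),
])

beta, *_ = np.linalg.lstsq(X, y, rcond=None)

# Standard errors from the usual OLS formula, then a t statistic per coefficient.
resid = y - X @ beta
dof = X.shape[0] - X.shape[1]
sigma2 = resid @ resid / dof
se = np.sqrt(np.diag(sigma2 * np.linalg.inv(X.T @ X)))
t_stats = beta / se

for name, b, t in zip(["intercept", "x", "dummy_B", "dummy_C"], beta, t_stats):
    print(f"{name:10s} coef={b: .3f}  t={t: .2f}")
```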

  3. The Following User Says Thank You to Miner For This Useful Post:

    Kabouterke (05-31-2014)

  4. #3
    Dason

    Quote Originally Posted by Miner View Post
    A, B, and C are subsets of the original dataset. Provided that these subsets are mutually exclusive, they should be independent of one another. Note: this does NOT mean that they are random or representative, only that they are independent. You should also consider running the regression on the complete dataset with that "specific variable" included as an indicator (dummy) variable. That will let you test the significance of the "specific variable".
    I know this approach is commonly advocated, but I'm not sure it's always the best one. It makes slightly different assumptions about the problem than fitting separate regressions does. Keep in mind that a single multiple regression assumes constant error variance across all observations. So even if you compare "separate regressions" with "one multiple regression using a dummy variable plus the interaction of the dummy with every other variable" (which basically lets the groups have completely different regression lines), the two aren't exactly the same: the separate regressions don't assume equal error variance across groups, but the single regression does.
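
    The point above can be sketched numerically. In this NumPy example (synthetic data, with hypothetical coefficients and noise levels), the fully interacted model reproduces the separate regression lines exactly, but it reports one pooled error variance where the separate fits report one per group.

```python
import numpy as np

rng = np.random.default_rng(1)

def ols(X, y):
    """Return OLS coefficients, residual sum of squares, and residual degrees of freedom."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    return beta, float(resid @ resid), X.shape[0] - X.shape[1]

# Synthetic data: two groups with different lines AND different noise levels.
n = 2000
x_a, x_b = rng.normal(size=n), rng.normal(size=n)
y_a = 1.0 + 2.0 * x_a + rng.normal(scale=1.0, size=n)
y_b = 0.5 + 3.0 * x_b + rng.normal(scale=3.0, size=n)

# Approach 1: separate regressions, each with its own error variance estimate.
beta_a, rss_a, df_a = ols(np.column_stack([np.ones(n), x_a]), y_a)
beta_b, rss_b, df_b = ols(np.column_stack([np.ones(n), x_b]), y_b)
s2_a, s2_b = rss_a / df_a, rss_b / df_b

# Approach 2: one regression with a dummy d and the d*x interaction.
x = np.concatenate([x_a, x_b])
y = np.concatenate([y_a, y_b])
d = np.concatenate([np.zeros(n), np.ones(n)])
beta_p, rss_p, df_p = ols(np.column_stack([np.ones(2 * n), x, d, d * x]), y)
s2_pooled = rss_p / df_p

# Same fitted lines: group B's intercept and slope are base + dummy terms.
print("A line:", beta_a, "vs pooled:", beta_p[:2])
print("B line:", beta_b, "vs pooled:", beta_p[:2] + beta_p[2:])
# Different variance estimates: the pooled model reports a single value.
print("s2_a, s2_b, s2_pooled:", s2_a, s2_b, s2_pooled)
```

    Because the coefficients match, the two approaches differ only in the standard errors and tests built on the variance estimate, which is where the equal-variance assumption matters.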

  5. The Following User Says Thank You to Dason For This Useful Post:

    Kabouterke (05-31-2014)

  6. #4

    Hm, this leads me to two more questions.

    1. What about the relationship between the main dataset and subset A? The two samples are clearly not independent, but they aren't really dependent either, since they overlap. Right?
    2. I am interested in testing the difference between the coefficients of variable X in both the main dataset and in subset A. Is it possible to do this? Would the dependent t-test work here?

    Thanks for your feedback, guys. I appreciate it.
    Last edited by Kabouterke; 05-31-2014 at 12:35 PM.

  7. #5
    Miner

    Quote Originally Posted by Dason View Post
    I know this approach is commonly advocated, but I'm not sure it's always the best one. It makes slightly different assumptions about the problem than fitting separate regressions does. Keep in mind that a single multiple regression assumes constant error variance across all observations. So even if you compare "separate regressions" with "one multiple regression using a dummy variable plus the interaction of the dummy with every other variable" (which basically lets the groups have completely different regression lines), the two aren't exactly the same: the separate regressions don't assume equal error variance across groups, but the single regression does.
    True. The use of indicator variables does assume equal variances, but a diagnostic review of the residuals should reveal whether this assumption is violated. If it is, separate regressions can be run instead.
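
    One simple version of such a residual check, sketched with NumPy on synthetic data: fit the indicator-variable model, then compare the residual variance within each group. The 0/1 group coding and the noise levels are assumptions chosen so that the violation is visible. A formal test (Levene's or Breusch-Pagan, for example) would be the more rigorous follow-up.

```python
import numpy as np

rng = np.random.default_rng(2)

# Synthetic data in which group B is deliberately noisier than group A.
n = 2000
group = np.repeat([0, 1], n)  # 0 = A, 1 = B (hypothetical coding)
x = rng.normal(size=2 * n)
noise_sd = np.where(group == 0, 1.0, 3.0)
y = 1.0 + 2.0 * x + 0.5 * group + rng.normal(size=2 * n) * noise_sd

# Fit the indicator-variable model on the combined data.
X = np.column_stack([np.ones(2 * n), x, group])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
resid = y - X @ beta

# Compare residual spread by group; a ratio far from 1 suggests unequal variances.
var_a = resid[group == 0].var(ddof=1)
var_b = resid[group == 1].var(ddof=1)
ratio = max(var_a, var_b) / min(var_a, var_b)
print(f"residual variance A={var_a:.2f}, B={var_b:.2f}, ratio={ratio:.2f}")
```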

  8. #6
    Miner

    Quote Originally Posted by Kabouterke View Post
    1. What about the relationship between the main dataset and subset A? The two samples are clearly not independent, but they aren't really dependent either, since they overlap. Right?
    2. I am interested in testing the difference between the coefficients of variable X in both the main dataset and in subset A. Is it possible to do this? Would the dependent t-test work here?
    What is your objective? Is it to test the significance of this "factor" or are you trying to validate your model using subsets of data?

  9. #7

    Hi Miner. In essence, I want to test whether the difference between the country-level mean (main dataset) and the regional-level mean (dataset A) is significant. In other words, H0: μ(country) = μ(region).

  10. #8
    Miner

    Got it. Use Analysis of Means (ANOM). ANOM tests whether any individual group mean (here, a region) differs from the overall mean (here, the country).
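
    A short derivation may help show why comparing a subset with the whole is workable even though the two overlap. The notation is introduced here for illustration: the full sample has size n and mean x̄, subset A has size n_A and mean x̄_A, and the remaining cases R have size n_R = n - n_A and mean x̄_R. The variance line assumes independent observations with group variances σ_A² and σ_R².

```latex
\begin{align}
\bar{x} &= \frac{n_A \bar{x}_A + n_R \bar{x}_R}{n} \\
\bar{x}_A - \bar{x} &= \frac{n_R}{n}\left(\bar{x}_A - \bar{x}_R\right) \\
\operatorname{Var}\left(\bar{x}_A - \bar{x}\right)
  &= \left(\frac{n_R}{n}\right)^{2}
     \left(\frac{\sigma_A^{2}}{n_A} + \frac{\sigma_R^{2}}{n_R}\right)
\end{align}
```

    So the difference between the subset mean and the overall mean is a constant multiple of the difference between the subset and its complement, and those two groups do not overlap.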
