
Thread: which type of hypothesis test?

  1. #1

    Hello guys,

    I have a sample of 10,000 accommodations from 2013-2016.

    I want to test two things but I am not sure I am on the right path.

    The first thing I want to test is whether we had fewer openings (new hotels/apartments) in 2016 than in previous years. Will a simple t-test between the openings in 2013-2015 and 2016 be enough? Before proceeding to any hypothesis test, will I need to check whether the variance is the same between the two groups (F-test)?

    The second thing I want to test is whether the rental price of the properties in 2016 is higher on average than in the previous years. Is a simple t-test between the prices in 2013-2015 and 2016 OK?

    And in general, do I really need hypothesis testing with such a big sample?

    Thank you very much in advance

    Best,
    Diana

  2. #2

    Hi Diana,

    I think a t-test is fine in both cases. However, you have to check the assumptions of a parametric t-test, which are: (1) normality of the data in each sample (e.g., via the Shapiro-Wilk test or QQ-plots), and (2) equal variances (e.g., via Levene's test or graphically). Since you have huge samples, I would recommend checking these assumptions graphically, because the Shapiro-Wilk test and Levene's test can be significant even if the violations of normality or homogeneity are small enough to be neglected. This is because with huge samples these tests have very high statistical power.

    If both assumptions are met, perform a Student's t-test. If only homogeneity of variance is violated, you can perform Welch's t-test. If normality is violated, you can perform a non-parametric alternative, such as the Mann-Whitney U test or a permutation test.

  3. #3

    Given such a large sample size, (assuming each year/grouping has a large number of cases), would normality really be a concern due to the central limit theorem? My suspicion is that normality is somewhat irrelevant in this case due to the large number of cases and the applicability of the CLT.
    If the normality assumption is to be checked, though, I would agree with using normal probability plots, since the formal tests tend to be highly sensitive to immaterial departures from normality.

    I think this boils down more to Welch's test vs. Student's t-test (depending on the variances, which might still be less of an issue) given the CLT (and assuming this only concerns two groups; otherwise, the CLT is definitely not applicable).
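
    The CLT argument can be illustrated with a small simulation. This is only a sketch under assumed conditions: the exponential distribution, the group size of 500 and the seed are arbitrary choices for illustration, not properties of the accommodation data.

```python
import random
from statistics import mean, pstdev

def skewness(xs):
    """Simple moment-based skewness (0 for a symmetric distribution)."""
    m, s = mean(xs), pstdev(xs)
    return sum(((x - m) / s) ** 3 for x in xs) / len(xs)

rng = random.Random(0)

# Raw observations from a strongly right-skewed distribution.
raw = [rng.expovariate(1.0) for _ in range(5000)]

# Means of 1000 independent samples, each of size 500, from the same distribution.
sample_means = [
    mean([rng.expovariate(1.0) for _ in range(500)])
    for _ in range(1000)
]

skew_raw = skewness(raw)
skew_means = skewness(sample_means)
print(f"skewness of raw data:     {skew_raw:.2f}")
print(f"skewness of sample means: {skew_means:.2f}")
```

    The raw data are clearly skewed, while the sample means are close to symmetric. The t-test relies on the distribution of the mean, which is why normality of the individual observations matters less with large groups.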

    Thoughts?

  4. The Following User Says Thank You to ondansetron For This Useful Post:

    mmercker (01-18-2017)

  5. #4

    Thank you, ondansetron, for this very useful remark.

    Indeed, for t-tests and simple linear regression it seems that we don't have to check for normality if sample sizes are sufficiently large, cf.:

    http://www.annualreviews.org/doi/pdf....100901.140546

    So, Diana, either Welch's t-test or Student's t-test should be appropriate for your data, depending on your variances.

  6. #5

    Now, the only caution I would offer is that, unless your sample size is quite large (several thousand, as it is here), you should still check for extreme departures from normality (although you can be less concerned with a larger sample or one that appears to be from a closer-to-normal distribution). For example, OLS is widely known to be "robust" with respect to several assumptions, including normality of the error term. In other words, the errors can depart moderately from normality and OLS will still perform well, but we should still investigate the assumption, just to be safe.

    For the t-test (in this case with presumably thousands of cases in each group), I would imagine you needn't worry too much unless the sample indicates the population may be highly non-normal. If it is potentially a problem, I would either transform the variable of interest (in an attempt to normalize it) and rerun the parametric test to see how the conclusion changes, or run a non-parametric "equivalent" on the variable to see if the qualitative results are substantially different (does the t-test say mu(a) > mu(b), and does the Wilcoxon rank-sum test indicate that population a is shifted to the right of population b, i.e., tends to have larger values?). If they don't disagree, you can be less concerned about the assumptions (either they're violated but not enough to impact the conclusion, or they're reasonably satisfied).
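
    The "run both and compare" check described above could look roughly like the following sketch. It uses only the standard library; the data are simulated skewed values (an assumption for illustration), both p-values use the large-sample normal approximation, and the rank-sum variance has no tie correction.

```python
import math
import random
from statistics import NormalDist, mean, variance

def welch_z(a, b):
    """Welch's statistic; with large groups it is approximately standard normal."""
    se = math.sqrt(variance(a) / len(a) + variance(b) / len(b))
    return (mean(a) - mean(b)) / se

def rank_sum_z(a, b):
    """Wilcoxon rank-sum (Mann-Whitney U) z-score, normal approximation."""
    pooled = sorted([(x, 0) for x in a] + [(x, 1) for x in b])
    n1, n2 = len(a), len(b)
    rank_sum_a = 0.0
    i = 0
    while i < len(pooled):
        j = i
        while j < len(pooled) and pooled[j][0] == pooled[i][0]:
            j += 1
        avg_rank = (i + 1 + j) / 2  # tied values share the average of ranks i+1..j
        rank_sum_a += avg_rank * sum(1 for k in range(i, j) if pooled[k][1] == 0)
        i = j
    u = rank_sum_a - n1 * (n1 + 1) / 2  # number of (a, b) pairs with a > b
    sigma = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    return (u - n1 * n2 / 2) / sigma

# Simulated right-skewed data; group_a is shifted upward by 0.5.
rng = random.Random(42)
group_b = [rng.expovariate(1.0) for _ in range(500)]
group_a = [0.5 + rng.expovariate(1.0) for _ in range(500)]

z_t = welch_z(group_a, group_b)
z_u = rank_sum_z(group_a, group_b)
p_t = 1 - NormalDist().cdf(z_t)  # one-sided: is a larger than b?
p_u = 1 - NormalDist().cdf(z_u)
same_direction = (z_t > 0) == (z_u > 0)
print(f"Welch z = {z_t:.2f} (p = {p_t:.3g}), rank-sum z = {z_u:.2f} (p = {p_u:.3g})")
print("Tests agree on direction:", same_direction)
```

    If the two tests point in the same direction with similar strength, the parametric conclusion is probably not being driven by a violated assumption. Keep in mind that the two tests answer slightly different questions (difference in means vs. a shift in distribution), so some disagreement on heavily skewed data is not necessarily an error.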

    Finally, remember that the central limit theorem can't be called upon if there are more than 2 groups being compared at once (such as in an ANOVA with at least 3 groups). In that case, check all assumptions.

    Edit: I can't find a source for another issue, so I decided to remove it. Can anyone comment on the CLT's applicability to the homogeneity of variance assumption with respect to t-tests, both independent and paired? I thought I had heard that the CLT allows this to be relaxed as well, but I can't find a source right now.

    UPDATE: I found a few texts I have that indicate that large-sample t-tests (paired and independent) can relax the homogeneity of variances assumption in addition to the normality assumption.
    Last edited by ondansetron; 01-18-2017 at 09:39 AM.

  7. #6

    Quote Originally Posted by ondansetron
    Now, the only caution I would offer is that, unless your sample size is quite large (several thousand, as it is here), you should still check for extreme departures from normality (although you can be less concerned with a larger sample or one that appears to be from a closer-to-normal distribution).

    IIRC Rogojel once presented a simulation study here on talkstats which demonstrated that regression results are robust even in case of very non-normal residuals, if n > 40 or so.

    Quote Originally Posted by ondansetron
    Finally, remember that the central limit theorem can't be called upon if there are more than 2 groups being compared at once (such as in an ANOVA with at least 3 groups).

    That is a special case of the general linear model, so the same principles as with linear regression apply (the residuals from the ANOVA should preferably be normally distributed, but with large enough sample size, the CLT guarantees robustness of the F-test).

    With kind regards

    K.

  8. The Following User Says Thank You to Karabiner For This Useful Post:

    ondansetron (01-18-2017)

  9. #7

    I'm going to follow up with this since I had further interest in it. Given that we don't know how many observations you'll have in each group of the test, we can't give you a much better answer (slight imbalances leave the t-test fairly robust in large sample sizes, but larger imbalances with unequal variances might be an issue).
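
    The point about imbalance combined with unequal variances can be checked with a small simulation. The sketch below is an illustration under assumed settings (normal data, the specific group sizes and standard deviations, 500 repetitions, and a normal critical value of 1.96 instead of exact t quantiles); none of these numbers come from the original data. Both groups have the same true mean, so every rejection is a false positive.

```python
import math
import random

CRIT = 1.96  # two-sided 5% critical value, large-sample normal approximation

def summarize(xs):
    """Return (n, mean, sample variance) of a list of floats."""
    n = len(xs)
    m = math.fsum(xs) / n
    v = math.fsum((x - m) ** 2 for x in xs) / (n - 1)
    return n, m, v

def pooled_and_welch(a, b):
    """Student's (pooled) and Welch's statistics for the same two samples."""
    n1, m1, v1 = summarize(a)
    n2, m2, v2 = summarize(b)
    sp2 = ((n1 - 1) * v1 + (n2 - 1) * v2) / (n1 + n2 - 2)
    t_pooled = (m1 - m2) / math.sqrt(sp2 * (1 / n1 + 1 / n2))
    t_welch = (m1 - m2) / math.sqrt(v1 / n1 + v2 / n2)
    return t_pooled, t_welch

def rejection_rates(n1, sd1, n2, sd2, reps=500, seed=1):
    """Share of simulated data sets in which each test rejects a true null."""
    rng = random.Random(seed)
    pooled_hits = welch_hits = 0
    for _ in range(reps):
        a = [rng.gauss(0.0, sd1) for _ in range(n1)]
        b = [rng.gauss(0.0, sd2) for _ in range(n2)]
        t_pooled, t_welch = pooled_and_welch(a, b)
        pooled_hits += abs(t_pooled) > CRIT
        welch_hits += abs(t_welch) > CRIT
    return pooled_hits / reps, welch_hits / reps

# Unbalanced: the small group has the larger variance.
pooled_rate, welch_rate = rejection_rates(1000, 1.0, 100, 4.0)
# Balanced: same variances as above, equal group sizes.
balanced_pooled, balanced_welch = rejection_rates(300, 1.0, 300, 4.0)

print(f"unbalanced: pooled = {pooled_rate:.3f}, Welch = {welch_rate:.3f}")
print(f"balanced:   pooled = {balanced_pooled:.3f}, Welch = {balanced_welch:.3f}")
```

    With equal group sizes the pooled test stays close to its nominal 5% level even though the variances differ, whereas with the smaller group being the more variable one its false-positive rate rises well above 5% while Welch's test stays near it. That is why the group sizes and per-group variances are needed before recommending one test over the other.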

    I'll post this thread from stackexchange that was pretty interesting and gives arguments for both sides (using Welch's test vs. Student's vs. Wilcoxon's). Essentially, you can post some output on here for guidance, or you can decide on your own, but you'll have to get a feel for your data in terms of the sample variance for each group and the number in each group of the test.

    http://stats.stackexchange.com/quest...edirect=1&lq=1

    Good luck!
