Hi everybody,
I'm really new to the topic of statistical testing and I hope to find some help here.
I was trying to figure out which test I need for analyzing my data. However, I guess I confused myself by reading so much.
This is my data:
I analyzed two different retailers with respect to the availability of their products for 3 months. I would now like to test if the difference in availability is statistically significant. First I thought I would use an independent samples t-test... But now I'm confused.
Because: I think it does not make sense to compare the average number of available products, as the total number of products differs between the retailers.
Let's say:
Retailer A: day 1: 12 of 14 products available, day 2: 15 of 18 products available, ..
Retailer B: day 1: 36 of 40 products available, day 2: 38 of 41 products available, ..
First I thought I could test the average number of available products, so summing up (12+15+...)/90 for retailer A and (36+38+...)/90 for retailer B.
But as the total number of products differs each day, I think I should use percentages instead?
Because I think just comparing the average number of available products does not make a lot of sense as one retailer in general offers more products than the other.
So,
Retailer A: day 1: 85% available, day 2: 83% available , ...
Retailer B: day 1: 90% available, day 2 92% available, ...
Can I compare the means of these percentages with a t-test? Because I saw people also used Chi-Square, but this is only for non-metric data ...
And I think I can't use a "normal" mean, but a weighted one, right?
As you see, I'm really new to the topic! Therefore, I'm thankful for any comment!!
Thank you very much!!
Last edited by tesarolle; 07-13-2016 at 05:14 AM.
What is your actual study question? Write it out.
What is your actual hypothesis given the question?
Thank you!
You're right, I was not clear, I'm sorry!
I would like to see two things.
1) if differences in the average number of products are statistically relevant between the two retailers
2) if differences in the average percentage of products that are available at a retailer are relevant
For the first one, I clearly would use an unpaired sample t-test, but for the second I am not sure if
- using percentages is correct, as the number of products differs each day
- a t-test can be applied to those percentages
Hope that was clearer now. Thank you!
Whether percentages are o.k. or not is up to you to decide.
If the % represents what you want to know about, then use it.
"1) if differences in average number of products are statistically relevant between the two retailers"
I do not know what you mean by statistically relevant, but
anyway, do you mean by "availability" the sheer number of
available products, or do you mean the differences between
products and available products?
You can use either as DV, using t-test or U-test. But
since you have day-to-day data, observations within
groups are possibly not independent. Or does availability
(number of available products/number of non-available
products/ % of available products) on day i NOT affect
availability on day i+1?
With kind regards
K.
Dear K. and all the others!
Thank you very much. I somehow get the feeling this is more complicated than I thought..
My data regarding products listed looks like this (A = retailer A, B = retailer B)
day A B
19-May 36 11
20-May 38 20
21-May 39 19
and so on.
Regarding question 1
I meant significant. I'm sorry, I'm actually German, so I struggle a bit to express correctly what I'm thinking.
So, I observed that over the 3 months one retailer listed 38 products on average and the other 17;
and now I'm doing a t-test to see if the difference is statistically significant, as one retailer is a supermarket and the other a drugstore. So there might be differences. That's the first thing I would like to do.
The data set for the second question regarding availability looks the same, showing only the number of products available for purchase --> sheer number, not difference
days A B
19-May 30 11
20-May 31 18
21-May 32 18
Now, I think I have various options. As the number of listed products varies each day, I thought that when comparing availability, it would be useful to compare the average percentage of available products between the retailers. So, calculating percentages
days A B (% of listed products available for purchase)
19-May 83,33% 100%
20-May 81,57% 90%
21-May 82,05% 94,7%
Now, I would compare the average percentage of available products between those retailers because they have different stock order management systems.
First question on calculating averages of percentages:
Is it simply (83,33+81,57+82,05)/3 in order to get the average, or do I have to account for the differing number of products listed each day? Arithmetic mean or geometric mean?
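To make the two candidate averages concrete, here is a minimal Python sketch using only the three days of counts shown above (available, listed). The function names are made up for illustration. It contrasts the plain arithmetic mean of the daily percentages with the pooled percentage, which is the same as a mean weighted by the number of listed products.

```python
# Counts for the three days shown in the post: (available, listed).
retailer_a = [(30, 36), (31, 38), (32, 39)]
retailer_b = [(11, 11), (18, 20), (18, 19)]


def mean_of_daily_percentages(days):
    """Unweighted arithmetic mean of the daily availability percentages."""
    return 100 * sum(avail / listed for avail, listed in days) / len(days)


def pooled_percentage(days):
    """All days pooled: total available / total listed.

    This equals the mean of the daily percentages weighted by the
    number of products listed on each day.
    """
    total_available = sum(avail for avail, _ in days)
    total_listed = sum(listed for _, listed in days)
    return 100 * total_available / total_listed


for name, days in (("A", retailer_a), ("B", retailer_b)):
    print(name,
          round(mean_of_daily_percentages(days), 2),
          round(pooled_percentage(days), 2))
```

For retailer A the two values nearly coincide (about 82.3% either way), while for retailer B they differ (about 94.9% unweighted versus 94.0% pooled), because B's listed count changes more from day to day. Which one to report depends on whether each day or each listed product should count equally.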
Besides that, I have now observed that on average 82,3% of retailer A's products were available and 94,9% of retailer B's.
Those are the two means I would like to compare using a t-test, right?
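As a rough illustration of what such a test computes, here is a small Python sketch of an unpaired t statistic on the three daily percentages shown above. It uses Welch's variant (no equal-variance assumption), which is a choice made for this sketch, and the function name `welch_t` is made up. It only returns the statistic and the degrees of freedom; a p-value would additionally need a t-distribution table or a statistics package. Three days are of course far too few for a real analysis.

```python
import math
import statistics


def welch_t(sample_a, sample_b):
    """Return (t statistic, Welch-Satterthwaite degrees of freedom)."""
    n_a, n_b = len(sample_a), len(sample_b)
    # Squared standard errors of the two sample means (sample variance / n).
    se2_a = statistics.variance(sample_a) / n_a
    se2_b = statistics.variance(sample_b) / n_b
    diff = statistics.mean(sample_a) - statistics.mean(sample_b)
    t = diff / math.sqrt(se2_a + se2_b)
    df = (se2_a + se2_b) ** 2 / (
        se2_a ** 2 / (n_a - 1) + se2_b ** 2 / (n_b - 1)
    )
    return t, df


# Daily availability percentages for the three days listed in the post.
pct_a = [83.33, 81.57, 82.05]
pct_b = [100.0, 90.0, 94.7]

t, df = welch_t(pct_a, pct_b)
print(f"t = {t:.2f}, df = {df:.2f}")
```

Note that this calculation treats the days as independent observations.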
Regarding whether availability on day i affects availability on day i+1: I would not agree, because that depends on the retailer's stock and on the purchasing behaviour of shoppers. If he had ordered a lot, or if people do not buy the product, the retailer would never run out of stock. However, when a product is out of stock on day i, it will probably be in stock again on some later day, say i + 3, so here being out of stock does influence being in stock again later. I'm a bit confused, as I don't know why this is important.
Many, many thanks to you!!
You may see I'm kind of lost here..
The reason is, I first gathered the data and told my adviser that there is no sense in statistical testing, to which she agreed. However, when I presented to her what I had already done (I will hand in my paper in 2 weeks), she said I have to test the data somehow; what to test is up to me, but if I don't do a statistical test, I will get a bad mark. So it's not the typical way of first thinking about what to do and then doing it, but rather seeing what I can do with my data that makes at least some sense...
THANK YOU!
"however, when being out of stock on day i, for sure another day i + 3 it may will be on stock again, here being out of stock influences being in stock later again.. I'm a bit confused as I don't know why this is important."
Because of the assumptions underlying the t-test.
The calculation of the p-values assumes independent
observations. Seemingly, your observations (or, correctly
speaking, the errors/residuals) are not independent, but
autocorrelated, and the standard errors used for the
calculations are not correctly estimated.
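One simple way to look for such dependence is the lag-1 autocorrelation of a daily series, i.e. how strongly each day's value moves together with the next day's. The Python sketch below is only an illustration: the function name is made up and the series is invented toy data, not the poster's measurements.

```python
def lag1_autocorrelation(series):
    """Sample autocorrelation between each value and the next one."""
    n = len(series)
    mean = sum(series) / n
    dev = [x - mean for x in series]
    denominator = sum(d * d for d in dev)
    numerator = sum(dev[i] * dev[i + 1] for i in range(n - 1))
    return numerator / denominator


# Made-up toy series of daily availability percentages that drifts
# slowly up and down (NOT real data from this thread).
toy_series = [82, 83, 85, 86, 86, 84, 83, 81, 80, 80]

print(round(lag1_autocorrelation(toy_series), 2))
```

A value clearly above zero means neighbouring days resemble each other, so the days carry less independent information than their count suggests.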
One remedy could be to use only a subset of your data,
e.g. every 4th day or so, so that observations are
less correlated. Unfortunately, this would hugely
reduce the sample size and therefore the power of
the test.
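The cost of that remedy is easy to see in a short Python sketch. It assumes roughly 90 observation days (3 months, as described earlier in the thread); the helper name `thin` is made up.

```python
def thin(series, step=4, offset=0):
    """Keep every `step`-th observation, starting at index `offset`."""
    return series[offset::step]


# Day numbers 1..90 stand in for about 3 months of daily observations
# (an assumption for illustration).
days = list(range(1, 91))
kept = thin(days)

print(len(days), "->", len(kept))  # number of days before and after thinning
print(kept[:5])                    # first few retained days
```

With a step of 4, only 23 of the 90 days remain per retailer.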
"but if I won't do a statistical test, I will get a bad mark.."
Ok, so maybe you just do it incorrectly and perform
t-tests without considering bias through autocorrelation...
Dunno.
With kind regards
K.