How to determine if the results of my test data are significant

I'm not sure if this is the right place to ask this. I have been conducting a test on 3 different soaps to determine whether people wash their hands faster with one soap than with another. I have about 500 data points for each soap, each one being the amount of time someone ran a faucet while using that soap. This is in an office setting with an average of 19 men and 15 women; for each soap, about half of the data was taken in the men's restroom and half in the women's restroom.

Using these hypothetical results:
soap 1: average 13.32 seconds with a standard deviation of 6.7
soap 2: average 12.67 seconds with a standard deviation of 7.09
soap 3: average 13.13 seconds with a standard deviation of 7.15

Can I say with certainty that people who use soap 2 wash their hands faster? Or is the difference too small to make a determination? I am just using Microsoft Excel for this.
Should the sample size be 500?

There are about 250 data points from the men's restroom for each soap and about 250 data points from the women's restroom for each soap, recorded over a period of a couple of weeks.
But this is in an office that typically has 19 men and 15 women working. I am unclear whether the sample size should be 500 or the total number of people, which is 34. In Excel I just used the sample standard deviation function (STDEV.S).

I can't post the raw data, but I found this link very helpful:
I plan on following this


Less is more. Stay pure. Stay poor.
Do you know the times for individual people, along with the soap type and the number of washings they had? You are likely looking at needing to make comparisons using multilevel multiple linear regression modeling. This is because worker Dason may have only washed his hands with soap A and worker Spunky may have only used soap B, etc. Without controlling for individual-level data you would actually be comparing Dason to Spunky and not soap A to soap B, since you wouldn't know how long each of them would have taken if using the other soap.

Though, I would tell you that given the crude parameters above you are likely not going to find a difference. Assuming roughly normally distributed times, about 95% of the values in a soap group should fall within 2 standard deviations of the mean, so roughly 0 to 27 seconds for soap 1. Do the same thing for the other groups and you will see quite a bit of overlap in times. It is hard to imagine that controlling for individual-level data would overcome this. Also, do you want to control for gender? That would need to go in the above model as well.
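To make the overlap argument concrete, here is a rough sketch in Python (standard library only) that computes the mean ± 2 SD range for each soap from the summary numbers in the question, and then runs a naive two-sample z-test for soap 1 versus soap 2. It assumes about 500 independent readings per soap, which is exactly the assumption in doubt here since the same 34 or so people were measured repeatedly, so treat the p-value as optimistic at best. The function names are made up for illustration.

```python
import math
from statistics import NormalDist

# Summary statistics quoted in the question: (mean, standard deviation) in seconds.
soaps = {
    "soap 1": (13.32, 6.70),
    "soap 2": (12.67, 7.09),
    "soap 3": (13.13, 7.15),
}
N_PER_SOAP = 500  # assumption: ~500 readings per soap, treated as independent


def two_sd_range(mean, sd):
    """Mean +/- 2 SD, with the lower end clipped at 0 (times cannot be negative)."""
    return max(0.0, mean - 2 * sd), mean + 2 * sd


def naive_z_test(mean_a, sd_a, mean_b, sd_b, n):
    """Two-sample z statistic and two-sided p-value, ignoring repeated measures."""
    se = math.sqrt(sd_a**2 / n + sd_b**2 / n)  # standard error of the difference
    z = (mean_a - mean_b) / se
    p = 2 * (1 - NormalDist().cdf(abs(z)))  # normal approximation, fine for large n
    return z, p


ranges = {name: two_sd_range(m, s) for name, (m, s) in soaps.items()}
for name, (low, high) in ranges.items():
    print(f"{name}: {low:.2f} to {high:.2f} s")

z, p = naive_z_test(*soaps["soap 1"], *soaps["soap 2"], N_PER_SOAP)
print(f"soap 1 vs soap 2: z = {z:.2f}, p = {p:.3f}")
```

With these numbers the three ranges nearly coincide, and even the naive test gives a z of about 1.5, which is short of the usual 1.96 cutoff. Accounting for repeated measurements on the same people would normally make the evidence weaker, not stronger.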


Less is more. Stay pure. Stay poor.
Yeah, but unless you provide more details, I am now going to ask: what if Dason washed his hands once with soap A and 3 times with soap B, and vice versa for Spunky? We don't know how balanced the design is or whether individual-level data needs to be controlled for.
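A toy illustration of that scenario, with entirely made-up numbers: suppose Dason always takes 10 seconds and Spunky always takes 16 seconds, regardless of soap, so the true soap effect is zero. The sketch below compares the pooled soap averages (which ignore who did the washing) with the average within-person difference. The data and helper names are hypothetical.

```python
from statistics import mean

# Hypothetical data: (worker, soap, seconds). Neither worker's time depends on
# the soap, but the design is unbalanced: Dason used A once and B three times,
# Spunky the reverse.
washes = [
    ("Dason", "A", 10.0),
    ("Dason", "B", 10.0),
    ("Dason", "B", 10.0),
    ("Dason", "B", 10.0),
    ("Spunky", "A", 16.0),
    ("Spunky", "A", 16.0),
    ("Spunky", "A", 16.0),
    ("Spunky", "B", 16.0),
]


def pooled_mean(soap):
    """Average time for a soap, ignoring which worker produced each reading."""
    return mean(t for _, s, t in washes if s == soap)


def within_person_diff(worker):
    """Soap A minus soap B, computed inside a single worker's own readings."""
    a = mean(t for w, s, t in washes if w == worker and s == "A")
    b = mean(t for w, s, t in washes if w == worker and s == "B")
    return a - b


naive_diff = pooled_mean("A") - pooled_mean("B")
within_diff = mean(within_person_diff(w) for w in ("Dason", "Spunky"))

print(f"pooled A - B: {naive_diff:.1f} s")
print(f"within-person A - B: {within_diff:.1f} s")
```

The pooled comparison shows a 3 second gap between soaps that is produced entirely by who happened to use which soap more often, while the within-person comparison correctly shows none. A multilevel model is the general-purpose way of making the second kind of comparison when the real data are noisy and unbalanced.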