What does a p-value of 0 mean in Excel?

annu

New Member
#1
Hi!
I am in my first year of biomedical sciences and I am currently writing a paper about the difference in the incidence of a specific infectious disease between men and women.
I am using an available dataset with the incidence of the disease in a specific country. I ran a two-sample z-test to compare the proportions of males and females and ended up with a p-value of 0 in Excel. (Even after increasing the number of decimal places it is still 0; in scientific notation Excel shows the p-value as 0,00E+00.)
I am using the two-sample z-test formula for proportions that we have learned so far:
p1-p2/sqrt(p*(1-p)*(1/n1 + 1/n2))
So I was wondering whether I am doing something wrong. My thought process is the following:
1) I have the total number of females in the country (infected + not infected): n1. I have the total number of males in the country (infected + not infected): n2.
2) For p1 I divide the number of infected females by the total number of females in the country; for p2 I divide the number of infected males by the total number of males in the country.
3) The p in my formula is then all the infected people divided by the total population of the country.
I follow the formula and then Excel gives me a p-value of 0.
Am I doing something wrong? Or, if I am correct, what is the right way to write about it in my paper?

Thank you!
 

obh

Active Member
#2
Hi Annu,
0 means that the p-value is so small that it shows as 0.

The formula should of course have brackets around p1-p2:
(p1-p2)/sqrt(p*(1-p)*(1/n1 + 1/n2))

Can you show the numbers? x1, x2, n1, n2?
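
A minimal Python sketch of the bracketed formula, showing why the brackets matter. The function name and the small counts are made up for illustration and are not from this thread.

```python
import math

def two_proportion_z(x1, n1, x2, n2):
    """Pooled two-sample z statistic for comparing two proportions."""
    p1 = x1 / n1
    p2 = x2 / n2
    p = (x1 + x2) / (n1 + n2)  # pooled proportion
    se = math.sqrt(p * (1 - p) * (1 / n1 + 1 / n2))
    return (p1 - p2) / se

# Hypothetical counts, for illustration only.
x1, n1, x2, n2 = 60, 1000, 40, 1000
z = two_proportion_z(x1, n1, x2, n2)

# Without the brackets, only p2 is divided by the standard error,
# which gives a different (and meaningless) number.
p = (x1 + x2) / (n1 + n2)
se = math.sqrt(p * (1 - p) * (1 / n1 + 1 / n2))
z_without_brackets = x1 / n1 - (x2 / n2) / se

print(z, z_without_brackets)
```

If the spreadsheet cell was typed exactly as p1-p2/sqrt(...), it would compute the second quantity, so it is worth checking the cell formula as well as the written one.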
 

annu

New Member
#3
Hi!

Thanks for your reply!!

The numbers are: x1 = 1 145 063; x2 = 610 447; n1 = 165 285 036; n2 = 161 882 398

I get a z-value of 390, so I understand that the p-value must be very, very close to 0. But is Excel's algorithm built so that, when the z-value is this large, it just shows a p-value of 0? Should I in that case just write that the p-value is < 0.05 (since I'm using a significance level of 0.05)?

Thank you!
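
For what it's worth, the same thing happens in any software that stores numbers as 64-bit floats, not only in Excel. The Python sketch below plugs in the numbers above; the variable names are illustrative, and the log-scale figure relies on the standard large-z tail approximation 1 - Phi(z) ≈ phi(z)/z, so treat it as a rough order of magnitude.

```python
import math

x1, n1 = 1_145_063, 165_285_036
x2, n2 = 610_447, 161_882_398

p1, p2 = x1 / n1, x2 / n2
p = (x1 + x2) / (n1 + n2)  # pooled proportion
z = (p1 - p2) / math.sqrt(p * (1 - p) * (1 / n1 + 1 / n2))

# Two-sided p-value via the complementary error function.
# For a z this large the result underflows to exactly 0.0.
p_value = math.erfc(abs(z) / math.sqrt(2))

# Approximate log10 of the two-sided p-value, using 1 - Phi(z) ~ phi(z) / z.
log10_p = (math.log(2) - z * z / 2 - math.log(z)
           - 0.5 * math.log(2 * math.pi)) / math.log(10)

print(z, p_value, log10_p)
```

Under this approximation the p-value is roughly 10^-33000, far below the smallest positive 64-bit float (about 5e-324), so a displayed 0 is a limit of the number format and does not indicate a calculation mistake.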
 

obh

Active Member
#4
Hi Annu,

Your calculation is correct.
I checked at http://www.statskingdom.com/121proportion_normal2.html
and in R.

Z = 390.7872. When Z > 9 the p-value is so small that you will get zero.
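
One plausible reason a zero shows up already around Z = 9 is rounding in the subtraction 1 - CDF. This is an assumption about how the p-value was computed (for example something like 2*(1-NORM.S.DIST(z,TRUE)) in Excel); the thread does not show the actual cell formula. A small Python sketch of the effect:

```python
import math

z = 9.0

# Naive route: build the CDF first, then subtract it from 1.
# In double precision the CDF at z = 9 rounds to exactly 1.0.
cdf = 0.5 * (1 + math.erf(z / math.sqrt(2)))
naive_p = 2 * (1 - cdf)

# Stable route: compute the upper tail directly with erfc.
stable_p = math.erfc(z / math.sqrt(2))

print(naive_p, stable_p)
```

The naive route returns 0 while the direct tail calculation still returns a tiny positive number. For a Z near 390 even the direct calculation underflows.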


Code:
> prop.test(x = c(1145063,610447), n = c(165285036,161882398), alternative="two.sided", correct=TRUE)

    2-sample test for equality of proportions with continuity
    correction

data:  c(1145063, 610447) out of c(165285036, 161882398)
X-squared = 152710, df = 1, p-value < 2.2e-16
alternative hypothesis: two.sided
95 percent confidence interval:
0.003141092 0.003172666
sample estimates:
     prop 1      prop 2
0.006927808 0.003770929
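
The 95 percent confidence interval in the R output above can be reproduced by hand. The Python sketch below assumes the usual unpooled standard error for the difference plus a continuity correction of 0.5*(1/n1 + 1/n2), which is how I understand prop.test to build the interval when correct=TRUE; the variable names are illustrative.

```python
import math
from statistics import NormalDist

x1, n1 = 1_145_063, 165_285_036
x2, n2 = 610_447, 161_882_398

p1, p2 = x1 / n1, x2 / n2
diff = p1 - p2

# Unpooled standard error of the difference in proportions.
se = math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)

z_crit = NormalDist().inv_cdf(0.975)   # about 1.96 for a 95% interval
correction = 0.5 * (1 / n1 + 1 / n2)   # continuity correction (assumed)

lower = diff - z_crit * se - correction
upper = diff + z_crit * se + correction

print(lower, upper)
```

The interval is very narrow because n1 and n2 are so large, which is a more informative thing to report than the p-value alone.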
 

obh

Active Member
#5
I believe it is also important to show the p-value (I know some think otherwise...).
You can report it the way R does: p-value < 2.2e-16
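
The 2.2e-16 is the machine epsilon for double-precision numbers, which R uses as a floor when printing p-values. A small Python sketch of the same reporting convention; the function name and the three-significant-digit format are just one illustrative choice.

```python
import sys

def format_p_value(p, eps=sys.float_info.epsilon):
    """Report p-values below machine epsilon as an upper bound."""
    if p < eps:
        return f"p < {eps:.1e}"
    return f"p = {p:.3g}"

print(format_p_value(0.0))
print(format_p_value(0.03))
```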
 

hlsmith

Less is more. Stay pure. Stay poor.
#7
Well, two things to think about:

First, if you have the full country population, do you really need statistics? If the numbers are different, then they are different. What would you be generalizing to, if you don't have a sample?

Second, a p-value isn't really going to tell you much, since the n-values are so large. If anything, the p-value may mislead a reader, who will see such a small number and assume there is a big difference, when the difference is only about 3 tenths of a percentage point.
So you would see a difference this extreme with a probability below 0.0000000000000002 if there really were no difference, given the n-values. The probability of a difference, per a quick Bayes model, is shown below. So there is a probability of 100% that the difference is not equal to zero given the n-values; that is not a p-value, and it is also a less desirable metric than just looking at the difference.
1589650148577.png
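
The model behind the attached plot is not described in the post. As one possible reading, here is a minimal Python sketch that assumes independent uniform Beta(1, 1) priors on the two proportions and estimates the posterior probability that group 1 has the higher proportion by simulation. The priors, the number of draws, and the seed are all assumptions made for illustration.

```python
import random

x1, n1 = 1_145_063, 165_285_036
x2, n2 = 610_447, 161_882_398

rng = random.Random(0)  # fixed seed so the sketch is repeatable
draws = 2000
group1_higher = 0

for _ in range(draws):
    # With a Beta(1, 1) prior, the posterior is Beta(x + 1, n - x + 1).
    prop1 = rng.betavariate(x1 + 1, n1 - x1 + 1)
    prop2 = rng.betavariate(x2 + 1, n2 - x2 + 1)
    if prop1 > prop2:
        group1_higher += 1

prob_group1_higher = group1_higher / draws
print(prob_group1_higher)
```

With samples this large the two posteriors do not overlap in practice, so the estimate says a difference exists but nothing about whether it is big enough to matter.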
 

obh

Active Member
#8
"If you have the full country population, do you really need statistics?"
Generally, you use statistics when you can't use the full population data and take only a sample, trying to understand whether the sample's results represent the population.

But I think that sometimes it is okay to use statistics on the full population as well.
If the full population is small, like one group of 100, you may ask whether the results would be the same if you checked the same group again. So you may treat the full population of a small group as one sample from several repeats of the same small group.

The problem of a useless significance level appears only if you look at the p-value and ignore the effect size.
People (and teachers...) tend not to look at the effect size.

You just need to look at the effect size as well, and all of life's problems will be solved :)
Using Cohen's h effect size for proportions, 0.2 is a small effect, while in this case it is 0.044, so it is very small.
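
For reference, Cohen's h is the difference between the arcsine-transformed proportions, h = 2*asin(sqrt(p1)) - 2*asin(sqrt(p2)). A short Python sketch using the counts posted earlier in the thread (variable names are illustrative):

```python
import math

p1 = 1_145_063 / 165_285_036
p2 = 610_447 / 161_882_398

# Cohen's h: difference of arcsine-square-root transformed proportions.
h = 2 * math.asin(math.sqrt(p1)) - 2 * math.asin(math.sqrt(p2))

print(round(h, 3))
```

This comes out at about 0.044, well under the conventional 0.2 threshold for a small effect.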