# How to calculate p-value of two-sample t-test

#### randomcat

##### New Member
I have 2 independent data sets, and I know the following about each of them: mean, SD, and sample size. I calculated the t-statistic just fine

Code:
my.t.test<-function(mu1, mu2, sd1, sd2, n1,n2){
t=(mu1-mu2)/sqrt((sd1)^2/n1+(sd2)^2/n2)
return(t)
}

I know that the degrees of freedom is n1+n2-2, now I just need to calculate the p-value. I tried just pnorm(t) but that didn't seem like it was correct. I only know the mean, SD, and sample size. I don't know the actual values of each observation; that's why I didn't use the t.test function.

#### GretaGarbo

##### Human
Try the pt()-function. The distribution function for the t-variable

#### Dason

I've made this function before. Mine is a bit more complicated because I try to make the output identical to using t.test on raw data but if you want this is the code I have

Code:
fsttest <- function(mean1, mean2, sd1, sd2, n1, n2,
var.equal = FALSE, alternative = "two.sided",
conf.level = .95, simulate = FALSE)
{

if(simulate){
# Generate normally distributed data with the
# specified mean and standard deviation
# for each group
x <- scale(rnorm(n1))*s1 + mean1
y <- scale(rnorm(n2))*s2 + mean2

out <- t.test(x, y,
var.equal = var.equal,
alternative = alternative,
conf.level = conf.level)
return(out)
}

## Otherwise we'll compute everything ourselves...

if(var.equal){
se <- sqrt((s1^2*(n1-1) + s2^2*(n2-1))/(n1+n2-2)) * sqrt(1/n1 + 1/n2)
df <- n1 + n2 - 2
}else{
se <- sqrt(s1^2/n1 + s2^2/n2)
# The Welch–Satterthwaite equation gives us
# the appropriate df for this variance estimate
num <- (s1^2/n1 + s2^2/n2)^2
den <- (s1^4/(n1^2*(n1-1)) + s2^4/(n2^2*(n2-1)))
df <- num/den
}

tstat <- (mean1 - mean2)/se

## TODO: test what happens if given not one of these
p <- switch(alternative,
"two.sided" = 2 * pt(-abs(tstat), df),
"greater"   = 1 - pt(tstat, df),
"less"      = pt(tstat, df),
NA
)

quantile <- 1 - (1 - conf.level)/2
confint <- (mean1 - mean2) + c(-1, 1) * qt(quantile, df) * se

# Match the output from t.test
out <- list(statistic = c(t = tstat),
parameter = c(df = df),
p.value = p,
conf.int = structure(confint, conf.level = conf.level),
estimate = c("mean of x" = m1, "mean of y" = m2),
null.value = c("difference in means" = 0),
alternative = alternative,
method = ifelse(var.equal,
" Two Sample t-test",
"Welch Two Sample t-test"),
data.name = "x and y")

class(out) <- "htest"

return(out)
}
with some examples
Code:
x <- rnorm(20)
y <- rnorm(20, 0, 5)
m1 <- mean(x)
m2 <- mean(y)
s1 <- sd(x)
s2 <- sd(y)
n1 <- length(x)
n2 <- length(y)
o1 <- t.test(x, y, var.equal = T)
o2 <- fsttest(m1,m2, s1,s2, 20, 20, var.equal = T)
o3 <- fsttest(m1,m2, s1,s2, 20, 20, var.equal = T, simulate = T)
o4 <- t.test(x, y, var.equal = F)
o5 <- fsttest(m1,m2, s1,s2, 20, 20, var.equal = F)
o6 <- fsttest(m1,m2, s1,s2, 20, 20, var.equal = F, simulate = T)
all.equal(o1, o2)
all.equal(o1, o3)
all.equal(o4, o5)
all.equal(o4, o6)

#### hlsmith

##### Less is more. Stay pure. Stay poor.
Dason, I know you pride yourself on clean code and perfection, must be a vestigial robot thing.

Should you not use x and y, since the defined y is not really a dependent variable?