How to calculate p-value of two-sample t-test

#1
I have two independent data sets, and I know the following about each of them: mean, SD, and sample size. I calculated the t-statistic just fine:

Code:
my.t.test <- function(mu1, mu2, sd1, sd2, n1, n2) {
  t <- (mu1 - mu2) / sqrt(sd1^2/n1 + sd2^2/n2)
  return(t)
}

I know that the degrees of freedom is n1 + n2 - 2 (assuming equal variances), so now I just need to calculate the p-value. I tried pnorm(t), but that didn't seem correct. I only know the mean, SD, and sample size; I don't know the actual values of each observation, which is why I didn't use the t.test function.
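In brief, the p-value should come from the t distribution via pt(), not from pnorm(): the normal is only the asymptotic approximation. A minimal sketch, assuming a two-sided test with the equal-variance df above (p.from.t is just an illustrative name):

```r
# Two-sided p-value from a t statistic and its degrees of freedom.
# pt() uses the exact t distribution; pnorm() would ignore the df.
p.from.t <- function(t, df) {
  2 * pt(-abs(t), df)
}

# Example: t = 2.1 with n1 = n2 = 15 gives df = 28
p.from.t(2.1, 28)
```

For a one-sided alternative you would use pt(t, df) or 1 - pt(t, df) instead of doubling the tail area.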
 

Dason

Ambassador to the humans
#3
I've made this function before. Mine is a bit more complicated because I try to make the output identical to using t.test on raw data, but if you want, this is the code I have:

Code:
fsttest <- function(mean1, mean2, sd1, sd2, n1, n2, 
                    var.equal = FALSE, alternative = "two.sided", 
                    conf.level = .95, simulate = FALSE)
{
    
    if(simulate){
        # Generate normally distributed data with the
        # specified mean and standard deviation
        # for each group
        x <- scale(rnorm(n1))*sd1 + mean1
        y <- scale(rnorm(n2))*sd2 + mean2
        
        out <- t.test(x, y, 
                      var.equal = var.equal, 
                      alternative = alternative,
                      conf.level = conf.level)
        return(out)
    }
    
    ## Otherwise we'll compute everything ourselves...
    
    if(var.equal){
        se <- sqrt((sd1^2*(n1-1) + sd2^2*(n2-1))/(n1+n2-2)) * sqrt(1/n1 + 1/n2)
        df <- n1 + n2 - 2
    }else{
        se <- sqrt(sd1^2/n1 + sd2^2/n2)
        # The Welch–Satterthwaite equation gives us
        # the appropriate df for this variance estimate
        num <- (sd1^2/n1 + sd2^2/n2)^2
        den <- (sd1^4/(n1^2*(n1-1)) + sd2^4/(n2^2*(n2-1)))
        df <- num/den
    }
    
    tstat <- (mean1 - mean2)/se
    
    ## An unrecognized alternative falls through to NA
    p <- switch(alternative,
                "two.sided" = 2 * pt(-abs(tstat), df),
                "greater"   = 1 - pt(tstat, df),
                "less"      = pt(tstat, df),
                NA
    )
    
    quantile <- 1 - (1 - conf.level)/2
    confint <- (mean1 - mean2) + c(-1, 1) * qt(quantile, df) * se
    
    # Match the output from t.test
    out <- list(statistic = c(t = tstat),
                parameter = c(df = df),
                p.value = p,
                conf.int = structure(confint, conf.level = conf.level),
                estimate = c("mean of x" = mean1, "mean of y" = mean2),
                null.value = c("difference in means" = 0),
                alternative = alternative,
                method = ifelse(var.equal, 
                                " Two Sample t-test", 
                                "Welch Two Sample t-test"),
                data.name = "x and y")
    
    class(out) <- "htest"
    
    return(out)
}
And here are some examples:
Code:
x <- rnorm(20)
y <- rnorm(20, 0, 5)
m1 <- mean(x)
m2 <- mean(y)
s1 <- sd(x)
s2 <- sd(y)
n1 <- length(x)
n2 <- length(y)
o1 <- t.test(x, y, var.equal = TRUE)
o2 <- fsttest(m1, m2, s1, s2, 20, 20, var.equal = TRUE)
o3 <- fsttest(m1, m2, s1, s2, 20, 20, var.equal = TRUE, simulate = TRUE)
o4 <- t.test(x, y, var.equal = FALSE)
o5 <- fsttest(m1, m2, s1, s2, 20, 20, var.equal = FALSE)
o6 <- fsttest(m1, m2, s1, s2, 20, 20, var.equal = FALSE, simulate = TRUE)
all.equal(o1, o2)
all.equal(o1, o3)
all.equal(o4, o5)
all.equal(o4, o6)
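As a quick sanity check on the unequal-variance branch, the Welch–Satterthwaite df computed from summary statistics alone should match the df that t.test reports. A sketch (welch.df is just an illustrative name, not part of any package):

```r
# Welch–Satterthwaite degrees of freedom from summary statistics only.
welch.df <- function(s1, s2, n1, n2) {
  num <- (s1^2/n1 + s2^2/n2)^2
  den <- s1^4/(n1^2*(n1 - 1)) + s2^4/(n2^2*(n2 - 1))
  num/den
}

set.seed(42)
x <- rnorm(20)
y <- rnorm(20, 0, 5)

# These two should agree (the Welch df is generally non-integer)
welch.df(sd(x), sd(y), 20, 20)
t.test(x, y)$parameter
```

Note that when the variances differ a lot, as with sd = 1 vs sd = 5 here, the Welch df can be far below n1 + n2 - 2.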
 

hlsmith

Less is more. Stay pure. Stay poor.
#4
Dason, I know you pride yourself on clean code and perfection; must be a vestigial robot thing.

Should you not use x and y, since the defined y is not really a dependent variable?
 

Dason

Ambassador to the humans
#5
Doesn't really matter. But I followed the convention set in the parameters for t.test, which uses x and y for the two data sets if provided.