What is the best test to understand the relationship?

Thanks!

Ben

I tried to work out the necessary sample size for a logistic regression by simulation and got some scary results. If anyone could check the code below, it would be a great help.

I simulate a logistic regression with two standardized predictors: one with a fixed odds ratio of 1.4, the other with an odds ratio varied from 1 to 2.5.

What makes this scary is that even at the largest simulated sample size (190), the probability of failing to identify the true situation (both predictors significant) is very high. At lower sample sizes the situation is more or less hopeless. (Note that requiring *both* coefficients to be significant is a strict criterion: even if each test alone had 80% power, the joint probability would be only about 64% if the tests were roughly independent.)

Code:

```
# Logistic regression sample size simulation
# for two predictors

# Define beta0 (start with 0)
B0 <- 0

# Define the odds ratios to test for X1.
# A first trial ran from 1 to 5; OR = 1 gives the false-alarm (Type I error) rate.
# Going much higher makes little sense because the power is then almost always high.
OR <- seq(from = 1, to = 2.5, by = 0.1)

# Define the sample sizes
SSize <- seq(from = 50, to = 200, by = 20)

# Define a table to store the power estimates
T <- expand.grid(OR, SSize)
T$PP1 <- rep(0, nrow(T))
T$PP2 <- rep(0, nrow(T))
T$PP3 <- rep(0, nrow(T))
T$PP4 <- rep(0, nrow(T))
names(T) <- c("OddsRatio", "SampleSize", "Only.X1", "X1.and.X2", "Only.X2", "Neither")

# The X variables are standardized: normally distributed with mean 0 and sd 1

# Define the number of trials per setting (OR and sample size)
NTrials <- 200

# FOR each OR DO
for (or in OR) {
  # FOR each sample size DO
  for (ss in SSize) {
    # Holding vectors for the p-values
    pvals1 <- numeric(length = NTrials)
    pvals2 <- numeric(length = NTrials)
    # For NTrials times DO
    for (count in 1:NTrials) {
      # Generate ss values for each X
      XVals1 <- rnorm(ss)
      XVals2 <- rnorm(ss)
      # Calculate the probability of an event for the generated Xs
      # (X2 keeps a fixed odds ratio of 1.4)
      B1 <- log(or)
      Fact <- B0 + B1 * XVals1 + log(1.4) * XVals2
      # Inverse logit; note the original exp(-Fact)/(1 + exp(-Fact))
      # flips the coefficient signs (it equals 1 - plogis(Fact))
      YProb <- 1 / (1 + exp(-Fact))
      # Generate the Ys for the X values
      Y <- rbinom(length(YProb), 1, YProb)
      # Put everything in a data frame (not strictly necessary here)
      testdata <- data.frame(Y = Y, X1 = XVals1, X2 = XVals2)
      # Run a logistic regression Y ~ X1 + X2
      mod <- glm(Y ~ X1 + X2, data = testdata, family = binomial)
      # Save the p-values; indexing by name is safer than the linear
      # indices [11] and [12] into the coefficient matrix
      pvals1[count] <- coef(summary(mod))["X1", "Pr(>|z|)"]
      pvals2[count] <- coef(summary(mod))["X2", "Pr(>|z|)"]
    }
    # Proportion of trials falling into each significance pattern (alpha = 0.05)
    px1   <- sum(pvals1 <= 0.05 & pvals2 >  0.05) / NTrials
    px1x2 <- sum(pvals1 <= 0.05 & pvals2 <= 0.05) / NTrials
    px2   <- sum(pvals1 >  0.05 & pvals2 <= 0.05) / NTrials
    pnone <- sum(pvals1 >  0.05 & pvals2 >  0.05) / NTrials
    # Save them in the power table
    row <- T$OddsRatio == or & T$SampleSize == ss
    T[row, "Only.X1"]   <- px1
    T[row, "X1.and.X2"] <- px1x2
    T[row, "Only.X2"]   <- px2
    T[row, "Neither"]   <- pnone
  }
}
```
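For a quick sanity check without running the full grid, the same idea can be condensed to a single setting. The sketch below is my own shortened version (using `plogis` as the inverse logit); it estimates the probability that both predictors come out significant at n = 190 with odds ratios 2.0 and 1.4:

```r
# Condensed one-setting version of the simulation above:
# n = 190, OR(X1) = 2.0, OR(X2) = 1.4, alpha = 0.05
set.seed(1)
both_sig <- replicate(500, {
  x1 <- rnorm(190)
  x2 <- rnorm(190)
  y  <- rbinom(190, 1, plogis(log(2.0) * x1 + log(1.4) * x2))
  p  <- coef(summary(glm(y ~ x1 + x2, family = binomial)))[2:3, "Pr(>|z|)"]
  all(p <= 0.05)
})
joint_power <- mean(both_sig)
joint_power  # estimated probability that BOTH predictors are significant
```

Even with the larger odds ratio at 2.0, the joint power should be held down mainly by the weaker X2 effect, which would match the pattern in the full table.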

I have the following data and I want to show that the proportion of negative changes in group 1 is statistically significantly different from that in group 2. Along the same lines, I also want to compare the proportions of positive changes between group 1 and group 2.

The number of observations is 312 for group 1 and 972 for group 2, and the means are percentages within each group. The percentages are obviously quite different, but how do I show that they are statistically significantly different (ideally with p-values)?

Key indicator 1: Grade change

    Category     First group (obs / mean)   Second group (obs / mean)
    negative           35 / 11.22%                  2 /  0.20%
    no change         149 / 47.76%                316 / 32.51%
    positive          128 / 41.03%                654 / 67.28%

Is it possible to do a t-test here, or what would you recommend?
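Since each observation is a category rather than a continuous measurement, this is a comparison of proportions rather than of means. One common way to sketch it in R, using the counts from the table above (with only 2 negative cases in the second group, an exact test is safer for that comparison):

```r
n <- c(312, 972)  # group sizes

# Negative change: 35/312 vs 2/972 (only 2 events, so use an exact test)
neg <- fisher.test(matrix(c(35, 312 - 35, 2, 972 - 2), nrow = 2))

# Positive change: 128/312 vs 654/972 (two-sample proportion test)
pos <- prop.test(c(128, 654), n)

# Or test the whole 3x2 distribution at once
tab <- matrix(c(35, 149, 128, 2, 316, 654), nrow = 3,
              dimnames = list(c("negative", "no change", "positive"),
                              c("Group1", "Group2")))
overall <- chisq.test(tab)

c(negative = neg$p.value, positive = pos$p.value, overall = overall$p.value)
```

With differences this large (11.22% vs 0.20%, 41.03% vs 67.28%), all three p-values should come out far below 0.05.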