What does the t-statistic (paired t-test) tell us?


I'm Max and this is my first time posting here. First, let me just say how relieved and grateful I am to know that there is a help community for everything related to statistics, and research in particular.

This is the summary of my problem:
I am currently conducting semi-quantitative educational research; in particular, I'm examining whether linguistic modification of maths test questions significantly improves test takers' results, especially among ESL or EFL learners.

I administered two sets of questions: the first contains 9 unmodified questions, while the second contains modified, parallel versions of the 9 original questions. Both sets were administered in a single sitting to a total of 163 students, with the 9+9=18 questions mixed in random order. Each student received two scores: a pre-test score (X) and a post-test score (Y).
Thus I have two sets of data: overall pre-test scores and overall post-test scores.

A paired t-test was conducted using SPSS to examine if there is any difference between the two sets of overall scores. Here are the results:

(Pair) Mod.Score vs Ori.Score: mean difference = -0.24 points
Std dev = 1.590, std error mean = .125
95% confidence interval of the difference: Lower = -.485, Upper = .007
t = 1.921, df = 162, sig (2-tailed) = .056
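
As a sanity check, the reported t can be rebuilt from the summary figures alone, since a paired t-statistic is just the mean difference divided by its standard error. A minimal Python sketch, assuming only the rounded values quoted above (the function name `paired_t_from_summary` is made up for illustration):

```python
import math


def paired_t_from_summary(mean_diff, sd_diff, n):
    """Rebuild a paired t-statistic from summary figures.

    mean_diff: mean of the paired differences
    sd_diff:   standard deviation of the paired differences
    n:         number of pairs
    """
    se = sd_diff / math.sqrt(n)   # standard error of the mean difference
    t = mean_diff / se            # how many standard errors the mean is from 0
    df = n - 1                    # degrees of freedom for a paired test
    return se, t, df


# Rounded values taken from the SPSS output above.
se, t, df = paired_t_from_summary(mean_diff=-0.24, sd_diff=1.590, n=163)
print(f"SE = {se:.3f}, t = {t:.2f}, df = {df}")
```

With these rounded inputs the sketch gives a t of roughly -1.93, close in magnitude to the SPSS value of 1.921; the small gap comes from rounding of the mean difference, and the sign simply follows the sign of the mean difference, i.e. the order of subtraction.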

As such, it can be concluded that the difference between the two tests can be attributed to chance, and thus linguistic modification CANNOT and DOES NOT help students do any better.

My problem is: what does the t-statistic value mean? What does t=1.921 tell us in this situation? Does it mean anything more?

Furthermore, I was required to check for any difference (between post-test scores and pre-test scores) across differing English abilities, so I need to break down the participants of my study into 3 distinct groups based on their latest English score in a state-mandated test. The 3 groups are: GD (GOOD), AV (AVERAGE) and BA (BELOW AVERAGE). Within each group, I need to establish 2 more subgroups: GROUP + for those in that ability group who improved in the post-test, and GROUP 0 for those who showed zero or negative improvement. So now we have 6 subgroups: GD+, GD0, AV+, AV0, BA+ and BA0.

So let's say there are 22 students in English ability group GD. Within this group, 12 students improved (hence go into subgroup GD+) and 10 students displayed zero or negative improvement (hence go into subgroup GD0). On the surface we can say that linguistic modification does improve GD group students' maths performance, as more students improved (12 students) than showed zero or negative improvement (10 students).

But not so fast; I then have to run a paired t-test for each of the subgroups GD+ and GD0 to check whether the improvement for subgroup GD+ is actually significant, and whether the negative improvement displayed by subgroup GD0 is also significant (one must be careful not to simply assume so without a valid test). One would think that only one subgroup (either GD+ or GD0) would show significance.

Here are the results:
1. For subgroup GD+
Mod.Score -- Ori.Score: Mean difference = +1.5 points, Correlation = .908, Significance = .000
Paired differences: Std dev = .522, Std error mean = .151, 95% confidence upper = 1.832, 95% confidence lower = 1.168, t = 9.950, df = 11, significance = .000*

2. For subgroup GD0
Mod.Score -- Ori.Score: Mean difference = -1.3 points, Correlation = .886, Significance = .001
Paired differences: Std dev = .675, Std error mean = .213, 95% confidence upper = -0.817, 95% confidence lower = -1.783, t = -6.091, df = 9, significance = .000*

From the above, I can theoretically say that for English ability group GD, linguistic modification has significantly IMPROVED subgroup GD+ participants' performance (see number 1) and, at the same time, SIGNIFICANTLY WORSENED subgroup GD0 participants' performance (see number 2). The problem is: is it really possible to make two conflicting statements like these? If so, what other explanations or deductions can I make? I'd expect only one subgroup to show statistical significance (whether in terms of improvement or otherwise). What are your opinions on this anomaly (if it counts as one)?
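
One way to check whether this pattern is surprising is a small simulation. The sketch below is a hypothetical illustration, not the study data: it draws 163 whole-number score differences from a distribution centred on zero (so there is no true effect), splits them by sign exactly as described above, and computes a paired t-statistic in each subgroup. The seed, the spread of 1.5 and the helper name `paired_t` are assumptions chosen for the example.

```python
import math
import random
import statistics


def paired_t(diffs):
    """Paired t-statistic computed directly from a list of differences."""
    se = statistics.stdev(diffs) / math.sqrt(len(diffs))
    return statistics.mean(diffs) / se


# Hypothetical data: whole-number differences with NO true effect (centre 0).
random.seed(0)
diffs = [round(random.gauss(0, 1.5)) for _ in range(163)]

# Split on the outcome itself, as in the GD+ / GD0 scheme.
improved = [d for d in diffs if d > 0]        # "+" subgroup
not_improved = [d for d in diffs if d <= 0]   # "0" subgroup

t_plus = paired_t(improved)
t_zero = paired_t(not_improved)

print(f"'+' subgroup: n = {len(improved)}, t = {t_plus:.2f}")
print(f"'0' subgroup: n = {len(not_improved)}, t = {t_zero:.2f}")
```

Because each subgroup is defined by the sign of the very differences being tested, the "+" subgroup can only contain positive differences and the "0" subgroup only zero or negative ones. Large t values of opposite sign in the two subgroups are therefore what this kind of split tends to produce, even when there is no overall effect.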

Well I hope I didn't bog/bore you down with my two problems above. I'd really appreciate any pointers/opinions given. My paper is due soon and I really require some help to interpret my data better.

Thanks again for reading.


p/s: My research paper's draft is available to anyone that's interested. PM me for more info. Paper expected to be published in May 2009.
I have a question/comment based on your first question :) You have 163 observations, which is quite a large sample. We know that the t-distribution approaches the normal for large n. Testing your H0 using the standard normal shown below indicates you shouldn't reject H0 i.e. means are not significantly different at 95%. Any response?

My thinking is that your data suggests a borderline case and you risk a Type I error. If the consequences are drastic you may want to rethink the conclusion.

One-Sample Z

Test of mu = 0 vs not = 0
The assumed standard deviation = 1.59

  N    Mean    SE Mean   95% CI             Z       P
163   -0.240   0.125     (-0.484, 0.004)   -1.93   0.054
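
The figures in this output can be reproduced from the summary statistics with a few lines of Python. This is a sketch using only the standard library (`statistics.NormalDist`); the function name `one_sample_z` is made up here, and the standard deviation of 1.59 is treated as known, as in the output above.

```python
import math
from statistics import NormalDist


def one_sample_z(mean, sd, n, mu0=0.0, conf=0.95):
    """One-sample z-test from summary statistics (sd assumed known)."""
    std_normal = NormalDist()                      # mean 0, sd 1
    se = sd / math.sqrt(n)                         # standard error of the mean
    z = (mean - mu0) / se                          # test statistic
    p = 2 * std_normal.cdf(-abs(z))                # two-sided p-value
    z_crit = std_normal.inv_cdf(0.5 + conf / 2)    # about 1.96 for 95%
    ci = (mean - z_crit * se, mean + z_crit * se)  # confidence interval
    return z, p, ci


z, p, ci = one_sample_z(mean=-0.240, sd=1.59, n=163)
print(f"Z = {z:.2f}, P = {p:.3f}, 95% CI = ({ci[0]:.3f}, {ci[1]:.3f})")
```

The interval contains 0 and the p-value is just above 0.05, which is the borderline situation described above.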
Thanks for the reply :).

"Testing your H0 using the standard normal shown below indicates you shouldn't reject H0 i.e. means are not significantly different at 95%. Any response?"
That's what I have been postulating all along: a large significance value, plus the fact that zero lies within the confidence interval of the mean difference, tells us that there is no difference at all between the two sets of means.

To restate my possible hypotheses:
(null) hypothesis - there is no difference; thus linguistic modification does not cause any difference in participants' performance
(alternative) hypothesis - there is a difference; thus linguistic modification does cause a difference in participants' performance

Could you please elaborate on what you meant when you said:
"My thinking is that your data suggests a borderline case and you risk a Type I error. If the consequences are drastic you may want to rethink the conclusion."
I didn't think I was rejecting a (valid) null hypothesis (or if I sounded like I did, sorry).

It's been two years since I last took a statistics class, so please excuse my layman's (and erratic) use of statistical terms. Furthermore, I was advised by my supervisor that since my paper is not being submitted as a thesis (but as a final year project instead), I can't yet assign any hypotheses to my research questions. What are your opinions: should I or shouldn't I disagree with my supervisor?