## Seemingly simple problem of t-tests

Hi, the problem I have generalizes easily, but I will use two students as an example.

We observe two students, S1 and S2.
One of them is the smarter kid, and we are trying to figure out which.
In particular, what is the probability that S1 is smarter than S2?

Here is the environment.

S1 and S2 are in some school which offers many tests per year.
There are 100 tests per year.

The grades that S1 and S2 get on the tests are normally distributed
with a known variance. In other words, the variance of the test scores is the same constant for both kids, for all tests, in both years.

The means, however, are different.

The true means of their grades change across years, but the smarter kid will
have a higher true mean in both years. So grades are not comparable across years, just within years.

More concretely:
S1 has true means m11 for all the tests he takes in the first year,
and m12 for all the tests he takes in the second.
S2 has true means m21 for all the tests he takes in the first year,
and m22 for all the tests he takes in the second.

If S2 were the smarter kid, then

m21 > m11
m22 > m12

We know nothing about the size of the difference, and it may change from one year to the next. Any of these could hold:

m21 - m11 (<, =, or >) m22 - m12

To put some notation on the sample sizes:
S1 takes n11 tests in the first year and n12 in the second year.
S2 takes n21 tests in the first year and n22 in the second year.
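
To make the setup concrete, here is a minimal simulation sketch. All the numbers in it (the common standard deviation, the four true means, and the four sample sizes) are made up purely for illustration; only the structure follows the description above, with S2 as the smarter kid in both years and a gap that differs between years.

```python
import random

def simulate_scores(mean, sigma, n, rng):
    """Draw n test scores from a normal distribution with the given mean and sd."""
    return [rng.gauss(mean, sigma) for _ in range(n)]

rng = random.Random(0)   # fixed seed so the sketch is reproducible
SIGMA = 10.0             # known, common standard deviation (assumed value)

# Hypothetical true means, keyed by (student, year): m11, m12, m21, m22.
# S2 is higher in both years, but the gap changes from year 1 to year 2.
true_means = {("S1", 1): 70.0, ("S1", 2): 55.0,
              ("S2", 1): 73.0, ("S2", 2): 56.0}

# Hypothetical numbers of tests taken: n11, n12, n21, n22.
n_tests = {("S1", 1): 40, ("S1", 2): 25,
           ("S2", 1): 30, ("S2", 2): 60}

scores = {key: simulate_scores(true_means[key], SIGMA, n_tests[key], rng)
          for key in true_means}
```

Scores are only comparable within a year, so `scores[("S1", 1)]` should be compared with `scores[("S2", 1)]` and never with anything from year 2.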

For each year, I can construct the simple Z statistic that we learn in the
first week of undergraduate statistics for hypothesis testing when
samples are normal and the variance is known.

The standard normal CDF evaluated at this statistic gives what I take to be the probability that S1 is a smarter student than S2 (or the reverse, depending on which way I take the difference).
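
Here is a rough sketch of the per-year calculation I have in mind, using only the Python standard library. The function names are mine. Reading the standard normal CDF at Z as "the probability that the first student's true mean is higher" is an assumption on my part: it holds, for example, as a posterior probability under a flat prior on the difference in means, and otherwise it is one minus the one-sided p-value.

```python
from math import sqrt
from statistics import NormalDist, fmean

def z_statistic(scores_a, scores_b, sigma):
    """Two-sample Z statistic for mean(a) - mean(b) with known common sd sigma."""
    diff = fmean(scores_a) - fmean(scores_b)
    # Standard error of the difference of two independent sample means.
    se = sigma * sqrt(1.0 / len(scores_a) + 1.0 / len(scores_b))
    return diff / se

def prob_a_higher(scores_a, scores_b, sigma):
    """Standard normal CDF at Z (interpretation as a probability is assumed)."""
    return NormalDist().cdf(z_statistic(scores_a, scores_b, sigma))

# Tiny made-up example: two tests each, known sd of 10.
z = z_statistic([80.0, 90.0], [70.0, 80.0], 10.0)
p = prob_a_higher([80.0, 90.0], [70.0, 80.0], 10.0)
```

Running this once on the year-1 scores and once on the year-2 scores gives the two per-year numbers I am trying to combine.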

The same can be done for the second year.

This gives me two probabilities, one for each year.

How do I combine the data from both years to get the overall
probability that S1 is smarter than S2? The true means are different.
Sample sizes are also different. I've tried going back to first principles, but I seem to be missing something. Is there no consensus on how to actually do this?

Thanks very much
Tzuo