Comparing and balancing two outcome-production methods

Hi, I've a question.

I have a soccer video game in which teams have different strengths (from 16 to 621). The result of a match between two teams can be obtained in two ways:
- checking the result directly (this doesn't just compare the two teams' strengths, since there is a random factor; but the result is always the same if I save before the match and reload that save again and again)
- watching the match (the result differs even if I reload the same save)

Now, I'd like to know:
1. whether there is a systematic difference in the number of goals scored when I check the result directly vs. when I watch the match. As far as I know, the difference could be quite complex, for example: "if the sum of the two teams' strengths is >500 and their difference is <50, then checking the result directly gives on average 1.25 fewer goals; if the sum is <400 and the difference is >100, then checking the result directly gives on average 0.75 more goals; and so on". Also, since the situation is more complex than the one I've described (each player has a given value in each skill, and the team strength I'm talking about is just the sum of all these values), there is some noise.
2. how I can "balance" the two methods.
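A minimal sketch of how item 1 could be explored, assuming each tested pair is recorded as a tuple (strength_a, strength_b, goals_direct, goals_watched), where goals are the total scored by both teams. The function name `mean_difference_by_stratum` and the bin edges are hypothetical, chosen to mirror the threshold-style rules in the example above: it groups pairs by the sum and the absolute difference of the strengths and reports, for each group, how many pairs it holds and the mean of goals_direct - goals_watched.

```python
from collections import defaultdict
from statistics import mean


def mean_difference_by_stratum(matches, sum_edges, diff_edges):
    """Group pairs by (sum of strengths, |difference| of strengths) and return
    {(sum_bin, diff_bin): (count, mean of goals_direct - goals_watched)}.

    matches: iterable of hypothetical tuples
             (strength_a, strength_b, goals_direct, goals_watched)
    sum_edges, diff_edges: sorted bin boundaries, e.g. [400, 500]
    """
    def bin_index(value, edges):
        # Counts how many edges are <= value, so edges [400, 500] give
        # bin 0: <400, bin 1: 400..499, bin 2: >=500
        return sum(value >= e for e in edges)

    groups = defaultdict(list)
    for a, b, direct, watched in matches:
        key = (bin_index(a + b, sum_edges), bin_index(abs(a - b), diff_edges))
        groups[key].append(direct - watched)
    return {key: (len(d), mean(d)) for key, d in sorted(groups.items())}
```

For example, `mean_difference_by_stratum(matches, [400, 500], [50, 100])` splits the data into the <400 / 400-499 / >=500 sum bands and the <50 / 50-99 / >=100 difference bands. With only a few dozen pairs, most groups will hold a handful of matches, so their means will be very noisy.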

What should I do?

---

Here's what I'm doing, as a layman. Please tell me where I'm wrong! (I can imagine almost infinite ways I'm erring)

1. I took 30 pairs of teams at random. For each pair, starting from the same save, I'm computing the result twice: once by checking the result directly and once by watching the match. I'm recording the following variables: team A's strength, team B's strength, the number of goals scored when checking the result directly, and the number of goals scored when watching the match. Once I have run the whole test, I'll check: the average difference in goals scored between the two methods; the correlation between the sum of the teams' strengths and that difference; and the correlation between the absolute difference of the teams' strengths and that difference.

2. If the average number of goals scored when watching matches is lower than the average number of goals scored when checking the results directly, and there is no meaningful correlation with the sum or difference of the teams' strengths [note], I would find the percentile (= p) of the average number of goals when checking the results directly within the observed distribution of goals when watching matches. Then I would load and watch each match 50/p times and take the instance with the most goals scored as the one closest to the number of goals I would get by checking the result directly.
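One way to reason about the percentile idea in step 2, assuming each reload-and-watch is an independent draw from the same distribution of goals and treating goals as if they were continuous: if a single watch falls at or below the q-th quantile with probability q, then the maximum of n watches does so with probability q^n. Setting q^n = 0.5 gives n = ln(0.5) / ln(q), the number of watches at which the best-of-n result has its median at quantile q. Here q is the percentile expressed as a fraction (p / 100), and both helper names are hypothetical.

```python
import math


def percentile_rank(target, observations):
    """Share of observations below target, counting ties as half (mid-rank).
    Returns a fraction in [0, 1]."""
    below = sum(x < target for x in observations)
    ties = sum(x == target for x in observations)
    return (below + 0.5 * ties) / len(observations)


def watches_for_median_max(q):
    """Solve q**n = 0.5 for n: the number of independent watches for which
    the median of the best-of-n result sits at quantile q.
    Assumes 0.5 <= q < 1 (q = 0.5 gives a single watch)."""
    if not 0.5 <= q < 1:
        raise ValueError("q must be in [0.5, 1)")
    return math.log(0.5) / math.log(q)
```

For instance, if the direct-check average sits at q = 0.8 within the watched results, `watches_for_median_max(0.8)` is about 3.1, so three watches would be the nearest whole number. Since goals are discrete, these quantiles are only approximate.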

Thank you.

[note] If there is such a correlation, I have no idea what to do.

Re: Comparing and balancing two outcome-production methods

In fact, I'd be happy to solve even the simpler form of the problem, i.e. assuming that neither the sum nor the absolute difference of the teams' strengths makes any difference. My question then becomes: assuming that the average number of goals scored when watching matches is lower than the average number of goals scored when checking the results directly, is it right (in order to get as close as possible to the number of goals I would see by checking the result directly) to watch the same match n times and then pick the instance in which the most goals were scored? If so, how should I compute n?
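A sketch of one way to choose n for this simplified question, assuming repeated watches are independent and identically distributed, and that the aim is to make the average of the best-of-n result match the direct-check average. Treating the recorded watched results as the distribution to sample from, the expected maximum of n draws has the closed form E[max] = Σ_k x_(k) · ((k/m)^n − ((k−1)/m)^n), where x_(1) ≤ … ≤ x_(m) are the m sorted observations. The function names and the `max_n` cap are hypothetical.

```python
def expected_max_of_n(observations, n):
    """Expected maximum of n draws taken with replacement from the empirical
    distribution of observations:
    sum over sorted x_(k) of x_(k) * ((k/m)**n - ((k-1)/m)**n)."""
    xs = sorted(observations)
    m = len(xs)
    return sum(x * ((k / m) ** n - ((k - 1) / m) ** n)
               for k, x in enumerate(xs, start=1))


def best_number_of_watches(watched_goals, direct_mean, max_n=20):
    """Smallest n in 1..max_n whose expected best-of-n goal count is closest
    to direct_mean (min() keeps the first n on ties)."""
    return min(range(1, max_n + 1),
               key=lambda n: abs(expected_max_of_n(watched_goals, n) - direct_mean))
```

Matching the average does not make the two methods equivalent: best-of-n results may still have a different spread from direct checks. Also, because all matches are pooled into one distribution, this only fits the simplified case where the teams' strengths make no difference.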