Analyzing winrates between populations

#1
Hello!

I am a graduate student, and I completed my (very limited) statistics requirements for my field as an undergrad. In other words, this request is not homework-related; it is a personal project.

Without going into too much extraneous detail, here is the data I am trying to analyze:

1. There are three factions, each having a different number of players.
2. The total number of players is 1081.
3. Over roughly 13 years, 43,299 games have been played, with the number of games varying from year to year. Some of those games were played between two players of the same faction, but I am not interested in those, and I am working on filtering them out of the total.
4. Most games have a winner and a loser. There are occasional draws, but I will not be looking at those: they make up well under 1% of the games and, I believe, would unnecessarily complicate my intended analysis.
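Points 3 and 4 amount to a filtering and counting pass over the game records. Here is a minimal Python sketch, assuming a hypothetical record format of `(year, faction_a, faction_b, winner)` per game, with `winner` set to `None` for draws; the function name and faction labels are illustrative, and the real data will likely need its own parser.

```python
from collections import Counter
from typing import Dict, FrozenSet, List, Optional, Tuple

# Hypothetical record layout: (year, faction_a, faction_b, winner),
# where winner is the winning faction's name, or None for a draw.
Game = Tuple[int, str, str, Optional[str]]


def tally_cross_faction_wins(games: List[Game]) -> Dict[FrozenSet[str], Counter]:
    """Count wins per faction for every cross-faction matchup.

    Same-faction games and draws are skipped.
    """
    tallies: Dict[FrozenSet[str], Counter] = {}
    for _year, a, b, winner in games:
        if a == b:           # same-faction game: not of interest
            continue
        if winner is None:   # draw: excluded from the analysis
            continue
        # frozenset makes "F1 vs. F2" and "F2 vs. F1" the same matchup key
        matchup = frozenset((a, b))
        tallies.setdefault(matchup, Counter())[winner] += 1
    return tallies


games = [
    (2010, "F1", "F2", "F1"),
    (2010, "F1", "F1", "F1"),   # same faction: skipped
    (2011, "F2", "F3", None),   # draw: skipped
    (2011, "F3", "F2", "F3"),
    (2012, "F2", "F1", "F2"),
]
print(tally_cross_faction_wins(games))
```

Keying on an unordered pair means each of the three matchups ends up as exactly one table of win counts, regardless of which faction is listed first in a record.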

Here are the questions I am interested in exploring:

1. Is one faction favored over another, and is that relationship statistically significant?
2. How has the "balance" changed over the years?

There are other relationships I would like to explore later, but I would like to keep things relatively simple at first.

As for how to approach these questions, I have considered the following:

1. For the first question, I was thinking of making three separate tables, one for each matchup (Faction 1 vs. Faction 2, Faction 2 vs. Faction 3, Faction 1 vs. Faction 3), with wins as the measurement. I chose this because my data are "wins" and "losses," which are categorical rather than interval-ratio. For example, I would take the total number of games played between Faction 1 and Faction 2 and record how many times each faction won (since one faction's losses are the other's wins). Under a null hypothesis of perfect 50% balance, the expected value for each cell should be total games/2. Then I would use a chi-squared test to check whether any deviations are significant.
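The chi-squared test described in step 1, for a single matchup with two cells (each faction's wins) and an expected count of total games/2 per cell, can be sketched in Python as follows. The function name and the example counts are hypothetical, not taken from the real data. With two cells there is 1 degree of freedom, and a chi-squared variable with 1 degree of freedom is the square of a standard normal, so the p-value can be computed with the standard library's `math.erfc`.

```python
import math
from typing import Tuple


def chi_squared_balance(wins_a: int, wins_b: int) -> Tuple[float, float]:
    """Chi-squared goodness-of-fit test of a 50/50 split for one matchup.

    Returns (chi-squared statistic, p-value) with 1 degree of freedom.
    """
    n = wins_a + wins_b
    if n == 0:
        raise ValueError("no decisive games in this matchup")
    expected = n / 2  # null hypothesis: perfect balance
    chi2 = ((wins_a - expected) ** 2 + (wins_b - expected) ** 2) / expected
    # For 1 df, P(X > chi2) = P(|Z| > sqrt(chi2)) = erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


print(chi_squared_balance(60, 40))
```

For the illustrative counts 60 vs. 40, this gives a statistic of 4.0 and a p-value of about 0.0455. With only two cells, the test is equivalent to a two-sided z-test of whether one faction's win proportion equals 0.5.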

2. For the second question, I was thinking of using three factorial ANOVAs (one per matchup), with two rows representing each faction's wins in that matchup and the columns representing years. I would then use Tukey's HSD for post hoc comparisons if necessary. The issue is that a different number of games was played each year, so I am considering using win % in each cell instead. That would let me use a one-way ANOVA instead of a factorial ANOVA, because one faction's win % describes the success of both factions simultaneously.
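Because each year has a different number of games, each year's win % is estimated with different precision. A minimal Python sketch of the per-year win % for one matchup, assuming a hypothetical input of `(year, faction_a_won)` pairs, also reports the usual standard error of a proportion, sqrt(p(1 - p) / n), which makes it visible how much noisier the years with few games are.

```python
import math
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def yearly_win_rates(
    results: Iterable[Tuple[int, bool]],
) -> Dict[int, Tuple[int, float, float]]:
    """Per-year (games, win proportion, standard error) for one matchup.

    `results` holds hypothetical (year, faction_a_won) pairs, with draws and
    same-faction games already removed.
    """
    wins: Dict[int, int] = defaultdict(int)
    games: Dict[int, int] = defaultdict(int)
    for year, a_won in results:
        games[year] += 1
        if a_won:
            wins[year] += 1

    table = {}
    for year in sorted(games):
        n = games[year]
        p = wins[year] / n
        se = math.sqrt(p * (1 - p) / n)  # standard error of a proportion
        table[year] = (n, p, se)
    return table


sample = [(2010, True), (2010, False), (2010, True), (2010, True),
          (2011, False), (2011, True)]
for year, (n, p, se) in yearly_win_rates(sample).items():
    print(year, n, round(p, 3), round(se, 3))
```

A single win % per year hides how many games it rests on, so keeping `n` and the standard error next to each value helps when comparing years.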

Onto my questions for you all:

1. Am I making any grave errors in approaching this data?
2. Any thoughts or suggestions?
3. How would you approach analyzing this data?
4. Also, I remember hearing about tests that can be run after the analysis to check the validity (or was it relevance?) of the test run on the data. I think it was called a beta test? Would that be necessary or applicable here?

Thank you for your time! Looking forward to your responses.
 
#2
If I did something wrong in presenting my question, or if I posted in the wrong place, could someone let me know? I'm eager to start working on this data!