I am looking at two sequences of events (s1, s2) of different lengths (l1, l2). Each event in a sequence is one of three types (t1, t2, t3). I would like to test whether the distribution of the three types differs between the two sequences.

To see which types differ between s1 and s2, I think three separate χ² tests would be most appropriate, one per event type: the sequence (s1 vs. s2) is a binary categorical independent variable, and whether an event is of the given type (e.g. t1 vs. not t1) is the dependent variable.
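
To make the setup concrete, here is a minimal sketch of the three tests I have in mind. The counts are made up for illustration, `chi2_2x2` is my own helper name, and I use the plain Pearson statistic without continuity correction, which is an assumption rather than a considered choice.

```python
import math

# Made-up counts of t1, t2, t3 events (illustration only, not my real data).
counts = {"s1": [30, 50, 20], "s2": [90, 60, 50]}  # l1 = 100, l2 = 200

def chi2_2x2(a, b, c, d):
    """Pearson chi-square (no continuity correction) for the table [[a, b], [c, d]]."""
    n = a + b + c + d
    stat = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    p = math.erfc(math.sqrt(stat / 2))  # chi-square survival function for 1 df
    return stat, p

l1, l2 = sum(counts["s1"]), sum(counts["s2"])
results = {}
for i, name in enumerate(["t1", "t2", "t3"]):
    a, c = counts["s1"][i], counts["s2"][i]
    # Rows: sequence (s1, s2); columns: "this type" vs. "any other type".
    results[name] = chi2_2x2(a, l1 - a, c, l2 - c)
    print(name, results[name])
```

Each 2×2 table here holds raw counts, so its rows sum to l1 and l2 respectively.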

The problem: s1 and s2 are of different lengths. How do I control for this? I could calculate relative frequencies for the event types in both s1 and s2, e.g. #t1-events-in-s1 / #total-events-in-s1. But as far as I know, χ² tests must be run on raw counts, not on relative frequencies. Alternatively, if l1 < l2, I could stick with counts and linearly scale the s1 counts up to l2 by multiplying each event-type count for s1 by the factor l2/l1. Perhaps there is another solution? What is best practice in this case? Thanks for any thoughts on this problem.
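
For concreteness, this is what the two options I am weighing look like in code. The counts are again made up, and the variable names are only for illustration.

```python
# Made-up counts of t1, t2, t3 events (illustration only).
counts_s1 = [30, 50, 20]
counts_s2 = [90, 60, 50]
l1, l2 = sum(counts_s1), sum(counts_s2)

# Option 1: relative frequencies, e.g. #t1-events-in-s1 / #total-events-in-s1.
rel_s1 = [c / l1 for c in counts_s1]
rel_s2 = [c / l2 for c in counts_s2]

# Option 2: scale the s1 counts up to the length of s2 by the factor l2 / l1.
scale = l2 / l1
scaled_s1 = [c * scale for c in counts_s1]

print(rel_s1, rel_s2)
print(scaled_s1, counts_s2)
```

After scaling, the s1 counts sum to l2, but they no longer correspond to events that were actually observed.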