I have counts of different vehicle types (cars, two-wheelers (2Ws), three-wheelers (3Ws), and others) and, for each vehicle, which type of vehicle it is following (travelling directly behind), across 5 lanes in one direction.

For example,

In this table, 3Ws are following 120 2Ws and 90 cars. The other table shows the total number of vehicles of each type in the data. If vehicles followed each other at random, then for any vehicle 43% of its leaders should be cars and only 17% should be 3Ws. But the first table shows that 3Ws follow other 3Ws much more often than expected.
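A minimal sketch of this pooled comparison, for the 3W follower row only. The 120, 90, 43% and 17% figures come from the description above; every other number (the 2W and "other" shares, the 3W and "other" leader counts) is a made-up placeholder, since the tables are not reproduced here.

```python
# Pooled (lane-ignoring) goodness-of-fit check for vehicles that follow as 3Ws.
# Shares of each vehicle type in the whole data set.
# car and 3W shares are from the post; 2W and other are HYPOTHETICAL.
fleet_share = {"car": 0.43, "2W": 0.30, "3W": 0.17, "other": 0.10}

# Leaders observed in front of 3Ws.
# 2W and car counts are from the post; 3W and other are HYPOTHETICAL.
observed_leaders = {"car": 90, "2W": 120, "3W": 100, "other": 20}

n_followers = sum(observed_leaders.values())

# Expected leader counts if 3Ws picked their leader at random from the fleet.
expected_leaders = {
    vtype: n_followers * share for vtype, share in fleet_share.items()
}

# Pearson chi-square statistic: sum of (observed - expected)^2 / expected.
chi2_stat = sum(
    (observed_leaders[v] - expected_leaders[v]) ** 2 / expected_leaders[v]
    for v in fleet_share
)
degrees_of_freedom = len(fleet_share) - 1

# 7.815 is the 5% critical value of the chi-square distribution with 3 df.
CRITICAL_5PCT_DF3 = 7.815
significant = chi2_stat > CRITICAL_5PCT_DF3

print(f"chi-square = {chi2_stat:.2f}, df = {degrees_of_freedom}, "
      f"significant at 5%: {significant}")
```

This is only the naive version: it uses one set of fleet-wide shares for every lane, which is exactly the assumption questioned below.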

I want to check whether this is statistically significant, using chi-square or any other test. But the pooled analysis is not valid, because the composition of vehicle classes differs by lane. For example, most cars travel in the first 2 lanes, while most 3Ws travel in the 3rd and 4th lanes. So we cannot expect 3Ws to follow cars in proportion to the overall share of cars, and I clearly have to account for lanes. The problem then becomes multi-dimensional, and I have no idea how to move forward. I tried several things; the one that gave the most reasonable result was:

From this I can apply a chi-squared test, but I know the extrapolation step is wrong. The other way would be to treat each lane as a separate road, but then I won't be able to do the chi-square test as needed, right?
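To make the "each lane as a separate road" idea concrete, here is a rough sketch under stated assumptions. It runs a chi-square test of independence on a follower-by-leader table for each lane and then adds the per-lane statistics and degrees of freedom, which is one standard way to combine independent chi-square tests. The lane names and all counts are hypothetical, and the sketch assumes the lanes can be treated as independent samples and that no row or column total is zero.

```python
def chi2_independence(table):
    """Pearson chi-square statistic and df for one follower-by-leader table.

    Rows are follower types, columns are leader types.
    Assumes every row total and column total is non-zero.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count if follower type and leader type were independent
            # within this lane.
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    df = (len(row_totals) - 1) * (len(col_totals) - 1)
    return stat, df


# HYPOTHETICAL counts; rows and columns are ordered car, 2W, 3W.
lane_tables = {
    "lane 1": [[200, 30, 10], [35, 40, 5], [10, 5, 15]],
    "lane 3": [[20, 15, 10], [15, 60, 20], [10, 20, 80]],
}

per_lane = {lane: chi2_independence(t) for lane, t in lane_tables.items()}

# Under the null hypothesis in every lane, the sum of independent chi-square
# statistics is itself chi-square distributed with the summed df.
total_stat = sum(stat for stat, _ in per_lane.values())
total_df = sum(df for _, df in per_lane.values())

for lane, (stat, df) in per_lane.items():
    print(f"{lane}: chi-square = {stat:.2f}, df = {df}")
print(f"combined: chi-square = {total_stat:.2f}, df = {total_df}")
```

A p-value for the combined statistic would come from the chi-square distribution with `total_df` degrees of freedom, for example via `scipy.stats.chi2.sf(total_stat, total_df)` if SciPy is available. Because each lane's expected counts come from that lane's own totals, differences in lane composition no longer inflate the result.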

If none of this makes sense, is there a way to predict which vehicle type will be behind a given vehicle type in a given lane? That would probably be a better research outcome. But the current analysis is needed to demonstrate "herding behaviour" of 2Ws and 3Ws.
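As a starting point for the prediction idea, the simplest estimate I can think of is the empirical conditional frequency of each follower type given the lane and the leader type. The sketch below uses hypothetical counts and a hypothetical helper name (`follower_probabilities`), and it assumes each (lane, leader, follower) combination appears only once in the input.

```python
from collections import defaultdict

# HYPOTHETICAL observations: (lane, leader type, follower type, count).
observations = [
    (3, "3W", "3W", 80),
    (3, "3W", "2W", 20),
    (3, "3W", "car", 10),
    (3, "car", "3W", 10),
    (3, "car", "car", 20),
]


def follower_probabilities(rows):
    """Estimate P(follower type | lane, leader type) from raw counts."""
    totals = defaultdict(int)
    for lane, leader, _follower, count in rows:
        totals[(lane, leader)] += count
    return {
        (lane, leader, follower): count / totals[(lane, leader)]
        for lane, leader, follower, count in rows
    }


probs = follower_probabilities(observations)

for (lane, leader, follower), p in sorted(probs.items()):
    print(f"lane {lane}, leader {leader}: P(follower = {follower}) = {p:.2f}")
```

These frequencies are purely descriptive. To show herding they would still need to be compared with each lane's own composition.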

(I don't want to consider the possibility that 3Ws are moving in the 3rd and 4th lanes in order to avoid following cars or to follow each other. I guess that is actually because of the different speeds in each lane. I'll try to show that as well, but later.)

P.S.: Please ignore cell counts below 5 for the chi-squared test. I have data for multiple flow levels (free-flowing, congested traffic, etc.) and multiple locations. I will not use cells with fewer than 5 observations.
