Ordinal Data

I have the following hypothesis:

Novice drivers perceive road environments as more difficult when there are more pedestrians.
I have the following data:

- The category each driver falls into (1. Novice, 2. Experienced, 3. Advanced)
- Number of pedestrians identified in 5 separate sections for 26 drivers
- Ranked difficulty for each section (obtained from a questionnaire from each driver)


Section; No. Pedestrians; Ranked Pedestrians; Ranked Difficulty; Driver
1; 2; 1; 3; 1
2; 15; 3; 2; 1
3; 3; 2; 1; 1
5 and so on..
1; 5; 1; 1; 2
2; 12; 3; 2; 2
3; 7; 2; 3; 2
and so on...

I have ranked the number of pedestrians and compared it to the ranked difficulty in SPSS using Spearman's rank correlation. I did this separately for each driver group. The correlation coefficients are:

Novice 0.57
Experienced 0.56
Advanced 0.77
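
For anyone who wants to reproduce the per-group calculation outside SPSS, here is a minimal Python sketch of Spearman's rank correlation computed separately for each driver group. It uses only the six sample rows listed in this post, so the numbers it prints are purely illustrative and are not the coefficients above. The mapping of driver 1 to "Novice" and driver 2 to "Experienced" is an assumption made for the example (the post does not say which group each driver belongs to), and the function names are hypothetical.

```python
from collections import defaultdict


def average_ranks(values):
    """Rank values from 1..n, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def spearman(x, y):
    """Spearman's rho: the Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# (section, number of pedestrians, ranked difficulty, driver): sample rows from the post
rows = [
    (1, 2, 3, 1), (2, 15, 2, 1), (3, 3, 1, 1),
    (1, 5, 1, 2), (2, 12, 2, 2), (3, 7, 3, 2),
]

# ASSUMPTION: this driver-to-group mapping is made up for illustration.
group_of = {1: "Novice", 2: "Experienced"}

by_group = defaultdict(lambda: ([], []))
for section, peds, difficulty, driver in rows:
    xs, ys = by_group[group_of[driver]]
    xs.append(peds)
    ys.append(difficulty)

rho_by_group = {g: spearman(xs, ys) for g, (xs, ys) in by_group.items()}
for group, rho in rho_by_group.items():
    print(group, round(rho, 2))
```

Note that pooling all rows within a group, as this sketch does, treats repeated ratings from the same driver as if they were independent observations.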

Is this the right test or is there a better test to use to analyse the data? Can anyone point me in the right direction? Ideally I would like a statistical test that compares the three driver groups.

My advanced driver data set is also much smaller than the others. Will this cause problems?

Thanks in advance.
My hypothesis is:

Novice drivers perceive road environments as more difficult when there are more pedestrians.

Sample sizes are:

Advanced - 4
Experienced - 12
Novice - 9

Advanced and experienced samples can be combined if necessary. (experienced 16, novice 9)



Probably A Mammal
So essentially you want to see whether there is a difference between what the advanced/experienced drivers (i.e., the control group) perceive as difficult and what novices perceive as difficult, correct? What you want to compare, I would think, is how a driver of one class ranks the difficulty at a given exposure to pedestrians (the treatment) against how a driver of the other class ranks it at that same treatment. I'm far from experienced with design of experiments (ANOVA), but I wonder what different levels of pedestrian exposure you have. To control for how many pedestrians a novice driver is exposed to compared with an experienced one, you need to look at their ratings when they are exposed to the same number of pedestrians. So you would want, ideally, something like

Bob, Novice, 2 pedestrians, 2
Joe, Experienced, 2 pedestrians, 1
Frank, Novice, 10 pedestrians, 5
Chris, Experienced, 10 pedestrians, 3

If you're comparing, say, two people each exposed to different pedestrian numbers, that variation can influence the ratings, obviously. You need to compare apples to apples. You didn't fully detail your data, so I don't know if your collection follows this mode. If it doesn't, the only thing I can think of is to categorize the pedestrians like 1-3 is 'low' and 4-6 is 'medium', etc. Then your data will be something like

ID, Group, Treatment, Rating
1, Novice, low, #
2, Experienced, low, #

The previous example I gave is exactly the same, except we're using exact numeric values for the treatment. The above instead categorizes them, which may be necessary depending on your collection. But as I said, a more experienced statistician may need to weigh in on this in case I erred anywhere.
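
A minimal Python sketch of that categorization, in case it helps. Only the 'low' (1-3) and 'medium' (4-6) bands come from the suggestion above; the 'none' label for zero pedestrians and the 'high' label for 7 or more are assumptions added so the function covers every count, and the function name is hypothetical. The records reuse the names and counts from the first example, without the ratings.

```python
def pedestrian_level(count):
    """Map a raw pedestrian count to a coarse treatment label."""
    if count < 0:
        raise ValueError("pedestrian count cannot be negative")
    if count == 0:
        return "none"    # ASSUMPTION: band not specified in the thread
    if count <= 3:
        return "low"     # 1-3, as suggested in the post
    if count <= 6:
        return "medium"  # 4-6, as suggested in the post
    return "high"        # ASSUMPTION: everything above 6


# (name, group, number of pedestrians) from the example rows above
records = [
    ("Bob", "Novice", 2),
    ("Joe", "Experienced", 2),
    ("Frank", "Novice", 10),
    ("Chris", "Experienced", 10),
]

labelled = [(name, group, pedestrian_level(count)) for name, group, count in records]
for row in labelled:
    print(row)
```

With real data the cut points would need choosing with the actual distribution of counts in mind, since binning throws away information within each band.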
Apologies for not giving a clear description. Let me try again...

- Each driver experiences different sections of a road and rates them 1-5 (easiest to hardest), so the difficulty ratings are evenly distributed.
- I have data of the number of pedestrians in each section for each driver - distribution shown in
- Some sections (quiet roads) have very few or no pedestrians; other sections have a very high number of pedestrians (100+)
- The number of pedestrians is very rarely the same for each driver/section

I need to find out whether:
- Both categories of drivers rank sections as more difficult when there are more pedestrians.
- Novice drivers' difficulty rankings rise with the number of pedestrians more than experienced drivers' rankings do.

I have ranked the number of pedestrians for each section for each driver (see image VRU=pedestrians), to make it ordinal (I'm not sure whether this needs to be done).

So for my first test, I need to compare:

Ranked difficulty vs Number of pedestrians/ranked pedestrians

So for my second test, I need to compare:

Ranked difficulty vs No. pedestrians vs. driver category

That is, whether novices class sections as more difficult when there are more pedestrians, compared with experienced drivers.

I hope I've explained it well enough.

Many Thanks.


Cookie Scientist
You say that each driver goes through 5 road sections. Does every driver get the same 5 road sections, or are they completely different for each driver, or somewhere in between (i.e., some road sections may be repeated across drivers but others are not)?


Cookie Scientist
Okay. In that case, what you really want to do is fit a mixed-effects model with crossed random effects for drivers and road sections. (5 is a pretty small sample of road sections for a random factor, but it's better than ignoring that source of dependence.) If you are not familiar with mixed models, some introductory resources can be found here. Make sure that your 5 road sections are labeled consistently across people (that is, always 1 through 5, or "a" through "e," or whatever) so that they are explicitly crossed with drivers. And then if you use the lme4 package in R, the syntax for your model might look something like:
library(lme4)
model <- lmer(difficulty ~ numPeds*category + (1|driver) + (1|roadSection), data = drivingData)  # drivingData: hypothetical name for your long-format data frame
Since you only have 118 observations you probably don't want to make the random effects structure more complicated than that.

I have ranked the number of pedestrians for each section for each driver (see image VRU=pedestrians), to make it ordinal (I'm not sure whether this needs to be done).
It does not need to be done--you should definitely keep them unranked. However, you might consider applying a log or square-root transformation to this variable. This is just because, psychologically speaking, it seems likely that the difference between seeing 2 vs. 10 pedestrians on the road is greater than the difference between seeing 32 vs. 40 pedestrians on the road. So one of these transformations should compress the higher values and thereby more accurately reflect this nonlinear psychological relationship. Since you have some 0's, the square root would be preferred (the log of 0 is undefined). But this is just a suggestion, not a statistical necessity.
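
To make the compression concrete, here is a small Python sketch using the same pairs of counts (2 vs. 10 and 32 vs. 40). Both pairs are 8 pedestrians apart on the raw scale, but after a square-root transformation the gap at the low end is larger than the gap at the high end. It also checks that the square root handles a count of 0 while the plain log does not. The function name is hypothetical.

```python
import math


def sqrt_gap(a, b):
    """Difference between two counts after a square-root transformation."""
    return math.sqrt(b) - math.sqrt(a)


low_gap = sqrt_gap(2, 10)    # raw difference of 8 at the low end
high_gap = sqrt_gap(32, 40)  # raw difference of 8 at the high end
print(round(low_gap, 2), round(high_gap, 2))  # 1.75 0.67

# A section with no pedestrians is fine under the square root...
zero_sqrt = math.sqrt(0)

# ...but the natural log of 0 is undefined; Python raises ValueError.
try:
    math.log(0)
    log_zero_defined = True
except ValueError:
    log_zero_defined = False
print(zero_sqrt, log_zero_defined)  # 0.0 False
```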
Many thanks for your reply. I've had a look at some tutorials on mixed-effects models but can't really get my head around them; I only have a very basic knowledge of statistics. Are there any simpler tests that could be used, or is the test I used suitable?