It's definitely true that you shouldn't just put everything in one big table and run your analyses.
Your problem is in fact very common in (cognitive) psychology, although there the DV is usually a continuous variable such as reaction time. Some details may differ because you are measuring something else, but this is how it's usually done with, say, reaction times:
You do two analyses: F1 and F2 (assuming you are running F-tests, of course; otherwise they would go by different names). F1 is a test on the subject means, i.e. the data averaged over items for each subject. F2 is a test on the item means, i.e. the data averaged over subjects for each item. The best case is for both of them to be significant. People used to combine them into a single statistic called min F', but I don't think that's commonly done anymore. If one is significant but not the other, you're in a bit of a bind. If it's the F1 that is significant, you would probably still report that there seems to be some kind of effect, with the caveat that it may not generalize across items.
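To make that concrete, here's a minimal sketch in Python using only the standard library. The toy data, the names (`unit_means`, `paired_f`), and the design are all made up for illustration: I'm assuming two conditions, with every subject seeing every item in both conditions, so each F reduces to a squared paired t on the subject (or item) means. If your items appear in only one condition, the F2 would be a between-items test instead. The min F' line uses the usual formula F1·F2 / (F1 + F2).

```python
from collections import defaultdict
from statistics import mean, variance

# Hypothetical toy data: (subject, item, condition, reaction time in ms).
# Assumption: fully crossed design, every subject sees every item in A and B.
subject_offset = {"s1": 0, "s2": 30, "s3": -20, "s4": 10}
item_offset = {"i1": 0, "i2": 40, "i3": -15}
subject_effect = {"s1": 40, "s2": 55, "s3": 35, "s4": 50}  # B minus A, per subject
item_effect = {"i1": -10, "i2": 0, "i3": 10}               # extra B minus A, per item

trials = []
for s in subject_offset:
    for i in item_offset:
        base = 500 + subject_offset[s] + item_offset[i]
        trials.append((s, i, "A", base))
        trials.append((s, i, "B", base + subject_effect[s] + item_effect[i]))


def unit_means(trials, unit_index):
    """Mean RT per (unit, condition); unit_index 0 = subject, 1 = item."""
    cells = defaultdict(list)
    for trial in trials:
        cells[(trial[unit_index], trial[2])].append(trial[3])
    return {key: mean(values) for key, values in cells.items()}


def paired_f(means, cond_a="A", cond_b="B"):
    """F(1, n - 1) for a two-level repeated factor (the squared paired t)."""
    units = sorted({unit for unit, _ in means})
    diffs = [means[(u, cond_b)] - means[(u, cond_a)] for u in units]
    n = len(diffs)
    f_value = mean(diffs) ** 2 / (variance(diffs) / n)
    return f_value, n - 1


# F1: average over items first, then test across subjects.
f1, df1 = paired_f(unit_means(trials, 0))
# F2: average over subjects first, then test across items.
f2, df2 = paired_f(unit_means(trials, 1))
# min F' combines the two into one conservative statistic.
min_f_prime = f1 * f2 / (f1 + f2)

print(f"F1(1, {df1}) = {f1:.2f}")
print(f"F2(1, {df2}) = {f2:.2f}")
print(f"min F' = {min_f_prime:.2f}")
```

Note that min F' is always smaller than both F1 and F2, which is why it counts as the conservative option. For p-values you'd look the F values up against the F distribution with the degrees of freedom shown, or just let your stats package do the whole thing.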
Now, this is probably not the best way to do this kind of analysis. A better approach would probably be multilevel modelling, as that's what I've seen the stats whizzes use. However, it's not very well known in psychology circles, and reviewers might not accept it.