## Help with a basic question about the role of weights in SPSS

I'm helping someone out with kind of a cruddy dataset, and I was hoping I could ask a question about how weights work in SPSS and whether I'm thinking about this correctly.

I have a file with many observations, coded on many variables, based on interview transcripts. I changed the file so each row represents one subject and each column is one variable. The values in these columns are the number of times the subject was observed exhibiting that behavior.

The problem with the design is subjects were rated with a widely varying number of observations, some as few as 9 and others up to say 40.

Looking at relationships between these column variables would be messy, because the people with more observations have a much higher total number of observed behaviors.

So, I was thinking I could create a weight to correct for this. I tried dividing 1 by the number of observations and multiplying by 100 to move the decimal. I get a number that gets smaller as the number of observations goes up.
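To make the weight concrete, here is a minimal Python sketch of it, assuming it means (1 / number of observations) x 100. The function name and the sample observation counts are made up for illustration and are not from the dataset.

```python
def inverse_observation_weight(n_observations: int) -> float:
    """Return (1 / n_observations) * 100, the weight described in the post."""
    if n_observations <= 0:
        raise ValueError("n_observations must be positive")
    return 100.0 / n_observations


# Hypothetical observation counts spanning the 9-to-40 range mentioned above.
weights = {n: inverse_observation_weight(n) for n in (9, 20, 40)}

# Multiplying a raw count by the weight gives a count per 100 observations.
# Here: a hypothetical subject with 18 tallies across 9 observations.
tallies_per_100 = 18 * inverse_observation_weight(9)
```

Used this way, the weight is a rescaling of each subject's raw count into a rate per 100 observations, which is a different thing from a case weight that changes how much each row counts in an analysis.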

Does this in fact help correct for this awful design or should I consider a different tree to bark up?

If anyone has read this far, here's an example of the issue--say we are counting the number of times a subject blinks (var 1) and scratches their nose (var 2) in a minute. Problem is we made 10 of these 1 minute observations for some subjects and 40 for others and all numbers in between. We have to correct for this messy design before we look at the relationship between var 1 and 2.

Before you decide to weight the responses, I'm not sure I'm clear on your dataset. You say you've set it up so that each row is a subject and each column is a variable.

It's not clear to me whether each column is a combined total for all observations for that subject (total number of blinks). If so, then is the 9-40 range the number of sessions in which the observations were collected? So some subjects had their blinks counted over 9 sessions and some came back for a total of 40 sessions? If so, then you could weight the responses, but not in the method you described.

Or are you saying each subject was interviewed once and observed by multiple observers who wrote down different answers (someone counted 40 blinks during the interview and someone else only counted 9 blinks during the same interview)? If this is the case, then weighting is definitely NOT the way to go.

Give us some more info as to what the dataset includes and I can give you a better idea on a way to weight it.

Originally Posted by Berley
It's not clear to me whether each column is a combined total for all observations for that subject (total number of blinks). If so, then is the 9-40 range the number of sessions in which the observations were collected?
Yes, exactly. The file originally was in a messy state and I figured this would be a way to tame it. Each row represents one person. Each column represents a counted variable for the number of times the person did this or that. Problem is some people were observed a lot and some were not observed so much, so those who were observed a lot have more tallies. If I try to compare the subjects on this basis, the ones who had lots more observations will appear to have larger "scores" on these variables.

The file is actually stuff that people said and I believe they coded a sentence at a time, so some people said lots of sentences and some said fewer.

Originally Posted by Berley
So some subjects had their blinks counted over 9 sessions and some came back for a total of 40 sessions? If so, then you could weight the responses, but not in the method you described.
Yes, that's exactly correct. I want to correct for the fact that the subjects with a higher total number of observations would naturally tend to get a higher count for the observed behaviors.

Sorry to sound idiotic, but this is one of those projects that fell in my lap after all the data was collected and input. I was looking for a way to differentiate among categorical data in this big 25x25 contingency table, and I'm not coming up with a lot of good ideas. They used a boatload of codes, and many were only used a couple of times. Aaaggh.

Sorry I didn't see your response until now. Hope this is still helpful.

So some of the participants were observed 9 times and some 40 times. You want to weight it so that the participants that were observed 40 times don't skew your results, right? OK.

Now that I re-read your original post, I think you're probably weighting it correctly (but I would word it a bit differently). I would weight your samples by adding together the total number of observation sessions. That's your denominator. Each participant's result is the numerator.
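One way to read the numerator/denominator idea is as a per-session rate for each participant: that participant's tally divided by that participant's number of sessions. The sketch below is only an illustration under that reading. The three subjects, their counts, and the helper names are all invented, and the Pearson correlation is computed by hand to keep it self-contained.

```python
from math import sqrt


def per_session_rate(count: float, n_sessions: int) -> float:
    """Divide a subject's tally (numerator) by their sessions (denominator)."""
    if n_sessions <= 0:
        raise ValueError("n_sessions must be positive")
    return count / n_sessions


def pearson_r(xs, ys):
    """Plain Pearson correlation between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant variable")
    return sxy / sqrt(sxx * syy)


# Made-up data: (sessions, var 1 tally, var 2 tally) for three subjects.
subjects = [(10, 20, 10), (20, 30, 30), (40, 40, 80)]

raw_var1 = [v1 for _, v1, _ in subjects]
raw_var2 = [v2 for _, _, v2 in subjects]
rate_var1 = [per_session_rate(v1, n) for n, v1, _ in subjects]
rate_var2 = [per_session_rate(v2, n) for n, _, v2 in subjects]

raw_r = pearson_r(raw_var1, raw_var2)     # driven largely by session count
rate_r = pearson_r(rate_var1, rate_var2)  # relationship after normalizing
```

In this invented example the raw tallies correlate strongly and positively simply because subjects with more sessions score higher on both variables, while the per-session rates correlate negatively. Real data need not flip like this, but it shows why the correction matters.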

But is that what you really want to do? Do you really care about total results per participant? Or do you care about total results per session? Just a thought. Maybe you don't have totals by session...

However, all that aside, SPSS will weight your cases for you. Just choose "Weight Cases" from the Data menu and select the variable that holds each subject's total number of sessions as the weighting variable.
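It is worth checking what that weighting does before relying on it. SPSS case weights act as frequency (replication) weights, so a case with weight 40 counts as 40 cases. The Python sketch below mimics that for a simple mean only. The function name and numbers are hypothetical, and this is not SPSS code.

```python
def frequency_weighted_mean(values, frequencies):
    """Mean in which each value counts as many times as its frequency."""
    if len(values) != len(frequencies) or not values:
        raise ValueError("need non-empty sequences of equal length")
    total_weight = sum(frequencies)
    if total_weight <= 0:
        raise ValueError("frequencies must sum to a positive number")
    return sum(v * f for v, f in zip(values, frequencies)) / total_weight


# Two hypothetical subjects: per-session rates of 2.0 and 1.0,
# observed over 10 and 40 sessions respectively.
rates = [2.0, 1.0]
sessions = [10, 40]

unweighted = sum(rates) / len(rates)
weighted = frequency_weighted_mean(rates, sessions)
```

Weighting by session count pulls the result toward the subject with 40 sessions, so it gives heavily observed subjects more influence rather than less. Whether that is wanted depends on the question raised above about per-participant versus per-session results.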

