Ordinal or interval survey response analysis (from a convenience sample)

#1
I have the responses from a survey where people had to provide a rating on a scale from one to five, where five is the most positive end of the scale. I would like to treat the scale as interval, i.e. assume the differences between adjacent points are equal, so that I can compute quantitative summary statistics such as the mean and standard deviation of the ratings. The idea is to see how much variation there is in the ratings.

I don't know how much data I need before I can trust a rating, but I thought about using a test of whether the standard deviation of the ratings is equal to 0 (null: sigma equals 0, alternative: sigma greater than 0). Sometimes I only have a few responses, but if they are all the same I feel as though I could trust them, because the standard deviation is 0 in that case.

One problem (or two or more problems): the ratings are often skewed (they tend to be high), and this is a convenience sample (look out, assumptions), since not everyone who had the option to take the survey did so. People who did respond were probably inclined to give a high rating. Can I perform the test I have in mind on data like this? What are your thoughts?
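To illustrate, here is a minimal Python sketch of the summaries I have in mind; the ratings below are made up, not my real data:

```python
import statistics

# Made-up ratings on the 1-to-5 scale, just to illustrate the summaries I want
ratings = [5, 4, 5, 3, 5, 4, 5, 5]

n = len(ratings)
mean = statistics.mean(ratings)
sd = statistics.stdev(ratings)   # sample standard deviation; 0 if every response is identical
print(f"n = {n}, mean = {mean:.2f}, sd = {sd:.2f}")
```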

Thank you in advance.
 
#2
Are you saying that the survey has convenience-sample issues because of non-response? That's normal, and much of it can be accounted for by proper weighting. As for how much data you need, that again goes back to issues of your sample design. These decisions should be guided by your research questions. Use those, and your knowledge of the target population you are addressing, to see if there will be any issues such as undercoverage.

Also, one of the issues with surveys is that they sometimes draw respondents from the two ends of the normal curve (people who are angry, and people who support the idea).

For example, if you take a sample of 20 people, and those surveys have a really high intraclass correlation, then you probably don't need an exceptionally large number of sampling units to draw inference. You can just weight them and draw your conclusions. On the other hand, if you sample 20 people and you get 20 vastly different sets of survey responses (unlikely on a 5-point scale :) ), then you need a much larger sample to draw inference.

Now if you are asking whether you can just take the mean and SD of survey responses, you certainly can, but those are not always meaningful.


If you would like, I can suggest a pretty good text that can point you in the right direction.
 
#3
Thank you for your response!

"are you saying that the survey has issues of sample of convenience because of non-response rates? Thats normal and much of that can be accounted for by proper weighting."

Actually, yes. I realized I did not really mean a convenience sample after all; the issue is more one of non-response. I gave everyone something to try, and they had to rate it on a scale of 1 to 5. They didn't all submit a rating. What sort of weighting would apply here, to what, and how? And absolutely, I would love your resources if you have some to offer! Thank you.

"Now if you are asking if you can just take the mean and SD of survey responses, you certainly can, but those are not always meaningful."

Why wouldn't they be? Is it because of the limited five-point scale I am using? You mentioned: "On the other hand, if you sample 20 people and you get 20 vastly different sets of survey responses (unlikely on a 5-point scale), then you need a much larger sample to draw inference." Then couldn't how different the responses are be measured with a measure of spread such as variance?

Thank you!
 
#4
So you have an issue of non-response.

How you choose to weight your sample depends on what you care about and, again, on your target population.

If your research questions do not care about a certain population, and you only care about general responses to the data, then it's probably not that big of a deal. As long as your rate of return is decent (>50%) you should be OK. Now, if you find out there is a reason a certain group did not respond, then you have an issue with coverage. But I doubt that.


Before I can give much advice on the actual design analysis in terms of weighting, I would need to know whether you are trying to make inference on different populations. Or are you just making inference on the items rated?

If the latter is the case, you can just take your n and run something like a Mann-Whitney U test.
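For example, a minimal sketch in Python, with two made-up sets of item ratings just to show the call:

```python
from scipy.stats import mannwhitneyu

# Made-up ratings for two items, purely illustrative
item_a = [5, 4, 5, 3, 5, 4, 5]
item_b = [2, 3, 4, 2, 3, 3, 4]

# Two-sided Mann-Whitney U test comparing the two rating distributions
stat, p = mannwhitneyu(item_a, item_b, alternative="two-sided")
print(f"U = {stat:.1f}, p = {p:.3f}")
```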
 
#5
Thank you!

I suppose I am not as interested in comparing groups; I am more interested in how much data I need before I can trust a rating. Say 500 people were given a test item and asked to rate it on a scale of 1 to 5 (where 1 is really didn't like and 5 is really did like). I am not necessarily concerned with the exact value of the rating. I am concerned with whether the number of responses I get is enough to trust the rating. How can I be sure whether the item is really likable or not? If I get 100 responses and they are all across the board, and the variation is large, then I don't feel as though I could trust the rating. But if I had 100 responses and they were all a 5, then maybe I could trust it. What I am wondering is whether an item can be considered likable or not. If only five people answered and they all answered 5, maybe that's enough to say it's a likable item. Though perhaps those five are the ones who really liked it, and it's not enough to trust the score.
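To make the two scenarios concrete, here is a rough Python sketch with made-up ratings; the standard error of the mean is one way I could imagine turning "how much data" into a number, though I am not sure it addresses the selection issue:

```python
import math
import statistics

# Made-up versions of the two scenarios described above
all_over_the_board = [1, 2, 3, 4, 5] * 20   # 100 responses spread across the whole scale
all_fives = [5] * 100                       # 100 responses that are all 5

for name, ratings in [("spread out", all_over_the_board), ("all fives", all_fives)]:
    n = len(ratings)
    mean = statistics.mean(ratings)
    sd = statistics.pstdev(ratings)          # 0 when every rating is identical
    se = sd / math.sqrt(n)                   # uncertainty in the mean shrinks as n grows
    print(f"{name}: n = {n}, mean = {mean:.2f}, sd = {sd:.2f}, se of mean = {se:.3f}")
```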

Thank you again.
 
#7
Thank you for your response.

I do understand sample size calcs... thank you for the links. However, what about adjusting for the non-response bias? And what about the fact that the responses are on an ordinal scale?


Thank you!
 
#9
If you are really worried about the responses being on an ordinal scale, then treat each response level as its own group and run a regression with each level as a dummy variable. Alternatively, an ANOVA will give you the exact same result. Whichever you are comfortable with.
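A quick sketch of that equivalence, with a made-up grouping variable and made-up ratings; the point is just that the dummy-variable regression and the one-way ANOVA produce the same F statistic and p-value:

```python
import pandas as pd
from scipy.stats import f_oneway
from statsmodels.formula.api import ols

# Made-up data: ratings collected under three illustrative groups
df = pd.DataFrame({
    "group": ["a"] * 6 + ["b"] * 6 + ["c"] * 6,
    "rating": [5, 4, 5, 4, 5, 5, 3, 4, 3, 2, 3, 4, 4, 4, 5, 3, 4, 4],
})

# One-way ANOVA across the groups
F, p = f_oneway(*[g["rating"].values for _, g in df.groupby("group")])

# The same model as a regression with dummy-coded groups
fit = ols("rating ~ C(group)", data=df).fit()

print(f"ANOVA:      F = {F:.3f}, p = {p:.4f}")
print(f"Regression: F = {fit.fvalue:.3f}, p = {fit.f_pvalue:.4f}")   # identical
```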

As for adjusting for non-response bias: you can do that; it involves adjusting your standard error using a population correction factor. That will change your confidence intervals a touch. But honestly, if your sample is large enough, you will find that PCFs do not largely affect the estimates.
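For example, with made-up numbers (here I am reading "population correction factor" as the usual finite population correction, which shrinks the standard error as n approaches N):

```python
import math
import statistics

# Made-up numbers: n respondents out of a finite pool of N people offered the survey
N = 500
ratings = [5, 4, 5, 3, 5, 4, 5, 5, 4, 5] * 10   # n = 100 hypothetical responses
n = len(ratings)

se_srs = statistics.stdev(ratings) / math.sqrt(n)   # usual SE of the mean under simple random sampling
fpc = math.sqrt((N - n) / (N - 1))                  # finite population correction factor
se_fpc = se_srs * fpc                               # corrected SE (smaller as n approaches N)

print(f"SE without correction: {se_srs:.4f}")
print(f"SE with correction:    {se_fpc:.4f}")
```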
 

#10
There are several different issues here which people tend to confuse when they talk about sample size. First is whether you can generalize to a larger population. This depends on the effect size, how certain you want to be of the results, and other factors. There are many links and tools online that will tell you how many responses you need. Note this assumes simple random sampling.
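For instance, a back-of-the-envelope calculation for estimating a mean rating under simple random sampling; every input below is a guess you would have to supply yourself:

```python
import math

# Rough n for estimating a mean rating to a chosen precision under simple random sampling.
# All inputs are illustrative assumptions, not values from this thread.
z = 1.96           # 95% confidence
sigma_guess = 1.0  # guessed SD of ratings on the 1-to-5 scale
margin = 0.25      # desired half-width of the confidence interval for the mean

n = math.ceil((z * sigma_guess / margin) ** 2)
print(f"approximate n needed: {n}")   # 62 with these guesses
```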

A second question is statistical power: how likely it is that you will reject the null when in fact you should (another way of putting this is how likely it is that you will not make a Type II error). Again, there are tools online, such as G*Power, that deal with this. There is no easy way to know this except by doing such a power calculation.
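For instance, the kind of calculation G*Power performs can be sketched in Python with statsmodels; the effect size, alpha, and target power below are illustrative choices only:

```python
from statsmodels.stats.power import TTestIndPower

# Illustrative power calculation for a two-sample t test; the inputs are assumptions,
# not values from this thread.
analysis = TTestIndPower()
n_per_group = analysis.solve_power(effect_size=0.5, alpha=0.05, power=0.8,
                                    alternative="two-sided")
print(f"n per group for d = 0.5, alpha = 0.05, power = 0.80: {n_per_group:.1f}")
```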

Third is the issue of non-response. Essentially this deals with whether those who respond to your survey are reflective of the population as a whole. To some extent that depends on who responds and who does not. I have never seen literature that suggests what, specifically, larger samples do to this issue; I am not sure it is knowable.

Finally, some statistical tests are only asymptotically accurate. Skew and other problems can seriously distort the results of these methods unless you have a "large" sample (which is not defined specifically).

All of these concepts are different, and you have to be careful about which one you are addressing.