Power calculation with measurement error on Y

Junes

New Member
#1
I'm trying to do a power calculation for a study I'm involved in, and I'm a bit stuck.

The study is about textual cues of trust. We want to analyze a number of predictors in the text using the LIWC tool (which generates features based on text) and correlate them with perceived trust as determined by the users of this internet platform.

We want to do a multiple linear regression with a fixed number of predictors (we still have to determine the exact set, but think around 10-15). We expect low to medium correlations (say, .2-.3).

I want to do a power analysis. It's easy to find material to calculate the power for a correlation (like this)

However, I'm not sure how to deal with the measurement error. The thing is, the dependent variable is the mean perceived trustworthiness. We estimate this with a sample of ratings for each profile. From a pilot study I know that the SD of these ratings is around 0.6 (on a 5-point Likert scale). So the SE of the mean rating, and thus the error of my measurement, is estimated to be:

SE = 0.6/sqrt(a) , where a is the number of raters per profile

How can I combine the power analysis that uses perfect measurement with the error in measurement to get a definite power?

And how does the number of predictors come into play?

So, basically, how do I get from:
a = number of raters
n = profiles (sample size)
p = predictors
r = expected correlation (0.2)

to a power? Any help is greatly appreciated!
 

rogojel

TS Contributor
#2
hi,
off the top of my head, wouldn't the measurement error just act as an increased variance in the data? So, I would expect it to reduce the correlation: if you expect 0.2 based on theory then you might want to plan the sample such that it can detect, say, 0.15 due to measurement errors.
regards
 

Junes

New Member
#3
Thanks for your reply. Yeah, that is what I thought too. But how would I calculate the factor? Also, one is in standardized units (r) and the other in unstandardized (SE).
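
One possible way to get such a factor, sketched under classical measurement-error assumptions (error only in Y, independent of everything else): the observed correlation is the true one times the square root of the reliability of the mean rating. The reliability is a ratio of variances, so it is unitless, which takes care of the standardized vs. unstandardized mismatch. Note that `sd_true` (the SD of the true mean trust across profiles) is not given anywhere in this thread; the value 0.4 below is purely a placeholder, and the function names are made up for illustration.

```python
import math

def reliability_of_mean(sd_true, sd_rating, a):
    """Share of the variance of the observed mean rating that is true
    between-profile variance (a = number of raters per profile)."""
    var_true = sd_true ** 2
    var_error = sd_rating ** 2 / a  # squared SE of the mean: (sd_rating / sqrt(a))^2
    return var_true / (var_true + var_error)

def attenuated_r(r_true, sd_true, sd_rating, a):
    """Expected observed correlation when only Y is measured with error."""
    return r_true * math.sqrt(reliability_of_mean(sd_true, sd_rating, a))

# sd_rating = 0.6 comes from the pilot; sd_true = 0.4 is an assumed placeholder.
for a in (1, 3, 5, 10, 20):
    print(a, round(attenuated_r(0.2, sd_true=0.4, sd_rating=0.6, a=a), 3))
```

The attenuated value is then what would go into a standard power calculation in place of the theoretical r.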
 

Dason

Ambassador to the humans
#4
The mathematician in me says "Yes there is a way to calculate that directly". The lazy person in me says "just simulate it". That's my favorite way to do power analysis.
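
A minimal sketch of what such a simulation could look like, in plain Python (standard library only). It is not code from anyone in this thread, and it simplifies heavily: one predictor instead of 10-15, normally distributed rating error instead of real Likert responses, and a Fisher z test instead of the regression t-test. `sd_true = 0.4` is an assumed placeholder for the spread of true trust across profiles; `simulate_power` and `pearson` are illustrative names.

```python
import math
import random
from statistics import NormalDist

def pearson(xs, ys):
    """Sample Pearson correlation of two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def simulate_power(n, a, r_true=0.2, sd_true=0.4, sd_rating=0.6,
                   alpha=0.05, reps=1000, seed=1):
    """Fraction of simulated studies (n profiles, a raters each) in which
    the observed correlation is significant at level alpha."""
    rng = random.Random(seed)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    hits = 0
    for _ in range(reps):
        xs, ys = [], []
        for _ in range(n):
            x = rng.gauss(0, 1)
            # True mean trust of this profile, correlated r_true with x.
            noise = math.sqrt(1 - r_true ** 2) * rng.gauss(0, 1)
            true_y = sd_true * (r_true * x + noise)
            # Observed mean of a ratings: truth plus error with SD sd_rating / sqrt(a).
            ys.append(true_y + rng.gauss(0, sd_rating / math.sqrt(a)))
            xs.append(x)
        # Fisher z test of H0: correlation = 0 (normal approximation).
        if abs(math.atanh(pearson(xs, ys))) * math.sqrt(n - 3) > z_crit:
            hits += 1
    return hits / reps

for a in (2, 5, 10):
    print(a, simulate_power(n=150, a=a, reps=500))
```

Changing `n` and `a` together lets you compare designs, e.g. many profiles with few ratings against few profiles with many ratings.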
 

Jake

Cookie Scientist
#5
The assumed correlation coefficient that you use will already represent the "attenuated" correlation due to measurement error, and it is this attenuated correlation coefficient that is directly relevant for power. In other words, most power analyses do not "use perfect measurement," as you say. Knowing the fraction of the observed variance in the DV that is due to measurement error might be interesting, but it does not affect the power analysis in any way.

As for the influence of the number of other predictors, this has very little influence on power after the partial correlation for the IV of interest (which I assume is the correlation coefficient you're talking about) has been given. All it does is make the denominator degrees of freedom a little smaller. So you can probably safely ignore this altogether, unless your sample size is very small, like less than 20 or something.
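
To get a feel for how little the extra predictors cost, here is a rough sketch using the Fisher z approximation, where partialling out k other predictors changes the standard error from 1/sqrt(n - 3) to 1/sqrt(n - 3 - k). This is a normal approximation, not the exact noncentral-t calculation, and `power_partial_r` is just an illustrative name.

```python
import math
from statistics import NormalDist

def power_partial_r(r_partial, n, p, alpha=0.05):
    """Approximate two-sided power for one predictor's partial correlation
    in a regression with p predictors in total and n observations."""
    k = p - 1                                # predictors partialled out
    se = 1 / math.sqrt(n - 3 - k)            # SE of Fisher z for a partial correlation
    z = math.atanh(r_partial) / se           # expected test statistic
    nd = NormalDist()
    z_crit = nd.inv_cdf(1 - alpha / 2)
    return nd.cdf(z - z_crit) + nd.cdf(-z - z_crit)

# Same partial correlation and sample size, different model sizes.
for p in (1, 10, 15):
    print(p, round(power_partial_r(0.2, n=150, p=p), 3))
```

With n in the hundreds, going from 1 to 15 predictors moves the approximate power by only a few percentage points, which is the "denominator degrees of freedom" effect.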
 

Dason

Ambassador to the humans
#6
Yeah you could probably safely ignore it. Or you could be awesome and simulate. Either way works.
 
#7
I haven't followed the discussion closely, but doesn't the correlation among the predictors influence the power? Whether there is no correlation (as in a designed experiment), some correlation, or very high correlation (i.e. multicollinearity) will affect the power, since increasing multicollinearity increases the standard error.

Isn't the power from the noncentral t-distribution influenced by the determinant of the full (X'X)-matrix (which in turn is influenced by multicollinearity)? So, as I understand it, it is not just that the critical t-value increases as the degrees of freedom decrease.

Apart from the difficulty of actually computing the power, it is also hard to guess beforehand what the correlations among the x-variables will be, so that the power computation is realistic.
 

hlsmith

Omega Contributor
#8
Bottom line as well: will there be measurement error in the actual study you are going to conduct, or is it exclusive to the preparatory sample?
 

Jake

Cookie Scientist
#9
I haven't followed the discussion closely, but doesn't the correlation among the predictors influence the power? Whether there is no correlation (as in a designed experiment), some correlation, or very high correlation (i.e. multicollinearity) will affect the power, since increasing multicollinearity increases the standard error.

Isn't the power from the noncentral t-distribution influenced by the determinant of the full (X'X)-matrix (which in turn is influenced by multicollinearity)? So, as I understand it, it is not just that the critical t-value increases as the degrees of freedom decrease.
Yes, but all of this is already accounted for in the partial correlation coefficient. If we were using the simple correlation coefficient then it would be a different story.
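
A small two-predictor illustration of that point, as a sketch rather than a general result. It assumes standardized variables, so that y = b1*x1 + b2*x2 + error with corr(x1, x2) = rho; the function name and the example slopes are made up. For a fixed slope b1, the partial correlation of x1 shrinks as rho grows, so once you specify the partial correlation itself, the collinearity penalty is already inside that number.

```python
import math

def partial_r_two_predictors(b1, b2, rho):
    """Partial correlation of x1 with y given x2, from standardized
    slopes b1, b2 and the predictor correlation rho."""
    r_y2 = b2 + b1 * rho                # simple correlation of y with x2
    sr1_sq = b1 ** 2 * (1 - rho ** 2)   # squared semipartial: unique share of x1
    pr1_sq = sr1_sq / (1 - r_y2 ** 2)   # rescale by variance left after x2
    return math.copysign(math.sqrt(pr1_sq), b1)

# Same slopes, increasingly collinear predictors.
for rho in (0.0, 0.5, 0.9):
    print(rho, round(partial_r_two_predictors(0.2, 0.2, rho), 3))
```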
 

rogojel

TS Contributor
#10
I guess one could have a situation where one expects a given correlation based on theory, not on previous experience. In this case one would have to factor the measurement error in.

regards
 

Junes

New Member
#11
Thanks a lot for the many replies!

Dason said:
The lazy person in me says "just simulate it". That's my favorite way to do power analysis.
Excellent idea! I think I'm going to give that a try.

jake said:
The assumed correlation coefficient that you use will already represent the "attenuated" correlation due to measurement error, and it is this attenuated correlation coefficient that is directly relevant for power. In other words, most power analyses do not "use perfect measurement," as you say. Knowing the fraction of the observed variance in the DV that is due to measurement error might be interesting, but it does not affect the power analysis in any way.
Thanks, that is reassuring. However, I'm not sure that's applicable in my case because it's a new instrument. The correlations are just what I expect from some related stuff and my tiny pilot sample. Also, my resources are finite: my participants don't get much in the way of incentive. So there is a limited number of ratings I can have them make. So it either means many profiles with few ratings or few profiles with many ratings, and I want to optimize the power.

hlsmith said:
Bottom line as well: will there be measurement error in the actual study you are going to conduct, or is it exclusive to the preparatory sample?
I think in the study as well, since I want to look for correlations with mean perceived trustworthiness (in the population) and what I get is an estimate of that, based on a number of ratings.
 

Junes

New Member
#12
Another question: as participants will rate only a selection of the profiles (say 20 out of the 150 I will use in the text analysis), these ratings will be correlated. To make matters worse, it seems likely that the software we are using will only allow us to present batches of items. So participants 1-10 might get profiles 1-20, participants 11-20 might get profiles 21-40, etc.

How big of a problem is this clustering for the estimate? My hunch is that it's not really a big problem, as we only use the ratings to estimate the mean trustworthiness. We don't use the individual ratings. But maybe I'm wrong about this.
 

Junes

New Member
#13
Simulations so far show that spending more than 7-10 ratings on a single profile is a waste of resources, and that the resources are better spent on extra profiles.

Thanks Dason, simulating was a really good idea. It gives you a lot of insight into the data, too.