Weighted Regression

I have a few questions regarding setting up and running my weighted regression:

Firstly, some background. I am comparing point of subjective equality (PSE) values obtained from participants. Specifically, the PSE is the compression that has to be applied to a stimulus for the participant to judge it to be the same width as the reference stimulus.

I am looking to use regression analysis to compare this value with autistic traits as measured by the AQ (Autism-Spectrum Quotient).

Now, my first question is:
1) In my results file I have the standard error of the mean (SEM) for each PSE value.

To calculate the weights for the regression, would I use:

1/(SEM^2) for each PSE value?

If so, these values are extremely large; is this normal?

2) If the above is correct, do I simply enter these values as the weights and then run the analysis, or do I need to do anything else?

I hope this makes sense, and thank you in advance for the help,

Hi Jack!

Just spotted your post. I am by no means an expert; in fact, I'm trying to get my head around WLS regression at the moment too! Although the terms you are using are slightly different, it sounds like you might be doing it correctly.

So, to make sure we're speaking the same language, the WLS equation I'm using is

\( \hat{\beta}= \left ( X^{T} W^{-1} X \right ) ^{-1} X^T W^{-1}Y \)

where W is the weighting matrix:

\( W = \begin{pmatrix} \sigma_1^{2} & 0 & \cdots & 0 \\ 0 & \sigma_2^{2} & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & \sigma_n^{2} \end{pmatrix} \)

(Note: the way I set it up, W contains the variances of the data points, but the formula uses its inverse, \(W^{-1}\), which is a diagonal matrix with entries \(\frac{1}{\sigma_i^{2}}\).)
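To connect this to the weights in your question: if the SEM of each PSE is taken as \(\sigma_i\), the diagonal of \(W^{-1}\) is exactly 1/SEM^2. The sketch below assumes a straight-line model PSE = b0 + b1 × AQ and solves the weighted normal equations directly in plain Python; the function name `wls_line` and all the AQ/PSE/SEM numbers are made up purely for illustration. It also shows why very large weights are not a problem in themselves: multiplying every weight by the same constant cancels out of \((X^T W^{-1} X)^{-1} X^T W^{-1} Y\), so the estimated intercept and slope do not change.

```python
# Hypothetical sketch: weighted least squares for PSE = b0 + b1 * AQ,
# with weights w_i = 1 / SEM_i**2 (the diagonal of W^{-1}).

def wls_line(x, y, sem):
    """Return (intercept, slope) minimising sum(w_i * (y_i - b0 - b1 * x_i)**2)."""
    w = [1.0 / s ** 2 for s in sem]  # inverse-variance weights
    sw = sum(w)
    # Weighted means of x and y
    xbar = sum(wi * xi for wi, xi in zip(w, x)) / sw
    ybar = sum(wi * yi for wi, yi in zip(w, y)) / sw
    # Weighted cross-product and sum of squares about those means
    sxy = sum(wi * (xi - xbar) * (yi - ybar) for wi, xi, yi in zip(w, x, y))
    sxx = sum(wi * (xi - xbar) ** 2 for wi, xi in zip(w, x))
    slope = sxy / sxx
    intercept = ybar - slope * xbar
    return intercept, slope


# Made-up illustrative data, NOT real results
aq = [10, 15, 20, 25, 30, 35]
pse = [0.92, 0.95, 0.93, 0.98, 1.01, 0.99]
sem = [0.010, 0.015, 0.008, 0.020, 0.012, 0.030]

# Small SEMs give large weights, which is expected
print([round(1.0 / s ** 2) for s in sem])

b0, b1 = wls_line(aq, pse, sem)
# Multiplying every SEM by 10 divides every weight by 100 ...
b0_s, b1_s = wls_line(aq, pse, [10 * s for s in sem])
# ... but the fitted line is the same, up to floating-point rounding
print(b0, b1)
print(b0_s, b1_s)
```

Only the relative sizes of the weights matter for the point estimates, so there is no need to rescale them before running the analysis.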

What I did was use Excel to set up some hypothetical data, sampled from distributions with known, differing variances, and then carry out the WLS using matrix formulae to see whether I was getting a "common sense" result. I generated 15 data points from a distribution with one SD and 15 from a distribution with a second SD. I then varied the second SD, looked at how my estimates behaved, and compared them with the OLS results for the same data, both pooled and for each group separately. (I simplified things by reducing the underlying model to just a mean and a variance, with no explanatory/independent variables.)
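For this intercept-only case, X is just an n×1 column of ones, \(\mathbf{1}\), so the WLS formula above reduces to an inverse-variance weighted mean:

\( \hat{\beta} = \left( \mathbf{1}^{T} W^{-1} \mathbf{1} \right)^{-1} \mathbf{1}^{T} W^{-1} Y = \frac{\sum_{i=1}^{n} y_i / \sigma_i^{2}}{\sum_{i=1}^{n} 1 / \sigma_i^{2}} \)

With equal \(\sigma_i\) this is the ordinary mean (the OLS estimate), and a group with a much smaller \(\sigma_i\) receives a much larger share of the weight.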

As expected, when the SDs of the two groups were the same I got the same result as OLS on all 30 data points; when one SD was much smaller than the other, that group dominated my estimate; and in between, the WLS estimates were less variable than those from OLS on either the whole data set or the individual groups.
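The Excel experiment can also be sketched in a few lines of Python. This is only a sketch under my own assumptions: both groups share a true mean of 0, each group has 15 points, the known SDs are plugged in as \(\sigma_i\), and the helper names `wls_mean`, `one_run` and `spread` are hypothetical. It repeats the simulation many times and reports the empirical SD of each estimator, which is one way to check the "less variable" observation.

```python
import random
import statistics


def wls_mean(y, sd):
    """Intercept-only WLS estimate: the inverse-variance weighted mean.

    Equivalent to (X^T W^{-1} X)^{-1} X^T W^{-1} Y with X a column of ones
    and W = diag(sd_i**2).
    """
    w = [1.0 / s ** 2 for s in sd]
    return sum(wi * yi for wi, yi in zip(w, y)) / sum(w)


def one_run(rng, sd1, sd2, n=15, mu=0.0):
    """Simulate n points per group around a shared (assumed) true mean mu.

    Returns (WLS on all data, OLS on all data, OLS on group 1, OLS on group 2).
    """
    g1 = [rng.gauss(mu, sd1) for _ in range(n)]
    g2 = [rng.gauss(mu, sd2) for _ in range(n)]
    y = g1 + g2
    sd = [sd1] * n + [sd2] * n  # the known SDs play the role of sigma_i
    return (wls_mean(y, sd), statistics.fmean(y),
            statistics.fmean(g1), statistics.fmean(g2))


def spread(sd1, sd2, reps=2000, seed=0):
    """Empirical SD of each estimator over many simulated data sets."""
    rng = random.Random(seed)
    runs = [one_run(rng, sd1, sd2) for _ in range(reps)]
    # zip(*runs) regroups the results estimator by estimator
    return [statistics.stdev(est) for est in zip(*runs)]


if __name__ == "__main__":
    for sd2 in (1.0, 2.0, 5.0, 20.0):
        wls, ols_all, ols_g1, ols_g2 = spread(1.0, sd2)
        print(f"sd2={sd2:5.1f}  WLS={wls:.3f}  OLS(all)={ols_all:.3f}  "
              f"OLS(g1)={ols_g1:.3f}  OLS(g2)={ols_g2:.3f}")
```

With `sd2` equal to `sd1`, the WLS and pooled OLS columns should coincide; as `sd2` grows, the WLS spread should approach that of group 1 alone.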

Hope that helps or inspires a way you could grapple with your own problem :)
