PISA 2012: Calculating variance explained by ESCS

Hi there! I was wondering if anyone could help me figure out why I am getting the wrong results!

I am using the PISA dataset of 2012 to calculate the variance explained by ESCS (socio-economic status).

I followed the steps in the PISA Data Analysis Manual (SAS, 2nd edition):

1. SAS® syntax for preparing a data file for the multilevel analysis (Box 16.3 at p.238-239).

2. SAS® syntax for running multilevel models with the PROC_MIXED_PV macro (Box 16.6 at p.242).

3. Calculating the results:
I used the following equation:

variance explained = (T1 - T2) / T1

where
T2 is the intercept/between-school variance in model 2, and
T1 is the intercept/between-school variance in the empty model.


First I tried SPSS, then SAS, but either way I am not getting the same results as in the examples given by PISA.
For example, according to PISA, ESCS explains 7% of the variance in math performance in Norway. My calculations give as much as 20% for ESCS alone, and 25% for ESCS and MU_ESCS together.

Interestingly, when I run a one-way ANOVA, the R-squared is (almost) the same percentage as in the PISA examples.

I only have a few more days left before submitting my BA Paper, and I really need to calculate the variance for countries that are not included in the examples on PISA's website.
Can someone please please help me figure out why I am not getting the right percentage. :)
Thanks for trying to help out!!! :D
From the PROC_MIXED_PV.sas file provided by PISA, it looks like all 80 replicate weights are being used.


Phineas Packard
Ok so I have the data to hand and I get variance explained of 7.41%

Note that I don't know SAS at all, but here is my R script. I have all PISA waves in an SQLite database:
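# Packages used below
library(DBI)      # dbConnect, dbListFields, dbGetQuery
library(RSQLite)  # SQLite() driver
library(survey)   # svrepdesign, svyglm
library(mitools)  # imputationList, MIcombine

# Set working directory first (call omitted here)
# Connect to the database and extract the PISA 2012 data for Norway
pisa <- dbConnect(SQLite(), dbname = "LSAY.sqlite")

dbListFields(pisa, "PISA2012")

norway <- dbGetQuery(pisa, "SELECT * FROM PISA2012 WHERE cnt = 'NOR'")
# Final student weight (first column) plus the replicate weights
brr <- norway[, grep("W_FST.*", names(norway))]

# Build one data set per plausible value, each with a common outcome name
norway2 <- list()
for (i in 1:5) {
  tmp <- paste0("PV", i, "MATH")
  tmpData <- norway[, c("W_FSTUWT", "ESCS", "StIDStd", tmp)]
  names(tmpData)[4] <- "math"
  norway2[[i]] <- tmpData
}

norway2 <- imputationList(norway2)

# Replicate-weight design (Fay's method, rho = 0.5); brr[, -1] drops W_FSTUWT
dclust <- svrepdesign(ids = ~StIDStd, weights = ~W_FSTUWT,
                      data = norway2, repweights = brr[, -1],
                      type = "Fay", rho = 0.5)

# Standardized regression without intercept, fitted once per plausible value
M1 <- with(dclust, svyglm(scale(math) ~ scale(ESCS) - 1))
# Combine the five fits with Rubin's rules
summary(MIcombine(M1))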
Which gives:
Multiple imputation results:
      with(dclust, svyglm(scale(math) ~ scale(ESCS) - 1))
              results         se    (lower    upper) missInfo
scale(ESCS) 0.2723315 0.02066994 0.2317947 0.3128684      5 %
[1] 0.07416447
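The last number appears to be the squared standardized coefficient. With a single predictor and both variables standardized, the slope is approximately the correlation between ESCS and math (approximately, because scale() standardizes here without the survey weights), so its square is the share of variance explained. Worked through with the estimate above:

```latex
R^2 \approx \hat{\beta}^2 = 0.2723315^2 \approx 0.0742 \quad (\text{about } 7.4\%)
```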
Wow, that is very interesting! :) I tried to replicate your code in R, but I had a hard time figuring out how R works. Thanks for the help, I will try to compare the code and see if I can find the difference.


Phineas Packard
Ok so there are a few issues here:

1. You have to at least use the W_FSTUWT weight.
2. The PISA results that you allude to (the same as what I got) come from a fixed-effects model, not a multilevel model. They handle the complex sample design with replicate weights. They also run each analysis five times, once per plausible value, and then combine the results using Rubin's multiple imputation rules. SAS and SPSS should handle all of this.
3. It looks like what you want is a contextual model, that is, to look at the degree to which ESCS and school-average ESCS predict math achievement. Is this right?
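
For reference, the combining step mentioned in point 2 can be sketched as follows. This is the standard textbook form of Rubin's rules in my own notation, not copied from the PISA manual, which may write it differently. The symbols are assumptions for illustration: M = 5 plausible values, \hat{\theta}_m is the estimate from plausible value m, and U_m is its sampling variance obtained from the replicate weights.

```latex
\bar{\theta} = \frac{1}{M}\sum_{m=1}^{M}\hat{\theta}_m,
\qquad
\bar{U} = \frac{1}{M}\sum_{m=1}^{M} U_m,
\qquad
B = \frac{1}{M-1}\sum_{m=1}^{M}\left(\hat{\theta}_m - \bar{\theta}\right)^2,
\qquad
T = \bar{U} + \left(1 + \frac{1}{M}\right) B
```

The reported estimate is \bar{\theta} and its standard error is \sqrt{T}, so the uncertainty from the plausible values (B) is added on top of the sampling variance.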