Contextual Effect Question


Phineas Packard
I am looking for general thoughts on the following circumstance.

I am working on a problem for a colleague who has some outcome Y at T2 that he wants to predict from a contextual effect: the school average of Y at T1. He does not want to estimate change, so he has estimated the model:

yT2 ~ avg.yT1 + (1 | schoolid)

rather than:

yT2 ~ yT1 + avg.yT1 + (1 | schoolid)
#the random effect could be (yT1 | schoolid) but it does not matter much for this description

My claim is that you cannot do this for several reasons.

1. A contextual effect is defined by the difference between the school-average effect and the individual effect (Harker & Tymms, 2004).
2. In the absence of individual yT1, avg.yT1 acts as a poor proxy for individual yT1 whenever ICC(yT1) =/= 0. Taken to the extreme, yT1 == avg.yT1 when ICC(yT1) == 1.

My off-the-cuff suggestion was to calculate avg.yT1 separately for each individual such that the school average equals the average yT1 of everyone in the school BUT that particular individual.

Thus every individual in a school potentially has a different avg.yT1 score and the model would be:

yT2 ~ Leave1outAvg.yT1 + (1 | schoolid)
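
For concreteness, here is a minimal sketch of how the leave-one-out average could be computed. It is plain Python rather than whatever software the colleague actually uses, and the function name `leave_one_out_means` and the toy data are made up for illustration.

```python
from collections import defaultdict

def leave_one_out_means(school_ids, y_t1):
    """For each student, return the mean yT1 of the OTHER students in the same school."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for school, y in zip(school_ids, y_t1):
        totals[school] += y
        counts[school] += 1

    result = []
    for school, y in zip(school_ids, y_t1):
        n_others = counts[school] - 1
        # A school with a single student has no peers, so the average is undefined.
        result.append((totals[school] - y) / n_others if n_others > 0 else float("nan"))
    return result

# Toy data (hypothetical): three students in school A, two in school B.
school_ids = ["A", "A", "A", "B", "B"]
y_t1 = [1.0, 2.0, 6.0, 4.0, 8.0]
loo = leave_one_out_means(school_ids, y_t1)
print(loo)
```

Subtracting each student's own score from the school total avoids recomputing a mean per student, and schools with only one student get NaN because there is nobody left to average over.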

On reflection, though, I am not sure this helps much, for the following reasons:

1. This is still not a contextual effect by definition: without individual yT1 in the model, you cannot get the difference between the individual yT1 effect and the avg.yT1 effect.
2. The reason we need to control for nesting in a dataset is that the effective sample size differs from the apparent sample size, because individuals within a cluster may resemble one another. In that case, wouldn't Leave1outAvg.yT1 also be a poor proxy for individual yT1? If ICC(yT1) == 1 then yT1 == Leave1outAvg.yT1. So calculating the average via leave-one-out does not decouple yT1 from avg.yT1, and thus:

yT2 ~ Leave1outAvg.yT1 + (1 | schoolid)

is still misspecified, right?
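
The worry in point 2 can be checked with a small simulation. The sketch below assumes normally distributed scores with total variance 1, equal school sizes, and a fixed seed. None of these come from the actual data, and `simulate_corr` is a hypothetical helper. It only shows how strongly yT1 correlates with its own leave-one-out school average at a low and a high ICC.

```python
import random

def pearson(x, y):
    """Plain Pearson correlation between two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

def simulate_corr(icc, n_schools=200, n_students=20, seed=1):
    """Correlation between simulated yT1 and its leave-one-out school mean.

    Assumption: total variance is 1, so the school variance equals the ICC.
    """
    rng = random.Random(seed)
    y, loo = [], []
    for _ in range(n_schools):
        school_effect = rng.gauss(0.0, icc ** 0.5)
        scores = [school_effect + rng.gauss(0.0, (1.0 - icc) ** 0.5)
                  for _ in range(n_students)]
        total = sum(scores)
        for s in scores:
            y.append(s)
            loo.append((total - s) / (n_students - 1))
    return pearson(y, loo)

low = simulate_corr(icc=0.05)
high = simulate_corr(icc=0.90)
print(f"ICC 0.05: r = {low:.2f}; ICC 0.90: r = {high:.2f}")
```

Under these assumptions the correlation rises with the ICC, which is the sense in which the leave-one-out average still stands in for individual yT1 when schools are internally homogeneous.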


Cookie Scientist
The (proper) contextual model works by estimating the orthogonal effects of yT1 and avg.yT1. So if they want to mimic this but for some reason not estimate the yT1 effect, then they can first construct an orthogonalized-with-respect-to-yT1 version of avg.yT1, and then use that variable as the predictor in their model. In other words, regress avg.yT1 on yT1, save the residuals, and use those in place of avg.yT1.
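
A minimal sketch of that residualizing step, in plain Python with ordinary least squares. The helper name `residualize`, the variable `avg_resid`, and the toy numbers are illustrative assumptions. The sketch also ignores the school random effect when residualizing, so it is only an approximation to what the full two-predictor mixed model does.

```python
def residualize(target, predictor):
    """Residuals from an OLS regression of target on predictor (with intercept)."""
    n = len(target)
    mean_t = sum(target) / n
    mean_p = sum(predictor) / n
    sxy = sum((p - mean_p) * (t - mean_t) for p, t in zip(predictor, target))
    sxx = sum((p - mean_p) ** 2 for p in predictor)
    slope = sxy / sxx
    intercept = mean_t - slope * mean_p
    return [t - (intercept + slope * p) for p, t in zip(predictor, target)]

# Toy data (hypothetical): individual yT1 and the matching school averages.
y_t1 = [1.0, 2.0, 6.0, 4.0, 8.0]
avg_y_t1 = [3.0, 3.0, 3.0, 6.0, 6.0]

# The part of avg.yT1 that is linearly unrelated to individual yT1.
avg_resid = residualize(avg_y_t1, y_t1)
print(avg_resid)
```

The resulting variable would then replace avg.yT1 in the model, for example yT2 ~ avg_resid + (1 | schoolid).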

As for the definitional issues, my understanding is that you are right: unless the colleague either fits the two-predictor model that you wrote or does the procedure I just described, the effect of avg.yT1 in their model would not be a "contextual effect" as that phrase is commonly understood in the educational psych literature. (It might still estimate an effect of interest; that depends on what the researcher is trying to get at.)

In the original model suggested by the colleague, the thing being estimated is really just the undifferentiated time 1 school effect. My understanding of the relevant terminology here from the ed psych literature (I've never seen these terms used elsewhere in the wider multilevel modeling literature) is that we have the following decomposition:
school effect = compositional effect + contextual effect
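
One way to write this decomposition in symbols, assuming that "compositional effect" means the individual-level (within-school) slope of yT1 and "school effect" means the between-school slope. The notation below is an illustration, not taken from the thread: $\beta_W$, $\beta_C$ and $\beta_B$ are the within-school, contextual and between-school coefficients, $u_j$ is the school random intercept and $e_{ij}$ the residual.

```latex
\begin{align}
y^{T2}_{ij} &= \beta_0 + \beta_W \, y^{T1}_{ij} + \beta_C \, \bar{y}^{T1}_{j} + u_j + e_{ij}, \\
\beta_B &= \beta_W + \beta_C .
\end{align}
```

Under that reading, a model containing only the school average estimates roughly $\beta_B$, the undifferentiated school effect, whereas the two-predictor model reports $\beta_C$ directly as the coefficient on the school average.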

Parenthetically, all of these models seem a bit strange to me in this context because we would usually want the dataset to be structured such that we have separate rows for the Y values at time 1 and time 2, and we would account for time by including a time predictor in the model... but I guess we can ignore this for now.