So, a colleague of mine, a survey methodologist, is proposing to run an experiment embedded in a survey.

* The purpose of the experiment is to see (1) whether the communication protocol for the survey affects response rates, and (2) whether specific protocols have higher acceptance rates among people of certain ages.
* The sample size is 2,000.
* The main outcome measured is age (in years).

The sample will be randomly allocated to one of 4 communication protocols:

1. Phone call only (n=500)
2. Phone call, followed by text message if no response to phone call (n=500)
3. Text message only (n=500)
4. Text message, followed by phone call if no response to text message (n=500)
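The allocation step itself is straightforward and can be sketched as follows; the protocol labels are my own shorthand, not names from the actual study:

```python
import random

# Hypothetical sketch: complete randomization of the 2,000 sampled
# units into the four communication protocols, 500 each.
PROTOCOLS = ["phone_only", "phone_then_text", "text_only", "text_then_phone"]

random.seed(1)  # for reproducibility of the sketch
units = list(range(2000))
random.shuffle(units)

# Equal allocation: consecutive blocks of 500 after shuffling
allocation = {p: units[i * 500:(i + 1) * 500] for i, p in enumerate(PROTOCOLS)}
```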

On its face, this is just a completely randomized design (CRD) and seems easy to parameterize.

HOWEVER, here are the actual likely outcomes:

1. Phone call only (n=500)
* nonresponse (n=?/500)
* Age from phone call only (n=?/500)
2. Phone call, followed by text message if no response to phone call (n=500)
* nonresponse (n=?/500)
* Age from phone call only (n=?/500)
* Age from text message after no response to phone call (n=?/500)
3. Text message only (n=500)
* nonresponse (n=?/500)
* Age from text message only (n=?/500)
4. Text message, followed by phone call if no response to text message (n=500)
* nonresponse (n=?/500)
* Age from text message only (n=?/500)
* Age from phone call after no response to text message (n=?/500)
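The branching above can be made concrete with a small simulation. The response probabilities here are made-up placeholders, purely to show which outcome categories each arm can produce:

```python
import random
from collections import Counter

random.seed(42)

# Assumed (hypothetical) per-contact response probabilities -- these
# are placeholders to illustrate the branching, not real estimates.
P_PHONE = 0.30  # probability of responding to a phone call
P_TEXT = 0.20   # probability of responding to a text message

def simulate_unit(protocol):
    """Return the observed outcome category for one sampled unit."""
    if protocol == "phone_only":
        return "age_via_phone" if random.random() < P_PHONE else "nonresponse"
    if protocol == "text_only":
        return "age_via_text" if random.random() < P_TEXT else "nonresponse"
    if protocol == "phone_then_text":
        if random.random() < P_PHONE:
            return "age_via_phone"
        return "age_via_text_followup" if random.random() < P_TEXT else "nonresponse"
    if protocol == "text_then_phone":
        if random.random() < P_TEXT:
            return "age_via_text"
        return "age_via_phone_followup" if random.random() < P_PHONE else "nonresponse"
    raise ValueError(protocol)

counts = {
    p: Counter(simulate_unit(p) for _ in range(500))
    for p in ["phone_only", "phone_then_text", "text_only", "text_then_phone"]
}
```

Note that the two-stage arms produce three outcome categories each, while the single-channel arms produce two, so the four arms do not even share a common outcome space.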

How in the WORLD do you parameterize this thing?

I mean, we now have subgroups that look alike (e.g., two subgroups whose age came from a phone call only, but one was randomly allocated to protocol 1 and the other to protocol 2) yet cannot be combined because of the design.
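One way to show the tangle explicitly is to write out the multinomial cell probabilities per arm. Here is a sketch under the simplest possible model, where p = P(response to a phone contact) and t = P(response to a text contact) are assumed constant across arms and contact attempts. That assumption (no protocol effect on a given channel) is part of what the experiment is supposed to test, so it cannot be invoked to pool the look-alike cells:

```python
def cell_probs(p, t):
    """Multinomial cell probabilities per arm, under the (strong,
    untestable-without-pooling) assumption that p and t are the same
    whenever that channel is used, in any arm and at any stage."""
    return {
        "phone_only": {
            "age_via_phone": p,
            "nonresponse": 1 - p,
        },
        "phone_then_text": {
            "age_via_phone": p,
            "age_via_text_followup": (1 - p) * t,
            "nonresponse": (1 - p) * (1 - t),
        },
        "text_only": {
            "age_via_text": t,
            "nonresponse": 1 - t,
        },
        "text_then_phone": {
            "age_via_text": t,
            "age_via_phone_followup": (1 - t) * p,
            "nonresponse": (1 - t) * (1 - p),
        },
    }

probs = cell_probs(p=0.30, t=0.20)  # placeholder values
```

Without that assumption, you need separate parameters such as "phone response in arm 1" vs. "phone response in arm 2", and the look-alike cells carry different parameters despite identical measurement.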

My goal is to help argue that this design is poor by showing the messy parameterization, not to mention how it carves up the sample. I am also open to any other points you might have.