# average difference in averages... ?

#### jamesmartinn

##### Member
Hi all,

I'm hoping to get some feedback on a project I'm working on in terms of the methods we're using. I'll give a brief description of our project:

We're looking at patient volume for about N=9000 physicians. For each physician, I have the number of patients and days worked under two different conditions. Condition A is volume under 'A-Type Days' and Condition B is the volume under 'B-Type Days'. For each condition, I have the number of patients they saw (sum total) and the number of days over a single year. For example, I might have the following information for a given physician:

Worked a total of 25 "A-Type Days" and saw 100 Patients over this time. Therefore their A-Type average number of patients would be 100/25 = 4.

The same physician may have worked a different number of "B-Type Days". For example, they could have worked a total of 3 "B-Type Days" and seen a total of 27 patients in that time. Their B-Type average would then be 27/3 = 9.

I'm wondering how I could test whether these per-physician averages differ between the two conditions. My intuition tells me to use a paired t-test on the within-physician differences in averages. I would therefore be evaluating the "average average difference" in patients from A-Type to B-Type.
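To make the proposal concrete, here is a minimal sketch of the paired t statistic computed on per-physician averages. The first record is the physician from the example above; the other three are invented purely for illustration, and the name `paired_t` is hypothetical. It uses only the standard library, so it stops at the t statistic and degrees of freedom and does not compute a p-value.

```python
import math
import statistics

# Aggregate records: (patients_A, days_A, patients_B, days_B).
# Row 1 is the worked example from the post; rows 2-4 are made-up data.
physicians = [
    (100, 25, 27, 3),
    (60, 20, 40, 10),
    (90, 30, 50, 10),
    (45, 15, 24, 6),
]

def paired_t(records):
    """Return (mean difference, t statistic, degrees of freedom)."""
    # Within-physician difference of averages: B-type minus A-type.
    diffs = [pb / db - pa / da for pa, da, pb, db in records]
    n = len(diffs)
    mean_diff = statistics.mean(diffs)
    # Standard error of the mean difference (sample standard deviation).
    se = statistics.stdev(diffs) / math.sqrt(n)
    return mean_diff, mean_diff / se, n - 1

mean_diff, t_stat, df = paired_t(physicians)
print(f"mean difference = {mean_diff:.3f}, t = {t_stat:.3f}, df = {df}")
```

Note that every physician contributes one difference with equal weight, regardless of how many days each average is based on.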

Is this kosher? If not, would there be an alternate strategy? I'm able to extract the individual-level data for each physician to see how many patients they saw on each day; right now I just have it in aggregate form.

Thanks for any insights.

cheers,

#### MARijlaarsdam

##### New Member
Hi,
You are measuring the same subject, i.e. you are using the subject (doctor) as its own control. I would therefore agree that a paired t test is the right way to go if you have the appropriate sample size (which seems to be the case). You are then indeed testing if there is a significant difference between the average patient load of type A and type B.
Best, Martin

#### jamesmartinn

##### Member
Thanks for the reply. I'm almost positive a paired t-test is the right analysis, I'm just wondering if my unit of analysis (individual averages based on different denominators) is legit or not.
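A small sketch of why the denominators matter: averaging the per-physician averages gives every physician the same weight, while pooling patients and days weights each physician by days worked, and the two can disagree. The numbers below are hypothetical B-type totals, and the function names are made up for this illustration.

```python
# Hypothetical (patients, days) totals for B-type days, one tuple per physician.
b_type = [(27, 3), (40, 10), (50, 10), (24, 6)]

def mean_of_ratios(records):
    """Unweighted mean of per-physician averages (the paired t-test's unit)."""
    return sum(p / d for p, d in records) / len(records)

def ratio_of_sums(records):
    """Pooled rate: total patients over total days (weights by days worked)."""
    return sum(p for p, _ in records) / sum(d for _, d in records)

unweighted = mean_of_ratios(b_type)
pooled = ratio_of_sums(b_type)
print(f"mean of averages = {unweighted:.3f}, pooled rate = {pooled:.3f}")
```

In this toy data the physician with only 3 days pulls the unweighted mean up, because an average based on 3 days counts as much as one based on 10.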

#### hlsmith

##### Omega Contributor
May be missing something, but I would imagine that this may be best answered using a hierarchical model.

Patient load = condition while controlling for physicians. However, your dependent variable seems to be a count and this approach may require the actual data.
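One possible way to write such a model down, offered as a sketch and not necessarily the exact specification intended here, is a Poisson model with a random intercept per physician. The symbols are assumptions for illustration: $y_{ij}$ is the number of patients physician $i$ saw on day $j$, and $x_{ij}$ is 1 for a B-type day and 0 for an A-type day.

```latex
y_{ij} \sim \mathrm{Poisson}(\mu_{ij}), \qquad
\log \mu_{ij} = \beta_0 + \beta_1 x_{ij} + u_i, \qquad
u_i \sim \mathcal{N}(0, \sigma_u^2)
```

Under this formulation $\exp(\beta_1)$ is the ratio of expected patients per day between the two conditions, and $u_i$ absorbs physician-level differences in baseline volume.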

#### jamesmartinn

##### Member
Thanks for the reply hlsmith! Can you elaborate more? I'll try to give some more details too. I think the way I described the conditions might be a bit misleading so here it goes:

I'm interested in physicians who work a night shift, and immediately after, without a break, go right into working a day shift.

Right now, I have the data in the following form. I've organized it as such to present some aggregate results for the PI. It's in wide-format.

ID - Identifies Physician
N-Nights - Total number of night shifts they worked for a given year
N-Nights-Patients - Total number of patients seen at night for a given year
N-Days - Total number of day shifts worked directly following a night shift for a given year
N-Days-Patients - Total number of patients seen during those day-following-night shifts for a given year

an example:

Suppose that Dr. Bob worked a total of 10 night shifts for the year. He saw a total of 20 patients during those night shifts. I'd take his average to be 20/10 = 2 patients a night shift.

Now, out of those 10 night shifts worked, Dr. Bob continued working into the day for 2 of them. During those two "day-following-night" shifts he saw a total of 10 patients. I'd take this average to be 10/2 = 5 patients per day-following-night shift.

Therefore N-Days will always be less than or equal to N-Nights.
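As a sketch, the wide-format record and the two averages from the Dr. Bob example could be represented like this. The class name `PhysicianYear` and its field names are hypothetical stand-ins for the columns listed above.

```python
from dataclasses import dataclass

@dataclass
class PhysicianYear:
    physician_id: str
    n_nights: int            # N-Nights
    n_nights_patients: int   # N-Nights-Patients
    n_days: int              # N-Days (day shifts directly following a night)
    n_days_patients: int     # N-Days-Patients

    def __post_init__(self):
        # Every counted day shift follows a night shift, so N-Days <= N-Nights.
        if self.n_days > self.n_nights:
            raise ValueError("N-Days cannot exceed N-Nights")

    def night_average(self):
        # None when the physician worked no night shifts that year.
        return self.n_nights_patients / self.n_nights if self.n_nights else None

    def day_average(self):
        # None when no night shift was followed by a day shift.
        return self.n_days_patients / self.n_days if self.n_days else None

bob = PhysicianYear("Dr. Bob", 10, 20, 2, 10)
print(bob.night_average(), bob.day_average())
```

Returning `None` for a missing average is one design choice; physicians with no day-following-night shifts would then have to be dropped from any paired comparison.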

I have access to the individual-level records, so I can definitely disaggregate the data if need be. I can tell, for example, the exact nights that were followed by a day shift, and for each shift I can tell exactly how many patients were seen.
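A minimal sketch of what the disaggregated form might look like, and how it rolls back up to the wide-format totals. The row layout, the key names, and the `aggregate` function are all assumptions made for illustration, and the data is invented.

```python
from collections import defaultdict

# Hypothetical shift-level rows: one per night shift. `day_patients` is None
# when that night was not followed by a day shift.
shifts = [
    {"id": "bob", "night_patients": 3, "day_patients": 6},
    {"id": "bob", "night_patients": 1, "day_patients": None},
    {"id": "bob", "night_patients": 2, "day_patients": 4},
    {"id": "ann", "night_patients": 5, "day_patients": None},
]

def aggregate(rows):
    """Roll shift-level rows up to per-physician wide-format totals."""
    totals = defaultdict(lambda: {
        "n_nights": 0, "n_nights_patients": 0,
        "n_days": 0, "n_days_patients": 0,
    })
    for row in rows:
        t = totals[row["id"]]
        t["n_nights"] += 1
        t["n_nights_patients"] += row["night_patients"]
        if row["day_patients"] is not None:
            t["n_days"] += 1
            t["n_days_patients"] += row["day_patients"]
    return dict(totals)

wide = aggregate(shifts)
print(wide["bob"])
```

Keeping one row per shift preserves the day-to-day variation that the aggregate totals discard.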

I have this type of data for 5 sequential years (2007-2011).

I look forward to hearing your modeling suggestions! Thanks!