Hi there - I hope someone can help me out - it's been a long time since college stats classes...

I have five business groups, and for each one I have a performance metric (a percentage score). I am looking for factors that might explain that metric (I understand that correlation <> causation!).

The groups have offices (between 20 and 90, each group has a different number), they have revenue (varies with each group), and the offices are in different locations. Some of the offices are in locations that are on a list of 20 places that are problematic - we'll call this the bad location list. All the metrics are for the group as a whole, not the individual offices.

OK, so, number of offices correlates well with performance (r^2 = .87). Percentage of offices that are on the bad list also correlates well, but negatively (about -.8).
So far so good: the more offices, the better, and the fewer of them on the bad list, the better. What I'm worried about is that the larger the number of offices, the lower the percentage on the bad list CAN be. That is to say, there are only 20 'bad' locations, so once a group has more than 20 offices, the highest percentage it could possibly have on the bad list starts to fall. How can I deal with this? I want to disaggregate the effect of size per se from the effect of bad locations.
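To make the ceiling I'm worried about concrete, here is a rough sketch in Python. It assumes a group has at most one office in any given location (my assumption, so that at most 20 of a group's offices can be on the bad list). The office counts in the loop are made-up values within the 20 to 90 range, not my real data, and the function name is just for illustration.

```python
BAD_LIST_SIZE = 20  # number of places on the bad location list (from the post)


def max_bad_percentage(n_offices, bad_list_size=BAD_LIST_SIZE):
    """Highest possible percentage of a group's offices on the bad list.

    Assumes at most one office per location, so a group can have
    at most `bad_list_size` offices on the list.
    """
    if n_offices <= 0:
        raise ValueError("n_offices must be positive")
    return 100.0 * min(n_offices, bad_list_size) / n_offices


# Hypothetical office counts, chosen only to show how the ceiling falls.
for n in (20, 40, 60, 90):
    print(f"{n} offices: at most {max_bad_percentage(n):.1f}% on the bad list")
```

So a group with 20 offices could in principle be 100% on the bad list, while a group with 90 offices can never be above roughly 22%, no matter where its offices are. That is the built-in link between size and bad-list percentage that I'd like to separate out.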

Thanks so much!