# About Correlations between Mathematically Dependent Variables: Best Practice?

#### ByTheNumbers

##### New Member
Summary / Basic question: How do I correct for a logical (i.e., mathematical) dependency between two variables when I want to compute the correlation between them?

Specific Issue:

Analysis Goal: To assess whether or not medical facilities favor some Procedure X more or less as a function of their size.

Predictor Variable: Facility size is defined as the number of patients a facility handles on a daily basis, totaled over the course of a given month. The specific measure is “Total number of patient days”, which is the sum of every patient's length of stay during a given month (equivalently, the number of patients times their average length of stay). For example, if 10 patients stay at facility F for 5 days each in the month of May, Total number of patient days for the month of May is 10*5=50.

Predicted Variable: The rate at which a facility administers Procedure X per Patient day. The specific measure is: Procedure X Incident Rate = (Number of administrations of procedure X in a given month) / (Total number of patient days in a given month). On its own, Procedure X Incident Rate allows me to compare the practices of facilities to each other. I can tell for example what the normative rate is, and I can single out outliers that either rarely use procedure X or use it very often.
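
To make the two definitions concrete, here is a minimal sketch of how both measures could be computed. The function names and the count of 4 administrations are made up for illustration; only the 10 patients at 5 days each come from the example above.

```python
def total_patient_days(lengths_of_stay):
    """Sum each patient's days in the facility during the month."""
    return sum(lengths_of_stay)


def incident_rate(administrations, patient_days):
    """Administrations of Procedure X per patient day."""
    if patient_days <= 0:
        raise ValueError("patient_days must be positive")
    return administrations / patient_days


# 10 patients staying 5 days each, as in the May example.
may_stays = [5] * 10
tpd = total_patient_days(may_stays)

# 4 administrations is a hypothetical count, not real data.
rate = incident_rate(4, tpd)
print(tpd, rate)
```

Note that TPD enters the rate as its denominator, which is the source of the dependency discussed below.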

Where I get into trouble is when I want to assess whether there’s a correlation between facility size and facility use of procedure X. I might have a non-significant Pearson’s r, and a graph that shows a flat line of facilities that all use Procedure X at the same rate regardless of their size. Does that actually mean that size doesn’t matter? And if not, can I continue to use Pearson’s r if I correct for the logical dependency between the predictor and predicted variables?

What do I mean by logical dependency? Basically that “Total Number of Patient Days” appears in both variables.

In more detail, what I mean by dependency is this:
I know that Total number of patient days (TPD) correlates perfectly with itself: r=1.
TPD
|............... *
|............*
|........*
|.....*
______________
TPD

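I know that the inverse of Total number of patient days (1/TPD) is perfectly negatively related to Total number of patient days (TPD): the rank correlation is -1 (Pearson’s r is strongly negative but not exactly -1, because the relationship is a curve rather than a straight line).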

1/TPD
|.....*
|........*
|...........*
|...............*
______________
TPD
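
The relationship between TPD and 1/TPD can be checked numerically. The sketch below uses made-up TPD values and two small helper functions, `pearson_r` and `ranks`, written here for illustration. It computes Pearson's r on the raw values and on their ranks (the rank version is Spearman's correlation, assuming no ties). Because 1/TPD is a monotone but nonlinear function of TPD, the rank correlation comes out at -1 while Pearson's r stays between -1 and 0.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation of two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return sxy / math.sqrt(sxx * syy)


def ranks(values):
    """Ranks 1..n in ascending order; assumes there are no ties."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    out = [0] * len(values)
    for rank, i in enumerate(order, start=1):
        out[i] = rank
    return out


# Hypothetical monthly patient-day totals for five facilities.
tpd = [100, 200, 400, 800, 1600]
inv_tpd = [1 / t for t in tpd]

r_pearson = pearson_r(tpd, inv_tpd)
r_spearman = pearson_r(ranks(tpd), ranks(inv_tpd))
print(r_pearson, r_spearman)
```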

If I multiply the predicted variable (1/TPD) by the number of administrations of Procedure X (#A) so that I end up with my Procedure X incident rate, I’ve recreated the correlation I originally wanted to compute between administration rate and facility size. But is this interpretable? Do an r=0 and a horizontal line in the graph below mean that there’s an r=+0.5 because I measure the departure from an r=-1 rather than r=0? Or does it mean that I really have an r=0 and no correlation between administration rate and facility size? Should I be approaching the problem differently to begin with?

#A/TPD
|
|.....*.....*.....*.....*
|
|______________
TPD
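
One way to explore this question is a small simulation on synthetic data. The sketch below is not an answer from the literature; all numbers, distributions, and names (`simulate`, `pearson_r`) are assumptions chosen for illustration. It compares two made-up scenarios: in A, every facility has its own per-patient-day rate that is unrelated to size, so the number of administrations scales with TPD; in B, the number of administrations itself is unrelated to size, so dividing by TPD builds a negative relationship into the rate.

```python
import math
import random


def pearson_r(xs, ys):
    """Pearson correlation of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def simulate(n_facilities=2000, seed=0):
    """Return (r_A, r_B): correlation of TPD with #A/TPD in each scenario."""
    rng = random.Random(seed)
    # Synthetic facility sizes in patient days (assumed range).
    tpd = [rng.uniform(100, 1000) for _ in range(n_facilities)]

    # Scenario A: the underlying rate does not depend on size.
    true_rate = [rng.gauss(0.05, 0.005) for _ in tpd]
    admin_a = [r * t for r, t in zip(true_rate, tpd)]
    rate_a = [a / t for a, t in zip(admin_a, tpd)]

    # Scenario B: the raw count does not depend on size.
    admin_b = [rng.uniform(10, 50) for _ in tpd]
    rate_b = [a / t for a, t in zip(admin_b, tpd)]

    return pearson_r(tpd, rate_a), pearson_r(tpd, rate_b)


r_a, r_b = simulate()
print(f"Scenario A (rate unrelated to size):  r = {r_a:.3f}")
print(f"Scenario B (count unrelated to size): r = {r_b:.3f}")
```

Under these assumptions, scenario A gives a correlation near 0 and a flat scatter like the one sketched above, while scenario B gives a clearly negative correlation. In this toy setting, r=0 between the rate and TPD lines up with "the rate does not depend on size", and the shared denominator only produces a correlation when the raw count fails to scale with size.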

Any insight, reference to outside literature, or partial or complete answer is welcome. Thanks for your help!