# Thread: Using regression to compensate for matching?

1. ## Using regression to compensate for matching?

Suppose I am using four groups for a study:

- South Asian patients, n = 41
- First Nation patients, n = 48
- Arabic patients, n = 26
- South Asian controls, n = 70

The controls are not matched on age or gender to any of the patient groups.

However, if I show through regression analysis that ethnicity (a categorical variable: South Asian/First Nation/Arabic/South Asian control) is not predicted by age and gender (i.e., neither predictor is statistically significant), does that compensate for not matching on age and gender?

2. ## Re: Using regression to compensate for matching?

I think you have the right idea, but your description is wonky. Are you trying to say that the distributions of age and gender were the same in the groups? What would happen if you tried to do this with small samples (low power)? What type of regression did you run?

3. ## Re: Using regression to compensate for matching?

At least from my perspective, this is a sample issue, not a population issue, so statistical significance is not really appropriate. What I think is more important is to show that the differences in gender and/or age between the ethnic groups are small.
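
A common way to put a number on "the difference is small" is the standardized mean difference (SMD) between two groups. The sketch below computes it with the Python standard library, assuming age is numeric and gender is coded 0/1; the function names are hypothetical. A cutoff of about |SMD| < 0.1 is often used as a rule of thumb for adequate balance, but that threshold is a convention rather than something established in this thread.

```python
import math
import statistics


def _standardize(diff, var_a, var_b):
    """Divide a difference by the pooled SD sqrt((var_a + var_b) / 2)."""
    pooled_sd = math.sqrt((var_a + var_b) / 2)
    if pooled_sd == 0:
        # Both groups are constant: no imbalance if they agree, otherwise infinite
        return 0.0 if diff == 0 else math.copysign(math.inf, diff)
    return diff / pooled_sd


def smd_continuous(a, b):
    """SMD for a numeric variable such as age (each group needs >= 2 values)."""
    diff = statistics.mean(a) - statistics.mean(b)
    return _standardize(diff, statistics.variance(a), statistics.variance(b))


def smd_binary(a, b):
    """Standardized difference in proportions for a 0/1 variable such as gender."""
    p_a = sum(a) / len(a)
    p_b = sum(b) / len(b)
    return _standardize(p_a - p_b, p_a * (1 - p_a), p_b * (1 - p_b))
```

In this setup, each patient group would be compared with the South Asian controls, once for age with `smd_continuous` and once for gender with `smd_binary`.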

4. ## Re: Using regression to compensate for matching?

Originally Posted by Lazar

> At least from my perspective, this is a sample issue, not a population issue, so statistical significance is not really appropriate. What I think is more important is to show that the differences in gender and/or age between the ethnic groups are small.

So I am thinking of two checks on the data:

1. Show that age and gender do not differ significantly across the groups (p > 0.05).

2. Show that age and gender do not predict ethnicity (regression).

Given these are met, do you think this would compensate for the lack of age and gender matching?
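
For gender, check 1 can be run as a Pearson chi-square test of independence on a group × gender table. The sketch below is a standard-library version under these assumptions: counts are given as one row per group, and the p-value comes from the closed-form chi-square survival function for integer degrees of freedom. The counts in the usage lines are invented for illustration. As noted earlier in the thread, p > 0.05 only means a difference was not detected, not that it is small.

```python
import math


def chi2_sf(x, df):
    """P(X > x) for a chi-square variable with integer df >= 1.

    Starts from the closed forms for df = 1 or df = 2 and applies
    sf_{k+2}(x) = sf_k(x) + (x/2)^(k/2) * exp(-x/2) / Gamma(k/2 + 1).
    """
    if x <= 0:
        return 1.0
    if df % 2 == 1:
        k, sf = 1, math.erfc(math.sqrt(x / 2))
    else:
        k, sf = 2, math.exp(-x / 2)
    while k < df:
        sf += math.exp((k / 2) * math.log(x / 2) - x / 2 - math.lgamma(k / 2 + 1))
        k += 2
    return sf


def chi2_independence(table):
    """Pearson chi-square test of independence for a list-of-rows count table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / grand
            stat += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(col_totals) - 1)
    return stat, df, chi2_sf(stat, df)


# Hypothetical male/female counts, for illustration only (rows: South Asian
# patients, First Nation patients, Arabic patients, South Asian controls)
example = [[20, 21], [25, 23], [12, 14], [36, 34]]
stat, df, p = chi2_independence(example)
print(f"chi2 = {stat:.2f}, df = {df}, p = {p:.3f}")
```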

5. ## Re: Using regression to compensate for matching?

To answer your question: I am trying to say that the age and gender distributions do not differ significantly across the groups, and therefore age and gender are not possible confounds. Running the regression would then provide further evidence that age and gender do not confound the comparison between ethnic groups.
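
The regression check could be sketched, one pair of groups at a time (e.g., South Asian patients vs. South Asian controls), as a binary logistic regression of group membership on age and gender. This is a simplification: with all four groups at once, the proposed check would call for a multinomial logistic regression instead. The stdlib Newton-Raphson fitter below is illustrative only, its function names are hypothetical, and it does not handle perfect separation. Looking at the size of the fitted coefficients (log-odds per year of age, and for gender) says more about the amount of imbalance than their p-values do.

```python
import math


def solve(a, b):
    """Solve the square linear system a x = b by Gauss-Jordan elimination."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        # Partial pivoting for numerical stability
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                for c in range(col, n + 1):
                    m[r][c] -= factor * m[col][c]
    return [m[i][n] / m[i][i] for i in range(n)]


def fit_logistic(xs, ys, iterations=25):
    """Fit P(y = 1) = 1 / (1 + exp(-X b)) by Newton-Raphson.

    xs: covariate rows such as [age, gender]; an intercept column is added.
    ys: 0/1 labels, e.g. 1 = patient group, 0 = control group.
    Returns [intercept, coefficient for each covariate].
    """
    rows = [[1.0] + list(x) for x in xs]
    p = len(rows[0])
    beta = [0.0] * p
    for _ in range(iterations):
        grad = [0.0] * p                        # X'(y - mu)
        hess = [[0.0] * p for _ in range(p)]    # X'WX with W = mu(1 - mu)
        for row, y in zip(rows, ys):
            eta = sum(b * v for b, v in zip(beta, row))
            mu = 1.0 / (1.0 + math.exp(-eta))
            w = mu * (1.0 - mu)
            for j in range(p):
                grad[j] += (y - mu) * row[j]
                for k in range(p):
                    hess[j][k] += w * row[j] * row[k]
        step = solve(hess, grad)
        beta = [b + s for b, s in zip(beta, step)]
    return beta
```

If the two groups had identical age and gender distributions, the age and gender coefficients would come out at zero and only the intercept (the log-odds of being in the patient group) would remain.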

Would this compensate for not having age and gender matched controls?

For comparison, here is an example of a design in which the controls are age- and gender-matched to the cases:

- Case group: n = 50
- Control group: n = 50, age- and gender-matched
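
A matched design like this one could be built with a simple greedy 1:1 scheme: exact match on gender, nearest age within a caliper, and each control used at most once. This is one common matching recipe rather than the only one. The caliper of 2 years, the `(id, age, gender)` record layout, and the function name are assumptions for illustration.

```python
def match_controls(cases, controls, caliper=2):
    """Greedy 1:1 matching: same gender, closest age within `caliper` years.

    cases, controls: lists of (id, age, gender) tuples.
    Returns (case_id, control_id) pairs. Cases with no eligible control
    are left unmatched, and results depend on the order of `cases`.
    """
    available = list(controls)  # copy so the caller's list is not modified
    pairs = []
    for case_id, age, gender in cases:
        candidates = [c for c in available
                      if c[2] == gender and abs(c[1] - age) <= caliper]
        if not candidates:
            continue
        best = min(candidates, key=lambda c: abs(c[1] - age))
        available.remove(best)  # each control is used at most once
        pairs.append((case_id, best[0]))
    return pairs
```

Because the search is greedy, a case processed early can take a control that would have been a better match for a later case. Optimal matching avoids this at the cost of more computation.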

6. ## Re: Using regression to compensate for matching?

Originally Posted by muaaman

> So I am thinking of two checks on the data:
>
> 1. Show that age and gender do not differ significantly across the groups (p > 0.05).
>
> 2. Show that age and gender do not predict ethnicity (regression).
>
> Given these are met, do you think this would compensate for the lack of age and gender matching?

As I noted above, whether a dataset is balanced is a question about the sample, not the population, so you should not judge it with statistical significance. Statistical significance does not tell you whether a difference is large, and this is my problem with it: the effect of gender could be significant yet do little to bias the estimated effect of ethnicity on the outcome, or vice versa. Hence I would focus on effect-size metrics.
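
To illustrate why significance is a poor guide to balance, the sketch below holds an age imbalance fixed at a standardized difference of 0.3 and varies only the group sizes. It uses a two-sample z-test (normal approximation, both groups with SD 1) to stay within the standard library; the SMD of 0.3 and the sample sizes are arbitrary choices for illustration.

```python
import math


def two_sided_p(smd, n1, n2):
    """Two-sided z-test p-value for a standardized mean difference `smd`.

    Assumes both groups have SD 1, so the standard error of the
    difference in means is sqrt(1/n1 + 1/n2).
    """
    z = smd / math.sqrt(1 / n1 + 1 / n2)
    return math.erfc(abs(z) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))


# Same imbalance (SMD = 0.3), sample sizes differing by a factor of 10
for n1, n2 in [(26, 70), (260, 700)]:
    print(n1, n2, two_sided_p(0.3, n1, n2))
```

With these inputs, the smaller sample gives p well above 0.05 (z is about 1.3) and the larger one gives p far below it (z is about 4.1), even though the imbalance is identical.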

In any case, your underlying model sounds like:

```
          gender/age
          ^        \
         /          \
        /            \
       /              v
      x ------------> y
```
Thus, to get an unbiased estimate of the effect of x (ethnicity) on your outcome y, you need to break one of the two links through gender/age (either one; you do not need both). You can break the link between x and gender/age, which is typically done via a randomised controlled trial or matching. Alternatively, you can break the link between gender/age and y, which is typically done via regression adjustment such as ANCOVA. Either is fine.
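
The regression-adjustment route can be sketched as an ordinary least-squares fit of y on group indicators plus age and gender, which is the ANCOVA setup. The group coefficients are then age- and gender-adjusted differences from a reference group. In this sketch, the reference group is assumed to be the South Asian controls, the group labels and helper names are hypothetical, and the small Gauss-Jordan solver stands in for a statistics package, which would also report standard errors.

```python
# Reference group first; its mean is absorbed into the intercept.
GROUPS = ["South Asian control", "South Asian patient",
          "First Nation patient", "Arabic patient"]


def solve(a, b):
    """Solve the square linear system a x = b by Gauss-Jordan elimination."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                for c in range(col, n + 1):
                    m[r][c] -= factor * m[col][c]
    return [m[i][n] / m[i][i] for i in range(n)]


def design_row(group, age, gender):
    """Intercept, one dummy per non-reference group, then age and gender (0/1)."""
    dummies = [1.0 if group == g else 0.0 for g in GROUPS[1:]]
    return [1.0] + dummies + [float(age), float(gender)]


def ols(rows, ys):
    """Least-squares coefficients from the normal equations X'X b = X'y."""
    p = len(rows[0])
    xtx = [[sum(r[j] * r[k] for r in rows) for k in range(p)] for j in range(p)]
    xty = [sum(r[j] * y for r, y in zip(rows, ys)) for j in range(p)]
    return solve(xtx, xty)
```

The returned list is `[intercept, SA patient, FN patient, Arabic patient, age, gender]`. Because age and gender enter the outcome model directly, this route does not require matched age and gender distributions, provided the linear model for age and gender is adequate.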
