# How to analyze a categorical confounding variable for experimental results

#### wynand

##### New Member
I ran an experiment. The experiment had 3 conditions (C1, C2, C3). The results I analyzed are either countably infinite variables (e.g., number of points) or values from a finite set of numbers (e.g., "on a scale of 1-5, how much did you like this?").

I have two questions about this data and possible confounds:

1) Within each condition, participants differed on a variable, say X (not the condition they were in), that had 3 possible values. What would be the best way to control for X to eliminate it as a confound? If I do a chi-square test of independence, I can only find evidence on whether the proportions differ significantly between conditions, not evidence that the null hypothesis is true (i.e., that the proportion of people at each value of X is roughly equal across conditions), right? Someone suggested log-linear analysis, and someone else suggested stratified analysis (but I think my strata would then have too small a sample size...).

2) After the experiment, people had the option of filling in a post-survey. I would like to determine whether the people who chose to fill in the survey were evenly distributed among the 3 conditions. Can I use a chi-squared test of homogeneity for this and, if the result is not significant, state that there were no significant differences in the number of people per condition who filled out the post-survey, and thus that the post-survey results are not biased toward a particular condition?

Thanks for any help!

#### Karabiner

##### TS Contributor
> I ran an experiment. The experiment had 3 conditions (C1, C2, C3). The results I analyzed are either countably infinite variables (e.g., number of points) or values from a finite set of numbers (e.g., "on a scale of 1-5, how much did you like this?").

So you will analyse this using one-way analysis of variance (infinite variables) or the Kruskal-Wallis H-test (rating scales)?
How were participants allocated to conditions? Was there randomization?

> 1) Within each condition, participants differed on a variable, say X (not the condition they were in), that had 3 possible values. What would be the best way to control for X to eliminate it as a confound?

It depends on your statistical analysis. If you use one-way analysis of variance, then you can add variable X as a factor,
so that you have a two-way analysis of variance. In the case of the rating scales (ordinal variables), you could perform
an ordinal logistic regression with "condition" and "variable X" as predictors.

> I would like to determine whether the people who chose to fill in the survey were evenly distributed among the 3 conditions. Can I use a chi-squared test of homogeneity for this and, if the result is not significant, state that there were no significant differences in the number of people per condition who filled out the post-survey, and thus that the post-survey results are not biased toward a particular condition?

Seemingly, you are not interested in the theoretical question of whether the variable
"filling out a post-survey" is associated with the variable "condition" in the population;
instead, you seemingly just want to answer the practical question of whether results
within your sample could be biased. Therefore, a statistical test of significance would
be useless here, since it is performed in order to make inferences about the population.

In addition, a non-significant result would not prove that there could not be a bias,
because you say you have a small sample size, and small samples could be responsible
for a false-negative test result.

I would just look at the descriptive statistics of the sample (% of post-survey
participants in conditions 1, 2, 3) and try to judge whether the difference is
so large that it could produce bias. There are no objective statistical criteria
for this.

With kind regards

Karabiner

#### wynand

##### New Member
@Karabiner Thank you for your thorough response.

> So you will analyse this using one-way analysis of variance (infinite variables) or the Kruskal-Wallis H-test (rating scales)?

Yes, I ran Shapiro-Wilk normality tests, and they indicated that none of the data was normally distributed, so I ran Kruskal-Wallis omnibus tests and then Dunn post-hoc tests for pairwise comparisons to report the results.

> How were participants allocated to conditions? Was there randomization?

Yes, random assignment.

> It depends on your statistical analysis. If you use one-way analysis of variance, then you can add variable X as a factor,
> so that you have a two-way analysis of variance. In the case of the rating scales (ordinal variables), you could perform
> an ordinal logistic regression with "condition" and "variable X" as predictors.

Since I used Kruskal-Wallis, can I do ordinal logistic regression for both the finite (scale of 1-5) and the infinite ordinal variables (e.g., # of points)? And for this regression, are the dependent variables the same variables used in the analysis I mentioned in the first part of this message?

An alternative I was thinking of is stratified analysis, but participants aren't evenly distributed among the strata of the confounder. Is either regression or stratified analysis preferable in cases like this?

One other way of looking at this, possibly: I'm interested in a test for equality of the proportions at each value of the confounder variable X in each condition. (If the proportions of people were close to equal for each value of X in each condition, would this indicate that the results I found in the initial analysis are not due to the confounder?)

> Seemingly, you are not interested in the theoretical question of whether the variable
> "filling out a post-survey" is associated with the variable "condition" in the population;
> instead, you seemingly just want to answer the practical question of whether results
> within your sample could be biased. Therefore, a statistical test of significance would
> be useless here, since it is performed in order to make inferences about the population.

I'm actually interested in both. For example, maybe the experience in condition 1 was so bad that nobody bothered to fill out the post-survey. Is the test of homogeneity helpful for this?

> In addition, a non-significant result would not prove that there could not be a bias,
> because you say you have a small sample size, and small samples could be responsible
> for a false-negative test result.

Yes, the total number of people filling out the survey is n=26 (c1=9, c2=8, c3=9), so it's very close. I was hoping to just state this objectively somehow.



#### Karabiner

##### TS Contributor
> Since I used Kruskal-Wallis, can I do ordinal logistic regression for both the finite (scale of 1-5) and the infinite ordinal variables (e.g., # of points)? And for this regression, are the dependent variables the same variables used in the analysis I mentioned in the first part of this message?

I am not quite sure what you mean. My suggestion was to use the rating (1-5) scale as the dependent variable in an ordinal regression, with condition and other variables as predictors. Whether # of points could be used here depends on the actual range of that variable. If # of points has a wide range, then an ordinal regression is not useful (as far as I know) and you had better use ANOVA or linear regression.

> An alternative I was thinking of is stratified analysis, but participants aren't evenly distributed among the strata of the confounder. Is either regression or stratified analysis preferable in cases like this?

What exactly do you mean by stratified analysis here?

> One other way of looking at this, possibly: I'm interested in a test for equality of the proportions at each value of the confounder variable X in each condition. (If the proportions of people were close to equal for each value of X in each condition, would this indicate that the results I found in the initial analysis are not due to the confounder?)

of the confounder could cause trouble.

> Yes, the total number of people filling out the survey is n=26 (c1=9, c2=8, c3=9), so it's very close. I was hoping to just state this objectively somehow.

I can hardly imagine that anyone would not regard this as an equal distribution.
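
If a number is wanted anyway, here is a minimal Python sketch of a chi-square goodness-of-fit check on the 9/8/9 counts. It assumes that equal numbers of participants were assigned to each condition, which this thread does not state, so the equal expected counts are an assumption; the function name is purely illustrative. With 3 categories the test has 2 degrees of freedom, and for df = 2 the chi-square tail probability is exactly exp(-x/2), so no statistics library is needed.

```python
import math

def chi_square_gof_equal(counts):
    """Chi-square goodness-of-fit statistic against equal expected counts."""
    n = sum(counts)
    expected = n / len(counts)  # assumption: equal group sizes at assignment
    return sum((c - expected) ** 2 / expected for c in counts)

counts = [9, 8, 9]  # post-survey respondents per condition, from the thread
chi2 = chi_square_gof_equal(counts)

# df = 3 - 1 = 2; for df = 2 the chi-square survival function is exp(-x / 2).
p_value = math.exp(-chi2 / 2)

shares = [c / sum(counts) for c in counts]
print(f"chi2 = {chi2:.3f}, p = {p_value:.3f}")
print("share per condition:", [f"{s:.1%}" for s in shares])
```

With n = 26, a large p-value mostly reflects how close the counts are; as said earlier in the thread, it does not prove the absence of bias.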

With kind regards

Karabiner

#### wynand

##### New Member
> I am not quite sure what you mean. My suggestion was to use the rating (1-5) scale as the dependent variable in an ordinal regression, with condition and other variables as predictors. Whether # of points could be used here depends on the actual range of that variable. If # of points has a wide range, then an ordinal regression is not useful (as far as I know) and you had better use ANOVA or linear regression.

Yes, I meant the case where the variable has a larger range... linear regression sounds right.

> What exactly do you mean by stratified analysis here?

I meant holding the (possible) confounding variable constant at each of its 3 values and then running the same statistical analysis on each subset of data as in the main analysis described previously (i.e., Kruskal-Wallis, then Dunn). I think that if you can show the dependent variable has the same relationship within each value of the confounder as it does in the aggregated data, that rules it out as a confounder. (I have seen suggestions that if the new values, with the confounder held constant, are not off by more than 10%, that helps rule out the confounder.)
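
The stratified analysis described above could be sketched like this in plain Python. Everything in `ratings` is made-up illustration data (the thread gives no raw data), and `average_ranks` and `kruskal_wallis_3` are hypothetical helper names. The p-value uses the usual chi-square approximation, which for 3 groups (df = 2) reduces to exp(-H/2); with strata this small that approximation is rough, and the Dunn post-hoc step is not shown.

```python
import math
from collections import Counter

def average_ranks(values):
    """Rank values from 1..n, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def kruskal_wallis_3(groups):
    """Tie-corrected Kruskal-Wallis H and approximate p-value for 3 groups."""
    assert len(groups) == 3 and all(groups), "needs 3 non-empty groups"
    pooled = [v for g in groups for v in g]
    n = len(pooled)
    ranks = average_ranks(pooled)
    total, start = 0.0, 0
    for g in groups:
        rank_sum = sum(ranks[start:start + len(g)])
        total += rank_sum ** 2 / len(g)
        start += len(g)
    h = 12.0 / (n * (n + 1)) * total - 3 * (n + 1)
    correction = 1 - sum(t ** 3 - t for t in Counter(pooled).values()) / (n ** 3 - n)
    if correction == 0:  # every value identical: no evidence of a difference
        return 0.0, 1.0
    h = max(h / correction, 0.0)
    return h, math.exp(-h / 2)  # chi-square tail for df = 2

# Hypothetical 1-5 ratings, keyed by confounder value X, then by condition.
ratings = {
    "x1": {"C1": [2, 3, 1], "C2": [3, 4, 3], "C3": [5, 4, 4]},
    "x2": {"C1": [1, 2, 2], "C2": [3, 3, 4], "C3": [4, 5, 5]},
    "x3": {"C1": [2, 2, 3], "C2": [4, 3, 3], "C3": [5, 5, 4]},
}
conditions = ("C1", "C2", "C3")

for x_value, by_condition in ratings.items():
    h, p = kruskal_wallis_3([by_condition[c] for c in conditions])
    print(f"X = {x_value}: H = {h:.2f}, approx. p = {p:.3f}")

# Same test on the aggregated data, for comparison with the strata.
aggregated = [[r for s in ratings.values() for r in s[c]] for c in conditions]
h_all, p_all = kruskal_wallis_3(aggregated)
print(f"aggregated: H = {h_all:.2f}, approx. p = {p_all:.3f}")
```

Comparing the direction and rough size of the condition effect across strata is probably more informative here than the per-stratum p-values, since each stratum has very few observations.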

> of the confounder could cause trouble.

This seems to me like it would be the simplest thing to do, but I am a little confused about how to demonstrate it, for two reasons.

First, per one of your previous posts, there is no real population here. I just want to show that within this sample the distribution of the confounder is roughly equal in each condition, not that this distribution would be representative of all future experiments run in this way (although that might indeed be the case).

Second, I had originally thought that a non-significant result of a chi-square test of homogeneity would provide evidence of this (assuming a big enough sample size). But the null hypothesis of this test is that the proportions for each condition are equal, so a non-significant result just says not to reject the null, which I don't think I can argue is evidence that the proportions are equal, right? Is there another test of equality I can do where the alternative hypothesis is that the proportions are equal? I'm looking for one that works with a large sample size and one that works with a smaller sample size, if it exists.
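
To make the point about the chi-square test concrete, here is a minimal sketch for a 3 x 3 table of condition by X. The counts in `table` are invented for illustration, and the function names are hypothetical. For a 3 x 3 table df = 4, and for df = 4 the chi-square tail probability has the closed form exp(-x/2) * (1 + x/2). The sketch also prints Cramér's V as a descriptive measure of how strongly X and condition are associated in the sample; that is one possible option, not something proposed in this thread.

```python
import math

def chi_square_independence(table):
    """Pearson chi-square statistic and total n for a contingency table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed - expected) ** 2 / expected
    return chi2, n

def chi2_sf_df4(x):
    """Upper tail probability of the chi-square distribution with df = 4."""
    return math.exp(-x / 2) * (1 + x / 2)

def cramers_v(chi2, n, n_rows, n_cols):
    """Cramér's V: 0 means no association in the sample, 1 means perfect."""
    return math.sqrt(chi2 / (n * (min(n_rows, n_cols) - 1)))

# Hypothetical counts: rows are conditions C1..C3, columns are values of X.
table = [
    [10, 12, 8],
    [11, 9, 10],
    [9, 11, 10],
]

chi2, n = chi_square_independence(table)
p_value = chi2_sf_df4(chi2)
v = cramers_v(chi2, n, len(table), len(table[0]))
print(f"chi2 = {chi2:.3f}, p = {p_value:.3f}, Cramér's V = {v:.3f}")
```

A large p-value here only means the null of equal proportions was not rejected; a small V describes the sample but is not a formal test of equality either.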

One other thought that occurred to me: might showing that the type II error rate of the chi-square test is low be evidence that the proportions are close to equal?
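
One way to explore the type II error idea is to estimate the power of the chi-square test by simulation. The sketch below is only an illustration: the group size of 30, the `unequal` proportions, and the function names are all assumptions, not values from the experiment. It uses 9.488 as the chi-square critical value for df = 4 at alpha = 0.05. Note that power is always relative to one specific alternative, so high power only says the test would likely have detected a difference of that size.

```python
import random

CRITICAL_DF4_05 = 9.488  # chi-square critical value for df = 4, alpha = 0.05

def chi2_stat(table):
    """Pearson chi-square statistic; cells with zero expected count are skipped."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            if expected > 0:
                stat += (observed - expected) ** 2 / expected
    return stat

def simulated_power(probs_by_condition, n_per_condition, n_sims=2000, seed=0):
    """Share of simulated 3 x 3 tables in which the test rejects at alpha = 0.05."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(n_sims):
        table = []
        for probs in probs_by_condition:
            draws = rng.choices(range(len(probs)), weights=probs, k=n_per_condition)
            table.append([draws.count(level) for level in range(len(probs))])
        if chi2_stat(table) > CRITICAL_DF4_05:
            rejections += 1
    return rejections / n_sims

# Hypothetical distributions of X within each of the 3 conditions.
equal = [[1 / 3, 1 / 3, 1 / 3]] * 3
unequal = [[0.5, 0.3, 0.2], [0.3, 0.4, 0.3], [0.2, 0.3, 0.5]]

print("rejection rate when proportions are equal:", simulated_power(equal, 30))
print("estimated power against 'unequal':", simulated_power(unequal, 30))
```

One minus the second number is the estimated type II error rate against that particular alternative; smaller imbalances than `unequal` would be detected less often.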

> I can hardly imagine that anyone would not regard this as an equal distribution.

I agree.

Thank you for suggestions.
