Research Problem - Subtle Differences: Are They Significant?

#1
Greetings all,
I have a weird problem I could use some advice on.

Scenario: We are revamping a public health label in hopes that it will attract more attention and reduce illnesses. We have designed 25 new versions of the label that vary visual design elements (font, border weighting, border type, placement of icons, etc.) to determine which labels are most salient to consumers. Label salience (i.e., the degree of attention participants pay to the label) will be assessed using a limited-time exposure approach with cued recall questions. We have enough statistical power and our N is large enough.

We originally intended to analyze the differences using an ANOVA and then rank the labels based on "significance". We would then move the top 5 most "significant" labels onto the next phase of the experiment.

Differences between these labels will not be huge because we're only varying minor design elements. Now we're stuck, though, because we're left asking: what does significance here actually mean?

Anyone have any ideas on either a test to better identify the differences, or another way to handle this?
Our goal is just to find the 3-5 labels that attract the most attention out of 25 labels that differ only slightly.

I would appreciate any and all suggestions! It's keeping everyone awake at night as we plan for this study.
 
#3
Greetings all,
...We have designed 25 new versions of the label...We have enough statistical power and our N is large enough.
I would suggest starting with maybe 5 different versions with distinct, general themes or attributes and making adjustments from there. Also, "enough statistical power and N is large enough" is a vague statement that isn't really helpful. Will participants be randomized to see a subset of the 25, or will each participant see all 25?

We originally intended to analyze the differences using an ANOVA and then rank the labels based on "significance". We would then move the top 5 most "significant" labels onto the next phase of the experiment.
Keep in mind that a p-value may fall below the alpha cutoff while the estimated effect or association is very small and practically unimportant. This should be considered when trying to rank which labels are "better".
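To see why, here's a quick sketch with made-up numbers (a hypothetical two-proportion comparison, not your actual design): a 2-point difference in recall rates that is clearly "significant" at a large N, yet tiny in practical terms.

```python
import math

# Hypothetical cued-recall rates for two label designs.
# These numbers are invented for illustration only.
n = 10000          # participants per label
p1, p2 = 0.50, 0.52

# Two-proportion z-test, computed by hand
p_pool = (p1 * n + p2 * n) / (2 * n)
se = math.sqrt(p_pool * (1 - p_pool) * (2 / n))
z = (p2 - p1) / se

# Two-sided p-value from the standard normal CDF
p_value = 2 * (1 - 0.5 * (1 + math.erf(abs(z) / math.sqrt(2))))

print(f"z = {z:.2f}, p = {p_value:.4f}")  # well below alpha = 0.05
print(f"effect = {p2 - p1:.2f}")          # yet only a 2-point difference
```

So with a big enough N, nearly all 25 labels could come out "significantly different" from each other while none of the differences matter in practice; the ranking should lean on the estimated effect sizes, not the p-values.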

Differences between these labels will not be huge because we're only varying minor design elements. Now we're stuck, though, because we're left asking: what does significance here actually mean?
This is something to think about. If you're anticipating small differences, are they really that important? If so, proceed. Statistical significance has the same meaning here as in other contexts: the p-value for the hypothesis test falls below some (hopefully) pre-selected alpha level chosen for the matter at hand. Combining the p-value with other information would lead you to make some inference about the population's preference for your designs.

Anyone have any ideas on either a test to better identify the differences, or another way to handle this?
Our goal is just to find the 3-5 labels that attract the most attention out of 25 labels that differ only slightly.

I would appreciate any and all suggestions! It's keeping everyone awake at night as we plan for this study.
Again, I would recommend removing 20 or so designs, keeping 5 that represent broader stylistic categories. You can always make finer adjustments within the "winners" of this smaller group. People are notoriously indecisive when presented with more options rather than fewer.
 
#4
I appreciate your help. We can't really rely on post hoc or ad hoc analyses, unfortunately (it's the way the rules are written for government work). I'm going to repost with more information. I'm so stuck on this :(
 

Karabiner

TS Contributor
#5
Maybe you want to use a Bayesian approach. This will give you estimates of the differences, which you can then judge with respect to practical significance or the like.
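A minimal sketch of what that could look like, using a conjugate Beta-Binomial model on cued-recall counts (the labels, counts, and flat Beta(1, 1) prior are all invented for illustration; nothing here comes from the actual study):

```python
import random

random.seed(0)

# Hypothetical (recalled, shown) counts for three of the 25 labels
data = {"label_A": (130, 200), "label_B": (122, 200), "label_C": (118, 200)}

# With a Beta(1, 1) prior, the posterior for each recall rate is
# Beta(1 + recalled, 1 + not recalled); sample from it directly.
def posterior_draws(hits, n, size=20000):
    return [random.betavariate(1 + hits, 1 + (n - hits)) for _ in range(size)]

draws = {k: posterior_draws(h, n) for k, (h, n) in data.items()}

# Posterior probability that label A's recall rate exceeds label B's,
# and the posterior mean of the difference itself.
pairs = list(zip(draws["label_A"], draws["label_B"]))
p_a_beats_b = sum(a > b for a, b in pairs) / len(pairs)
diff = sum(a - b for a, b in pairs) / len(pairs)

print(f"P(A > B) = {p_a_beats_b:.2f}, mean difference = {diff:.3f}")
```

The point is that you get the estimated difference itself (here a few percentage points), so you can ask directly whether it is big enough to matter, instead of only asking whether it clears an alpha cutoff.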