# Which tests to use?

#### stefansan

##### New Member
I have two problems where I'm puzzled about which test to use, and I would be very grateful for some help:

1) Say you ask 500 people what food they like, and give them a list of 100 dishes. Every person is allowed to like multiple different foods. This is an example list:

Apples 7%
Bananas 60%
Meat 5%
Sausage 1%
Plantains 40%
Water 3%
...

Let's assume most foods are liked by fewer than 10% of the people, but as you can see, 60% like bananas and 40% like plantains. What test would give me a p-value for how likely one or more foods are significantly more liked than expected? Or what percentage of liked would be the significant cutoff (say everything above 25% is significant)?

2) Say I take a group of 100 people, and each of them has around 2-5 medical symptoms on average, from a list of 50 symptoms total. I want to check whether there are symptoms that are significantly more likely to co-occur with other symptoms. Say symptom 1 (coughing) is present with 35 other symptoms total (in, for instance, 15 people). Is coughing statistically significantly more co-occurring with other symptoms?

#### Karabiner

##### TS Contributor
What test would give me a p-value for how likely one or more foods are significantly more liked than expected?
Within the frequentist framework (significance testing, hypothesis testing, null hypothesis significance testing), no test will give you a p-value that expresses how likely a certain hypothesis is. Apart from that, how would you define or derive the expected proportions here?

Or what percentage of liked would be the significant cutoff (say everything above 25% is significant)?
Unfortunately, no test can tell you which cutoff you can call significant (in the common sense, I suppose, not in the statistical significance testing sense). You presumably have to develop an idea of which proportions you would find unexpectedly/exceptionally/importantly high. Or you could try to compare between the dishes.

2) Say I take a group of 100 people, and each of them has around 2-5 medical symptoms on average, from a list of 50 symptoms total. I want to check whether there are symptoms that are significantly more likely to co-occur with other symptoms. Say symptom 1 (coughing) is present with 35 other symptoms total (in, for instance, 15 people). Is coughing statistically significantly more co-occurring with other symptoms?
More than what?

With kind regards

Karabiner

#### Miner

##### TS Contributor
You might try ANOM (Analysis of Means) for binomial data. The null hypothesis for ANOM is that no individual mean is different from the group mean.

#### Karabiner

##### TS Contributor
But ANOM is for independent samples (if I am correct), while here we have repeated-measures data.

With kind regards

Karabiner

#### Miner

##### TS Contributor
You are correct about independence of the samples. However, the design structure is unclear, so I am not seeing the repeated measures aspect.

#### Karabiner

##### TS Contributor
I derived the repeated-measures assumption from "Every person is allowed to like multiple different foods", but indeed the study is not presented very clearly.

With kind regards

Karabiner

#### Miner

##### TS Contributor
Hmm.. I suspect that food preferences such as those listed would be fairly independent unless you include similar foods (e.g., spaghetti/fettuccine). However, the second question regarding symptoms would definitely lack independence and the OP is expecting to see it. Maybe a Chi-square test for association would work here? Although some type of paired test would be more powerful.
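As a rough illustration of the chi-square test for association mentioned above, the sketch below tests whether two symptoms co-occur more (or less) often than independence would predict, using a 2x2 table of counts. The function name `chi_square_2x2`, the symptom pair, and the counts are all hypothetical, and the sketch uses Pearson's statistic without a continuity correction. It is not a paired test, and it handles only one pair of symptoms at a time, so testing many pairs would call for a multiple-comparison adjustment.

```python
import math


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square test of association for a 2x2 table.

    a: both symptoms present     b: only the first symptom present
    c: only the second present   d: neither symptom present
    Returns (statistic, p_value) for 1 degree of freedom,
    without a continuity correction.
    """
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    if 0 in (row1, row2, col1, col2):
        raise ValueError("a margin is zero; the test is undefined")
    stat = n * (a * d - b * c) ** 2 / (row1 * row2 * col1 * col2)
    # For 1 degree of freedom, the chi-square upper tail is erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value


# Hypothetical counts for 100 people (cough vs. fever), not from the thread.
stat, p = chi_square_2x2(a=12, b=8, c=20, d=60)
print(f"chi2 = {stat:.2f}, p = {p:.4f}")
```

With small expected cell counts (below about 5), an exact test such as Fisher's would usually be preferred over this approximation.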

#### stefansan

##### New Member
More than what?

Karabiner

First of all, thank you for your thoughts! My question is whether I can say that coughing correlates unusually highly with other symptoms. Given the data distribution or such, is there a way to say that there is one symptom (like coughing) or several that co-appear with other symptoms in a significant manner? Not sure if I'm being clear here... say all other symptoms are more or less randomly associated, but most people do have a cough, no matter what other symptoms are present. Can I put a statistic to that value instead of just saying "75% of people with random symptoms also had a cough", for instance?
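To make the quantities in this question concrete, here is a minimal sketch that computes them from per-person symptom sets: how many people have the target symptom, how many other symptoms it appears alongside in total, and what share of people with at least one other symptom also have it. The function name `co_occurrence_summary` and the five example records are hypothetical. The result is purely descriptive and is not a significance test.

```python
def co_occurrence_summary(records, target):
    """Describe how often `target` appears alongside other symptoms.

    records: list of sets, one set of symptom names per person.
    """
    with_target = [r for r in records if target in r]
    # People who have at least one symptom other than the target.
    with_others = [r for r in records if r - {target}]
    # Total number of other symptoms seen together with the target.
    partner_count = sum(len(r) - 1 for r in with_target)
    if with_others:
        share = sum(1 for r in with_others if target in r) / len(with_others)
    else:
        share = float("nan")
    return {
        "people_with_target": len(with_target),
        "co_occurring_symptoms": partner_count,
        "share_of_others_with_target": share,
    }


# Hypothetical records for five people, not data from the thread.
records = [
    {"cough", "fever"},
    {"cough", "headache", "nausea"},
    {"fever"},
    {"cough"},
    {"headache", "nausea"},
]
print(co_occurrence_summary(records, "cough"))
```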

#### stefansan

##### New Member
Hmm.. I suspect that food preferences such as those listed would be fairly independent unless you include similar foods (e.g., spaghetti/fettuccine). However, the second question regarding symptoms would definitely lack independence and the OP is expecting to see it. Maybe a Chi-square test for association would work here? Although some type of paired test would be more powerful.
Say the foods were very clearly different, and not associated at all. Would ANOM work?

#### stefansan

##### New Member
I derived the repeated-measures assumption from "Every person is allowed to like multiple different foods", but indeed the study is not presented very clearly.

With kind regards

Karabiner
What part is not clear? I appreciate your help!!

#### Miner

##### TS Contributor
Say the foods were very clearly different, and not associated at all. Would ANOM work?
I believe that a binomial ANOM would work well. If you have count data with counts above 10 for every food, you could also use a Poisson ANOM. Just keep in mind that the null hypothesis is that there is no difference between a food and the overall mean for all foods. You can end up with three possible results: 1) a food is statistically higher than the overall mean; 2) a food is statistically lower than the overall mean; or 3) a food is not statistically different from the overall mean.
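A minimal sketch of how binomial ANOM decision limits could be computed, producing the three outcomes described above. It rests on several assumptions: the function name `binomial_anom` is hypothetical, the exact ANOM critical value (normally read from published tables) is approximated by a Sidak-adjusted normal quantile, and the proportions are treated as independent, which was questioned earlier in this thread because the same people rate every dish. The counts are the six percentages from the original post applied to 500 respondents; the other 94 dishes are not listed, so the overall mean here is not the one the full data would give.

```python
import math
from statistics import NormalDist


def binomial_anom(counts, n, alpha=0.05):
    """Approximate ANOM for k proportions, each out of n respondents.

    counts: dict mapping item name -> number of respondents who like it.
    ASSUMPTION: the tabulated ANOM critical value h is replaced by a
    Sidak-adjusted two-sided normal quantile, and groups are independent.
    Returns (lower_limit, upper_limit, verdicts).
    """
    k = len(counts)
    if k < 2:
        raise ValueError("ANOM needs at least two groups")
    p_bar = sum(counts.values()) / (k * n)  # overall mean proportion
    alpha_each = 1 - (1 - alpha) ** (1 / k)  # Sidak adjustment
    h = NormalDist().inv_cdf(1 - alpha_each / 2)
    half_width = (
        h * math.sqrt(p_bar * (1 - p_bar) / n) * math.sqrt((k - 1) / k)
    )
    lower, upper = p_bar - half_width, p_bar + half_width
    verdicts = {}
    for name, x in counts.items():
        p = x / n
        if p > upper:
            verdicts[name] = "higher"
        elif p < lower:
            verdicts[name] = "lower"
        else:
            verdicts[name] = "not different"
    return lower, upper, verdicts


# Percentages from the original post, converted to counts out of 500.
foods = {"Apples": 35, "Bananas": 300, "Meat": 25,
         "Sausage": 5, "Plantains": 200, "Water": 15}
lower, upper, verdicts = binomial_anom(foods, n=500)
print(f"decision limits: {lower:.3f} to {upper:.3f}")
for name, verdict in verdicts.items():
    print(name, verdict)
```

Because two very popular dishes pull the overall mean up in this small subset, the rarely liked dishes fall below the lower limit; with all 100 dishes included, the mean and limits would shift.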