# Thread: Data analysis - which test is best

1. ## Data analysis - which test is best

I would like to know which test would be appropriate for testing my hypothesis.

I am doing a research project on parasitic infections in snail hosts. Snails are infected when coming into contact with bird faeces.

My study takes samples of snails from 5 sites and dissects them to check for prevalence of infection.
Site 5 is 130 metres from a known bird roosting site.
Site 4 is 300 metres from known roosting site
Site 3 is 524 metres from roosting site
Site 2 is 664 metres " " "
Site 1 is 786 metres " " "

At each site I collected 50 snails.

Site 1 had 6 infected snails (out of 50)
Site 2 also had 6 infected snails (out of 50)
Site 3 had 7 infected snails

Hypothesis - More parasite infections would occur at site(s) closest to the known bird roosting site

Any thoughts on which test would be the best to check my hypothesis??

2. ## Re: Data analysis - which test is best

There are several possible approaches; here is one. You can correlate the distance (in metres) with the infection status (which is binary). You should use a Spearman correlation coefficient for that purpose.

3. ## The Following 2 Users Say Thank You to victorxstc For This Useful Post:

GretaGarbo (04-22-2013), Justice! (04-22-2013)

4. ## Re: Data analysis - which test is best

I have used Minitab to do a Spearman correlation coefficient. Would I be right in stating that the Pearson's r value is the 'p' value??

5. ## Re: Data analysis - which test is best

Pearson's r value is a correlation coefficient like Spearman's (but making different assumptions and calculated a different way). It is not the "p" value which is an assessment of how likely that the results you got were entirely due to random error. You will have a p value with both Spearman's and Pearson.

6. ## The Following User Says Thank You to noetsi For This Useful Post:

Justice! (04-22-2013)

7. ## Re: Data analysis - which test is best

As noetsi said, no. A correlation analysis (Pearson or Spearman) gives you two values: a correlation coefficient (Pearson's r or Spearman's rho) and a p value.

The correlation coefficient shows the strength and direction of the correlation. For example, you might find r = -0.34, p = 0.008. In this example, the correlation between distance and infection is -0.34 (a coefficient, not a percentage), and the p value marks it as statistically significant. Note that the sign is negative: the shorter the distance, the higher the chance of infection.

However, please note that you should use Spearman's coefficient instead of Pearson's. I don't know whether you have SPSS, but if you do, you can follow the steps below to run the Spearman test. If you have already done Pearson's test in Minitab, I think you won't have difficulty doing Spearman in Minitab. Before that, however, make sure your spreadsheet has 250 rows (one row per specimen), not 5 rows (one row per site).

In your SPSS file you have 250 cases, right? (5 sites with 50 cases each, so 250 cases in total.) In your raw data file (with at least 250 rows), write the distance value for each site in a new column, beside each of your 250 cases. For example, you need to write the number 524 (the third distance) 50 times, beside the corresponding rows. Then make sure the column holding "infection status" contains only 0 and 1. If not, create a new column that codes the infection status of each of the 250 cases as 0 or 1. Now you have two columns of 250 cases each, and each row describes a single snail: its infection status (0 or 1) and its distance. Now go to Analyze -> Correlate -> Bivariate, tick Spearman, and select those two columns. The test is now ready to run.
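For readers without SPSS or Minitab, the same one-row-per-snail layout can be sketched in Python. This is only a rough illustration, not the SPSS procedure itself: it uses just the three sites whose infection counts appear in the first post (the counts for sites 4 and 5 are not given in this thread), and the helper names `expand_rows`, `average_ranks`, `pearson_r` and `spearman_rho` are made up for the example. Spearman's rho is computed here as the Pearson correlation of tie-averaged ranks.

```python
from math import sqrt

# (distance in metres, infected count, snails sampled) for the three sites
# whose counts are stated in the thread. Sites 4 and 5 are left out because
# their counts are not given.
SITES = [(786, 6, 50), (664, 6, 50), (524, 7, 50)]


def expand_rows(sites):
    """Return one (distance, infected) row per snail, infected coded 0/1."""
    rows = []
    for distance, infected, sampled in sites:
        rows += [(distance, 1)] * infected
        rows += [(distance, 0)] * (sampled - infected)
    return rows


def average_ranks(values):
    """Rank values from 1 to n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1  # positions are 0-based, ranks 1-based
        for k in range(start, end + 1):
            ranks[order[k]] = mean_rank
        start = end + 1
    return ranks


def pearson_r(x, y):
    """Plain Pearson correlation coefficient of two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def spearman_rho(x, y):
    """Spearman's rho: Pearson correlation computed on the ranks."""
    return pearson_r(average_ranks(x), average_ranks(y))


rows = expand_rows(SITES)
distance = [d for d, _ in rows]
infected = [i for _, i in rows]
rho = spearman_rho(distance, infected)
print(len(rows), "rows, rho =", round(rho, 3))
```

With all five sites entered, `rows` would have the 250 entries described above; with the three sites used here it has 150.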

8. ## The Following User Says Thank You to victorxstc For This Useful Post:

Justice! (04-22-2013)

9. ## Re: Data analysis - which test is best

I put my distance figures into column C1 and the number of infected snails into column C2, ran the test and got these results (just checking I am on the right track so far!):

All 2 1 1 1 5
40 20 20 20 100

Cell Contents: Count
% of Total

Pearson's r -0.970143
Spearman's rho -0.974679

10. ## Re: Data analysis - which test is best

Apologies Victorxstc, I posted before seeing what you had written. I do not have SPSS (I have tried to download a trial version but keep getting an error message). I will persevere with trying to get it.

11. ## Re: Data analysis - which test is best

If you are doing this work near a university, they commonly have SPSS on their computers these days.

The two values you noted (for Pearson's R and Spearman's Rho) are very close, effectively the same thing.

If either of your variables is coded as a dichotomy (for example, infected/not infected), then neither Pearson's nor Spearman's will work correctly. You need polychoric correlations, although I doubt Minitab will do this (even SPSS and SAS won't in the core code; they need special macros, or R code in the case of SPSS).

12. ## The Following User Says Thank You to noetsi For This Useful Post:

Justice! (04-22-2013)

13. ## Re: Data analysis - which test is best

No problem Justice

They are similar, and Minitab is efficient. However, before any analysis, please make sure you are working with your raw data, not a summary of it. Your Minitab file should have at least 250 rows. If your file is like that, your correlation coefficients are very good: the closer the coefficient is to 1 or -1, the stronger the correlation.
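The difference between summary rows and raw rows can be checked numerically. The sketch below is an illustration only: it uses just the three sites whose counts are stated in the first post (sites 4 and 5 are not given in this thread), and `pearson_r` is a hand-written helper, not a Minitab function. It computes the same coefficient once on one row per site and once on one row per snail.

```python
from math import sqrt


def pearson_r(x, y):
    """Plain Pearson correlation coefficient of two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


# Stated in the thread: distance (m) and infected snails out of 50, sites 1-3.
distances = [786, 664, 524]
infected_counts = [6, 6, 7]
SAMPLED = 50

# Summary layout: one row per site (3 rows).
r_summary = pearson_r(distances, infected_counts)

# Raw layout: one row per snail (150 rows), infection coded 0/1.
raw_distance, raw_infected = [], []
for d, k in zip(distances, infected_counts):
    raw_distance += [d] * SAMPLED
    raw_infected += [1] * k + [0] * (SAMPLED - k)
r_raw = pearson_r(raw_distance, raw_infected)

print(f"per-site rows:  r = {r_summary:.3f}")
print(f"per-snail rows: r = {r_raw:.3f}")
```

On these three sites the per-site coefficient comes out at roughly -0.89, while the per-snail coefficient is roughly -0.03. A coefficient computed on a handful of summary rows can therefore look far stronger than the association at the level of individual snails.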

14. ## The Following User Says Thank You to victorxstc For This Useful Post:

Justice! (04-22-2013)

15. ## Re: Data analysis - which test is best

My University does indeed have SPSS, I am off on Wednesday and intend to go in to use their computer. Just thought I would try and download now or try a different software (Minitab) so I could get crackin' instead of waiting until Wednesday
Thank you for the tip

16. ## Re: Data analysis - which test is best

Originally Posted by Justice!
Site 5 is 130 metres from a known bird roosting site.
Site 4 is 300 metres from known roosting site
Site 3 is 524 metres from roosting site
Site 2 is 664 metres " " "
Site 1 is 786 metres " " "

Thank you Justice! Or what should I call you?

Now we know if distance is significant – or not.

Maybe you could cooperate with Palmer86, because he has data identical to yours!

Oh, maybe he has plagiarized your result? Or maybe you should be careful with him, since I was told that he had not been the most polite person. Or maybe you could cooperate with Mmanuel, a person I tried to help a lot. You two – I mean, you three – seem to have a lot in common.

Justice, if you find a topic difficult, then you see, there is a search engine called Google, that can be very useful. For example I googled “logit model” and saw 690 000 links. You should not expect someone else to write a thesis for you when there already are 690 000 others for you to read before.

Hlsmith suggested Fisher's exact test. Karabiner pointed out that a chi-squared test could be used. Victorxstc literally did the test for you.

When someone is serving the results on a silver plate for you, do you find it embarrassing to say “thank you” then?

If you find it humiliating (“squat”) to say thank you, then I suggest that you don't do that!

I will withdraw from this subject. I have tried to help you in many posts. But please don't thank me!

17. ## The Following 2 Users Say Thank You to GretaGarbo For This Useful Post:

Justice! (04-23-2013), victorxstc (04-23-2013)

18. ## Re: Data analysis - which test is best

Originally Posted by Justice!
I would like to know which test would be appropriate for testing my hypothesis.

I am doing a research project on parasitic infections in snail hosts. Snails are infected when coming into contact with bird faeces.
My study takes samples of snails from 5 sites and dissects them to check for prevalence of infection.
Site 5 is 130 metres from a known bird roosting site.
Site 4 is 300 metres from known roosting site
Site 3 is 524 metres from roosting site
Site 2 is 664 metres " " "
Site 1 is 786 metres " " "

At each site I collected 50 snails.

Site 1 had 6 infected snails (out of 50)
Site 2 also had 6 infected snails (out of 50)
Site 3 had 7 infected snails
Each snail has 2 characteristics: a) infected yes/no and b) its distance from the
roosting site. You could try a Mann-Whitney U-test with infected yes/no as
grouping variable and distance as dependent variable. This will show you
whether in the infected group the distances are significantly higher or lower than
in the non-infected group.
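A minimal sketch of this comparison in Python, under stated assumptions: only the three sites with counts given in the first post are used, the function names `average_ranks` and `mann_whitney_u` are made up for the example, and the p value comes from the normal approximation with a tie correction and no continuity correction. With only a few distinct distances the ties are heavy, so the approximation is rough.

```python
from collections import Counter
from math import sqrt
from statistics import NormalDist


def average_ranks(values):
    """Rank values from 1 to n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1  # positions are 0-based, ranks 1-based
        for k in range(start, end + 1):
            ranks[order[k]] = mean_rank
        start = end + 1
    return ranks


def mann_whitney_u(group_a, group_b):
    """Return (U for group_a, two-sided p) via the normal approximation."""
    n1, n2 = len(group_a), len(group_b)
    n = n1 + n2
    pooled = list(group_a) + list(group_b)
    ranks = average_ranks(pooled)
    u_a = sum(ranks[:n1]) - n1 * (n1 + 1) / 2
    # Tie correction: sum of t^3 - t over each group of tied values.
    tie_term = sum(c ** 3 - c for c in Counter(pooled).values())
    variance = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    z = (u_a - n1 * n2 / 2) / sqrt(variance)
    p = 2 * (1 - NormalDist().cdf(abs(z)))
    return u_a, p


# Distances (m) of infected and non-infected snails, sites 1-3 only
# (counts for sites 4 and 5 are not stated in the thread).
infected = [786] * 6 + [664] * 6 + [524] * 7
not_infected = [786] * 44 + [664] * 44 + [524] * 43

u, p = mann_whitney_u(infected, not_infected)
print(f"U = {u}, p = {p:.3f}")
```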

With kind regards

K.

19. ## Re: Data analysis - which test is best

Originally Posted by noetsi
the "p" value which is an assessment of how likely that the results you got were entirely due to random error
Beg your pardon, but wouldn't that mean p(Hypothesis|Data), i.e. Bayes statistics?
With the frequentist approach, we achieve p(Data|Hypothesis) .
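One way to make p(Data|Hypothesis) concrete is a permutation test. This is a different technique from the tests suggested earlier in the thread, shown here only as a sketch: the function names `pearson_r` and `permutation_p_value`, the number of shuffles and the seed are all choices made for the example, and the data are the three sites whose counts are stated in the first post. Under the null hypothesis of no association, the infection labels can be shuffled freely across snails, and the p value is the share of shuffles that look at least as extreme as the observed data.

```python
import random
from math import sqrt


def pearson_r(x, y):
    """Plain Pearson correlation coefficient of two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def permutation_p_value(x, y, n_permutations=2000, seed=0):
    """Estimate a two-sided p value for the correlation between x and y.

    The y values are shuffled repeatedly, which simulates data generated
    under the null hypothesis of no association between x and y.
    """
    rng = random.Random(seed)
    observed = abs(pearson_r(x, y))
    shuffled = list(y)
    hits = 0
    for _ in range(n_permutations):
        rng.shuffle(shuffled)
        # Small tolerance so float rounding does not hide exact ties.
        if abs(pearson_r(x, shuffled)) >= observed - 1e-12:
            hits += 1
    # Add one to numerator and denominator so the estimate is never zero.
    return (hits + 1) / (n_permutations + 1)


# One row per snail for sites 1-3 (counts for sites 4 and 5 are not stated).
distance = [786] * 50 + [664] * 50 + [524] * 50
infected = ([1] * 6 + [0] * 44) + ([1] * 6 + [0] * 44) + ([1] * 7 + [0] * 43)

p = permutation_p_value(distance, infected)
print(f"permutation p = {p:.3f}")
```

The result is a statement about how often data this extreme would arise if the null hypothesis were true, not about how probable the hypothesis is given the data.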

With kind regards

K.

20. ## The Following User Says Thank You to Karabiner For This Useful Post:

Justice! (04-23-2013)

21. ## Re: Data analysis - which test is best

Originally Posted by Karabiner
Originally Posted by Justice!
I would like to know which test would be appropriate for testing my hypothesis.

I am doing a research project on parasitic infections in snail hosts. Snails are infected when coming into contact with bird faeces.

Each snail has 2 characteristics: a) infected yes/no and b) its distance from the
roosting site. You could try a Mann-Whitney U-test with infected yes/no as
grouping variable and distance as dependent variable. This will show you
whether in the infected group the distances are significantly higher or lower than
in the non-infected group.

With kind regards

K.
I agree on that, but doesn't a correlation coefficient suffice? Besides, I guess before Mann-Whitney, Justice should do a Kruskal-Wallis to see if there is any overall difference between the 5 sites' infection rates or not. Admittedly, a Kruskal-Wallis test does not directly show the direction and extent of the "correlation" (further evaluation would be necessary), at least not as clearly as a correlation coefficient shows the extent and direction of the association.

Besides, when doing Kruskal-Wallis and Mann-Whitney tests, the length of the distance is discarded, because it would be used only as a grouping variable; in correlation coefficients, the distances (in metres) keep their meaning, which favors the accuracy of the results.

Kind regards

22. ## The Following User Says Thank You to victorxstc For This Useful Post:

Justice! (04-23-2013)

23. ## Re: Data analysis - which test is best

Originally Posted by victorxstc
I agree on that, but doesn't a correlation coefficient suffice?
Perhaps. But I feel uneasy with Spearman on binary-versus-rank data.
Maybe some forgotten childhood experience.
Originally Posted by victorxstc
Besides, I guess before Mann-Whitney, Justice should do a Kruskal-Wallis to see if there is any overall difference between the 5 sites' infection rates or not.
That is, treat infection yes/no as ordinal? I had rather assumed that this was categorical, in which case the Chi² could apply (expected frequencies are all > 5, AFAICS).
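The Chi² idea, including the check on expected frequencies, can be sketched in Python. This is an illustration under stated assumptions: the table holds only the three sites whose counts appear in the first post (so the "all > 5" remark is checked for those three sites, not for the full five-site data), and `chi_square` is a hand-written helper, not a library call.

```python
from math import exp

# Observed counts for the three sites whose results are stated in the thread:
# one (infected, not infected) pair per site, 50 snails each.
observed = [(6, 44), (6, 44), (7, 43)]


def chi_square(table):
    """Pearson chi-square statistic, degrees of freedom and expected counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    expected = [[r * c / grand_total for c in col_totals] for r in row_totals]
    statistic = sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(table, expected)
        for o, e in zip(obs_row, exp_row)
    )
    df = (len(table) - 1) * (len(col_totals) - 1)
    return statistic, df, expected


statistic, df, expected = chi_square(observed)
smallest_expected = min(min(row) for row in expected)

# With 2 degrees of freedom the chi-square survival function is exp(-x / 2),
# so no statistics library is needed for this three-site table.
p_value = exp(-statistic / 2) if df == 2 else None

print(f"chi2 = {statistic:.4f}, df = {df}, smallest expected = {smallest_expected:.2f}")
print(f"p = {p_value:.3f}")
```

A five-site table would have 4 degrees of freedom, so the shortcut for the p value in the last step would no longer apply and a chi-square table or statistics package would be needed instead.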
Originally Posted by victorxstc
Besides, when doing Kruskal-Wallis and Mann-Whitney tests, the length of the distance is discarded, because it would be used only as a grouping variable;
I would treat it as an ordinal DV, not as a grouping variable. I guessed that since there are 5 fixed distances and none in between, ordinal would be appropriate.

With kind regards

K.

24. ## The Following User Says Thank You to Karabiner For This Useful Post:

Justice! (04-23-2013)