1. ## Patriotic Worms

Hello all, I'm studying the Patriotic Worm…

Patriotic Worms are made of RED and BLUE segments, with an average of 100 segments per worm. Occasionally, segments become infected with the Patriotic Worm Virus (PWV), which turns them WHITE.

We hypothesize that RED segments are more likely to become infected with PWV than BLUE segments.

We take a sample of 4,000 Patriotic Worms and find that there are three times as many RED segments as BLUE.

Patriotic worms sampled:
4000

RED segments:
300,000

BLUE segments:
100,000

The ratio of RED to BLUE segments sampled is likely the same ratio that would be found by sampling all Patriotic Worms in nature.

We also find that there are a total of 12,000 segments infected with PWV (WHITE), of which there are roughly 9 times as many infected RED segments as infected BLUE segments (10,750 / 1,250 = 8.6).

Total PWV infected segments:
12,000

RED segments infected w/PWV:
10,750

BLUE segments infected w/PWV:
1,250

Intuitively, I might conclude that my hypothesis is correct, but I haven't accounted for the difference between the total numbers of RED and BLUE segments.
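One way to account for that difference is to compare infection rates within each color rather than raw infected counts. The following is only a minimal sketch using the counts listed above; the variable names are illustrative, and it says nothing yet about statistical significance.

```python
# Counts taken from the sample described above.
red_total, blue_total = 300_000, 100_000
red_infected, blue_infected = 10_750, 1_250

# Infection rate within each color: infected segments / all segments of that color.
red_rate = red_infected / red_total
blue_rate = blue_infected / blue_total

# Ratio of the two rates (how many times more often RED is infected than BLUE).
rate_ratio = red_rate / blue_rate

print(f"RED infection rate:  {red_rate:.2%}")
print(f"BLUE infection rate: {blue_rate:.2%}")
print(f"RED/BLUE rate ratio: {rate_ratio:.2f}")
```

Dividing by the per-color totals removes the 3:1 imbalance, so the remaining gap between the two rates is what a formal test would need to assess.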

So how should I justify that my hypothesis is correct?

Thanks. -P-

2. ## Re: Patriotic Worms

Great stuff. I may have missed this, but do the segments alternate (e.g., red, blue, red, etc.)? Regarding causality, can you say that red segments themselves are infected more often, or could it be that worms with more red segments are more vulnerable?

As for tests: if the worms all have the same number of blue and red segments (?), then there should be the obvious 1/4 ratio, and we should examine whether the observed ratio is statistically different from the expected one. Please provide the information asked for above so we can work on determining the appropriate test.

Sorry, I just noticed the 100,000 versus the 300,000 - so do segments alternate but start and end with red? There may be simple approaches to this, or, if red segments are bunched together, perhaps more complicated approaches are needed.
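Taking the 300,000 versus 100,000 split at face value, the expected-versus-observed comparison could be sketched as a chi-square goodness-of-fit check on the 12,000 infected segments. This is only a rough illustration: it assumes segments are independent and treats the sampled 3:1 ratio as the known population ratio, and the variable names are made up for the example.

```python
import math

# Counts from the original post.
red_total, blue_total = 300_000, 100_000
observed = {"red": 10_750, "blue": 1_250}

# Null hypothesis: infection ignores color, so infected segments
# should split in the same proportion as all segments (3:1).
infected_total = sum(observed.values())
red_share = red_total / (red_total + blue_total)
expected = {
    "red": infected_total * red_share,
    "blue": infected_total * (1 - red_share),
}

# Pearson chi-square statistic (1 degree of freedom for two categories).
chi2 = sum((observed[c] - expected[c]) ** 2 / expected[c] for c in observed)

# For 1 degree of freedom the upper-tail probability is erfc(sqrt(chi2 / 2)).
p_value = math.erfc(math.sqrt(chi2 / 2))

print(f"expected: {expected}")
print(f"chi-square = {chi2:.1f}, p = {p_value:.3g}")
```

If red segments are bunched and infections spread to neighbors, the independence assumption is doubtful and this p-value would overstate the evidence.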

3. ## Re: Patriotic Worms

Colored segments of the same type (RED or BLUE) tend to be adjacent. Here are 4 example worms which reproduce the RED to BLUE segment ratio of the original sample (3:1).

1) r r r r b b b r r r
2) b b r r r r r b r r
3) r r r r r r r r r r
4) b b r r r r r r b b

Also, infected segments tend to be adjacent and can overlap RED and BLUE segments. Although infected segments turn WHITE, we still know the original segment color. Here is an example of an infection in Worm 1 above.

1) r/w r/w r/w r/w b/w b b r r r

While worm segments can be classified as either RED or BLUE each segment is ultimately UNIQUE (due to genetic differences – yes each worm segment is genetically different within the same worm) across all worm segments sampled. Therefore, I believe it is safe to treat individual worm segments as independent even though colored segments do appear adjacent. In fact, it may be easier to consider all sampled worm segments as concatenated together into a super-worm and ignore individual worm vulnerabilities.

The likelihood that a colored segment is the same as its neighbor does NOT depend on the color of the neighbor. Instead, colored segments run adjacent to build colored REGIONS which perform different functions (RED or BLUE function). The location and length of any particular REGION depends on the needs of an individual worm. Thus, the location and length of any particular REGION within a worm appears to be random as can be seen in the example worms above. However, there is a preference for RED regions to be longer than BLUE regions which produces a 3:1 ratio for the total number of RED to BLUE segments sampled.

4. ## Re: Patriotic Worms

If you want to go very basic and you feel the assumptions hold true, you could do a Fisher's exact test. An example of how this would look in SAS is as follows.

Code:
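```
/* One row per cell of the 2x2 table; N is the number of segments in that cell. */
/* color = Yes means RED, color = No means BLUE. */
data worm;
  input color $ infected $ N;
  cards;
Yes Yes 10750
Yes No  289250
No  Yes 1250
No  No  98750
;
run;

/* Fisher's exact test on color by infected, weighting each row by its count. */
proc freq data=worm order=data;
  table color * infected / fisher;
  weight N;
run;
```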
Obviously you would not have to do this in SAS; I am just using it as an example. From this basic analysis of your sample you would get a p-value < 0.0001 and an odds ratio of 2.9360 (CI: 2.7678, 3.1145), meaning the odds of infection for red segments are approximately 3 times those for blue segments.
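For anyone without SAS, the odds ratio itself is easy to check by hand. The sketch below uses only the Python standard library and assumes a 95% Wald-type interval on the log odds ratio; that choice of interval is an assumption on my part rather than something stated above, although it should land very close to the figures quoted. It does not perform Fisher's exact test.

```python
import math

# 2x2 table from the SAS example: infected / not infected for each color.
red_inf, red_not = 10_750, 289_250
blue_inf, blue_not = 1_250, 98_750

# Sample odds ratio: odds of infection for RED divided by odds for BLUE.
odds_ratio = (red_inf * blue_not) / (red_not * blue_inf)

# Wald standard error of log(odds ratio) (assumed interval method).
se_log_or = math.sqrt(1 / red_inf + 1 / red_not + 1 / blue_inf + 1 / blue_not)

z = 1.959964  # approximate 97.5th percentile of the standard normal
lower = math.exp(math.log(odds_ratio) - z * se_log_or)
upper = math.exp(math.log(odds_ratio) + z * se_log_or)

print(f"odds ratio = {odds_ratio:.4f}, 95% CI = ({lower:.4f}, {upper:.4f})")
```

Because infection is rare in both colors, the odds ratio here is close to the simple ratio of infection rates.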

However, with this much data you could probably do more advanced analyses (e.g., multivariable logistic regression, controlling for other potential predictors). I also wondered whether the position of a segment affects its chance of infection. But it is all up to you.