# Thread: Enrichment analysis between two sets of proteins

1. ## Enrichment analysis between two sets of proteins

I'm trying to calculate a statistical measure describing the overlap of two sets of proteins. The issue is that the sets are from different organisms with a different complement of proteins and some defined homologs.

Example of what I mean:

species #1
100 total proteins

species #2
200 total proteins

overlap
50 homologs

Say I choose 30 random proteins from set 1 and 40 random proteins from set 2, and the overlap of the two groups of random proteins is 10 homologs.

How could I describe this overlap with some sort of statistical value?

Thanks a lot.

2. ## Re: Enrichment analysis between two sets of proteins

I will try to propose something, but I am not sure whether it would make sense for your case.

Let A and B be the numbers of proteins in the two sets, and assume A > B.
Let C be the number of common proteins you get.
I am trying to define a quantity that varies from 0 to 1, where 0 indicates no overlap and 1 indicates maximum overlap.

Basically, the first fraction gives you the overlap percentage and the second fraction just scales it so that the index is within [0,1].

3. ## Re: Enrichment analysis between two sets of proteins

Thanks for the reply. I've thought about something similar, but don't I somehow have to take into account the total proteins in each set and the maximal possible overlap?

If I can ignore that, maybe something like your proposal would work. What I was thinking about doing was the following:

Randomly take X number of proteins from set 1 and Y number of proteins from set 2
Determine the stat for the overlap between these 2 groups of proteins
Repeat N times
Plot a histogram for the results of this simulation
Determine where my experimental groups of proteins from set 1 and set 2, of size X and Y respectively, fall in the histogram
Calculate the probability of getting this amount of overlap based on the simulation
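
The steps above could be sketched roughly as follows. This is only a minimal illustration, not a definitive implementation: the protein IDs, the `homologs` dictionary, and the function names `count_overlap` and `randomization_test` are all made up for the example, and it assumes the homologs are one-to-one pairs and that the overlap statistic is simply the number of homolog pairs with one member in each sampled group. The p-value uses the common (r + 1) / (N + 1) convention so that it is never exactly zero.

```python
import random
from collections import Counter


def count_overlap(sample1, sample2, homologs):
    """Count homolog pairs with one member in each sample.

    `homologs` maps a species-1 protein ID to its species-2 homolog
    (assumed one-to-one for this sketch).
    """
    sample2 = set(sample2)
    return sum(1 for p in sample1 if homologs.get(p) in sample2)


def randomization_test(proteome1, proteome2, homologs, x, y, observed,
                       n_iter=10000, seed=0):
    """Simulate the overlap of random groups of size x and y.

    Returns the simulated overlaps and a one-sided empirical p-value
    for seeing an overlap at least as large as `observed`.
    """
    rng = random.Random(seed)
    null = []
    for _ in range(n_iter):
        s1 = rng.sample(proteome1, x)  # X random proteins from set 1
        s2 = rng.sample(proteome2, y)  # Y random proteins from set 2
        null.append(count_overlap(s1, s2, homologs))
    n_extreme = sum(1 for v in null if v >= observed)
    p_value = (n_extreme + 1) / (n_iter + 1)
    return null, p_value


# Toy data matching the numbers in the first post (all IDs hypothetical).
proteome1 = [f"sp1_{i}" for i in range(100)]
proteome2 = [f"sp2_{i}" for i in range(200)]
homologs = {f"sp1_{i}": f"sp2_{i}" for i in range(50)}

null, p_value = randomization_test(proteome1, proteome2, homologs,
                                   x=30, y=40, observed=10)

# Text histogram of the simulated overlaps.
hist = Counter(null)
for overlap in sorted(hist):
    print(overlap, hist[overlap])
print("empirical p-value:", p_value)
```

Because the random groups are drawn from the full protein sets, the total set sizes and the number of defined homologs are taken into account automatically. With these toy numbers, each of the 50 homolog pairs has a 30/100 × 40/200 = 0.06 chance of landing in both groups, so the simulated overlaps should centre around 3.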

4. ## Re: Enrichment analysis between two sets of proteins

That's a reasonable approach. I've been wondering about this part, though:

"Say I choose 30 random proteins from set 1 and 40 random proteins from set 2, and the overlap of the two groups of random proteins is 10 homologs."

Are you really just choosing these proteins at random, or is there a specific reason you chose these 30 and these 40?

5. ## Re: Enrichment analysis between two sets of proteins

I have two networks in my analysis, one from species 1 and one from species 2. Each of these networks is associated with a different number of proteins from their respective organisms (30 and 40, respectively, in my example). So, when I do my random simulation, I wanted to choose the same number of associated proteins from each species set.

6. ## Re: Enrichment analysis between two sets of proteins

OK. Well, your method sounds good to me. You're essentially doing a randomization test.
