Which comparisons should I run, and how many multiple comparisons should I correct for?

I've done some investment backtesting in which I build random strategies and compare their performance. To generate an adequate sample size, I ran the simulation three times to get 102 strategies. I'm looking at the 102 best and the 102 worst. I'm also looking at 102 4-rule strategies and 102 2-rule strategies. I tested on two different training periods, and I want to test both long and short positions. In total, that is 102 * 2 * 2 * 2 * 2 = 1,632 strategies.
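To make the bookkeeping concrete, here is a minimal sketch of how I count the design cells. The factor names and the period labels "A" and "B" are placeholders I made up for illustration; only the counts come from my setup.

```python
from itertools import product

# Hypothetical factor names and level labels; only the counts are from the post.
factors = {
    "rank": ["best", "worst"],
    "n_rules": [2, 4],
    "training_period": ["A", "B"],
    "direction": ["long", "short"],
}
STRATEGIES_PER_CELL = 102

# Every combination of one level per factor is one cell of the design.
cells = list(product(*factors.values()))
total = len(cells) * STRATEGIES_PER_CELL

# Collapsing over everything except direction leaves half the strategies per side.
long_count = sum(STRATEGIES_PER_CELL for c in cells if c[3] == "long")

print(len(cells), total, long_count)  # 16 1632 816
```

So the 1,632 strategies fall into 16 cells of 102 each, and any single two-level factor splits them 816 vs. 816.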

I plan to use my spreadsheet program's "t-Test: Two-Sample Assuming Unequal Variances" tool (i.e. Welch's t-test) for the comparisons.
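For reference, this is my understanding of what that unequal-variance test computes: the t statistic and the Welch-Satterthwaite degrees of freedom. It is only a sketch using the Python standard library; the function name `welch_t` and the two small samples are made up for illustration and are not my backtest returns. Turning t and df into a p-value needs the t distribution, which the spreadsheet handles.

```python
from math import sqrt
from statistics import mean, variance


def welch_t(a, b):
    """Return (t, df) for a two-sample t-test that does not assume equal variances."""
    na, nb = len(a), len(b)
    # Squared standard error of each sample mean (sample variance / n).
    se2_a = variance(a) / na
    se2_b = variance(b) / nb
    t = (mean(a) - mean(b)) / sqrt(se2_a + se2_b)
    # Welch-Satterthwaite approximation to the degrees of freedom.
    df = (se2_a + se2_b) ** 2 / (se2_a ** 2 / (na - 1) + se2_b ** 2 / (nb - 1))
    return t, df


# Made-up illustrative samples, NOT real strategy returns.
sample_a = [1.0, 2.0, 3.0, 4.0, 5.0]
sample_b = [2.0, 4.0, 6.0, 8.0, 10.0]
t_stat, dof = welch_t(sample_a, sample_b)
print(round(t_stat, 4), round(dof, 4))
```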

My two questions are: what to compare and how many multiple comparisons to correct for?

I am interested in the effects of best vs. worst, number of rules, training period, and direction. I guess I have to be concerned with interactions, though. If I collapse the data and test 816 long vs. 816 short, I might miss an interaction (e.g. maybe long and short are not really different but 2 vs. 4 rules are, and an apparent overall long/short difference might be due only to the 2/4-rule component). Should this be an ANOVA? The levels within each factor are mutually exclusive (e.g. a strategy is either long or short), but the factors themselves are crossed (e.g. both long and short positions may come from 2- and 4-rule strategies).

Is this a 4-factor analysis?
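If it is, then as far as I can tell a full four-factor model has more terms than just the four main effects. The sketch below simply enumerates them; the factor names are placeholders of my own, and I am assuming a full factorial model with every interaction included.

```python
from itertools import combinations

# Hypothetical names for the four two-level factors.
factor_names = ["rank", "n_rules", "training_period", "direction"]

# One effect per non-empty subset of factors:
# single factors are main effects, larger subsets are interactions.
effects = [
    combo
    for k in range(1, len(factor_names) + 1)
    for combo in combinations(factor_names, k)
]
by_order = {k: sum(1 for e in effects if len(e) == k) for k in range(1, 5)}

print(len(effects), by_order)  # 15 {1: 4, 2: 6, 3: 4, 4: 1}
```

That is 4 main effects, 6 two-way, 4 three-way, and 1 four-way interaction, 15 tests in total if every term is examined.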

Then, how many multiple comparisons do I need to correct for, or will the ANOVA handle that itself?
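To show what I mean by "correcting", here is a minimal sketch of the two corrections I know of, Bonferroni and the Holm step-down procedure. The function names and the four p-values are made up for illustration; whether I need something like this on top of the analysis, and with what number of tests, is exactly what I am unsure about.

```python
def bonferroni_reject(p_values, alpha=0.05):
    """Reject each hypothesis whose p-value is at most alpha / m."""
    m = len(p_values)
    return [p <= alpha / m for p in p_values]


def holm_reject(p_values, alpha=0.05):
    """Holm step-down: compare sorted p-values to alpha/m, alpha/(m-1), ..."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    reject = [False] * m
    for rank, i in enumerate(order):
        if p_values[i] <= alpha / (m - rank):
            reject[i] = True
        else:
            break  # once one test fails, all larger p-values are retained
    return reject


# Made-up p-values, NOT results from my backtest.
example_p = [0.001, 0.015, 0.02, 0.2]
print(bonferroni_reject(example_p))  # [True, False, False, False]
print(holm_reject(example_p))        # [True, True, True, False]
```

Both control the family-wise error rate, and Holm never rejects fewer hypotheses than Bonferroni, which the toy numbers illustrate.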

Any suggestions and/or links are appreciated!