Dear Group,
I am starting a PhD of the 'build your own' variety, i.e. it is initiated by myself and not part of an existing project. I'm currently working on the experimental design.

I am looking at developing a 'clinical decision rule' from a large block of data (about 7,500 patients). The difficulty is that I will have a very large number of predictor variables (typically 300 to 400) and very few actual outcome events (<10%). There are probably interactions. Linearity is not assumed, and collinearity is a possibility.

I clearly exceed the limits of logistic regression, for which the usual rule of thumb is that the number of outcome events should be roughly 10 x the number of predictors. Large-scale dimensionality reduction runs counter to the core hypothesis. Many other methods also seem to be unsuitable.
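
To put rough numbers on that rule of thumb, here is a quick Python sketch of the arithmetic. It assumes the "<10%" figure is the event rate among the 7,500 patients and treats 10% as an upper bound, so the real situation is somewhat worse than shown. The function names are just illustrative helpers I made up for this post.

```python
# Rough events-per-variable (EPV) check for the figures quoted above.
# Assumption: "<10%" is read as an event rate of at most 10% of patients.

def events_per_variable(n_patients, event_rate, n_predictors):
    """Return (expected number of events, events per candidate predictor)."""
    n_events = n_patients * event_rate
    return n_events, n_events / n_predictors


def events_needed(n_predictors, epv_target=10):
    """Events required to satisfy the ~10 events per predictor rule of thumb."""
    return n_predictors * epv_target


N_PATIENTS = 7500
EVENT_RATE_UPPER = 0.10  # upper bound; the true rate is below this

for n_predictors in (300, 400):
    events, epv = events_per_variable(N_PATIENTS, EVENT_RATE_UPPER, n_predictors)
    needed = events_needed(n_predictors)
    print(
        f"{n_predictors} predictors: at most {events:.0f} events, "
        f"EPV at most {epv:.2f}, rule of thumb wants {needed} events"
    )
```

So even in the best case I have at most about 750 events against the 3,000 to 4,000 the rule of thumb asks for, which is an EPV of roughly 2 to 2.5 rather than 10.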

I am now hovering between Multivariate Adaptive Regression Splines (MARS) and Neural Networks, but I have spent a day looking at the limitations of each and am not really much clearer. My stats knowledge is 'old', which does not help.

Any input from anyone out there dealing with this kind of high-dimensional problem would be of benefit. I honestly don't know where else to post and don't have anyone around me to bounce ideas off, so I need to reach out to electronic communities a bit.

CHEERS!