Well, for one, are you really predicting disease? Are some of your covariates signs or symptoms of the disease, i.e. effects of it? If so, you are doing retrodiction. Further trouble can arise from using actual predictors and retrodictors in the same model. Look up "Markov blanket".

Yes, relying on univariate analyses for variable selection is considered a faux pas these days. Variable selection should be driven by content knowledge and existing research. Selecting purely on significance risks placing irrelevant variables in the model, such as common effects (colliders) or simply spurious variables. The best approach combines content knowledge with data splits: build and validate the covariate set using two data splits, then test it and get estimates using a third holdout set. The data should be split by a random process.
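
To make the random splitting concrete, here is a minimal sketch using only the Python standard library. The function name `three_way_split`, the 50/25/25 proportions, the seed, and the 100 patient IDs are all assumptions for illustration; nothing above prescribes particular proportions.

```python
import random

def three_way_split(ids, fractions=(0.5, 0.25, 0.25), seed=2024):
    """Randomly partition ids into build, validation, and holdout sets.

    The fractions and the seed are illustrative assumptions.
    """
    if abs(sum(fractions) - 1.0) > 1e-9:
        raise ValueError("fractions must sum to 1")
    shuffled = list(ids)
    # A seeded generator makes the random split reproducible.
    random.Random(seed).shuffle(shuffled)
    n = len(shuffled)
    n_build = round(n * fractions[0])
    n_valid = round(n * fractions[1])
    build = shuffled[:n_build]
    valid = shuffled[n_build:n_build + n_valid]
    holdout = shuffled[n_build + n_valid:]  # whatever remains
    return build, valid, holdout

# Hypothetical example: 100 patient IDs, not real data.
patient_ids = list(range(1, 101))
build, valid, holdout = three_way_split(patient_ids)
print(len(build), len(valid), len(holdout))
```

The holdout set is touched only once, at the end, so the estimates it gives are not biased by the choices made on the other two splits.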

Thank you for the reply.

I am predicting a disease state, specifically lymph node metastasis in a particular type of cancer. The twelve covariates are clinicopathological and radiological factors, and I am trying to predict lymph node metastasis from them.

My knowledge of statistics is limited to basic use of SPSS and MedCalc. Due to some constraints, I am unable to use the services of a statistician, so I am doing the statistics myself.

Content knowledge on this matter is variable and not well established.

In layman's terms this is what I was hoping to achieve:

1. There were forty three patients with cancer who underwent surgery which included systematic lymph node dissection.

2. Out of the forty three only eight patients were actually found to have disease in the lymph nodes.

3. Systematic lymph node dissection has its own set of adverse effects after surgery.

4. So if we could identify the subset of patients at high risk of lymph node metastasis, lymph node dissection could be done only in those patients. This would spare the rest of the patients a morbid procedure.

5. For this, I used twelve variables (7 continuous, interval-scale and 5 dichotomous categorical) such as the patient's age, BMI, some blood and biopsy reports, and imaging parameters.

6. Used ROC curves and AUC for the continuous variables to define cut off levels, and thus had dichotomous categories for all twelve variables.

7. Did Univariate logistic analysis for all 12 variables. Found 8 of them to have significance.

8. Have calculated sensitivity, specificity, positive predictive value, negative predictive value, False positive rate and False negative rate, and Accuracy for each of these factors individually.
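
For reference, steps 6 and 8 can be sketched in plain Python as below. This is only an illustration under stated assumptions: the cutoff is chosen by maximising Youden's J (sensitivity + specificity - 1), which may not be the rule SPSS or MedCalc applied in the actual analysis, and the `marker` values and `node_positive` labels are made-up numbers, not the study data.

```python
def confusion_counts(values, labels, cutoff):
    """Count TP, FP, FN, TN when 'value >= cutoff' is called test-positive."""
    tp = fp = fn = tn = 0
    for v, y in zip(values, labels):
        positive = v >= cutoff
        if positive and y == 1:
            tp += 1
        elif positive and y == 0:
            fp += 1
        elif not positive and y == 1:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn

def diagnostic_metrics(tp, fp, fn, tn):
    """Standard 2x2 table metrics; returns NaN when a denominator is zero."""
    def ratio(num, den):
        return num / den if den else float("nan")
    return {
        "sensitivity": ratio(tp, tp + fn),
        "specificity": ratio(tn, tn + fp),
        "ppv": ratio(tp, tp + fp),
        "npv": ratio(tn, tn + fn),
        "false_positive_rate": ratio(fp, fp + tn),
        "false_negative_rate": ratio(fn, fn + tp),
        "accuracy": ratio(tp + tn, tp + fp + fn + tn),
    }

def youden_cutoff(values, labels):
    """Return the observed value maximising J = sensitivity + specificity - 1.

    Assumption: Youden's J is used as the cutoff rule; other rules exist.
    """
    best_cut, best_j = None, float("-inf")
    for cut in sorted(set(values)):
        m = diagnostic_metrics(*confusion_counts(values, labels, cut))
        j = m["sensitivity"] + m["specificity"] - 1
        if j > best_j:
            best_cut, best_j = cut, j
    return best_cut, best_j

# Made-up values for one continuous variable (hypothetical, not study data).
marker = [1.2, 2.5, 3.1, 0.8, 4.0, 2.2, 3.6, 1.0, 2.9, 0.5]
node_positive = [0, 0, 0, 0, 1, 0, 1, 0, 1, 0]

cutoff, j = youden_cutoff(marker, node_positive)
counts = confusion_counts(marker, node_positive, cutoff)
metrics = diagnostic_metrics(*counts)
print("cutoff:", cutoff, "J:", round(j, 3))
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

Note that a cutoff chosen and evaluated on the same patients will look better than it would on new patients, which matters all the more with only eight positive cases.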

Now comes the point where I am stuck:

1. Was hoping to do a multivariate analysis on either these selected 8, or on all 12 to find independent predictors of lymph node metastasis.

2. Wanted to assess various combinations of these 12 (or 8 significant) factors to predict lymph node metastasis. Maybe like a Probability risk matrix where based on presence or absence of various variables in different combinations, the probability of lymph-node metastasis can be predicted.
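
To show what I mean by a probability risk matrix, here is a rough sketch that tabulates the observed proportion of node-positive patients for each presence/absence combination of two factors. It is a plain cross-tabulation, not a fitted multivariate model. The names `factor_a`, `factor_b`, `risk_matrix` and all ten records are hypothetical. With all 12 dichotomous factors there would be 2^12 = 4096 combinations, far more than 43 patients, so the sketch is limited to two factors.

```python
from itertools import product

def risk_matrix(records, factors, outcome="node_positive"):
    """Observed outcome proportion for every 0/1 combination of the factors.

    Returns {combination: (events, patients, risk)}; risk is None for an
    empty cell, since no proportion can be computed from zero patients.
    """
    table = {}
    for combo in product([0, 1], repeat=len(factors)):
        group = [r for r in records if tuple(r[f] for f in factors) == combo]
        events = sum(r[outcome] for r in group)
        risk = events / len(group) if group else None
        table[combo] = (events, len(group), risk)
    return table

# Hypothetical records (1 = present, 0 = absent); not the study data.
rows = [
    (1, 1, 1), (1, 1, 1), (1, 1, 0),
    (1, 0, 0), (1, 0, 1),
    (0, 1, 0), (0, 1, 0),
    (0, 0, 0), (0, 0, 0), (0, 0, 0),
]
records = [
    {"factor_a": a, "factor_b": b, "node_positive": y} for a, b, y in rows
]

table = risk_matrix(records, ["factor_a", "factor_b"])
for combo, (events, n, risk) in table.items():
    shown = "n/a" if risk is None else f"{risk:.2f}"
    print(f"factor_a={combo[0]} factor_b={combo[1]}: {events}/{n} risk={shown}")
```

Each cell here rests on two or three patients, so the proportions are very unstable; the same problem would apply to my real data once more than a couple of factors are combined.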

I am aware that the sample size is small, but given the study period and disease incidence, this was the maximum number I could get. This is part of an academic dissertation, so the approach and the attempt matter more than the validity and representativeness of the results for the population at large.

I have hit a block and don't know how to go about these two steps.