I am doing retrospective research on breast cancer using multivariate estimates. The aim is to calculate the probability of finding breast cancer given multiple independent variables (IVs), so the outcome is binary (cancer versus benign). Some of the IVs are correlated. My question: should I run factor analysis (FA) on the IVs before starting my logistic regression? I know that FA and PCA are used to reduce noise in the data and to get better estimates by keeping the directions of highest variance, i.e. the principal components. But intuitively, I would guess that if I have multiple variables, even though they might be correlated, they should still be able to strongly differentiate cancer from benign cases. I inferred this second point from Bayes' equation, where the posterior odds of cancer are proportional to the product of the sensitivities of each variable, which gives credit to the presence of correlated IVs. So, how can these two ideas be reconciled?
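To make the Bayes reasoning above concrete, here is a minimal sketch of the product rule for posterior odds. It assumes each variable contributes a likelihood ratio (sensitivity divided by the false positive rate) and that the variables are conditionally independent given the diagnosis; that independence is what justifies multiplying. The function names and all numbers are hypothetical, chosen only for illustration.

```python
# Hypothetical numbers throughout; none come from real data.

def posterior_odds(prior_odds, likelihood_ratios):
    """Prior odds times the product of per-variable likelihood ratios.

    This product form is only valid if the variables are conditionally
    independent given the class (the naive Bayes assumption).
    """
    odds = prior_odds
    for lr in likelihood_ratios:
        odds *= lr
    return odds


def odds_to_probability(odds):
    """Convert odds in favour of cancer to a probability."""
    return odds / (1.0 + odds)


prior_odds = 0.25    # assumed: 1 cancer case per 4 benign cases
lr_positive = 4.0    # assumed: P(finding | cancer) / P(finding | benign)

# Two positive findings that really are conditionally independent:
# both legitimately enter the product.
independent = posterior_odds(prior_odds, [lr_positive, lr_positive])

# Two findings that are exact copies of each other (correlation 1):
# the second adds no information, so it should enter the product once.
duplicate_counted_once = posterior_odds(prior_odds, [lr_positive])

# Multiplying the duplicate in anyway counts the same evidence twice.
duplicate_counted_twice = posterior_odds(prior_odds, [lr_positive, lr_positive])

print("independent findings:   ", odds_to_probability(independent))
print("duplicate, counted once: ", odds_to_probability(duplicate_counted_once))
print("duplicate, counted twice:", odds_to_probability(duplicate_counted_twice))
```

In this toy case, multiplying in a perfectly correlated variable as if it were independent gives the same odds as two genuinely independent findings, so the plain product overstates the evidence when IVs are correlated. Real IVs sit somewhere between the two extremes shown here.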

Also, what if I use FA or PCA (factor analysis or principal component analysis) instead of logistic regression from the start? I mean, from the data pool of all available cases, both benign and cancer, I could treat a particular case at hand as a query vector, represent it as a linear combination of components along the basis eigenvectors, and then match it against the other vectors in the data matrix to get the odds of it being benign or cancer.
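The matching idea in the last paragraph could be sketched roughly as follows, using NumPy only. This is one possible reading, not an established method: PCA is computed with an SVD of the centered data, and "matching" is taken to mean looking at the k nearest cases in component space. The data are synthetic, and `n_components`, `k`, and the helper names are all assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in for the data matrix: rows are cases, columns are IVs.
n_per_class, n_ivs = 100, 5
benign = rng.normal(0.0, 1.0, size=(n_per_class, n_ivs))
cancer = rng.normal(1.5, 1.0, size=(n_per_class, n_ivs))
X = np.vstack([benign, cancer])
y = np.array([0] * n_per_class + [1] * n_per_class)  # 0 = benign, 1 = cancer


def fit_pca(X, n_components):
    """Return the column means and the leading principal axes (as rows)."""
    mean = X.mean(axis=0)
    _, _, vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, vt[:n_components]


def project(X, mean, components):
    """Express each row of X as coordinates along the principal axes."""
    return (X - mean) @ components.T


def neighbour_cancer_fraction(query, X, y, n_components=2, k=15):
    """Fraction of cancer cases among the k cases nearest to the query,
    with distances measured in principal-component space."""
    mean, components = fit_pca(X, n_components)
    scores = project(X, mean, components)
    q = project(query[None, :], mean, components)[0]
    distances = np.linalg.norm(scores - q, axis=1)
    nearest = np.argsort(distances)[:k]
    return float(y[nearest].mean())


# Hypothetical query case placed at the synthetic cancer centroid.
query = np.full(n_ivs, 1.5)
fraction = neighbour_cancer_fraction(query, X, y)
print("fraction of cancer cases among nearest neighbours:", fraction)
```

The neighbour fraction `p` can be turned into odds as `p / (1 - p)`, which is undefined when every neighbour has the same label. Note also that the components are chosen without looking at `y`, so the directions of highest variance are not guaranteed to be the directions that separate cancer from benign cases.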