Hello all, I would be interested in your thoughts on whether this approach sounds correct.

I have data on a cohort of subjects who were recruited over a period of 5 years, and the cohort has been followed for more than 10 years. (It is a sort of open cohort: nobody has left it yet, the subjects simply entered it at different times.)

At recruitment, all the subjects completed a group of psychological tests (n = 8). Each test produces a score on a scale, subsequently adjusted for age and sex. The scores range from 0 to 20 and can take decimal values. I also have a final score that counts the tests falling below the cut-point (i.e. if 4 tests out of 8 had scores below the cut-points identified by the previous literature on the reliability of these tests, this final score is equal to 4; it can therefore range from 0 to 8).
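To make the final score concrete, here is a minimal sketch in Python of how I understand it is derived. The scores and the cut-points below are made up for illustration (the real cut-points come from the literature), and `count_below` is just a hypothetical helper name.

```python
# Hypothetical adjusted scores (0-20) for one subject on the 8 tests.
scores = [12.5, 7.0, 15.2, 9.9, 18.0, 6.4, 11.1, 10.0]

# Hypothetical cut-points, one per test (assumed to be 10 for every test here).
cut_points = [10.0] * 8


def count_below(scores, cut_points):
    """Count the tests whose score falls strictly below its cut-point."""
    if len(scores) != len(cut_points):
        raise ValueError("need exactly one cut-point per test")
    return sum(s < c for s, c in zip(scores, cut_points))


final_score = count_below(scores, cut_points)
print(final_score)  # 3 (the scores 7.0, 9.9 and 6.4 are below 10)
```

Whether a score exactly equal to the cut-point counts as "below" is an assumption here (it does not).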

The time of relapse (if any) has also been recorded for all the subjects. Relapse is a binary outcome, as subjects either have it or not.

I would like to test whether any of these baseline tests can predict the risk of relapse, and I immediately thought of a Cox regression model, as the time to relapse (from recruitment) is also available.

But I have two methodological questions:

1) Apart from the time of recruitment and the time of relapse, the time of onset of the disease is also available. Onset occurs before recruitment and corresponds to the time when the risk started. When I declare the data to be survival-time data in Stata, shall I just specify the time subjects enter the study (the "enter" option) and adjust each subsequent analysis for disease duration (time from onset to relapse)? Or shall I also specify the origin of the risk when I declare the data (in my case the time of onset, using the "origin" option)?
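To show what I mean by the two choices, here is a rough sketch in Python (not Stata) of how I understand the two time scales for a single subject. The dates, the function name `risk_interval` and its `time_zero` argument are all made up for illustration; this is not meant to reproduce what Stata does internally.

```python
from datetime import date

DAYS_PER_YEAR = 365.25


def risk_interval(onset, recruitment, exit_date, time_zero="onset"):
    """Return (entry, exit) in years on the chosen time scale.

    time_zero="onset":       the clock starts at disease onset, so the
                             subject comes under observation late
                             (delayed entry at recruitment).
    time_zero="recruitment": the clock starts at recruitment, so entry is 0.
    """
    if not onset <= recruitment <= exit_date:
        raise ValueError("expected onset <= recruitment <= exit date")
    if time_zero not in ("onset", "recruitment"):
        raise ValueError("time_zero must be 'onset' or 'recruitment'")
    zero = onset if time_zero == "onset" else recruitment
    entry = (recruitment - zero).days / DAYS_PER_YEAR
    exit_time = (exit_date - zero).days / DAYS_PER_YEAR
    return entry, exit_time


# Hypothetical subject: onset 2000, recruited 2003, relapse (or censoring) 2009.
onset, recruited, last_seen = date(2000, 1, 1), date(2003, 1, 1), date(2009, 1, 1)

on_onset = risk_interval(onset, recruited, last_seen, time_zero="onset")
on_recruit = risk_interval(onset, recruited, last_seen, time_zero="recruitment")
print(on_onset)    # roughly (3, 9): observed from year 3 to year 9 since onset
print(on_recruit)  # roughly (0, 6): observed from year 0 to year 6 since recruitment
```

On the first scale the subject is only counted as at risk from about year 3 after onset; on the second, the roughly 3 years before recruitment would have to enter the model as a covariate instead.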

2) Each of the 8 tests measures a different psychological aspect, but they are likely to be correlated to some degree, so there might be a problem of multicollinearity once I fit them all together in one model. Thus my idea was the following:

1) check the pairwise correlations among all the tests and see which ones have the smallest coefficients

2) run a Cox analysis for each test separately

3) run a Cox analysis including together all the tests that are not strongly correlated

But then how should I treat the final score, which is a count of the tests that are below the cut-off point? As it is a summary of the other tests, I would tend to analyse it separately rather than include it in the Cox regression with all the other scores.
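For the correlation check in the first step of my plan, this is roughly what I have in mind, sketched in plain Python. The test names, the scores and the 0.7 threshold are all made up; I have not settled on what counts as "too correlated".

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length score lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def correlated_pairs(tests, threshold=0.7):
    """List the pairs of tests whose absolute correlation reaches the threshold."""
    flagged = []
    for (name_a, x), (name_b, y) in combinations(tests.items(), 2):
        r = pearson(x, y)
        if abs(r) >= threshold:
            flagged.append((name_a, name_b, round(r, 2)))
    return flagged


# Hypothetical baseline scores for 5 subjects on 3 of the tests.
tests = {
    "test_a": [10.0, 12.0, 14.0, 16.0, 18.0],
    "test_b": [11.0, 13.0, 15.0, 17.0, 19.0],  # test_a shifted by 1
    "test_c": [12.0, 8.0, 14.0, 8.0, 12.0],    # unrelated to test_a
}

flagged = correlated_pairs(tests)
print(flagged)  # only the test_a / test_b pair is flagged
```

Tests that end up in a flagged pair would then not go into the same model together in the third step.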

What do you think? Thanks in advance.