I am new to the data world and am facing a problem that I can’t seem to fix.

My task is to analyze why some people cancel their policies and some don’t (i.e. find the significant factors) and to estimate the risk of cancellation for different groups. I was thinking of using survival analysis, more specifically Cox regression, to find this out.

My problem is that my data set contains policies starting from about 1950, but data about cancellation is only available from 2011 onwards due to data protection laws. That means that a (probably very large) set of data is missing: people who started AND cancelled their policies before 2011. I think survival analysis will therefore give misleading results. Censoring is of no use here, as I don’t know how MANY policies were started before 2011 and, just as importantly, WHEN they started (duration is a key component in survival analysis). Censoring is usually useful when you know how many cases started "treatment" and when they did, but don't know whether they "survived".
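To show what I mean, here is a minimal simulated sketch of the problem. It is in Python purely for illustration, and it is not my real data: the exponential lifetimes, the 10-year mean duration and the uniform start dates are all assumptions I made up. Only the 1950 and 2011 dates come from my situation. It compares the durations of all simulated policies with those of the policies that were still active in 2011, which are the only ones I would actually see.

```python
import random

random.seed(42)  # fixed seed so the sketch is reproducible

FIRST_START = 1950.0    # earliest policy start in my data (approximate)
WINDOW_START = 2011.0   # cancellations are only recorded from this point on
MEAN_DURATION = 10.0    # made-up assumption: average policy lifetime in years


def simulate(n_policies=20000):
    """Return (all_durations, visible_durations) for simulated policies."""
    all_durations = []
    visible_durations = []
    for _ in range(n_policies):
        # Hypothetical start date, spread evenly between 1950 and 2011.
        start = random.uniform(FIRST_START, WINDOW_START)
        # Hypothetical time until cancellation (exponential lifetime).
        duration = random.expovariate(1.0 / MEAN_DURATION)
        all_durations.append(duration)
        # A policy only shows up in the data if it was still active in 2011.
        if start + duration >= WINDOW_START:
            visible_durations.append(duration)
    return all_durations, visible_durations


def mean(values):
    return sum(values) / len(values)


all_d, visible_d = simulate()
print(f"policies simulated:          {len(all_d)}")
print(f"policies visible from 2011:  {len(visible_d)}")
print(f"mean duration, all policies: {mean(all_d):.1f} years")
print(f"mean duration, visible only: {mean(visible_d):.1f} years")
```

Under these assumptions the visible policies are a small, long-lived subset of everything that was ever started, which is why I expect a naive analysis on them to understate the cancellation risk.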

Does anyone have an idea which analysis I could use or how to overcome this problem otherwise? I’m using R, btw.

Thank you in advance!

I'm planning to test some parts, but because of all the replicates I can only afford to test 6 parts. The success criterion is that, after accelerated aging and chemical treatment, the destructive pull force of the materials must not be less than 25% of their original baseline pull force. I haven't received the parts yet and have no idea of the pull force or its standard deviation, so doing a power and sample size calculation in Minitab leaves me with too many unknowns. And even then, I don't think this is a 2-sample t-test. Any suggestions? Thanks,

Kevsterini