I'm working on developing a diagnostic test. I have a small clinical dataset that was evaluated with both the diagnostic test and a very reliable Gold Standard, which was used to generate the standard 2x2 contingency table. From it I have estimates of the diagnostic test's specificity (and false positive fraction) and sensitivity (and false negative fraction).

Some time has passed. Due to the nature of the "disease state" (it could develop latently, so a sample would never have registered "positive" at the time the Gold Standard was applied), I am wondering whether some of the diagnostic false positives, upon re-evaluation with the same Gold Standard, will now register as Gold Standard positive, and thus turn out to be true positives.

In other words, I would like to re-evaluate samples from one quadrant of a contingency table (Dx+, Gold Standard -) to see if any are now Gold Standard+ after some time has passed.

I assume the null hypothesis would be that 0% of these patients are Gold Standard+, and I'd like to know:

1a) What is the proper test to evaluate whether there is an increase in true positives (from a null proportion of 0%)? McNemar's test assumes use of the full 2x2 table, whereas this design presumably uses only half of it (a 1x2 table).
1b) Related to 1a, I would need to calculate a sample size for this study (I don't want to retest ALL of the false positives with the Gold Standard), so knowing which test to run would help.
2) Is there a problem with the study design? I'm worried it may be somewhat similar to a discrepant resolution analysis, which has many well-documented underlying biases.
3) I understand that a coin-flip at the retest would artificially inflate the number of true positives I get (hey, half the samples I retested are now positive!), but we can assume the Gold Standard has very high accuracy.
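For concreteness, here is how I am tentatively picturing 1a/1b: a one-sample exact binomial test of the conversion proportion against a small non-zero null rate p0, since a null of exactly 0% is degenerate (a single converted sample would already refute it). All the specific numbers below (p0 = 2% background conversion, p1 = 15% true conversion, 4 of 30 retests converting) are made up purely for illustration:

```python
# Sketch (my tentative framing, not a settled analysis): exact binomial
# test of k conversions out of n retested false positives against a
# small assumed background rate p0, plus an exact power-based sample
# size calculation.  Pure stdlib, no dependencies.
from math import comb

def binom_sf(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))

def exact_pvalue(k_obs, n, p0):
    """One-sided exact binomial p-value for k_obs conversions out of n."""
    return binom_sf(k_obs, n, p0)

def sample_size(p0, p1, alpha=0.05, power=0.80, n_max=1000):
    """Smallest n whose one-sided level-alpha exact test has the target
    power against a true conversion rate p1."""
    for n in range(1, n_max + 1):
        # Critical value: smallest k with P(X >= k | p0) <= alpha.
        k_crit = next(k for k in range(n + 2) if binom_sf(k, n, p0) <= alpha)
        if binom_sf(k_crit, n, p1) >= power:
            return n
    raise ValueError("no n <= n_max achieves the target power")

# Illustrative numbers only: 4/30 retests convert, against p0 = 2%.
print(exact_pvalue(4, 30, 0.02))   # one-sided p-value
print(sample_size(0.02, 0.15))     # retests needed for 80% power at p1 = 15%
```

If this framing is wrong (for example, if conditioning the sample on the Dx+/Gold Standard- cell invalidates the simple binomial model), that is exactly the kind of feedback I am after.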

Really appreciate any feedback.