Turing test analysis


I’m having problems with the analysis of one of the tests I’ve done on the evaluation of synthetic speech. The test was set up like a Turing test: each test subject was shown 20 sentences (10 original and 10 synthetic). The subjects had to guess whether each sentence was original or synthetic. So for each subject we get a score of X out of 20 questions correct.
Intuitively, the closer X is to 50% of the total number of questions, the more ‘random’ the scoring was and thus the more similar the synthetic and original samples were. However, I’m having difficulty with a proper statistical analysis of these results. Should I use a binomial or a chi-square test? Should I pool all answers from all test subjects to calculate one overall % correct, or should I calculate a separate % correct for each subject and somehow work further with those values?
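To make the per-subject option concrete, here is a minimal sketch of what I have in mind: an exact two-sided binomial test of each subject's score against chance (p = 0.5). This is just an illustration in pure-stdlib Python, not a claim that it is the right choice; the function name and the tolerance constant are my own.

```python
from math import comb

def binom_two_sided_p(k, n, p_null=0.5):
    """Exact two-sided binomial test p-value for k successes out of n.

    As written this is only valid for p_null = 0.5, where the binomial
    distribution is symmetric: the two-sided p-value is the total
    probability of all outcomes at least as extreme as the observed one.
    """
    # Probability of every possible outcome 0..n under the null hypothesis
    probs = [comb(n, i) * p_null**i * (1 - p_null)**(n - i)
             for i in range(n + 1)]
    obs = probs[k]
    # Sum the probabilities of all outcomes no more likely than the observed
    # score (small tolerance guards against floating-point ties)
    return sum(pr for pr in probs if pr <= obs * (1 + 1e-12))

# A subject scoring exactly at chance (10/20) cannot reject randomness
print(binom_two_sided_p(10, 20))  # 1.0
# A subject scoring 15/20 is unlikely under pure guessing (p ≈ 0.041)
print(binom_two_sided_p(15, 20))
```

With this, one could test each subject separately, or pool all subjects' answers into one call such as `binom_two_sided_p(total_correct, 20 * n_subjects)`; pooling assumes every answer is independent, which may be questionable since answers from the same subject are correlated.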
I hope that somebody can give me some advice on this matter!

Thanks in advance,