In our research on early modern apprenticeships, we have a series of observations (cases), each with several variables. Each observation is uniquely identified by the combination of the variables "student name" and "starting date of the apprenticeship": a given student name can occur in more than one observation, and so can a given starting date, but any particular combination of the two occurs in only one observation (comparable to a composite primary key in Access).
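
The uniqueness rule described above can be checked mechanically. Below is a minimal Python sketch, assuming the cases are available as (student name, starting date) pairs. The names and dates are made up for illustration and are not taken from our data.

```python
from collections import Counter

# Hypothetical cases: (student name, starting date of the apprenticeship).
cases = [
    ("Jan Peeters", "1650-03-01"),
    ("Jan Peeters", "1653-09-15"),   # same apprentice, second contract
    ("Pieter Claes", "1650-03-01"),  # same starting date, other apprentice
    ("Willem Maes", "1661-06-20"),
]

# The combination of both variables must identify exactly one case.
key_counts = Counter(cases)
duplicate_keys = [key for key, n in key_counts.items() if n > 1]

# Apprentices who appear in more than one case.
contracts_per_apprentice = Counter(name for name, _ in cases)
repeat_apprentices = sorted(
    name for name, n in contracts_per_apprentice.items() if n > 1
)

print("duplicate keys:", duplicate_keys)
print("apprentices with several contracts:", repeat_apprentices)
```

An empty list of duplicate keys confirms that the combination behaves like a primary key, while the second list shows which apprentices contribute more than one case.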

We want to investigate the relationship between various other variables ("status apprentice", "parents deceased", "guild registration", "duration apprenticeship", "wages", "who pays the wages", "apprenticeship completed") by means of SPSS Crosstabs + chi-square. Variables can be dichotomous, nominal, ordinal or scale (binned to nominal categories). Later on we intend to perform Logistic Regression Analysis.
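
For reference, the Pearson chi-square statistic for a crosstab can be reproduced by hand. Below is a minimal Python sketch with invented counts; the table and the variable labels in the comments are hypothetical and are not our results. Note that the calculation treats every case as an independent observation.

```python
# Hypothetical 2x2 crosstab: rows = parents deceased (no, yes),
# columns = apprenticeship completed (no, yes). Counts are invented.
observed = [
    [30, 20],
    [10, 40],
]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
n = sum(row_totals)

# Sum of (observed - expected)^2 / expected over all cells, where the
# expected count under independence is row total * column total / n.
chi_square = 0.0
for i, row in enumerate(observed):
    for j, count in enumerate(row):
        expected = row_totals[i] * col_totals[j] / n
        chi_square += (count - expected) ** 2 / expected

degrees_of_freedom = (len(row_totals) - 1) * (len(col_totals) - 1)
print(round(chi_square, 3), degrees_of_freedom)
```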

Because different cases can involve the same apprentice, these cases might be considered paired, in which case chi-square might not be appropriate. On the other hand, the values of other variables also occur more than once, so you could argue that the observations are linked by those variables as well. By extension: in a large dataset the values of one or more variables usually occur more than once, so would a dataset with only unpaired observations not be very rare?
Can we perform Crosstabs + chi-square on this data? Can we perform Logistic Regression Analysis on this data?


TS Contributor
If I understand you correctly, you have n subjects, and each subject can start between 1 and k apprenticeships. How large is your sample (n), and how many subjects appear in your dataset more than once?

With kind regards

Thank you for your reply and sorry for the delay on our end.

We have a total of 1386 cases (apprenticeship contracts). Our sample comprises 952 subjects (apprentices), of whom 318 appear more than once.
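
Put differently, these figures imply the following breakdown. This is only simple arithmetic on the three numbers above, sketched in Python; the variable names are chosen for illustration.

```python
total_cases = 1386      # apprenticeship contracts
total_subjects = 952    # distinct apprentices
repeat_subjects = 318   # apprentices with more than one contract

# Apprentices with exactly one contract contribute one case each.
single_subjects = total_subjects - repeat_subjects

# All remaining cases belong to apprentices who appear more than once.
cases_from_repeats = total_cases - single_subjects

mean_contracts_per_repeat = cases_from_repeats / repeat_subjects
share_of_cases_from_repeats = cases_from_repeats / total_cases

print(single_subjects, cases_from_repeats)
print(round(mean_contracts_per_repeat, 2), round(share_of_cases_from_repeats, 2))
```

So slightly more than half of the cases come from apprentices who appear in the dataset more than once.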

Kind regards