How do I check whether two groups match or differ on basic demographic variables?

My study:
- 2 groups: (1) not allowed to work; (2) allowed to work (control)
- compare the groups on various aspects like quality of life, depression, anxiety, etc., using standardized quantitative questionnaires
- test whether coping ability (2 standardized quantitative scales) predicts scores on the above measures

I'm currently running preliminary analyses on the data collected so far (n=103+); data collection is still ongoing. I have checked for normality.
Data from some questionnaires were normally distributed, data from others were not. I will come to questions on these later.

QUESTION 1: How do I check, using SPSS and its syntax, whether my 2 groups are matched (i.e. do not differ) on basic aspects like age, gender (I'm expecting this one to be skewed), education, income level, etc.?

I'm guessing t-tests...

QUESTION 2: If they are not matched, what would the next step be?

Thanks in advance!


TS Contributor
Two groups can be compared on categorical variables such as gender by using crosstabulation with the Chi² test, on ordinal variables such as education by using the Mann-Whitney U-test, and on interval-scaled variables such as age by using the t-test. The significance level should be set more liberally, e.g. at 10%, since here one wants to avoid falsely retaining the null hypothesis (i.e. concluding the groups are matched when they are not).
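To make the crosstab comparison concrete, here is a minimal sketch in plain Python of the Pearson Chi² test for a 2x2 table (group by gender), judged at the 10% level suggested above. The counts are made up purely for illustration and the function name is hypothetical; it mirrors the arithmetic behind a crosstab Chi² (without continuity correction), not SPSS syntax itself.

```python
import math

def chi_square_2x2(table):
    """Pearson Chi² (no continuity correction) for a 2x2 table of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    # With 1 degree of freedom the p-value has a closed form via erfc.
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value

# Hypothetical counts (NOT the poster's data):
# rows = group (not allowed / allowed to work), columns = gender.
gender_by_group = [[10, 40], [30, 23]]

ALPHA = 0.10  # the more liberal level suggested for balance checks
stat, p = chi_square_2x2(gender_by_group)
print(f"Chi² = {stat:.2f}, p = {p:.4f}, groups differ: {p < ALPHA}")
```

The closed-form p-value only holds for a 2x2 table (1 degree of freedom); larger tables need the general Chi² distribution from a statistics package.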
QUESTION 2: If they are not matched, what would the next step be?
Propensity score matching, or including the significantly different characteristics as covariates if they are associated with the outcome measures.

BTW Andy Field's "Discovering statistics using SPSS" has been highly recommended for beginners.

With kind regards



New Member
Dear psy_vm,
Do you have panel data or just one set of observations for your two groups? From your description and your variables of interest, the status (i.e. being allowed to work or not) is surely endogenous to some observable and unobservable characteristics. If you don't have panel data with within-individual variation, then your identification strategy is at risk.
If you don't have several observations per individual, I would also suggest propensity score matching to be able to compare the two groups.
In that case, make sure the propensity scores of the two groups overlap (common support).
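As a rough illustration of what checking the overlap can look like, the sketch below takes already-estimated propensity scores for each group and reports the range they share. The scores and function names are hypothetical, and the min/max rule is only one simple convention for common support, not the only one.

```python
def common_support(scores_a, scores_b):
    """Return (low, high): the score range covered by both groups."""
    low = max(min(scores_a), min(scores_b))
    high = min(max(scores_a), max(scores_b))
    return low, high

def share_within(scores, low, high):
    """Fraction of a group's scores that fall inside [low, high]."""
    inside = [s for s in scores if low <= s <= high]
    return len(inside) / len(scores)

# Hypothetical propensity scores (NOT estimated from real data).
not_allowed = [0.35, 0.50, 0.62, 0.80, 0.95]
allowed = [0.05, 0.20, 0.40, 0.55, 0.70]

low, high = common_support(not_allowed, allowed)
if low > high:
    print("No overlap: matching on these scores is not meaningful.")
else:
    print(f"Common support: [{low}, {high}]")
    print("Share of 'not allowed' inside:", share_within(not_allowed, low, high))
    print("Share of 'allowed' inside:", share_within(allowed, low, high))
```

Individuals outside the shared range have no comparable counterpart in the other group, so they are usually set aside before matching.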

As for your normality check, it is usually not done; however, you can test for normality using the Jarque–Bera test.
Thank you for your reply Arth.
It's cross-sectional data: various aspects measured for each individual at one point in time, as in a survey. Can you please elaborate on why my identification strategy is at risk? With any design there is always a possibility of confounding variables, but we try to keep them to a minimum...
So, I carried out the comparison tests.
My groups didn't differ significantly on age. As expected, I got a significant difference with respect to gender.
The same was true for marital status. However, the output also showed:
a. 1 cells (25.0%) have expected count less than 5. The minimum expected count is 4.90.

For my other crosstab, on education level, it said:
a. 6 cells (60.0%) have expected count less than 5. The minimum expected count is .44.

What do I do in such a situation?


New Member
Your aim is to identify why (i.e. find a causal relationship) some people are allowed to work and some aren't, am I correct?
What you observe in a cross-section is a set of observable characteristics for the individuals in each group. Yet you don't know whether they acquired those characteristics because they were in a certain group, or whether they already had those characteristics and these led them into the group. You see?
For instance (I don't know which country you are working on), you can imagine that a young man who was told he can't work will be less likely to marry afterwards.
So if you regress the status on a set of variables that have a significant impact on being able to work or not, what you get is not really the probability of being in the group.
Moreover, there might be some unobservable characteristics that drive the status. You could control for them using individual fixed effects if you observed each individual at least twice (which is apparently not your case). If you don't control for them, your results are biased.
If, on the contrary, you want to describe the two groups, there is no big issue. You can say that people unable to work are less likely to be married than people who work. It is really the causal interpretation that is at risk, because being in either of the two groups is the result of ex ante characteristics (what you would like to know), but you observe ex post characteristics that might be a reaction to the status itself.

Concerning your Chi² test, when cells have an expected count below five, you usually merge categories so that every cell has an expected count of at least 5. It's always a shame to lose information, but you have no choice.
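To show what merging categories does to the expected counts, here is a small sketch in plain Python. The education table is invented for illustration (two groups by five education levels), and the helper names are hypothetical; the expected count of a cell is simply row total times column total divided by the grand total.

```python
def expected_counts(table):
    """Expected cell counts under independence: row total * column total / N."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]

def share_of_small_cells(table, threshold=5):
    """Fraction of cells whose expected count is below the threshold."""
    cells = [e for row in expected_counts(table) for e in row]
    return sum(e < threshold for e in cells) / len(cells)

def merge_columns(table, keep, drop):
    """Add column `drop` into column `keep` and remove column `drop`."""
    merged = []
    for row in table:
        new_row = [v for k, v in enumerate(row) if k != drop]
        target = keep if keep < drop else keep - 1  # index shifts after removal
        new_row[target] += row[drop]
        merged.append(new_row)
    return merged

# Hypothetical counts (NOT the poster's data):
# rows = group, columns = five education levels from lowest to highest.
education = [[1, 4, 20, 20, 5],
             [0, 3, 25, 20, 5]]

print("Small cells before merging:", share_of_small_cells(education))
# Collapse the two lowest levels into one category and check again.
collapsed = merge_columns(education, keep=0, drop=1)
print("Small cells after merging: ", share_of_small_cells(collapsed))
```

Merging only makes sense for categories that are substantively adjacent (e.g. neighbouring education levels); if small cells remain, further merging or an exact test may be needed.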

Have a nice day.
Oops Arth, you got my objectives all wrong...
I'm studying people who are on visas which do not allow them to work, hypothesizing that they would have poorer quality of life, more depression, etc. compared to those allowed to work. Hence the cross-sectional viewpoint.