How to extract representative random subsample from large sample?

Hi! I am new to SPSS and am currently looking at data on family firms and non-family firms and their performance. I want to compare the group of family firms (70 firms) with the non-family firms (25,000 firms). As you can see, the sample sizes are very different, and I want to make them more similar. Therefore, I wonder how I can extract around 70 random firms from the 25,000 non-family firms and still keep the same characteristics (such as mean and standard deviation) on key variables (such as age, profitability, and size). That way it would be a representative subsample, right?

Thank you for your time! :)


Active Member
This is the wrong approach. Discarding data generally reduces the accuracy of your estimates and the power of the relevant statistical tests. There are methods developed for unbalanced samples; you should read about those and apply one of them. SPSS has them conveniently implemented. Alternatively, you may want to consider propensity score matching, but the usefulness of that method is limited.
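
The post does not say which methods for unbalanced samples it has in mind. One widely used test that requires neither equal group sizes nor equal variances is Welch's t-test. The following is only a minimal Python sketch of that statistic, not SPSS syntax; the data are synthetic and the profitability values are invented for illustration.

```python
import math
import random
import statistics


def welch_t(a, b):
    """Return Welch's t statistic and its approximate degrees of freedom."""
    na, nb = len(a), len(b)
    va, vb = statistics.variance(a), statistics.variance(b)  # sample variances
    se2 = va / na + vb / nb  # squared standard error of the mean difference
    t = (statistics.mean(a) - statistics.mean(b)) / math.sqrt(se2)
    # Welch-Satterthwaite approximation for the degrees of freedom.
    df = se2 ** 2 / ((va / na) ** 2 / (na - 1) + (vb / nb) ** 2 / (nb - 1))
    return t, df


# Synthetic stand-ins for profitability (assumed values, not real data).
rng = random.Random(0)
family = [rng.gauss(0.06, 0.05) for _ in range(70)]
non_family = [rng.gauss(0.05, 0.08) for _ in range(25000)]

t_stat, df = welch_t(family, non_family)
print(f"Welch t = {t_stat:.3f}, approx. df = {df:.1f}")
```

All 25,000 non-family firms are used here; nothing is thrown away to balance the groups.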

In general, if you are not an expert in statistics, you should perform a certain procedure not because you "want" to perform it but because you have read about it in a book or another respectable resource.

Thank you for your reply! I "want" this based on what I have read in books and online. I have tried propensity score matching, but I have not succeeded. If I understand correctly, propensity score matching is done between a treatment group and a control group. Here, I am trying to create a subsample of only the non-family firms that is representative of all non-family firms in the original sample. Are there any guides online for this process, or can someone give me some guiding steps in SPSS? Thanks in advance :)
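
For reference, drawing a simple random subsample and checking it against the full group can be sketched as below. This is a Python illustration rather than SPSS steps, and the firm ages are synthetic values made up for the example. A random draw of 70 will only approximate the mean and standard deviation of the full group; the gap is ordinary sampling error.

```python
import random
import statistics

rng = random.Random(42)  # fixed seed so the draw is reproducible

# Synthetic stand-in for firm age in the 25,000 non-family firms (assumption).
firm_age = [rng.lognormvariate(3.0, 0.6) for _ in range(25000)]

# Simple random sample of 70 firms, drawn without replacement.
subsample = rng.sample(firm_age, 70)


def summarize(values):
    """Return (mean, sample standard deviation)."""
    return statistics.mean(values), statistics.stdev(values)


full_mean, full_sd = summarize(firm_age)
sub_mean, sub_sd = summarize(subsample)

print(f"full group: mean = {full_mean:.2f}, sd = {full_sd:.2f}")
print(f"subsample:  mean = {sub_mean:.2f}, sd = {sub_sd:.2f}")
```

SPSS offers a comparable random-sample option under Data > Select Cases, though the exact dialog wording may differ between versions.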


TS Contributor
You have more than two orders of magnitude more data in the non-family group, so I think you could simply calculate the relevant statistics for that group, treat them as fixed reference values, and use a one-sample test on the family group.
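
A minimal sketch of that suggestion in Python, assuming the non-family mean is treated as a known constant. The data are synthetic and the numbers are invented for illustration.

```python
import math
import random
import statistics


def one_sample_t(sample, reference_mean):
    """Return the one-sample t statistic and its degrees of freedom."""
    n = len(sample)
    se = statistics.stdev(sample) / math.sqrt(n)  # standard error of the mean
    return (statistics.mean(sample) - reference_mean) / se, n - 1


# Synthetic stand-ins for profitability (assumed values, not real data).
rng = random.Random(1)
non_family = [rng.gauss(0.05, 0.08) for _ in range(25000)]
family = [rng.gauss(0.06, 0.05) for _ in range(70)]

# With 25,000 firms the non-family mean is estimated very precisely,
# so it is used here as a fixed reference value.
reference = statistics.mean(non_family)
t_stat, df = one_sample_t(family, reference)
print(f"t = {t_stat:.3f} on {df} degrees of freedom")
```

The approximation ignores the small sampling error in the non-family mean, which is why it only makes sense when that group is very large.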

Thank you! In addition to the t-test, I am looking to run a regression to capture the effect of being a family firm on profitability (OLS with firm fixed effects):

Profitability = a + b * FamilyDummy + control variables + u + e

Will the unequal sample sizes matter here, and do I need to make any adjustments?
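
To see what unequal group sizes do in such a regression, here is a deliberately simplified Python sketch: pooled OLS of profitability on the family dummy only, with no control variables and no firm fixed effects, on synthetic data. In this stripped-down case the coefficient on the dummy equals the difference in group means, and under the usual homoskedastic assumptions its standard error is s * sqrt(1/n_family + 1/n_nonfamily), so precision is limited mainly by the 70 family firms.

```python
import math
import random
import statistics


def ols_dummy(y, d):
    """OLS of y on an intercept and a 0/1 dummy d; returns (a, b, se_b)."""
    n = len(y)
    d_bar = sum(d) / n
    y_bar = sum(y) / n
    sxy = sum((di - d_bar) * (yi - y_bar) for di, yi in zip(d, y))
    sxx = sum((di - d_bar) ** 2 for di in d)
    b = sxy / sxx                      # slope on the dummy
    a = y_bar - b * d_bar              # intercept
    rss = sum((yi - a - b * di) ** 2 for di, yi in zip(d, y))
    sigma2 = rss / (n - 2)             # residual variance estimate
    se_b = math.sqrt(sigma2 / sxx)     # classical (homoskedastic) standard error
    return a, b, se_b


# Synthetic stand-ins for profitability (assumed values, not real data).
rng = random.Random(2)
family = [rng.gauss(0.06, 0.07) for _ in range(70)]
non_family = [rng.gauss(0.05, 0.07) for _ in range(25000)]

y = family + non_family
d = [1] * len(family) + [0] * len(non_family)

a, b, se_b = ols_dummy(y, d)
diff = statistics.mean(family) - statistics.mean(non_family)
print(f"b = {b:.4f}, difference in means = {diff:.4f}, se(b) = {se_b:.4f}")
```

The sketch uses every observation; the imbalance shows up in the standard error rather than in the coefficient itself. The full model with controls and firm fixed effects would need to be checked separately.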