STATA Help - T-tests with Unmatched Data

Hi everyone!

I'm running STATA and have a bit of a problem. I have employment data for minority-owned firms and non-minority owned firms. Employment data is all in one column, and I've separated it by dummy variables so I have "majorityemp" with about 350,000 observations and "minorityemp" with about 1,000 observations. The "minority" column has Y or N, with a blank for companies that did not disclose that data.

I want to test if there's a statistically significant difference in the means of the two groups, but whenever I run "ttest majorityemp=minorityemp" I get a "no observations r(2000);" error.

When I try running "ttest minority, by(emp)" I get "more than 2 groups found, only 2 allowed r(420);" I assume this is because the non-reporting companies are tacked on as a third group.

My question: is there any way I can do a ttest by employment for JUST minority=Y or N, thus filtering out the non-responses?


TS Contributor
Let me check if I understood correctly. You have your data ordered in this way:


X1 Y X1 Y1

If that's the case there could be issues, since the first two columns will have more observations than the other two, and STATA may match up the cases incorrectly.

Try using the original dataset, erasing both dummy variables.


X1 Y

With the command ttest employment, by(minority)

Now, if the problem is missing data, you can drop those observations with an if condition.

That was exactly what I had hoped to do :) Unfortunately, I seem to be having some trouble with that if statement - I can't seem to say "if minority=Y or minority=N" in the proper coding language. Any ideas?

Also, when I just try and do it the way you suggested, I wind up with that 'greater than two groups' issue, because there are non-entries for "minority".
If you're in STATA, remember that you need to use == for equals in logical statements.

If you have three values for minority, you can just say if minority!=3, or whatever. != is the not equal symbol.

Or you could try if minority==1 | minority==2, since | is the OR operator.
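To make the placement concrete, here is a minimal sketch, assuming minority is numeric with codes 1, 2 and 3 (the codes and the toy data are invented for illustration, not taken from the original dataset). The if qualifier sits between the variable name and the comma; by() is an option, so it goes after the comma.

```stata
* Toy data: emp is employment, minority is a hypothetical numeric code
* (1 = N, 2 = Y, 3 = not disclosed)
clear
input emp minority
12 1
30 1
18 1
 7 2
 9 2
11 2
25 3
40 3
end

* The if qualifier comes before the comma; by() comes after it
ttest emp if minority == 1 | minority == 2, by(minority)

* Same restriction, written by excluding the third code instead
ttest emp if minority != 3, by(minority)
```

Both commands should compare the same two groups, since the condition only decides which observations enter the test.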

Where would I use the if statement? I can't use it in "ttest emp, by(minority!=0)", because I get "!= invalid name in option by()".

I tried running "ttest emp if minority!=." to get rid of the non-term observations, but that comes back with "type mismatch."


TS Contributor
ttest emp, by(minority!=0)

What are you trying to do with that?

To eliminate missing data try with

drop if minority=="."

Now, if minority is entered as text (Y, N), you may need to encode it properly. Type edit to open the data editor; if the letters are shown in red, the variable is stored as a string and that is the problem.

Use encode minority, gen(newvar)

To perform the ttest just with:

ttest emp, by(newvar)
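As a rough sketch of the encode route, assuming minority is a string holding "Y", "N" or an empty string (the toy data below are made up, and newvar is just the placeholder name used above): encode builds a labelled numeric copy, label list shows which code each string received, and the if qualifier keeps only observations that have a group code.

```stata
* Toy data: minority is a string variable with some empty entries
clear
input emp str1 minority
12 "N"
30 "N"
18 "N"
 7 "Y"
 9 "Y"
11 "Y"
25 ""
40 ""
end

* Build a labelled numeric copy of the string variable
encode minority, generate(newvar)

* Show which integer code was assigned to each string value
label list newvar

* Keep only observations with a group code, then compare the two groups
ttest emp if !missing(newvar), by(newvar)
```

If label list reports more than two codes, something other than Y and N is stored in the variable and needs cleaning before the test will run.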
OK, what are all the values for minority? Are they strings? Is the missing data actually empty, or is it just another value like NA?

If you have numeric values and the missing values are actually missing, so they show up as . in STATA, you can say ttest employment if minority<., by(minority)

<. excludes missing numeric values.

See here for more.

You might also want to check to make sure you really only have two real values of minority. It could be that you have a typo or something similar that's being read as a third value.
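A small sketch of both checks, assuming minority is numeric (0/1 here, an invented coding) with true missing values stored as . in the data: tabulate with the missing option lists every category, including missing, and the < . condition keeps only the non-missing rows.

```stata
* Toy data: numeric group variable with two missing entries
clear
input emp minority
12 0
30 0
18 0
 7 1
 9 1
11 1
25 .
40 .
end

* The missing option adds a row for missing values, so stray categories show up
tabulate minority, missing

* minority < . is true only for non-missing numeric values
ttest emp if minority < ., by(minority)
```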
I'm sorry for being such a continued bother, but it simply isn't working. The first thing I did was take terzi's suggestion and run "encode minority, gen(minoritydummy)". That gets me a new column with the same values, in blue instead of red. I then ran "tabulate minoritydummy" and got the following:

minoritydum |
         my |      Freq.     Percent        Cum.
------------+-----------------------------------
            |     13,734        1.90        1.90
          N |    706,986       97.78       99.68
          Y |      2,322        0.32      100.00
------------+-----------------------------------
      Total |    723,042      100.00

So it's giving me three groups - _, N, Y. The _ are showing up as ., so I tried dropping them as per terzi's next suggestion:

. drop minoritydummy=="."
== invalid name

I really appreciate all the help, I'm just feeling so lost that nothing's working.
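One possible way forward, sketched on made-up data and resting on two assumptions: that the blank entries in the original string variable are either empty strings or strings of spaces, and that it is acceptable to drop them from the working copy. The "== invalid name" message comes from leaving out the word if after drop. For a string variable the comparison is against "" rather than ".".

```stata
* Toy data: two undisclosed entries, one empty and one holding a single space
clear
input emp str1 minority
12 "N"
30 "N"
18 "N"
 7 "Y"
 9 "Y"
11 "Y"
25 ""
40 " "
end

* drop needs the if qualifier; trim() strips leading and trailing blanks,
* so a string of spaces is treated the same as an empty string
drop if trim(minority) == ""

* Only N and Y remain, so encode produces exactly two groups
encode minority, generate(minoritydummy)
tabulate minoritydummy
ttest emp, by(minoritydummy)
```

To keep the non-reporting firms in the dataset, the same condition can be negated and used as an if qualifier on ttest instead of dropping anything.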