Apologies if this question is very basic but I have spent hours looking online and in forums such as this one for an answer and I am still unsure.

I have an unbalanced panel data set of a few thousand firms, for the years 2002 to 2008. I have looked at the mean values for the audit fees that these companies pay for each year, but now I wish to compare the means for the first (2002) and last (2008) years of the panel.

I know I need to use a t-test to check whether the means are significantly different, but I am struggling to understand which one to use. The dataset is unbalanced, i.e. some firms appear in 2002 only, some firms have values for both 2002 and 2008, and some appear in 2008 only. Therefore the two samples are not entirely independent, but the observations are not all paired either.
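To make the structure concrete, here is a minimal Python sketch with made-up numbers. The firm labels and fees are hypothetical and not taken from my data, and my actual analysis is in Stata. It only shows which observations each kind of test would draw on. The unequal-variance (Welch) form of the two-sample statistic is my assumption for the illustration.

```python
from math import sqrt
from statistics import mean, stdev

# Hypothetical audit fees by firm, one dict per year (illustrative values only).
fees_2002 = {"A": 100.0, "B": 120.0, "C": 90.0, "D": 110.0}
fees_2008 = {"C": 105.0, "D": 130.0, "E": 140.0, "F": 125.0}

# Firms observed in both years: the only ones a paired test can use.
both = sorted(fees_2002.keys() & fees_2008.keys())

# Paired t statistic on within-firm differences (2008 minus 2002).
diffs = [fees_2008[f] - fees_2002[f] for f in both]
t_paired = mean(diffs) / (stdev(diffs) / sqrt(len(diffs)))

# Two-sample (Welch) t statistic using every firm in each year,
# which treats the two years as independent samples.
x = list(fees_2002.values())
y = list(fees_2008.values())
t_welch = (mean(y) - mean(x)) / sqrt(stdev(x) ** 2 / len(x) + stdev(y) ** 2 / len(y))

print("firms in both years:", both)
print("paired t:", round(t_paired, 3), "two-sample t:", round(t_welch, 3))
```

In this toy example the paired statistic uses only firms C and D, while the two-sample statistic uses all eight firm-year values, counting C and D once in each year.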

I have successfully run an independent t-test, but the paired t-test does not work: Stata drops the observations with missing values, which leaves me with no observations at all. I therefore think I should be running an independent t-test in Stata, but I am not completely sure, as some of the firms appear in both years.

Please would someone be able to advise me on the correct test to compare my average values for the two years of the panel?

Many thanks