I'm struggling with some statistical concepts for a survey I'm implementing. I'm planning on using SAS for this.

Here is the problem:

I have to check a large company geo database in the field. That is, the company has many kinds of equipment (about 20 kinds, and millions of units of each) spread all over the state, and I have to check how well that database represents reality (whether each piece of equipment exists and whether it is of the kind the database claims).

The state is really big, about the size of a medium European country, and it also has some transportation issues. So first I've created strata, one for each equipment type, and then I'll do single-stage cluster sampling of the cities in the state, with clusters of unequal sizes.

I've decided to sample in proportion to size in each cluster (a city with more equipment gets more samples).
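To make the allocation idea concrete, here is a minimal sketch in Python (only for illustration, since I'll do the real thing in SAS). The city names, the equipment counts, and the function name `allocate_proportional` are all made up, and I'm assuming a fixed total number of samples per equipment type.

```python
def allocate_proportional(sizes, total_samples):
    """Split total_samples across cities in proportion to equipment counts."""
    total = sum(sizes.values())
    # Exact (possibly fractional) proportional share for each city
    raw = {city: total_samples * n / total for city, n in sizes.items()}
    # Start from the integer part of each share
    alloc = {city: int(share) for city, share in raw.items()}
    # Give the leftover samples to the cities with the largest remainders
    leftover = total_samples - sum(alloc.values())
    by_remainder = sorted(raw, key=lambda c: raw[c] - alloc[c], reverse=True)
    for city in by_remainder[:leftover]:
        alloc[city] += 1
    return alloc


# Hypothetical equipment counts for one equipment type (one stratum)
city_sizes = {"city_a": 50_000, "city_b": 30_000, "city_c": 20_000}
allocation = allocate_proportional(city_sizes, 100)
print(allocation)
```

The rounding step is just one way to make the per-city counts add up to the total. I'm not sure it is the standard one.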

Then I was thinking of using dummy (0/1) variables for the things I'll observe in the field (i.e. 1 if the equipment exists, 0 if not; 1 if its location is correct, 0 if not; 1 if it's the same kind, 0 if not; 1 if it has the same power spec, 0 if not; etc.).
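A minimal sketch of what I mean by these 0/1 variables, again in Python and with made-up records. The class `FieldCheck` and its field names are hypothetical. The proportion computed here is a plain unweighted mean, so it ignores the strata and clusters and is probably not the final estimator.

```python
from dataclasses import dataclass


@dataclass
class FieldCheck:
    """One inspected database record, coded as 0/1 indicators."""
    exists: int       # 1 if the equipment was found in the field
    location_ok: int  # 1 if its location matches the database
    kind_ok: int      # 1 if it is the kind the database claims
    power_ok: int     # 1 if its power spec matches the database


def proportion(checks, field):
    """Unweighted share of records where the given indicator is 1."""
    return sum(getattr(c, field) for c in checks) / len(checks)


# Hypothetical field results for four inspected records
checks = [
    FieldCheck(1, 1, 1, 1),
    FieldCheck(1, 0, 1, 1),
    FieldCheck(0, 0, 0, 0),
    FieldCheck(1, 1, 1, 0),
]
exists_rate = proportion(checks, "exists")
location_rate = proportion(checks, "location_ok")
print(exists_rate, location_rate)
```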

My questions for you guys are:

- Is this a good method?

- Does this method result in a probability sample?

- How do I calculate the error and variance of my sample after I check it (and can I do it before)? Do I use something like linear regression?

- How big should my sample be for 95% confidence and a 5% margin of error? How do I calculate it, given that I'm combining two sampling methods?
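For context on the last question, the only formula I know is the one for estimating a proportion under simple random sampling, n = z² · p(1 − p) / e². Below is a minimal Python sketch of it. The function name `srs_sample_size` is made up, and I'm assuming the worst case p = 0.5 and a very large population. It does not account for the stratification or the clustering, which is exactly the part I don't know how to handle.

```python
import math
from statistics import NormalDist


def srs_sample_size(margin, confidence=0.95, p=0.5):
    """Sample size for a proportion under simple random sampling.

    Assumes a very large population and ignores strata and clusters.
    """
    # Two-sided critical value, about 1.96 for 95% confidence
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return math.ceil(z ** 2 * p * (1 - p) / margin ** 2)


n = srs_sample_size(0.05)
print(n)  # 385
```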

I'm an electrical engineer with limited statistical knowledge.

Thank you very much for your time and attention.

