I'm working on a problem where I have 30% of a full dataset and I have to estimate the generalization error.

To be more precise, let's say I have the transaction data for the clients of a bank that holds 30% of the country's market.

I can easily get the mean, standard deviation and so on of my dataset but I can't figure out how to extrapolate.

I know basic statistics, and I went back through all my lecture notes to try to find the answer.

I would like to calculate the sampling error of my dataset.

To explain clearly:

population: the information of all transactions in a certain country.

mean : m (unknown)

standard deviation: sigma (unknown)

my dataset : the information of the transactions of the clients of one bank holding 30% of the country's market (so 30% of the whole dataset)

mean : m* (known)

standard deviation : s (known)

The problem is that in all the examples I see, the formulas include the standard deviation of the population, which I don't have.

I used this formula:

m = m* ± 1.96 * sigma / √n, where n is the size of my sample

This formula uses "sigma", which I don't know.
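For concreteness, here is a minimal sketch of how that interval might be computed if the known sample standard deviation s is simply plugged in for sigma, which is a common approximation when n is large. The function name and the `population_fraction` parameter are hypothetical, introduced only for illustration. The optional finite population correction assumes the sample is a simple random sample covering a known fraction of the population; the clients of a single bank may well not satisfy that, so the result should be read with caution.

```python
import math
import statistics

def mean_confidence_interval(sample, population_fraction=None, z=1.96):
    """Approximate interval for the population mean m, using s in place of sigma.

    population_fraction is n/N (e.g. 0.30); if given, a finite population
    correction is applied. This assumes simple random sampling, which is
    an assumption, not something the data can confirm.
    """
    n = len(sample)
    m_star = statistics.fmean(sample)   # sample mean m*
    s = statistics.stdev(sample)        # sample standard deviation (n - 1 denominator)
    standard_error = s / math.sqrt(n)
    if population_fraction is not None:
        # sqrt((N - n) / (N - 1)) is approximately sqrt(1 - n/N) for large N
        standard_error *= math.sqrt(1.0 - population_fraction)
    half_width = z * standard_error
    return m_star - half_width, m_star + half_width

# Toy numbers, purely illustrative
low, high = mean_confidence_interval([10, 12, 14, 16, 18], population_fraction=0.30)
print(low, high)
```

With a sample as large as a bank's full transaction history, the difference between 1.96 and the corresponding Student t quantile is negligible, so the normal quantile is kept here.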

The second part would be to get the error on the variance (or standard deviation), but I'm not there yet.
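As a possible starting point for that second part, one option that avoids assuming normally distributed transactions is a percentile bootstrap on s. The sketch below is only an illustration under the assumption that the sample is representative of the population; the function name and its parameters are hypothetical, and it ignores any finite population correction.

```python
import random
import statistics

def bootstrap_sd_interval(sample, n_resamples=2000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for the standard deviation."""
    rng = random.Random(seed)  # fixed seed so the result is reproducible
    n = len(sample)
    # Resample with replacement and recompute s each time
    estimates = sorted(
        statistics.stdev(rng.choices(sample, k=n)) for _ in range(n_resamples)
    )
    lower_index = int((alpha / 2) * n_resamples)
    upper_index = int((1 - alpha / 2) * n_resamples) - 1
    return estimates[lower_index], estimates[upper_index]

# Toy numbers, purely illustrative
print(bootstrap_sd_interval([10, 12, 14, 16, 18, 20, 22, 24]))
```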

Any help would be appreciated.

Thank you

Nicolas