I'm not an expert on theory, but to some extent the data itself will suggest a distribution. For instance, a binary variable is likely to follow a binomial distribution, a randomly sampled continuous variable is often approximately normal, and count data is likely to follow a Poisson distribution.
Next you try to fit a model. The residuals of the model can often (maybe always) tell you whether the data comes from the distribution your model says it comes from.
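To make the residual check concrete, here is a minimal sketch using only the Python standard library. The data is simulated (a straight line plus normal noise, which is purely an assumption for illustration), and the names `corr` and `qq_corr` are made up for this example. It fits a least-squares line and then compares the sorted residuals with the quantiles a normal distribution would give; a correlation close to 1 is consistent with normal residuals, though it is not a formal test.

```python
import random
from math import sqrt
from statistics import NormalDist, mean, stdev

random.seed(0)

# Hypothetical data: linear trend plus normal noise (assumed for illustration).
x = [i / 10 for i in range(200)]
y = [2.0 + 0.5 * xi + random.gauss(0, 1) for xi in x]

# Ordinary least-squares fit of y = a + b * x.
xbar, ybar = mean(x), mean(y)
b = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / sum(
    (xi - xbar) ** 2 for xi in x
)
a = ybar - b * xbar

# Residuals, roughly standardised and sorted for a quantile comparison.
resid = [yi - (a + b * xi) for xi, yi in zip(x, y)]
s = stdev(resid)
z = sorted(r / s for r in resid)

# Standard normal quantiles at matching plotting positions.
n = len(z)
q = [NormalDist().inv_cdf((i + 0.5) / n) for i in range(n)]


def corr(u, v):
    """Pearson correlation between two equal-length sequences."""
    ub, vb = mean(u), mean(v)
    num = sum((ui - ub) * (vi - vb) for ui, vi in zip(u, v))
    den = sqrt(sum((ui - ub) ** 2 for ui in u) * sum((vi - vb) ** 2 for vi in v))
    return num / den


# Values near 1 suggest the residuals look normal.
qq_corr = corr(z, q)
print(f"slope={b:.3f}  intercept={a:.3f}  QQ correlation={qq_corr:.4f}")
```

If the residuals came from a clearly different distribution (strongly skewed counts, say), the sorted residuals would bend away from the normal quantiles and the correlation would drop.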
I'm sure other theory-based people on here have a great deal more to offer, but that's my applied 2 cents.
Suppose there are 1 million parts, of which 1% are defective, i.e. 10,000 parts are defective. I take samples of different sizes from the 1 million: 10%, 30%, 50%, 70% and 90% of the original population. I want to find the probability of detecting 5000 defective parts in each sample, independently. The issue is this: when I compute the probability of a maximum of 5000 defective parts for any sample size below 50% of the population, the answer is 0 (here p is 0.01 and q is 0.99). At 50% of the population the probability of finding 5000 defective parts is 0.5, and sample sizes above 50% give probability 1. In every case p is 0.01 and the number of defective parts is 5000; only the sample size changes. So the result jumps from 0 to 0.5 and then to 1, with no intermediate values, even though the sample size is changing linearly. Can someone please tell me what the problem is? I really need to solve it.
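One way to look at the numbers above is the rough sketch below. It rests on several assumptions that are not stated in the question: the parts are sampled without replacement (so the defective count is hypergeometric), "detecting 5000" means at least 5000 defective parts in the sample, and a normal approximation with a finite population correction is accurate enough. The function name `prob_at_least` is made up for this example.

```python
from math import sqrt
from statistics import NormalDist

N = 1_000_000   # population size (from the question)
K = 10_000      # defective parts in the population (1%)
TARGET = 5_000  # defective parts we hope to find in the sample


def prob_at_least(target, n, N=N, K=K):
    """Normal approximation to P(X >= target) for a hypergeometric count X."""
    p = K / N
    mu = n * p
    # Finite population correction, because sampling is without replacement.
    var = n * p * (1 - p) * (N - n) / (N - 1)
    # Continuity correction: P(X >= t) is approximated by P(Z >= t - 0.5).
    return 1 - NormalDist(mu, sqrt(var)).cdf(target - 0.5)


for frac in (0.10, 0.30, 0.49, 0.495, 0.50, 0.505, 0.51, 0.70, 0.90):
    n = round(frac * N)
    print(f"{frac:6.1%}  n={n:>7}  P(X >= {TARGET}) ~ {prob_at_least(TARGET, n):.4f}")
```

Under these assumptions the standard deviation of the defective count near n = 500,000 is only about 50 parts, while moving the sample size by 1% of the population shifts the expected count by 100 parts. The probability therefore does pass through intermediate values, but only for sample sizes between roughly 49% and 51% of the population; on a grid of 10%, 30%, 50%, 70% and 90% it looks like a jump from 0 to 0.5 to 1.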