Context: I am a programmer, not a statistician, and I have only a vague idea of statistical concepts. Also, my English is far from perfect, so please be patient.
My client needs an estimate of the percentage of taxes he is going to pay, based on what happened in the recent past.
He needs something like "the percentage P will be between 1.8% and 2.0% with a 95% probability".
So I guess this is not a confidence interval but a probability interval.
Let's see what I have:
The client sells a product that comes in 3 varieties (A, B, C).
For each of the last 10 months I know what percentage of the sales was represented by each variety.
For instance, in February he sold 50% A, 30% B and 20% C, while in March he sold 45% A, 35% B and 20% C. Of course these 3 numbers always add up to 100%.
He has to pay taxes on what he sold: 1% on A, 2% on B and 3% on C.
Let's say all the products cost the same, 1$, and every month he always sells 100 products.
So, in February, he has to pay 1% of 50, 2% of 30 and 3% of 20. This gives 0.5$+0.6$+0.6$ = 1.7$ of taxes.
That is, he has to pay P = 1.7% of taxes in February and P = 1.75% of taxes in March (0.45$+0.70$+0.60$ = 1.75$), and so on.
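To make the arithmetic explicit, here is a minimal sketch of how I compute P for one month. The names `TAX_RATES` and `effective_tax_pct` are just mine for illustration; the numbers are the February and March figures from above.

```python
# Tax rates per variety, in percent, as stated in the question.
TAX_RATES = {"A": 1.0, "B": 2.0, "C": 3.0}

def effective_tax_pct(shares):
    """Return P, the overall tax percentage for one month.

    shares maps each variety to its share of sales in percent
    (the three shares are expected to add up to 100).
    """
    return sum(shares[v] * TAX_RATES[v] for v in TAX_RATES) / 100.0

february = {"A": 50, "B": 30, "C": 20}
march = {"A": 45, "B": 35, "C": 20}

print(effective_tax_pct(february))  # 1.7
print(effective_tax_pct(march))     # 1.75
```

So P is simply a weighted sum of the three sales shares, with the tax rates as weights.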
Therefore, P, the percentage he pays every month, varies. My client needs an estimate of this P, with a reasonable "error".
There is NO seasonal effect on sales.
One thing I can do is calculate P for each past month, and then calculate the mean M and the standard deviation SD of those values.
I would say to the client: your P is approximately M, and with a probability of 95% your P will lie between M-2*SD and M+2*SD.
Is this correct? One possible problem is that I have fewer than 30 months to work with, and I have no idea whether the values of P are normally distributed.
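In code, this first approach would look roughly like the sketch below. It assumes the "mean plus or minus 2 standard deviations" rule, which is exactly the part I am unsure about. Only the first two values in `monthly_p` (February and March) come from my example; the other eight are invented placeholders, and `rough_interval` is a name I made up.

```python
import statistics

def rough_interval(p_values, k=2.0):
    """Return (M, SD, M - k*SD, M + k*SD) for a list of monthly P values."""
    m = statistics.mean(p_values)
    sd = statistics.stdev(p_values)  # sample SD (n - 1 in the denominator)
    return m, sd, m - k * sd, m + k * sd

# First two values are February and March from the question;
# the remaining eight are invented just to have 10 months.
monthly_p = [1.70, 1.75, 1.80, 1.65, 1.72, 1.78, 1.69, 1.74, 1.71, 1.76]

m, sd, low, high = rough_interval(monthly_p)
print(f"M = {m:.3f}, SD = {sd:.3f}, interval = [{low:.3f}, {high:.3f}]")
```

Whether k = 2 is justified with only 10 values of unknown distribution is part of my question.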
I have another way of reasoning.
I can examine how the different sales percentages (A%, B%, C%) vary over time. I could calculate the mean and standard deviation of each of these 3 quantities.
But in this case I don't know if (and how) I can propagate the error to the calculation of P. Is the error on P a weighted sum of the errors on A, B and C?
It seems to me that the estimate on P could be more precise if I examine more data (the various A,B,C) but since I will have an error on each of these, I don't know how to combine them to get a reasonable error on P.
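To make this second approach concrete, here is a sketch of the kind of propagation I have in mind. It assumes the textbook rule for the variance of a linear combination, Var(P) = sum over i, j of w_i * w_j * Cov(X_i, X_j), where the weights w are the tax rates. The helper names are mine, and only the first two rows of `shares` (February and March) come from my example; the other rows are invented.

```python
import math
import statistics

WEIGHTS = [1.0, 2.0, 3.0]  # tax rates in percent for A, B, C

def covariance(xs, ys):
    """Sample covariance of two equally long sequences (n - 1 denominator)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (n - 1)

def propagated_sd(shares_by_month):
    """SD of P obtained from the variances and covariances of the shares."""
    columns = list(zip(*shares_by_month))  # one column per variety
    var = sum(
        WEIGHTS[i] * WEIGHTS[j] * covariance(columns[i], columns[j])
        for i in range(3)
        for j in range(3)
    )
    return math.sqrt(max(var, 0.0)) / 100.0  # guard against tiny negative rounding

# Rows are (A%, B%, C%); the first two are February and March, the rest invented.
shares = [(50, 30, 20), (45, 35, 20), (48, 30, 22), (52, 29, 19)]

# For comparison: SD of P computed directly from the monthly P values.
monthly_p = [sum(w * s for w, s in zip(WEIGHTS, row)) / 100.0 for row in shares]
direct_sd = statistics.stdev(monthly_p)

print(propagated_sd(shares), direct_sd)
```

If this rule applies, two things follow. The shares are not independent, because they always add up to 100%, so the covariance terms cannot be dropped and the error on P is not just a weighted sum of the three SDs. Also, when the same months are used, the propagated SD and the SD computed directly from the monthly P values come out the same, since P is a linear function of the shares.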
I had to simplify my situation somewhat, but I hope I gave enough information. I'll be glad to add more if needed.
Thank you for your time.
Wentu