I have some data on customer arrivals at a shop. The arrivals are recorded in bulk, as counts collected at fixed time intervals (say, t1=0+x, t2=t1+x, t3=t2+x, ...).
How can I convert these bulk arrivals to inter-arrival times?
What about using a uniform distribution for this purpose: distribute the number of customers between t and t+x, and then censor anything less than x, as shown in the code further down?
More explanation, as requested:
The data look like this.
Please assume x is 3600 seconds, as the data have been collected hourly.
start, t0
first hour, t1=t0+x, 10 customers
second hour, t2=t1+x=t0+2x, 20 customers
third hour, t3=t2+x=t0+3x, 50 customers
...
...
..
23 hour, t23=t22+x=t0+23x, 4 customers
So, the above shows the arrival times.
I would like to know which distribution the inter-arrival times of customers follow. In simple words, I would like to know after how many seconds I should expect the next customer in the shop, and which distribution can model this inter-arrival time.
What do you think of this approach:
Code:
# Assumed inputs for each day:
#   t: interval boundaries t_0, t_1, ..., t_k (in seconds)
#   n: observed customer counts, n[i] for the interval (t[i], t[i+1])
for (day in days) {
  sl <- numeric(0)                               # simulated arrival times
  for (i in seq_along(n)) {
    ss <- runif(n[i], min = t[i], max = t[i + 1])  # spread n[i] arrivals uniformly
    sl <- c(sl, sort(ss))
  }
  lis <- diff(sl)                                # inter-arrival times
  # left censor anything less than x, where x = t[i + 1] - t[i] = 3600 sec,
  # and then fit
}
Please let me know if you need more information.
Thanks
Last edited by mohsenhs82; 02-20-2017 at 02:40 PM. Reason: Updated based on comments
I am wondering why there is no comment, I appreciate any idea. thanks
Probably you need to make the question clearer. Just what are "bulk arrivals" and "inter-arrivals"? We have a sort of idea what you may be talking about but you need to be more precise. Give us an example of what the data looks like.
Thank you Katxt,
I have added more explanation to the question.
Cheers
Last edited by mohsenhs82; 02-18-2017 at 12:54 AM.
hi,
If the arrivals are independent of each other and occur at a constant rate, which is a reasonable assumption, then the inter-arrival times are exponentially distributed. If you are waiting until k parts arrive, then the total waiting time is the sum of k exponentially distributed variables, which is a gamma distributed variable. Adding the max function to this complicates things a lot more, imho.
Probably, the best would be to just simulate the process and estimate the waiting time.
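As a rough illustration of the gamma claim, here is a minimal simulation sketch in R. The rate of 10 per hour, k = 5, the number of simulations and the seed are all made-up values for illustration, not taken from the data in this thread.

```r
set.seed(1)        # arbitrary seed, for reproducibility only
rate  <- 10        # assumed arrivals per hour (illustrative)
k     <- 5         # assumed number of arrivals to wait for
n_sim <- 10000     # number of simulated waits

# Each row holds k independent exponential inter-arrival times (in hours)
waits <- matrix(rexp(n_sim * k, rate = rate), ncol = k)
total_wait <- rowSums(waits)          # time until the k-th arrival

mean_total <- mean(total_wait)        # theory: k / rate = 0.5 hours

# Compare the simulated totals with a gamma(shape = k, rate = rate)
ks <- ks.test(total_wait, "pgamma", shape = k, rate = rate)
ks$p.value                            # a small value would argue against the gamma
```

The same idea extends to a full simulation of the process: generate the gaps, add them up, and read off whatever waiting time you are interested in.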
regards
Thank you rogojel for your time and consideration. I was wondering if you could clarify why you mentioned the exponential distribution. I know that the exponential is a good candidate, but without a goodness-of-fit check (in R I do this by comparing the AIC/BIC obtained from fitdistrplus) I cannot say for sure that the inter-arrival times follow an exponential or any other distribution.
You also mentioned "simulate the process and estimate the waiting time". When you said this, I think you assumed they have an exponential distribution. If I am not sure that they have an exponential distribution, how can I convert the arrivals to inter-arrivals in order to simulate the process?
Please correct me if I misunderstood you.
Are those figures you quoted realistic? They are hardly likely to be exponential if you have both 50 and 4 from the same rate. If they are just made up, can you give us some real ones to look at?
On the other hand, perhaps your 24 hours are one complete 24-hour period and the rate is changing with the time of day. If this is the case, you could try to find the daily pattern and use the exponential distribution with a rate that changes according to time. Or, perhaps, put a smoother through the counts to estimate the rate at any point.
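To make the "exponential with a changing rate" idea concrete, here is a small R sketch that simulates arrivals with a rate that is constant within each hour but differs between hours. The four hourly rates are hypothetical (borrowed from the example counts), and `simulate_hour` is just an illustrative helper name.

```r
set.seed(7)                           # arbitrary seed
rate_per_hour <- c(10, 20, 50, 4)     # hypothetical rates, one per hour

# Generate arrivals in [start, start + 1) hours using exponential gaps.
# Restarting the clock at each hour boundary is fine because the
# exponential distribution is memoryless.
simulate_hour <- function(rate, start) {
  now <- start
  out <- numeric(0)
  repeat {
    now <- now + rexp(1, rate = rate)
    if (now >= start + 1) break
    out <- c(out, now)
  }
  out
}

starts   <- seq_along(rate_per_hour) - 1          # 0, 1, 2, 3
arrivals <- unlist(Map(simulate_hour, rate_per_hour, starts))

# Hourly totals of the simulated arrivals, comparable to the bulk data
hourly_counts <- tabulate(floor(arrivals) + 1, nbins = length(rate_per_hour))
inter <- diff(arrivals)                           # inter-arrival times in hours
```

The simulated hourly totals will vary around the assumed rates from run to run, which is the kind of variation you would compare against the recorded counts.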
hi,
As far as I can remember, the demonstration goes like this: the probability of having to wait longer than t+T is the same as the probability of having to wait longer than T and then having to wait t more. As the waiting times are independent, P(t+T)=P(t)*P(T). It can be proven that only the exponential distribution satisfies this equation in P.
Also, if the number of arrivals is Poisson distributed then the waiting times are exponential -- if I remember correctly.
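A quick numerical check of that product rule in R, reading P(t) as the probability of waiting longer than t. The rate and the two time points are arbitrary illustrative values.

```r
rate <- 1 / 6      # assumed rate per minute (illustrative)
s <- 2             # first waiting period, in minutes
u <- 5             # additional waiting period, in minutes

# Survival function of the exponential: P(wait > w)
surv <- function(w) pexp(w, rate = rate, lower.tail = FALSE)

lhs <- surv(s + u)           # P(wait > s + u)
rhs <- surv(s) * surv(u)     # P(wait > s) * P(wait > u)
all.equal(lhs, rhs)          # TRUE for the exponential distribution
```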
Regards
Let's say the average number per hour at some point is 10 arrivals per hour. Then the average wait time = 1/10 hrs or 6 minutes. You cannot calculate the probability that the wait time is 7 minutes because the probability of exactly 7 minutes is 0. What you can calculate are probabilities like - the wait time is less than 7 minutes, or more than 7 minutes, or between 1 and 7 minutes, using the exponential distribution. In this case, the formula is P(t<x)=1-exp(-x/6) so the probability of the wait being less than 7 min is 1-exp(-7/6) = 0.689
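The same numbers can be reproduced in R with `pexp`, the exponential cumulative distribution function. This is only a sketch of the arithmetic above, using the 10-per-hour rate from the example.

```r
rate_per_min <- 10 / 60              # 10 arrivals per hour = 1/6 per minute
mean_wait    <- 1 / rate_per_min     # average wait: 6 minutes

p_less_7 <- pexp(7, rate = rate_per_min)      # P(wait < 7 min) = 1 - exp(-7/6)
p_more_7 <- 1 - p_less_7                      # P(wait > 7 min)
p_1_to_7 <- pexp(7, rate = rate_per_min) -
            pexp(1, rate = rate_per_min)      # P(1 min < wait < 7 min)

round(p_less_7, 3)                   # 0.689
```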
If this is all OK, then you need to estimate the arrival rate at any given time. You could assume a constant rate within an hour using the rate for that hour. That looks simple enough.
Or you could make a polygon using the mid points of the hour and interpolate the rate.
Or you could use a smoother on the data you have at the midpoint of the hour, say a lowess.
If it is a cyclic thing, you might even be lucky enough to fit something sinusoidal to the data.
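The four options above could be sketched in R roughly as follows. The 24 hourly counts are invented for illustration (they are not recorded data), and the lowess span `f = 0.3` and the 24-hour period are assumptions you would want to tune.

```r
# Hypothetical hourly counts for one day (made up for illustration)
counts <- c(2, 1, 1, 2, 4, 8, 10, 20, 50, 45, 40, 38,
            42, 36, 30, 28, 25, 22, 18, 12, 9, 6, 5, 4)
mid <- seq_along(counts) - 0.5       # midpoints of each hour, in hours

# 1. Constant rate within each hour (step function)
rate_step <- function(h) counts[pmin(floor(h) + 1, length(counts))]

# 2. Polygon through the hour midpoints (linear interpolation)
rate_lin <- approxfun(mid, counts, rule = 2)

# 3. Lowess smoother through the midpoints, clipped at zero
sm <- lowess(mid, counts, f = 0.3)
rate_low <- approxfun(sm$x, pmax(sm$y, 0), rule = 2)

# 4. Single sinusoid with an assumed 24-hour period, fitted by least squares
fit_sin <- lm(counts ~ sin(2 * pi * mid / 24) + cos(2 * pi * mid / 24))

rate_step(8.5)    # rate (per hour) in the ninth hour
rate_lin(8)       # interpolated rate at the 8-hour mark
```

Each `rate_*` function returns an estimated arrival rate per hour at time `h`, which can then be plugged into the exponential calculation for that moment of the day.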
Thank you guys for your time and support.
I am mainly looking for an academic approach which includes goodness of fit, and I would like to find the inter-arrival distributions per day. Please remember that we may have seasonal effects, and this means inter-arrivals on weekends and weekdays might be different. It would be great if you could advise on the approach. Those numbers are just a sample, not recorded data. For simplicity I assume that each hour of a single day has a similar probability, but this does not mean that day a and day b have a similar probability and distribution.
If you have a model for the hourly totals, then you could use a goodness of fit test to see how well the bulk data fits the model. However, because you have no actual inter-arrival times there is no way of testing if they fit an exponential or any other distribution.
It sounds as if you think that although there are day-to-day and season-to-season differences, the rate stays constant from hour to hour over a single day. You can start to see if this is true by finding the mean and sd of the 24 hourly totals. If the counts are Poisson with a constant rate, the sd should be close to sqrt(mean).
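A minimal R sketch of that check. The hourly totals are invented for illustration, and the chi-square dispersion test at the end is an extra step I am assuming is acceptable here, not something stated above.

```r
# Hypothetical 24 hourly totals for one day (made up for illustration)
counts <- c(2, 1, 1, 2, 4, 8, 10, 20, 50, 45, 40, 38,
            42, 36, 30, 28, 25, 22, 18, 12, 9, 6, 5, 4)

m <- mean(counts)
s <- sd(counts)
c(mean = m, sd = s, sqrt_mean = sqrt(m))   # sd vs sqrt(mean)

# Index of dispersion: about 1 for Poisson counts with a constant rate
dispersion <- var(counts) / m

# Dispersion test: under a constant-rate Poisson model the statistic is
# approximately chi-square with n - 1 degrees of freedom
stat    <- sum((counts - m)^2) / m
p_value <- pchisq(stat, df = length(counts) - 1, lower.tail = FALSE)
```

A dispersion far above 1 (or a very small `p_value`) suggests the rate is not constant over the day, in which case a time-varying rate is the more sensible model.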
mohsenhs82 (02-19-2017)
Katxt, I think you mentioned an interesting idea. Please let me explain my current approach for converting the arrivals to inter-arrivals, although I have explained it briefly before.
At the moment, I distribute the number of customers within each interval using a uniform distribution, as this can be considered a neutral distribution.
What do you think of this approach?
Code:
# Assumed inputs for each day:
#   t: interval boundaries t_0, t_1, ..., t_k (in seconds)
#   n: observed customer counts, n[i] for the interval (t[i], t[i+1])
for (day in days) {
  sl <- numeric(0)                               # simulated arrival times
  for (i in seq_along(n)) {
    ss <- runif(n[i], min = t[i], max = t[i + 1])  # spread n[i] arrivals uniformly
    sl <- c(sl, sort(ss))
  }
  lis <- diff(sl)                                # inter-arrival times
  # left censor anything less than x, where x = t[i + 1] - t[i] = 3600 sec,
  # and then fit
}
Cheers
I'm sorry, I don't fully follow the code. The impression I get is that you are picking the appropriate number of times at random within some interval over which you think the rate is constant, and calculating the inter-arrival times from those random times. This is fine as a reasonable approximation. The distribution won't be exactly exponential because it is constrained by the total number in your time period, but it will give you an answer which is probably as good as you need. As I see it, your main problem is deciding how long to make the period over which the rate is constant.
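For reference, here is a runnable R sketch of the uniform-placement idea as I understand it. It uses the four example counts from the question, t0 = 0 and an arbitrary seed, all of which are illustrative assumptions.

```r
set.seed(42)                     # arbitrary seed
x      <- 3600                   # interval length in seconds
t0     <- 0                      # assumed start time
counts <- c(10, 20, 50, 4)       # example hourly counts from the question

# Place each hour's customers uniformly at random within that hour
arrivals <- unlist(lapply(seq_along(counts), function(i) {
  sort(runif(counts[i], min = t0 + (i - 1) * x, max = t0 + i * x))
}))

inter <- diff(arrivals)          # simulated inter-arrival times, in seconds

# Gaps within the busiest hour only, where the rate is assumed constant
hour3 <- arrivals[arrivals >= t0 + 2 * x & arrivals < t0 + 3 * x]
gaps3 <- diff(hour3)
rate3_hat <- 1 / mean(gaps3)     # exponential rate estimate (per second)
```

The vector `inter` (or a per-hour vector such as `gaps3`) is what you would then pass to your fitting routine, for example the fitdistrplus workflow mentioned earlier in the thread. Pooling gaps across hours with very different counts mixes several rates together, so fitting hour by hour is probably the safer comparison.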
I am not sure what mistake I made that led you to conclude this. The number of customers arriving in each hour is not randomly selected. These are the actual numbers of customers per hour of each day. As I think I mentioned earlier, for simplicity I assume the rate within each hour is constant. This does not mean the rate at t_i and t_i+n is the same.
Cheers