(Repost) Understanding Waiting Times Between Events with the Poisson and Exponential Distributions
A webhook POSTs to our database each time a particular event occurs on our website. We receive about two of these requests per minute. I was mindlessly monitoring the log files one day and noticed it had been roughly 90 seconds since our database had been hit by this request. Before worrying, though, I wondered how rare that observation is. What is the likelihood of waiting longer than 1.5 minutes for the next request?
This is a probability problem that can be solved with an understanding of Poisson processes and the exponential distribution. A Poisson process is any process where independent events occur at a constant, known rate, e.g. babies are born at a hospital at a rate of three per hour, or calls come into a call center at a rate of 10 per minute. The exponential distribution is the probability distribution that models the waiting times between these events, e.g. the times between calls at the call center are exponentially distributed. To model Poisson processes and exponential distributions, we need to know two things: a time-unit t and a rate λ.
Poisson Distribution
Let's start with the Poisson distribution: If we let N(t) denote the number of events that occur between now and time t, then the probability that n events occur within the next t time-units, or P(N(t)=n), is
$P(N(t) = n) = \frac{(\lambda t)^n e^{-\lambda t}}{n!}$
As mentioned earlier, we receive an average of two requests from this webhook per minute. Thus, the time-unit t is one minute and the rate λ is two. Knowing these, we can answer questions such as:
- What is the probability that we receive no requests in the next two minutes?
$P(N(2) = 0) = \frac{(2 \cdot 2)^0 e^{-2 \cdot 2}}{0!} = e^{-4} \approx 0.0183$
- What is the probability that we receive at least two requests in the next three minutes?
$\begin{aligned}
P(N(3) \geq 2) & = 1 - P(N(3) = 1) - P(N(3) = 0) \\\\
& = 1 - \frac{(2 \cdot 3)^1 e^{-2 \cdot 3}}{1!} - \frac{(2 \cdot 3)^0 e^{-2 \cdot 3}}{0!} \\\\
& = 1 - 6e^{-6} - e^{-6} \\\\
& = 1 - 7e^{-6} \\\\
& \approx 0.9826
\end{aligned}$
For those who prefer reading code, we can write a class Poisson that's initialized with its rate λ:
from math import pow, exp, factorial


class Poisson:
    def __init__(self, rate):
        # rate: average number of events per time-unit (λ)
        self.rate = rate

    def prob_exactly(self, n, t):
        # P(N(t) = n)
        rate = self.rate * t
        return pow(rate, n) * exp(-rate) / factorial(n)

    def prob_at_least(self, n, t):
        # P(N(t) >= n) = 1 - P(N(t) < n)
        complements = range(n)
        total = 0.0
        for c in complements:
            p = self.prob_exactly(c, t)
            total += p
        return 1 - total

    def prob_at_most(self, n, t):
        # P(N(t) <= n) = 1 - P(N(t) >= n + 1)
        return 1 - self.prob_at_least(n + 1, t)
To answer the same questions, we can create an instance of Poisson initialized with rate λ=2.
pois = Poisson(2)
If we want the probability that exactly n = 0 events occur within t = 2 minutes:
pois.prob_exactly(0, 2)
0.01831563888873418
And if we want the probability that at least n = 2 events occur within t = 3 minutes:
pois.prob_at_least(2, 3)
0.9826487347633355
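The class also defines prob_at_most, which none of the examples above exercise. As a minimal standalone sketch, the same quantity can be computed by summing the probabilities directly. The function names poisson_pmf and poisson_at_most are hypothetical helpers introduced here for illustration, not part of the class above, and the question (at most one request in the next minute) is our own example:

```python
from math import exp, factorial


def poisson_pmf(n, rate, t):
    """P(N(t) = n) for events arriving at `rate` per time-unit."""
    mean = rate * t
    return mean ** n * exp(-mean) / factorial(n)


def poisson_at_most(n, rate, t):
    """P(N(t) <= n), summed term by term (hypothetical helper)."""
    return sum(poisson_pmf(k, rate, t) for k in range(n + 1))


# At most one request in the next minute, with rate = 2 per minute:
# P(N(1) <= 1) = e^{-2} + 2e^{-2} = 3e^{-2}
p = poisson_at_most(1, 2, 1)
print(round(p, 4))  # 0.406
```

Summing the terms directly and taking the complement of prob_at_least(n + 1, t) should agree up to floating-point rounding, so pois.prob_at_most(1, 1) would be expected to give the same figure.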
Exponential Distribution
Let's move on to the exponential distribution. As mentioned earlier, the waiting times between events in a Poisson process are exponentially distributed. The exponential distribution can be derived from the Poisson distribution: Let X be the waiting time between now and the next event. The probability that X is greater than t is identical to the probability that 0 events occur between now and time t, which we already know:
$P(X > t) = P(N(t) = 0) = \frac{(\lambda t)^0 e^{-\lambda t}}{0!} = e^{-\lambda t}$
We also know that the probability of X being less than or equal to t is the complement of X being greater than t:
$P(X \leq t) = 1 - P(X > t) = 1 - e^{-\lambda t}$
Thus, the distribution function of the waiting times between events in a Poisson process is $1 - e^{-\lambda t}$. With this, and recalling that our time-unit t is one minute and our rate λ is two requests per minute, we can answer questions such as:
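- What is the probability that the next request occurs within 15 seconds?
$P(X \leq 0.25) = 1 - e^{-2 \cdot 0.25} = 1 - e^{-0.5} \approx 0.3935$
- What is the probability that the next request is between 15 and 30 seconds from now?
$P(0.25 \leq X \leq 0.5) = P(X \leq 0.5) - P(X \leq 0.25) = e^{-0.5} - e^{-1} \approx 0.2387$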
Again, for those who prefer reading code, let's write a class Exponential that's initialized with its rate λ.
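from math import exp


class Exponential:
    def __init__(self, rate):
        # rate: average number of events per time-unit (λ)
        self.rate = rate

    def prob_less_than_or_equal(self, t):
        # P(X <= t) = 1 - e^(-λt)
        rate = self.rate * t
        return 1 - exp(-rate)

    def prob_greater_than(self, t):
        # P(X > t) = 1 - P(X <= t)
        return 1 - self.prob_less_than_or_equal(t)

    def prob_between(self, t1, t2):
        # P(t1 <= X <= t2) = P(X <= t2) - P(X <= t1)
        p1 = self.prob_less_than_or_equal(t1)
        p2 = self.prob_less_than_or_equal(t2)
        return p2 - p1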
To answer the same questions, we can create an instance of Exponential initialized with the rate λ=2.
expo = Exponential(2)
If we want the probability that the next request occurs within t = 0.25 minutes:
expo.prob_less_than_or_equal(0.25)
0.3934693402873666
And if we want the probability that the next request occurs between t1 = 0.25 and t2 = 0.5 minutes:
expo.prob_between(0.25, 0.5)
0.2386512185411911
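One way to build confidence in these formulas is to simulate the process and compare. The sketch below is an illustration rather than part of the original analysis: it assumes Python's standard random.expovariate as the source of exponential waiting times, uses an arbitrary seed and trial count, and introduces a hypothetical helper count_events that adds up gaps until a time window closes. If the waiting times really are exponential with rate 2, the simulated fractions should land close to the exact values, including the Poisson probability of at least two requests in three minutes.

```python
import random
from math import exp

random.seed(0)   # arbitrary fixed seed so repeated runs give the same estimates
RATE = 2.0       # requests per minute
TRIALS = 100_000

# 1) Waiting times: random.expovariate takes the rate (lambda), not the mean.
waits = [random.expovariate(RATE) for _ in range(TRIALS)]
within_15s = sum(w <= 0.25 for w in waits) / TRIALS
between_15s_30s = sum(0.25 < w <= 0.5 for w in waits) / TRIALS


# 2) Counts: keep adding exponential gaps until the window closes.
def count_events(window):
    """Number of simulated events landing within `window` minutes."""
    elapsed = random.expovariate(RATE)
    count = 0
    while elapsed <= window:
        count += 1
        elapsed += random.expovariate(RATE)
    return count


at_least_two = sum(count_events(3) >= 2 for _ in range(TRIALS)) / TRIALS

print(f"P(X <= 0.25)       simulated {within_15s:.4f}  exact {1 - exp(-RATE * 0.25):.4f}")
print(f"P(0.25 < X <= 0.5) simulated {between_15s_30s:.4f}  exact {exp(-RATE * 0.25) - exp(-RATE * 0.5):.4f}")
print(f"P(N(3) >= 2)       simulated {at_least_two:.4f}  exact {1 - 7 * exp(-RATE * 3):.4f}")
```

The simulated values will differ from the exact ones in the third or fourth decimal place; that gap shrinks as TRIALS grows.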
Conclusion
Now, referring back to the original question: What is the probability of waiting longer than 1.5 minutes for the next request?
$P(X > 1.5) = e^{-2 \cdot 1.5} = e^{-3} \approx 0.0498$
The probability of waiting longer than 1.5 minutes for the next request is about 4.98%.
expo.prob_greater_than(1.5)
0.04978706836786395
For this particular example, we could have answered the question with the Poisson distribution by finding P(N(1.5)=0), or the probability that n = 0 events occur within t = 1.5 minutes.
pois.prob_exactly(0, 1.5)
0.049787068367863944
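The same formula can also be turned around. Instead of asking how likely a 90-second silence is, we can ask how long a silence has to be before it becomes rarer than some chosen level p. Setting e^(−λt) = p and solving for t gives t = −ln(p)/λ. The following is only a sketch under stated assumptions: the function name silence_threshold and the 1% cutoff are hypothetical choices made for illustration, not something from the analysis above.

```python
from math import exp, log


def silence_threshold(rate, p):
    """Waiting time t such that P(X > t) = p for an exponential with this rate."""
    if rate <= 0 or not 0 < p < 1:
        raise ValueError("rate must be positive and p must lie in (0, 1)")
    return -log(p) / rate


# With two requests per minute, how long a gap is exceeded only 1% of the time?
t = silence_threshold(2, 0.01)
print(round(t, 2))  # 2.3 (minutes)
```

Under these assumptions, a gap of roughly 2.3 minutes would be exceeded only about 1% of the time, whereas the 90-second gap from the opening example is exceeded about 5% of the time.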
Reference:
http://nbviewer.jupyter.org/github/nicolewhite/notebooks/blob/master/Poisson.ipynb