Obviously, you need at least N balls, but since you are ignoring whether the bucket already contains a ball then it seems reasonable that the throwing will be very inefficient, and some buckets will get many balls before the last buckets is occupied. Maybe the number of balls will average to N2 or maybe even exponential in N?
Also suppose that you were to do this many times. How much would you expect the number of needed balls to vary between each attempt? If N were 1000 then maybe one attempt would take 10000 balls and the next attempt twice as many balls.
In fact the results might be considered surprising:
The number of balls needed is about N*log(N), but the variation between attempts is only about 2N.
Here are some examples:
| N | Average | Width | Width/Average | |
| 100 | 460 | 200 | 43% | |
| 1,000 | 6,900 | 2,000 | 30% | |
| 10,000 | 92,000 | 20,000 | 21% | |
| 100,000 | 1,151,292 | 200,000 | 17% | |
| 1,000,000 | 13,815,510 | 2,000,000 | 14% |
For very large N the average number of balls needed, N*log(N), will be much larger than than the variation of 2N between attempts. This high level of predictability is a good sign that the problem exhibits an exact threshold, or phase transition.
To make things more precise we first rephrase the problem as a decision problem:
INSTANCE: Given N buckets, L balls, and a sequence of random choices for destination of each ball.QUESTION: Does every bucket end up containing at least one ball?
and we can define an associated "control parameter", a, by
a = L/(N*log(N))For sufficiently large N, we can now say, for example, that
a < 0.999 gives "almost certainly one bucket is left empty"That is, we have a sharp threshold at a=1.a > 1.001 gives "almost certainly no buckets are left empty"
Definition: P(N,L) is the probability of a "yes" answer, i.e. that everyone on the buckets will get at least one ball.This problem is sufficiently simple that we can explicitly calculate P(N,L) using simple probability arguments.
Firstly,
prob(bucket 1 contains ball 1) = 1/Nand so
prob(bucket 1 does NOT contain ball 1) = (1-1/N)Now
prob(bucket 1 is empty) =
prob(bucket 1 does NOT contain ball 1) *
prob(bucket 1 does NOT contain ball 2) *
...
prob(bucket 1 does NOT contain ball L)
and since all the balls have the same associated probability we have
prob(bucket 1 is empty) = (1-1/N)Lso that
prob(bucket 1 is NOT empty) = 1 - (1-1/N)LNow, all the buckets independently have this same probability so we have the main result:
P(N,L) = prob(all buckets are NOT empty)
= ( 1 - (1-1/N)L)N
P(N,L) =~ ( 1 - e-L/N)Nsuppose we put L= N*log(N) then we have P =~ ( 1 - 1/N)N and in the limit of N goes to infinity this is just 1/e, a constant. This suggests that maybe it is the ratio L/(N*log(N)) that matters. so lets just try making some plots
we see that as N increases then the sharpness of the transition increases
To investigate the width we will define a new parameter x by
x = (L-N*log(N))/Nso that now
L = N*log(N) + x Nthen in the limit of large N we find that
P(N,L) = e-(e-x)The right hand side is a function which looks like
that is, it does a fairly rapid transition from zero to one.
Putting all this together we have that
for L < N*log(N) - 2N then P(N,L) ~ 0 for L > N*log(N) + 2N then P(N,L) ~ 1justifying the results we summarized at the top of the page.
Slightly more precisely, suppose at some point that k buckets are empty. Then hittng the next empty bucket is likely to take N/k throws.
With N=100 we get a progress curve looking like
Note that this estimate of the progress rate also give a quick way to estimate the position of the phase transition: Just add up the expected number of throws needed for each of the new buckets. That is, we can expect that the number of throws needed to get all the buckets is roughly
sum from k=N to k=1 of N/kwhich is approximately N*log(N). (To see this, approximate the sum over 1/k with an integral over 1/k).
Although trivial in many senses, this problem might well serve as a first test for some ideas about phase transitions.
Back to ANTS@CIRL project home page: http://www.cirl.uoregon.edu/ANTS/
Page last modified: 2000-09-18 by Andrew Parkes. Please feel free to email corrections/suggestions.