Calculating the occupancy problem

The occupancy problem refers to the experiment of randomly throwing $k$ balls into $n$ cells. Out of this experiment, there are many problems that can be asked. In this post we focus on question: what is the probability that exactly $j$ of the cells are empty after randomly throwing $k$ balls into $n$ cells, where $0 \le j \le n-1$?

For a better perspective, there are many other ways to describe the experiment of throwing $k$ balls into $n$ cells. For example, throwing $k$ balls into 6 cells can be interpreted as rolling $k$ dice (or rolling a die $k$ times). Throwing $k$ balls into 365 cells can be interpreted as randomly selecting $k$ people and classifying them according to their dates of birth (assuming 365 days in a year). Another context is coupon collecting – the different types of coupons represent the $n$ cells and the coupons being collected represent the $k$ balls.

_____________________________________________________________________________________

Practice Problems

Let $X_{k,n}$ be the number of cells that are empty (i.e. not occupied) when throwing $k$ balls into $n$ cells. As noted above, the problem discussed in this post is to find the probability function $P(X_{k,n}=j)$ where $j=0,1,2,\cdots,n-1$. There are two elementary ways to do this problem. One is the approach of using double multinomial coefficient (see this post) and the other is to use a formula developed in this post. In addition to $X_{k,n}$, let $Y_{k,n}$ be the number of cells that are occupied, i.e. $Y_{k,n}=n-X_{k,n}$.

Practice Problems
Compute $P(X_{k,n}=j)$ where $j=0,1,2,\cdots,n-1$ for the following pairs of $k$ and $n$.

1. $k=5$ and $n=5$
2. $k=6$ and $n=5$
3. $k=7$ and $n=5$
4. $k=6$ and $n=6$
5. $k=7$ and $n=6$
6. $k=8$ and $n=6$

We work the problem for $k=7$ and $n=5$. We show both the double multinomial coefficient approach and the formula approach. Recall the $k$ here is the number of balls and $n$ is the number of cells.

_____________________________________________________________________________________

Example – Double Multinomial Coefficient

Note that the double multinomial coefficient approach calculate the probabilities $P(Y_{7,5}=j)$ where $Y_{7,5}$ is the number of occupied cells when throwing 7 balls into 5 cells. In throwing 7 balls into 5 cells, there is a total of $5^7=$ 78125 many ordered samples. To calculate $P(Y_{7,5}=j)$, first write down the representative occupancy sets for the event $Y_{7,5}=j$. For each occupancy set, calculate the number of ordered samples (out of 78125) that belong to that occupancy set. Then we add up all the counts for all the occupancy sets for $Y_{7,5}=j$. This is best illustrated with an example. To see the development of this idea, see this post.

Fist, consider the event of $Y_{7,5}=1$ (only one cell is occupied, i.e. all the balls going into one cell). A representative occupancy set is (0, 0, 0, 0, 7), all the balls going into the 5th cell. The first multinomial coefficient is on the 5 cells and the second multinomial coefficient is on the 7 balls.

(0, 0, 0, 0, 7)

$\displaystyle \frac{5!}{4! 1!} \times \frac{7!}{7!}=5 \times 1 =5$

Total = 5

So there are 5 ordered samples out of 16807 that belong to the event $Y_{7,5}=1$. Note that the first multinomial coefficient is the number of to order the 5 cells where 4 of the cells are empty and one of the cells has 7 balls. The second multinomial coefficient is the number of ways to order the 7 balls where all 7 balls go into the 5th cell.

Now consider the event $Y_{7,5}=2$ (all 7 balls go into 2 cells). There are three representative occupancy sets. The following shows the calculation for each set.

(0, 0, 0, 1, 6)

$\displaystyle \frac{5!}{3! 1! 1!} \times \frac{7!}{1! 6!}=20 \times 7 =140$
(0, 0, 0, 2, 5)

$\displaystyle \frac{5!}{3! 1! 1!} \times \frac{7!}{2! 5!}=20 \times 21 =420$
(0, 0, 0, 3, 4)

$\displaystyle \frac{5!}{3! 1! 1!} \times \frac{7!}{3! 4!}=20 \times 35 =700$

Total = 140 + 420 + 700 = 1260

Here’s an explanation for the occupancy set (0, 0, 0, 3, 4). The occupancy set refers to the scenario that 3 of the 7 balls go into the 4th cell and 4 of the 7 balls go into the 5th cell. But we want to count all the possibilities such that 3 of the 7 balls go into one cell and 4 of the 7 balls go into another cell. Thus the first multinomial coefficient count the number of ways to order 5 cells where three of the cells are empty and one cell has 3 balls and the remaining cell has 4 balls (20 ways) The second multinomial coefficient is the number of ways to order the 7 balls where 3 of the 7 balls go into the 4th cell and 4 of the 7 balls go into the 5th cell (35 ways). So the total number of possibilities for the occupancy set (0, 0, 0, 3, 4) is 20 times 35 = 700. The sum total for three occupancy sets is 1260.

We now show the remaining calculation without further elaboration.

Now consider the event $Y_{7,5}=3$ (all 7 balls go into 3 cells). There are four representative occupancy sets. The following shows the calculation for each set.

(0, 0, 1, 1, 5)

$\displaystyle \frac{5!}{2! 2! 1!} \times \frac{7!}{1! 1! 5!}=30 \times 42 =1260$
(0, 0, 1, 2, 4)

$\displaystyle \frac{5!}{2! 1! 1! 1!} \times \frac{7!}{1! 2! 4!}=60 \times 105 =6300$
(0, 0, 1, 3, 3)

$\displaystyle \frac{5!}{2! 1! 2!} \times \frac{7!}{1! 3! 3!}=30 \times 140 =4200$
(0, 0, 2, 2, 3)

$\displaystyle \frac{5!}{2! 2! 1!} \times \frac{7!}{2! 2! 3!}=30 \times 210 =6300$

Total = 1260 + 6300 + 4200 + 6300 = 18060

Now consider the event $Y_{7,5}=4$ (all 7 balls go into 4 cells, i.e. one empty cell). There are three representative occupancy sets. The following shows the calculation for each set.

(0, 1, 1, 1, 4)

$\displaystyle \frac{5!}{1! 3! 1!} \times \frac{7!}{1! 1! 1! 4!}=20 \times 210 =4200$
(0, 1, 1, 2, 3)

$\displaystyle \frac{5!}{1! 2! 1! 1!} \times \frac{7!}{1! 1! 2! 3!}=60 \times 420 =25200$
(0, 1, 2, 2, 2)

$\displaystyle \frac{5!}{1! 1! 3!} \times \frac{7!}{1! 2! 2! 2!}=20 \times 630 =12600$

Total = 4200 + 25200 + 12600 = 42000

Now consider the event $Y_{7,5}=4$ (all 7 balls go into 5 cells, i.e. no empty cell). There are two representative occupancy sets. The following shows the calculation for each set.

(1, 1, 1, 1, 3)

$\displaystyle \frac{5!}{4! 1!} \times \frac{7!}{1! 1! 1! 1! 3!}=5 \times 840 =4200$
(1, 1, 1, 2, 2)

$\displaystyle \frac{5!}{3! 2!} \times \frac{7!}{1! 1! 1! 2! 2!}=10 \times 1260 =12600$

Total = 4200 + 12600 = 16800

The following is the distribution for the random variable $Y_{7,5}$.

$\displaystyle P(Y_{7,5}=1)=\frac{5}{78125}=0.000064$

$\displaystyle P(Y_{7,5}=2)=\frac{1260}{78125}=0.016128$

$\displaystyle P(Y_{7,5}=3)=\frac{18060}{78125}=0.231168$

$\displaystyle P(Y_{7,5}=4)=\frac{42000}{78125}=0.5376$

$\displaystyle P(Y_{7,5}=5)=\frac{16800}{78125}=0.21504$

3.951424

Remarks
In throwing 7 balls at random into 5 cells, it is not like that the balls are in only one or two cells (about 1.6% chance). The mean number of occupied cells is about 3.95. More than 50% of the times, 4 cells are occupied.

The above example has small numbers of balls and cells and is an excellent example for practicing the calculation. Working such problems can help build the intuition for the occupancy problem. However, when the numbers are larger, the calculation using double multinomial coefficient can be lengthy and tedious. Next we show how to use a formula for occupancy problem using the same example.

_____________________________________________________________________________________

Example – Formula Approach

The formula we use is developed in this post. Recall that $X_{k,n}$ is the number of empty cells when throwing $k$ balls into $n$ cells. The formula calculates the probabilities $P(X_{7,5}=j)$ where
$0 \le j \le 5$. The first step is to compute the probabilities $P(X_{7,m}=0)$ for $m=5,4,3,2,1$. Each of these is the probability that all m cells are occupied (when throwing 7 balls into $m$ cells). These 5 probabilities will be used to calculate $P(X_{7,5}=j)$. This is a less direct implementation of the formula but gives a more intuitive explanation.

\displaystyle \begin{aligned} P(X_{7,5}=0)&=1-P(X_{7,5} \ge 1) \\&=1-\sum \limits_{j=1}^5 (-1)^{j+1} \binom{5}{j} \biggl[ 1-\frac{j}{5} \biggr]^7 \\&=1-\biggl[5 \biggl(\frac{4}{5}\biggr)^7-10 \biggl(\frac{3}{5}\biggr)^7+10 \biggl(\frac{2}{5}\biggr)^7 -5 \biggl(\frac{1}{5}\biggr)^7 + 0 \biggr] \\&=1-\frac{81920-21870+1280-5}{78125} \\&=\frac{16800}{78125} \end{aligned}

\displaystyle \begin{aligned} P(X_{7,4}=0)&=1-P(X_{7,4} \ge 1) \\&=1-\sum \limits_{j=1}^4 (-1)^{j+1} \binom{4}{j} \biggl[ 1-\frac{j}{4} \biggr]^7 \\&=1-\biggl[4 \biggl(\frac{3}{4}\biggr)^7-6 \biggl(\frac{2}{4}\biggr)^7+4 \biggl(\frac{1}{4}\biggr)^7 - 0 \biggr] \\&=1-\frac{8748-768+4}{16384} \\&=\frac{8400}{16384} \end{aligned}

\displaystyle \begin{aligned} P(X_{7,3}=0)&=1-P(X_{7,3} \ge 1) \\&=1-\sum \limits_{j=1}^3 (-1)^{j+1} \binom{3}{j} \biggl[ 1-\frac{j}{3} \biggr]^7 \\&=1-\biggl[3 \biggl(\frac{2}{3}\biggr)^7-3 \biggl(\frac{1}{3}\biggr)^7+0 \biggr] \\&=1-\frac{384-3}{2187} \\&=\frac{1806}{2187} \end{aligned}

\displaystyle \begin{aligned} P(X_{7,2}=0)&=1-P(X_{7,2} \ge 1) \\&=1-\sum \limits_{j=1}^2 (-1)^{j+1} \binom{2}{j} \biggl[ 1-\frac{j}{2} \biggr]^7 \\&=1-\biggl[2 \biggl(\frac{1}{2}\biggr)^7-0 \biggr] \\&=1-\frac{2}{128} \\&=\frac{126}{128} \end{aligned}

$P(X_{7,1}=0)=1$

Each of the above 5 probabilities (except the last one) is based on the probability $P(X_{7,m} \ge 1)$, which is the probability that there is at least one cell that is empty when throwing 7 balls into $m$ cells. The inclusion-exclusion principle is used to derive $P(X_{7,m} \ge 1)$. The last of the five does not need calculation. When throwing 7 balls into one cell, there will be no empty cells.

Now the rest of the calculation:

$\displaystyle P(X_{7,5}=0)=\frac{16800}{78125}$

\displaystyle \begin{aligned} P(X_{7,5}=1)&=P(\text{1 empty cell}) \times P(\text{none of the other 4 cells is empty}) \\&=\binom{5}{1} \biggl(1-\frac{1}{5} \biggr)^7 \times P(X_{7,4}=0) \\&=5 \ \frac{16384}{78125} \times \frac{8400}{16384} \\&=\frac{42000}{78125} \end{aligned}

\displaystyle \begin{aligned} P(X_{7,5}=2)&=P(\text{2 empty cells}) \times P(\text{none of the other 3 cells is empty}) \\&=\binom{5}{2} \biggl(1-\frac{2}{5} \biggr)^7 \times P(X_{7,3}=0) \\&=10 \ \frac{2187}{78125} \times \frac{1806}{2187} \\&=\frac{18060}{78125} \end{aligned}

\displaystyle \begin{aligned} P(X_{7,5}=3)&=P(\text{3 empty cells}) \times P(\text{none of the other 2 cells is empty}) \\&=\binom{5}{3} \biggl(1-\frac{3}{5} \biggr)^7 \times P(X_{7,2}=0) \\&=10 \ \frac{128}{78125} \times \frac{126}{128} \\&=\frac{1260}{78125} \end{aligned}

\displaystyle \begin{aligned} P(X_{7,5}=4)&=P(\text{4 empty cells}) \times P(\text{the other 1 cell is not empty}) \\&=\binom{5}{4} \biggl(1-\frac{4}{5} \biggr)^7 \times P(X_{7,1}=0) \\&=5 \ \frac{1}{78125} \times 1 \\&=\frac{5}{78125} \end{aligned}

Note that these answers agree with the ones from the double multinomial coefficient approach after making the adjustment $Y_{7,5}=5-X_{7,5}$.

_____________________________________________________________________________________

$\text{ }$

$\text{ }$

$\text{ }$

$\text{ }$

$\text{ }$

$\text{ }$

$\text{ }$

$\text{ }$

$\text{ }$

$\text{ }$

$\text{ }$

$\text{ }$
_____________________________________________________________________________________

Problem 1
5 balls into 5 cells

$\displaystyle P(X_{5,5}=0)=P(Y_{5,5}=5)=\frac{120}{3125}$

$\displaystyle P(X_{5,5}=1)=P(Y_{5,5}=4)=\frac{1200}{3125}$

$\displaystyle P(X_{5,5}=2)=P(Y_{5,5}=3)=\frac{1500}{3125}$

$\displaystyle P(X_{5,5}=3)=P(Y_{5,5}=2)=\frac{300}{3125}$

$\displaystyle P(X_{5,5}=4)=P(Y_{5,5}=1)=\frac{5}{3125}$

Problem 2
6 balls into 5 cells

$\displaystyle P(X_{6,5}=0)=P(Y_{6,5}=5)=\frac{1800}{15625}$

$\displaystyle P(X_{6,5}=1)=P(Y_{6,5}=4)=\frac{7800}{15625}$

$\displaystyle P(X_{6,5}=2)=P(Y_{6,5}=3)=\frac{5400}{15625}$

$\displaystyle P(X_{6,5}=3)=P(Y_{6,5}=2)=\frac{620}{15625}$

$\displaystyle P(X_{6,5}=4)=P(Y_{6,5}=1)=\frac{5}{15625}$

Problem 3
7 balls into 5 cells. See above.

Problem 4
6 balls into 6 cells

$\displaystyle P(X_{6,6}=0)=P(Y_{6,6}=6)=\frac{720}{46656}$

$\displaystyle P(X_{6,6}=1)=P(Y_{6,6}=5)=\frac{10800}{46656}$

$\displaystyle P(X_{6,6}=2)=P(Y_{6,6}=4)=\frac{23400}{46656}$

$\displaystyle P(X_{6,6}=3)=P(Y_{6,6}=3)=\frac{10800}{46656}$

$\displaystyle P(X_{6,6}=4)=P(Y_{6,6}=2)=\frac{930}{46656}$

$\displaystyle P(X_{6,6}=5)=P(Y_{6,6}=1)=\frac{6}{46656}$

Problem 5
7 balls into 6 cells

$\displaystyle P(X_{7,6}=0)=P(Y_{7,6}=6)=\frac{15120}{279936}$

$\displaystyle P(X_{7,6}=1)=P(Y_{7,6}=5)=\frac{100800}{279936}$

$\displaystyle P(X_{7,6}=2)=P(Y_{7,6}=4)=\frac{126000}{279936}$

$\displaystyle P(X_{7,6}=3)=P(Y_{7,6}=3)=\frac{36120}{279936}$

$\displaystyle P(X_{7,6}=4)=P(Y_{7,6}=2)=\frac{1890}{279936}$

$\displaystyle P(X_{7,6}=5)=P(Y_{7,6}=1)=\frac{6}{279936}$

Problem 6
8 balls into 6 cells

$\displaystyle P(X_{8,6}=0)=P(Y_{8,6}=6)=\frac{191520}{1679616}$

$\displaystyle P(X_{8,6}=1)=P(Y_{8,6}=5)=\frac{756000}{1679616}$

$\displaystyle P(X_{8,6}=2)=P(Y_{8,6}=4)=\frac{612360}{1679616}$

$\displaystyle P(X_{8,6}=3)=P(Y_{8,6}=3)=\frac{115920}{1679616}$

$\displaystyle P(X_{8,6}=4)=P(Y_{8,6}=2)=\frac{3810}{1679616}$

$\displaystyle P(X_{8,6}=5)=P(Y_{8,6}=1)=\frac{6}{1679616}$

_____________________________________________________________________________________

$\copyright \ 2015 \text{ by Dan Ma}$

A Problem of Rolling Six Dice

Problem
Suppose that we roll 6 fair dice (or equivalently, roll a fair die 6 times). Let $X$ be the number of distinct faces that appear. Find the probability function $P(X=k)$ where $k=1,2,3,4,5,6$.

Equivalent Problem
Suppose that we randomly assign 6 candies to 6 children (imagine that each candy is to be thrown at random to the children and is received by one of the children). What is the probability that exactly $k$ children have been given candies, where $k=1,2,3,4,5,6$?

___________________________________________________________________________

Discussion
Note that both descriptions are equivalent and are refered to as occupancy problem in [1]. The essential fact here is that $n$ objects are randomly assigned to $m$ cells. The problem then asks: what is the probability that $k$ of the cells are occupied? See the following posts for more detailed discussions of the occupancy problem.

Each of these posts presents different different ways of solving the occupancy problem. The first post uses a counting approach based on the multinomial coefficients. The second post developed a formula for finding the probability that exactly $k$ of the cells are empty.

The first approach of using mulltinomial coefficients is preferred when the number of objects $n$ and the number of cells $m$ are relatively small (such as the problem indicated here). Otherwise, use the formula approach.

___________________________________________________________________________

Using the approach of multinomial coefficients as shown in this post (the first post indicated above), we have the following answers:

$\displaystyle P(X=1)=\frac{6}{6^6}=\frac{6}{46656}$

$\displaystyle P(X=2)=\frac{930}{6^6}=\frac{930}{46656}$

$\displaystyle P(X=3)=\frac{10800}{6^6}=\frac{10800}{46656}$

$\displaystyle P(X=4)=\frac{23400}{6^6}=\frac{23400}{46656}$

$\displaystyle P(X=5)=\frac{10800}{6^6}=\frac{10800}{46656}$

$\displaystyle P(X=6)=\frac{720}{6^6}=\frac{720}{46656}$

For more practice problems on calculating the occupancy problem, see this post.
___________________________________________________________________________

Reference

1. Feller, W., An Introduction to Probability Theory and its Applications, Vol. I, 3rd ed., John Wiley & Sons, Inc., New York, 1968