The occupancy problem refers to the experiment of randomly throwing balls into cells. Out of this experiment, there are many problems that can be asked. In this post we focus on question: what is the probability that exactly of the cells are empty after randomly throwing balls into cells, where ?
For a better perspective, there are many other ways to describe the experiment of throwing balls into cells. For example, throwing balls into 6 cells can be interpreted as rolling dice (or rolling a die times). Throwing balls into 365 cells can be interpreted as randomly selecting people and classifying them according to their dates of birth (assuming 365 days in a year). Another context is coupon collecting – the different types of coupons represent the cells and the coupons being collected represent the balls.
_____________________________________________________________________________________
Practice Problems
Let be the number of cells that are empty (i.e. not occupied) when throwing balls into cells. As noted above, the problem discussed in this post is to find the probability function where . There are two elementary ways to do this problem. One is the approach of using double multinomial coefficient (see this post) and the other is to use a formula developed in this post. In addition to , let be the number of cells that are occupied, i.e. .
Practice Problems
Compute where for the following pairs of and .
- and
- and
- and
- and
- and
- and
We work the problem for and . We show both the double multinomial coefficient approach and the formula approach. Recall the here is the number of balls and is the number of cells.
_____________________________________________________________________________________
Example – Double Multinomial Coefficient
Note that the double multinomial coefficient approach calculate the probabilities where is the number of occupied cells when throwing 7 balls into 5 cells. In throwing 7 balls into 5 cells, there is a total of 78125 many ordered samples. To calculate , first write down the representative occupancy sets for the event . For each occupancy set, calculate the number of ordered samples (out of 78125) that belong to that occupancy set. Then we add up all the counts for all the occupancy sets for . This is best illustrated with an example. To see the development of this idea, see this post.
Fist, consider the event of (only one cell is occupied, i.e. all the balls going into one cell). A representative occupancy set is (0, 0, 0, 0, 7), all the balls going into the 5th cell. The first multinomial coefficient is on the 5 cells and the second multinomial coefficient is on the 7 balls.
(0, 0, 0, 0, 7)
Total = 5
So there are 5 ordered samples out of 16807 that belong to the event . Note that the first multinomial coefficient is the number of to order the 5 cells where 4 of the cells are empty and one of the cells has 7 balls. The second multinomial coefficient is the number of ways to order the 7 balls where all 7 balls go into the 5th cell.
Now consider the event (all 7 balls go into 2 cells). There are three representative occupancy sets. The following shows the calculation for each set.
Here’s an explanation for the occupancy set (0, 0, 0, 3, 4). The occupancy set refers to the scenario that 3 of the 7 balls go into the 4th cell and 4 of the 7 balls go into the 5th cell. But we want to count all the possibilities such that 3 of the 7 balls go into one cell and 4 of the 7 balls go into another cell. Thus the first multinomial coefficient count the number of ways to order 5 cells where three of the cells are empty and one cell has 3 balls and the remaining cell has 4 balls (20 ways) The second multinomial coefficient is the number of ways to order the 7 balls where 3 of the 7 balls go into the 4th cell and 4 of the 7 balls go into the 5th cell (35 ways). So the total number of possibilities for the occupancy set (0, 0, 0, 3, 4) is 20 times 35 = 700. The sum total for three occupancy sets is 1260.
We now show the remaining calculation without further elaboration.
Now consider the event (all 7 balls go into 3 cells). There are four representative occupancy sets. The following shows the calculation for each set.
Now consider the event (all 7 balls go into 4 cells, i.e. one empty cell). There are three representative occupancy sets. The following shows the calculation for each set.
Now consider the event (all 7 balls go into 5 cells, i.e. no empty cell). There are two representative occupancy sets. The following shows the calculation for each set.
The following is the distribution for the random variable .
3.951424
Remarks
In throwing 7 balls at random into 5 cells, it is not like that the balls are in only one or two cells (about 1.6% chance). The mean number of occupied cells is about 3.95. More than 50% of the times, 4 cells are occupied.
The above example has small numbers of balls and cells and is an excellent example for practicing the calculation. Working such problems can help build the intuition for the occupancy problem. However, when the numbers are larger, the calculation using double multinomial coefficient can be lengthy and tedious. Next we show how to use a formula for occupancy problem using the same example.
_____________________________________________________________________________________
Example – Formula Approach
The formula we use is developed in this post. Recall that is the number of empty cells when throwing balls into cells. The formula calculates the probabilities where
. The first step is to compute the probabilities for . Each of these is the probability that all m cells are occupied (when throwing 7 balls into cells). These 5 probabilities will be used to calculate . This is a less direct implementation of the formula but gives a more intuitive explanation.
Each of the above 5 probabilities (except the last one) is based on the probability , which is the probability that there is at least one cell that is empty when throwing 7 balls into cells. The inclusion-exclusion principle is used to derive . The last of the five does not need calculation. When throwing 7 balls into one cell, there will be no empty cells.
Now the rest of the calculation:
Note that these answers agree with the ones from the double multinomial coefficient approach after making the adjustment .
_____________________________________________________________________________________
_____________________________________________________________________________________
Answers
Problem 1
5 balls into 5 cells
Problem 2
6 balls into 5 cells
Problem 3
7 balls into 5 cells. See above.
Problem 4
6 balls into 6 cells
Problem 5
7 balls into 6 cells
Problem 6
8 balls into 6 cells
_____________________________________________________________________________________