Monthly Archives: March 2015

Calculating the occupancy problem

The occupancy problem refers to the experiment of randomly throwing k balls into n cells. Out of this experiment, there are many problems that can be asked. In this post we focus on question: what is the probability that exactly j of the cells are empty after randomly throwing k balls into n cells, where 0 \le j \le n-1?

For a better perspective, there are many other ways to describe the experiment of throwing k balls into n cells. For example, throwing k balls into 6 cells can be interpreted as rolling k dice (or rolling a die k times). Throwing k balls into 365 cells can be interpreted as randomly selecting k people and classifying them according to their dates of birth (assuming 365 days in a year). Another context is coupon collecting – the different types of coupons represent the n cells and the coupons being collected represent the k balls.

_____________________________________________________________________________________

Practice Problems

Let X_{k,n} be the number of cells that are empty (i.e. not occupied) when throwing k balls into n cells. As noted above, the problem discussed in this post is to find the probability function P(X_{k,n}=j) where j=0,1,2,\cdots,n-1. There are two elementary ways to do this problem. One is the approach of using double multinomial coefficient (see this post) and the other is to use a formula developed in this post. In addition to X_{k,n}, let Y_{k,n} be the number of cells that are occupied, i.e. Y_{k,n}=n-X_{k,n}.

Practice Problems
Compute P(X_{k,n}=j) where j=0,1,2,\cdots,n-1 for the following pairs of k and n.

  1. k=5 and n=5
  2. k=6 and n=5
  3. k=7 and n=5
  4. k=6 and n=6
  5. k=7 and n=6
  6. k=8 and n=6

We work the problem for k=7 and n=5. We show both the double multinomial coefficient approach and the formula approach. Recall the k here is the number of balls and n is the number of cells.

_____________________________________________________________________________________

Example – Double Multinomial Coefficient

Note that the double multinomial coefficient approach calculate the probabilities P(Y_{7,5}=j) where Y_{7,5} is the number of occupied cells when throwing 7 balls into 5 cells. In throwing 7 balls into 5 cells, there is a total of 5^7= 78125 many ordered samples. To calculate P(Y_{7,5}=j), first write down the representative occupancy sets for the event Y_{7,5}=j. For each occupancy set, calculate the number of ordered samples (out of 78125) that belong to that occupancy set. Then we add up all the counts for all the occupancy sets for Y_{7,5}=j. This is best illustrated with an example. To see the development of this idea, see this post.

Fist, consider the event of Y_{7,5}=1 (only one cell is occupied, i.e. all the balls going into one cell). A representative occupancy set is (0, 0, 0, 0, 7), all the balls going into the 5th cell. The first multinomial coefficient is on the 5 cells and the second multinomial coefficient is on the 7 balls.

    (0, 0, 0, 0, 7)

      \displaystyle \frac{5!}{4! 1!} \times \frac{7!}{7!}=5 \times 1 =5

    Total = 5

So there are 5 ordered samples out of 16807 that belong to the event Y_{7,5}=1. Note that the first multinomial coefficient is the number of to order the 5 cells where 4 of the cells are empty and one of the cells has 7 balls. The second multinomial coefficient is the number of ways to order the 7 balls where all 7 balls go into the 5th cell.

Now consider the event Y_{7,5}=2 (all 7 balls go into 2 cells). There are three representative occupancy sets. The following shows the calculation for each set.

    (0, 0, 0, 1, 6)

      \displaystyle \frac{5!}{3! 1! 1!} \times \frac{7!}{1! 6!}=20 \times 7 =140
    (0, 0, 0, 2, 5)

      \displaystyle \frac{5!}{3! 1! 1!} \times \frac{7!}{2! 5!}=20 \times 21 =420
    (0, 0, 0, 3, 4)

      \displaystyle \frac{5!}{3! 1! 1!} \times \frac{7!}{3! 4!}=20 \times 35 =700

    Total = 140 + 420 + 700 = 1260

Here’s an explanation for the occupancy set (0, 0, 0, 3, 4). The occupancy set refers to the scenario that 3 of the 7 balls go into the 4th cell and 4 of the 7 balls go into the 5th cell. But we want to count all the possibilities such that 3 of the 7 balls go into one cell and 4 of the 7 balls go into another cell. Thus the first multinomial coefficient count the number of ways to order 5 cells where three of the cells are empty and one cell has 3 balls and the remaining cell has 4 balls (20 ways) The second multinomial coefficient is the number of ways to order the 7 balls where 3 of the 7 balls go into the 4th cell and 4 of the 7 balls go into the 5th cell (35 ways). So the total number of possibilities for the occupancy set (0, 0, 0, 3, 4) is 20 times 35 = 700. The sum total for three occupancy sets is 1260.

We now show the remaining calculation without further elaboration.

Now consider the event Y_{7,5}=3 (all 7 balls go into 3 cells). There are four representative occupancy sets. The following shows the calculation for each set.

    (0, 0, 1, 1, 5)

      \displaystyle \frac{5!}{2! 2! 1!} \times \frac{7!}{1! 1! 5!}=30 \times 42 =1260
    (0, 0, 1, 2, 4)

      \displaystyle \frac{5!}{2! 1! 1! 1!} \times \frac{7!}{1! 2! 4!}=60 \times 105 =6300
    (0, 0, 1, 3, 3)

      \displaystyle \frac{5!}{2! 1! 2!} \times \frac{7!}{1! 3! 3!}=30 \times 140 =4200
    (0, 0, 2, 2, 3)

      \displaystyle \frac{5!}{2! 2! 1!} \times \frac{7!}{2! 2! 3!}=30 \times 210 =6300

    Total = 1260 + 6300 + 4200 + 6300 = 18060

Now consider the event Y_{7,5}=4 (all 7 balls go into 4 cells, i.e. one empty cell). There are three representative occupancy sets. The following shows the calculation for each set.

    (0, 1, 1, 1, 4)

      \displaystyle \frac{5!}{1! 3! 1!} \times \frac{7!}{1! 1! 1! 4!}=20 \times 210 =4200
    (0, 1, 1, 2, 3)

      \displaystyle \frac{5!}{1! 2! 1! 1!} \times \frac{7!}{1! 1! 2! 3!}=60 \times 420 =25200
    (0, 1, 2, 2, 2)

      \displaystyle \frac{5!}{1! 1! 3!} \times \frac{7!}{1! 2! 2! 2!}=20 \times 630 =12600

    Total = 4200 + 25200 + 12600 = 42000

Now consider the event Y_{7,5}=4 (all 7 balls go into 5 cells, i.e. no empty cell). There are two representative occupancy sets. The following shows the calculation for each set.

    (1, 1, 1, 1, 3)

      \displaystyle \frac{5!}{4! 1!} \times \frac{7!}{1! 1! 1! 1! 3!}=5 \times 840 =4200
    (1, 1, 1, 2, 2)

      \displaystyle \frac{5!}{3! 2!} \times \frac{7!}{1! 1! 1! 2! 2!}=10 \times 1260 =12600

    Total = 4200 + 12600 = 16800

The following is the distribution for the random variable Y_{7,5}.

    \displaystyle P(Y_{7,5}=1)=\frac{5}{78125}=0.000064

    \displaystyle P(Y_{7,5}=2)=\frac{1260}{78125}=0.016128

    \displaystyle P(Y_{7,5}=3)=\frac{18060}{78125}=0.231168

    \displaystyle P(Y_{7,5}=4)=\frac{42000}{78125}=0.5376

    \displaystyle P(Y_{7,5}=5)=\frac{16800}{78125}=0.21504

3.951424

Remarks
In throwing 7 balls at random into 5 cells, it is not like that the balls are in only one or two cells (about 1.6% chance). The mean number of occupied cells is about 3.95. More than 50% of the times, 4 cells are occupied.

The above example has small numbers of balls and cells and is an excellent example for practicing the calculation. Working such problems can help build the intuition for the occupancy problem. However, when the numbers are larger, the calculation using double multinomial coefficient can be lengthy and tedious. Next we show how to use a formula for occupancy problem using the same example.

_____________________________________________________________________________________

Example – Formula Approach

The formula we use is developed in this post. Recall that X_{k,n} is the number of empty cells when throwing k balls into n cells. The formula calculates the probabilities P(X_{7,5}=j) where
0 \le j \le 5. The first step is to compute the probabilities P(X_{7,m}=0) for m=5,4,3,2,1. Each of these is the probability that all m cells are occupied (when throwing 7 balls into m cells). These 5 probabilities will be used to calculate P(X_{7,5}=j). This is a less direct implementation of the formula but gives a more intuitive explanation.

    \displaystyle \begin{aligned} P(X_{7,5}=0)&=1-P(X_{7,5} \ge 1) \\&=1-\sum \limits_{j=1}^5 (-1)^{j+1} \binom{5}{j} \biggl[ 1-\frac{j}{5} \biggr]^7 \\&=1-\biggl[5 \biggl(\frac{4}{5}\biggr)^7-10 \biggl(\frac{3}{5}\biggr)^7+10 \biggl(\frac{2}{5}\biggr)^7 -5 \biggl(\frac{1}{5}\biggr)^7 + 0 \biggr] \\&=1-\frac{81920-21870+1280-5}{78125} \\&=\frac{16800}{78125} \end{aligned}

    \displaystyle \begin{aligned} P(X_{7,4}=0)&=1-P(X_{7,4} \ge 1) \\&=1-\sum \limits_{j=1}^4 (-1)^{j+1} \binom{4}{j} \biggl[ 1-\frac{j}{4} \biggr]^7 \\&=1-\biggl[4 \biggl(\frac{3}{4}\biggr)^7-6 \biggl(\frac{2}{4}\biggr)^7+4 \biggl(\frac{1}{4}\biggr)^7 - 0 \biggr] \\&=1-\frac{8748-768+4}{16384} \\&=\frac{8400}{16384} \end{aligned}

    \displaystyle \begin{aligned} P(X_{7,3}=0)&=1-P(X_{7,3} \ge 1) \\&=1-\sum \limits_{j=1}^3 (-1)^{j+1} \binom{3}{j} \biggl[ 1-\frac{j}{3} \biggr]^7 \\&=1-\biggl[3 \biggl(\frac{2}{3}\biggr)^7-3 \biggl(\frac{1}{3}\biggr)^7+0 \biggr] \\&=1-\frac{384-3}{2187} \\&=\frac{1806}{2187} \end{aligned}

    \displaystyle \begin{aligned} P(X_{7,2}=0)&=1-P(X_{7,2} \ge 1) \\&=1-\sum \limits_{j=1}^2 (-1)^{j+1} \binom{2}{j} \biggl[ 1-\frac{j}{2} \biggr]^7 \\&=1-\biggl[2 \biggl(\frac{1}{2}\biggr)^7-0 \biggr] \\&=1-\frac{2}{128} \\&=\frac{126}{128} \end{aligned}

    P(X_{7,1}=0)=1

Each of the above 5 probabilities (except the last one) is based on the probability P(X_{7,m} \ge 1), which is the probability that there is at least one cell that is empty when throwing 7 balls into m cells. The inclusion-exclusion principle is used to derive P(X_{7,m} \ge 1). The last of the five does not need calculation. When throwing 7 balls into one cell, there will be no empty cells.

Now the rest of the calculation:

    \displaystyle P(X_{7,5}=0)=\frac{16800}{78125}

    \displaystyle \begin{aligned} P(X_{7,5}=1)&=P(\text{1 empty cell}) \times P(\text{none of the other 4 cells is empty}) \\&=\binom{5}{1} \biggl(1-\frac{1}{5} \biggr)^7 \times P(X_{7,4}=0) \\&=5 \ \frac{16384}{78125} \times \frac{8400}{16384} \\&=\frac{42000}{78125}  \end{aligned}

    \displaystyle \begin{aligned} P(X_{7,5}=2)&=P(\text{2 empty cells}) \times P(\text{none of the other 3 cells is empty}) \\&=\binom{5}{2} \biggl(1-\frac{2}{5} \biggr)^7 \times P(X_{7,3}=0) \\&=10 \ \frac{2187}{78125} \times \frac{1806}{2187} \\&=\frac{18060}{78125}  \end{aligned}

    \displaystyle \begin{aligned} P(X_{7,5}=3)&=P(\text{3 empty cells}) \times P(\text{none of the other 2 cells is empty}) \\&=\binom{5}{3} \biggl(1-\frac{3}{5} \biggr)^7 \times P(X_{7,2}=0) \\&=10 \ \frac{128}{78125} \times \frac{126}{128} \\&=\frac{1260}{78125}  \end{aligned}

    \displaystyle \begin{aligned} P(X_{7,5}=4)&=P(\text{4 empty cells}) \times P(\text{the other 1 cell is not empty}) \\&=\binom{5}{4} \biggl(1-\frac{4}{5} \biggr)^7 \times P(X_{7,1}=0) \\&=5 \ \frac{1}{78125} \times 1 \\&=\frac{5}{78125}  \end{aligned}

Note that these answers agree with the ones from the double multinomial coefficient approach after making the adjustment Y_{7,5}=5-X_{7,5}.

_____________________________________________________________________________________

\text{ }

\text{ }

\text{ }

\text{ }

\text{ }

\text{ }

\text{ }

\text{ }

\text{ }

\text{ }

\text{ }

\text{ }
_____________________________________________________________________________________

Answers

Problem 1
5 balls into 5 cells

    \displaystyle P(X_{5,5}=0)=P(Y_{5,5}=5)=\frac{120}{3125}

    \displaystyle P(X_{5,5}=1)=P(Y_{5,5}=4)=\frac{1200}{3125}

    \displaystyle P(X_{5,5}=2)=P(Y_{5,5}=3)=\frac{1500}{3125}

    \displaystyle P(X_{5,5}=3)=P(Y_{5,5}=2)=\frac{300}{3125}

    \displaystyle P(X_{5,5}=4)=P(Y_{5,5}=1)=\frac{5}{3125}

Problem 2
6 balls into 5 cells

    \displaystyle P(X_{6,5}=0)=P(Y_{6,5}=5)=\frac{1800}{15625}

    \displaystyle P(X_{6,5}=1)=P(Y_{6,5}=4)=\frac{7800}{15625}

    \displaystyle P(X_{6,5}=2)=P(Y_{6,5}=3)=\frac{5400}{15625}

    \displaystyle P(X_{6,5}=3)=P(Y_{6,5}=2)=\frac{620}{15625}

    \displaystyle P(X_{6,5}=4)=P(Y_{6,5}=1)=\frac{5}{15625}

Problem 3
7 balls into 5 cells. See above.

Problem 4
6 balls into 6 cells

    \displaystyle P(X_{6,6}=0)=P(Y_{6,6}=6)=\frac{720}{46656}

    \displaystyle P(X_{6,6}=1)=P(Y_{6,6}=5)=\frac{10800}{46656}

    \displaystyle P(X_{6,6}=2)=P(Y_{6,6}=4)=\frac{23400}{46656}

    \displaystyle P(X_{6,6}=3)=P(Y_{6,6}=3)=\frac{10800}{46656}

    \displaystyle P(X_{6,6}=4)=P(Y_{6,6}=2)=\frac{930}{46656}

    \displaystyle P(X_{6,6}=5)=P(Y_{6,6}=1)=\frac{6}{46656}

Problem 5
7 balls into 6 cells

    \displaystyle P(X_{7,6}=0)=P(Y_{7,6}=6)=\frac{15120}{279936}

    \displaystyle P(X_{7,6}=1)=P(Y_{7,6}=5)=\frac{100800}{279936}

    \displaystyle P(X_{7,6}=2)=P(Y_{7,6}=4)=\frac{126000}{279936}

    \displaystyle P(X_{7,6}=3)=P(Y_{7,6}=3)=\frac{36120}{279936}

    \displaystyle P(X_{7,6}=4)=P(Y_{7,6}=2)=\frac{1890}{279936}

    \displaystyle P(X_{7,6}=5)=P(Y_{7,6}=1)=\frac{6}{279936}

Problem 6
8 balls into 6 cells

    \displaystyle P(X_{8,6}=0)=P(Y_{8,6}=6)=\frac{191520}{1679616}

    \displaystyle P(X_{8,6}=1)=P(Y_{8,6}=5)=\frac{756000}{1679616}

    \displaystyle P(X_{8,6}=2)=P(Y_{8,6}=4)=\frac{612360}{1679616}

    \displaystyle P(X_{8,6}=3)=P(Y_{8,6}=3)=\frac{115920}{1679616}

    \displaystyle P(X_{8,6}=4)=P(Y_{8,6}=2)=\frac{3810}{1679616}

    \displaystyle P(X_{8,6}=5)=P(Y_{8,6}=1)=\frac{6}{1679616}

_____________________________________________________________________________________

\copyright \ 2015 \text{ by Dan Ma}

Advertisements

Practice problems for the Poisson distribution

This post has practice problems on the Poisson distribution. For a good discussion of the Poisson distribution and the Poisson process, see this blog post in the companion blog.

_____________________________________________________________________________________

Practice Problems

Practice Problem 1
Two taxi arrive on average at a certain street corner for every 15 minutes. Suppose that the number of taxi arriving at this street corner follows a Poisson distribution. Three people are waiting at the street corner for taxi (assuming they do not know each other and each one will have his own taxi). Each person will be late for work if he does not catch a taxi within the next 15 minutes. What is the probability that all three people will make it to work on time?

\text{ }

Practice Problem 2
A 5-county area in Kansas is hit on average by 3 tornadoes a year (assuming annual Poisson tornado count). What is the probability that the number of tornadoes will be more than the historical average next year in this area?

\text{ }

Practice Problem 3
A certain airline estimated that 0.8% of its customers with purchased tickets fail to show up for their flights. For one particular flight, the plane has 500 seats and the flight has been fully booked. How many additional tickets can the airline sell so that there is at least a 90% chance that everyone who shows up will have a seat?

\text{ }

Practice Problem 4
A life insurance insured 9000 men aged 45. The probability that a 45-year old man will die within one year is 0.0035. Within the next year, what is the probability that the insurance company will pay between 30 and 33 claims (both inclusive) among these 7000 men?

\text{ }

Practice Problem 5
In a certain manuscript of 1000 pages, 300 typographical errors occur.

  • What is the probability that a randomly selected page will be error free?
  • What is the probability that 10 randomly selected pages will have at most 3 errors?

\text{ }

Practice Problem 6
Trisomy 13, also called Patau syndrome, is a chromosomal condition associated with severe intellectual disability and physical abnormalities in many parts of the body. Trisomy 13 occurs , on the average, once in every 16,000 births. Suppose that in one country, 100,000 babies are born in a year. What is the probability that at most 3 births will develop this chromosomal condition?

\text{ }

Practice Problem 7
Traffic accidents occur along a 50-mile stretch of highway at the rate of 0.85 during the hour from 5 PM to 6 PM. Suppose that the number of traffic accidents in this stretch of highway follows a Poisson distribution. The department of transportation plans to observe the traffic flow in this stretch of highway during this hour in a two-day period. What is the probability that more than three accidents occur in this observation period?

\text{ }

Practice Problem 8
The odds of winning the Mega Million lottery is one in 176 million. Out of 176 million lottery tickets sold, what is the probability of having no winning ticket? What is the exact probability model in this problem?

_____________________________________________________________________________________

\text{ }

\text{ }

\text{ }

\text{ }

\text{ }

\text{ }

\text{ }

\text{ }

\text{ }

\text{ }

\text{ }

\text{ }
_____________________________________________________________________________________

Answers

Practice Problem 1

  • 1-5e^{-2}=0.323323584

Practice Problem 2

  • 1-13e^{-3}=0.352768111

Practice Problem 3

  • Can oversell by 2 tickets.

Practice Problem 4

  • 0.278162459

Practice Problem 5

  • 0.740818221
  • 0.647231889

Practice Problem 6

  • 0.130250355 (using Poisson)
  • 0.130242377 (using Binomial)

Practice Problem 7

  • 0.093189434

Practice Problem 8

  • 0.367879441

_____________________________________________________________________________________

\copyright \ 2015 \text{ by Dan Ma}