## Calculating the skewness of a probability distribution

This post presents exercises on calculating the moment coefficient of skewness. The exercises reinforce the calculations demonstrated in this companion blog post.

For a given random variable $X$, Pearson’s moment coefficient of skewness (or the coefficient of skewness) is denoted by $\gamma_1$ and is defined as follows:

\displaystyle \begin{aligned} \gamma_1&=\frac{E[ (X-\mu)^3 ]}{\sigma^3} \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ (1) \\&=\frac{E(X^3)-3 \mu E(X^2)+3 \mu^2 E(X)-\mu^3}{\sigma^3} \\&=\frac{E(X^3)-3 \mu [E(X^2)-\mu E(X)]-\mu^3}{\sigma^3} \\&=\frac{E(X^3)-3 \mu \sigma^2-\mu^3}{\sigma^3} \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ (2) \\&=\frac{E(X^3)-3 \mu \sigma^2-\mu^3}{(\sigma^2)^{\frac{3}{2}}} \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ (3) \end{aligned}

(1) is the definition: the ratio of the third central moment to the cube of the standard deviation. Forms (2) and (3) may be easier to calculate. Essentially, once the first three raw moments $E(X)$, $E(X^2)$ and $E(X^3)$ are calculated, the variance follows from $\sigma^2=E(X^2)-\mu^2$ and the skewness coefficient can be derived via (3). For a more detailed discussion, see the companion blog post.
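As a quick numerical companion to form (3), here is a minimal Python sketch that turns the first three raw moments into $\gamma_1$. The helper name `skewness_from_raw_moments` is made up for this illustration, and the test case (an exponential distribution with mean 1, whose raw moments are $E(X^k)=k!$) is an assumed example rather than one of the practice problems.

```python
def skewness_from_raw_moments(m1, m2, m3):
    """Moment coefficient of skewness from E(X), E(X^2), E(X^3), per form (3)."""
    variance = m2 - m1 ** 2                           # sigma^2 = E(X^2) - mu^2
    third_central = m3 - 3 * m1 * variance - m1 ** 3  # numerator of forms (2) and (3)
    return third_central / variance ** 1.5

# Assumed example: exponential distribution with mean 1, so E(X^k) = k!.
gamma_1 = skewness_from_raw_moments(1, 2, 6)
print(gamma_1)
```

The same function can be reused for any of the problems once the three raw moments have been worked out by hand.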

_____________________________________________________________________________________

Practice Problems

Practice Problems 1
Let $X$ be a random variable with density function $f(x)=10 x^9$ where $0<x<1$. This is a beta distribution. Calculate the moment coefficient of skewness in two ways. One is to use formula (3) above. The other is to use the following formula for the skewness coefficient of the beta distribution.

$\displaystyle \gamma_1=\frac{2(\beta-\alpha) \ \sqrt{\alpha+\beta+1}}{(\alpha+\beta+2) \ \sqrt{\alpha \ \beta}} \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ (4)$
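To see that formula (4) and form (3) agree, the following sketch evaluates both for a beta distribution. The function names are hypothetical, and the parameters $\alpha=2$, $\beta=5$ are an assumed example chosen so as not to give away the practice problems. The raw moments use the standard beta identity $E(X^k)=\prod_{i=0}^{k-1} \frac{\alpha+i}{\alpha+\beta+i}$.

```python
import math

def beta_skewness(alpha, beta):
    """Skewness of a beta(alpha, beta) distribution, per formula (4)."""
    numerator = 2 * (beta - alpha) * math.sqrt(alpha + beta + 1)
    denominator = (alpha + beta + 2) * math.sqrt(alpha * beta)
    return numerator / denominator

def beta_raw_moment(alpha, beta, k):
    """E(X^k) for beta(alpha, beta): product of (alpha+i)/(alpha+beta+i), i = 0..k-1."""
    moment = 1.0
    for i in range(k):
        moment *= (alpha + i) / (alpha + beta + i)
    return moment

# Assumed example parameters, not taken from the practice problems.
a, b = 2, 5
m1, m2, m3 = (beta_raw_moment(a, b, k) for k in (1, 2, 3))
var = m2 - m1 ** 2
from_moments = (m3 - 3 * m1 * var - m1 ** 3) / var ** 1.5  # form (3)
print(beta_skewness(a, b), from_moments)
```

The two printed values should match up to floating-point rounding.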

$\text{ }$

Practice Problems 2
Calculate the moment coefficient of skewness for $Y=X^2$ where $X$ is as in Practice Problem 1. It will be helpful to first calculate a formula for the raw moments $E(X^k)$ of $X$.

$\text{ }$

Practice Problems 3
Let $X$ be a random variable with density function $f(x)=8 (1-x)^7$ where $0<x<1$. This is a beta distribution. Calculate the moment coefficient of skewness using (4).

$\text{ }$

Practice Problems 4
Suppose that $X$ follows a gamma distribution with PDF $f(x)=4 x e^{-2x}$ where $x>0$.

• Show that $E(X)=1$, $E(X^2)=\frac{3}{2}$ and $E(X^3)=3$.
• Use the first three raw moments to calculate the moment coefficient of skewness.

$\text{ }$

Practice Problems 5
Calculate the moment coefficient of skewness for $Y=X^2$ where $X$ is as in Practice Problem 4. It will be helpful to first calculate a formula for the raw moments $E(X^k)$ of $X$.

$\text{ }$

Practice Problems 6
Verify the calculation of $\gamma_1$ and the associated calculation of Example 6 in this companion blog post.

$\text{ }$

Practice Problems 7
Verify the calculation of $\gamma_1$ and the associated calculation of Example 7 in this companion blog post.

$\text{ }$

Practice Problems 8
Verify the calculation of $\gamma_1$ and the associated calculation of Example 8 in this companion blog post.

$\text{ }$
_____________________________________________________________________________________

Answers

Practice Problems 1

• $\displaystyle \gamma_1=\frac{-36 \sqrt{3}}{13 \sqrt{10}}=-1.516770159$

$\text{ }$

Practice Problems 2

• $\displaystyle \gamma_1=\frac{- \sqrt{7}}{\sqrt{5}}=-1.183215957$

$\text{ }$

Practice Problems 3

• $\displaystyle \gamma_1=\frac{7 \sqrt{10}}{11 \sqrt{2}}=1.422952349$

$\text{ }$

Practice Problems 4

• $\displaystyle \gamma_1=\sqrt{2}$

$\text{ }$

Practice Problems 5

• $\displaystyle \gamma_1=\frac{138}{7 \sqrt{21}}=4.302009836$

$\text{ }$

_____________________________________________________________________________________

$\copyright \ 2015 \text{ by Dan Ma}$

## Practice problems for order statistics and multinomial probabilities

This post presents exercises on calculating order statistics using multinomial probabilities. These exercises are to reinforce the calculation demonstrated in this blog post.

_____________________________________________________________________________________

Practice Problems

Practice Problems 1
Draw a random sample $X_1,X_2,\cdots,X_{11}$ of size 11 from the uniform distribution $U(0,4)$. Calculate the following:

• $P(Y_4<2
• $P(Y_4<2

$\text{ }$

Practice Problems 2
Draw a random sample $X_1,X_2,\cdots,X_7$ of size 7 from the uniform distribution $U(0,5)$. Calculate the probability $P(Y_4<2<4.

$\text{ }$

Practice Problems 3
Same setting as in Practice Problem 2. Calculate $P(Y_7>4 \ | \ Y_4<2)$ and $P(Y_7>4)$. Compare the conditional probability with the unconditional probability. Does the answer for $P(Y_7>4 \ | \ Y_4<2)$ make sense in relation to $P(Y_7>4)$?

$\text{ }$

Practice Problems 4
Same setting as in Practice Problem 2. Calculate the following:

• $P(Y_4<2
• $P(2
• $P(2
• Does the answer for $P(2 make sense in relation to $P(2?

$\text{ }$

Practice Problems 5
Draw a random sample $X_1,X_2,\cdots,X_6$ of size 6 from the uniform distribution $U(0,4)$. Consider the conditional distribution $Y_3 \ | \ Y_5<2$. Calculate the following:

• $P(Y_3 \le t \ | \ Y_5<2)$
• $f_{Y_3}(t \ | \ Y_5<2)$
• $E(Y_3 \ | \ Y_5<2)$
• $E(Y_3)$

where $0<t<2$. Compare $E(Y_3)$ and $E(Y_3 \ | \ Y_5<2)$. Does the answer for the conditional mean make sense?

$\text{ }$

Practice Problems 6
Draw a random sample $X_1,X_2,\cdots,X_7$ of size 7 from the uniform distribution $U(0,5)$. Calculate the following:

• $P(Y_4 > 4 \ | \ Y_2>2)$
• $P(Y_4 > 4)$
• Compare the two probabilities. Does the answer for the conditional probability make sense?

$\text{ }$
_____________________________________________________________________________________

Answers

Practice Problems 1

• $\displaystyle \frac{11550}{177147}$
• $\displaystyle \frac{18480}{177147}$

$\text{ }$

Practice Problems 2

• $\displaystyle \frac{11088}{78125}$

$\text{ }$

Practice Problems 3

• $\displaystyle P(Y_7>4 \ | \ Y_4<2)=\frac{14448}{22640}$
• $\displaystyle P(Y_7>4)=\frac{61741}{78125}$

$\text{ }$

Practice Problems 4

• $\displaystyle P(Y_4<2
• $\displaystyle P(2
• $\displaystyle P(2

$\text{ }$

Practice Problems 5

• $\displaystyle P(Y_3 \le t \ | \ Y_5<2)=\frac{-10t^6+84t^5-300t^4+400t^3}{448}$
• $\displaystyle f_{Y_3}(t \ | \ Y_5<2)=\frac{-60t^5+420t^4-1200t^3+1200t^2}{448}$
• $\displaystyle E(Y_3 \ | \ Y_5<2)=\frac{55}{49}$
• $\displaystyle E(Y_3)=\frac{84}{49}$

$\text{ }$

Practice Problems 6

• $\displaystyle \frac{1401}{12393}$
• $\displaystyle \frac{2605}{78125}$

$\text{ }$

_____________________________________________________________________________________


## Calculating the probability distributions of order statistics

This post presents exercises on finding the probability distributions of order statistics to complement a discussion of the same topic.

Consider a random sample $X_1,X_2,\cdots,X_n$ drawn from a continuous distribution with common distribution function $F(x)$. The order statistics $Y_1<Y_2<\cdots<Y_n$ are obtained by ranking the sample items in increasing order. In this post, we present some exercises to complement this previous post. The thought processes illustrated by these exercises will be helpful in non-parametric inference, specifically in the construction of confidence intervals for unknown population percentiles.

In the problems that follow, $Y_1<Y_2<\cdots<Y_n$ are the order statistics that arise from the random sample $X_1,X_2,\cdots,X_n$. There are two ways to work with the probability distribution of an order statistic $Y_j$. One is to find the distribution function $F_{Y_j}(y)=P(Y_j \le y)$. Once this is obtained, the density function $f_{Y_j}(y)$ is derived by taking the derivative. Another way is to derive $f_{Y_j}(y)$ directly.

We assume that the random sample $X_1,X_2,\cdots,X_n$ is drawn from a probability distribution with distribution function $F(x)=P(X \le x)$ and with density function $f(x)$. To compute $F_{Y_j}(y)=P(Y_j \le y)$, note that for the event $Y_j \le y$ to occur, at least $j$ of the sample items $X_i$ must be less than or equal to $y$. So the random drawing of each sample item is a Bernoulli trial with probability of success $F(y)=P(X \le y)$. Thus $F_{Y_j}(y)=P(Y_j \le y)$ is the following probability computed from a binomial distribution.

$\displaystyle F_{Y_j}(y)=P(Y_j \le y)=\sum \limits_{k=j}^n \ \binom{n}{k} \ F(y)^k \ \biggl[1-F(y) \biggr]^{n-k} \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ (1)$
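Formula (1) translates almost directly into code. The sketch below is only an illustration: the function name `order_stat_cdf` is hypothetical, and the caller is assumed to supply the value $F(y)$ for the distribution at hand.

```python
from math import comb

def order_stat_cdf(j, n, F_y):
    """P(Y_j <= y) per formula (1); F_y = F(y) is the Bernoulli success probability."""
    return sum(comb(n, k) * F_y ** k * (1 - F_y) ** (n - k) for k in range(j, n + 1))

# Assumed example: the median (j = 2) of n = 3 draws from U(0, 1), evaluated at y = 0.5.
print(order_stat_cdf(2, 3, 0.5))
```

Setting $j=1$ reduces the sum to $1-[1-F(y)]^n$, the familiar distribution function of the sample minimum.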

Once the distribution function $F_{Y_j}(y)$ is found, the density function $f_{Y_j}(y)$ can be derived by differentiating $F_{Y_j}(y)$. The density function $f_{Y_j}(y)$ can also be obtained directly by this thought process. Think of the density function $f_{Y_j}(y)$ as the probability that the $j$th order statistic $Y_j$ is right around $y$. So there must be $j-1$ sample items less than $y$, one sample item right around $y$, and $n-j$ sample items above $y$. One way this can happen is:

$\displaystyle F(y)^{j-1} \ f(y) \ \biggl[1-F(y) \biggr]^{n-j}$

The first term is the probability that $j-1$ sample items are less than $y$. The second term is the probability that one sample item is right around $y$. The third term is the probability that $n-j$ sample items are above $y$. But this is only one way. To capture all possibilities, we multiply it by the multinomial coefficient.

$\displaystyle f_{Y_j}(y)=\frac{n!}{(j-1)! \ 1! \ (n-j)!} \ F(y)^{j-1} \ f(y) \ \biggl[1-F(y) \biggr]^{n-j} \ \ \ \ \ \ \ \ \ \ (2)$
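The two routes to the density can be compared numerically. The sketch below evaluates formula (2) directly and also differentiates formula (1) with a central finite difference. The function names, the choice of an exponential distribution with rate 1, and the values $j=2$, $n=4$, $y=0.7$ are all assumptions made for this illustration.

```python
from math import comb, exp, factorial

def order_stat_pdf(j, n, F_y, f_y):
    """Density of Y_j per formula (2), given F(y) and f(y) at the point y."""
    coeff = factorial(n) // (factorial(j - 1) * factorial(n - j))  # multinomial coefficient
    return coeff * F_y ** (j - 1) * f_y * (1 - F_y) ** (n - j)

def order_stat_cdf(j, n, F_y):
    """Distribution function of Y_j per formula (1)."""
    return sum(comb(n, k) * F_y ** k * (1 - F_y) ** (n - k) for k in range(j, n + 1))

# Assumed example: exponential distribution with rate 1.
def F(y):
    return 1 - exp(-y)

def f(y):
    return exp(-y)

j, n, y, h = 2, 4, 0.7, 1e-6
direct = order_stat_pdf(j, n, F(y), f(y))
# Central finite difference of formula (1) as a numerical derivative.
numeric = (order_stat_cdf(j, n, F(y + h)) - order_stat_cdf(j, n, F(y - h))) / (2 * h)
print(direct, numeric)
```

The two numbers should agree to several decimal places, which is a handy sanity check when working Practice Problems 3 to 5.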

_____________________________________________________________________________________

Practice Problems

Practice Problems 1
Draw a random sample $X_1,X_2,\cdots,X_8$ of size 8 from the uniform distribution $U(0,4)$. Calculate the probability $P(Y_3>3)$ where $Y_3$ is the third order statistic.

$\text{ }$

Practice Problems 2
Draw a random sample $X_1,X_2,X_3,X_4,X_5$ of size 5 from a continuous distribution with density function $f(x)=\frac{x}{2}$ where $0<x<2$. Find the probability that the sample median is less than 1.

$\text{ }$

Practice Problems 3
Draw a random sample $X_1,X_2,X_3,X_4,X_5$ of size 5 from the uniform distribution $U(0,4)$.

1. Calculate the distribution function $P(Y_4 \le y)$ for the fourth order statistic. Then differentiate it to obtain the density function $f_{Y_4}(y)$ of $Y_4$.
2. Use the thought process behind formula (2) above to write down the density function $f_{Y_4}(y)$ directly.

$\text{ }$

Practice Problems 4
Draw a random sample $X_1,X_2,\cdots,X_9$ of size 9 from an exponential distribution with mean $\frac{1}{\alpha}$.

1. Calculate the distribution function $P(Y_1 \le y)$ for the first order statistic (the minimum). Then differentiate it to obtain the density function $f_{Y_1}(y)$ of $Y_1$.
2. Use the thought process behind formula (2) above to write down the density function $f_{Y_1}(y)$ directly.

$\text{ }$

Practice Problems 5
Draw a random sample $X_1,X_2,\cdots,X_9$ of size 9 from an exponential distribution with mean $\frac{1}{\alpha}$.

1. Calculate the distribution function $P(Y_2 \le y)$ for the second order statistic. Then differentiate it to obtain the density function $f_{Y_2}(y)$ of $Y_2$.
2. Use the thought process behind formula (2) above to write down the density function $f_{Y_2}(y)$ directly.

$\text{ }$

Practice Problems 6
Draw a random sample $X_1,X_2,\cdots,X_8$ of size 8 from the uniform distribution $U(0,1)$. Find $E(Y_6)$, the expected value of the sixth order statistic.

$\text{ }$

Practice Problems 7
Draw a random sample $X_1,X_2,\cdots,X_{10}$ of size 10 from the uniform distribution $U(0,1)$. Find $Var(Y_9)$, the variance of the ninth order statistic.

$\text{ }$

Practice Problems 8
Draw a random sample $X_1,X_2,\cdots,X_{6}$ of size 6 from a population whose 25th percentile is 83. Find the probability the third order statistic $Y_3$ is less than 83.

$\text{ }$

Practice Problems 9
Draw a random sample $X_1,X_2,\cdots,X_{6}$ of size 6 from a population whose 75th percentile is 105. Find the probability the third order statistic $Y_3$ is less than 105.

$\text{ }$

Practice Problems 10
Draw a random sample $X_1,X_2,X_3,X_4$ of size 4 from an exponential distribution with mean 10. Calculate $P(Y_4>15 \ | \ Y_4 > 10)$.

$\text{ }$

Practice Problems 11
Draw a random sample $X_1,X_2,X_3,X_4,X_5$ of size 5 from a continuous distribution with density function $f(x)=\frac{x}{2}$ where $0<x<2$. Find $E(Y_5)$, the expected value of the sample maximum.

$\text{ }$

Practice Problems 12
Draw a random sample of size 12 from an exponential distribution with mean 2. Calculate $P(Y_1>0.5 | Y_1 > 0.25)$.

$\text{ }$

_____________________________________________________________________________________

Answers

Practice Problems 1

• $\displaystyle P(Y_3>3)=\frac{277}{65536}$

$\text{ }$

Practice Problems 2

• $\displaystyle P(Y_3<1)=\frac{106}{1024}$

$\text{ }$

Practice Problems 3

• $\displaystyle f_{Y_4}(y)=20 \ \biggl( \frac{y}{4} \biggr)^3 \ \frac{1}{4} \ \biggl(1- \frac{y}{4} \biggr)$

$\text{ }$

Practice Problems 4

• $\displaystyle f_{Y_1}(y)=9 \alpha \ e^{-9 \alpha y}$

$\text{ }$

Practice Problems 5

• $\displaystyle f_{Y_2}(y)=72 \ \biggl(1-e^{-\alpha y} \biggr) \ \alpha e^{-\alpha y} \ \biggl( e^{-\alpha y} \biggr)^7$

$\text{ }$

Practice Problems 6

• $\displaystyle E(Y_6)=\frac{2}{3}$

$\text{ }$

Practice Problems 7

• $\displaystyle Var(Y_9)=\frac{3}{242}$

$\text{ }$

Practice Problems 8

• $\displaystyle P(Y_3<83)=\frac{694}{4096}$

$\text{ }$

Practice Problems 9

• $\displaystyle P(Y_3<105)=\frac{3942}{4096}$

$\text{ }$

Practice Problems 10

• $\displaystyle P(Y_4>15 \ | \ Y_4 > 10)$ = 0.756546693

$\text{ }$

Practice Problems 11

• $\displaystyle E(Y_5)=\frac{20}{11}$

$\text{ }$

Practice Problems 12

• $\displaystyle P(Y_1>0.5 | Y_1 > 0.25)=e^{-1.5}$

_____________________________________________________________________________________


## Calculating the occupancy problem

The occupancy problem refers to the experiment of randomly throwing $k$ balls into $n$ cells. Many questions can be asked about this experiment. In this post we focus on one question: what is the probability that exactly $j$ of the cells are empty after randomly throwing $k$ balls into $n$ cells, where $0 \le j \le n-1$?

For a better perspective, there are many other ways to describe the experiment of throwing $k$ balls into $n$ cells. For example, throwing $k$ balls into 6 cells can be interpreted as rolling $k$ dice (or rolling a die $k$ times). Throwing $k$ balls into 365 cells can be interpreted as randomly selecting $k$ people and classifying them according to their dates of birth (assuming 365 days in a year). Another context is coupon collecting – the different types of coupons represent the $n$ cells and the coupons being collected represent the $k$ balls.

_____________________________________________________________________________________

Practice Problems

Let $X_{k,n}$ be the number of cells that are empty (i.e. not occupied) when throwing $k$ balls into $n$ cells. As noted above, the problem discussed in this post is to find the probability function $P(X_{k,n}=j)$ where $j=0,1,2,\cdots,n-1$. There are two elementary ways to do this problem. One is the double multinomial coefficient approach (see this post) and the other is to use a formula developed in this post. In addition to $X_{k,n}$, let $Y_{k,n}$ be the number of cells that are occupied, i.e. $Y_{k,n}=n-X_{k,n}$.

Practice Problems
Compute $P(X_{k,n}=j)$ where $j=0,1,2,\cdots,n-1$ for the following pairs of $k$ and $n$.

1. $k=5$ and $n=5$
2. $k=6$ and $n=5$
3. $k=7$ and $n=5$
4. $k=6$ and $n=6$
5. $k=7$ and $n=6$
6. $k=8$ and $n=6$

We work the problem for $k=7$ and $n=5$. We show both the double multinomial coefficient approach and the formula approach. Recall the $k$ here is the number of balls and $n$ is the number of cells.

_____________________________________________________________________________________

Example – Double Multinomial Coefficient

Note that the double multinomial coefficient approach calculates the probabilities $P(Y_{7,5}=j)$ where $Y_{7,5}$ is the number of occupied cells when throwing 7 balls into 5 cells. In throwing 7 balls into 5 cells, there is a total of $5^7=$ 78125 ordered samples. To calculate $P(Y_{7,5}=j)$, first write down the representative occupancy sets for the event $Y_{7,5}=j$. For each occupancy set, calculate the number of ordered samples (out of 78125) that belong to that occupancy set. Then add up the counts for all the occupancy sets for $Y_{7,5}=j$. This is best illustrated with an example. To see the development of this idea, see this post.

First, consider the event of $Y_{7,5}=1$ (only one cell is occupied, i.e. all the balls going into one cell). A representative occupancy set is (0, 0, 0, 0, 7), all the balls going into the 5th cell. The first multinomial coefficient is on the 5 cells and the second multinomial coefficient is on the 7 balls.

(0, 0, 0, 0, 7)

$\displaystyle \frac{5!}{4! 1!} \times \frac{7!}{7!}=5 \times 1 =5$

Total = 5

So there are 5 ordered samples out of 78125 that belong to the event $Y_{7,5}=1$. Note that the first multinomial coefficient is the number of ways to order the 5 cells where 4 of the cells are empty and one of the cells has 7 balls. The second multinomial coefficient is the number of ways to order the 7 balls where all 7 balls go into the 5th cell.

Now consider the event $Y_{7,5}=2$ (all 7 balls go into 2 cells). There are three representative occupancy sets. The following shows the calculation for each set.

(0, 0, 0, 1, 6)

$\displaystyle \frac{5!}{3! 1! 1!} \times \frac{7!}{1! 6!}=20 \times 7 =140$
(0, 0, 0, 2, 5)

$\displaystyle \frac{5!}{3! 1! 1!} \times \frac{7!}{2! 5!}=20 \times 21 =420$
(0, 0, 0, 3, 4)

$\displaystyle \frac{5!}{3! 1! 1!} \times \frac{7!}{3! 4!}=20 \times 35 =700$

Total = 140 + 420 + 700 = 1260

Here’s an explanation for the occupancy set (0, 0, 0, 3, 4). The occupancy set refers to the scenario that 3 of the 7 balls go into the 4th cell and 4 of the 7 balls go into the 5th cell. But we want to count all the possibilities such that 3 of the 7 balls go into one cell and 4 of the 7 balls go into another cell. Thus the first multinomial coefficient counts the number of ways to order 5 cells where three of the cells are empty, one cell has 3 balls and the remaining cell has 4 balls (20 ways). The second multinomial coefficient is the number of ways to order the 7 balls where 3 of the 7 balls go into the 4th cell and 4 of the 7 balls go into the 5th cell (35 ways). So the total number of possibilities for the occupancy set (0, 0, 0, 3, 4) is 20 times 35 = 700. The sum total for the three occupancy sets is 1260.
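The count for a single occupancy set can be sketched in a few lines of Python. The helper names `multinomial` and `ordered_samples` are hypothetical, and the code simply mirrors the two multinomial coefficients described in the text: one on the cells (grouping cells that hold the same number of balls) and one on the balls.

```python
from collections import Counter
from math import factorial

def multinomial(total, parts):
    """total! / (p1! * p2! * ...), where the parts sum to total."""
    result = factorial(total)
    for p in parts:
        result //= factorial(p)
    return result

def ordered_samples(occupancy_set):
    """Number of ordered samples that match a representative occupancy set."""
    n_cells = len(occupancy_set)
    n_balls = sum(occupancy_set)
    # First coefficient: order the cells, grouping cells with equal ball counts.
    cell_arrangements = multinomial(n_cells, Counter(occupancy_set).values())
    # Second coefficient: assign the balls to the cells in the given counts.
    ball_arrangements = multinomial(n_balls, occupancy_set)
    return cell_arrangements * ball_arrangements

print(ordered_samples((0, 0, 0, 3, 4)))  # 20 * 35 = 700
```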

We now show the remaining calculation without further elaboration.

Now consider the event $Y_{7,5}=3$ (all 7 balls go into 3 cells). There are four representative occupancy sets. The following shows the calculation for each set.

(0, 0, 1, 1, 5)

$\displaystyle \frac{5!}{2! 2! 1!} \times \frac{7!}{1! 1! 5!}=30 \times 42 =1260$
(0, 0, 1, 2, 4)

$\displaystyle \frac{5!}{2! 1! 1! 1!} \times \frac{7!}{1! 2! 4!}=60 \times 105 =6300$
(0, 0, 1, 3, 3)

$\displaystyle \frac{5!}{2! 1! 2!} \times \frac{7!}{1! 3! 3!}=30 \times 140 =4200$
(0, 0, 2, 2, 3)

$\displaystyle \frac{5!}{2! 2! 1!} \times \frac{7!}{2! 2! 3!}=30 \times 210 =6300$

Total = 1260 + 6300 + 4200 + 6300 = 18060

Now consider the event $Y_{7,5}=4$ (all 7 balls go into 4 cells, i.e. one empty cell). There are three representative occupancy sets. The following shows the calculation for each set.

(0, 1, 1, 1, 4)

$\displaystyle \frac{5!}{1! 3! 1!} \times \frac{7!}{1! 1! 1! 4!}=20 \times 210 =4200$
(0, 1, 1, 2, 3)

$\displaystyle \frac{5!}{1! 2! 1! 1!} \times \frac{7!}{1! 1! 2! 3!}=60 \times 420 =25200$
(0, 1, 2, 2, 2)

$\displaystyle \frac{5!}{1! 1! 3!} \times \frac{7!}{1! 2! 2! 2!}=20 \times 630 =12600$

Total = 4200 + 25200 + 12600 = 42000

Now consider the event $Y_{7,5}=5$ (all 7 balls go into 5 cells, i.e. no empty cell). There are two representative occupancy sets. The following shows the calculation for each set.

(1, 1, 1, 1, 3)

$\displaystyle \frac{5!}{4! 1!} \times \frac{7!}{1! 1! 1! 1! 3!}=5 \times 840 =4200$
(1, 1, 1, 2, 2)

$\displaystyle \frac{5!}{3! 2!} \times \frac{7!}{1! 1! 1! 2! 2!}=10 \times 1260 =12600$

Total = 4200 + 12600 = 16800

The following is the distribution for the random variable $Y_{7,5}$.

$\displaystyle P(Y_{7,5}=1)=\frac{5}{78125}=0.000064$

$\displaystyle P(Y_{7,5}=2)=\frac{1260}{78125}=0.016128$

$\displaystyle P(Y_{7,5}=3)=\frac{18060}{78125}=0.231168$

$\displaystyle P(Y_{7,5}=4)=\frac{42000}{78125}=0.5376$

$\displaystyle P(Y_{7,5}=5)=\frac{16800}{78125}=0.21504$

$\displaystyle E(Y_{7,5})=3.951424$

Remarks
In throwing 7 balls at random into 5 cells, it is not likely that the balls are in only one or two cells (about 1.6% chance). The mean number of occupied cells is about 3.95. More than 50% of the time, 4 cells are occupied.

The above example has small numbers of balls and cells and is an excellent example for practicing the calculation. Working such problems can help build the intuition for the occupancy problem. However, when the numbers are larger, the calculation using the double multinomial coefficient can be lengthy and tedious. Next we show how to use a formula for the occupancy problem using the same example.
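The whole double multinomial coefficient calculation can be automated by enumerating the representative occupancy sets, which are the partitions of $k$ into at most $n$ positive parts. The following is a rough sketch with hypothetical function names, using exact fractions so the results can be compared with the hand calculation.

```python
from collections import Counter
from fractions import Fraction
from math import factorial

def partitions(total, max_parts, largest=None):
    """Yield partitions of total into at most max_parts positive parts (non-increasing)."""
    if largest is None:
        largest = total
    if total == 0:
        yield ()
        return
    if max_parts == 0:
        return
    for first in range(min(total, largest), 0, -1):
        for rest in partitions(total - first, max_parts - 1, first):
            yield (first,) + rest

def multinomial(total, parts):
    """total! / (p1! * p2! * ...), where the parts sum to total."""
    result = factorial(total)
    for p in parts:
        result //= factorial(p)
    return result

def occupied_distribution(k, n):
    """Return {j: P(Y_{k,n} = j)} as exact fractions."""
    counts = Counter()
    for parts in partitions(k, n):
        occupancy = parts + (0,) * (n - len(parts))  # pad with the empty cells
        ways = multinomial(n, Counter(occupancy).values()) * multinomial(k, occupancy)
        counts[len(parts)] += ways                   # len(parts) = occupied cells
    return {j: Fraction(c, n ** k) for j, c in sorted(counts.items())}

dist = occupied_distribution(7, 5)
for j, p in dist.items():
    print(j, p, float(p))
mean = float(sum(j * p for j, p in dist.items()))
print("mean:", mean)
```

Note that `Fraction` prints each probability in lowest terms, so for example 42000/78125 appears as 336/625.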

_____________________________________________________________________________________

Example – Formula Approach

The formula we use is developed in this post. Recall that $X_{k,n}$ is the number of empty cells when throwing $k$ balls into $n$ cells. The formula calculates the probabilities $P(X_{7,5}=j)$ where $0 \le j \le 4$. The first step is to compute the probabilities $P(X_{7,m}=0)$ for $m=5,4,3,2,1$. Each of these is the probability that all $m$ cells are occupied (when throwing 7 balls into $m$ cells). These 5 probabilities will be used to calculate $P(X_{7,5}=j)$. This is a less direct implementation of the formula but gives a more intuitive explanation.

\displaystyle \begin{aligned} P(X_{7,5}=0)&=1-P(X_{7,5} \ge 1) \\&=1-\sum \limits_{j=1}^5 (-1)^{j+1} \binom{5}{j} \biggl[ 1-\frac{j}{5} \biggr]^7 \\&=1-\biggl[5 \biggl(\frac{4}{5}\biggr)^7-10 \biggl(\frac{3}{5}\biggr)^7+10 \biggl(\frac{2}{5}\biggr)^7 -5 \biggl(\frac{1}{5}\biggr)^7 + 0 \biggr] \\&=1-\frac{81920-21870+1280-5}{78125} \\&=\frac{16800}{78125} \end{aligned}

\displaystyle \begin{aligned} P(X_{7,4}=0)&=1-P(X_{7,4} \ge 1) \\&=1-\sum \limits_{j=1}^4 (-1)^{j+1} \binom{4}{j} \biggl[ 1-\frac{j}{4} \biggr]^7 \\&=1-\biggl[4 \biggl(\frac{3}{4}\biggr)^7-6 \biggl(\frac{2}{4}\biggr)^7+4 \biggl(\frac{1}{4}\biggr)^7 - 0 \biggr] \\&=1-\frac{8748-768+4}{16384} \\&=\frac{8400}{16384} \end{aligned}

\displaystyle \begin{aligned} P(X_{7,3}=0)&=1-P(X_{7,3} \ge 1) \\&=1-\sum \limits_{j=1}^3 (-1)^{j+1} \binom{3}{j} \biggl[ 1-\frac{j}{3} \biggr]^7 \\&=1-\biggl[3 \biggl(\frac{2}{3}\biggr)^7-3 \biggl(\frac{1}{3}\biggr)^7+0 \biggr] \\&=1-\frac{384-3}{2187} \\&=\frac{1806}{2187} \end{aligned}

\displaystyle \begin{aligned} P(X_{7,2}=0)&=1-P(X_{7,2} \ge 1) \\&=1-\sum \limits_{j=1}^2 (-1)^{j+1} \binom{2}{j} \biggl[ 1-\frac{j}{2} \biggr]^7 \\&=1-\biggl[2 \biggl(\frac{1}{2}\biggr)^7-0 \biggr] \\&=1-\frac{2}{128} \\&=\frac{126}{128} \end{aligned}

$P(X_{7,1}=0)=1$

Each of the above 5 probabilities (except the last one) is based on the probability $P(X_{7,m} \ge 1)$, which is the probability that there is at least one cell that is empty when throwing 7 balls into $m$ cells. The inclusion-exclusion principle is used to derive $P(X_{7,m} \ge 1)$. The last of the five does not need calculation. When throwing 7 balls into one cell, there will be no empty cells.
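The inclusion-exclusion step is easy to check with exact arithmetic. The sketch below uses a hypothetical helper name, `prob_no_empty_cell`, and reproduces the five probabilities $P(X_{7,m}=0)$ for $m=5,4,3,2,1$.

```python
from fractions import Fraction
from math import comb

def prob_no_empty_cell(k, m):
    """P(X_{k,m} = 0): all m cells are occupied after throwing k balls."""
    # Inclusion-exclusion for P(at least one of the m cells is empty).
    at_least_one_empty = sum(
        (-1) ** (j + 1) * comb(m, j) * Fraction(m - j, m) ** k
        for j in range(1, m + 1)
    )
    return 1 - at_least_one_empty

for m in (5, 4, 3, 2, 1):
    print(m, prob_no_empty_cell(7, m))
```

The printed fractions are reduced to lowest terms, so they look different from the unreduced forms above while being equal in value.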

Now the rest of the calculation:

$\displaystyle P(X_{7,5}=0)=\frac{16800}{78125}$

\displaystyle \begin{aligned} P(X_{7,5}=1)&=P(\text{1 empty cell}) \times P(\text{none of the other 4 cells is empty}) \\&=\binom{5}{1} \biggl(1-\frac{1}{5} \biggr)^7 \times P(X_{7,4}=0) \\&=5 \ \frac{16384}{78125} \times \frac{8400}{16384} \\&=\frac{42000}{78125} \end{aligned}

\displaystyle \begin{aligned} P(X_{7,5}=2)&=P(\text{2 empty cells}) \times P(\text{none of the other 3 cells is empty}) \\&=\binom{5}{2} \biggl(1-\frac{2}{5} \biggr)^7 \times P(X_{7,3}=0) \\&=10 \ \frac{2187}{78125} \times \frac{1806}{2187} \\&=\frac{18060}{78125} \end{aligned}

\displaystyle \begin{aligned} P(X_{7,5}=3)&=P(\text{3 empty cells}) \times P(\text{none of the other 2 cells is empty}) \\&=\binom{5}{3} \biggl(1-\frac{3}{5} \biggr)^7 \times P(X_{7,2}=0) \\&=10 \ \frac{128}{78125} \times \frac{126}{128} \\&=\frac{1260}{78125} \end{aligned}

\displaystyle \begin{aligned} P(X_{7,5}=4)&=P(\text{4 empty cells}) \times P(\text{the other 1 cell is not empty}) \\&=\binom{5}{4} \biggl(1-\frac{4}{5} \biggr)^7 \times P(X_{7,1}=0) \\&=5 \ \frac{1}{78125} \times 1 \\&=\frac{5}{78125} \end{aligned}

Note that these answers agree with the ones from the double multinomial coefficient approach after making the adjustment $Y_{7,5}=5-X_{7,5}$.
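Putting the two steps together gives a short routine for $P(X_{k,n}=j)$. This is a sketch of the calculation pattern used above (choose the $j$ empty cells, require all balls to land in the other $n-j$ cells, then require none of those cells to be empty). The function names are hypothetical.

```python
from fractions import Fraction
from math import comb

def prob_no_empty_cell(k, m):
    """P(X_{k,m} = 0) by inclusion-exclusion."""
    return 1 - sum(
        (-1) ** (j + 1) * comb(m, j) * Fraction(m - j, m) ** k
        for j in range(1, m + 1)
    )

def prob_empty_cells(k, n, j):
    """P(X_{k,n} = j): exactly j of the n cells are empty, for 0 <= j <= n - 1."""
    return comb(n, j) * Fraction(n - j, n) ** k * prob_no_empty_cell(k, n - j)

probs = [prob_empty_cells(7, 5, j) for j in range(5)]
for j, p in enumerate(probs):
    print(j, p, float(p))
```

Changing the arguments to the other pairs of $k$ and $n$ in the practice problems gives a way to check hand calculations.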

_____________________________________________________________________________________

Answers

Problem 1
5 balls into 5 cells

$\displaystyle P(X_{5,5}=0)=P(Y_{5,5}=5)=\frac{120}{3125}$

$\displaystyle P(X_{5,5}=1)=P(Y_{5,5}=4)=\frac{1200}{3125}$

$\displaystyle P(X_{5,5}=2)=P(Y_{5,5}=3)=\frac{1500}{3125}$

$\displaystyle P(X_{5,5}=3)=P(Y_{5,5}=2)=\frac{300}{3125}$

$\displaystyle P(X_{5,5}=4)=P(Y_{5,5}=1)=\frac{5}{3125}$

Problem 2
6 balls into 5 cells

$\displaystyle P(X_{6,5}=0)=P(Y_{6,5}=5)=\frac{1800}{15625}$

$\displaystyle P(X_{6,5}=1)=P(Y_{6,5}=4)=\frac{7800}{15625}$

$\displaystyle P(X_{6,5}=2)=P(Y_{6,5}=3)=\frac{5400}{15625}$

$\displaystyle P(X_{6,5}=3)=P(Y_{6,5}=2)=\frac{620}{15625}$

$\displaystyle P(X_{6,5}=4)=P(Y_{6,5}=1)=\frac{5}{15625}$

Problem 3
7 balls into 5 cells. See above.

Problem 4
6 balls into 6 cells

$\displaystyle P(X_{6,6}=0)=P(Y_{6,6}=6)=\frac{720}{46656}$

$\displaystyle P(X_{6,6}=1)=P(Y_{6,6}=5)=\frac{10800}{46656}$

$\displaystyle P(X_{6,6}=2)=P(Y_{6,6}=4)=\frac{23400}{46656}$

$\displaystyle P(X_{6,6}=3)=P(Y_{6,6}=3)=\frac{10800}{46656}$

$\displaystyle P(X_{6,6}=4)=P(Y_{6,6}=2)=\frac{930}{46656}$

$\displaystyle P(X_{6,6}=5)=P(Y_{6,6}=1)=\frac{6}{46656}$

Problem 5
7 balls into 6 cells

$\displaystyle P(X_{7,6}=0)=P(Y_{7,6}=6)=\frac{15120}{279936}$

$\displaystyle P(X_{7,6}=1)=P(Y_{7,6}=5)=\frac{100800}{279936}$

$\displaystyle P(X_{7,6}=2)=P(Y_{7,6}=4)=\frac{126000}{279936}$

$\displaystyle P(X_{7,6}=3)=P(Y_{7,6}=3)=\frac{36120}{279936}$

$\displaystyle P(X_{7,6}=4)=P(Y_{7,6}=2)=\frac{1890}{279936}$

$\displaystyle P(X_{7,6}=5)=P(Y_{7,6}=1)=\frac{6}{279936}$

Problem 6
8 balls into 6 cells

$\displaystyle P(X_{8,6}=0)=P(Y_{8,6}=6)=\frac{191520}{1679616}$

$\displaystyle P(X_{8,6}=1)=P(Y_{8,6}=5)=\frac{756000}{1679616}$

$\displaystyle P(X_{8,6}=2)=P(Y_{8,6}=4)=\frac{612360}{1679616}$

$\displaystyle P(X_{8,6}=3)=P(Y_{8,6}=3)=\frac{115920}{1679616}$

$\displaystyle P(X_{8,6}=4)=P(Y_{8,6}=2)=\frac{3810}{1679616}$

$\displaystyle P(X_{8,6}=5)=P(Y_{8,6}=1)=\frac{6}{1679616}$

_____________________________________________________________________________________


## Practice problems for the Poisson distribution

This post has practice problems on the Poisson distribution. For a good discussion of the Poisson distribution and the Poisson process, see this blog post in the companion blog.

_____________________________________________________________________________________

Practice Problems

Practice Problem 1
On average, two taxis arrive at a certain street corner every 15 minutes. Suppose that the number of taxis arriving at this street corner follows a Poisson distribution. Three people are waiting at the street corner for a taxi (assuming they do not know each other and each one will take his own taxi). Each person will be late for work if he does not catch a taxi within the next 15 minutes. What is the probability that all three people will make it to work on time?

$\text{ }$

Practice Problem 2
A 5-county area in Kansas is hit on average by 3 tornadoes a year (assuming annual Poisson tornado count). What is the probability that the number of tornadoes will be more than the historical average next year in this area?

$\text{ }$

Practice Problem 3
A certain airline estimated that 0.8% of its customers with purchased tickets fail to show up for their flights. For one particular flight, the plane has 500 seats and the flight has been fully booked. How many additional tickets can the airline sell so that there is at least a 90% chance that everyone who shows up will have a seat?

$\text{ }$

Practice Problem 4
A life insurance company insured 9000 men aged 45. The probability that a 45-year old man will die within one year is 0.0035. Within the next year, what is the probability that the insurance company will pay between 30 and 33 claims (both inclusive) among these 9000 men?

$\text{ }$

Practice Problem 5
In a certain manuscript of 1000 pages, 300 typographical errors occur.

• What is the probability that a randomly selected page will be error free?
• What is the probability that 10 randomly selected pages will have at most 3 errors?

$\text{ }$

Practice Problem 6
Trisomy 13, also called Patau syndrome, is a chromosomal condition associated with severe intellectual disability and physical abnormalities in many parts of the body. Trisomy 13 occurs, on average, once in every 16,000 births. Suppose that in one country, 100,000 babies are born in a year. What is the probability that at most 3 of these babies will have this chromosomal condition?

$\text{ }$

Practice Problem 7
Traffic accidents occur along a 50-mile stretch of highway at an average rate of 0.85 accidents during the hour from 5 PM to 6 PM. Suppose that the number of traffic accidents in this stretch of highway follows a Poisson distribution. The department of transportation plans to observe the traffic flow in this stretch of highway during this hour in a two-day period. What is the probability that more than three accidents occur in this observation period?

$\text{ }$

Practice Problem 8
The odds of winning the Mega Million lottery are one in 176 million. Out of 176 million lottery tickets sold, what is the probability of having no winning ticket? What is the exact probability model in this problem?

_____________________________________________________________________________________

Answers

Practice Problem 1

• $1-5e^{-2}=0.323323584$

Practice Problem 2

• $1-13e^{-3}=0.352768111$

Practice Problem 3

• Can oversell by 2 tickets.

Practice Problem 4

• 0.278162459

Practice Problem 5

• 0.740818221
• 0.647231889

Practice Problem 6

• 0.130250355 (using Poisson)
• 0.130242377 (using Binomial)

Practice Problem 7

• 0.093189434

Practice Problem 8

• 0.367879441
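
The answers to Practice Problems 4 through 8 can be checked numerically. The following is a minimal Python sketch using only the standard library. The helper names (`poisson_pmf`, `poisson_cdf`, `binomial_cdf`) are illustrative, and the sketch assumes 9000 insured men in Problem 4 and a rate of 0.85 accidents per day in Problem 7.

```python
from math import comb, exp, factorial

def poisson_pmf(k, lam):
    """P(X = k) for a Poisson random variable with mean lam."""
    return exp(-lam) * lam**k / factorial(k)

def poisson_cdf(k, lam):
    """P(X <= k) for a Poisson random variable with mean lam."""
    return sum(poisson_pmf(i, lam) for i in range(k + 1))

def binomial_cdf(k, n, p):
    """P(X <= k) for a binomial random variable with n trials."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k + 1))

# Problem 4: mean = 9000 * 0.0035 = 31.5 (assumes 9000 insured men)
p4 = poisson_cdf(33, 31.5) - poisson_cdf(29, 31.5)

# Problem 5: 0.3 errors per page; 10 pages have mean 3
p5_error_free = poisson_pmf(0, 0.3)
p5_at_most_3 = poisson_cdf(3, 3.0)

# Problem 6: Poisson approximation versus the exact binomial model
p6_poisson = poisson_cdf(3, 100000 / 16000)
p6_binomial = binomial_cdf(3, 100000, 1 / 16000)

# Problem 7: two days at 0.85 per day gives mean 1.7 (assumed daily rate)
p7 = 1 - poisson_cdf(3, 2 * 0.85)

# Problem 8: mean = 176 million * (1 / 176 million) = 1
p8 = poisson_pmf(0, 1.0)

for name, value in [("4", p4), ("5a", p5_error_free), ("5b", p5_at_most_3),
                    ("6 Poisson", p6_poisson), ("6 binomial", p6_binomial),
                    ("7", p7), ("8", p8)]:
    print(f"Problem {name}: {value:.9f}")
```

Problem 6 shows how close the Poisson approximation is to the exact binomial probability when the number of trials is large and the success probability is small.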

_____________________________________________________________________________________

$\copyright \ 2015 \text{ by Dan Ma}$

## Practice Problems for Conditional Distributions, Part 2

The following are practice problems on conditional distributions. The thought process of how to work with these practice problems can be found in the blog post Conditional Distributions, Part 2.

_____________________________________________________________________________________

Practice Problems

Practice Problem 1

Suppose that $X$ is the lifetime (in years) of a brand new machine of a certain type. The following is the density function.

$\displaystyle f(x)=\frac{1}{8 \sqrt{x}}, \ \ \ \ \ \ \ \ \ 1<x<25$

You just purchased a 9-year-old machine of this type that is in good working condition. Compute the following:

• What is the expected lifetime of this 9-year-old machine?
• What is the expected remaining life of this 9-year-old machine?

$\text{ }$

Practice Problem 2

Suppose that $X$ is the total amount of damages (in millions of dollars) resulting from the occurrence of a severe wind storm in a certain city. The following is the density function of $X$.

$\displaystyle f(x)=\frac{81}{(x+3)^4}, \ \ \ \ \ \ \ \ \ 0<x<\infty$

Suppose that the next storm is expected to cause damages exceeding one million dollars. Compute the following:

• What is the expected total amount of damages for the next storm given that it will exceed one million dollars?
• The city has a reserve fund of one million dollars to cover the damages from the next storm. Given that the amount of damages for the next storm will exceed one million dollars, what is the expected amount of damages in excess of the reserve fund?

_____________________________________________________________________________________

$\text{ }$

$\text{ }$

$\text{ }$

$\text{ }$

$\text{ }$

$\text{ }$

$\text{ }$

$\text{ }$

$\text{ }$

$\text{ }$

$\text{ }$

$\text{ }$
_____________________________________________________________________________________

The thought process of how to work with these practice problems can be found in the blog post Conditional Distributions, Part 2.

Practice Problem 1

$\displaystyle E(X \lvert X>9)=\frac{49}{3}=16.33 \text{ years}$

$\displaystyle E(X-9 \lvert X>9)=\frac{22}{3}=7.33 \text{ years}$

Practice Problem 2

$\displaystyle E(X \lvert X>1)=3 \text{ million}$

$\displaystyle E(X-1 \lvert X>1)=2 \text{ million}$
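
As a sketch of how these answers can be reached, the following assumes that the density in Practice Problem 1 is supported on $1<x<25$ and the density in Practice Problem 2 on $0<x<\infty$ (the ranges on which each density integrates to 1). The companion blog post may organize the calculation differently.

For Practice Problem 1:

$\displaystyle \begin{aligned} P(X>9)&=\int_9^{25} \frac{1}{8 \sqrt{x}} \ dx=\frac{1}{4} \left(\sqrt{25}-\sqrt{9}\right)=\frac{1}{2} \\E(X \lvert X>9)&=\frac{1}{P(X>9)} \int_9^{25} \frac{x}{8 \sqrt{x}} \ dx=2 \cdot \frac{1}{12} \left(25^{3/2}-9^{3/2}\right)=\frac{49}{3} \\E(X-9 \lvert X>9)&=\frac{49}{3}-9=\frac{22}{3} \end{aligned}$

For Practice Problem 2, it is convenient to work with the survival function:

$\displaystyle \begin{aligned} P(X>x)&=\int_x^\infty \frac{81}{(t+3)^4} \ dt=\frac{27}{(x+3)^3} \\E(X-1 \lvert X>1)&=\frac{1}{P(X>1)} \int_1^\infty P(X>x) \ dx=\frac{64}{27} \cdot \frac{27}{2 \cdot 4^2}=2 \\E(X \lvert X>1)&=1+2=3 \end{aligned}$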

_____________________________________________________________________________________

$\copyright \ 2013 \text{ by Dan Ma}$

## Practice Problems for Conditional Distributions, Part 1

The following are practice problems on conditional distributions. The thought process of how to work with these practice problems can be found in the blog post Conditional Distributions, Part 1.

_____________________________________________________________________________________

Description of Problems

Suppose $X$ and $Y$ are independent binomial random variables with the following parameters.

For $X$, number of trials $n=5$, success probability $\displaystyle p=\frac{1}{2}$

For $Y$, number of trials $n=5$, success probability $\displaystyle p=\frac{3}{4}$

We can think of these random variables as the results of two students taking a multiple choice test with 5 questions. For example, let $X$ be the number of correct answers for one student and $Y$ be the number of correct answers for the other student. For the practice problems below, passing the test means having 3 or more correct answers.

Suppose we have some new information about the results of the test. The problems below are to derive the conditional distributions of $X$ or $Y$ based on the new information and to compare the conditional distributions with the unconditional distributions.

Practice Problem 1

• New information: $X<Y$.
• Derive the conditional distribution for $X \lvert X<Y$.
• Derive the conditional distribution for $Y \lvert X<Y$.
• Compare these conditional distributions with the unconditional ones with respect to mean and probability of passing.
• What is the effect of the new information on the test performance of each of the students?
• Explain why the new information has this effect on the test performance.

Practice Problem 2

• New information: $X>Y$.
• Derive the conditional distribution for $X \lvert X>Y$.
• Derive the conditional distribution for $Y \lvert X>Y$.
• Compare these conditional distributions with the unconditional ones with respect to mean and probability of passing.
• What is the effect of the new information on the test performance of each of the students?
• Explain why the new information has this effect on the test performance.

Practice Problem 3

• New information: $Y=X+1$.
• Derive the conditional distribution for $X \lvert Y=X+1$.
• Derive the conditional distribution for $Y \lvert Y=X+1$.
• Compare these conditional distributions with the unconditional ones with respect to mean and probability of passing.
• What is the effect of the new information on the test performance of each of the students?
• Explain why the new information has this effect on the test performance.

_____________________________________________________________________________________

$\text{ }$

$\text{ }$

$\text{ }$

$\text{ }$

$\text{ }$

$\text{ }$

$\text{ }$

$\text{ }$

$\text{ }$

$\text{ }$

$\text{ }$

$\text{ }$
_____________________________________________________________________________________

To let you know that you are on the right track, the conditional distributions are given below.

The thought process of how to work with these practice problems can be found in the blog post Conditional Distributions, Part 1.

Practice Problem 1

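$\displaystyle P(X=0 \lvert X<Y)=\frac{1023}{22938}=0.0446$

$\displaystyle P(X=1 \lvert X<Y)=\frac{5040}{22938}=0.2197$

$\displaystyle P(X=2 \lvert X<Y)=\frac{9180}{22938}=0.4002$

$\displaystyle P(X=3 \lvert X<Y)=\frac{6480}{22938}=0.2825$

$\displaystyle P(X=4 \lvert X<Y)=\frac{1215}{22938}=0.053$

____________________

$\displaystyle P(Y=1 \lvert X<Y)=\frac{15}{22938}=0.0007$

$\displaystyle P(Y=2 \lvert X<Y)=\frac{540}{22938}=0.0235$

$\displaystyle P(Y=3 \lvert X<Y)=\frac{4320}{22938}=0.1883$

$\displaystyle P(Y=4 \lvert X<Y)=\frac{10530}{22938}=0.4591$

$\displaystyle P(Y=5 \lvert X<Y)=\frac{7533}{22938}=0.3284$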

Practice Problem 2

$\displaystyle P(X=1 \lvert X>Y)=\frac{5}{3886}=0.0013$

$\displaystyle P(X=2 \lvert X>Y)=\frac{160}{3886}=0.04$

$\displaystyle P(X=3 \lvert X>Y)=\frac{1060}{3886}=0.2728$

$\displaystyle P(X=4 \lvert X>Y)=\frac{1880}{3886}=0.4838$

$\displaystyle P(X=5 \lvert X>Y)=\frac{781}{3886}=0.2$

____________________

$\displaystyle P(Y=0 \lvert X>Y)=\frac{31}{3886}=0.008$

$\displaystyle P(Y=1 \lvert X>Y)=\frac{390}{3886}=0.1$

$\displaystyle P(Y=2 \lvert X>Y)=\frac{1440}{3886}=0.37$

$\displaystyle P(Y=3 \lvert X>Y)=\frac{1620}{3886}=0.417$

$\displaystyle P(Y=4 \lvert X>Y)=\frac{405}{3886}=0.104$

Practice Problem 3

$\displaystyle P(X=0 \lvert Y=X+1)=\frac{15}{8430}=0.002$

$\displaystyle P(X=1 \lvert Y=X+1)=\frac{450}{8430}=0.053$

$\displaystyle P(X=2 \lvert Y=X+1)=\frac{2700}{8430}=0.32$

$\displaystyle P(X=3 \lvert Y=X+1)=\frac{4050}{8430}=0.48$

$\displaystyle P(X=4 \lvert Y=X+1)=\frac{1215}{8430}=0.144$

____________________

$\displaystyle P(Y=1 \lvert Y=X+1)=\frac{15}{8430}=0.002$

$\displaystyle P(Y=2 \lvert Y=X+1)=\frac{450}{8430}=0.053$

$\displaystyle P(Y=3 \lvert Y=X+1)=\frac{2700}{8430}=0.32$

$\displaystyle P(Y=4 \lvert Y=X+1)=\frac{4050}{8430}=0.48$

$\displaystyle P(Y=5 \lvert Y=X+1)=\frac{1215}{8430}=0.144$
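
The conditional distributions in all three problems can also be obtained by enumerating the 36 possible pairs $(x,y)$ and keeping the ones consistent with the new information. The following is a minimal Python sketch using exact fractions. The function names (`binomial_pmf`, `conditional`, `mean`, `pass_prob`) are illustrative and are not taken from the companion blog post.

```python
from fractions import Fraction
from math import comb

def binomial_pmf(n, p):
    """Return [P(k successes) for k = 0..n] as exact fractions."""
    return [comb(n, k) * p**k * (1 - p)**(n - k) for k in range(n + 1)]

px = binomial_pmf(5, Fraction(1, 2))  # distribution of X
py = binomial_pmf(5, Fraction(3, 4))  # distribution of Y

def conditional(event):
    """Conditional pmfs of X and Y given event(x, y), using independence."""
    joint = {(x, y): px[x] * py[y]
             for x in range(6) for y in range(6) if event(x, y)}
    total = sum(joint.values())  # probability of the new information
    cond_x, cond_y = {}, {}
    for (x, y), pr in joint.items():
        cond_x[x] = cond_x.get(x, 0) + pr / total
        cond_y[y] = cond_y.get(y, 0) + pr / total
    return total, cond_x, cond_y

def mean(pmf):
    """Mean of a pmf given as {value: probability}."""
    return sum(k * p for k, p in pmf.items())

def pass_prob(pmf):
    """Probability of passing, i.e. 3 or more correct answers."""
    return sum(p for k, p in pmf.items() if k >= 3)

events = {
    "X<Y": lambda x, y: x < y,
    "X>Y": lambda x, y: x > y,
    "Y=X+1": lambda x, y: y == x + 1,
}

for label, event in events.items():
    total, cond_x, cond_y = conditional(event)
    print(f"{label}: P = {total} = {float(total):.4f}")
    for name, pmf in [("X", cond_x), ("Y", cond_y)]:
        probs = {k: round(float(p), 4) for k, p in sorted(pmf.items())}
        print(f"  {name}: {probs}, mean = {float(mean(pmf)):.4f}, "
              f"P(pass) = {float(pass_prob(pmf)):.4f}")
```

The conditional means and passing probabilities printed here can be compared with the unconditional means $E(X)=2.5$ and $E(Y)=3.75$ to see the effect of the new information.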

_____________________________________________________________________________________

$\copyright \ 2013 \text{ by Dan Ma}$