Tag Archives: Mixture Distributions

Mixing Bowls of Balls

We present problems involving mixture distributions in the context of choosing bowls of balls, as well as related problems involving Bayes’ formula. Problem 1a and Problem 1b are discussed. Problem 2a and Problem 2b are left as exercises.

____________________________________________________________

Problem 1a
There are two identical looking bowls. Let’s call them Bowl 1 and Bowl 2. In Bowl 1, there are 1 red ball and 4 white balls. In Bowl 2, there are 4 red balls and 1 white ball. One bowl is selected at random and its identity is kept from you. From the chosen bowl, you randomly select 5 balls (one at a time, putting each ball back before picking another one). What is the expected number of red balls in the 5 selected balls? What is the variance of the number of red balls?

Problem 1b
Use the same information as in Problem 1a. Suppose there are 3 red balls in the 5 selected balls. What is the probability that the unknown chosen bowl is Bowl 1? What is the probability that the unknown chosen bowl is Bowl 2?

____________________________________________________________

Problem 2a
There are three identical looking bowls. Let’s call them Bowl 1, Bowl 2 and Bowl 3. Bowl 1 has 1 red ball and 9 white balls. Bowl 2 has 4 red balls and 6 white balls. Bowl 3 has 6 red balls and 4 white balls. A bowl is chosen according to the following probabilities:

\displaystyle \begin{aligned}\text{Probabilities:} \ \ \ \ \ &P(\text{Bowl 1})=0.6 \\&P(\text{Bowl 2})=0.3 \\&P(\text{Bowl 3})=0.1 \end{aligned}

The bowl is chosen so that its identity is kept from you. From the chosen bowl, 5 balls are selected sequentially with replacement. What is the expected number of red balls in the 5 selected balls? What is the variance of the number of red balls?

Problem 2b
Use the same information as in Problem 2a. Given that there are 4 red balls in the 5 selected balls, what is the probability that the chosen bowl is Bowl i, where i = 1,2,3?

____________________________________________________________
Solution – Problem 1a

Problem 1a is a mixture of two binomial distributions and is similar to Problem 1 in the previous post Mixing Binomial Distributions. Let X be the number of red balls in the 5 balls chosen from the unknown bowl. The following is the probability function:

    \displaystyle P(X=x)=0.5 \binom{5}{x} \biggl[\frac{1}{5}\biggr]^x \biggl[\frac{4}{5}\biggr]^{5-x}+0.5 \binom{5}{x} \biggl[\frac{4}{5}\biggr]^x \biggl[\frac{1}{5}\biggr]^{5-x}

where x=0,1,2,3,4,5.

The above probability function is the weighted average of two conditional binomial distributions (with equal weights). Thus the mean (first moment) and the second moment of X are the weighted averages of the corresponding moments of the two conditional distributions. We have:

    \displaystyle E(X)=0.5 \biggl[ 5 \times \frac{1}{5} \biggr] + 0.5 \biggl[ 5 \times \frac{4}{5} \biggr] =\frac{5}{2}
    \displaystyle \begin{aligned} E(X^2)&=0.5 \biggl[ 5 \times \frac{1}{5} \times \frac{4}{5} +\biggl( 5 \times \frac{1}{5} \biggr)^2 \biggr] \\&\ \ \ + 0.5 \biggl[ 5 \times \frac{4}{5} \times \frac{1}{5} +\biggl( 5 \times \frac{4}{5} \biggr)^2 \biggr] \\&=\frac{93}{10} \end{aligned}
    \displaystyle Var(X)=\frac{93}{10} - \biggl( \frac{5}{2} \biggr)^2=\frac{61}{20}=3.05

See Mixing Binomial Distributions for a more detailed explanation of the calculation.
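As a cross-check on the moment calculation, the mean and variance can also be obtained by brute force from the probability function itself. The following is a minimal sketch in Python (not part of the original solution); the function names are made up for illustration, and exact fractions are used to avoid rounding.

```python
from fractions import Fraction
from math import comb


def binom_pmf(n, p, x):
    """P(Y = x) for Y ~ binom(n, p)."""
    return comb(n, x) * p**x * (1 - p) ** (n - x)


def mixture_mean_var(n, weights, probs):
    """Mean and variance of a binomial mixture, by summing over its pmf."""
    pmf = [
        sum(w * binom_pmf(n, p, x) for w, p in zip(weights, probs))
        for x in range(n + 1)
    ]
    mean = sum(x * q for x, q in enumerate(pmf))
    second_moment = sum(x * x * q for x, q in enumerate(pmf))
    return mean, second_moment - mean**2


# Problem 1a: two bowls, equal weights, P(red) = 1/5 and 4/5, five draws.
weights = [Fraction(1, 2), Fraction(1, 2)]
probs = [Fraction(1, 5), Fraction(4, 5)]
mean, var = mixture_mean_var(5, weights, probs)
print(mean, var)  # 5/2 61/20
```

The enumeration agrees with the values 5/2 and 61/20 obtained above from the weighted averages of the binomial moments.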

____________________________________________________________

Solution – Problem 1b
As above, let X be the number of red balls in the 5 selected balls. The probability P(X=3) must account for the two bowls. Thus it is obtained by mixing two binomial probabilities:

    \displaystyle P(X=3)=\frac{1}{2} \binom{5}{3} \biggl(\frac{1}{5}\biggr)^3 \biggl(\frac{4}{5}\biggr)^2+\frac{1}{2} \binom{5}{3} \biggl(\frac{4}{5}\biggr)^3 \biggl(\frac{1}{5}\biggr)^2

The following is the conditional probability P(\text{Bowl 1} \lvert X=3):

    \displaystyle \begin{aligned} P(\text{Bowl 1} \lvert X=3)&=\frac{\displaystyle \frac{1}{2} \binom{5}{3} \biggl(\frac{1}{5}\biggr)^3 \biggl(\frac{4}{5}\biggr)^2}{P(X=3)} \\&=\frac{16}{16+64} \\&=\frac{1}{5} \end{aligned}

Since the two conditional probabilities must sum to 1, \displaystyle P(\text{Bowl 2} \lvert X=3)=1-\frac{1}{5}=\frac{4}{5}.
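The Bayes calculation in Problem 1b can be sketched in a few lines of Python. This is only an illustration under the assumptions of the problem (equal prior weights, sampling with replacement); the helper name `posterior` is hypothetical.

```python
from fractions import Fraction
from math import comb


def posterior(n, x, priors, probs):
    """Posterior probability of each bowl given x red balls in n draws."""
    # Joint probability of (bowl i, X = x): prior times binomial likelihood.
    joint = [
        w * comb(n, x) * p**x * (1 - p) ** (n - x)
        for w, p in zip(priors, probs)
    ]
    total = sum(joint)  # this is P(X = x), the mixture probability
    return [j / total for j in joint]


priors = [Fraction(1, 2), Fraction(1, 2)]
probs = [Fraction(1, 5), Fraction(4, 5)]
post = posterior(5, 3, priors, probs)
print(post)  # [Fraction(1, 5), Fraction(4, 5)]
```

Note that the binomial coefficient and the prior weight 1/2 cancel in the ratio, which is why the hand calculation reduces to 16/(16+64).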

____________________________________________________________

Answers for Problem 2

Problem 2a
Let X be the number of red balls in the 5 balls chosen at random from the unknown bowl.

    E(X)=1.2
    Var(X)=1.56

Problem 2b

    \displaystyle P(\text{Bowl 1} \lvert X=4)=\frac{27}{4923}=0.0055

    \displaystyle P(\text{Bowl 2} \lvert X=4)=\frac{2304}{4923}=0.4680

    \displaystyle P(\text{Bowl 3} \lvert X=4)=\frac{2592}{4923}=0.5265
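Readers who work Problem 2 by hand may want a way to check their arithmetic. The sketch below is one possible check, written in Python with exact fractions; the variable names are chosen for illustration and the approach (weighted binomial moments for 2a, Bayes’ formula for 2b) mirrors the solutions of Problem 1a and Problem 1b.

```python
from fractions import Fraction
from math import comb

n = 5  # number of draws with replacement
priors = [Fraction(6, 10), Fraction(3, 10), Fraction(1, 10)]
red_probs = [Fraction(1, 10), Fraction(4, 10), Fraction(6, 10)]

# Problem 2a: weighted averages of the binomial first and second moments.
mean = sum(w * n * p for w, p in zip(priors, red_probs))
second_moment = sum(
    w * (n * p * (1 - p) + (n * p) ** 2) for w, p in zip(priors, red_probs)
)
variance = second_moment - mean**2

# Problem 2b: posterior probability of each bowl given X = 4.
joint = [
    w * comb(n, 4) * p**4 * (1 - p) ** (n - 4)
    for w, p in zip(priors, red_probs)
]
posterior = [j / sum(joint) for j in joint]

print(float(mean), float(variance))
print([round(float(q), 4) for q in posterior])
```

The printed values should agree with the answers listed above; the posterior fractions reduce to 3/547, 256/547 and 288/547, which equal 27/4923, 2304/4923 and 2592/4923.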


Mixing Binomial Distributions

Consider the following problems. We work Problem 1. Problem 2 is left as an exercise.

Problem 1
Suppose that the following is the probability function of the random variable X.

\displaystyle (1) \ \ \ \ \ P(X=x)=0.5 \binom{4}{x} \biggl[\frac{1}{5}\biggr]^x \biggl[\frac{4}{5}\biggr]^{4-x}+0.5 \binom{4}{x} \biggl[\frac{4}{5}\biggr]^x \biggl[\frac{1}{5}\biggr]^{4-x}

where x=0,1,2,3,4.

Evaluate the mean and variance of X.

_____________________________________________________________
Problem 2
Suppose that the following is the probability function of the random variable X.

\displaystyle (2) \ \ \ \ \ P(X=x)=0.6 \times f_1(x)+0.3 \times f_2(x)+0.1 \times f_3(x)

where f_1(x) is the probability function of the binomial distribution with n=5 trials and probability of success p=0.1, f_2(x) is the probability function of the binomial distribution with n=5 trials and probability of success p=0.4, and f_3(x) is the probability function of the binomial distribution with n=5 trials and probability of success p=0.6 .

Evaluate the mean and variance of X.

Answers for Problem 2 are found at the end of the post.
_____________________________________________________________
Discussion of Problem 1
The probability function (1) is the weighted average of two binomial probability functions. The weights are 0.5 and 0.5. Thus the probability distribution of X is said to be the mixture of two binomial distributions with equal mixing weights.

One interpretation of the probability function (1) is that the underlying phenomenon can be one of two phenomena. As an actuarial science example, suppose that a block of insurance policies is divided into two groups, roughly equal in size. One group is a low risk group. It has a low claim frequency (the probability of a policyholder in this group having a claim in a given year is 0.2=\frac{1}{5}). The other group is a high risk group. It has a high claim frequency (the probability of a policyholder having a claim is 0.8=\frac{4}{5}). Suppose you pick a policyholder at random from this block. What is the expected number of claims in a year from this randomly chosen insured? What is the variance of the number of claims?

Since the probability function (1) is a weighted average of binomial probability functions, the mean and the higher moments of X (such as the second moment E(X^2)) are the weighted averages of the corresponding binomial moments. However, as you will see below, the variance of the mixture is not the weighted average of the binomial variances.

Before we work the problem, we need some preliminary facts about binomial distributions. Suppose Y has a binomial distribution with parameters n and p. This fact is denoted by the notation Y \sim \text{binom}(n,p). Then the mean and variance of Y are E(Y)=n p and Var(Y)=n p (1-p), respectively. Since Var(Y)=E(Y^2)-E(Y)^2, it follows that:

\displaystyle \begin{aligned}(3) \ \ \ \ \ E(Y^2)&=Var(Y) + E(Y)^2 \\&=n p (1-p) + (n p)^2  \end{aligned}

Now the calculation:

\displaystyle (4) \ \ \ \ \ E(X)=0.5 \biggl(4 \cdot \frac{1}{5}\biggr) + 0.5 \biggl(4 \cdot \frac{4}{5}\biggr)=2

\displaystyle \begin{aligned}(5) \ \ \ \ \ E(X^2)&=0.5 \biggl[4 \cdot \frac{1}{5} \cdot \frac{4}{5} + \biggl(4 \cdot \frac{1}{5} \biggr)^2 \biggr]\\&+ \ \ \ \ \ 0.5 \biggl[4 \cdot \frac{4}{5} \cdot \frac{1}{5}+\biggl(4 \cdot \frac{4}{5} \biggr)^2 \biggr] \\&=\frac{152}{25}=6.08  \end{aligned}

The idea for (4) and (5) is that the mean and the second moment of X are the weighted averages of the means and second moments of the binomial distributions. The following is the variance of X:

\displaystyle \begin{aligned}(6) \ \ \ \ \ Var(X)&=\frac{152}{25}-2^2=\frac{52}{25}=2.08  \end{aligned}
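The steps (3) through (6) translate directly into a short computation. The following Python sketch (the function names are illustrative, not from the original post) applies formula (3) to each binomial component and then takes the weighted averages.

```python
from fractions import Fraction


def binomial_moments(n, p):
    """First and second moments of binom(n, p), using formula (3)."""
    mean = n * p
    second_moment = n * p * (1 - p) + mean**2
    return mean, second_moment


def mixture_moments(n, weights, probs):
    """Mean, second moment and variance of a mixture of binomials."""
    moments = [binomial_moments(n, p) for p in probs]
    mean = sum(w * m for w, (m, _) in zip(weights, moments))
    second_moment = sum(w * s for w, (_, s) in zip(weights, moments))
    return mean, second_moment, second_moment - mean**2


# Problem 1: n = 4, equal weights, p = 1/5 and p = 4/5.
mean, second_moment, var = mixture_moments(
    4, [Fraction(1, 2), Fraction(1, 2)], [Fraction(1, 5), Fraction(4, 5)]
)
print(mean, second_moment, var)  # 2 152/25 52/25
```

The same function can be applied to Problem 2 by passing n=5 with the three weights and the three success probabilities.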

We use the insurance example indicated earlier to interpret the unconditional variance in (6). The two binomial distributions in the probability function (1) are conditional distributions (e.g. conditional on which group of insureds the randomly chosen policyholder comes from, high risk or low risk). To put the result (6) into perspective, note that both of these binomial distributions have the same variance, i.e., 4 \cdot \frac{1}{5} \cdot \frac{4}{5}=0.64. Yet the unconditional variance Var(X)=2.08 is much higher than 0.64. The difference 2.08-0.64=1.44 is the additional variance due to the uncertainty in the risk parameter of the insured (the uncertainty about which group the randomly chosen policyholder comes from).

The two binomial distributions in the probability function (1) are conditional distributions indexed by a parameter variable that is implicit in (1). For example, when the randomly chosen policyholder is from the low risk group, the parameter is \theta=1 and the number of claims follows \text{binom}(4,\frac{1}{5}). When the randomly chosen policyholder is from the high risk group, the parameter is \theta=2 and the number of claims follows \text{binom}(4,\frac{4}{5}). The uncertainty in the risk parameter \theta has the effect of increasing the unconditional variance of the mixture.

The increase in variance is a key characteristic of mixture distributions. Whenever a probability distribution is the mixture of conditional distributions, the uncertainty in the parameter variable adds to the unconditional variance of the mixture, and the added amount is strictly positive whenever the conditional means differ. In the insurance example, the uncertainty of the risk characteristics of the insureds across the entire block is reflected in the higher unconditional variance (as demonstrated in Problem 1).
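One way to see where the extra variance comes from is the standard decomposition Var(X) = E[Var(X | \theta)] + Var(E[X | \theta]), which is not stated in the discussion above but is consistent with it. The Python sketch below (names are illustrative) computes the two pieces for Problem 1.

```python
from fractions import Fraction


def variance_decomposition(n, weights, probs):
    """Split the variance of a binomial mixture into two parts.

    Returns (expected conditional variance, variance of conditional mean).
    """
    cond_means = [n * p for p in probs]
    cond_vars = [n * p * (1 - p) for p in probs]
    expected_cond_var = sum(w * v for w, v in zip(weights, cond_vars))
    overall_mean = sum(w * m for w, m in zip(weights, cond_means))
    var_of_cond_mean = sum(
        w * (m - overall_mean) ** 2 for w, m in zip(weights, cond_means)
    )
    return expected_cond_var, var_of_cond_mean


within, between = variance_decomposition(
    4, [Fraction(1, 2), Fraction(1, 2)], [Fraction(1, 5), Fraction(4, 5)]
)
print(within, between, within + between)  # 16/25 36/25 52/25
```

Here 16/25=0.64 is the common conditional variance, 36/25=1.44 is the variance of the conditional means 0.8 and 3.2, and their sum 52/25=2.08 is the unconditional variance in (6).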

_____________________________________________________________
See the following blog posts for more detailed discussion of mixture distributions.

An example of a mixture

The variance of a mixture

_____________________________________________________________
Answers for Problem 2
E(X)=1.2
Var(X)=1.56