Calculating bivariate normal probabilities

This post extends the discussion of the bivariate normal distribution started in this post from a companion blog. Practice problems are given in the next post.

Suppose that the continuous random variables X and Y follow a bivariate normal distribution with parameters \mu_X, \sigma_X, \mu_Y, \sigma_Y and \rho. What to make of these five parameters? According to the previous post, we know that

  • \mu_X and \sigma_X are the mean and standard deviation of the marginal distribution of X,
  • \mu_Y and \sigma_Y are the mean and standard deviation of the marginal distribution of Y,
  • and finally \rho is the correlation coefficient of X and Y.

So the five parameters of a bivariate normal distribution are the means and standard deviations of the two marginal distributions, plus the correlation coefficient, which serves to connect X and Y. If \rho=0, then X and Y are simply two independent normal random variables.
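
To see how \rho connects X and Y, one option is to simulate pairs from the distribution. The sketch below uses the standard construction of a correlated pair from two independent standard normal draws. The function names and the parameter values (means 0, standard deviations 1 and 2, \rho=0.5) are illustrative assumptions, not values from this post.

```python
import math
import random

def sample_bivariate_normal(mu_x, sigma_x, mu_y, sigma_y, rho, n, seed=0):
    """Draw n (x, y) pairs from a bivariate normal distribution."""
    rng = random.Random(seed)
    pairs = []
    for _ in range(n):
        z1 = rng.gauss(0, 1)  # independent standard normal draws
        z2 = rng.gauss(0, 1)
        x = mu_x + sigma_x * z1
        # Mixing z1 into y is what produces correlation rho with x.
        y = mu_y + sigma_y * (rho * z1 + math.sqrt(1 - rho ** 2) * z2)
        pairs.append((x, y))
    return pairs

def sample_correlation(pairs):
    """Pearson correlation coefficient of a list of (x, y) pairs."""
    n = len(pairs)
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    sxx = sum((x - mean_x) ** 2 for x, _ in pairs)
    syy = sum((y - mean_y) ** 2 for _, y in pairs)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical parameters chosen only for illustration.
pairs = sample_bivariate_normal(0.0, 1.0, 0.0, 2.0, rho=0.5, n=20000)
print(round(sample_correlation(pairs), 2))
```

With a large sample, the printed sample correlation should land close to the \rho passed in. Setting rho=0 removes z1 from y, so the two coordinates are generated independently.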

When calculating probabilities involving a bivariate normal distribution, keep in mind that both marginal distributions are normal. Furthermore, the conditional distribution of one variable given a value of the other is also normal. Much more can be said about the conditional distributions.

The conditional distribution of Y given X=x is usually denoted by Y \lvert X=x or Y \lvert x. In addition to being a normal distribution, it has a mean that is a linear function of x and a variance that is constant (the variance is the same no matter what x is). The linear conditional mean and constant variance are given by the following:

    \displaystyle E[Y \lvert X=x]=\mu_Y+\rho \ \frac{\sigma_Y}{\sigma_X} \ (x-\mu_X)

    \displaystyle Var[Y \lvert X=x]=\sigma_Y^2 \ (1-\rho^2)

Similarly, the conditional distribution of X given Y=y is usually denoted by X \lvert Y=y or X \lvert y. In addition to being a normal distribution, it has a mean that is a linear function of y and a variance that is constant. The linear conditional mean and constant variance are given by the following:

    \displaystyle E[X \lvert Y=y]=\mu_X+\rho \ \frac{\sigma_X}{\sigma_Y} \ (y-\mu_Y)

    \displaystyle Var[X \lvert Y=y]=\sigma_X^2 \ (1-\rho^2)
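
Since the two pairs of formulas differ only in which variable plays which role, a single helper can cover both cases. The following is a minimal sketch. The function name `conditional_params` and the numeric values in the usage lines are hypothetical, chosen only for illustration.

```python
import math

def conditional_params(mu_a, sigma_a, mu_b, sigma_b, rho, b):
    """Mean and standard deviation of A given B = b for a bivariate normal pair."""
    mean = mu_a + rho * sigma_a / sigma_b * (b - mu_b)
    sd = sigma_a * math.sqrt(1 - rho ** 2)  # does not depend on b
    return mean, sd

# Hypothetical parameters: mu_X=0, sigma_X=1, mu_Y=0, sigma_Y=2, rho=0.5.
mean_y_given_x, sd_y_given_x = conditional_params(0, 2, 0, 1, 0.5, b=1)  # Y | X=1
mean_x_given_y, sd_x_given_y = conditional_params(0, 1, 0, 2, 0.5, b=2)  # X | Y=2
print(mean_y_given_x, sd_y_given_x)
print(mean_x_given_y, sd_x_given_y)
```

To condition the other way round, swap the arguments: the variable whose distribution is wanted goes first, and the variable with the observed value goes second.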

The formulas for the conditional distribution of Y given X=x are identical to those for the conditional distribution of X given Y=y, except that the roles of X and Y are switched. An example is helpful.

Example 1
Suppose that the continuous random variables X and Y follow a bivariate normal distribution with parameters \mu_X=10, \sigma_X=10, \mu_Y=20, \sigma_Y=5 and \rho=0.6. The first two parameters are the mean and standard deviation of the marginal distribution of X. The next two parameters are the mean and standard deviation of the marginal distribution of Y. The parameter \rho is the correlation coefficient of X and Y. Both marginal distributions are normal.

Let’s focus on the conditional distribution of Y given X=x. It is normally distributed. Its mean and variance are:

    \displaystyle \begin{aligned} E[Y \lvert X=x]&=\mu_Y+\rho \ \frac{\sigma_Y}{\sigma_X} \ (x-\mu_X) \\&=20+0.6 \ \frac{5}{10} \ (x-10) \\&=20+0.3 \ (x-10) \\&=17+0.3 \ x  \end{aligned}

    \displaystyle \sigma_{Y \lvert x}^2=Var[Y \lvert X=x]=\sigma_Y^2 (1-\rho^2)=25 \ (1-0.6^2)=16

    \displaystyle \sigma_{Y \lvert x}=4

The line y=17+0.3 \ x is also called the least squares regression line. It gives the mean of the conditional distribution of Y given x. Because X and Y are positively correlated, the line has positive slope: the larger the value of x, the larger the conditional mean of Y. The standard deviation of Y given x is constant across all possible x values.
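
The slope, intercept and conditional standard deviation derived above can be reproduced numerically. This is a small sketch using the parameters of Example 1. The variable names and the x values in the loop are illustrative choices.

```python
import math

# Parameters of Example 1.
mu_x, sigma_x, mu_y, sigma_y, rho = 10, 10, 20, 5, 0.6

slope = rho * sigma_y / sigma_x                # 0.6 * 5 / 10
intercept = mu_y - slope * mu_x                # mean of Y when x = 0
cond_sd = sigma_y * math.sqrt(1 - rho ** 2)    # same for every x

# The conditional mean moves along the line; the standard deviation does not.
for x in (0, 10, 25):
    print(x, round(intercept + slope * x, 4), round(cond_sd, 4))
```

The rows should agree with the line y=17+0.3 \ x and with the conditional standard deviation of 4 found above.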

With the mean and standard deviation known, we can now compute normal probabilities. Suppose the realized value of X is 25. Then the mean of Y \lvert 25 is E[Y \lvert 25]=17+0.3 \ (25)=24.5. The standard deviation, as indicated above, is 4. In fact, for any other x, the standard deviation of Y \lvert x is also 4. Now calculate the probability P[20<Y<30 \lvert X=25]. We first calculate it using a normal table found here.

    \displaystyle \begin{aligned} P[20<Y<30 \lvert X=25]&=P\bigg[\frac{20-24.5}{4}<Z<\frac{30-24.5}{4} \biggr] \\&=P[-1.13<Z<1.38] \\&=0.9162-(1-0.8708) \\&=0.7870  \end{aligned}
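
The table lookup can be cross-checked in software. The sketch below uses `NormalDist` from Python's standard `statistics` module (Python 3.8 or later) as a stand-in for the table. It computes the probability once from the conditional distribution directly and once from the z-scores rounded to two decimals, as in the table calculation. The variable names are illustrative.

```python
from statistics import NormalDist

# Conditional distribution of Y given X = 25: mean 24.5, standard deviation 4.
cond = NormalDist(mu=24.5, sigma=4)
p_exact = cond.cdf(30) - cond.cdf(20)

# Same probability from z-scores rounded to two decimals, as a table forces.
z = NormalDist()  # standard normal
p_table = z.cdf(1.38) - z.cdf(-1.13)

print(round(p_exact, 4), round(p_table, 4))
```

The two results differ in the third decimal place only because the unrounded z-scores are -1.125 and 1.375.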

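Using a TI84+ calculator, P[20<Y<30 \lvert X=25]=0.7851396569. The table answer differs slightly because the z-scores -1.125 and 1.375 were rounded to two decimal places before the lookup. In contrast, the probability P[20<Y<30] is (using the table found here):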

    \displaystyle \begin{aligned} P[20<Y<30]&=P\bigg[\frac{20-20}{5}<Z<\frac{30-20}{5} \biggr] \\&=P[0<Z<2] \\&=0.9772-0.5 \\&=0.4772  \end{aligned}

Using a TI84+ calculator, P[20<Y<30]=0.4772499375. Note that P[20<Y<30] is for the marginal distribution of Y. It is not conditioned on any realized value of X.
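
To see how much conditioning matters, the marginal probability can be compared with the conditional probability at several values of x. This is a minimal sketch using the Example 1 parameters and `NormalDist` from the standard library. The helper name `prob_y_between` and the x values other than 25 are illustrative assumptions.

```python
from statistics import NormalDist

# Parameters of Example 1.
mu_x, sigma_x, mu_y, sigma_y, rho = 10, 10, 20, 5, 0.6

def prob_y_between(lo, hi, x=None):
    """P[lo < Y < hi], conditioned on X = x when x is given."""
    if x is None:
        dist = NormalDist(mu_y, sigma_y)  # marginal distribution of Y
    else:
        mean = mu_y + rho * sigma_y / sigma_x * (x - mu_x)
        sd = sigma_y * (1 - rho ** 2) ** 0.5
        dist = NormalDist(mean, sd)       # distribution of Y given X = x
    return dist.cdf(hi) - dist.cdf(lo)

print("marginal:", round(prob_y_between(20, 30), 4))
for x in (10, 25, 40):
    print("given X =", x, ":", round(prob_y_between(20, 30, x), 4))
```

The marginal line uses the standard deviation 5, while every conditional line uses 4 with a mean that shifts with x, which is why the probabilities differ.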

\copyright 2018 – Dan Ma
