Conditional Probability and Independent Events
In this post we learn how to calculate conditional probabilities for both discrete and continuous random variables. Furthermore, we discuss independent events.
Conditional probability is the probability that one event occurs given that another event has occurred. Closely related to conditional probability is the notion of independence. Events are independent if the occurrence of one does not affect the probability of the other.
Marginal Probability and Joint Probability
Before diving into conditional probability, I’d like to briefly define marginal probability and joint probability.
A joint probability is simply the probability of two or more events occurring together at the same time. It does not account for dependencies between events, such as X only being able to happen once Y has happened. A joint probability is usually denoted as the intersection of X and Y, or simply as the probability of X and Y.
P(X \cap Y) \quad \text{or simply} \quad P(X, Y)
For discrete random variables, the joint probability mass function summed over all discrete values of x and y must equal 1.
\sum_i\sum_jP(X=x_i,Y=y_j) = 1
In the case of continuous random variables, the joint density integrated over x and y must equal 1.
\int_x\int_y p(x,y)dxdy = 1
If we have a probability distribution over several random variables, such as X and Y, we can calculate the probability distribution over just the subset X irrespective of the outcome of Y. This is known as the marginal probability. In the discrete case, if we want to obtain the marginal probability of X taking on a specific value x_i, we sum the joint probability P(X = x_i, Y = y_j) over all possible values y_j.
\forall x_i: \; P(X=x_i) = \sum_j P(X=x_i,Y=y_j)
In the case of continuous random variables, we integrate over all values of y instead of summing over all possible discrete values of y.
p(x) = \int p(x,y)dy
Note that lowercase y denotes the realized values of the random variable Y.
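To make this concrete, here is a minimal Python sketch (the joint PMF values are made up for illustration) that checks the normalization condition and computes the marginals by summing out one variable:

```python
import numpy as np

# A hypothetical joint PMF over X (rows) and Y (columns).
# Entry [i, j] is P(X = x_i, Y = y_j).
joint = np.array([
    [0.10, 0.20, 0.10],
    [0.05, 0.25, 0.30],
])

# The joint PMF must sum to 1 over all pairs (x_i, y_j).
assert np.isclose(joint.sum(), 1.0)

# Marginalize out Y: sum each row over all values y_j.
p_x = joint.sum(axis=1)
print(p_x)  # [0.4 0.6]

# Summing out X instead gives the marginal of Y.
p_y = joint.sum(axis=0)
print(p_y)  # [0.15 0.45 0.4]
```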
For a more detailed introduction with an example, check out this video from Khan Academy.
Independent Events
When two events do not affect each other, their joint probability can be expressed as the simple product of their individual probabilities.
P(X=x, Y=y) = P(X=x)P(Y=y)
For example, let’s say the probability that my car’s engine stops working this month is 0.1, and the probability that I catch the flu this month is 0.15. These events are most likely independent of each other (let’s ignore edge cases, such as the car breaking down and preventing me from visiting someone who currently has the flu).
Accordingly, the probability of both events happening this month is 0.015.
P(X=x)P(Y=y) = 0.1 \times 0.15 = 0.015
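As a quick sanity check, here is a short Python sketch (the variable names are my own) that computes the product and confirms it with a simulation:

```python
import random

# Probabilities of two (assumed independent) events from the example above.
p_engine_breaks = 0.10  # engine stops working this month
p_catch_flu = 0.15      # I catch the flu this month

# Under independence, the joint probability is just the product.
print(p_engine_breaks * p_catch_flu)  # 0.015 (up to floating-point rounding)

# Monte Carlo check: sample both events independently and count co-occurrences.
trials = 1_000_000
hits = sum(
    (random.random() < p_engine_breaks) and (random.random() < p_catch_flu)
    for _ in range(trials)
)
print(hits / trials)  # roughly 0.015
```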
Conditional Probability for Discrete Variables
If you throw a standard six-sided die, the probability of getting the number 2 is 1/6. Let’s model this event as the probability that the random variable X assumes the concrete value x = 2.
P(X=x) = \frac{1}{6}, \quad \text{where } x = 2
The probability that the number is even is 1/2 because half of the die’s numbers are even. Let’s denote the event that the result is even as the probability that the random variable Y assumes an even value y.
P(Y=y) = \frac{1}{2}, \quad \text{where } y \in \{2, 4, 6\}
But what would be the probability of obtaining the number 2 if you knew in advance that the die would definitely return an even number? In this case, we would have to model the probability of x given that y has occurred. We can express this as follows.
P(X=x|Y=y)
Now we have essentially reduced the range of possible outcomes from six to three. This means the chance of getting a 2 has increased from one in six to one in three.
This was a simple example. But how do you calculate the conditional probability in more complicated cases? The rule of conditional probability says that the probability of x occurring on the condition that y has occurred equals the probability that x and y both occur divided by the probability that y occurs.
P(X=x|Y=y) = \frac{P(X=x, Y=y)}{P(Y=y)}
Let’s stick with our die to make this more concrete. If event x (you throw a 2) occurs, event y (the number is even) is already implied because 2 is an even number. So the probability of x and y occurring together is the same as the probability of x occurring.
P(X=x, Y=y) = \frac{1}{6}
Dividing this by the probability of y occurring results in 1/3.
P(X=x|Y=y) = \frac{P(X=x, Y=y)}{P(Y=y)} = \frac{1/6}{1/2} = \frac{1}{3}
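We can verify this by brute-force enumeration. Here is a small Python sketch (the helper names are my own) that counts outcomes of a fair die:

```python
from fractions import Fraction

outcomes = [1, 2, 3, 4, 5, 6]  # a fair six-sided die

def prob(event):
    # Probability of an event (a predicate over outcomes) under a uniform die.
    return Fraction(sum(event(n) for n in outcomes), len(outcomes))

is_two = lambda n: n == 2        # event x: we throw a 2
is_even = lambda n: n % 2 == 0   # event y: the number is even

# P(X = 2 | Y even) = P(X = 2, Y even) / P(Y even)
p_joint = prob(lambda n: is_two(n) and is_even(n))  # 1/6
p_even = prob(is_even)                              # 1/2
print(p_joint / p_even)                             # 1/3
```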
The Chain Rule for Conditional Probabilities
If we rearrange the rule of conditional probability and replace X=x and Y=y with A and B for a more compact notation, we get the following.
P(A, B) = P(A|B)P(B)
The probability of A and B occurring together is the probability that A occurs given B, multiplied by the probability that B occurs. We can also write this using the intersection operator.
P(A \cap B) = P(A|B)P(B)
What do you do if you want to calculate the probability that A, B, and C occur? You simply take the same reasoning one step further.
The probability of A, B, and C occurring is the product of the probability that A occurs given B and C, the probability that B occurs given C, and the probability that C occurs.
P(A, B, C) = P(A|B,C)P(B|C)P(C)
This principle is known as the chain rule, and it can be extended to link an arbitrary number of conditional events.
P(X_1,\ldots,X_n) = P(X_1)\prod_{i=2}^{n} P(X_i|X_1,\ldots,X_{i-1})
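To see the chain rule in action, here is a minimal Python sketch (using a randomly generated toy distribution of my own making) that factorizes a joint PMF over three binary variables and confirms that the product of conditionals recovers the joint probability:

```python
import itertools
import random

# Build a hypothetical joint PMF over three binary variables X1, X2, X3.
random.seed(0)
weights = {xs: random.random() for xs in itertools.product([0, 1], repeat=3)}
total = sum(weights.values())
joint = {xs: w / total for xs, w in weights.items()}  # normalized to sum to 1

def marginal(assignment):
    # P(X1 = a1, ..., Xk = ak): sum the joint over the remaining variables.
    k = len(assignment)
    return sum(p for xs, p in joint.items() if xs[:k] == assignment)

# Chain rule: P(x1, x2, x3) = P(x1) * P(x2 | x1) * P(x3 | x1, x2),
# where each conditional is a ratio of two marginals.
x = (1, 0, 1)
chain = (
    marginal(x[:1])
    * marginal(x[:2]) / marginal(x[:1])
    * marginal(x[:3]) / marginal(x[:2])
)
assert abs(chain - joint[x]) < 1e-12
print(chain, joint[x])  # both print the same probability
```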
Conditional Probability for Continuous Variables
When dealing with continuous random variables, it doesn’t make sense to determine the probability that X and Y resolve to specific outcomes. Instead, we can calculate the probability that X and Y fall into certain intervals. In this case, we say X needs to fall into an interval below the concrete value x, given that Y falls into the interval between y and y + epsilon, where epsilon is a very small term. Think of it as the differential we use in calculus.
P(0 \leq X \leq x \mid Y \in [y, y + \epsilon])
The logic is still the same as for discrete random variables. The conditional probability of X given Y equals the joint probability of X and Y divided by the probability of Y.
P(0 \leq X \leq x \mid Y \in [y, y + \epsilon]) = \frac{P(0 \leq X \leq x, \; Y \in [y, y + \epsilon])}{P(Y \in [y, y + \epsilon])}
But instead of summing over discrete values, we now have to integrate over the respective intervals.
P(0 \leq X \leq x \mid Y \in [y, y + \epsilon]) = \frac{\int_0^x \int_y^{y+\epsilon} f(u, v)\, dv\, du}{\int_y^{y+\epsilon} f(v)\, dv}
Gosh, this looks complicated. The good news is: you won’t have to calculate this by hand.
In this case, the whole expression can be simplified. As epsilon becomes infinitesimally small, we can treat Y as if it were a concrete real value rather than an interval. This way, we save ourselves the hassle of integrating over y.
P(0 \leq X \leq x \mid Y = y) = \frac{\int_0^x f(u, y)\, du}{f(y)}
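That said, if you ever do want to evaluate such an expression numerically, a few lines of Python suffice. Here is a sketch using a simple, hypothetical joint density f(x, y) = x + y on the unit square (my own choice for illustration; it integrates to 1, so it is a valid density):

```python
from scipy.integrate import quad

def f(x, y):
    # Hypothetical joint density on [0, 1] x [0, 1]; integrates to 1.
    return x + y

def f_marginal_y(y):
    # Marginal density of Y: integrate the joint density over x.
    val, _ = quad(lambda u: f(u, y), 0.0, 1.0)
    return val

def cond_cdf(x, y):
    # P(0 <= X <= x | Y = y) = (integral of f(u, y) from 0 to x) / f(y)
    num, _ = quad(lambda u: f(u, y), 0.0, x)
    return num / f_marginal_y(y)

# For this density the closed form is (x**2 / 2 + x * y) / (0.5 + y).
x, y = 0.5, 0.25
print(cond_cdf(x, y))                  # ~0.3333
print((x**2 / 2 + x * y) / (0.5 + y))  # 0.3333...
```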
I won’t discuss this in more detail, because you are probably not going to have to calculate this by hand. But if you are interested in an example, I recommend checking out the following video.
Summary
We’ve learned how to calculate the probability of two events happening independently and the probability of one event happening conditional on another. We explored conditional probabilities for both discrete and continuous random variables.
This post is part of a series on statistics for machine learning and data science. To read other posts in this series, go to the index.