Discrete probabilitySample spacesProbability distributionsEventsBasic facts about probabilities of eventsTheoremConditional probabilityIndependence of two eventsPairwise and mutual independenceBiased coins and Bernoulli trialsThe binomial distributionTheoremProofDefinitionPairwise and mutual independenceBayes' theoremRandom variablesBiased coins and the geometric distributionExpectation of a random variableBetter expression for the expectationTheoremProofExpected number of successes in Bernoulli trialsTheoremLinearity of expectationTheoremVarianceDefinition

Discrete probability

Sample spaces

For any probabilistic experiment or process, the set of all its possible outcomes is called its sample space.

E.g. Consider the following probabilistic (random) experiment:

Flip a fair coin 7 times in a row, and see what happens

The possible outcomes of the example above, are all the sequences of Heads () and Tails (), of length 7. In other words, they are the set of strings .

The cardinality of this sample space is , because there are two choices for each of the seven characters of the string.

Sample spaces may be infinite.

E.g. Flip a fair coin repeatedly until a Heads is flipped.

However, in discrete probability, we focus on finite and countable sample spaces.

Probability distributions

A probability distribution over a finite or countable set , is a function:

such that .

In other words, each outcome is assigned a probability such that and the sum of the probabilities of all outcomes sum to 1.

E.g. Suppose a fair coin is tossed times consecutively. This random experiment defines a probability distribution on , where for all , .

Since , we have

E.g. suppose a fair coin is tossed repeatedly until it lands heads. This random experiment defines a probability distribution , on , such that, ,

Note that


For a countable sample space , and event E, is simply a subset of the set of possible outcomes.

Given a probability distribution , we define the probability of the event to be .

E.g. For , the following are events:


E.g. For , the following are events:

Basic facts about probabilities of events

For event , define the complement event to be .


Suppose are a (finite or countable) sequence of pairwise disjoint events from the sample space . In other words, and . Then

Furthermore, for each event , .

Proof: Follows easily from definitions:

, , thus, since the sets are disjoint, .

Likewise, since ,

Conditional probability

Definition: Let be a probability distribution, and let be two events, such that .

The conditional probability of given , denoted , is defined by:

E.g. A fair coin is flipped three times. Suppose we know that the event occurs.

What is the probability that the event occurs?

For the example above, there are 8 flip sequences, , all with probability . The event that is . The event . So,

Independence of two events

Intuitively, two events are independent if knowing whether one occurred does not alter the probability of the other. Formally:

Definition: Events and are called independent if .

Note that if then and are independent if and only if

Thus, the probability of is not altered by knowing occurs.

E.g. A fair coin is flipped three times. Are the events and , independent?

Yes, because , , and , so .

Pairwise and mutual independence

Definition: Events are called pairwise independent if for every pair and are independent. (i.e. )

Events are called mutually independent, if for every subset .

Clearly, mutual independence implies pairwise independence. However, pairwise independence does not imply mutual independence.

Biased coins and Bernoulli trials

A Bernoulli trial is a probabilistic experiment that has two outcomes: success or failure (e.g. heads or tails).

We suppose that is the probability of success, and is the probability of failure.

We can of course have repeated Bernoulli trials. We typically assume the different trials are mutually independent.

E.g. A biased coin, which comes up heads with probability , is flipped times consecutively. What is the probability that it comes up heads exactly times?

The binomial distribution


The probability of exactly successes in mutually independent Bernoulli trials, with probability of success and of failure in each trial, is:


We can associate Bernoulli trials with outcomes . Each sequence ) with exactly heads and tails occurs with probability . There are such sequences with exactly heads.


The binomial distribution with parameters and , denoted , defines a probability distribution on , given by:

Pairwise and mutual independence

A finite set of events is pairwise independent if every pair of events is independent - that is, if and only if for all distinct pairs of indices :

A finite set of events is mutually independent if every event is independent of any intersection of the other events - that is, if and only if for every -element subset of :

Bayes' theorem

Let and be events such that and .

Random variables

Definition: A random variable, is a function , that assigns a real value to each outcome in a sample space .

E.g. Suppose a biased coin is flipped times. The sample space is . The function that assigns to each outcome the number of coin tosses that came up heads, is one random variable.

For a random variable we write as shorthand for the probability ). The distribution of a random variable is given by the set of pairs .

Note: These definitions of a random variable and its distribution are only adequate in the context of discrete probability distributions.

Biased coins and the geometric distribution

Suppose a biased coin comes up heads with probability each time it is tossed. Suppose we repeatedly flip this coin until it comes up heads.

What is the probability that we flip the coin times, for ?

Answer: The sample space is . Assuming mutual independence of coin flips, the probability of is . Note that this does define a probability distribution on because:

A random variable , is said to have a geometric distribution with parameter , if for all positive integers .

Expectation of a random variable

Recall: A random variable is a function , that assigns a real value to each outcome in a sample space .

The expected value or expectation or mean, of a random variable , denoted by , is defined by:

Here is the underlying probability distribution on .

Question: Let be the random variable outputting the number that comes up when a fair die is rolled. What is the expected value, , of ?

However, sometimes this way can be inefficient for calculating . Take the sample space of 11 coin flips, for example: . Our summation will have terms!

Better expression for the expectation

Recall denotes the probability .

Recall that for a function ,


For a random variable ,


, but for each , if we sum all terms such that , we get as their sum. So, summing over all we get .

So, if is small, and if we can compute , then we need to sum a lot fewer terms to calculate .

Expected number of successes in Bernoulli trials


The expected number of successes in (independent) Bernoulli trials, with probability of success in each, is .

Linearity of expectation


For any random variables on ,

Furthermore, for any ,

In other words, the expectation function is a linear function.


The variance and standard deviation of a random variable give us ways to measure (roughly) on average, how far off the value of the random variable is from its expectation.


For a random variable on a sample space , the variance of , denoted by , is defined by:

The standard deviation of , denoted by , is defined by

E.g. Consider the random variable , s.t. , and the random variable , s.t. .

Then , but , whereas and .


For any random variable ,