
Discrete probability

Definitions

For some statistical experiment being performed:

- An *outcome* is a single possible result of the experiment.
- The *sample space* $\Omega$ is the set of all possible outcomes.
- An *event* is a subset of the sample space.

Elementary principles

Events

Let $A$ and $B$ be events from some sample space $\Omega$.

- The union $A \cup B$ is the event that $A$ or $B$ (or both) occurs.
- The intersection $A \cap B$ is the event that both $A$ and $B$ occur.
- The complement $\overline{A}$ is the event that $A$ does not occur.

De Morgan's Law

For events $A$ and $B$:

$$\overline{A \cup B} = \overline{A} \cap \overline{B} \qquad \overline{A \cap B} = \overline{A} \cup \overline{B}$$

Axioms

For each event $A \subseteq \Omega$, we assign a probability $P(A)$ satisfying:

- $0 \le P(A) \le 1$
- $P(\Omega) = 1$
- For pairwise disjoint events $A_1, A_2, \dots$: $P\left(\bigcup_i A_i\right) = \sum_i P(A_i)$

Inclusion-exclusion principle

For a finite sequence of arbitrary events $A_1, A_2, \dots, A_n$ where $n \ge 2$:

$$P\left(\bigcup_{i=1}^{n} A_i\right) = \sum_{i} P(A_i) - \sum_{i < j} P(A_i \cap A_j) + \sum_{i < j < k} P(A_i \cap A_j \cap A_k) - \dots + (-1)^{n+1} P(A_1 \cap \dots \cap A_n)$$

Example

Random variables

A random variable is a function that maps each outcome of the sample space to some numerical value.

Given a sample space $\Omega$, a random variable $X$ with values in some set $E$ is a function:

$$X : \Omega \to E$$

Where $E$ is typically $\mathbb{N}$ or $\mathbb{Z}$ in discrete probability and $\mathbb{R}$ in continuous probability.

Discrete random variables

A discrete random variable takes values in a countable set, such as $\mathbb{N}$ or $\mathbb{Z}$.

Continuous random variables

A continuous random variable takes values in an uncountable set, typically $\mathbb{R}$ or an interval of it.

Notation

Random variables often make it easier to ask questions such as:

How likely is it that the value of $X$ is equal to $x$?

This is the same as the probability of the event $\{\omega \in \Omega : X(\omega) = x\}$, which is often denoted as $P(X = x)$ and read "the probability of the random variable $X$ taking on the value $x$".

Example

Let our statistical experiment be the toss of a fair coin. We will perform this experiment $n$ times, giving us the sample space:

$$\Omega = \{H, T\}^n$$

Let $X$ be the random variable denoting the number of heads after $n$ coin flips.

Note that for the collection of probabilities $\{P(X = k) : 0 \le k \le n\}$, we have:

$$P(X = k) \ge 0 \quad \text{and} \quad \sum_{k=0}^{n} P(X = k) = 1$$

As we will see later, this represents a probability distribution, and these are properties that all probability distributions must have.
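The coin-flip distribution above can be checked by brute-force enumeration. A minimal sketch in Python, assuming $n = 3$ flips (an illustrative choice, not fixed by the notes):

```python
from itertools import product

# All 2**n equally likely outcomes of n tosses of a fair coin
# (n = 3 is an illustrative choice).
n = 3
outcomes = list(product("HT", repeat=n))

# The random variable X maps each outcome to its number of heads.
def X(outcome):
    return outcome.count("H")

# P(X = k) = (number of outcomes with exactly k heads) / 2**n
pmf = {k: sum(1 for o in outcomes if X(o) == k) / 2**n for k in range(n + 1)}

print(pmf)                # {0: 0.125, 1: 0.375, 2: 0.375, 3: 0.125}
print(sum(pmf.values()))  # 1.0 — the probabilities sum to one
```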

Stirling's approximation

Stirling's approximation is an approximation for the factorial operation. It is an accurate estimation, even for smaller values of $n$.

The approximation is:

$$n! \sim \sqrt{2\pi n}\left(\frac{n}{e}\right)^n$$

Where the $\sim$ sign means that the two quantities are asymptotic. This means that their ratio tends to $1$ as $n$ tends to $\infty$.

Alternatively, there is a version of Stirling's formula with bounds valid for all positive integers $n$, rather than asymptotics:

$$\sqrt{2\pi}\, n^{n + \frac{1}{2}} e^{-n} \le n! \le e\, n^{n + \frac{1}{2}} e^{-n}$$
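As a quick numerical check (a sketch, not part of the original notes), the ratio of $n!$ to the approximation tends to $1$ as $n$ grows:

```python
import math

def stirling(n):
    # sqrt(2*pi*n) * (n/e)**n
    return math.sqrt(2 * math.pi * n) * (n / math.e) ** n

# The ratio n! / stirling(n) decreases toward 1 as n grows.
for n in (1, 5, 10, 20):
    print(n, math.factorial(n) / stirling(n))
```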

Distributions

A probability distribution is a mathematical function that maps each outcome of a statistical experiment to its probability of occurrence.

Probability mass function

A probability mass function is a function that gives the probability that a discrete random variable is exactly equal to some value. It defines a discrete probability distribution.

Suppose that $X$ is a discrete random variable. Then the probability mass function $p_X$ for $X$ is defined as:

$$p_X(x) = P(X = x)$$

Example

This is the probability mass function of a discrete probability distribution.

In this case, we have a random variable $X$ and a probability mass function $p_X$.

Consider the following probabilities as examples:

Conditions

For any probability distribution (with some random variable $X$), its probability mass function $p_X$ must satisfy both of the following conditions:

- $p_X(x) \ge 0$ for all $x$
- $\sum_{x} p_X(x) = 1$

Cumulative distribution function

The cumulative distribution function $F_X$ of a random variable $X$ evaluated at $x$ is the probability that $X$ will take a value less than or equal to $x$.

If $X$ is a discrete random variable that maps to values $x_1, x_2, \dots$, then the cumulative distribution function is defined as:

$$F_X(x) = P(X \le x) = \sum_{x_i \le x} P(X = x_i)$$

Complementary cumulative distribution function

Sometimes, it is useful to study the opposite question — how often the random variable is above a particular value. This is called the complementary cumulative distribution function or simply the tail distribution, and is denoted $\overline{F}_X(x)$, and is defined as:

$$\overline{F}_X(x) = P(X > x) = 1 - F_X(x)$$
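Both functions can be computed directly from a probability mass function. A small sketch, assuming a fair six-sided die as the distribution (an illustrative choice, not from the notes):

```python
from fractions import Fraction

# PMF of a fair six-sided die: each face has probability 1/6.
pmf = {face: Fraction(1, 6) for face in range(1, 7)}

def cdf(x):
    # F(x) = P(X <= x): sum the PMF over all values not exceeding x.
    return sum(p for value, p in pmf.items() if value <= x)

def ccdf(x):
    # Tail distribution: P(X > x) = 1 - F(x)
    return 1 - cdf(x)

print(cdf(3), ccdf(3))  # 1/2 1/2
```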

Uniform distribution

A random variable is uniformly distributed if every possible outcome is equally likely to be observed. In other words, for some statistical experiment, suppose there are $n$ different outcomes. Then the probability of each outcome is $\frac{1}{n}$.

Therefore, the probability mass function for a uniformly distributed discrete random variable $X$ for $n$ possible outcomes would be:

$$P(X = x) = \frac{1}{n}$$

| Parameter | Meaning |
| --- | --- |
| $n$ | Number of possible outcomes |

Binomial distribution

The binomial distribution with parameters $n$ and $p$ is the discrete probability distribution of the number of successes ($X = k$) in a sequence of $n$ Bernoulli trials.

The probability mass function for a binomially distributed discrete random variable $X$ for $n$ Bernoulli trials (each with probability of success $p$) would be:

$$P(X = k) = \binom{n}{k} p^k (1 - p)^{n - k}$$

| Parameter | Meaning |
| --- | --- |
| $n \in \mathbb{N}$ | Number of trials |
| $p \in [0, 1]$ | Probability of success in each trial |

| Quantity (or function) | Formula |
| --- | --- |
| Mean (expected value) | $np$ |
| Variance | $np(1 - p)$ |
| Moment-generating function | $(1 - p + pe^t)^n$ |
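The PMF, mean, and variance above can be verified numerically. A sketch with illustrative parameters $n = 10$, $p = 0.3$ (chosen for the example, not from the notes):

```python
from math import comb

def binom_pmf(k, n, p):
    # P(X = k) = C(n, k) * p^k * (1 - p)^(n - k)
    return comb(n, k) * p**k * (1 - p) ** (n - k)

n, p = 10, 0.3  # illustrative parameters
pmf = [binom_pmf(k, n, p) for k in range(n + 1)]

mean = sum(k * pk for k, pk in enumerate(pmf))
variance = sum((k - mean) ** 2 * pk for k, pk in enumerate(pmf))
print(mean, variance)  # matches np = 3.0 and np(1 - p) = 2.1
```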

Poisson distribution

The Poisson distribution is a discrete probability distribution that expresses the probability of a given number of events occurring in a fixed interval of time or space if these events occur with a known constant rate and independently of the time since the last event.

The probability mass function for a Poisson distributed discrete random variable $X$ with some constant rate $\lambda$ would be:

$$P(X = k) = \frac{\lambda^k e^{-\lambda}}{k!}$$

| Parameter | Meaning |
| --- | --- |
| $\lambda > 0$ | Rate |

| Quantity (or function) | Formula |
| --- | --- |
| Mean (expected value) | $\lambda$ |
| Variance | $\lambda$ |
| Moment-generating function | $e^{\lambda(e^t - 1)}$ |
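The Poisson PMF can also be seen as the limit of $\text{Binomial}(n, \lambda/n)$ for large $n$. A sketch with an illustrative rate $\lambda = 2$ (an assumed value for the demonstration):

```python
from math import comb, exp, factorial

def poisson_pmf(k, lam):
    # P(X = k) = lambda^k * e^(-lambda) / k!
    return lam**k * exp(-lam) / factorial(k)

# Binomial(n, lam/n) approaches Poisson(lam) as n grows large.
lam, n = 2.0, 10_000
for k in range(5):
    binom = comb(n, k) * (lam / n) ** k * (1 - lam / n) ** (n - k)
    print(k, round(poisson_pmf(k, lam), 6), round(binom, 6))
```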

Negative binomial distribution

The negative binomial distribution is a discrete probability distribution of the number of trials in a sequence of independent and identically distributed Bernoulli trials before a specified number of successes occurs.

The probability mass function for a negative binomially distributed discrete random variable $X$ with $n$ trials given $r$ successes, would be:

$$P(X = n) = \binom{n - 1}{r - 1} p^r (1 - p)^{n - r}$$

| Parameter | Meaning |
| --- | --- |
| $r \in \mathbb{N}$ (but can be extended to $\mathbb{R}^+$) | Number of successes until the experiment is stopped |
| $p \in [0, 1]$ | Success probability in each experiment |

| Quantity (or function) | Formula |
| --- | --- |
| Mean (expected value) | $\frac{r}{p}$ |
| Variance | $\frac{r(1 - p)}{p^2}$ |
| Moment-generating function | $\left(\frac{pe^t}{1 - (1 - p)e^t}\right)^r$ |

Different forms of the distribution

| $X$ counts | PMF formula | Support |
| --- | --- | --- |
| $n$ trials, given $r$ successes | $\binom{n - 1}{r - 1} p^r (1 - p)^{n - r}$ | $n \in \{r, r + 1, r + 2, \dots\}$ |
| $k$ failures, given $r$ successes | $\binom{k + r - 1}{k} p^r (1 - p)^k$ | $k \in \{0, 1, 2, \dots\}$ |
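The trials-counting form can be checked against the mean $r/p$. A sketch with illustrative parameters $r = 3$, $p = 0.5$ (assumed for the example):

```python
from math import comb

def negbinom_trials_pmf(n, r, p):
    # P(X = n): the r-th success lands on trial n, so the first
    # n - 1 trials contain exactly r - 1 successes.
    return comb(n - 1, r - 1) * p**r * (1 - p) ** (n - r)

r, p = 3, 0.5  # illustrative parameters
# Truncated sum over the support n = r, r + 1, ...; the tail beyond
# n = 200 is negligible for these parameters.
mean = sum(n * negbinom_trials_pmf(n, r, p) for n in range(r, 201))
print(mean)  # close to r / p = 6.0
```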

Geometric distribution

The geometric distribution is a special case of the negative binomial distribution, with the parameter $r = 1$.

The geometric distribution gives the probability that the first occurrence of success requires $k$ independent Bernoulli trials, each with success probability $p$.

The probability mass function for a geometrically distributed discrete random variable $X$ with the first success being the $k^{\text{th}}$ trial, would be:

$$P(X = k) = (1 - p)^{k - 1} p$$

| Parameter | Meaning |
| --- | --- |
| $p \in [0, 1]$ | Success probability in each experiment |

| Quantity (or function) | Formula |
| --- | --- |
| Mean (expected value) | $\frac{1}{p}$ |
| Variance | $\frac{1 - p}{p^2}$ |
| Moment-generating function | $\frac{pe^t}{1 - (1 - p)e^t}$ |
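A similar truncated-sum check for the geometric mean $1/p$, assuming an illustrative $p = 0.25$:

```python
def geom_pmf(k, p):
    # First success on trial k: k - 1 failures followed by a success.
    return (1 - p) ** (k - 1) * p

p = 0.25  # illustrative success probability
# The tail beyond k = 500 is negligible for this p.
mean = sum(k * geom_pmf(k, p) for k in range(1, 500))
print(mean)  # close to 1 / p = 4.0
```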
When to use?

Hypergeometric distribution

The hypergeometric distribution is a discrete probability distribution that describes the probability of $k$ successes (random draws for which the object drawn has a specified feature) in $n$ draws, without replacement, from a finite population of size $N$ that contains exactly $K$ objects with that feature, where each draw is either a success or a failure.

The probability mass function for a hypergeometrically distributed discrete random variable $X$ with $k$ successes, would be:

$$P(X = k) = \frac{\binom{K}{k}\binom{N - K}{n - k}}{\binom{N}{n}}$$

| Parameter | Meaning |
| --- | --- |
| $N$ | Population size |
| $K$ | Number of objects with a specific feature |
| $n$ | Number of draws |

| Quantity (or function) | Formula |
| --- | --- |
| Mean (expected value) | $\frac{nK}{N}$ |
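A sketch using an illustrative card-drawing setup (assumed for the example): counting aces ($K = 4$) in a 5-card hand from a 52-card deck:

```python
from fractions import Fraction
from math import comb

def hypergeom_pmf(k, N, K, n):
    # k successes in n draws without replacement, from a population
    # of size N containing K objects with the feature.
    return Fraction(comb(K, k) * comb(N - K, n - k), comb(N, n))

N, K, n = 52, 4, 5  # illustrative: aces in a 5-card hand
mean = sum(k * hypergeom_pmf(k, N, K, n) for k in range(0, min(n, K) + 1))
print(mean, Fraction(n * K, N))  # both equal 5/13
```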

Joint probability

Previously, we introduced $P(A \cap B)$ as the probability of the intersection of the events $A$ and $B$.

If instead, we let these events be described by the random variables:

$$A = \{X = x\} \qquad B = \{Y = y\}$$

Then we can write:

$$P(A \cap B) = P(\{X = x\} \cap \{Y = y\})$$

Typically we write this as $P(X = x, Y = y)$, and this is referred to as the joint probability of $X = x$ and $Y = y$.

Joint probability distribution

If $X$ and $Y$ are discrete random variables, the function $f$ given by $f(x, y) = P(X = x, Y = y)$ for each pair of values $(x, y)$, is called the joint probability distribution of $X$ and $Y$.

Joint cumulative distribution function

If $X$ and $Y$ are discrete random variables, the definition of the joint cumulative distribution function of $X$ and $Y$ is given by:

$$F(x, y) = P(X \le x, Y \le y) = \sum_{s \le x} \sum_{t \le y} f(s, t)$$

where $f(s, t)$ is the joint probability distribution of $X$ and $Y$ at $(s, t)$.

Independence of random variables

Consider two discrete random variables $X$ and $Y$. We say that $X$ and $Y$ are independent if:

$$P(X = x, Y = y) = P(X = x) \, P(Y = y) \quad \text{for all } x, y$$

The definition of independence can be extended to $n$ random variables:

Consider $n$ discrete random variables $X_1, X_2, \dots, X_n$. We say that $X_1, X_2, \dots, X_n$ are mutually independent if:

$$P(X_1 = x_1, X_2 = x_2, \dots, X_n = x_n) = \prod_{i=1}^{n} P(X_i = x_i) \quad \text{for all } x_1, x_2, \dots, x_n$$
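Independence can be verified exhaustively for small sample spaces. A sketch, assuming two fair dice (an illustrative choice): $X$ is the first roll and $Y$ the second, and the joint probability factorises for every pair $(x, y)$:

```python
from fractions import Fraction
from itertools import product

# Two fair dice: 36 equally likely outcomes (x, y).
outcomes = list(product(range(1, 7), repeat=2))

def P(event):
    # Probability of an event under equally likely outcomes.
    return Fraction(sum(1 for o in outcomes if event(o)), len(outcomes))

# X and Y are independent: P(X = x, Y = y) = P(X = x) * P(Y = y) for all x, y.
independent = all(
    P(lambda o: o == (x, y)) == P(lambda o: o[0] == x) * P(lambda o: o[1] == y)
    for x in range(1, 7)
    for y in range(1, 7)
)
print(independent)  # True
```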

Conditional probability

Conditional probability is a measure of the probability of an event, given that some other event has occurred.

If the event of interest is $A$ and the event $B$ is known to have occurred, the conditional probability of $A$ given $B$ is written as $P(A \mid B)$.

Conditioning of an event

Given two events $A$ and $B$ with $P(B) > 0$, the conditional probability of $A$ given $B$ is defined as:

$$P(A \mid B) = \frac{P(A \cap B)}{P(B)}$$

This may be visualised as restricting the sample space to $B$.

Axiomatic definition

Sometimes the definition of conditional probability is treated as an axiom of probability:

$$P(A \cap B) = P(A \mid B) \, P(B)$$

This is simply a rearrangement of the equation previously shown.

Independent events

Events $A$ and $B$ are said to be statistically independent if their joint probability equals the product of the probability of each event:

$$P(A \cap B) = P(A) \, P(B)$$

Consequences

If $A$ and $B$ are independent and $P(B) > 0$, then:

$$P(A \mid B) = \frac{P(A \cap B)}{P(B)} = \frac{P(A) \, P(B)}{P(B)} = P(A)$$

In other words, knowing that $B$ has occurred gives no information about whether $A$ occurs.

General case

Independence extends to a collection of more than two events in two distinct ways:

Pairwise independence

A finite set of events $\{A_1, A_2, \dots, A_n\}$ is pairwise independent if every pair of events is independent — that is, iff for all distinct indices $i, j$:

$$P(A_i \cap A_j) = P(A_i) \, P(A_j)$$

Mutual independence

A finite set of events $\{A_1, A_2, \dots, A_n\}$ is mutually independent if every event is independent of any intersection of the other events — that is, iff for every $k$-element subset $\{A_{i_1}, \dots, A_{i_k}\}$ of $\{A_1, \dots, A_n\}$:

$$P\left(\bigcap_{j=1}^{k} A_{i_j}\right) = \prod_{j=1}^{k} P(A_{i_j})$$

Law of total probability

The law of total probability is the proposition that if $\{B_1, B_2, \dots, B_n\}$ is a finite partition of a sample space (in other words, a set of pairwise disjoint events whose union is the entire sample space), then for any event $A$ of the same probability space:

$$P(A) = \sum_{i=1}^{n} P(A \mid B_i) \, P(B_i)$$
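A sketch with assumed numbers (not from the notes): an urn is picked uniformly at random, then a ball is drawn from it; $A$ is the event of drawing a red ball, and the events $B_1, B_2$ partition the sample space by urn choice:

```python
from fractions import Fraction

# Illustrative two-urn setup (assumed numbers):
# urn 1 holds 3 red and 1 blue ball, urn 2 holds 1 red and 3 blue.
P_B = {1: Fraction(1, 2), 2: Fraction(1, 2)}          # P(B_i): urn choice
P_A_given_B = {1: Fraction(3, 4), 2: Fraction(1, 4)}  # P(A | B_i): red draw

# Law of total probability: P(A) = sum over i of P(A | B_i) * P(B_i)
P_A = sum(P_A_given_B[i] * P_B[i] for i in P_B)
print(P_A)  # 1/2
```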

Bayes' theorem

Bayes' theorem describes the probability of an event, based on prior knowledge of conditions that might be related to the event.

Derivation

Bayes' theorem shows that:

$$P(A \mid B) \propto P(B \mid A) \, P(A)$$

In other words, there exists some constant $c$ such that:

$$P(A \mid B) = c \, P(B \mid A) \, P(A) \qquad P(\overline{A} \mid B) = c \, P(B \mid \overline{A}) \, P(\overline{A})$$

If we add these two formulas, we deduce that:

$$1 = c \left( P(B \mid A) \, P(A) + P(B \mid \overline{A}) \, P(\overline{A}) \right)$$

Therefore, the constant $c$ can be expressed as:

$$c = \frac{1}{P(B \mid A) \, P(A) + P(B \mid \overline{A}) \, P(\overline{A})} = \frac{1}{P(B)}$$

Definition

Bayes' theorem is then mathematically defined as:

$$P(A \mid B) = \frac{P(B \mid A) \, P(A)}{P(B)}$$

Or alternatively:

$$P(A \mid B) = \frac{P(B \mid A) \, P(A)}{P(B \mid A) \, P(A) + P(B \mid \overline{A}) \, P(\overline{A})}$$
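The alternative form is convenient when only conditional probabilities are known. A sketch with assumed diagnostic-test numbers (1% prevalence, 99% sensitivity, 5% false-positive rate — illustrative values, not from the notes):

```python
from fractions import Fraction

# Assumed illustrative numbers: A = "has condition", B = "tests positive".
P_A = Fraction(1, 100)             # prevalence P(A)
P_B_given_A = Fraction(99, 100)    # sensitivity P(B | A)
P_B_given_notA = Fraction(5, 100)  # false-positive rate P(B | not A)

# Bayes' theorem, with P(B) expanded via the law of total probability.
P_notA = 1 - P_A
posterior = (P_B_given_A * P_A) / (
    P_B_given_A * P_A + P_B_given_notA * P_notA
)
print(posterior)  # 1/6: even a positive test leaves P(A | B) fairly low
```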

Chain rule

The chain rule (or multiplication rule) permits the calculation of any member of the joint distribution of a set of random variables using only conditional probabilities.

Consider an indexed collection of events $A_1, A_2, \dots, A_n$, then we can apply the definition of conditional probability to calculate the joint probability:

$$P(A_n \cap \dots \cap A_1) = P(A_n \mid A_{n-1} \cap \dots \cap A_1) \cdot P(A_{n-1} \cap \dots \cap A_1)$$

Repeating this process with each final term creates the product:

$$P\left(\bigcap_{k=1}^{n} A_k\right) = \prod_{k=1}^{n} P\left(A_k \,\middle|\, \bigcap_{j=1}^{k-1} A_j\right)$$

Example

With four variables, the chain rule produces this product of conditional probabilities:

$$P(A_4 \cap A_3 \cap A_2 \cap A_1) = P(A_4 \mid A_3 \cap A_2 \cap A_1) \cdot P(A_3 \mid A_2 \cap A_1) \cdot P(A_2 \mid A_1) \cdot P(A_1)$$
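The same pattern applies to a concrete event. A sketch (an illustrative setup, not from the notes): the probability of drawing the four aces of a standard 52-card deck in four successive draws without replacement:

```python
from fractions import Fraction

# A_k = "the k-th card drawn is an ace"; by the chain rule,
# P(A1 ∩ A2 ∩ A3 ∩ A4)
#   = P(A1) * P(A2 | A1) * P(A3 | A1 ∩ A2) * P(A4 | A1 ∩ A2 ∩ A3)
p = Fraction(4, 52) * Fraction(3, 51) * Fraction(2, 50) * Fraction(1, 49)
print(p)  # 1/270725
```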

Mutual exclusivity

Two events are mutually exclusive (or disjoint) if they cannot both occur. In other words, events $A$ and $B$ are mutually exclusive iff $A \cap B = \emptyset$, and hence $P(A \cap B) = 0$.

This has a consequence for the inclusion-exclusion principle. If $A$ and $B$ are mutually exclusive, then:

$$P(A \cup B) = P(A) + P(B)$$

Example

If our statistical experiment is the toss of a fair coin and:

$$A = \{\text{heads}\} \qquad B = \{\text{tails}\}$$

Then $P(A) = P(B) = \frac{1}{2}$, but $P(A \cap B) = 0$ since a coin cannot show heads and tails simultaneously (unless it is some kind of coin that exists in quantum superposition).

Therefore $P(A \cup B) = P(A) + P(B) = 1$.

Other properties

Expectation

The expectation of a random variable is the probability-weighted average of all possible values.

The expectation of a discrete random variable $X$ is:

$$E[X] = \sum_{x} x \, P(X = x)$$

Where the notation