# Continuous probability

## Continuous random variables

Random variables were previously defined in the discrete probability notes as:

A random variable is a function that maps each outcome of the sample space to some numerical value.

Given a sample space $\Omega$, a random variable $X$ with values in some set $\cal{R}$ is a function:

$$X:\Omega\mapsto\cal{R}$$

Where $\cal{R}$ was typically $\N$ or $\N_0$ for discrete RVs.

However, in continuous probability, the codomain $\cal{R}$ is always $\R$.

Therefore, a continuous random variable is a random variable which can take on uncountably many values (it has an uncountably infinite range).

Given a sample space $\Omega$, a continuous random variable $X$ is a function:

$$X:\Omega\mapsto\R$$

### Examples

• The continuous random variable $X_1$ could be the length of a randomly selected telephone call in seconds.
• The continuous random variable $X_2$ could be the volume of water in a bucket.

Note: Random variables can be partly continuous and partly discrete!

## Probability density function

### Why can't we use the PMF anymore?

A continuous random variable $X$ has what could be thought of as infinite precision.

More specifically, a continuous random variable can realise uncountably many real number values within its range, as there are uncountably many points in a line segment.

So we have infinitely many values whose probabilities must sum to one. This means that these probabilities must each be infinitesimal, and therefore:

$$\p{X=x}=0\quad\forall x\in\R$$

It is clear from this result that the probability mass function which we previously used in discrete probability will no longer provide any useful information.

### Definition

A probability density function is a function whose integral over an interval gives the probability that the value of a random variable falls within the interval.

$X:\Omega\mapsto\R$ is a continuous random variable if there is a function $f_X(x)$ such that:

$$\p{a\leq X\leq b}=\i{a}{b}{f_X(x)}{x}$$

The function $f_X(x)$ is called the probability density function (PDF).

For better reasoning as to why $\p{X=x}=0\quad\forall x\in\R$, we can now use the definition above: $\p{X=x}=\i{x}{x}{f_X(t)}{t}=0$, since an integral over an interval of zero width is zero.

### Properties

The following properties follow from the axioms:

• $\i{-\infty}{\infty}{f_X(x)}{x}=1$
• $f_X(x)\geq0$
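These two properties can be checked numerically for a concrete density. The sketch below (a Python example of my own, using only the standard library) uses the exponential PDF $f(x)=\lambda e^{-\lambda x}$ with $\lambda=2$:

```python
import math

# Example density: exponential with rate lam = 2
# (an assumption for illustration; any valid PDF would do).
lam = 2.0

def f(x):
    return lam * math.exp(-lam * x) if x >= 0 else 0.0

# Midpoint-rule approximation of the integral of f over [0, 20];
# the tail beyond 20 is negligible for lam = 2.
n = 100_000
a, b = 0.0, 20.0
dx = (b - a) / n
total = sum(f(a + (i + 0.5) * dx) for i in range(n)) * dx

print(round(total, 4))                                # approximately 1.0
print(all(f(x) >= 0 for x in (-1.0, 0.0, 0.5, 3.0)))  # True
```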

## Cumulative distribution function

Sometimes also called the cumulative density function (to differentiate it from the cumulative distribution of a discrete random variable), the cumulative distribution function of a continuous random variable $X$ evaluated at $x$ is the probability that $X$ will take a value less than or equal to $x$.

The cumulative distribution function is denoted $F_X(x)$, and defined as:

$$F_X(x)=\p{X\leq x}=\i{-\infty}{x}{f_X(t)}{t}$$

Additionally, if $f_X(x)$ is continuous at $x$:

$$\frac{\mathrm{d}}{\mathrm{d}x}F_X(x)=f_X(x)$$

The definition of the probability density function given earlier can be expressed in terms of the cumulative distribution function, by the fundamental theorem of calculus:

$$\p{a\leq X\leq b}=\i{a}{b}{f_X(x)}{x}=F_X(b)-F_X(a)$$

### Properties

• The cumulative distribution function is an increasing function.
• $F_X(\infty):=\ds\lim_{x\to\infty}\p{X\leq x}=1$
• $F_X(-\infty):=\ds\lim_{x\to-\infty}\p{X\leq x}=0$

### Example

Suppose the lifetime $X$ of a car battery has a probability $\p{X>x}=2^{-x}$ of lasting more than $x$ days. Find the probability density function of $X$.

We are given the complementary cumulative distribution function:

$$\p{X>x}=2^{-x}\quad\text{for }x\geq0$$

And we can determine the cumulative distribution function:

$$F_X(x)=\p{X\leq x}=1-\p{X>x}=1-2^{-x}$$

Differentiating then gives the probability density function:

$$f_X(x)=\frac{\mathrm{d}}{\mathrm{d}x}F_X(x)=2^{-x}\ln2\quad\text{for }x\geq0$$
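Differentiating $F_X(x)=1-2^{-x}$ gives $f_X(x)=2^{-x}\ln 2$ for $x\geq0$. A Python sketch checking this density numerically: it should integrate to one, and its tail integral from $3$ should reproduce the given $\p{X>3}=2^{-3}=0.125$:

```python
import math

def pdf(x):
    # Derived density of the battery lifetime: f(x) = 2^(-x) * ln(2), x >= 0
    return (2.0 ** -x) * math.log(2.0) if x >= 0 else 0.0

def integrate(f, a, b, n=100_000):
    # Simple midpoint-rule numerical integration
    dx = (b - a) / n
    return sum(f(a + (i + 0.5) * dx) for i in range(n)) * dx

total = integrate(pdf, 0.0, 60.0)  # tail beyond 60 days is negligible
tail3 = integrate(pdf, 3.0, 60.0)  # P(X > 3) should be 2^(-3) = 0.125

print(round(total, 4))  # approximately 1.0
print(round(tail3, 4))  # approximately 0.125
```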

## Expectation

If a continuous random variable $X$ is given, and its distribution is given by a probability density function $f_X$, then the expected value of $X$ (if the expected value exists) can be calculated as:

$$\e{X}=\i{-\infty}{\infty}{x\,f_X(x)}{x}$$

### Moments

The $n$-th moment of a continuous random variable $X\in\R$ is given by:

$$\e{X^n}=\i{-\infty}{\infty}{x^n f_X(x)}{x}$$

### Properties

In general, the properties of expectation for continuous random variables are the same as that of discrete random variables, but switching sums with integrals:

• Linearity — for a set of tuples $\set{(X_i,c_i)}_{i=1}^n$, each consisting of a continuous random variable $X_i:\Omega\mapsto\R$ and a corresponding constant $c_i\in\R$:

$$\e{\sum_{i=1}^n c_iX_i}=\sum_{i=1}^n c_i\e{X_i}$$

• In general, if $g(X)$ is a function of $X$ (e.g. $X^2$, $\ln(X)$), then $g(X)$ is also a random variable.

If $g(X)\in\R$, its expectation is given by:

$$\e{g(X)}=\i{-\infty}{\infty}{g(x)f_X(x)}{x}$$

• All remaining properties of expectation for discrete random variables also carry over.
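These rules can be checked numerically. The sketch below (Python, standard library only, my own example) computes $\e{X}$ and $\e{X^2}$ for $X$ uniform on $[0,1]$, where the exact values are $1/2$ and $1/3$:

```python
def integrate(f, a, b, n=100_000):
    # Midpoint-rule numerical integration
    dx = (b - a) / n
    return sum(f(a + (i + 0.5) * dx) for i in range(n)) * dx

# PDF of X ~ Uniform(0, 1): f(x) = 1 on [0, 1]
pdf = lambda x: 1.0

e_x = integrate(lambda x: x * pdf(x), 0.0, 1.0)        # E[X] = 1/2
e_x2 = integrate(lambda x: x ** 2 * pdf(x), 0.0, 1.0)  # E[g(X)] with g(x) = x^2, = 1/3

print(round(e_x, 4), round(e_x2, 4))
```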

## Variance

If the random variable $X$ represents samples generated by a continuous distribution with probability density function $f_X$, then the population variance is given by:

$$\var{X}=\e{(X-\mu)^2}=\i{-\infty}{\infty}{(x-\mu)^2f_X(x)}{x}\quad\text{where }\mu=\e{X}$$

All properties from the variance of discrete random variables still hold for continuous random variables.
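As an illustration (a Python sketch under the assumption $X\sim\text{Uniform}(2,8)$), the variance integral can be evaluated numerically; the exact value for a uniform distribution on $[a,b]$ is $(b-a)^2/12=3$:

```python
def integrate(f, a, b, n=100_000):
    # Midpoint-rule numerical integration
    dx = (b - a) / n
    return sum(f(a + (i + 0.5) * dx) for i in range(n)) * dx

a, b = 2.0, 8.0
pdf = lambda x: 1.0 / (b - a)  # Uniform(a, b) density on [a, b]

mu = integrate(lambda x: x * pdf(x), a, b)               # E[X] = 5
var = integrate(lambda x: (x - mu) ** 2 * pdf(x), a, b)  # Var[X] = (b-a)^2/12 = 3

print(round(mu, 4), round(var, 4))
```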

## Distributions

### Uniform distribution

The uniform distribution with parameters $a,b\in\R:-\infty<a<b<\infty$ is a distribution where all intervals of the same length on the distribution's support $[a,b]$, for a random variable $X:\Omega\mapsto[a,b]\subset\R$, are equally probable.

The support is defined by the two parameters $a$ and $b$.

The probability density function for a uniformly distributed random variable $X:\Omega\mapsto[a,b]\subset\R$ would be:

$$f_X(x)=\begin{cases}\dfrac{1}{b-a}&a\leq x\leq b\\0&\text{otherwise}\end{cases}$$

Additionally, the cumulative distribution function is given by:

$$F_X(x)=\begin{cases}0&x<a\\\dfrac{x-a}{b-a}&a\leq x\leq b\\1&x>b\end{cases}$$
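The piecewise uniform PDF and CDF translate directly into code. A minimal Python sketch (function names are my own):

```python
def uniform_pdf(x, a, b):
    # Density of Uniform(a, b): 1/(b - a) on [a, b], zero elsewhere
    return 1.0 / (b - a) if a <= x <= b else 0.0

def uniform_cdf(x, a, b):
    # CDF of Uniform(a, b): 0 below a, linear ramp on [a, b], 1 above b
    if x < a:
        return 0.0
    if x > b:
        return 1.0
    return (x - a) / (b - a)

print(uniform_pdf(5.0, 0.0, 10.0))   # 0.1
print(uniform_cdf(5.0, 0.0, 10.0))   # 0.5
print(uniform_cdf(-1.0, 0.0, 10.0))  # 0.0
```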

### Exponential distribution

The exponential distribution is the probability distribution that describes the time between events in a process in which events occur continuously and independently at a constant average rate.

An exponentially distributed random variable $X:\Omega\mapsto\R$ with rate parameter $\lambda\in\R:\lambda>0$ has the probability density function:

$$f_X(x)=\begin{cases}\lambda e^{-\lambda x}&x\geq0\\0&x<0\end{cases}$$

Additionally, the cumulative distribution function is given by:

$$F_X(x)=\begin{cases}1-e^{-\lambda x}&x\geq0\\0&x<0\end{cases}$$
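One practical consequence of this closed-form CDF is that it can be inverted, turning uniform random numbers into exponential samples (inverse-transform sampling). A Python sketch of my own, assuming rate $\lambda=0.5$, whose sample mean should approach $1/\lambda=2$:

```python
import math
import random

def exp_cdf(x, lam):
    # F(x) = 1 - e^(-lam * x) for x >= 0
    return 1.0 - math.exp(-lam * x) if x >= 0 else 0.0

def exp_sample(lam, rng):
    # Inverse-transform sampling: solve F(x) = u for x
    u = rng.random()
    return -math.log(1.0 - u) / lam

rng = random.Random(0)  # fixed seed so the sketch is reproducible
lam = 0.5
samples = [exp_sample(lam, rng) for _ in range(100_000)]
mean = sum(samples) / len(samples)

print(round(exp_cdf(2.0, lam), 4))  # F(2) = 1 - e^(-1), about 0.6321
print(round(mean, 1))               # close to 1/lam = 2
```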

### Gaussian distribution

To denote a random variable $X:\Omega\mapsto\R$ which is distributed according to the Gaussian distribution, we write $X\sim\cal{N}(\mu,\sigma^2)$, with standard deviation $\sigma$, variance $\sigma^2$ and mean/expectation $\mu$.

The probability density function for a Gaussian distributed random variable $X:\Omega\mapsto\R$ would be:

$$f_X(x)=\frac{1}{\sqrt{2\pi\sigma^2}}e^{-\frac{(x-\mu)^2}{2\sigma^2}}$$

Additionally, the cumulative distribution function is given by the integral:

$$F_X(x)=\i{-\infty}{x}{\frac{1}{\sqrt{2\pi\sigma^2}}e^{-\frac{(t-\mu)^2}{2\sigma^2}}}{t}$$

Note: We must use an evaluation table to determine the CDF evaluated at $x$, since $\mathrm{erf}$ is not an elementary function.

#### Standard normal distribution

The standard normal distribution (sometimes normal distribution, though this is ambiguous naming) is a special case of the Gaussian distribution, when $\mu=0$ and $\sigma^2=1$.

To denote a random variable $X:\Omega\mapsto\R$ which is (standard) normally distributed, we write $X\sim\mathcal{N}(0,1)$.

Additionally, the cumulative distribution function is given by the integral:

$$\Phi(x)=\frac{1}{\sqrt{2\pi}}\i{-\infty}{x}{e^{-t^2/2}}{t}$$

Note: This integral cannot be expressed in terms of elementary functions; it relies on the special $\mathrm{erf}$ function. To evaluate it we must instead use an evaluation table, specifically Table 5.1 in Section 5.4.
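In code, however, most maths libraries expose $\mathrm{erf}$ directly, so the table values can be reproduced. A Python sketch using the standard identity $\Phi(x)=\frac{1}{2}\left(1+\mathrm{erf}\left(x/\sqrt{2}\right)\right)$:

```python
import math

def phi(x):
    # Standard normal CDF via the error function:
    # Phi(x) = (1 + erf(x / sqrt(2))) / 2
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

# Reproduce a few familiar table values
print(round(phi(0.0), 4))    # 0.5
print(round(phi(1.96), 4))   # 0.975
print(round(phi(-1.96), 4))  # 0.025
```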

### Approximations of the binomial distribution

Recall that the binomial distribution is a discrete probability distribution representing the number of successes in a sequence of $n$ independent experiments, with each experiment being a Bernoulli trial (success/failure experiment) with probability of success $p$.

For a binomially distributed random variable $X_{n,p}$, the probability mass function is given by:

$$f_{X_{n,p}}(x)=\p{X_{n,p}=x}=\binom{n}{x}p^x(1-p)^{n-x}$$

Where $X_{n,p}$ is the number of successes in $n$ trials.

#### Poisson approximation

Recall that for a Poisson distributed random variable $X_\lambda$, the probability mass function is given by:

$$f_{X_\lambda}(x)=\p{X_\lambda=x}=\frac{\lambda^xe^{-\lambda}}{x!}$$

Where $X_\lambda$ is the number of successes if they occur at rate $\lambda$.

We can approximate the binomial distribution with the Poisson distribution reasonably well when $n\to\infty$ and $p$ is small (with $np<10$). This is true because $\lim_{n\to\infty}f_{X_{n,p}}(x)=f_{X_\lambda}(x)$ when $\lambda=np$ — that is:

$$\lim_{n\to\infty}\binom{n}{x}p^x(1-p)^{n-x}=\frac{(np)^xe^{-np}}{x!}$$
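This limit can be seen numerically. A Python sketch (standard library only; the parameters $n=1000$, $p=0.003$ are my own illustrative choice) comparing the binomial PMF against the Poisson PMF with $\lambda=np=3$:

```python
import math

def binom_pmf(k, n, p):
    # Binomial PMF: C(n, k) * p^k * (1-p)^(n-k)
    return math.comb(n, k) * p ** k * (1 - p) ** (n - k)

def poisson_pmf(k, lam):
    # Poisson PMF: lam^k * e^(-lam) / k!
    return lam ** k * math.exp(-lam) / math.factorial(k)

n, p = 1000, 0.003
lam = n * p  # 3.0

# The two columns of probabilities should agree closely
for k in range(6):
    print(k, round(binom_pmf(k, n, p), 4), round(poisson_pmf(k, lam), 4))
```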

#### Gaussian/normal approximation

Note that a binomially distributed random variable such as $X_{n,p}$ can be expressed as a sum of $n$ Bernoulli random variables — that is:

$$X_{n,p}=\sum_{i=1}^nY_i\quad\text{where }Y_i\sim\text{Bernoulli}(p)$$

• $\e{Y_i}=p$ and $\var{Y_i}=p(1-p)$
• $\e{X_{n,p}}=np$ and $\var{X_{n,p}}=np(1-p)$

We then have $\sd{X_{n,p}}=\sqrt{np(1-p)}$.

Note: This section may not be examinable, but it is useful for deriving the Gaussian approximation.

A standard score (denoted $Z$) is the number of standard deviations by which a data point is above or below the mean value of what is being observed or measured.

To standardise a data point $x$, we can use the normal standardisation formula:

$$z=\frac{x-\mu}{\sigma}$$

If we use the normal standardisation formula for $X_{n,p}$, we get:

$$Z=\frac{X_{n,p}-\e{X_{n,p}}}{\sd{X_{n,p}}}=\frac{X_{n,p}-np}{\sqrt{np(1-p)}}$$

By using the fact that $X_{n,p}$ can be expressed as a sum of Bernoulli random variables $\sum_{i=1}^nY_i$ (as discussed earlier), and the central limit theorem (which will be discussed a bit later), we can see that:

• $Z\sim\mathcal{N}(0,1)$
• $X_{n,p}\sim\mathcal{N}\left[\mu=np,\sigma^2=np(1-p)\right]$

Note: The normal approximation of the binomial is reasonable when $np(1-p)$ is large, or more specifically when $p$ and $1-p$ are not too small relative to $n$ — that is:

• $np\geq10$
• $n(1-p)\geq10$

## De Moivre-Laplace theorem

For a sequence $\set{X_j}_{j\in\N}$ of independent Bernoulli random variables with success probability $p$, we have (for $a\leq b$):

$$\lim_{n\to\infty}\p{a\leq\frac{\sum_{j=1}^nX_j-np}{\sqrt{np(1-p)}}\leq b}=\frac{1}{\sqrt{2\pi}}\i{a}{b}{e^{-x^2/2}}{x}$$

Or alternatively, with $E=\e{X_j}=p$ and $V=\var{X_j}=p(1-p)$:

$$\lim_{n\to\infty}\p{a\leq\frac{\sum_{j=1}^nX_j-nE}{\sqrt{nV}}\leq b}=\frac{1}{\sqrt{2\pi}}\i{a}{b}{e^{-x^2/2}}{x}$$

This theorem essentially states that the probability mass function of the centred and normalised binomial random variable converges (for $n\to\infty$ and $p=\text{const}$) to the probability density function of the normal random variable.

### Continuity correction

Sometimes when using the De Moivre-Laplace theorem, or approximating a discrete probability distribution with a continuous probability distribution, we must use continuity correction. For a discrete random variable $X\in\Z$, we can write:

$$\p{X=x}=\p{x-\tfrac{1}{2}\leq X\leq x+\tfrac{1}{2}}$$

#### Example

Consider a fair coin being tossed $40$ times.

Let the random variable $X_{40}$ represent the number of heads.

Then $\e{X_{40}}=20$.

Approximate $\p{X_{40}=20}$ using the Gaussian random variable.

First, we can start by correcting the discrete random variable for continuity, then standardising (with $\mu=np=20$ and $\sigma=\sqrt{np(1-p)}=\sqrt{10}$):

$$\p{X_{40}=20}=\p{19.5\leq X_{40}\leq20.5}\approx\Phi\left(\frac{20.5-20}{\sqrt{10}}\right)-\Phi\left(\frac{19.5-20}{\sqrt{10}}\right)=\Phi(0.16)-\Phi(-0.16)\approx0.1272$$

We can compare this to the result of letting $X_{40}$ be a binomially distributed random variable.

Recall that $\p{X=k}=\binom{n}{k}p^k(1-p)^{n-k}$. Therefore:

$$\p{X_{40}=20}=\binom{40}{20}\left(\frac{1}{2}\right)^{40}\approx0.1254$$

As you can see, approximating with a Gaussian random variable led to a reasonably accurate probability, but remember that we get a better estimate when $np(1-p)$ is large.
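This comparison can be reproduced in a few lines. A Python sketch (standard library only), using `math.erf` for $\Phi$ rather than a table, so the standard score is not rounded to two decimal places:

```python
import math

def phi(x):
    # Standard normal CDF via erf
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

n, p = 40, 0.5
mu = n * p                           # 20
sigma = math.sqrt(n * p * (1 - p))   # sqrt(10)

# Normal approximation with continuity correction:
# P(X = 20) is approximated by P(19.5 <= X <= 20.5)
approx = phi((20.5 - mu) / sigma) - phi((19.5 - mu) / sigma)

# Exact binomial probability: C(40, 20) * (1/2)^40
exact = math.comb(40, 20) * 0.5 ** 40

print(round(approx, 4))  # about 0.1256
print(round(exact, 4))   # about 0.1254
```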

## Relating probability density functions

Suppose we have a continuous random variable $X:\Omega\mapsto\R$ and some continuous function $g:\R\mapsto\R$. Note that $g(X)$ is also a random variable.

We will look at relating the two probability density functions $f_X$ and $f_{g(X)}$ by considering two different cases for $g$ — when $g$ is an increasing function and when it is a decreasing function.

### $g$ is an increasing function

By the definition of (strictly) increasing functions, we must have:

$$x_1<x_2\implies g(x_1)<g(x_2)$$

If we look at the cumulative distribution function for $g(X)$, we can determine a relationship between $f_X$ and $f_{g(X)}$:

$$F_{g(X)}(y)=\p{g(X)\leq y}=\p{X\leq g^{-1}(y)}=F_X(g^{-1}(y))$$

Differentiating with respect to $y$ then gives:

$$f_{g(X)}(y)=f_X(g^{-1}(y))\frac{\mathrm{d}}{\mathrm{d}y}g^{-1}(y)$$

### $g$ is a decreasing function

By the definition of (strictly) decreasing functions, we must have:

$$x_1<x_2\implies g(x_1)>g(x_2)$$

Once again, if we consider the cumulative distribution function for $g(X)$, we can determine a relationship between $f_X$ and $f_{g(X)}$:

$$F_{g(X)}(y)=\p{g(X)\leq y}=\p{X\geq g^{-1}(y)}=1-F_X(g^{-1}(y))$$

Differentiating with respect to $y$ then gives:

$$f_{g(X)}(y)=-f_X(g^{-1}(y))\frac{\mathrm{d}}{\mathrm{d}y}g^{-1}(y)$$
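Both cases can be sanity-checked numerically. The sketch below (Python, an example of my own) takes $X\sim\text{Uniform}(0,1)$ and the increasing function $g(x)=x^2$ on $(0,1)$, where the change-of-variables formula gives $f_{g(X)}(y)=f_X(\sqrt{y})\cdot\frac{1}{2\sqrt{y}}=\frac{1}{2\sqrt{y}}$, and compares it with a numerical derivative of $F_{g(X)}(y)=\p{X^2\leq y}=\sqrt{y}$:

```python
import math

# X ~ Uniform(0, 1), g(x) = x^2 (increasing on (0, 1))
f_X = lambda x: 1.0 if 0.0 <= x <= 1.0 else 0.0
g_inv = math.sqrt                               # g^{-1}(y) = sqrt(y)
dg_inv = lambda y: 1.0 / (2.0 * math.sqrt(y))   # d/dy g^{-1}(y)

def f_Y(y):
    # Change-of-variables formula for an increasing g
    return f_X(g_inv(y)) * dg_inv(y)

def F_Y(y):
    # CDF of Y = X^2 directly: P(X^2 <= y) = sqrt(y) on (0, 1)
    return math.sqrt(y)

# Compare the formula against a central-difference derivative of F_Y
y, h = 0.25, 1e-6
numeric = (F_Y(y + h) - F_Y(y - h)) / (2.0 * h)

print(round(f_Y(y), 4))   # 1/(2*sqrt(0.25)) = 1.0
print(round(numeric, 4))  # close to 1.0
```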

## Hazard rate function

The hazard rate function is the frequency with which a component fails, expressed in failures per unit of time.

Although the hazard rate function $\lambda(t)$ is often thought of as the probability that a failure occurs in a specified interval given no failure before time $t$, it is not actually a probability because it can exceed one.

The hazard rate function for a continuous random variable $X:\Omega\mapsto\R$ is given by:

$$\lambda(t)=\frac{f_X(t)}{1-F_X(t)}=\frac{f_X(t)}{R_X(t)}$$

Where:

• $f_X(t)$ is called the failure density function, and is the probability that the failure will fall in a specified interval.
• $F_X(t)$ is called the failure distribution function, and is the probability of the failure of a component, up to and including a certain time $t$.
• $R_X(t)=1-F_X(t)$ is called the survival function, and is the complementary cumulative distribution function — the probability of survival of a component past a certain time $t$.

### Example 1

Consider an exponentially distributed random variable $X:\Omega\mapsto\R$.

Recall that for $t\geq0,\lambda>0$:

$$f_X(t)=\lambda e^{-\lambda t}\qquad F_X(t)=1-e^{-\lambda t}$$

Therefore the hazard rate function is:

$$\lambda(t)=\frac{f_X(t)}{1-F_X(t)}=\frac{\lambda e^{-\lambda t}}{e^{-\lambda t}}=\lambda$$

That is, an exponentially distributed lifetime has a constant hazard rate.
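Numerically, the exponential distribution's hazard rate works out to a constant. A short Python sketch (assuming $\lambda=0.5$ for illustration) evaluating $\lambda(t)=f_X(t)/(1-F_X(t))$ at several times:

```python
import math

lam = 0.5  # assumed rate parameter for illustration

def pdf(t):
    # Exponential density: lam * e^(-lam * t) for t >= 0
    return lam * math.exp(-lam * t)

def cdf(t):
    # Exponential CDF: 1 - e^(-lam * t) for t >= 0
    return 1.0 - math.exp(-lam * t)

def hazard(t):
    # lambda(t) = f(t) / R(t), with survival function R(t) = 1 - F(t)
    return pdf(t) / (1.0 - cdf(t))

# Every entry should equal lam = 0.5, regardless of t
print([round(hazard(t), 6) for t in (0.1, 1.0, 5.0, 20.0)])
```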